Presentation: K-Means Cluster Analysis in RStudio Assignment

Scenario

You are a business analyst for an organization that distributes pizza ingredients to various restaurants. These ingredients include dough, sauces, and toppings. Your organization wants to expand its customer base but does not know what kind of restaurant or which U.S. region to target. You have been given a Clean Customer Sales Data Set that is a representative sample of the sales at your customers’ restaurants for 2015 and 2016. The data set also contains household income data for the restaurants' zip codes. You must analyze this data to help inform marketing decisions.

In previous modules, you have performed data wrangling, applied statistical tests, and generated data visualizations. In this assignment, you will continue your analysis and conduct a k-means cluster analysis on the given data. A cluster analysis can confirm business assumptions about what types of groups exist or identify previously unknown groups in a data set. You will use the Rattle package in RStudio within the VDI to complete this task.

Prompt

Up to this point, analysis of your customers by region has relied on U.S. Census regions to demarcate data groups. How accurate is this manner of defining your customers, though? You will perform a k-means cluster analysis using Rattle to investigate whether there is a better way of dividing up your U.S. customers than the four U.S. Census regions. Then create a PowerPoint presentation describing the outcomes. You must include the relevant screenshots in your presentation.

In the presentation to your manager, you must address the following criteria:

Exploratory Data Analysis (2 Slides)
- Provide an overview of the summary statistics you ran.
  - Run summary statistics on the variables in the data set and provide simple summaries about the sample and the measures.
  - Determine the average check for each region (as defined by the U.S. Census) and for other subgroups of interest.
- Review the results for outliers and missing data, and point out any that will have an impact on your analysis and recommendations.
- Explain the significance of these outliers and missing data to solving the business problem.
- Include screenshots of the results of the summary statistics.
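
The checks in this section can be sketched outside Rattle to make clear what each one computes. The following is a minimal illustration in Python, not the Rattle workflow the assignment requires. The field names (`region`, `avg_check`) and all values are hypothetical and are not taken from the Clean Customer Sales Data Set, and the 1.5 × IQR rule is one common convention for flagging outliers, assumed here for illustration.

```python
from statistics import mean, quantiles

# Hypothetical sample rows: field names and values are illustrative only.
rows = [
    {"region": "Northeast", "avg_check": 21.5},
    {"region": "Northeast", "avg_check": 23.0},
    {"region": "South", "avg_check": 18.0},
    {"region": "South", "avg_check": None},   # missing value
    {"region": "Midwest", "avg_check": 19.5},
    {"region": "Midwest", "avg_check": 20.5},
    {"region": "West", "avg_check": 22.0},
    {"region": "West", "avg_check": 95.0},    # suspiciously large value
]


def mean_by_group(rows, group_key, value_key):
    """Mean of value_key per group, skipping missing (None) values."""
    groups = {}
    for row in rows:
        if row[value_key] is not None:
            groups.setdefault(row[group_key], []).append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


def count_missing(rows, value_key):
    """Number of rows where value_key is missing."""
    return sum(1 for row in rows if row[value_key] is None)


def iqr_outliers(values):
    """Values lying more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


region_means = mean_by_group(rows, "region", "avg_check")
missing = count_missing(rows, "avg_check")
observed = [r["avg_check"] for r in rows if r["avg_check"] is not None]
outliers = iqr_outliers(observed)

print(region_means)
print("missing:", missing, "outliers:", outliers)
```

In this made-up sample, a single large value pulls the West mean well above the other regions, which is the kind of effect the outlier discussion should address.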

K-Means Cluster Analysis (2 Slides)
- Run a k-means cluster analysis to generate at least 10 clusters and find groups that are not explicitly labeled in the data set.
- Review the charts produced.
  - Include the “Sum of WithinSS Over Number of Clusters” chart and explain how your interpretation of it has influenced your recommendation for the optimal number of clusters for the variables analyzed.
- Explain your views on the optimal number of clusters for the variables in the data set.
- Visualize the generated clusters.
  - Select the Discriminant button to produce a visual plot of the clusters your analysis has generated.
- Include screenshots of the charts and clusters produced.
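
To show what the “Sum of WithinSS Over Number of Clusters” chart summarizes, here is a minimal sketch in Python that computes the total within-cluster sum of squares for 1 through 10 clusters. It is an illustration under stated assumptions, not Rattle's implementation: the points are hypothetical two-variable data, the algorithm is plain Lloyd-style k-means, and the starting centers are chosen by a deterministic farthest-point rule (an assumption made here so the result is reproducible, where other tools typically use random starts).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_point_init(points, k):
    """Assumed initialization: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Lloyd-style k-means; returns labels, centers, and total within-SS."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


# Hypothetical data: three visibly separated groups of four points each.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.0, 1.2),
    (5.0, 5.0), (5.2, 4.8), (4.8, 5.2), (5.0, 5.2),
    (9.0, 1.0), (9.2, 0.8), (8.8, 1.2), (9.0, 1.2),
]

# Total within-cluster sum of squares for k = 1..10.
wss_by_k = {k: kmeans(points, k)[2] for k in range(1, 11)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 3))
```

For this sample the within-SS falls sharply up to three clusters and changes little afterward. That bend, often called the elbow, is the feature to look for and explain in the chart.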

Diagnostics and Iterations (1 Slide)
- Use appropriate diagnostics to perform iterations on the variables.
  - Repeat the process until you have arrived at what you consider to be the optimal number of clusters.
  - Explain your rationale for your recommendation.
- Include a screenshot of the results for your optimal number of clusters.
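
One way to make the rationale for a cluster count explicit is to state a stopping rule and apply it to the within-SS values from each iteration. The sketch below, in Python, assumes a simple rule: stop at the smallest number of clusters for which adding one more reduces the within-SS by less than 10 percent. Both the 10 percent threshold and the example values are hypothetical. The rubric asks for your own judgment, so treat this as an aid to that judgment, not a replacement for it.

```python
def pick_k_by_elbow(wss_by_k, min_gain=0.10):
    """Return the smallest k for which moving to the next k reduces the
    within-cluster sum of squares by less than min_gain, measured as a
    fraction of the value at k. Falls back to the largest k supplied."""
    ks = sorted(wss_by_k)
    for k, next_k in zip(ks, ks[1:]):
        current, following = wss_by_k[k], wss_by_k[next_k]
        if current == 0 or (current - following) / current < min_gain:
            return k
    return ks[-1]


# Hypothetical within-SS values, one per number of clusters tried.
example = {1: 1000.0, 2: 520.0, 3: 260.0, 4: 245.0, 5: 236.0, 6: 230.0}
chosen_k = pick_k_by_elbow(example)
print("suggested number of clusters:", chosen_k)
```

A different threshold can suggest a different count, so state the rule you used alongside the screenshot.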

What to Submit

Submit a 5-slide PowerPoint presentation. Sources should be cited according to APA style. Consult the Shapiro Library APA Style Guide for more information on citations.