Task 1: Store Format for Existing Stores
- Objective: Determine the optimal number of store formats and segment the 85 current stores based on sales data.
- Data Sources: StoreSalesData.csv and StoreInformation.csv files.
- Analysis Steps:
1. Using the 2015 sales data, sum sales by StoreID and Year.
2. Calculate the percentage sales per category per store (category sales as a percentage of total store sales).
3. Apply a K-means clustering model to segment the stores.
- Submission:
- Optimal number of store formats: State the number of formats chosen and the analysis results that support it.
- Number of stores in each format: Report the store count per format from the segmentation results.
- One way the clusters differ from one another: Describe one difference, based on the characteristics of each cluster.
- Tableau map: A map showing the location of existing stores, color-coded by cluster, and sized by total sales.
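The Task 1 analysis steps can be sketched in plain Python as follows. This is a minimal illustration, not the course's tooling: the store IDs, category names, and sales figures are made up, and `category_shares` and `kmeans` are hypothetical helper names. It assumes sales have already been summed per store and category for 2015, and that every store has a positive sales total.

```python
import random

def category_shares(sales_by_store):
    """Convert each store's category sales into fractions of that store's total."""
    shares = {}
    for store_id, by_cat in sales_by_store.items():
        total = sum(by_cat.values())  # assumed to be positive
        shares[store_id] = {cat: amt / total for cat, amt in by_cat.items()}
    return shares

def kmeans(points, k, seed=0, iters=100):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Made-up 2015 totals per store and category (illustrative only).
sales = {
    "S0001": {"Produce": 300.0, "Dairy": 100.0, "Bakery": 100.0},
    "S0002": {"Produce": 620.0, "Dairy": 190.0, "Bakery": 190.0},
    "S0003": {"Produce": 100.0, "Dairy": 100.0, "Bakery": 300.0},
    "S0004": {"Produce": 210.0, "Dairy": 190.0, "Bakery": 600.0},
}
categories = ["Produce", "Dairy", "Bakery"]
shares = category_shares(sales)
store_ids = sorted(shares)
points = [[shares[s][c] for c in categories] for s in store_ids]
segments = dict(zip(store_ids, kmeans(points, k=2)))
```

Clustering on percentage shares rather than raw sales means S0001 and S0002 end up in the same segment even though one sells roughly twice as much. The segments reflect product mix, not store size.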
Task 2: Store Format for New Stores
- Objective: Predict the store format for each of the 10 new stores using demographic and socioeconomic characteristics.
- Data Source: StoreDemographicData.csv file.
- Analysis Steps:
1. Train three candidate models: a decision tree, a random forest, and a boosted model.
2. Use a 20% validation sample with a random seed of 3 to compare the accuracy of the models.
3. Predict the store format for each new store based on the best-performing model.
- Submission:
- Methodology: Explain the methodology used (decision tree, random forest, boosted models) and provide the reasoning behind the choice.
- Three most important variables: Identify the top three variables that explain the relationship between demographic indicators and store formats, and include a visualization.
- Store format for each new store: Provide a data table showing the format assigned to each of the 10 new stores.
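The comparison in Task 2, step 2 can be sketched independently of the modeling tool. The example below only illustrates the hold-out logic (a 20% validation sample, seed 3, highest accuracy wins). The two "models" are deliberately simple stand-ins, not a decision tree, random forest, or boosted model, and the data and function names are hypothetical. Note that the same seed produces a different split in a different tool, so a seed is reproducible only within one environment.

```python
import random
from collections import Counter

def holdout_split(rows, valid_frac=0.2, seed=3):
    """Shuffle with a fixed seed and hold out valid_frac of the rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_frac)
    return shuffled[n_valid:], shuffled[:n_valid]  # (train, validation)

def fit_majority(train):
    """Stand-in model: always predict the most common format in training."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common

def fit_nearest(train):
    """Stand-in model: copy the format of the closest training store."""
    def predict(features):
        def dist(row):
            return sum((a - b) ** 2 for a, b in zip(features, row[0]))
        return min(train, key=dist)[1]
    return predict

def accuracy(predict, rows):
    """Share of rows whose predicted format matches the known format."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def compare(fitters, rows):
    """Fit every candidate on the training part, score on the held-out part."""
    train, valid = holdout_split(rows)
    scores = {name: accuracy(fit(train), valid) for name, fit in fitters.items()}
    best = max(scores, key=scores.get)
    return scores, best

# Made-up rows: (demographic feature vector, known store format).
rows = [((x, 100 - x), 1 if x < 50 else 2) for x in range(0, 100, 5)]
train, valid = holdout_split(rows)
scores, best = compare({"majority": fit_majority, "nearest": fit_nearest}, rows)
```

In the actual task, the three trained models would take the place of the stand-ins, and the winner would then be applied to the 10 new stores.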
Task 3: Forecasting
- Objective: Prepare a monthly sales forecast for produce for the full year of 2016, considering both existing and new stores.
- Analysis Steps:
1. Aggregate produce sales across all existing stores by month and create a forecast.
2. For the new stores, forecast produce sales for the average store in each segment.
3. Multiply each segment's average-store forecast by the number of new stores in that segment.
4. Sum the resulting forecasts across segments to get the forecast for all new stores.
5. Sum the forecasts of existing and new stores together for the total produce sales forecast.
- Submission:
- ETS or ARIMA models: Specify the ETS(a,m,n) or ARIMA(ar,i,ma) models used for each forecast and explain the decision process.
- Forecasts table and visualization: Provide a table with forecasts for existing and new stores and a visualization that includes historical data, forecasts for existing stores, and forecasts for new stores.
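The arithmetic in Task 3, steps 2 through 5, can be sketched as below. It assumes the monthly forecasts have already been produced by whichever ETS or ARIMA model was chosen. The function name, the segment labels, and every number are placeholders for illustration, not results.

```python
def combine_forecasts(existing, avg_by_segment, new_store_counts):
    """Return (new_stores_total, grand_total) as month-by-month lists."""
    months = len(existing)
    new_total = [0.0] * months
    for segment, count in new_store_counts.items():
        avg = avg_by_segment[segment]
        if len(avg) != months:
            raise ValueError(
                f"segment {segment!r} has {len(avg)} months, expected {months}"
            )
        for m in range(months):
            # Step 3: average-store forecast times number of new stores;
            # step 4: accumulate across segments.
            new_total[m] += avg[m] * count
    # Step 5: existing stores plus all new stores, month by month.
    grand_total = [e + n for e, n in zip(existing, new_total)]
    return new_total, grand_total

# Placeholder values only: 12 monthly forecasts for 2016.
existing_fc = [1000.0] * 12
avg_fc = {"format_1": [10.0] * 12, "format_2": [20.0] * 12}
new_counts = {"format_1": 3, "format_2": 2}

new_fc, total_fc = combine_forecasts(existing_fc, avg_fc, new_counts)
print(new_fc[0], total_fc[0])  # 70.0 1070.0
```

Returning the new-store series separately from the grand total makes it easy to build the required table and visualization, which show existing and new stores as distinct series.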
Note: The final capstone project requires the completion of three tasks involving store formats, segmentation, predictive modeling, and forecasting. The submission should include detailed explanations, tables, visualizations, and any relevant insights or findings from the analysis.