Exploratory data analysis on provided transaction data alongside cleaning.
Removing digits and special characters from product names.
Checked for nulls and possible outliers.
Summarised numerical attributes of transaction data and filtered out outlier data.
Charted number of transactions over time.
Created histogram of Pack_SIZE values.
Converted similar BRAND values to common values.
Exploratory data analysis on provided customer data alongside cleaning.
Checked for nulls and possible outliers.
Performed data analysis on customer segments.
Focused on the following metrics of interest:
Who spends the most on chips (total sales), describing customers by lifestage and how premium their general purchasing behaviour is.
How many customers are in each segment.
How many chips are bought per customer by segment - What’s the average chip price by customer segment.
Created barplots of total sales grouped by each customer’s lifestage and customer status.
Calculated and plotted average price per unit, grouped by lifestage and customer status.
Calculated and plotted average number of units per customer, grouped by lifestage and customer status.
Performed Welch two-sample t-test to determine if the difference in average price per unit between MIDAGE SINGLES/COUPLES and YOUNG SINGLES/COUPLES was statistically significant.
Further insights on specific customer segment LIFESTAGE == "YOUNG SINGLES/COUPLES" and PREMIUM_CUSTOMER == "Mainstream".
Created bar plot of relative frequency of brands bought.
For each store and month, calculated total sales, number of customers, transactions per customer, chips per customer and average price per unit.
calculateCorrelation() created to calculate correlation for a measure, looping through each control store.
calculateMagnitudeDistance() created to calculate a standardised magnitude distance for a measure, looping through each control store.
Identified benchmark stores for conducting uplift testing on trial store layouts.
Created a combined score score_nSales composed of correlation and magnitude, with corr_weight to adjust the weights for correlation and magnitude in the calculation.
Created the finalControlScore metric to combine scores across the drivers by simple average of total number of sales and number of customers.
Visualised checks on trends based on the drivers.
Assessed the differences between the performances of the control store and the trial store.
Repeated the above 3 times total for stores 77, 86 and 88.
Results for trial stores 77 and 88 during the trial period show a significant difference in at least two of the three trial months but not for trial store 86.
Produced a Powerpoint report that incorporated data visualizations, key insights, and recommendations, based on the previous tasks, to be provided to the Category Manager of chips.