Quantium Data Analytics Job Simulation

Task 1

Performed exploratory data analysis and cleaning on the provided transaction data.
- Removed digits and special characters from product names.
- Checked for nulls and possible outliers.
- Summarised numerical attributes of transaction data and filtered out outlier data.
- Charted number of transactions over time.
- Created a histogram of PACK_SIZE values.
- Converted similar BRAND values to common values.
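The cleaning steps above can be illustrated with a minimal sketch using only the Python standard library. The sample product names, the `BRAND_ALIASES` mapping and the helper names are hypothetical and are not taken from the project itself; the sketch also assumes the pack size is the first run of digits in the product name and the brand is its first word.

```python
import re

# Hypothetical product names, in the style of the transaction data.
PROD_NAMES = [
    "Natural Chip        Compny SeaSalt175g",
    "Red Rock Deli Sp    Salt & Truffle 150G",
    "RRD Chilli&         Coconut 150g",
    "Smiths Crinkle Cut  Chips Chicken 170g",
]

# Assumed mapping of brand variants to one common value.
BRAND_ALIASES = {"RED": "RRD", "SNBTS": "SUNBITES", "INFZNS": "INFUZIONS"}


def pack_size(name):
    """Return the first run of digits in the name as the pack size in grams."""
    match = re.search(r"\d+", name)
    if match is None:
        raise ValueError(f"no pack size found in {name!r}")
    return int(match.group())


def clean_words(name):
    """Drop digits and special characters and return the remaining words."""
    letters_only = re.sub(r"[^A-Za-z ]+", " ", name)
    # Removing the digits leaves the unit "g"/"G" behind, so drop it too.
    return [word for word in letters_only.split() if word.lower() != "g"]


def brand(name):
    """Take the first word as the brand and map known variants to one value."""
    first_word = name.split()[0].upper()
    return BRAND_ALIASES.get(first_word, first_word)


for name in PROD_NAMES:
    print(brand(name), pack_size(name), clean_words(name))
```

Keeping the alias mapping in one dictionary makes it easy to review which brand spellings were merged.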
Performed exploratory data analysis and cleaning on the provided customer data.
- Checked for nulls and possible outliers.
Performed data analysis on customer segments.
- Focused on the following metrics of interest:
  - Who spends the most on chips (total sales), describing customers by lifestage and how premium their general purchasing behaviour is.
  - How many customers are in each segment.
  - How many chips are bought per customer by segment.
  - What’s the average chip price by customer segment.
- Created bar plots of total sales, grouped by lifestage and premium customer status.
- Calculated and plotted average price per unit, grouped by lifestage and premium customer status.
- Calculated and plotted average number of units per customer, grouped by lifestage and premium customer status.
- Performed a Welch two-sample t-test to determine whether the difference in average price per unit between MIDAGE SINGLES/COUPLES and YOUNG SINGLES/COUPLES was statistically significant.
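For reference, the Welch two-sample t-test mentioned above can be sketched as follows. The unit prices are made-up illustrative values, not results from the analysis, and `welch_t_test` is a hypothetical helper name. The sketch computes only the t statistic and the Welch-Satterthwaite degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom) without assuming equal variances."""
    # Squared standard error contributed by each sample (sample variance / n).
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical price-per-unit samples for two customer segments.
young = [4.0, 4.2, 3.9, 4.1, 4.3]
midage = [3.6, 3.8, 3.7, 3.5, 3.9]

t_stat, dof = welch_t_test(young, midage)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A p-value would come from the t distribution with these degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and returns the p-value directly.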
Further insights on specific customer segment LIFESTAGE == "YOUNG SINGLES/COUPLES" and PREMIUM_CUSTOMER == "Mainstream".
- Created bar plot of relative frequency of brands bought.
- Created density plot of pack sizes.
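The relative brand frequency behind the bar plot can be computed along these lines. This is a sketch only: the rows and the `brand_share` helper are hypothetical, with each row holding a customer's LIFESTAGE, PREMIUM_CUSTOMER and BRAND values.

```python
from collections import Counter

# Hypothetical rows: (LIFESTAGE, PREMIUM_CUSTOMER, BRAND).
ROWS = [
    ("YOUNG SINGLES/COUPLES", "Mainstream", "KETTLE"),
    ("YOUNG SINGLES/COUPLES", "Mainstream", "KETTLE"),
    ("YOUNG SINGLES/COUPLES", "Mainstream", "DORITOS"),
    ("YOUNG SINGLES/COUPLES", "Mainstream", "PRINGLES"),
    ("OLDER FAMILIES", "Budget", "SMITHS"),
    ("RETIREES", "Premium", "KETTLE"),
]


def brand_share(rows, lifestage, premium):
    """Return each brand's share of the purchases made by one segment."""
    counts = Counter(
        brand
        for row_lifestage, row_premium, brand in rows
        if row_lifestage == lifestage and row_premium == premium
    )
    total = sum(counts.values())
    # An empty segment yields an empty dict, so no division by zero occurs.
    return {brand: n / total for brand, n in counts.items()}


shares = brand_share(ROWS, "YOUNG SINGLES/COUPLES", "Mainstream")
for brand, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{brand}: {share:.0%}")
```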

Task 2

For each store and month, calculated total sales, number of customers, transactions per customer, chips per customer and average price per unit.
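A minimal sketch of this per-store, per-month aggregation is shown below. The transaction rows, the tuple layout and the metric key names are assumptions made for illustration, and every row is assumed to have a positive quantity.

```python
from collections import defaultdict

# Hypothetical rows: (store, year_month, customer_id, txn_id, quantity, sales).
TRANSACTIONS = [
    (77, 201807, 1001, 1, 2, 6.0),
    (77, 201807, 1001, 2, 1, 3.0),
    (77, 201807, 1002, 3, 3, 9.0),
    (86, 201807, 2001, 4, 2, 8.0),
]


def monthly_metrics(transactions):
    """Aggregate the five measures for every (store, year_month) pair."""
    groups = defaultdict(list)
    for row in transactions:
        groups[(row[0], row[1])].append(row)

    metrics = {}
    for key, rows in groups.items():
        customers = {row[2] for row in rows}  # distinct customers
        txns = {row[3] for row in rows}       # distinct transactions
        units = sum(row[4] for row in rows)
        sales = sum(row[5] for row in rows)
        metrics[key] = {
            "tot_sales": sales,
            "n_customers": len(customers),
            "txn_per_customer": len(txns) / len(customers),
            "chips_per_customer": units / len(customers),
            "avg_price_per_unit": sales / units,
        }
    return metrics


for key, values in monthly_metrics(TRANSACTIONS).items():
    print(key, values)
```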
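Created calculateCorrelation() to calculate the correlation between the trial store and each potential control store for a given measure, looping through each control store.
Created calculateMagnitudeDistance() to calculate a standardised magnitude distance between the trial store and each potential control store for a given measure, looping through each control store.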
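The two functions can be sketched roughly as below. These are illustrative Python stand-ins, not the project's own implementations: the snake_case names, the monthly series and the min-max standardisation (rescaling each month's absolute distance across control stores, then averaging over months) are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series with non-zero variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def calculate_correlation(series_by_store, trial_store):
    """Correlate the trial store's monthly measure with every other store's."""
    trial = series_by_store[trial_store]
    return {
        store: pearson(trial, values)
        for store, values in series_by_store.items()
        if store != trial_store
    }


def calculate_magnitude_distance(series_by_store, trial_store):
    """Score each control store from 0 (furthest) to 1 (closest) in magnitude."""
    trial = series_by_store[trial_store]
    controls = [store for store in series_by_store if store != trial_store]
    totals = {store: 0.0 for store in controls}
    for month, trial_value in enumerate(trial):
        dist = {s: abs(trial_value - series_by_store[s][month]) for s in controls}
        low, high = min(dist.values()), max(dist.values())
        for s in controls:
            # If every store is equally far away, treat them as equally close.
            totals[s] += 1.0 if high == low else 1 - (dist[s] - low) / (high - low)
    # Average the standardised monthly scores over the whole period.
    return {s: totals[s] / len(trial) for s in controls}


# Hypothetical monthly total sales for trial store 77 and three other stores.
SALES = {77: [100, 120, 110], 1: [200, 240, 220], 2: [105, 115, 112], 3: [300, 280, 310]}

print(calculate_correlation(SALES, 77))
print(calculate_magnitude_distance(SALES, 77))
```

In this made-up data, store 1 follows the trial store's trend perfectly but at twice the volume, while store 2 is closest in magnitude. That contrast is why the two measures are worth combining.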
Identified benchmark stores for conducting uplift testing on trial store layouts.
- Created a combined score score_nSales composed of correlation and magnitude, with corr_weight to adjust the weights for correlation and magnitude in the calculation.
- Created the finalControlScore metric, which combines the scores across the drivers as a simple average of the total sales score and the number of customers score.
- Visualised checks on trends based on the drivers.
- Assessed the differences between the performances of the control store and the trial store.
- Repeated the above for each of the three trial stores: 77, 86 and 88.
During the trial period, trial stores 77 and 88 showed a significant difference from their control stores in at least two of the three trial months; trial store 86 did not.
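The control store scoring described above can be sketched as follows. The correlation and magnitude values are made up, the function names are hypothetical, and the sketch assumes the combined score is a weighted average controlled by `corr_weight`, with the control store being the candidate with the highest final score.

```python
def combined_score(corr, magnitude, corr_weight=0.5):
    """Weighted average of the correlation and magnitude scores per store."""
    return {
        store: corr_weight * corr[store] + (1 - corr_weight) * magnitude[store]
        for store in corr
    }


def final_control_score(score_n_sales, score_n_customers):
    """Simple average of the two driver scores per store."""
    return {
        store: (score_n_sales[store] + score_n_customers[store]) / 2
        for store in score_n_sales
    }


# Hypothetical per-store scores for a single trial store.
corr_sales = {1: 0.9, 2: 0.5, 3: -0.2}
mag_sales = {1: 0.7, 2: 0.9, 3: 0.1}
corr_customers = {1: 0.8, 2: 0.6, 3: 0.0}
mag_customers = {1: 0.8, 2: 1.0, 3: 0.2}

score_n_sales = combined_score(corr_sales, mag_sales, corr_weight=0.5)
score_n_customers = combined_score(corr_customers, mag_customers, corr_weight=0.5)
final_scores = final_control_score(score_n_sales, score_n_customers)

# Pick the candidate with the highest final score as the control store.
control_store = max(final_scores, key=final_scores.get)
print(control_store, final_scores)
```

Raising `corr_weight` above 0.5 would favour stores that follow the trial store's trend over stores that merely trade at a similar volume.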

Task 3

Produced a PowerPoint report for the Category Manager of chips, incorporating data visualisations, key insights and recommendations based on the previous tasks.