Advertising ROI Forecasting

ML Linear Regression

Project Outcomes:

We will train a Linear Regression model using advertising expenditure and sales data to identify which advertising channels provide the highest return on investment (ROI). Additionally, we aim to determine which channels exhibit the most predictable relationship between money spent and sales generated (forecasting reliability).

Linear Regression models the mathematical relationship between one or more features and a continuous output. The model fits a straight line (or, with several features, a plane) through the data, which lets us forecast the output for input values that are not in our original data.
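As a rough illustration of the idea, the sketch below fits scikit-learn's LinearRegression to a small synthetic data set. The data, the spend ranges, and the "true" coefficients used to generate it are assumptions made up for this example; they are not the project's advertising data or results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the advertising data (NOT the project's real data).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 300, n),  # hypothetical TV spend
    rng.uniform(0, 50, n),   # hypothetical Radio spend
    rng.uniform(0, 100, n),  # hypothetical Newspaper spend
])
# Assumed relationship, used only to generate toy sales figures.
y = 3.0 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1.0, n)

# Fit one model on all three channels at once.
model = LinearRegression().fit(X, y)

# Each coefficient is the change in predicted sales per extra unit spent
# on that channel, holding the other channels fixed.
channels = ["TV", "Radio", "Newspaper"]
for name, coef in zip(channels, model.coef_):
    print(f"{name}: {coef:.3f} sales units per unit spent")

# Forecast sales for a spend combination that is not in the training data.
new_spend = np.array([[150.0, 25.0, 10.0]])
forecast = model.predict(new_spend)
print(f"Forecast sales: {forecast[0]:.2f}")
```

With real data, X and y would be read from the advertising data set instead of being generated.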

After feeding our data to the linear regression model, we can retrieve a scatterplot, trendline graph, and feature importance graph.

Scatterplots showing the relationship between advertising spend by channel and resulting sales.

The top row displays observed data, and the bottom row shows predictions generated by the linear regression model.

We can quickly see that TV advertising has a clearer positive correlation with sales than Radio or Newspaper.

According to our model's trendlines, Radio produces the largest increase in sales per unit of money spent on advertising, followed by TV, and then Newspaper.

The trendline and importance graphs may make it seem like Radio is the clear-cut best advertising option, but there is a little more nuance to the situation than is immediately apparent. When retrieving the R2 score and MSE, we find out that while Radio has the highest ROI on average, TV advertising is by far the most consistent.

Conclusion

In terms of sales gained per unit of money invested, advertising through Radio provides the highest ROI. However, the model's R2 score and MSE indicate that its predictions for TV advertising ROI are much more reliable than those for Radio and Newspaper.

This suggests that while Radio's ROI may be higher on average, the actual ROI seen is subject to quite a bit of variance and unpredictability. For this reason, TV advertising may be more practical for its reliability and comparatively robust ROI.

Overall

Radio: Unpredictable, highest ROI on average, viable but unreliable

TV: Predictable, 2nd highest ROI, most practical option, but not as high a "ceiling" as Radio advertising

Newspaper: Unpredictable, lowest ROI, least practical of the 3 advertising methods

MSE (mean squared error): The average of the squared differences between model predictions and true values. The lower the number, the closer the predictions are to the true values. Because errors are squared, predictions that are very far off are weighted more heavily.

R2: The proportion of the variance in sales that our model explains. The closer to 1, the better the fit. An R2 of 1 means that the spread in the data is perfectly predicted.

Notably, the statistics for TV show that its linear relationship with sales is much clearer and more reliable than those for Radio and Newspaper.
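To make the two metrics concrete, here is a minimal sketch that computes MSE and R2 by hand in plain Python. The function names and the numbers are made up for illustration and are not taken from the project's results.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the variance in y_true that is explained."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
small_misses = [10.5, 11.5, 14.5, 15.5]  # every prediction is off by 0.5
one_big_miss = [10.0, 12.0, 14.0, 20.0]  # three exact, one off by 4

print(mse(actual, small_misses))  # 0.25
print(mse(actual, one_big_miss))  # 4.0
print(r2(actual, small_misses))
print(r2(actual, one_big_miss))
```

The total absolute error only doubles between the two prediction sets (2 versus 4), but the MSE grows sixteenfold, which is the "far-off predictions are weighted more heavily" effect. R2 drops from about 0.95 to about 0.2 for the same reason.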

Additional Experiment 1: Modeling Linear Regression for each Channel Individually

Interestingly, compared to our first experiment, the model now predicts a higher ROI for Newspaper than for TV when advertising expenditure is between 0 and 400, after which TV outperforms it. The predicted ROI for Radio is also much higher than in our first experiment. However, the R2 score and MSE remain unchanged from our first experiment, so the previous assertion that Newspaper advertising is the least viable still holds.

According to this model, Radio's average ROI is so much higher than TV's that the unpredictability may be worth it.
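Fitting each channel on its own could look roughly like the sketch below, which trains one single-feature model per channel and records its slope, R2, and MSE. The data is synthetic and the generating coefficients are assumptions, so the printed numbers say nothing about the real channels. The metrics here are computed on the same data the model was fit on, which is a simplification.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in data (NOT the project's real advertising data).
rng = np.random.default_rng(1)
n = 200
spend = {
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
}
# Assumed relationship used only to generate toy sales figures.
sales = 3.0 + 0.05 * spend["TV"] + 0.2 * spend["Radio"] + rng.normal(0, 1.0, n)

results = {}
for channel, x in spend.items():
    X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
    model = LinearRegression().fit(X, sales)
    pred = model.predict(X)
    results[channel] = {
        "slope": float(model.coef_[0]),          # trendline steepness
        "r2": r2_score(sales, pred),             # share of variance explained
        "mse": mean_squared_error(sales, pred),  # average squared error
    }

for channel, stats in results.items():
    print(f"{channel}: slope={stats['slope']:.3f}, "
          f"R2={stats['r2']:.3f}, MSE={stats['mse']:.2f}")
```

Keeping slope and fit quality side by side makes it easier to see when a channel has a steep trendline but a poor fit, which is the trade-off discussed for Radio and TV.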

Additional Experiment 2: Scaling Using StandardScaler

Without scaling, the feature importance graph shows that Radio advertising has the strongest influence on the model's sales predictions. This means that our model treats Radio advertising spending as the strongest predictor of sales.

Using StandardScaler puts every channel's spending on the same scale (zero mean and unit variance), so the model's importance scores can be compared directly to see which channels have the greatest impact on final sales.

ROI trendlines are very similar to those of our first experiment. The major difference is in the importance scores assigned to each channel: TV now has by far the highest, Radio sits much lower, and Newspaper has essentially no importance at all.

This experiment better portrays which features are more reliable indicators of sales. While Radio has a higher potential ROI, TV is much more reliable, similar to the conclusion drawn in our first experiment.
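The effect of scaling on importance scores can be sketched as follows, assuming the importance scores are the model's coefficients (the source does not say how they were computed). The data is synthetic, and the spend ranges and generating coefficients are assumptions chosen for illustration, not the project's values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the project's real advertising data).
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([
    rng.uniform(0, 300, n),  # hypothetical TV spend (wide range)
    rng.uniform(0, 50, n),   # hypothetical Radio spend (narrow range)
    rng.uniform(0, 100, n),  # hypothetical Newspaper spend
])
# Assumed relationship used only to generate toy sales figures.
y = 3.0 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1.0, n)

# Raw coefficients: sales change per one unit of spend.
raw_coefs = LinearRegression().fit(X, y).coef_

# Scaled coefficients: sales change per one standard deviation of spend.
X_scaled = StandardScaler().fit_transform(X)
scaled_coefs = LinearRegression().fit(X_scaled, y).coef_

channels = ["TV", "Radio", "Newspaper"]
for name, raw, scaled in zip(channels, raw_coefs, scaled_coefs):
    print(f"{name}: raw={raw:.3f}, scaled={scaled:.3f}")
```

In this toy data the largest raw coefficient belongs to Radio, while the largest scaled coefficient belongs to TV, because each scaled coefficient equals the raw coefficient multiplied by that channel's standard deviation of spend. Whether the real data behaves the same way depends on its actual spend ranges.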

Revised Conclusion:

As we initially thought, TV advertising is the most consistent and predictable, and therefore the most viable option under most circumstances. The scaled importance scores from Additional Experiment 2 support this further. Radio advertising might yield a higher ROI, but it is much less consistent and harder to predict, so the risk is also higher. Newspaper advertising is mostly irrelevant. If some inconsistency and risk is acceptable for an advertising campaign, Radio advertising is likely to yield better results on average.