Conducting Analytics On A Radio Chart Using Python and Z-Score Analysis

Code:

import numpy as np
import statsmodels.api as sm
import pandas as pd
import matplotlib.pyplot as plt

file_path = 'FinalProject.xlsx'
df = pd.read_excel(file_path)

df.columns = [col.strip().upper() for col in df.columns]
df['LW AUDIO'] = pd.to_numeric(df['LW AUDIO'], errors='coerce')
df['TW AUDIO'] = pd.to_numeric(df['TW AUDIO'], errors='coerce')
df['TW SPINS'] = pd.to_numeric(df['TW SPINS'], errors='coerce')

df = df.dropna(subset=['LW AUDIO', 'TW AUDIO', 'TW SPINS'])

X_audio = df['LW AUDIO']
y_audio = df['TW AUDIO']

X_audio = sm.add_constant(X_audio)

model_audio = sm.OLS(y_audio, X_audio).fit()

X_spins = df['TW SPINS']
y_spins = df['TW AUDIO']

X_spins = sm.add_constant(X_spins)
model_spins = sm.OLS(y_spins, X_spins).fit()

audio_summary = model_audio.summary()
spins_summary = model_spins.summary()

plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.scatter(df['LW AUDIO'], df['TW AUDIO'], color='blue', label="Data points")
plt.plot(df['LW AUDIO'], model_audio.fittedvalues, color='red', label="Fitted line")
plt.title('Regression: LW Audio vs TW Audio')
plt.xlabel('LW Audio')
plt.ylabel('TW Audio')
plt.legend()

plt.subplot(1, 2, 2)
plt.scatter(df['TW SPINS'], df['TW AUDIO'], color='green', label="Data points")
plt.plot(df['TW SPINS'], model_spins.fittedvalues, color='orange', label="Fitted line")
plt.title('Regression: TW Spins vs TW Audio')
plt.xlabel('TW Spins')
plt.ylabel('TW Audio')
plt.legend()

plt.tight_layout()
plt.show()

print("Regression Summary for LW Audio vs TW Audio:")
print(audio_summary)

print("\nRegression Summary for TW Spins vs TW Audio:")
print(spins_summary)

Output

Insights From Analysis

Key Insights

- Last week streams are a far better indicator of this week streams

- Spins don't have a high correlation to the amount of audio streams

- The Z-Score analysis shows that the data doesn't contain large amounts of outliers

Further Actions

- Focus less on amount of spins to getting more overall streams

- Identify which streaming modes lead to the same amount of streams each week

- Using the data with abnormal z-scores identify the largest disparities between those songs and the songs we are trying to push at radio

Further Questions

- Which other variable could be more aligned with streams in the data set?

- What variables have the highest correlation with songs that have the similar amount of streams each week?

- Why did the songs with outlier's perform better or worse than our songs?

Analyzing a Radio Chart in Tableau