Summary:

This analysis explores how user-curated playlists surface K-pop tracks and whether observable differences in engagement signals explain exposure. Using Spotify API data for the top tracks of 27 K-pop groups, I engineered artist- and track-level features, including popularity, label affiliation, group composition, and audio attributes. I then combined exploratory analysis, interpretable modeling, and robustness checks to identify which signals consistently differentiate exposed tracks. The results illustrate how observable engagement and structural signals can help explain content exposure in algorithmically mediated discovery systems.

<aside> ⭐

Key Takeaways:

Playlist inclusion is relatively rare (only 13% of the sample), but is concentrated among tracks with high popularity scores.
Track popularity is a strong predictor of inclusion, but does not fully explain engagement differences.
Even after controlling for artist and track features, playlisted tracks show higher engagement.
Big 4 affiliation and gender show consistent associations even after adjustment, suggesting that a structural dynamic is also involved.
Playlist inclusion and popularity reinforce one another and are likely influenced by unobserved promotional, social, and algorithmic mechanisms.
Findings provide a framework for operationalizing content discovery signals to inform platform strategy, recommendation optimization, and audience development decisions. </aside>

Methods:

Data Visualization
Regression Analysis
Feature Engineering
Confounder-Adjusted Analysis
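
As a rough illustration of the feature engineering step, the sketch below flattens one track record and one artist record into a single model-ready row. The input field names (`label`, `members`, `tempo`), the `BIG4_LABELS` set, and the `build_features` helper are assumptions made for this example, not the project's actual schema or code, and the sample values are placeholders rather than real data.

```python
from dataclasses import dataclass

# Assumption: "Big 4" membership is decided by matching the label name
# against this illustrative set.
BIG4_LABELS = {"HYBE", "SM Entertainment", "JYP Entertainment", "YG Entertainment"}


@dataclass
class TrackFeatures:
    track_name: str
    popularity: int   # 0-100 popularity score
    is_big4: int      # 1 if the artist's label is in BIG4_LABELS, else 0
    group_size: int   # number of members in the group
    tempo: float      # one example audio attribute


def build_features(track: dict, artist: dict) -> TrackFeatures:
    """Combine hypothetical track and artist records into one feature row."""
    return TrackFeatures(
        track_name=track["name"],
        popularity=int(track["popularity"]),
        is_big4=int(artist["label"] in BIG4_LABELS),
        group_size=len(artist["members"]),
        tempo=float(track["tempo"]),
    )


# Placeholder records for illustration only.
track = {"name": "Example Track", "popularity": 72, "tempo": 128.0}
artist = {"label": "JYP Entertainment", "members": ["A", "B", "C", "D"]}
row = build_features(track, artist)
```

Keeping artist-level fields (label, group size) on every track row lets them act as covariates when comparing playlisted and non-playlisted tracks.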

Models:

Logistic Regression
Elastic Net
Covariate-Adjusted Regression
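
To make the logistic regression step concrete, here is a minimal sketch that fits playlist inclusion on a standardized popularity score and a Big 4 flag. It is a hand-rolled, unpenalized gradient-descent fit written for illustration only; it is not the project's model code and omits the elastic net penalty. The function names and the toy rows are assumptions, not the project's data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi          # gradient of the log-loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Predicted probability of playlist inclusion for one feature row."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Toy rows: [standardized popularity, Big 4 flag]. Made-up values.
X = [[-1.2, 0], [-0.5, 0], [-0.3, 1], [0.1, 0],
     [0.4, 1], [0.9, 0], [1.1, 1], [1.5, 1]]
y = [0, 0, 0, 1, 0, 0, 1, 1]   # 1 = track appears on a playlist

w = fit_logistic(X, y)
print("intercept, popularity, big4:", [round(v, 3) for v in w])
```

Because the Big 4 flag is in the model alongside popularity, the popularity coefficient is read as an association holding label affiliation fixed, which is the same logic as the covariate-adjusted analysis.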

Tools/Packages Used:

R
- corrplot, tidyverse, ggplot2, car, caret, glmnet, pROC, vip, recipes, dotwhisker, ggridges
Python 3.13.3
- spotipy, pandas, numpy, networkx

Code: