Objective: Create a classification model for Spotify Listening Personas
Goals:
- Work with real, up-to-date data
- Store data in the cloud
- Use relevant libraries
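- Compare appropriate algorithms such as KNN, Decision Trees, Random Forest, and Naive Bayes, and find the “best” model (Open question: how do I know what the best model is? One common yardstick is a cross-validated metric such as accuracy or F1 on data the model has not seen.)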
- Perform validation tests on the data
- Outline the assumptions of the model
- Be able to repeat the process on new data and upload it to the cloud (historical data cycle)
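
One possible way to approach the “best model” question from the goals above is to score every candidate with the same cross-validation splits and compare a single metric. The sketch below is only an illustration: the data is synthetic (`make_classification` stands in for the real song features and persona labels), and the choice of macro F1, 5 folds, and the hyperparameters are all assumptions, not decisions made in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for song features (X) and persona labels (y); an assumption.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Candidate models from the goals list. KNN is distance-based, so it is
# wrapped in a pipeline that scales the features inside each fold.
candidates = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive_bayes": GaussianNB(),
}

# Use identical stratified folds for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()
    for name, model in candidates.items()
}

# "Best" here just means the highest mean macro F1 across the folds.
best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:14s} mean macro F1 = {score:.3f}")
print("best:", best_name)
```

Keeping a final test set that is never touched during this comparison would give a less optimistic estimate of how the chosen model does on new data.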
Outline:
- [x] Get data from Spotify Charts through API scraping and store it in InfluxDB
- [ ] Clean and prepare the data: apply feature scaling and decide on regularization for the models
- [ ] Explore and Visualize Data
- [ ] Ask Questions About the Data
- [ ] Use those questions to determine how to build the model; read the Data 8 chapter on classification: https://inferentialthinking.com/chapters/17/Classification.html
- [ ] Build models using K-means and K-NN
    - K-NN: define a list of criteria using pandas queries to make a new column (‘Mood’) classifying the type of song it is: [Chill, Upbeat, Crazy, Slow, etc.]
    - Then use K-NN to predict a song’s ‘Mood’ based on its features
    - K-means: first find the optimal k, then cluster the songs into k different listening personas
- [ ] Create a dashboard for the model using Dash/Plotly and deploy it!
    - Allow new data to be read into the web app so that new predictions and metrics are generated!
    - Data Pipeline: Data collection → Wrangling → Feed it into the model → Model results via Dashboard/App
- [ ] Optimize Model?
- [ ] Explain the importance of this project: How would this be beneficial to Spotify?
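
The model-building step in the outline above (rule-based ‘Mood’ labels, K-NN on those labels, K-means with a search for k) could be sketched roughly as follows. Everything specific here is an assumption for illustration: the data is random, the column names `energy`, `valence`, and `tempo` and the 0.5 thresholds are hypothetical, and the silhouette score is just one common way to pick k (the elbow method is another).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Random stand-in for the cleaned song table (hypothetical columns).
rng = np.random.default_rng(0)
songs = pd.DataFrame({
    "energy": rng.uniform(0, 1, 200),
    "valence": rng.uniform(0, 1, 200),
    "tempo": rng.uniform(60, 180, 200),
})
features = ["energy", "valence", "tempo"]

# 1) Rule-based 'Mood' column built from pandas queries (thresholds are made up).
songs["Mood"] = "Slow"  # default: low energy, low valence
songs.loc[songs.query("energy < 0.5 and valence >= 0.5").index, "Mood"] = "Chill"
songs.loc[songs.query("energy >= 0.5 and valence >= 0.5").index, "Mood"] = "Upbeat"
songs.loc[songs.query("energy >= 0.5 and valence < 0.5").index, "Mood"] = "Crazy"

# 2) K-NN: predict 'Mood' from the features; fit the scaler on training data only.
X_train, X_test, y_train, y_test = train_test_split(
    songs[features], songs["Mood"],
    test_size=0.25, random_state=0, stratify=songs["Mood"],
)
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
accuracy = knn.score(scaler.transform(X_test), y_test)
print(f"K-NN test accuracy: {accuracy:.3f}")

# 3) K-means: try several k values and keep the one with the highest silhouette.
X_all = StandardScaler().fit_transform(songs[features])
silhouettes = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_all)
    silhouettes[k] = silhouette_score(X_all, labels)
best_k = max(silhouettes, key=silhouettes.get)

# Assign each song to a cluster, used here as a candidate "persona" id.
songs["Persona"] = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_all)
print("chosen k:", best_k)
```

Since the ‘Mood’ labels come from hand-written rules on the same features the K-NN sees, its accuracy mostly measures how well it recovers those rules, which is worth keeping in mind when listing the model's assumptions.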