Spotify's Discover Weekly: How machine learning helps in finding the music we love? - Priyanka Mehta
Hey guys! This is my very first blog, and I am excited to tell you all about my latest obsession: the Spotify music app. Yes, guys, a music app! As a music lover, I always wanted a music app that just automatically knows the kind of songs I want to listen to, so that I don't even have to search for every song before playing it. It should all just happen automatically. And guess what: my wish came true with this app. Spotify really knows my taste in music and plays songs according to my mood, and I love the app for this ability. Spotify has a feature called Discover Weekly, in which every Monday each user gets a customized list of 30 new songs that they have not listened to before. Interestingly, users are loving these songs. It is magical, as if Spotify knows the hearts of its people. People are going crazy and tweeting great things about Spotify.
But is it all true? Is it some kind of magic? No guys, it is not magic. It is all because of the amazing recommendation engine deployed by Spotify.
First, let us see how other streaming services do their music recommendations and how Spotify is different from them.
A brief history of online music curation
In the 2000s, Songza started off the online music curation scene, using manual curation to create playlists for users. "Manual curation" meant that some team of "music experts" or other curators would put together playlists by hand that they thought sounded good, and then listeners would just listen to those playlists. Manual curation worked okay, but it was manual and simple, and therefore it did not take into account the nuance of each listener's individual music taste.
Then came Pandora, which employed a slightly more advanced approach: manually tagging attributes of songs. This meant a group of people listened to music, chose a bunch of descriptive words for each track, and tagged the tracks with those words. Then, Pandora's code could simply filter for certain tags to make playlists of similar-sounding music.
Around the same time, Echo Nest was born, taking a more advanced approach. Echo Nest used algorithms to analyze the audio and textual content of music, allowing it to perform music identification, personalized recommendation, playlist creation, and analysis.
Finally came Last.fm, which still exists and takes a different approach: it uses a process called collaborative filtering to identify music its users might like.
Spotify's recommendation system:
Spotify doesn't actually use any single recommendation model. Instead, they have mixed together the best strategies of other service providers to create their own unique recommendation system for #DiscoverWeekly.
Spotify's Discover Weekly works on three types of recommendation models:
Collaborative Filtering models, which work by analyzing your behavior and others' behavior.
Natural Language Processing (NLP) models, which work by analyzing text.
Audio models, which work by analyzing the raw audio tracks themselves.
Recommendation model #1: Collaborative Filtering
Netflix was one of the first companies to leverage the concept of collaborative filtering in its recommendation system, and it has since become the starting point for building almost any recommendation system. Spotify uses this concept too. Let us understand what Collaborative Filtering actually means.
Collaborative Filtering means predicting the preferences of one person by analyzing the behavior of other, similar individuals. Let's see how it works.
Suppose we have two users: User 1 and User 2. User 1 likes songs A, B, C, and D, and User 2 likes songs A, C, D, and E. Collaborative filtering uses this data and says,
"Both of you like songs A, C, and D, so you two are probably similar. And since you are similar users, each of you would probably like the songs the other has heard that you haven't heard yet."
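The idea above can be sketched in a few lines of Python with set operations (the users and song names are just the made-up ones from the example):

```python
# Toy collaborative filtering: the song sets from the example above.
user1_songs = {"A", "B", "C", "D"}
user2_songs = {"A", "C", "D", "E"}

# The songs both users like are evidence that the users are similar.
overlap = user1_songs & user2_songs  # {"A", "C", "D"}

# Recommend to each user the other's songs they haven't heard yet.
recommend_to_user1 = user2_songs - user1_songs  # {"E"}
recommend_to_user2 = user1_songs - user2_songs  # {"B"}
```

Of course, at Spotify's scale this comparison happens across millions of users at once, which is where the matrix math below comes in.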
So how does Spotify actually implement this concept to calculate suggested tracks for millions of users, based on millions of other users' preferences?
The answer: it is all done with matrix math, using Python libraries.
The matrix that comes out is gigantic: each column represents one of the roughly 30 million songs in the Spotify database, each row represents one of the roughly 140 million users, and both lists are still growing.
At the matrix’s intersections, where each user meets each song, there is a 1 if the user has listened to that song, and a 0 if the user hasn’t. So, if I listened to the song “Thriller”, the place where my row meets the column representing “Thriller” is going to be a 1.
The matrix we get is a sparse matrix: since the number of songs a user has not heard is far greater than the number of songs they have heard, there are many more 0's than 1's. But the placement of each 1 holds a piece of very critical information.
Then, a Python library runs a long, complicated matrix factorization formula on this matrix.
When it finishes, we end up with two types of vectors, represented here by X and Y. X is a user vector, representing one single user’s taste, and Y is a song vector, representing one single song’s profile.
The User/Song matrix produces two types of vectors: User vectors and Song vectors.
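Here is a minimal sketch of that factorization in Python, using NumPy's SVD on a tiny made-up play matrix. Spotify's actual pipeline and library choices are not public, so treat this as an illustration of the idea, not their implementation:

```python
import numpy as np

# Toy play matrix R: 4 users x 5 songs, 1 = listened, 0 = not.
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

# Factorize into user vectors (X) and song vectors (Y) with a rank-2 SVD.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
X = U[:, :k] * s[:k]  # one row per user -> user taste vectors
Y = Vt[:k, :].T       # one row per song -> song profile vectors

# X @ Y.T approximates R: high scores where a user is likely to listen.
scores = X @ Y.T
```

The key point is the shape of the output: every user and every song gets boiled down to a short vector of numbers.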
Now we've got 140 million user vectors — one for each user — and 30 million song vectors. The actual content of these vectors is just a bunch of numbers that are essentially meaningless on their own, but they are hugely useful for comparison.
To find which users have a music taste most similar to mine, collaborative filtering compares my vector with all of the other users' vectors using a mathematical dot product. Whichever pairing produces the highest dot product is the user most similar to me.
The same goes for the Y vector, songs — you can compare a song’s vector with all the other song vectors, and find which songs are most similar to the one you’re looking at.
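A tiny sketch of that comparison (the vectors and user names here are invented, and real taste vectors have far more than two dimensions):

```python
import numpy as np

# My hypothetical taste vector from the factorization step.
my_vector = np.array([0.9, 0.2])

# Other users' hypothetical taste vectors.
others = {
    "user_a": np.array([0.8, 0.3]),
    "user_b": np.array([-0.5, 0.9]),
    "user_c": np.array([0.1, -0.7]),
}

# The highest dot product marks the most similar taste.
most_similar = max(others, key=lambda u: float(my_vector @ others[u]))
# -> "user_a"
```

Swapping in song vectors instead of user vectors gives you song-to-song similarity the same way.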
Recommendation model #2: Natural Language Processing
The second type of recommendation model that Spotify employs is Natural Language Processing (NLP) models. These models track metadata, news articles, blogs, and other text around the internet.
Natural Language Processing, the ability of a computer to understand human language, is a huge concept in itself, so I will not be able to explain the whole field in this article. But here is a high-level overview of how it works.
Spotify continuously crawls the web looking for blog posts and any other text about music, and keeps track of what different people are writing about specific songs and artists: which adjectives and what language they use, and which other songs and artists they mention alongside them.
The most-used terms bucket up into what Spotify calls “cultural vectors” or “top terms.” Each artist and song has thousands of daily changing top terms. Each term has a weight associated, which reveals how important the description is.
Each artist and song can have thousands of terms describing them. Then, much like in collaborative filtering, the NLP model uses these terms and weights to create a vector representation of the song that can be used to determine if two pieces of music are similar.
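As a rough sketch, each song or artist can be represented as a dictionary of weighted terms, and two of them can be compared with cosine similarity (the artists, terms, and weights below are invented for illustration):

```python
import math

# Hypothetical "top terms" with weights for two made-up artists.
artist_a = {"dreamy": 0.8, "synth": 0.6, "indie": 0.4}
artist_b = {"dreamy": 0.7, "synth": 0.5, "lofi": 0.3}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)

similarity = cosine_similarity(artist_a, artist_b)  # close to 1 = similar
```

Two artists described by the web in similar language end up with nearly parallel vectors, and so a high similarity score.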
Recommendation model #3: Raw Audio Model
So far we have looked at models that work with usage data and text, but this third model is what makes Spotify more accurate than, and different from, the others: it works with the actual audio, analyzing the tracks themselves.
This model is a boon for young musicians who have just started out and published a song. A new song will hardly have 50 listeners, so there are very few other listeners to collaboratively filter it with, and nobody is talking about the song on the internet yet, so NLP fails here too, making it difficult for the singer to show up in anyone's recommendation list. Luckily, raw audio models don't discriminate between new tracks and popular tracks, so with their help a young singer's song can end up in a Discover Weekly playlist alongside popular songs! Isn't it cool, guys?
So, how does it work? With Convolutional Neural Networks.
To analyze raw audio, a track goes through the same kind of neural network that analyzes images, called a Convolutional Neural Network (CNN). It processes the raw audio and produces characteristics like time signature, key, mode, tempo, and loudness. After being processed by the CNN, similar songs end up with similar metrics and fall into the same category. This understanding allows Spotify to compare songs based on those key metrics. For example, someone who likes heavy metal might like songs that are more "loud."
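To give a feel for the first step of such a model, here is a toy convolution over a fake spectrogram in NumPy. A real CNN stacks many learned filters and layers, and Spotify's exact architecture is not public; this only shows one hand-rolled filter sliding across time:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy stand-in for a mel-spectrogram: 40 frequency bins x 100 time frames.
spectrogram = rng.random((40, 100))

# One filter covering all frequency bins and 5 time frames,
# like a single filter in a CNN's first layer (random here, learned in reality).
kernel = rng.random((40, 5))

def conv_over_time(spec, kern):
    """Valid convolution of one filter across the time axis."""
    width = kern.shape[1]
    n_out = spec.shape[1] - width + 1
    return np.array([np.sum(spec[:, t:t + width] * kern)
                     for t in range(n_out)])

feature_map = conv_over_time(spectrogram, kernel)  # one value per time step

# Global pooling collapses the feature map into a single song-level number,
# the kind of summary statistic used to compare tracks.
song_feature = feature_map.mean()
```

Stacking many such filters and pooling steps is what lets the network summarize an entire track as a handful of comparable numbers.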
By combining these three models, Spotify analyzes the similarity of different songs and artists and recommends new, not-listened-to-before songs to users’ playlists every week. These models made Discover Weekly one of the most popular features of Spotify.
In a nutshell, I hope this new obsession of mine has been informative for you, and that it has made you as curious as me to know the science behind all the technologies that leave us thinking, "How do they work?"