How do Netflix and Spotify know what you like? A briefing on Recommender Systems

Have you ever wondered how Netflix, Spotify and other interactive platforms (Amazon, a pioneer, among them) recommend products to you? I have been studying this topic recently. It belongs to an area called Recommender Systems, which tries to tackle a problem known as the Long Tail.

The Long Tail problem arises when there is a huge number of products to offer a customer and the task is to find the ones they are actually looking for. A common approach would be to recommend popular items, but is that always okay?

Long Tail Example
Your tastes may be very different from the mainstream. Don't you know what “Despacito” is?

Indeed, recommending the most popular items is not always accurate, and the success of an online store depends on this accuracy: as a user, you will lose confidence in a system that doesn't know you and keeps offering you items you don't want. So how do Amazon, Netflix and Spotify (among others) recommend items to you? I'm going to explain two simple yet really powerful approaches to this task.

Collaborative Filtering

The idea behind collaborative filtering is this: given a user A who loves items 1 and 2, and a user B who loves item 1, the recommender system should recommend item 2 to user B. If you think about it, it's a really simple logic, yet a really effective one. That's how you start watching the LOTR movies on Netflix and then find Star Wars suggestions on your dashboard.
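The A/B rule above can be sketched in a few lines of Python. This is a toy illustration only: the user and item names are hypothetical, "loving" an item is reduced to membership in a set, and any shared liked item is treated as enough to call two users similar.

```python
# Hypothetical "likes" data mirroring the A/B example: A loves items 1 and 2, B loves item 1.
likes = {
    "A": {"item1", "item2"},
    "B": {"item1"},
}

def recommend(user, likes):
    """Suggest items liked by users who share at least one liked item with `user`."""
    mine = likes[user]
    suggestions = set()
    for other, theirs in likes.items():
        if other != user and mine & theirs:  # the two users share some taste
            suggestions |= theirs - mine      # items the other user liked that `user` hasn't seen
    return suggestions

print(recommend("B", likes))  # {'item2'}
```

User A gets nothing new from B, because everything B likes A already likes.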

Collaborative Filtering Visual
An unseen item is recommended to a user who shares tastes with others.

Collaborative filtering builds a matrix with users as rows and items as columns, where each cell holds the rating a user gave to an item. Then the similarity between users, based on the ratings they have given to items, is calculated with a metric; the most commonly used are Jaccard similarity, Pearson correlation and cosine similarity.
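Here is a minimal sketch of the user × item matrix and the three measures, assuming a hypothetical 1 to 5 star scale where 0 means "not rated". Jaccard only looks at *which* items each user rated, Pearson is computed over co-rated items after centring each user on their own mean, and cosine here simply treats unrated cells as 0 (a common simplification, not the only option).

```python
import math

# Hypothetical rating matrix: rows are users, columns are items, 0 = not rated.
ratings = {
    "alice": [5, 4, 1, 0],
    "bob":   [4, 5, 2, 1],
    "carol": [1, 2, 5, 4],
}

def cosine(u, v):
    """Cosine of the angle between two rating vectors (unrated cells count as 0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard(u, v):
    """Overlap of the *sets* of rated items; the rating values are ignored."""
    rated_u = {i for i, r in enumerate(u) if r}
    rated_v = {i for i, r in enumerate(v) if r}
    union = rated_u | rated_v
    return len(rated_u & rated_v) / len(union) if union else 0.0

def pearson(u, v):
    """Pearson correlation over the items both users rated."""
    common = [i for i in range(len(u)) if u[i] and v[i]]
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = (math.sqrt(sum((u[i] - mu) ** 2 for i in common))
           * math.sqrt(sum((v[i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0

a, b, c = ratings["alice"], ratings["bob"], ratings["carol"]
for name, sim in [("cosine", cosine), ("jaccard", jaccard), ("pearson", pearson)]:
    print(name, round(sim(a, b), 2), round(sim(a, c), 2))
```

With this data, cosine and Pearson both say alice is much closer to bob than to carol, while Jaccard cannot tell them apart because all three rated almost the same items.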

With a small number of items and users, it is feasible to search through all the data, but this won't scale if, like Netflix or Spotify, you have millions of users and items. In such cases you set a limit on the number of neighbors (the most similar users) to search.
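A minimal sketch of the neighbor limit, assuming a user-based k-nearest-neighbors scheme: the predicted rating is a similarity-weighted average over only the `k` most similar users who rated the item. The data, the 0-based item index and the `predict` helper are hypothetical, and cosine is used as the similarity measure for brevity.

```python
import heapq
import math

# Hypothetical ratings: rows are users, columns are items, 0 = not rated.
ratings = {
    "alice": [5, 4, 1, 0],
    "bob":   [4, 5, 2, 1],
    "carol": [1, 2, 5, 4],
    "dave":  [5, 5, 1, 2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict(user, item, ratings, k=2):
    """Predict `user`'s rating for column `item` from the k most similar users who rated it."""
    candidates = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and r[item]          # only users who actually rated the item
    ]
    neighbors = heapq.nlargest(k, candidates)  # keep just the k most similar users
    weight = sum(sim for sim, _ in neighbors)
    if not weight:
        return None                           # nobody useful to learn from
    return sum(sim * rating for sim, rating in neighbors) / weight

print(predict("alice", 3, ratings, k=2))
print(predict("alice", 3, ratings, k=3))
```

With `k=2` only dave and bob (alice's closest users) vote, so the prediction stays between their ratings of 1 and 2; raising `k` to 3 lets the less similar carol pull it upwards. Choosing `k` trades coverage against noise.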

A nice feature of this approach is the boost in “serendipity”, which in the recommender systems context is the system's ability to predict items the customer is going to like but that are totally new to them (for example, a heavy metal fanatic who receives a jazz recommendation).

Nevertheless, this approach has an Achilles' heel called cold start: the difficulty of predicting items for a new user. It makes sense if you think about it: how can you predict anything for a user who has not rated any item, when you don't know what they like? Even if the customer has rated a few items, it will be hard for the algorithm to find similar users. The same happens when a new item is added: no one has rated it yet, so it's invisible.
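Both sides of the cold-start problem show up even in the toy set-overlap recommender sketched here. The users and items are hypothetical: "dave" has just signed up and rated nothing, and "item5" has just been added to the catalogue and liked by no one.

```python
# Hypothetical "likes" data: dave is a brand-new user; item5 (not listed) is a brand-new item.
likes = {
    "alice": {"item1", "item2"},
    "bob":   {"item1", "item3"},
    "dave":  set(),  # new user: no ratings yet
}

def recommend(user, likes):
    """Suggest items liked by users who share at least one liked item with `user`."""
    mine = likes[user]
    suggestions = set()
    for other, theirs in likes.items():
        if other != user and mine & theirs:
            suggestions |= theirs - mine
    return suggestions

print(recommend("dave", likes))  # set(): no overlap with anyone, so no neighbors

# Collect everything the system could ever suggest to anyone.
every = set().union(*(recommend(u, likes) for u in likes))
print("item5" in every)          # False: an item nobody has rated can never be suggested
```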

Content-Based Recommendation

Collaborative filtering doesn't take any item features into account, but content-based recommendation does. In this approach, each item is mapped to a set of values that represent it; the most common representation is a set of keywords. With this representation, the system can find similar items using information retrieval techniques such as TF-IDF.

Keywords for movies
Example of movies described by keywords. LOTR and Captain America are similar because both feature heroes; Dunkirk and Captain America share the “war” theme.
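The TF-IDF idea can be sketched with the standard library alone. The keyword lists below are hypothetical, loosely following the figure, and the weighting is the plain textbook variant (term frequency × log(N / document frequency)); libraries such as scikit-learn's `TfidfVectorizer` use a smoothed IDF by default, so their numbers will differ. Items are then compared with cosine similarity over their keyword weights.

```python
import math
from collections import Counter

# Hypothetical keyword descriptions, loosely based on the figure.
movies = {
    "LOTR":            ["heroes", "fantasy", "magic"],
    "Captain America": ["heroes", "war", "superpowers"],
    "Dunkirk":         ["war", "history"],
}

def tfidf_vectors(docs):
    """Map each item to a {keyword: tf * idf} dict."""
    n = len(docs)
    df = Counter(word for words in docs.values() for word in set(words))  # docs containing each word
    vectors = {}
    for name, words in docs.items():
        tf = Counter(words)
        vectors[name] = {w: (c / len(words)) * math.log(n / df[w]) for w, c in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {keyword: weight} vectors."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = tfidf_vectors(movies)
print(cosine(vectors["LOTR"], vectors["Captain America"]))  # > 0: both mention "heroes"
print(cosine(vectors["LOTR"], vectors["Dunkirk"]))          # 0.0: no keyword in common
```

Note that a keyword appearing in every item gets an IDF of log(1) = 0, so it stops contributing to similarity; that is the point of IDF, since such a keyword does not distinguish items.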

In practice, the system finds the items a customer has liked (or rated highly), then finds the n items most similar to those favorites and presents them.
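A minimal sketch of that loop, assuming a hypothetical keyword catalogue: every item the customer has not seen is scored by its best similarity to any liked item, and the top `n` are returned. For brevity, this uses Jaccard overlap between keyword sets; a TF-IDF cosine, as described above, could be swapped in without changing the structure.

```python
# Hypothetical catalogue: each movie is described by a set of keywords.
catalog = {
    "LOTR":                {"heroes", "fantasy", "magic"},
    "Captain America":     {"heroes", "war", "superpowers"},
    "Dunkirk":             {"war", "history"},
    "Harry Potter":        {"fantasy", "magic", "school"},
    "Saving Private Ryan": {"war", "history", "heroes"},
}

def jaccard(a, b):
    """Share of keywords two items have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(liked, catalog, n=2):
    """Score each unseen item by its best similarity to any liked item; return the top n."""
    scores = {
        item: max(jaccard(keywords, catalog[fav]) for fav in liked)
        for item, keywords in catalog.items()
        if item not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend({"LOTR"}, catalog, n=1))     # ['Harry Potter']
print(recommend({"Dunkirk"}, catalog, n=1))  # ['Saving Private Ryan']
```

A single liked movie is enough to get started, but the suggestions are always close neighbors of what the customer already likes.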

This solution largely avoids the cold-start problem, since a new customer only needs to rate one item for predictions to start. But content-based recommendation has its own drawback: there is no serendipity, as the system only recommends “more of the same”.

Wanna Try?

Real-world recommender systems work as hybrids, combining several approaches such as the two above. This is a cool area with plenty of research still to be done and a lot of interesting literature, so I encourage you to keep reading and dig deeper into this topic.

If you want a practical example, here is one of each approach: a collaborative movie recommender in Java and a content-based movie recommender in Python. Feel free to modify the code to suit your needs. Happy coding!
