Foundation Model for Recommender System

Alim Hanif
Jun 23, 2024

--

Epoch 0 — The starter

A Recommender System (RecSys) serves as an engine that personalizes content for the user/customer. This system plays a crucial role in shaping the user's experience with an app, as not all items suit everyone. For instance, I may enjoy listening to Paramore's songs, but you might not. Therefore, people with a similar affinity to mine should see Paramore's setlist in their app, while others should see a different one.

This article provides a technical review and implementation of a Foundation Model for the Recommender System, focusing primarily on a work by Spotify that has significantly influenced this article.

Let's see my recommendation list across various apps.

Images were taken from Amazon (left), Spotify (mid), and Amazon Prime (right)

Those combinations of items are only visible to me (or to people with similar preferences, at least). This is known as Content Personalization. The main objective is to tailor content to the user in order to stimulate more interactions. How amazing is that? So, how do we build such a system?

Yes, you guessed it! It's a Recommender System!

What exactly is it? How can we build it?

General Model for Recommender System

There are two common strategies to solve this problem:

  • Content-based: The system recommends items (query) based on a particular item (anchor). The reason for that recommendation could be based on certain relationships, such as co-views (viewed by the same users), co-listens, co-purchases, etc. The data structure would be item-to-item. This approach allows you to develop widgets with narrations such as "Items related to your last purchase", "Deals based on item X", "Buy this product together with your basket list", and so on.
  • Collaborative Filtering: The system recommends items (query) based on the user's affinity (anchor). This affinity could be derived from the user's past behavior, interaction with some products, or similarity with other users. The data structure would be user-to-item. Here are some narrations such as: "Recommended Podcast", "Movies we think you'll like", "More top picks for you", and so on.
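To make the two data structures concrete, here is a minimal sketch in Python; the user and item names are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical interaction log of (user, item) events.
interactions = [
    ("alice", "paramore_setlist"),
    ("alice", "podcast_a"),
    ("bob", "paramore_setlist"),
    ("bob", "audiobook_x"),
    ("carol", "podcast_a"),
]

# Collaborative filtering view: user-to-item structure.
user_to_items = defaultdict(set)
for user, item in interactions:
    user_to_items[user].add(item)

# Content-based view: item-to-item co-occurrence (e.g., co-listens),
# i.e., pairs of items consumed by the same user.
co_counts = defaultdict(int)
for items in user_to_items.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1
```

An item-to-item widget would rank the neighbors of an anchor item by `co_counts`, while a user-to-item widget would score candidates against the user's history in `user_to_items`.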

Model Developments

Many companies build their recommender systems generically. They develop a machine learning model, such as a Graph Neural Network (GNN), to solve one particular problem or narration. Amazon, in research introduced by Fan (2023), applies such an architecture only to its e-commerce products, even though it has other products, such as movies (Amazon Prime); a separate model is then built to address movie recommendations. To the best of my knowledge, there is no cross-learning between these two domains (though I may simply not have read all of their papers).

Nevertheless, building and maintaining several models requires a lot of work. Hence, the Foundation Model is all you need!

Foundation Model

A Foundation Model (FM) is a large pre-trained model that serves as a base for downstream tasks.

But why do we need it? As stated in the introduction, a company typically has several narrations for personalizing content. Instead of working in silos to build a model for each theme, which requires a lot of effort, building an FM could be the answer. The paper "Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks" introduces a Heterogeneous Graph Neural Network (HGNN) as a Foundation Model for any thematic recommendation at Spotify. The idea is to embed items of any modality (e.g., product, movie, song, podcast) into the same mathematical latent space (embedding space) using a Graph Neural Network (GNN). The resulting item embeddings then remain static and serve as a base layer for the adaptation layers (thematic narrations).

Here is how to build this model:

  1. Graph Construction. Take any set of items (possibly of different types) that are interacted with by the same users, and connect them as nodes in the graph. Possible interactions include co-views, co-listens, and co-purchases. The paper above builds the graph from co-listened audiobooks and podcasts.
  2. Training Pair Creation. Sample equally across all "inter" and "cross" relationships. In the audiobook/podcast example, the numbers of audiobook-audiobook (inter), audiobook-podcast (cross), and podcast-podcast (inter) pairs should be equal. Failing this requirement would bias the model toward particular items (given the imbalance present in the dataset).
  3. Model Learning. The paper above uses the GraphSAGE architecture by Hamilton (2017) to learn node embeddings through `aggregate` and `update` steps over each node's neighborhood. It also takes each item's title and description as node features by passing them through an LLM embedding layer, so the HGNN can focus on learning the relationships between items.
Image by De Nadai (2024)
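Step 2 (balanced pair creation) can be sketched as follows; the edge lists and relationship labels are hypothetical, and a production pipeline would sample from far larger edge sets:

```python
import random

# Hypothetical co-listen edges, grouped by relationship type.
edges_by_type = {
    "audiobook-audiobook": [("ab1", "ab2"), ("ab1", "ab3"), ("ab2", "ab4")],
    "audiobook-podcast":   [("ab1", "pc1"), ("ab3", "pc2")],
    "podcast-podcast":     [("pc1", "pc2"), ("pc2", "pc3")],
}

def balanced_pairs(edges_by_type, seed=0):
    """Sample the same number of training pairs from every relationship
    type, so no single item modality dominates the loss."""
    rng = random.Random(seed)
    # The rarest relationship type caps the sample size for all types.
    n = min(len(pairs) for pairs in edges_by_type.values())
    sample = []
    for pairs in edges_by_type.values():
        sample.extend(rng.sample(pairs, n))
    return sample

pairs = balanced_pairs(edges_by_type)  # 2 pairs per type here
```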

This HGNN model can infer the embedding of any item, e.g., audiobooks and podcasts.
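A single GraphSAGE-style mean-aggregation layer can be sketched with NumPy as below; the graph, weights, and feature dimensions are hypothetical, and real node features would come from an LLM embedding of titles and descriptions:

```python
import numpy as np

def graphsage_layer(h, adj, w_self, w_neigh):
    """One GraphSAGE mean-aggregation step (Hamilton, 2017): each node
    aggregates the mean of its neighbors' embeddings, then updates by
    combining it with its own embedding through a ReLU."""
    out = []
    for v, neigh in enumerate(adj):
        agg = h[neigh].mean(axis=0) if neigh else np.zeros(h.shape[1])
        z = np.maximum(h[v] @ w_self + agg @ w_neigh, 0.0)  # update + ReLU
        out.append(z / (np.linalg.norm(z) + 1e-8))          # L2-normalize
    return np.stack(out)

# Tiny 3-node graph; features stand in for LLM-embedded titles/descriptions.
rng = np.random.default_rng(0)
h0 = rng.normal(size=(3, 4))            # input node features
adj = [[1, 2], [0], [0]]                # neighbor lists per node
w_self = rng.normal(size=(4, 8))
w_neigh = rng.normal(size=(4, 8))
h1 = graphsage_layer(h0, adj, w_self, w_neigh)  # shape (3, 8)
```

Because the layer computes embeddings from features and neighbors rather than a fixed lookup table, it is inductive: a brand-new item can be embedded without retraining.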

Adaptation Layer

The most exciting part is the adaptation layers. What do you think the adaptation layer will be?

Here are a few examples:

  1. Similar Item: Given that the user is listening to a podcast, we could recommend similar podcasts. This approach takes the currently playing podcast's embedding as an anchor and queries similar podcasts via nearest-neighbor (vector) search. Podcast-to-Podcast theme.
  2. Cross-Item Recommendation: Given that the user is listening to a podcast, we could recommend a related audiobook. The same vector search applies, as podcasts and audiobooks share the same latent space. Podcast-to-Audiobook theme.
  3. Personalized Item: Given the user's past interactions with a set of items (podcasts and audiobooks), those items most likely represent the user's affinity. The system therefore takes an aggregate (e.g., the average) of those items' embeddings as the user representation, then scores this user embedding against a set of candidate items to rank them. The paper above uses a Two-Tower Model for this adaptation. User-to-Item (podcasts/audiobooks) theme.
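All three adaptations reduce to vector operations over the frozen HGNN embeddings. Here is a minimal sketch with NumPy, using random embeddings in place of real HGNN output:

```python
import numpy as np

def top_k_similar(anchor, candidates, k=2):
    """Rank candidate embeddings by cosine similarity to the anchor."""
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return np.argsort(-(c @ a))[:k]

rng = np.random.default_rng(1)
item_emb = rng.normal(size=(5, 8))  # stand-in for frozen HGNN embeddings

# 1 & 2. Similar / cross-item: anchor on the currently playing item.
#        Cross-item search works the same way because all modalities
#        share one latent space.
recs = top_k_similar(item_emb[0], item_emb[1:])

# 3. Personalized: average the user's past items as a user embedding,
#    then rank candidates against it.
user_emb = item_emb[[1, 3]].mean(axis=0)
personalized = top_k_similar(user_emb, item_emb)
```

In production, the brute-force scan would be replaced by an approximate nearest-neighbor index, and the averaging step by a learned Two-Tower user encoder.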

Personal Thoughts

The concept of a Foundation Model (FM) for Recommender Systems is remarkable. The idea suits any item recommendation task, as long as items carry metadata such as a title and description. Ultimately, this approach capitalizes on a single HGNN across multiple use cases, varying the displayed items to enhance the user experience. I highly recommend this approach to anyone who wants to develop their own Recommender System.

Disclaimer: The author is an independent writer. There was no affiliation with any company during the writing of this article.

References

  • Damianou, Andreas, et al. "Towards Graph Foundation Models for Personalization." Companion Proceedings of the ACM on Web Conference 2024. 2024.
  • De Nadai, Marco, et al. "Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks." Companion Proceedings of the ACM on Web Conference 2024. 2024.
  • Hamilton, Will, Zhitao Ying, and Jure Leskovec. "Inductive Representation Learning on Large Graphs." Advances in Neural Information Processing Systems 30 (2017).
  • Fan, Ziwei, et al. "Personalized Federated Domain Adaptation for Item-to-Item Recommendation." Uncertainty in Artificial Intelligence. PMLR, 2023.
