
RP3Beta Recommender Model and Item Similarities in Python with SciPy

A step-by-step tutorial on how to build a state-of-the-art RP3Beta graph recommender from scratch in Python, understand the maths, calculate item similarity and experiment with tuning the beta parameter to control popularity bias.

[Figure: the RP3Beta 3-hop random walk with the beta popularity penalty]

What is the RP3Beta algorithm?

 

In this tutorial, we are going to build an RP3Beta graph recommender from scratch in Python. This surprisingly simple model often obtains state-of-the-art performance in recommendation benchmarks due to how it calculates the item-item similarity matrix that it uses to generate recommendations. This tutorial focuses entirely on the maths and a pure Python/SciPy implementation of RP3Beta. If you'd rather learn how to access the RP3Beta algorithm alongside a range of other methods, all wrapped up in a Python package, then check out the Item and User KNN Recommender models with similaripy tutorial.

Building a high-quality recommender system often means finding the right balance between suggesting safe, popular blockbuster items and surfacing highly relevant, niche discoveries. Instead of looking purely at direct co-occurrences between items (like Cosine or Jaccard similarity), RP3Beta models the recommendation problem as a random walk on a user-item bipartite graph. To generate personalized recommendations for a specific user, it executes a 3-hop random walk followed by a popularity penalty:

  1. User to Items: Start at the Target User and divide probability mass evenly among the items they have interacted with in the past.

  2. Items to Users: Flow that probability backward to all other users who have interacted with those same items (finding similar users).

  3. Users to Items: Flow the probability forward one last time to the items those similar users engaged with (finding candidate items).

  4. The Beta Penalty: Because random walks naturally favor highly connected "popular" items, the final probabilities are divided by the item's popularity (degree) raised to the power of β to surface more relevant, niche recommendations.

 

By following the connections between users -> items -> users -> items, the model captures indirect relationships. If two niche items are never bought together but are both bought by users who also buy a certain popular item, the graph walk connects them. Below is an example of how the 3-step walk takes place, with an additional step for the beta parameter.

Whereas standard graph walks naturally gravitate toward highly popular items, RP3Beta introduces a penalty step (the beta (β) parameter) at the very end of the walk. This divides the final similarity scores by the popularity (degree) of the target item. This actively suppresses blockbuster items and pushes highly relevant, long-tail items to the top of your recommendations.

 

There is actually an additional parameter, alpha (α), that controls the transition weights during the first three steps of the walk. We'll cover it in more detail later, but in our current example we assume α=1, meaning the probability mass is split evenly across connections. For instance, if a user has interacted with 10 items, each item receives exactly 1/10th of the flow.

By adjusting α, we can apply an exponent to these degrees during the transition. This allows us to mathematically reduce the influence of "power users" who have interacted with thousands of items. By penalizing these high-degree nodes during the walk, we prevent their noisy, non-specific data from drowning out the clean signal of users with more niche, specific tastes.

RP3Beta in a few lines of Python


We'll start with a tiny, hard-coded 5x5 interaction matrix like in the example above. This will allow us to see exactly how the inputs are shaped before we run the matrix multiplications. With just a few lines of Python we can apply the RP3Beta algorithm to our mini interaction matrix, creating the item-item similarity matrix we'd then use to make our recommendations.
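Here's a minimal sketch of that implementation. The 5x5 matrix below is a hypothetical example, chosen so that its degrees and scores match the figures quoted throughout this tutorial:

```python
import numpy as np
import scipy.sparse as sp

# A hypothetical 5x5 interaction matrix (rows = users, columns = items),
# reconstructed to be consistent with the degrees and scores quoted in this tutorial
X_sparse = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)

alpha, beta = 1.0, 0.5

# Node degrees: item popularity (column sums) and user activity (row sums)
d_I = np.asarray(X_sparse.sum(axis=0)).ravel()
d_U = np.asarray(X_sparse.sum(axis=1)).ravel()

# Inverse-degree diagonal matrices for the walk, plus the beta penalty matrix
D_I_alpha = sp.diags(np.power(d_I, -alpha))
D_U_alpha = sp.diags(np.power(d_U, -alpha))
D_I_beta = sp.diags(np.power(d_I, -beta))

# The 3-hop walk: item -> user -> item, then penalise popular destinations
P_I_to_U = D_I_alpha.dot(X_sparse.T)  # item-to-user transition probabilities
P_U_to_I = D_U_alpha.dot(X_sparse)    # user-to-item transition probabilities
W = P_I_to_U.dot(P_U_to_I)            # P3Alpha similarity matrix
S = W.dot(D_I_beta)                   # RP3Beta: divide columns by popularity^beta

# Zero the diagonal so items never recommend themselves
S = S.tolil()
S.setdiag(0)
S = S.tocsr()

print(np.round(S.toarray(), 2))
```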

In just a handful of lines of SciPy, we've successfully transformed our raw user-item interactions into a fully functioning item-item similarity matrix. The matrix S shown above represents the final scores generated by the RP3Beta algorithm. Each row represents a starting item and the values within that row tell us the computed 'similarity' which is simply the likelihood of our random walker ending up at that specific target item after taking its journey through the graph.

You might also notice that the main diagonal of the matrix is completely filled with zeros. This is a deliberate and necessary step at the end of the algorithm. Whilst an item is mathematically perfectly similar to itself, recommending a user the exact same product they just viewed or purchased isn't particularly helpful! By zeroing out the diagonal, we ensure the algorithm focuses purely on discovering new, related items to recommend.

 

If that initial block of matrix multiplications looks a bit dense, don't worry. We are now going to break down both the maths and the code, step by step, to see exactly how RP3Beta uncovers these relationships.

RP3Beta step by step

Before we can take our random walk and calculate our item-item similarities, we need to understand the shape and connectivity of our graph. In a user-item bipartite graph, the users and items represent the nodes, whilst the interactions (the 1s in our matrix) represent the edges connecting them.

The degree of a node is simply the total number of edges connected to it. By calculating the sum along the axes of our sparse matrix, we extract two crucial pieces of information:

  • Item Degrees (d_i): By summing down the columns (axis=0), we count exactly how many users interacted with each specific item. This gives us a direct measure of an item's popularity. If we look at the output for our example, the first item was interacted with by 3 users, making it the most popular item in our mini-catalogue.

  • User Degrees (d_u): By summing across the rows (axis=1), we count how many items each individual user interacted with. This tells us a user's overall activity level or propensity to interact.

 

Understanding these degrees is fundamental to the mathematics of RP3Beta. When we simulate a random walk on a graph, the probability of stepping from one node to another is determined by the number of available paths. For example, if an item is connected to 100 users, a random walker starting at that item has a 1/100 chance of walking down any specific user's path.

Therefore, to move from raw interaction counts to actual random walk probabilities, we must normalise our interaction matrix using these degree arrays. This converts our binary 0s and 1s into transition probabilities, ensuring that our multi-step walk accurately reflects the mathematical structure of the RP3Beta algorithm.
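As a runnable sketch (using the same hypothetical 5x5 matrix as before), the degree calculation boils down to two axis sums:

```python
import numpy as np
import scipy.sparse as sp

# The same hypothetical 5x5 interaction matrix (rows = users, columns = items)
X_sparse = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)

# Item degrees: how many users interacted with each item (its popularity)
d_I = np.asarray(X_sparse.sum(axis=0)).ravel()
# User degrees: how many items each user interacted with (their activity)
d_U = np.asarray(X_sparse.sum(axis=1)).ravel()

print("Item degrees:", d_I)
print("User degrees:", d_U)
```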

As we can see from the output, the degree arrays perfectly map the connectivity of our matrix. For instance, the first value in our Item Degrees array is 3.0, which corresponds exactly to the three users (users 0, 1, and 4) who interacted with the first item in column 0. Similarly, the first user in row 0 interacted with two items, giving them a User Degree of 2.0. These arrays are the foundation of our random walk. When our algorithm stands on an item node and decides which user to walk to next, it looks at the item's degree. If an item is connected to 3 users, the raw probability of walking down any specific path is 1/3.

However, before we build our transition matrices, RP3Beta (building upon the P3Alpha algorithm) introduces the smoothing parameter we mentioned earlier called α (alpha). Instead of simply dividing by the raw degree, we raise the degree to the power of α. This clever addition allows us to tune the random walk:

  • If α=1, we get a standard random walk based on true probabilities.

  • If α<1, we smooth out the probabilities, giving weaker links slightly more weight and increasing the diversity of our walk.

 

To prepare for our matrix multiplications, we need to invert these degrees (since dividing by a degree is mathematically the same as multiplying by its inverse) and transform them into diagonal matrices using SciPy's sp.diags function. A diagonal matrix has values along its main diagonal and zeros everywhere else.

Notice the maths inside the function below: np.power(d_I, -alpha). As any number raised to a negative power is mathematically equivalent to 1 divided by that number raised to the positive power, this single line both applies our α parameter and calculates the inverse of the degree.
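A sketch of what such a function can look like, with inverse_degree_diag as a hypothetical helper name and the degree values taken from our toy example:

```python
import numpy as np
import scipy.sparse as sp

def inverse_degree_diag(degrees, exponent):
    """Diagonal matrix of degree^(-exponent), used to scale rows or columns."""
    # A negative power is 1 / degree^exponent, so this single call applies
    # the exponent and inverts the degrees in one step
    return sp.diags(np.power(degrees, -exponent))

alpha = 1.0
d_I = np.array([3.0, 3.0, 2.0, 2.0, 2.0])  # item degrees from the earlier step
d_U = np.array([2.0, 2.0, 3.0, 2.0, 3.0])  # user degrees from the earlier step

D_I_alpha = inverse_degree_diag(d_I, alpha)
D_U_alpha = inverse_degree_diag(d_U, alpha)

print(np.round(D_I_alpha.toarray(), 3))
```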

Since we have set α=1.0 for this step, we are simply calculating the standard inverse, representing true probabilities. Looking at the printed output for D_I_alpha, you can see the first value on the top-left of the diagonal is approximately 0.333. This matches our first item, which had a degree of 3, meaning a 1/3 probability of walking down any of its connected paths. The third value down the diagonal is 0.5, corresponding to the third item's degree of 2, which gives a 1/2 probability. The user diagonal matrix D_U_alpha follows the exact same logic based on the user activity degrees.

 

Why go to the trouble of creating these diagonal matrices? Because in linear algebra, multiplying a standard matrix by a diagonal matrix is a blazingly fast and highly efficient way to scale its rows or columns. This brings us to the core mechanics of the P3Alpha random walk (which forms the first phase of RP3Beta). We need to calculate the actual transition probabilities - the exact likelihood of our random walker moving from one node to another. To take these steps, we multiply our diagonal scaling matrices by our original user-item interaction matrix. The next block of code calculates two distinct steps:

  • P_I_to_U: The transition probabilities of stepping from an Item to a User. To get this, we scale the transposed interaction matrix (X_sparse.T).

  • P_U_to_I: The transition probabilities of stepping from a User back to an Item. To get this, we scale the original interaction matrix (X_sparse).

 

Let's run the multiplications and look at the resulting transition probability matrices.
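A runnable sketch of those two multiplications, rebuilding the toy inputs inline so the block stands alone:

```python
import numpy as np
import scipy.sparse as sp

# Rebuild the pieces from the previous steps on the same toy matrix
X_sparse = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)
alpha = 1.0
D_I_alpha = sp.diags(np.power(np.asarray(X_sparse.sum(axis=0)).ravel(), -alpha))
D_U_alpha = sp.diags(np.power(np.asarray(X_sparse.sum(axis=1)).ravel(), -alpha))

# Item-to-user transitions: scale the rows of the transposed interactions
P_I_to_U = D_I_alpha.dot(X_sparse.T)
# User-to-item transitions: scale the rows of the original interactions
P_U_to_I = D_U_alpha.dot(X_sparse)

print(np.round(P_I_to_U.toarray(), 4))
print(np.round(P_U_to_I.toarray(), 4))
```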

Let’s break down exactly what these transition probability matrices are telling us. By multiplying our original binary interaction matrix by our inverse degree diagonal matrices, we have effectively converted our raw interaction counts into exact probabilities.

If we look at the first row of P_I_to_U (the Item-to-User transitions), we can see the value 0.3333 in the first, second and fifth columns. This means if our random walker starts at Item 0, there is exactly a 1/3 chance of stepping to User 0, User 1, or User 4.

Similarly, if we look at the first row of P_U_to_I (the User-to-Item transitions), we can see the value 0.5 in the first and second columns. If our walker is currently standing on User 0, there is a 1/2 chance of stepping to Item 0 or Item 1. Now that we have the probabilities for the individual legs of our journey, we can combine them to complete our 3-step random walk. In linear algebra, we can chain these transitions together simply by taking the dot product of the two matrices.
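Sketching that chained dot product on the same toy matrix (with α=1, the diagonal scaling is simply 1/degree):

```python
import numpy as np
import scipy.sparse as sp

# Toy matrix and transition matrices as before (alpha = 1)
X_sparse = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)
P_I_to_U = sp.diags(1.0 / np.asarray(X_sparse.sum(axis=0)).ravel()).dot(X_sparse.T)
P_U_to_I = sp.diags(1.0 / np.asarray(X_sparse.sum(axis=1)).ravel()).dot(X_sparse)

# Chain the two legs of the walk: item -> user -> item
W = P_I_to_U.dot(P_U_to_I)

print(np.round(W.toarray(), 2))
```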

When we run this multiplication, we generate the P3Alpha Similarity Matrix, which is the foundational precursor to RP3Beta. The matrix W contains the final probabilities of our completed random walk. The value located at W[i, j] tells us the exact probability of starting at Item i and ending up at Item j. There are two important observations to make about the matrix:

  1. The Diagonal is Dominant: notice that the highest probabilities sit along the main diagonal (for example, 0.44 at W[0, 0]). This makes perfect mathematical sense: if you start at Item 0, walk to a user who bought Item 0, and then walk to an item that user bought, there is a very high likelihood you simply step right back to Item 0!

  2. The Popularity Bias: If you look at the off-diagonal values (our actual item-to-item similarities), you will notice that the scores naturally favour paths through highly connected items. Because popular items have so many edges connecting them to the user base, they act like massive hubs or gravity wells within our graph. Random walks naturally pool around them, meaning these blockbuster items will overwhelmingly dominate the top recommendations.

 

While P3Alpha is a fantastic algorithm in its own right, this inherent popularity bias means smaller, highly-relevant niche items rarely get a chance to shine. This is exactly the problem that the final β step of the RP3Beta algorithm was built to solve. The β hyperparameter dictates exactly how aggressively we want to penalise items simply for being popular. A β of 0 removes the penalty entirely (leaving us with a standard P3Alpha model), whilst a β of 1 applies a high penalty, heavily favouring long-tail, niche discoveries. In the example below, we use a moderate penalty of 0.5.

To mathematically apply this penalty, we use the exact same SciPy trick we used for our α parameter. We take our original item degrees (di​) and raise them to the negative power of β, storing the result in a new diagonal matrix, D_I_beta. Next comes a crucial detail in the linear algebra: S = W.dot(D_I_beta). Notice that we are multiplying our P3Alpha matrix (W) by D_I_beta on the right side of the dot product.

 

In matrix multiplication, multiplying by a diagonal matrix on the right scales the columns of the target matrix. Because the columns of W represent the destination items of our random walk, we are effectively dividing each transition probability by the popularity of the item we are walking to. The more popular the destination item, the more its final similarity score is squashed.

Before we can use our new matrix S for recommendations, we must clean up the main diagonal. While the mathematics of the random walk correctly identified that an item is highly connected to itself, recommending a user the exact same product they are currently viewing is a terrible user experience!

 

To do this efficiently in SciPy, we briefly convert our Compressed Sparse Row (csr) matrix into a List of Lists (lil) matrix. The lil format is specifically designed to be fast and efficient when altering matrix structures or changing individual elements. We use .setdiag(0) to wipe the main diagonal clean, and then immediately convert it back to the computationally efficient csr format for any downstream recommendation tasks.
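Putting the β penalty and the diagonal clean-up together, again on the toy matrix so the block is self-contained:

```python
import numpy as np
import scipy.sparse as sp

# Start from the P3Alpha matrix W built in the previous step
X_sparse = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)
d_I = np.asarray(X_sparse.sum(axis=0)).ravel()
d_U = np.asarray(X_sparse.sum(axis=1)).ravel()
W = sp.diags(1.0 / d_I).dot(X_sparse.T).dot(sp.diags(1.0 / d_U).dot(X_sparse))

beta = 0.5
D_I_beta = sp.diags(np.power(d_I, -beta))

# Right-multiplying scales the COLUMNS: each destination item's score is
# divided by its popularity raised to beta
S = W.dot(D_I_beta)

# lil is cheap to modify in place; csr is fast for downstream maths
S = S.tolil()
S.setdiag(0)
S = S.tocsr()

print(np.round(S.toarray(), 2))
```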

If we look closely at the final RP3Beta Similarity Matrix (S) and compare it to our earlier W matrix, we can see that the β penalty has indeed had an effect. The most popular items in our mini-catalogue sit in columns 0 and 1, and their similarity scores have been penalised across the board. For example, at row 2, the probability of transitioning to the popular Item 1 was 0.17 in our P3Alpha matrix, but it has now been suppressed down to 0.10.

Conversely, the scores for transitioning to the less popular, niche items in columns 2, 3, and 4 have become much more competitive. By tuning the α and β parameters, RP3Beta gives us exceptional control over the diversity and novelty of our recommendations, making it a remarkably powerful tool for any data scientist to build from scratch. 

Alpha vs. Beta: The Journey vs. The Destination

 

While both parameters manage how the graph handles highly connected nodes, they operate at opposite ends of the recommendation process:

  • Alpha (α) shapes the journey: It controls the transition probabilities during the walk. When α=1.0, the walker splits its path evenly (e.g., a user with 10 items sends a 1/10 probability to each). Lowering α (e.g., to 0.5) dampens this division, effectively boosting the influence of highly active users or popular items that act as bridges. It allows us to tune how much we "trust" high activity levels during the flow of probability.

  • Beta (β) penalizes the destination: Once the 3-step walk is complete, β swoops in as a final post-processing step. It divides the final scores by the target item's popularity to directly suppress "blockbuster" bias. This ensures that niche items aren't drowned out by the sheer volume of traffic that naturally gravitates toward global hits.

 

Tuning α helps the model decide whether to prioritise the specific signals of niche users/items or the broad trends of power users/popular items during the transition phase. In practice, β usually has a more dramatic impact on reducing popularity bias. We'll actually leave alpha = 1.0 throughout this tutorial.

Let's now take our RP3Beta item-item similarity code and wrap it into a handy function so we don't have to keep writing out the full code each time. Below we'll make a similarity-matrix-calculating function, as well as a get_similar_items function that allows us to extract, for any candidate item, the most similar items to it from our matrix.

Let's apply our new function to our example interaction matrix to calculate the item-item similarities and extract the 2 most similar items for item 0:
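One possible sketch of both helpers and their application to the toy matrix (the function bodies here are an illustrative reconstruction, not the only way to write them):

```python
import numpy as np
import scipy.sparse as sp

def build_rp3beta_matrix(X_sparse, alpha=1.0, beta=0.5):
    """RP3Beta item-item similarity matrix from a user-item CSR matrix."""
    d_I = np.asarray(X_sparse.sum(axis=0)).ravel()
    d_U = np.asarray(X_sparse.sum(axis=1)).ravel()
    d_I[d_I == 0] = 1.0  # guard against empty items/users
    d_U[d_U == 0] = 1.0
    P_I_to_U = sp.diags(np.power(d_I, -alpha)).dot(X_sparse.T)
    P_U_to_I = sp.diags(np.power(d_U, -alpha)).dot(X_sparse)
    S = P_I_to_U.dot(P_U_to_I).dot(sp.diags(np.power(d_I, -beta)))
    S = S.tolil()
    S.setdiag(0)  # never recommend an item to itself
    return S.tocsr()

def get_similar_items(S, item_idx, k=2):
    """Indices of the k most similar items to item_idx, best first."""
    scores = S[item_idx].toarray().ravel()
    # argpartition finds the top-k without fully sorting the whole row
    top_k = np.argpartition(-scores, k)[:k]
    return top_k[np.argsort(-scores[top_k])]

X_sparse = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)

S = build_rp3beta_matrix(X_sparse, alpha=1.0, beta=0.5)
print(get_similar_items(S, item_idx=0, k=2))
```

With this toy matrix, the two most similar items to item 0 come out as items 1 and 2, matching the scores we saw in the first row of S.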

Looking back at the very first row of our S matrix from earlier, Item 1 had a score of 0.16 and Item 2 had a score of 0.12. Our extraction function works perfectly!

Generating recommendations on real-world data

Now that we've built and tested our RP3Beta functions on a tiny dummy matrix, it's time to unleash them on a real-world dataset. We'll be using data from the H&M Personalized Fashion Recommendations competition on Kaggle. 

Because real-world e-commerce data is incredibly large, ideally we want to avoid loading everything into a dense array as we risk running out of memory. Before we calculate our similarity matrix, let's apply some pre-processing to shrink the data down to a more manageable size and get it into the correct format for SciPy.
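The preprocessing can be sketched as below. To keep the example self-contained, tiny synthetic DataFrames stand in for the real transactions_train.csv and articles.csv files (using the same column names); in practice you would pd.read_csv the Kaggle files instead:

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Synthetic stand-ins for the Kaggle files, with the real column names
# (transactions_train.csv: t_dat, customer_id, article_id;
#  articles.csv: article_id, prod_name)
transactions = pd.DataFrame({
    "t_dat": pd.to_datetime(["2020-09-01", "2020-09-02", "2020-09-02",
                             "2020-01-05", "2020-09-10", "2020-09-10"]),
    "customer_id": ["u1", "u1", "u2", "u2", "u3", "u1"],
    "article_id": [101, 102, 101, 103, 104, 101],
})
articles = pd.DataFrame({
    "article_id": [101, 102, 103, 104],
    "prod_name": ["Strap top", "Slim jeans", "Strap top", "Knit sweater"],
})

# Swap article_id for the generic prod_name and standardise the column names
df = (transactions.merge(articles, on="article_id")
                  .rename(columns={"prod_name": "itemID", "customer_id": "userID"}))

# Time filtering: keep only the latest 6 months of transactions
cutoff = df["t_dat"].max() - pd.DateOffset(months=6)
df = df[df["t_dat"] >= cutoff]

# Binary implicit feedback: keep each user's first purchase of a product
df = df.drop_duplicates(subset=["userID", "itemID"])

# Integer mapping: translate IDs to zero-indexed ints, with an inverse map
user_map = {u: i for i, u in enumerate(df["userID"].unique())}
item_map = {it: i for i, it in enumerate(df["itemID"].unique())}
inv_item_map = {i: it for it, i in item_map.items()}

# Build the sparse CSR interaction matrix of 1s
interaction_matrix = sp.csr_matrix(
    (np.ones(len(df)),
     (df["userID"].map(user_map).to_numpy(), df["itemID"].map(item_map).to_numpy())),
    shape=(len(user_map), len(item_map)),
)
print(interaction_matrix.shape, interaction_matrix.nnz)
```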

Let's walk through exactly what the code above is doing to prepare our graph:

  1. Interpretable Item IDs: The raw data uses a unique article_id for every single product variant. This means the exact same 'Strap top' in black and white or different sizes would have different IDs. To make our recommendations much more interpretable and to naturally shrink our dataset, we swap out the article_id for the more generic prod_name and then dedupe on it (treating any colour variation as an interaction with that core product). We also rename this to itemID to keep our naming conventions standard.

  2. Binary Implicit Feedback: RP3Beta works wonderfully with implicit feedback—actions like clicks, views, or purchases. Because we are treating interactions as binary (1 for a purchase, 0 for no purchase), we do not need to worry about whether a customer bought the same item three times. We simply use drop_duplicates to keep only their very first purchase of that product.

  3. Time Filtering: To further reduce the size of the data whilst ensuring our model learns from relevant, recent trends, we filter the dataset to only include transactions from the latest 6 months.

  4. Integer Mapping: SciPy sparse matrices do not understand long string IDs or hashed customer keys. We must create a set of dictionaries (user_map and item_map) to translate every unique user and item into a zero-indexed integer. We also create an inv_item_map so we can translate our numeric recommendations back into readable, human-friendly product names later on!

  5. Building the CSR Matrix: Finally, we use SciPy's Compressed Sparse Row (csr_matrix) format. This efficiently stores our matrix by only recording the non-zero elements (our 1s), which gives us significant memory savings when dealing with matrices that are mostly zeros.

This process gives us a newly constructed interaction_matrix of shape 743,154 × 24,791, which is a massive leap from our 5x5 dummy matrix! We are now working with a graph containing over 743,000 unique users and nearly 25,000 distinct items. If we attempted to store this as a standard, dense NumPy array, we would be holding over 18 billion elements in memory, the vast majority of which would be zeros, since most users only buy a tiny fraction of the catalogue. This is why we rely on SciPy's sparse matrices; they allow us to perform complex linear algebra on hundreds of thousands of users without crashing our machines.

Now that our data is prepped, we can feed this massive sparse matrix directly into our build_rp3beta_matrix() function. Once the calculation is complete, we'll write a bit of exploratory code to peek inside the resulting item-item similarity matrix. This will help us to understand its overall density, what a typical item's recommendation profile looks like and what those final mathematical scores actually mean in a real-world context.
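An illustrative version of that exploratory code, run here against the toy S matrix so it executes end-to-end; on the real data you would point the same statistics at S_main:

```python
import numpy as np
import scipy.sparse as sp

# Rebuild the toy RP3Beta matrix (beta = 0.5) so the block stands alone
X_sparse = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)
d_I = np.asarray(X_sparse.sum(axis=0)).ravel()
d_U = np.asarray(X_sparse.sum(axis=1)).ravel()
S = (sp.diags(1.0 / d_I).dot(X_sparse.T)
       .dot(sp.diags(1.0 / d_U).dot(X_sparse))
       .dot(sp.diags(np.power(d_I, -0.5))))
S = S.tolil()
S.setdiag(0)
S = S.tocsr()

# Overall shape and density of the similarity matrix
n_items = S.shape[0]
n_links = int((S.toarray() > 0).sum())
print(f"Shape: {S.shape}, non-zero links: {n_links}, "
      f"density: {n_links / n_items**2:.1%}")

# Profile of one item's row: how many neighbours, and how big the scores are
row = S[0].toarray().ravel()
neighbours = row[row > 0]
print(f"Item 0: {neighbours.size} connected items, "
      f"max score {neighbours.max():.4f}, mean score {neighbours.mean():.4f}")
```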

The output above reveals several insights about how the RP3Beta algorithm behaves at scale:

  1. The Square Matrix: Our resulting S_main matrix has a shape of (24,791, 24,791). Because it is an item-item similarity matrix, both the rows and columns represent our 24,791 unique products. Each row contains the transition probabilities (which we use as the similarities) from one starting item to all other possible items in the catalogue.

  2. Matrix Density & The 3-Step Walk: Out of a staggering 614 million possible item-to-item connections, only about 36.4 million (roughly 5.9%) are non-zero. Why is this? It all goes back to the physics of our 3-step random walk. For an item to have a non-zero similarity score with another item, they must share an indirect connection through a user within exactly three steps (Item → User → Item). If two items belong to completely different customer niches and never cross paths in the graph, their similarity remains zero. This sparsity is a massive advantage as it means the algorithm naturally filters out the noise and hones in only on items that genuinely share a mathematical relationship.

  3. Making Sense of Tiny Scores: Let's look at the stats for our very first product, the 'CONAN STRUCTURE SWEATER'. The random walk found 927 other items connected to it, which is plenty to generate a top-10 recommendation list! However, you might notice that the highest similarity score is a seemingly tiny 0.0013, and the average is near zero. Don't let these tiny numbers alarm you. In a massive graph, a random walker's probability gets fractured and divided thousands of times across highly active users and heavily connected items. Add in the aggressive popularity penalty applied by our β parameter, and the final absolute probabilities become incredibly small. The secret to KNN-style graph recommenders, however, is that the absolute magnitude of the score does not matter in the slightest. All we care about is the relative ranking of these scores. As long as the best, most relevant items have slightly higher scores than the irrelevant ones, our get_similar_items() function will successfully sort them and extract the perfect recommendations.

So far we've successfully calculated our massive S_main item-item similarity matrix. We know which items share mathematical relationships based on the RP3Beta graph walk. But how do we actually use this matrix to generate a personalised list of recommendations for a specific customer? One of the most elegant properties of item-based KNN models (like RP3Beta) is that once the item-item matrix is built, generating user recommendations requires just one quick matrix multiplication.

Instead of learning complex, latent embeddings for every individual user (as you would in a Matrix Factorisation model), we simply treat a user as a "basket" of the items they have already interacted with. By representing the user as a sparse row vector of 1s and 0s, we can take the dot product of this user vector and our item-item similarity matrix. Mathematically, it looks like this: UserScores = UserHistory × ItemSimilarity.

This operation effectively takes all the items the user has bought, looks up their respective rows in the similarity matrix, and adds them all together. If multiple items in the user's history strongly point to a specific new product, those similarity scores stack up, pushing that new product to the top of the recommendation list.
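A sketch of those steps, using the toy matrices so the block runs standalone (on the real data, interaction_matrix and S_main are the large sparse matrices built earlier):

```python
import numpy as np
import scipy.sparse as sp

# Toy interaction matrix and RP3Beta similarity matrix from the earlier steps
interaction_matrix = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)
d_I = np.asarray(interaction_matrix.sum(axis=0)).ravel()
d_U = np.asarray(interaction_matrix.sum(axis=1)).ravel()
S_main = (sp.diags(1.0 / d_I).dot(interaction_matrix.T)
            .dot(sp.diags(1.0 / d_U).dot(interaction_matrix))
            .dot(sp.diags(np.power(d_I, -0.5))))
S_main = S_main.tolil()
S_main.setdiag(0)
S_main = S_main.tocsr()

# 1. Extract the target user's history as a sparse row vector
target_user_idx = 0
user_history_sparse = interaction_matrix[target_user_idx]

# 2. One dot product aggregates similarity scores across everything they bought
scores = np.asarray(user_history_sparse.dot(S_main).todense()).ravel()

# 3. Never re-recommend items the user already owns
scores[user_history_sparse.indices] = -np.inf

# 4. argpartition grabs the top-k without sorting the full catalogue
k = 2
top_k = np.argpartition(-scores, k)[:k]
recommendations = top_k[np.argsort(-scores[top_k])]
print(recommendations)
```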

Let's dissect what just happened in that code block:

  1. Extracting the User Vector: We pull the very first user (target_user_idx = 0) directly out of our main interaction_matrix. This gives us a 1×24,791 sparse vector where their historical purchases are marked as 1s.

  2. The Dot Product: The line user_history_sparse.dot(S_main) is the engine of the recommender. It aggregates all the similarity scores from the RP3Beta matrix specifically for the items this user has engaged with, flattening the result into a single 1D array of scores for every item in the catalogue.

  3. The Crucial Filter: We do not want to recommend a sweater the user has already bought! By finding the indices of the user's historical items and setting their prediction scores to -np.inf (negative infinity), we guarantee they will be pushed to the very bottom of the ranking and completely excluded from the final recommendations.

  4. Efficient Extraction: Just as we did in our get_similar_items function, we utilise the highly efficient np.argpartition to grab the indices of the top 5 highest-scoring items without wasting compute sorting the other 24,786 products.

 

Looking at the output, the logic of our RP3Beta model seems to be working well. Our target user has a history rich in knitwear (sweaters), dresses, slacks, and tunics. The resulting top 5 recommendations generally reflect this eclectic fashion profile, suggesting a new knit, a striped item, and a new tunic. Because we tuned our β parameter to 0.6 earlier, we have actively suppressed the generic, blockbuster items that might normally flood this list, ensuring our user gets highly relevant, personalised fashion discoveries derived purely from the underlying structure of the graph.

In our current implementation, we are using every single non-zero connection found by the random walk. However, in professional implementations, we often perform a step called Top-K Sparsification. This involves looking at each row of our similarity matrix S and zeroing out everything except the K highest scores. The KNN recommender tutorial covers this in more detail and while it might seem like we are throwing data away, it serves two important purposes:

  1. Eliminating Noise: Many of the 36 million connections we found are "weak ties", i.e. items that are only technically connected by a single, hyper-active user. By forcing the model to only look at the K strongest neighbours, we ensure that our recommendations are driven by the most robust patterns in the data.

  2. Lightning-Fast Inference: Multiplying a user's history against a pruned matrix is significantly faster and requires much less memory. In a production environment with millions of customers, this efficiency is the difference between a recommendation taking 5 milliseconds or 5 seconds.

Extracting item-item similarities

In a standard symmetric matrix (like a basic Cosine similarity), the similarity between Item A and Item B is exactly the same as the similarity between Item B and Item A. However, because RP3Beta is built on a directed random walk and applies a destination-specific penalty, our matrix is asymmetric. The journey from A→B is not mathematically identical to the journey from B→A. 

 

This asymmetry means we must be highly intentional about how we read the matrix:

  • Reading a Row (i): The row for Item i tells us the probability of walking away from Item i and landing on other items. It answers the question: "If a customer is currently interacting with Item i, what are they most likely to interact with next?" These are our successors or our "People who liked this also liked..." recommendations.

  • Reading a Column (j): The column for Item j tells us all the probabilities of walking towards Item j from everywhere else in the graph. It answers the question: "Which items best predict that a user will ultimately want Item j?" These are our precursors.

 

To power a "Similar Items" widget on a product page, we want to read the rows. To find some similar items, let's use our integer mapping dictionaries to translate a human-readable product name into its matrix index, slice out that specific row, and pass it to our get_similar_items function:
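Sketching that lookup on the toy matrix, with hypothetical product names standing in for the real item_map and inv_item_map dictionaries:

```python
import numpy as np
import scipy.sparse as sp

# Toy similarity matrix from the earlier steps
X_sparse = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)
d_I = np.asarray(X_sparse.sum(axis=0)).ravel()
d_U = np.asarray(X_sparse.sum(axis=1)).ravel()
S_main = (sp.diags(1.0 / d_I).dot(X_sparse.T)
            .dot(sp.diags(1.0 / d_U).dot(X_sparse))
            .dot(sp.diags(np.power(d_I, -0.5))))
S_main = S_main.tolil()
S_main.setdiag(0)
S_main = S_main.tocsr()

# Hypothetical name mappings standing in for the real H&M dictionaries
item_map = {"Jumpsuit": 0, "Playsuit": 1, "Shorts": 2, "Dress": 3, "Headband": 4}
inv_item_map = {idx: name for name, idx in item_map.items()}

# Translate name -> index, slice out that ROW, and rank its scores
query_idx = item_map["Jumpsuit"]
scores = S_main[query_idx].toarray().ravel()
k = 2
top_k = np.argpartition(-scores, k)[:k]
top_k = top_k[np.argsort(-scores[top_k])]
print([inv_item_map[i] for i in top_k])
```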

This looks pretty good! We fed the algorithm a product called the 'Joscelyn jumpsuit'. Without knowing anything about fashion, without reading the text descriptions and without looking at a single product image, the RP3Beta graph successfully recommended:

  1. Three other jumpsuits.

  2. A pair of shorts (another casual, warm-weather garment).

  3. A headband (a logical accessory).

 

The model inferred all of this contextual nuance purely by observing the structural patterns of how users move between these items. By heavily penalising the most popular items in the catalogue, β allowed these highly specific, structurally relevant garments to float to the top of the probability scores.

Tuning beta and its effect on popularity

Let's take a moment to concretely prove the central thesis of the RP3Beta algorithm: that the β parameter directly and predictably controls popularity bias. We went through the maths of this penalty earlier, but now let's see whether it works on a real-world catalogue of 24,000 products. To demonstrate this, we can write a short script that calculates the "popularity rank" of every item in our H&M dataset. A rank of 1 means it is the absolute best-selling, most interacted-with product in the entire catalogue, whilst a rank of 24,000 means it is an incredibly niche item with barely any interactions.

By running our target item (the 'Joscelyn jumpsuit') through the model at three different β settings (0.0, 0.5, and 1.0), we can calculate the average popularity rank of the resulting top 10 recommendations and see exactly how the algorithm's behaviour shifts.
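The experiment can be sketched as below. Note that on our tiny toy matrix the average rank barely moves, because there are only five items to choose from; on the real 24,000-item catalogue the shifts are dramatic. The build_rp3beta helper here is an inline reconstruction of the earlier pipeline:

```python
import numpy as np
import scipy.sparse as sp

def build_rp3beta(X, alpha=1.0, beta=0.5):
    """Inline RP3Beta similarity matrix, as assembled in the earlier steps."""
    d_I = np.asarray(X.sum(axis=0)).ravel()
    d_U = np.asarray(X.sum(axis=1)).ravel()
    S = (sp.diags(np.power(d_I, -alpha)).dot(X.T)
           .dot(sp.diags(np.power(d_U, -alpha)).dot(X))
           .dot(sp.diags(np.power(d_I, -beta))))
    S = S.tolil()
    S.setdiag(0)
    return S.tocsr()

X = sp.csr_matrix(np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]), dtype=np.float64)

# Popularity rank: 1 = the most interacted-with item in the catalogue
d_I = np.asarray(X.sum(axis=0)).ravel()
pop_rank = np.empty_like(d_I)
pop_rank[np.argsort(-d_I)] = np.arange(1, d_I.size + 1)

target_idx, k = 0, 2
results = {}
for beta in [0.0, 0.5, 1.0]:
    S = build_rp3beta(X, beta=beta)
    scores = S[target_idx].toarray().ravel()
    top_k = np.argpartition(-scores, k)[:k]
    results[beta] = pop_rank[top_k].mean()
    print(f"beta={beta}: average popularity rank of top-{k} = {results[beta]:.1f}")
```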

The output illustrates exactly why RP3Beta is so highly regarded! Our target item, the 'Joscelyn jumpsuit', sits at a respectable mid-tier popularity rank of 4,081. Let's look at how the recommendations shift as we turn the β dial:

  • β=0.0 (Pure P3Alpha): With no popularity penalty applied, the average popularity rank of our recommendations is a very popular 1,692. Notice the third recommendation: the 'Timeless Midrise Brief'. This item is Rank 1: the absolute most popular item in the entire H&M catalogue! While it is mathematically true that users who bought our jumpsuit also bought these bestselling briefs, recommending the most popular item in the store isn't a great user experience, as it rarely offers any personalisation.

  • β=0.5 (Moderate Penalty): As we introduce a moderate penalty, the blockbusters are successfully suppressed. The average rank drops to a much more niche 12,598. The algorithm has pushed highly relevant, structurally similar items to the top, such as the two 'Doris' jumpsuits (Rank 8,348 and 4,873). The model is now surfacing genuine discoveries.

  • β=1.0 (Maximum Penalty): At this extreme, we heavily punish any item with even a hint of popularity. The average rank plummets to nearly 20,000. The algorithm is now desperately hunting for the most obscure, long-tail connections possible, pulling up items tied at a rank of 21,896.

 

This proves that β is not just a mathematical quirk; it is a direct dial for novelty. We can tune this parameter to match specific business goals. If we want safe, guaranteed sellers, we keep β low. If we want to help customers discover hidden gems buried deep in the catalogue, crank β up!

Bonus: visualising item-item relationships with UMAP

In our previous examples, we used the rows of our item-item similarity matrix to understand how buying a specific product dictates what other products we should recommend. But we can also extract the columns of our matrix to understand which items are fundamentally composed of the same predictive patterns. Before we try this, we need to address a common dilemma when visualising asymmetric graph networks: to transpose or not to transpose?

Because UMAP expects data in the format of (n_samples, n_features), the orientation of our matrix completely changes what the visualisation represents:

  • Passing S_main (Rows): If we pass the matrix as-is, the features for each item are its outbound recommendations. Items will cluster together on the map if they tend to recommend the exact same products. You can think of this as clustering items by their successor profile.

  • Passing S_main.T (Columns / Transposed): If we transpose the matrix, the features for each item become its inbound precursors. Items will cluster together if they are frequently recommended by the exact same group of starting products. We can think of this as an item's compositional fingerprint.
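The row/column distinction above is easiest to see on a tiny hypothetical matrix. The sketch below uses a made-up 3×3 asymmetric similarity matrix (not values from our trained model) to show that row i (an item's outbound recommendations) and column i (its inbound precursors) are genuinely different feature vectors:

```python
import numpy as np

# Hypothetical asymmetric item-item similarity matrix, where
# S_main[i, j] = strength of recommending item j after item i.
S_main = np.array([
    [0.0, 0.8, 0.1],   # item 0 mostly recommends item 1
    [0.0, 0.0, 0.9],   # item 1 mostly recommends item 2
    [0.7, 0.2, 0.0],   # item 2 mostly recommends item 0
])

# Row 1 = item 1's "successor profile": what it recommends.
outbound_item1 = S_main[1]       # [0. , 0. , 0.9]

# Column 1 (i.e. row 1 of the transpose) = item 1's "compositional
# fingerprint": which items recommend it, and how strongly.
inbound_item1 = S_main.T[1]      # [0.8, 0. , 0.2]

print("outbound:", outbound_item1)
print("inbound: ", inbound_item1)
```

Because the matrix is asymmetric, the two views disagree, which is exactly why passing `S_main` versus `S_main.T` to UMAP produces two different maps.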

 

Because the RP3Beta algorithm applies its aggressive β penalty to the destination items, these column vectors contain incredibly rich, nuanced, and popularity-penalised data. If we want to map out the immediate customer journey and visualise which items directly trigger one another, using the rows (S_main) is perfect. If our goal is to understand how the model broadly categorises items based on their shared underlying context and dependencies, transposing the matrix to use the columns (S_main.T) is a brilliant approach.

To truly understand the difference, we are going to do both! We will write a helper function to run UMAP on both the rows and the columns, and plot them side-by-side so we can compare the underlying physics of our graph.

umap rp3beta.png

Look at Plot 1 (Rows) on the left. Here, the catalogue looks like an archipelago of distinct, relatively isolated islands. This makes intuitive sense for outbound recommendations. Notice how the Baby/Children products (in red) shatter into several tight, highly self-contained sub-clusters. A customer looking at a toddler's t-shirt is overwhelmingly likely to be recommended another toddler's item, locking the random walk into those tight local neighbourhoods. Similarly, Menswear (in brown) forms its own distinct island on the middle-left, clearly separated from the massive Ladieswear/Divided (orange/green) continent at the bottom.

 

Looking at Plot 2 (Columns) on the right, the archipelago has smashed together into a single, massive, interconnected continent! Instead of neat, fragmented islands, we are presented with a dense, more unified blob. There are still some obvious groupings, such as the red Baby/Children clusters or the brown Menswear group, but the vast majority of the catalogue, made up of Ladieswear (orange) and Divided (green), is heavily blended together throughout the centre and top of the plot.

Why does this happen? Because while a customer buying a dress might reliably be recommended another dress (creating the neat islands in Plot 1), the journey to finding that dress (Plot 2) is incredibly diverse. Customers cross-shop constantly. A user might buy socks from Menswear, a babygrow and a skirt, and all of those seemingly unrelated items act as historical precursors pointing towards the same core Ladieswear items.

By visualising both orientations of our matrix, we can physically see the complex, cross-departmental reality of human behaviour. The paths leading to a purchase are messy, intertwined and highly non-linear, even if the immediate recommendations flowing out of a purchase are incredibly focused!

Summary 

In this tutorial, we've learnt about the RP3Beta algorithm by building it from the ground up. We started with the raw mathematics of a user-item bipartite graph, converting basic interaction counts into transition probabilities to simulate a 3-step random walk. By hard-coding our own SciPy implementation, we saw first-hand exactly how the standard P3Alpha algorithm naturally gravitates towards blockbuster items, and how RP3Beta elegantly solves this by introducing the β popularity penalty. We then scaled our custom function up to handle a real-world H&M fashion catalogue containing millions of potential connections, generating customer recommendations and identifying similar items. We also saw how tuning the β dial directly controls the novelty and diversity of our recommendations.

Finally, we went beyond simply generating recommendations. By carefully extracting the rows and columns of our asymmetric similarity matrix, we isolated the "compositional fingerprints" of our items and visualised them using UMAP. This revealed a fascinating, interconnected map of cross-shopping behaviour, proving that our graph walk successfully learned the complex nuances of the catalogue without needing a single piece of metadata.

Where to go next? Building algorithms from scratch in pure Python and SciPy is arguably the best way to deeply understand the mathematics behind them. However, when it comes to deploying these models in a live production environment, we need raw speed, multi-threading and extreme memory efficiency. For this, check out the Item and User KNN Recommender tutorial. It covers how to access highly optimised, Cython-backed implementations of RP3Beta (alongside a whole suite of other classic and state-of-the-art similarity measures) in the similaripy package and teaches you how to systematically tune these hyperparameters using Optuna!

Copyright © Step by step data science. All Rights Reserved