ML Model Consumption Dilemmas: Part 1
Tuning Partner Preferences
At times, pragmatically consuming ML models gets more challenging than the training process itself. In this series, we will discuss some of the unique problems we encountered in consuming our models and how we tackled those problems at Shaadi.com.
In this part, we will look at a classic tuning issue in our pool (candidate) generation system. Our system is not exactly a candidate generation system but one that helps decide the parameters for candidate generation. In the traditional sense, a candidate generation system is the precursor to the sorting algorithm in a recommendation engine. Our system, the Partner Preference Generator (PPG), is a user-to-user recommender that sets the criteria, or partner preferences (PP), for each user, based on which their matchpool is generated.
PPG is a set of models that narrow down user preferences across multiple attributes using collaborative filtering. Here, the predictions are based purely on the behavior of similar prior users. Most candidate generation systems combine collaborative filtering with content-based filtering, which essentially finds similar items based on the user’s actual preferences. Although we don’t have a content-based filtering system yet, we might build one in the future; it would alter the PP every time the user expresses an interest outside their PP bounds.
Unlike most domains, we don’t have a cold start problem, since we have adequate information about the user right from the beginning to infer their PP (i.e., mandatory input fields during registration). Our problem, which is unique to us, is deciding the right bounds for the PP. If the bounds are too relaxed, we end up with irrelevant matches in the pool; if they are too restrictive, the matchpool shrinks and users eventually drop off.
The problem is multi-fold; here is the break-down:
- How do we decide the initial bounds for each of the attributes?
- In the event of a low matchpool, how do we broaden the PP?
  - Which attributes do we consider?
  - In what sequence?
  - By what margin do we broaden?
- Similarly, in the event of a very high matchpool, how do we restrict the PP?
In the Beginning
Since we started using ML-based models to generate PP, our quest has been to find a scientific solution to the bounding problem. With the earliest version, we used a mix of intuition and multiple rounds of simulations to arrive at hard-coded thresholds. We applied individual thresholds to each attribute to arrive at the initial PP and then broadened or shrank it from there, based on the matchpool size. We used thresholds of varying degrees to broaden or shrink the PP, but they were applied simultaneously across all attributes. Although the thresholds varied with attribute importance, we failed to find the right sequence or to determine whether broadening or shrinking was actually needed for a given attribute.
Secondly, hard-coded thresholds meant every user was treated the same way. We failed to account for the fact that attribute importance could vary from person to person (or persona to persona). For instance, an NRI (non-resident Indian) female might want to relax her community or age first and then location, whereas community could be crucial for some other persona, who might not want it relaxed at all.
We addressed these issues by deploying a trade-off optimizer over the models instead of the hard-coded thresholds. The trade-off optimizer consumes scores for complementary metrics (relevance vs. coverage) from multiple models for a given attribute and strikes a balance between them. We use the harmonic mean to arrive at the optimum point.
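To make the harmonic-mean trade-off concrete, here is a minimal sketch. It assumes that, for one attribute, the models return a relevance score and a coverage score for each candidate bound; the function names and the sample scores are hypothetical and only illustrate the idea, not our production optimizer.

```python
from typing import Sequence


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two scores; 0.0 if either score is not positive."""
    if a <= 0 or b <= 0:
        return 0.0
    return 2 * a * b / (a + b)


def pick_tradeoff_point(relevance: Sequence[float], coverage: Sequence[float]) -> int:
    """Return the index of the candidate bound with the highest harmonic mean.

    relevance[i] and coverage[i] are the (hypothetical) model scores for the
    i-th candidate bound of a single attribute.
    """
    if len(relevance) != len(coverage) or not relevance:
        raise ValueError("relevance and coverage must be non-empty and equal in length")
    scores = [harmonic_mean(r, c) for r, c in zip(relevance, coverage)]
    # On ties, max() keeps the first (tightest) candidate.
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scores: candidate bounds ordered from tightest to loosest.
relevance = [0.9, 0.8, 0.5, 0.3]
coverage = [0.2, 0.5, 0.7, 0.9]
best = pick_tradeoff_point(relevance, coverage)
```

The harmonic mean is dominated by the smaller of the two scores, so a bound that is excellent on relevance but poor on coverage (or the reverse) scores lower than a balanced one.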
With this approach, every user gets a personalized PP based on the trade-off optimizer scores. On getting a low matchpool, we simply relax the relevance, thereby increasing coverage. The only problem was that the relaxation happened across the whole attribute spectrum, and we were not able to prioritize one attribute over another.
To address this, we introduced the Gini index. Economists use the Gini index, among other things, to measure the degree of income inequality within countries. It is a measure of the dispersion of a distribution.
We use a modified version of the Gini index to find out how liberal or restrictive our users are with respect to various attributes. For each attribute, we plot a Lorenz curve of its class coverage from a large sample set and measure the deviation of the curve from the line of perfect equality. The idea is that the more spread out the ‘expressions of interest’ sent across the classes of a given attribute, the more liberal users are with that attribute.
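As an illustration, the sketch below computes the standard (unmodified) Gini coefficient from a Lorenz curve; the modification we apply is not shown here. The input is assumed to be the number of interests sent to each class of one attribute, and the sample counts are made up.

```python
from typing import Iterable


def gini(counts: Iterable[float]) -> float:
    """Standard Gini coefficient of non-negative counts.

    Builds the Lorenz curve (cumulative share of interests over classes
    sorted from least to most contacted) and returns 1 - 2 * area under it.
    0.0 means interests are spread evenly; values near 1 mean they are
    concentrated in very few classes.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # no data: treated as "no inequality" in this sketch
    cumulative = 0.0
    area = 0.0
    for v in values:
        previous = cumulative
        cumulative += v / total
        # Trapezoid between two consecutive Lorenz points, width 1/n.
        area += (previous + cumulative) / (2 * n)
    return 1 - 2 * area


# Hypothetical interest counts per class of an attribute.
evenly_spread = gini([5, 5, 5, 5])   # interests spread across all classes
concentrated = gini([0, 0, 0, 20])   # all interests sent to a single class
```

Under this reading, a low coefficient corresponds to a liberal attribute and a high coefficient to a restrictive one.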
Below are the Gini Coefficients based on interest coverages:
To further check whether any attribute is highly restrictive and hence shouldn’t be broadened, we looked at the median number of classes to which interest is sent. As the table above shows, Mother Tongue seems to be highly restrictive: its Gini coefficient is one of the highest and its median number of classes is 1, which means at least half of the users express interest in only one mother tongue. In contrast, Annual Income is quite different, meaning users are more liberal with it.
The above indicators are global and apply to almost 80% of our users; we can break them down further for various cohorts and also look for personas. This gives us a sequence for PP relaxation and an estimate of the extent of relaxation required; however, our problem is not solved completely. Although we have a template that covers 80% of the users, we don’t want to lose out on the remaining 20%. Slicing down into personas could be a solution, but there is a high chance we would miss a few segments.
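The median check can be sketched in a few lines. The data layout (a mapping from user ID to the classes they sent interest to) and the sample values are assumptions made for illustration.

```python
from statistics import median
from typing import Iterable, Mapping


def median_classes_contacted(interests_by_user: Mapping[str, Iterable[str]]) -> float:
    """Median number of distinct classes each user sent interest to."""
    return median(len(set(classes)) for classes in interests_by_user.values())


# Hypothetical sample: classes of one attribute contacted by each user.
sample = {
    "u1": ["Hindi"],
    "u2": ["Hindi", "Hindi"],
    "u3": ["Tamil", "Telugu"],
}
result = median_classes_contacted(sample)
```

A median of 1 means at least half of the sampled users stayed within a single class, which is the signal we use to avoid broadening that attribute.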
Refining it further and putting it all together, we have Adaptive PPG, a concept that we have been toying with for quite some time now. With Adaptive PPG, we wanted to do away with all the hard constraints over PPG and make it fluid, so that every user gets a personalized PP that is not constrained by any threshold or template. We initially envisioned this as a model over PPG, but with the Gini index, it seems we don’t have to build one.
We realized that we don’t have to apply the Gini index as a template; we can apply it as a function on every user’s coverage distributions, as returned by the models. This helps in prioritizing the relevant attributes over others at an individual level. For some users, location may be more important than the other attributes, and the function would help identify that.
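A rough sketch of this per-user function follows. It assumes the models return, for each attribute, a coverage distribution over that attribute's classes, and it uses the standard Gini coefficient rather than our modified version. The attribute names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def gini(values: Sequence[float]) -> float:
    """Standard Gini coefficient via the rank-weighted closed form."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no signal for this attribute in this sketch
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def broadening_order(coverage_by_attribute: Dict[str, Sequence[float]]) -> List[str]:
    """Order one user's attributes from most liberal to most restrictive.

    Attributes with a low Gini coefficient (coverage spread across classes)
    come first and are the first candidates for broadening.
    """
    return sorted(coverage_by_attribute, key=lambda a: gini(coverage_by_attribute[a]))


# Hypothetical per-class coverage distributions for a single user.
user_coverage = {
    "mother_tongue": [0.97, 0.01, 0.01, 0.01],
    "annual_income": [0.30, 0.25, 0.25, 0.20],
    "location": [0.60, 0.20, 0.10, 0.10],
}
order = broadening_order(user_coverage)
```

For this made-up user, annual income would be relaxed first and mother tongue last; another user's distributions could produce a different order, which is the point of applying the function per user rather than as a global template.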
With this, our Partner Preference Generator is personalized at every level, right from identifying the correct bounds for PP, to prioritizing attributes for broadening.