Tom Jacques

How will you swipe next? The results will be surprising.

Desirability is a crucial factor in dating, and at OkCupid (we're hiring!), we work hard to analyze the data we have and improve the quality of the matches that you see.

We've previously written on the topic of how race and ethnicity affect the messages you get. Today we'll be revisiting the overall situation of what affects your incoming and outgoing swipe rates in DoubleTake, and how we're using that data to improve the matching system. Your incoming_swipe_rate is the rate at which others right swipe on (like) you, and your outgoing_swipe_rate is the rate at which you right swipe on the people you see.

First of all, let's address the fact that incoming_swipe_rate and desirability aren't exactly the same thing. Your incoming_swipe_rate is roughly the percentage of the time you are right swiped on when you show up in another member's DoubleTake queue, which depends on how that member perceives your photos, your profile essays, distance, match percent, and any other criteria DoubleTake surfaces. Your actual desirability in real life probably depends on related but different criteria. We're taking a leap of faith that the two are similar enough to make sense, but based on a combination of user testing, how well it predicts messages received, and other proxies, we feel the correlation is strong enough to make it more of a hop.

Next let's address whether incoming and outgoing swipe rates are actually important for determining whom we show. There's an old joke where your friend is trying to set you up on a blind date, and when you ask how the date looks, your friend responds that they have a "great personality". It's a joke, but there's a kernel of truth to it - so let's dig it up. On OkCupid we looked at the distributions of incoming_swipe_rates of members, broken down by gentation (gender x orientation).

Taking a page out of Bayesian statistics, we can normalize the data by using empirical Bayesian shrinkage towards a prior. We can then create a cumulative distribution function to obtain the percentile from the normalized rates.
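As a rough illustration of that normalization, here is a minimal sketch. It assumes a Beta-style prior expressed as pseudo-swipes; the post does not say which prior is actually used, so prior_strength, shrunk_rate, make_percentile_fn, and the sample numbers are all hypothetical.

```python
from bisect import bisect_right


def shrunk_rate(right_swipes, total_swipes, prior_rate, prior_strength):
    """Shrink an observed swipe rate towards prior_rate.

    prior_strength behaves like a number of pseudo-swipes at the prior rate.
    This Beta-style prior is an assumption, not the prior used in production.
    """
    return (right_swipes + prior_rate * prior_strength) / (total_swipes + prior_strength)


def make_percentile_fn(rates):
    """Build an empirical CDF: the fraction of rates less than or equal to x."""
    ordered = sorted(rates)

    def percentile(x):
        return bisect_right(ordered, x) / len(ordered)

    return percentile


# (right swipes received, total impressions) for four made-up members
observed = [(1, 1), (30, 100), (0, 2), (450, 1000)]

# Use the pooled rate across all members as the prior
prior_rate = sum(r for r, _ in observed) / sum(n for _, n in observed)

rates = [shrunk_rate(r, n, prior_rate, prior_strength=20) for r, n in observed]
percentile = make_percentile_fn(rates)
```

The member with a single right swipe out of a single impression has a raw rate of 1.0, but after shrinkage lands close to the pooled rate, while the member with 1000 impressions barely moves.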

What you can see here is the observed probability of a right swipe occurring, based on the swiper's outgoing swipe percentile and the swipee's incoming swipe percentile, for straight male swipers. Percentiles were used here to spread out the graph and make it visually easier to interpret:

Swipe Probability

The probability heatmap shows a very clear relationship between these metrics. The combination of the two looks to be highly indicative of the probability of a right swipe. Ultimately, what we want to do is promote successful matches, so we want to show you people above your bar who are most likely to swipe you back.

Now that we have the data to back up the importance of these metrics, we want to incorporate them into our live search functions to improve the quality of matches in DoubleTake - the fun stuff!

Our matching infrastructure is a sharded, replicated cluster of nodes that hold relevant user data in memory. What we want is for each member's incoming_swipe_rate and outgoing_swipe_rate to be updated in real time across the cluster. We can do this by starting up with the initial data needed to compute the scores (the swipes on and by each user) and subscribing to a streaming channel that publishes user swipes as they happen. One way of doing this is simply holding onto a list of recent swipes. Capping the sample at 1000 swipes gives us both predictable memory usage and, judging by the median number of swipes per day, a good set of recent swiping activity.

The astute observer may be asking: if you're trying to be memory efficient, why keep a sample instead of just the total counts? That's a good question. We can instead keep a count of the total votes, capped at the number we want to count towards the score (1000), and apply a weight of 1 / num_votes to the next swipe and (num_votes - 1) / num_votes to the prior score. This is a pretty good compromise: it saves space at the cost of losing additional information, such as when those swipes happened. We happen to also use the timestamp data for other purposes, so using the sample is convenient, but both methods have merit.
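The count-based alternative can be sketched in a few lines. This is only an illustration of the weighting described above; the class name RunningSwipeRate and its fields are hypothetical, not taken from the production system.

```python
MAX_VOTES = 1000  # cap on the number of votes that count towards the score


class RunningSwipeRate:
    """Keeps a swipe rate using only the prior score and a capped vote count."""

    def __init__(self, rate=0.0, num_votes=0):
        self.rate = rate
        self.num_votes = num_votes

    def update(self, right_swipe):
        # Count the new vote, but never let the count exceed the cap
        self.num_votes = min(self.num_votes + 1, MAX_VOTES)
        weight = 1.0 / self.num_votes
        outcome = 1.0 if right_swipe else 0.0
        # 1 / num_votes to the new swipe, (num_votes - 1) / num_votes to the prior score
        self.rate = weight * outcome + (1.0 - weight) * self.rate
        return self.rate


tracker = RunningSwipeRate()
for was_right_swipe in [True, False, True, True]:
    tracker.update(was_right_swipe)
```

Below the cap this is an exact running mean. Once the cap is reached, each new swipe gets a fixed weight of 1 / 1000, so older swipes decay gradually instead of dropping out of a window.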

We can also factor in the outgoing_swipe_rate of the member that made the swipe. For example, if there is an indiscriminate swiper that has an outgoing_swipe_rate of 1.0, we've gained precisely no information from that member swiping on you. It doesn't matter whom we had shown; the result would always be a right swipe. The same is true of receiving a left swipe from someone so selective that they always pass. Given the probability P_swipe that this particular swipe outcome would occur, we can give a weight of W_swipe = (1 - P_swipe) / num_votes to the swipe, and 1 - W_swipe to the prior score. In practice, adding this doesn't significantly change the results, so for simplicity we don't include it.
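For illustration only, the informativeness weighting might look like the sketch below. It assumes P_swipe is estimated from the swiper's outgoing_swipe_rate (the rate itself for a right swipe, one minus the rate for a left swipe), which is how the examples above read; the function name informative_update is hypothetical.

```python
def informative_update(prior_rate, num_votes, right_swipe, swiper_outgoing_rate):
    """Update an incoming swipe rate, discounting unsurprising swipes."""
    # Assumed estimate of how likely this outcome was before it happened
    p_swipe = swiper_outgoing_rate if right_swipe else 1.0 - swiper_outgoing_rate
    # W_swipe = (1 - P_swipe) / num_votes, as in the text
    w_swipe = (1.0 - p_swipe) / num_votes
    outcome = 1.0 if right_swipe else 0.0
    return w_swipe * outcome + (1.0 - w_swipe) * prior_rate


# A right swipe from someone who swipes right on everyone changes nothing
unchanged = informative_update(0.3, 10, True, swiper_outgoing_rate=1.0)

# A right swipe from someone who never swipes right gets the full 1 / num_votes weight
boosted = informative_update(0.3, 10, True, swiper_outgoing_rate=0.0)
```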

Now that we have these numbers, how can we use them? Ultimately, we want to get you, as a user of OkCupid, out on dates with other members, but it all starts with a connection. What we want to do in DoubleTake is maximize the chance that we show you someone you'll connect with meaningfully. In order to connect through DoubleTake, you have to select your match, and they must also select you. Hopefully the picture is now starting to come together: we can approximate the likelihood that one user will swipe on another by looking at the outgoing_swipe_rate of the swiper and the incoming_swipe_rate of the swipee. We can also approximate the likelihood that you get a swipe back, and how meaningful that is. A straightforward way to calculate that is as follows:

# This is just an example of how this could be calculated.
# In reality we would likely want to use another metric,
# such as right swipe to conversation probability.
def meaningfulness_of_swipe(swiper, epsilon):
  # A right swipe from a selective swiper means more than one from an
  # indiscriminate swiper. epsilon is the minimum meaningfulness; the
  # remaining factor is scaled by (1 - epsilon) so the result lies in [epsilon, 1].
  return epsilon + (1.0 - epsilon) * (1.0 - swiper.outgoing_swipe_rate)


def meaningful_return_swipe_value(swiper, swipee, epsilon):
  # If the swipee has already right swiped on the swiper, the return swipe is certain.
  if swiper in swipee.right_swipes:
    p_return_swipe = 1.0
  else:
    p_return_swipe = p_will_swipe_on(swipee, swiper)
  meaningfulness = meaningfulness_of_swipe(swipee, epsilon)
  return p_return_swipe * meaningfulness

So how can we come up with p_will_swipe_on? Taking a look at what variables we have to play with, we know the incoming_swipe_rate, the outgoing_swipe_rate, and the number of swipes for each member. Two promising places to start would be a Logistic Regression, and an average or weighted average of the outgoing_swipe_rate of the swiper and the incoming_swipe_rate of the swipee.

For the logistic regression, we can look to our good friend sklearn.linear_model.LogisticRegression and train it on sample data, using a combination of the computed scores and interaction terms between them as features. We can train it on individual impressions and their outcomes, and have it predict a probability given the same input variables.
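To make the feature construction concrete, here is a dependency-free stand-in that fits a logistic regression by plain gradient descent on toy data. It is not the sklearn model described above, and the toy data (a deterministic grid where the right swipe rate is the product of the two rates), the helper names, and the hyperparameters are all made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def features(out_rate, in_rate):
    # Bias term, the two computed scores, and their interaction term
    return [1.0, out_rate, in_rate, out_rate * in_rate]


def train_logistic(rows, labels, epochs=500, learning_rate=0.5):
    """Batch gradient descent on the mean log loss."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


def predict(weights, out_rate, in_rate):
    x = features(out_rate, in_rate)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Toy impressions: 10 per grid cell, with round(10 * out * in) right swipes
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
rows, labels = [], []
for out_rate in grid:
    for in_rate in grid:
        positives = round(10 * out_rate * in_rate)
        for i in range(10):
            rows.append(features(out_rate, in_rate))
            labels.append(1.0 if i < positives else 0.0)

weights = train_logistic(rows, labels)
```

With sklearn available, train_logistic and predict would presumably be replaced by LogisticRegression's fit and predict_proba on the same feature rows (without the explicit bias column, since the model fits its own intercept by default).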

The output of the logistic regression trained on swiper.outgoing_swipe_rate and swipee.incoming_swipe_rate can be visualized much like the first heatmap:

Logistic Regression Visualization

The averaged swipe rate is trivial to compute:

def averaged_right_swipe(swiper, swipee):
  return (swiper.outgoing_swipe_rate + swipee.incoming_swipe_rate) / 2.0

The weighted average can be computed in a similarly straightforward manner:

def weighted_average_right_swipe(swiper, swipee):
  weighted_sum = (
    swiper.outgoing_swipe_rate * swiper.num_outgoing_swipes
    + swipee.incoming_swipe_rate * swipee.num_incoming_swipes
  )
  return weighted_sum / (swiper.num_outgoing_swipes + swipee.num_incoming_swipes)

We can compare the scores of these functions to the observed rate by binning the scores and looking at the mean of the outcomes of the results in that bin.
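The binning step could be sketched as follows, assuming scores lie in [0, 1] and outcomes are 1 for a right swipe and 0 otherwise. The function name, the choice of ten equal-width bins, and the sample data are illustrative assumptions.

```python
from collections import defaultdict


def binned_outcome_rates(scores, outcomes, num_bins=10):
    """Map bin index to (mean score, mean outcome) for the scores in that bin."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # score sum, outcome sum, count
    for score, outcome in zip(scores, outcomes):
        # A score of exactly 1.0 is placed in the last bin
        index = min(int(score * num_bins), num_bins - 1)
        entry = sums[index]
        entry[0] += score
        entry[1] += outcome
        entry[2] += 1
    return {i: (s / n, o / n) for i, (s, o, n) in sorted(sums.items())}


scores = [0.05, 0.08, 0.55, 0.52, 0.58, 1.0]
outcomes = [0, 0, 1, 0, 1, 1]
bins = binned_outcome_rates(scores, outcomes)
```

Plotting the mean outcome against the mean score of each bin gives the kind of curve compared in the next figure.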

sigmoids

Perhaps somewhat unintuitively, what we see here is a sigmoid function rather than a straight line. This is a welcome sight: seeing a sigmoid means that we've gained information, because we've produced a way to discriminate between outcomes. The extreme case is a sideways Tetris Z-block (_|‾), a step function that translates the score into a binary event at a vertical threshold. We can fit a sigmoid to the data, and then apply this sigmoid transform to the averages to get a pretty nice predictor for our p_will_swipe_on function.
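One simple way to fit such a sigmoid is sketched below. The two-parameter form (steepness and midpoint), the coarse grid search, and the synthetic points are assumptions made for illustration; the post does not say how the fit was actually done.

```python
import math


def sigmoid_transform(x, steepness, midpoint):
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def fit_sigmoid(points):
    """Grid-search the steepness and midpoint that minimize squared error."""
    best = None
    for k10 in range(10, 301, 5):       # steepness from 1.0 to 30.0 in steps of 0.5
        for m100 in range(0, 101):      # midpoint from 0.00 to 1.00 in steps of 0.01
            k, m = k10 / 10.0, m100 / 100.0
            error = sum((sigmoid_transform(x, k, m) - y) ** 2 for x, y in points)
            if best is None or error < best[0]:
                best = (error, k, m)
    return best[1], best[2]


# Synthetic (binned score, observed rate) points drawn from a known curve
points = [(x / 20.0, sigmoid_transform(x / 20.0, 12.0, 0.6)) for x in range(1, 20)]
steepness, midpoint = fit_sigmoid(points)
```

A p_will_swipe_on predictor could then apply sigmoid_transform with the fitted parameters to the averaged or weighted-average swipe rate.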

Ok, so now that we have some predictor functions, which one actually performs better? First of all, how do we even evaluate that? Our model outputs a probability that an event will occur, but the event itself is discrete: it either happens or it doesn't. One way to evaluate this is called surprise, calculated as log(1/p), which is the same as -log(p), where p is the probability the predictor assigned to the outcome that actually occurred and the logarithm is taken base 2. We can use the mean surprise to evaluate how good we are at guessing the outcomes:

  • A perfect predictor is one which believed the probability of the actual outcome was 1.0, and would have a surprise of 0.0
  • A pessimal predictor is one which believed the probability of the actual outcome was 0.0, and would have a surprise of Infinity
  • A fair coin toss predictor is one which believed the probability of the actual outcome was 0.5, and would have a surprise of -log2(0.5) = 1
  • A weighted coin toss predictor with probability p of flipping heads would have a mean surprise of p * -log(p) + (1-p) * -log(1-p)
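The cases in the list above can be checked with a short sketch. It uses base-2 logarithms, which is what makes the fair coin come out at exactly 1; the function names are illustrative.

```python
import math


def surprise(p_of_actual_outcome):
    """Surprise in bits for the probability assigned to what actually happened."""
    if p_of_actual_outcome == 0.0:
        return math.inf
    return -math.log2(p_of_actual_outcome)


def mean_surprise(predictions, outcomes):
    """predictions: predicted right swipe probabilities; outcomes: True for a right swipe."""
    total = 0.0
    for p, happened in zip(predictions, outcomes):
        total += surprise(p if happened else 1.0 - p)
    return total / len(predictions)


def weighted_coin_surprise(p):
    """Mean surprise of a coin with heads probability p predicting its own flips."""
    return sum(q * surprise(q) for q in (p, 1.0 - p) if q > 0.0)


fair_coin = weighted_coin_surprise(0.5)
always_heads = weighted_coin_surprise(1.0)
```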

We can visualize how well our predictors are performing by comparing their mean surprise to the weighted coin predictor as a baseline. If a coin is weighted heads with p < 0.5 it's equivalent to being weighted tails with 1-p, so we only need to look at 0.5 <= p <= 1. Note that p = 0.5 is equivalent to the fair coin predictor, and p = 1.0 is equivalent to a perfect predictor.

Results

There are two interesting things about this result. First, all of our predictors outperform the weighted coin predictor baseline. Second, our Logistic Regression and our Sigmoid Transformed Averaged Swipe Rate predictors have effectively the same mean surprise. A logistic regression is fit by maximizing the likelihood of the observed outcomes, which is the same as minimizing the mean surprise. Interestingly, averaging the incoming and outgoing swipe rates does effectively the same thing: it tries to minimize the surprise of each individual swipe result, in effect maximizing the probability that the swipe event will be classified correctly.

In any case, we were able to construct predictors that reduced surprise to 54.4% of the baseline, the rough equivalent of a weighted coin with p = 0.89. Not bad for a relatively simple model!
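One way to translate a mean surprise into an equivalent weighted coin is to invert the weighted coin formula numerically. The sketch below does this by bisection over 0.5 <= p <= 1, assuming base-2 logarithms; the function names are hypothetical, and the 0.5 bit input is simply the mean surprise that a coin with p = 0.89 works out to, not a figure reported above.

```python
import math


def coin_entropy(p):
    """Mean surprise in bits of a weighted coin with heads probability p."""
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)


def equivalent_coin(mean_surprise_bits, tolerance=1e-9):
    """Find p in [0.5, 1] whose weighted coin surprise matches the given value."""
    low, high = 0.5, 1.0  # surprise falls from 1 bit to 0 bits across this range
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if coin_entropy(mid) > mean_surprise_bits:
            low = mid   # still too surprising, so the coin must be more biased
        else:
            high = mid
    return (low + high) / 2.0


p_equivalent = equivalent_coin(0.5)
```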

Conclusion

Our hypothesis is that we can increase the number of mutual matches by showing you matches you are likely to select, who are also likely to select you (wow, what a crazy idea!). We estimate these likelihoods by using each potential match pair's incoming_swipe_rate and outgoing_swipe_rate, which we update in real time by subscribing to swiping events in our matching cluster. We'll let you know if our hypothesis was right when we get the results back from our A/B test.

If working on problems like these interests you, then you should definitely take a look at our openings.

Written by Tom Jacques, VP of Engineering, OkCupid
Data analysis by Tom Jacques, Brendon Scheinman, Zach Jablons, Brenton McMenamin