OkCupid prides itself on its data analyses. While we are famous for sharing insights on how people date, the processes behind these insights transcend the dating category, and we hope you will be able to apply them to your own work! In this post, we will explore how we measure novelty effects and test null hypotheses in our data, and how both shape our decision-making process.
The Novelty Effect
We are constantly running experiments to improve the experience for our members. However, no two members are the same. We don’t adhere to a one-size-fits-all approach, especially when it comes to love and connecting on a deeper level. Our app is unique in that love requires reciprocity and mutual interaction. Because of the nature of these interdependent connections, we have to invent new approaches for analyzing some of our experiments. Even so, we’ve seen cases where an experiment that shows no detectable difference in the data still yields compelling results.
Let’s take a recent UI change as an example: we updated the colors of our rate card.
The design was more modern and, on the surface, seemed to be performing at parity with our old design. So, ship it... right?! Actually, NO!
As a rule, we also break down the results of every update by new and existing members. This helps us evaluate the novelty effect (when a product change causes a spike in behavior due to “newness” rather than usefulness). When we broke down the design change this way, we found that the appeal of the new design was driving a significant lift in purchases among existing members. The story was different for new members: they were purchasing far less, bringing the overall results down! On the surface, it looked like we could ship the new design, but with an easy cut of the data, we learned that the novelty effect was falsely inflating results.
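To make the segment cut concrete, here is a minimal sketch of how such a breakdown could be computed. It assumes purchase conversion is compared with a pooled two-proportion z-test, which is our choice for illustration and not necessarily the test OkCupid runs. The counts in `SEGMENTS` and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (absolute lift of b over a, two-sided p-value) from a pooled z-test."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value


# Hypothetical (purchases, members) counts, invented for illustration only.
SEGMENTS = {
    "existing": {"control": (400, 10_000), "test": (480, 10_000)},
    "new": {"control": (400, 10_000), "test": (320, 10_000)},
}


def analyze_by_segment(segments):
    """Run the test per segment, then once more on the pooled totals."""
    results = {}
    totals = {"control": [0, 0], "test": [0, 0]}
    for name, groups in segments.items():
        results[name] = two_proportion_z_test(*groups["control"], *groups["test"])
        for variant, (conversions, members) in groups.items():
            totals[variant][0] += conversions
            totals[variant][1] += members
    results["overall"] = two_proportion_z_test(*totals["control"], *totals["test"])
    return results


if __name__ == "__main__":
    for name, (lift, p_value) in analyze_by_segment(SEGMENTS).items():
        print(f"{name:>8}: lift={lift:+.4f}, p={p_value:.4f}")
```

With these made-up numbers the overall comparison shows no difference at all, while each segment on its own shows a significant move in opposite directions. That is the pattern a single top-line number would hide.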
With some quick pivoting, we shipped a second version with a new color scheme. It took only a few hours of dev time. We then restarted the experiment, excluding members who had been exposed in the first run (this is important to prevent contaminating our data set with biased users). This time we drove significant lift in our KPIs for both new and existing members, showing that the second design was better for the product overall.
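The exclusion step can be sketched in a few lines. This is an illustrative sketch only: the post does not describe how OkCupid's experiment system assigns members, so the hash-based `assign_variant` helper, the experiment name, and the member IDs below are all assumptions.

```python
import hashlib


def eligible_members(candidates, previously_exposed):
    """Drop anyone who already saw a variant in the first run."""
    exposed = set(previously_exposed)
    return [member for member in candidates if member not in exposed]


def assign_variant(member_id, experiment_name):
    """Deterministically bucket a member by hashing the experiment name and ID.

    Hypothetical scheme: including the experiment name means a restarted
    experiment reshuffles members instead of reusing the old split.
    """
    key = f"{experiment_name}:{member_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    return "test" if int(digest, 16) % 2 == 0 else "control"


if __name__ == "__main__":
    first_run_exposed = [2, 4]  # hypothetical member IDs
    for member in eligible_members([1, 2, 3, 4, 5], first_run_exposed):
        print(member, assign_variant(member, "rate_card_colors_v2"))
```

Filtering before assignment keeps the second run's control and test groups free of anyone whose behavior might already have been shaped by the first design.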
Disproving the Null Hypothesis
Another case where the data can seem a bit backwards is “defensive” experiments. For some experiments we hope to see no significant changes in our KPIs between control and test groups. For example, to eliminate tech debt, we rewrote one of our web pages to pull information via GraphQL. Of course we QA’ed before launching and wrote extensive unit tests to verify performance, but we don’t truly know how it will perform at scale across different browsers and devices. At OkCupid, we can use our experiments system for that!
We call this sort of experiment “defensive”. As in all experiments, we collect data to try to disprove the null hypothesis. But since we hope there will be no change between the new GraphQL page and our legacy page, in this case we actually hope we are unable to disprove the null hypothesis.
How do we do this? We need to perform a power analysis to determine how much data we’d need to reliably disprove the null hypothesis if a difference at least as large as our minimum detectable effect size really existed, at a significance threshold we choose. This means that once we have collected that much data, we don’t want the difference between the two groups to be statistically significant - and we can’t call the experiment until we reach that power level.
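As a rough illustration of what a power analysis produces, the sketch below uses the standard normal-approximation formula for comparing two proportions to estimate the sample size needed per group. The baseline rate, minimum detectable effect, significance level, and power in the example are hypothetical inputs, not OkCupid's actual settings.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline_rate, min_detectable_effect, alpha=0.05, power=0.8):
    """Approximate members needed per group for a two-sided two-proportion test.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / min_detectable_effect ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: 4% baseline conversion, detect a 0.5 point shift.
    print(sample_size_per_group(0.04, 0.005))
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why choosing that threshold up front matters so much for how long a defensive experiment has to run.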
In this case, the experiment saw minimal shifts in behavior across the two groups, and those shifts were still not statistically significant once we had reached the sample size our power analysis called for. Which is to say, it was a success! Sometimes no detectable change is good news. Now we feel secure in how messages on OkCupid populate with data from GraphQL.
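One way to read the result of a defensive experiment, sketched below, is to compute a confidence interval for the difference between the groups and check that it sits entirely inside the margin set by the minimum detectable effect. This interval-based check is a related technique we are swapping in for illustration, not necessarily the comparison described above, and all counts and the margin are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for the difference in conversion rates (b minus a)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


def within_margin(interval, margin):
    """True when the whole interval lies strictly inside (-margin, +margin)."""
    low, high = interval
    return -margin < low and high < margin


if __name__ == "__main__":
    # Hypothetical counts for the legacy page (a) and the GraphQL page (b).
    interval = difference_interval(1200, 30_000, 1215, 30_000)
    print(interval, within_margin(interval, margin=0.005))
```

If the interval pokes outside the margin, the data cannot yet rule out a change large enough to care about, so the experiment is not ready to call.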
Whether it’s checking for a novelty effect or testing the null hypothesis, OkCupid has a robust experiment system that can be used in many ways to test and measure the impact our coders have on our members. We are always iterating to make it easier for our members to find love. And while we have a clear vision of how we do that, sometimes the data won’t tell us the whole story on its own.