Vespa vs. Elasticsearch for matching millions of people

Eddie Ng
Published in OkCupid Tech Blog · 19 min read · Sep 29, 2020

As a dating application, an integral part of the experience is in recommending to you great potential matches based on the myriad of preferences you and your potential matches have set. As you can imagine, there are many incentives to optimize this part of the experience as it is the very first step everyone starts at before getting to a match, a conversation, and beyond.

Your set preferences, however, aren’t the only factors in how we recommend potential matches to you (or recommend you to others). If we simply showed all the users that met your criteria without any sort of ranking, the end result would be far fewer matches. For example, if we didn’t incorporate a user’s recent activity into the results, there would be a much higher chance that you’d spend your time interacting with someone who hasn’t used the app recently. That certainly doesn’t set users up for success! Beyond the preferences you and others set, we leverage numerous algorithms and factors to recommend the users we think you should see.

When serving recommendations we need to serve the best results at that point in time and allow you to continuously see more recommendations as you like or pass on your potential matches. In other apps where the content itself changes infrequently or such timeliness is less critical, this could be done through offline systems, regenerating those recommendations every so often. For example, Spotify’s “Discover Weekly” feature gives you a set of recommended tracks, but that set is frozen until the next week. In the case of OkCupid, we allow users to endlessly view their recommendations in real time. The “content” that we recommend — our users — is highly dynamic in nature (a user can join, change their preferences, update their profile details or location, or deactivate at any time), and those changes affect to whom and how they should be recommended, so we want to make sure that the potential matches you see are some of the best recommendations available at that point in time.

To tap into the various ranking algorithms while being able to continuously serve recommendations in real-time, we need to make use of a search engine that is constantly kept up-to-date with user data and provides the capability to filter and rank potential candidates.

The problems with our existing matching system

OkCupid has been utilizing a custom in-house matching system for years. We won’t go into full detail on that matching system, but at a high level, imagine a map-reduce framework over shards of the user space, with each shard holding some portion of relevant user data in memory that is used in processing various filters and sorts on the fly. Searches fan out to all shards and ultimately the results are merged to return the top k candidates. This custom-built matching system has served the team well, so why did we decide to change it now?

To support various recommendations-based projects over the coming years as the team grows, we knew we needed to revamp this system. One of the biggest pain points was in development as schema updates like adding a new piece of data about a user (e.g. a user’s preferred gender tags) required hundreds to thousands of lines of boilerplate code and deployment required careful coordination to ensure all parts of the system were deployed in the right order. Simply trying to add a new way to filter the user set or to add a new way to rank results required half a day of an engineer’s time to manually deploy to every shard in production and remain apprised of issues that might come up; rollbacks weren’t much faster. More importantly, it was becoming difficult to operate and scale the system since shards and replicas were manually allocated and distributed across a fleet of bare metal machines. Early in 2019 as load on the match system increased, we needed to increase search capacity so we added another replica set by manually placing service instances across multiple machines — a multi-week effort between the backend and operations teams. At this time we also started to notice performance bottlenecks in the in-house built service discovery system, message queue, etc. While these components had previously served the company well, we were reaching a point in load at which we were uncertain whether any one of these subsystems themselves could scale. We have goals to move more of our workload into a cloud environment and shifting the matching system, itself a laborious task, would also require bringing along all of these other subsystem components.

Today at OkCupid many of these subsystems are served by more robust OSS cloud-friendly options, and over the last two years the team has adopted various technologies to great success. We won’t talk about those efforts in this blog post; instead we’ll focus on the efforts we’ve taken to address the issues above en masse by moving to a more developer-friendly and scalable search engine for our recommendations: Vespa.

It’s a match! Why OkCupid matched with Vespa

Historically OkCupid has been a small team, and we knew early on that tackling the core of a search engine ourselves would be extremely difficult and complicated, so we looked at open source options that could support our use cases. The two big contenders were Elasticsearch and Vespa.

Elasticsearch

This is a popular option with a large community, documentation, and support. There are numerous features and it’s even used by Tinder. In terms of development experience, one can add new schema fields with PUT mappings, queries can be done through structured REST calls, there is some support for query-time ranking, the ability to write custom plugins, etc. When it comes to scaling and maintenance, one only needs to determine the number of shards and the system handles the distribution of replicas for you. Scaling beyond the configured shard count, however, requires reindexing into a new index with a higher shard count.
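
For illustration, adding a new field to an existing Elasticsearch index is a single mapping update (the index and field names here are hypothetical):

PUT /users/_mapping
{
    "properties": {
        "preferredGenderTags": {
            "type": "keyword"
        }
    }
}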

One of the biggest reasons why we opted out of Elasticsearch was the lack of true in-memory partial updates. This is very important for our use case because the documents we would be indexing, our users, would need to be updated very frequently through liking/passing, messaging, etc. These documents are highly dynamic in nature, compared to content like ads or images which are mostly static objects with attributes that change infrequently, so the inefficient read-write cycles on updates were a major performance concern for us.

Vespa

This was open sourced only a few years ago and claimed to support storing, searching, ranking, and organizing big data at user serving time. Vespa supports:

  • high feed performance through true in-memory partial updates, without the need to re-index the entire document (reportedly up to 40–50k updates per second per node)
  • a flexible ranking framework that allows processing at query time
  • direct integration with machine-learned models (e.g. TensorFlow) in ranking
  • expressive queries via YQL (Yahoo Query Language) in REST calls
  • the ability to customize logic via Java components

When it comes to scaling and maintenance, you never think about shards anymore — you configure the layout of your content nodes and Vespa automatically handles splitting your document set into buckets, replicating, and distributing the data. Furthermore, data is automatically recovered and redistributed from replicas whenever you add or remove nodes. Scaling simply means updating the configuration to add nodes and letting Vespa automatically redistribute this data live.

Overall Vespa appeared to support our use cases the best. OkCupid incorporates a lot of different information about users to help them find the best matches — in terms of just filters and sorts there are over 100 of each! We’ll always be adding more filters and sorts, so being able to support that workflow was important. When it came to writes and queries, Vespa was the most analogous to our existing matching system; that is, our matching system also required handling rapid in-memory partial updates and real-time processing at query time for ranking. Vespa also had a much more flexible and straightforward ranking framework; the ability to express queries in YQL as opposed to the awkward structure for Elasticsearch queries was just another nice bonus. When it came to scaling and maintenance, Vespa’s automatic data distribution capabilities were highly appealing to our relatively small team size. All in all it appeared that Vespa would provide us a better shot at supporting our use cases and performance requirements, while being easier to maintain when compared to Elasticsearch.

Elasticsearch is more widely known, and we could have learned from Tinder’s usage of it, but either option would require a ton of upfront research and investigation. Vespa has been serving many production use cases, like Zedge, Flickr serving billions of images, and the Yahoo Gemini Ads platform handling more than one hundred thousand requests per second to serve ads to one billion monthly active users. That gave us confidence that it was a battle-tested, performant, and reliable option — in fact, Vespa’s origins go back further than Elasticsearch’s.

Additionally, the Vespa team has been very involved and helpful. Vespa was originally built for serving ads and content pages, and as far as we know it had not yet been used for a dating platform. Our initial usage of Vespa struggled because ours was such a unique use case, but the Vespa team has been super responsive and quickly optimized the system to resolve the few issues that came up.

How Vespa works and what a search looks like at OkCupid

Before we dive into our Vespa use case, here’s a quick overview of how Vespa works. Vespa is a collection of numerous services, but each Docker container can be configured to fulfill the role of an admin/config node, a stateless Java container node, and/or a stateful C++ content node. An application package containing configuration, components, ML models, etc. can be deployed via the deploy API to the config cluster, which handles applying changes to the container and content clusters. Feed requests and queries all go through the stateless Java container (which allows customized processing) via HTTP, before feed updates land in the content cluster or queries fan out to the content layer where the distributed query executions happen. For the most part, deploying a new application package takes only a few seconds, and Vespa handles making those changes live in the container and content clusters so that you rarely have to restart anything.
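
To make that concrete, a minimal application package at the time looked something like this (a sketch only; the exact layout depends on the Vespa version, and the component directory is optional):

match-app/
├── services.xml            # cluster topology: container and content clusters
├── hosts.xml               # maps host aliases to machines
├── searchdefinitions/
│   └── user.sd             # the user schema, like the one shown below
└── components/
    └── match-searcher.jar  # bundle with custom Java/Kotlin Searchers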

What does a search look like?

The documents that we maintain in the Vespa cluster contain a myriad of attributes about a given user. The schema definition defines the fields of a document type as well as rank profiles that contain a collection of applicable ranking expressions. Suppose we have a schema definition representing a user like so:

search user {

    document user {

        field userId type long {
            indexing: summary | attribute
            attribute: fast-search
            rank: filter
        }

        field latLong type position {
            indexing: attribute
        }

        # UNIX timestamp
        field lastOnline type long {
            indexing: attribute
            attribute: fast-search
        }

        # Contains the users that this user document has liked
        # and the corresponding weights are UNIX timestamps when that like happened
        field likedUserSet type weightedset<long> {
            indexing: attribute
            attribute: fast-search
        }

    }

    rank-profile myRankProfile inherits default {
        rank-properties {
            query(lastOnlineWeight): 0
            query(incomingLikeWeight): 0
        }

        function lastOnlineScore() {
            expression: query(lastOnlineWeight) * freshness(lastOnline)
        }

        function incomingLikeTimestamp() {
            expression: rawScore(likedUserSet)
        }

        function hasLikedMe() {
            expression: if (incomingLikeTimestamp > 0, 1, 0)
        }

        function incomingLikeScore() {
            expression: query(incomingLikeWeight) * hasLikedMe
        }

        first-phase {
            expression {
                lastOnlineScore + incomingLikeScore
            }
        }

        summary-features {
            lastOnlineScore
            incomingLikeScore
        }
    }

}

The indexing: attribute designation indicates that these fields should be maintained in memory, which gives us the best write and read performance on these fields.
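
This is also what makes partial updates cheap: updating a single attribute field mutates the in-memory value without re-indexing the rest of the document. For example, a partial update bumping a user’s lastOnline timestamp could look roughly like this (the namespace and document id are hypothetical), sent as an HTTP PUT to /document/v1/okcupid/user/docid/777:

{
    "fields": {
        "lastOnline": {
            "assign": 1592486978
        }
    }
}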

Suppose we populated the cluster with such user documents. We could then filter and rank on any of the fields above. For example, we could make a POST request to the default search handler http://localhost:8080/search/ to find users (excluding our own user 777) within 50 miles of our location who have been online since timestamp 1592486978, ranked by most recent activity and keeping the top two candidates. Let's also select summaryfeatures to help us see the contribution of each ranking expression in our rank profile:

{
    "yql": "select userId, summaryfeatures from user where lastOnline > 1592486978 and !(userId contains \"777\") limit 2;",
    "ranking": {
        "profile": "myRankProfile",
        "features": {
            "query(lastOnlineWeight)": "50"
        }
    },
    "pos": {
        "radius": "50mi",
        "ll": "N40o44'22;W74o0'2",
        "attribute": "latLong"
    },
    "presentation": {
        "summary": "default"
    }
}

We might get a result like:

{
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 317
        },
        "coverage": {
            "coverage": 100,
            "documents": 958,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:user/0/bde9bd654f1d5ae17fd9abc3",
                "relevance": 48.99315843621399,
                "source": "user",
                "fields": {
                    "userId": -5800469520557156329,
                    "summaryfeatures": {
                        "rankingExpression(incomingLikeScore)": 0.0,
                        "rankingExpression(lastOnlineScore)": 48.99315843621399,
                        "vespa.summaryFeatures.cached": 0.0
                    }
                }
            },
            {
                "id": "index:user/0/e8aa37df0832905c3fa1dbbd",
                "relevance": 48.99041280864198,
                "source": "user",
                "fields": {
                    "userId": 6888497210242094612,
                    "summaryfeatures": {
                        "rankingExpression(incomingLikeScore)": 0.0,
                        "rankingExpression(lastOnlineScore)": 48.99041280864198,
                        "vespa.summaryFeatures.cached": 0.0
                    }
                }
            }
        ]
    }
}

After filtering, the first-phase ranking expressions are evaluated to rank the matched hits. The relevance returned is the overall score produced by the first-phase expression of the rank-profile we've specified in our query, i.e. ranking.profile myRankProfile. In the list of ranking.features we've specified the feature query(lastOnlineWeight) as 50, which is then referenced in the only ranking expression that contributes here: lastOnlineScore. That utilizes the built-in rank feature freshness, a number close to 1 if the timestamp in the attribute is recent compared to the current timestamp. For the top hit above, the relevance of roughly 48.99 is just 50 × freshness(lastOnline) ≈ 50 × 0.98. So far so good, nothing too tricky.

Unlike static content, our users can take actions that influence whether they should be shown to you. For example, they can like you! That's why we index the weighted set field likedUserSet on each user document, which holds as keys the user ids that this user has liked and as values the UNIX timestamps of when those likes happened. Filtering on users that have liked you would then be straightforward (e.g. adding a likedUserSet contains \"777\" clause to the YQL), but how can we incorporate that weighted set information during ranking? How might we boost a user that has liked our user?

In the previous results the ranking expression incomingLikeScore was 0 for both hits. User 6888497210242094612 has actually liked user 777, but that fact isn't currently accessible in ranking, even if we had provided "query(incomingLikeWeight)": 50. We can make use of the rank() function in the YQL (the first, and only the first, argument of rank() determines whether a document is a match, but all arguments are used for calculating the rank score) and then use a dotProduct as a ranking argument to store and retrieve raw scores (in this case, the timestamp when the user liked us), like so:

{
    "yql": "select userId, summaryfeatures from user where !(userId contains \"777\") and rank(lastOnline > 1592486978, dotProduct(likedUserSet, {\"777\":1})) limit 2;",
    "ranking": {
        "profile": "myRankProfile",
        "features": {
            "query(lastOnlineWeight)": "50",
            "query(incomingLikeWeight)": "50"
        }
    },
    "pos": {
        "radius": "50mi",
        "ll": "N40o44'22;W74o0'2",
        "attribute": "latLong"
    },
    "presentation": {
        "summary": "default"
    }
}

This time we get a result like:

{
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 317
        },
        "coverage": {
            "coverage": 100,
            "documents": 958,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:user/0/e8aa37df0832905c3fa1dbbd",
                "relevance": 98.97595807613169,
                "source": "user",
                "fields": {
                    "userId": 6888497210242094612,
                    "summaryfeatures": {
                        "rankingExpression(incomingLikeScore)": 50.0,
                        "rankingExpression(lastOnlineScore)": 48.97595807613169,
                        "vespa.summaryFeatures.cached": 0.0
                    }
                }
            },
            {
                "id": "index:user/0/bde9bd654f1d5ae17fd9abc3",
                "relevance": 48.9787037037037,
                "source": "user",
                "fields": {
                    "userId": -5800469520557156329,
                    "summaryfeatures": {
                        "rankingExpression(incomingLikeScore)": 0.0,
                        "rankingExpression(lastOnlineScore)": 48.9787037037037,
                        "vespa.summaryFeatures.cached": 0.0
                    }
                }
            }
        ]
    }
}

Now user 6888497210242094612 has been boosted to the top, as they've liked our user and their incomingLikeScore nets the full value. Of course, we actually have the timestamp of when they liked us, so we could utilize it in more sophisticated expressions, but we'll keep it simple for now.
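
As one sketch of such an expression, the like boost could decay as the like ages, using the built-in now rank feature (the one-week decay constant here is made up for illustration):

function incomingLikeScore() {
    # Exponentially decay the boost based on how long ago the like happened
    expression: query(incomingLikeWeight) * hasLikedMe * exp(-(now - incomingLikeTimestamp) / (86400 * 7))
}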

This demonstrates the mechanics of how to filter and rank results via the ranking framework. The ranking framework provides a flexible way to apply ranking expressions (which are mostly just math) on hits at query-time.

Customizing the middleware Java layer

What if we wanted to support a different path and to make that dotProduct clause implicitly part of every query? That’s where the customizable Java container layer comes in — we can write a custom Searcher component. This lets us handle arbitrary parameters, rewrite the query, and process the results in certain ways. Here’s an example in Kotlin:

import com.yahoo.component.chain.dependencies.After
import com.yahoo.prelude.query.DotProductItem
import com.yahoo.search.Query
import com.yahoo.search.Result
import com.yahoo.search.Searcher
import com.yahoo.search.searchchain.Execution
import com.yahoo.search.searchchain.PhaseNames

@After(PhaseNames.TRANSFORMED_QUERY)
class MatchSearcher : Searcher() {

    companion object {
        // HTTP query parameter
        const val USERID_QUERY_PARAM = "userid"

        const val ATTRIBUTE_FIELD_LIKED_USER_SET = "likedUserSet"
    }

    override fun search(query: Query, execution: Execution): Result {
        val userId = query.properties().getString(USERID_QUERY_PARAM)?.toLong()

        // Add the dotProduct clause as a ranking-only argument of the rank() item
        // (getRankItem() is a small helper of ours that finds or creates the
        // top-level RankItem of the query tree)
        if (userId != null) {
            val rankItem = query.model.queryTree.getRankItem()
            val likedUserSetClause = DotProductItem(ATTRIBUTE_FIELD_LIKED_USER_SET)
            likedUserSetClause.addToken(userId, 1)
            rankItem.addItem(likedUserSetClause)
        }

        // Trace the rewritten query, then execute it
        query.trace("YQL after is: ${query.yqlRepresentation()}", 2)
        return execution.search(query)
    }
}

Then in our services.xml file we can configure this component as follows:

...
<search>
    <chain id="default" inherits="vespa">
        <searcher id="com.okcupid.match.MatchSearcher" bundle="match-searcher"/>
    </chain>
</search>
<handler id="default" bundle="match-searcher">
    <binding>http://*:8080/match</binding>
</handler>
...

We then simply build and deploy the application package and now when we make a query to the custom handler http://localhost:8080/match?userid=777:

{
    "yql": "select userId, summaryfeatures from user where !(userId contains \"777\") and rank(lastOnline > 1592486978) limit 2;",
    "ranking": {
        "profile": "myRankProfile",
        "features": {
            "query(lastOnlineWeight)": "50",
            "query(incomingLikeWeight)": "50"
        }
    },
    "pos": {
        "radius": "50mi",
        "ll": "N40o44'22;W74o0'2",
        "attribute": "latLong"
    },
    "presentation": {
        "summary": "default"
    }
}

We get back the same results as before! Note that in the Kotlin code example we added a trace to print out the YQL representation after we modified the query, so if we set tracelevel=2 in the URL params, the response also shows:

...
{
    "message": "YQL after is: select userId, summaryfeatures from user where (rank(lastOnline > 1592486978, dotProduct(likedUserSet, {\"777\": 1})) AND !(userId contains \"777\")) limit 2;"
},
...

The middleware Java container layer is a powerful way to add custom logic handling via Searchers, or to customize the rendering of results via Renderers. We customize our Searcher to handle cases like the above and other aspects that we want to make implicit in our searches. For example, one product concept we support is the idea of “mutual fit”: you may be searching for users with certain preference criteria (like age range and distance), but you also have to fit the candidates’ search criteria. To support such a use case, our Searcher component might fetch the searching user’s document to supply some of their attributes in a subsequent fan-out query for filtering and ranking. The ranking framework and the custom middleware layer together provide a flexible way for us to support our many use cases. We’ve only covered a few aspects in these examples, but there is extensive documentation available on the Vespa site.
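
As a rough sketch only (the document lookup and the preference field names below are hypothetical, not our actual implementation), such a Searcher might look like:

import com.yahoo.prelude.query.AndItem
import com.yahoo.prelude.query.IntItem
import com.yahoo.search.Query
import com.yahoo.search.Result
import com.yahoo.search.Searcher
import com.yahoo.search.searchchain.Execution

// Hypothetical projection of the searching user's own document
data class UserDoc(val age: Int)

// Hypothetical helper, e.g. a GET against /document/v1 or an internal cache
fun fetchUserDocument(userId: Long): UserDoc = TODO("document lookup")

class MutualFitSearcher : Searcher() {
    override fun search(query: Query, execution: Execution): Result {
        val userId = query.properties().getString("userid")?.toLong()
            ?: return execution.search(query)

        val me = fetchUserDocument(userId)

        // Mutual fit: candidates' own preference ranges must also accept us.
        // prefAgeMin/prefAgeMax are illustrative attribute fields.
        val mutualFit = AndItem().apply {
            addItem(query.model.queryTree.root)
            addItem(IntItem("[;${me.age}]", "prefAgeMin")) // candidate's min age <= our age
            addItem(IntItem("[${me.age};]", "prefAgeMax")) // candidate's max age >= our age
        }
        query.model.queryTree.root = mutualFit

        return execution.search(query)
    }
}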

How it went building out and productionizing a Vespa cluster

In the spring of 2019 we started hashing out plans to build out this new system. At this time we also reached out to the Vespa team and regularly consulted them on our use cases. Our operations team estimated and built out an initial cluster setup and the backend team began documenting, designing, and prototyping various use cases in Vespa.

Early prototyping phases

At OkCupid the backend systems are written in Golang and C++. In order to write custom logic components in Vespa, as well as to ensure high feed rates by using the Java Vespa HTTP feed client API, we had to get somewhat familiar with a JVM environment — we ended up utilizing Kotlin for customizing Vespa components and in our feeding pipelines.

From there, it was lots of porting over years of application logic and uncovering what was possible in Vespa, consulting the Vespa team as necessary. Much of our matching system logic is in C++, so we also added logic to translate our current data model of filters and sorts into the equivalent YQL queries, which we issue via REST to the Vespa cluster. Early on we also made sure to build out a good pipeline for repopulating the cluster with the full user base of documents; prototyping involved many changes to determine the right field types to utilize, which inadvertently required refeeding documents.

Monitoring and Load Testing

As we built out our Vespa search cluster, we needed to make sure of two things: that it could handle the anticipated search and write traffic, and that the recommendations served by this system were comparable in quality to those of the existing matching system.

Before load tests, we added Prometheus metrics everywhere. Vespa-exporter provides a ton of stats, and Vespa itself also exposes a small set of additional metrics. From this we created various Grafana dashboards around queries per second, latencies, resource usage by Vespa processes, etc. We also ran vespa-fbench to test query performance and, with the help of the Vespa team, determined that due to a relatively high static query cost, a grouped layout would provide us higher throughput. In a flat layout, adding more nodes would mainly only cut down on the dynamic query cost (i.e. the portion of the query cost that depends on the number of documents indexed). In a grouped layout, each configured group of nodes contains the full document set, so a single group can serve a query on its own. Because of our high static query cost, while keeping our node count the same, we increased our throughput much more by going from a flat layout of effectively one group to three groups. Lastly, we also performed live shadow traffic testing after we had gained confidence in the static benchmarks.
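
For reference, a grouped layout is purely configuration. A sketch of a content cluster with three groups of one node each (the host aliases are hypothetical) might look like:

<content id="user" version="1.0">
    <redundancy>3</redundancy>
    <documents>
        <document type="user" mode="index"/>
    </documents>
    <group>
        <!-- one copy of every bucket per group, so any single group can serve a query -->
        <distribution partitions="1|1|*"/>
        <group name="group0" distribution-key="0">
            <node hostalias="node0" distribution-key="0"/>
        </group>
        <group name="group1" distribution-key="1">
            <node hostalias="node1" distribution-key="1"/>
        </group>
        <group name="group2" distribution-key="2">
            <node hostalias="node2" distribution-key="2"/>
        </group>
    </group>
</content>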

Performance optimization

One of the biggest hurdles we faced early on, however, was feed performance. At first we had trouble handling even 1,000 QPS of updates. We had heavily utilized weighted set fields, but these weren’t performant initially. Luckily the Vespa team promptly helped fix these issues, as well as others around data distribution. The Vespa team has since added extensive documentation on feed sizing, much of which we employ to some extent: using integer fields in large weighted sets when possible, allowing batching by setting visibility-delay, issuing few conditional updates and having those that remain rely on attribute (i.e. in-memory) fields, and reducing client round trips by compacting and merging operations in our feed pipelines. Now the pipelines comfortably handle 3k QPS at steady state, and our modest cluster has been observed to handle 11k QPS of updates when there is a backlog of operations for whatever reason.
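
To illustrate the conditional update point: in the JSON feed format, a condition that only references attribute fields keeps the check in memory. A hedged example (the namespace, id, and condition values are hypothetical):

{
    "update": "id:okcupid:user::777",
    "condition": "user.lastOnline < 1592486978",
    "fields": {
        "lastOnline": {
            "assign": 1592486978
        }
    }
}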

Recommendation quality

After we were confident that the cluster could handle the load, we needed to validate that the quality of the recommendations was just as good, if not better, than that of the existing system. It wasn’t possible to perfectly replicate all existing behavior, yet any minor deviation in how the ranking was implemented could have outsized effects on recommendation quality and the overall ecosystem. For this we applied our experiment system: some test groups got their recommendations through Vespa while the control group continued to utilize the existing matching system. We analyzed several defensive business metrics, iterating and fixing issues until results from the Vespa group were as good as, if not better than, results in the control group. Once we were confident in the results served by Vespa, we simply had to route recommendation queries to the Vespa cluster. We were able to swap all search traffic to the Vespa cluster without a hitch!

System Diagram

In the end, a simplified architecture overview of the new system looks like this:

[System architecture diagram]

How Vespa is doing now and what’s next

Let’s compare the matching system now backed by Vespa against our legacy system:

Schema Updates

  • Before: a calendar week spent on hundreds of lines of code changes, and a carefully coordinated deployment with multiple subsystems
  • After: a couple of hours to add a simple field to the schema definition and deploy the application package (see the sketch below)
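
For instance, adding the preferred gender tags field mentioned earlier would be, roughly, just this addition to the user schema (the field name and type are illustrative), followed by a deploy:

field preferredGenderTags type array<string> {
    indexing: summary | attribute
    attribute: fast-search
}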

Adding a new sort

  • Before: half a day spent on deployment
  • After: ranking expressions are also just an update to the schema definition and can be deployed to the live system, taking effect in only a few seconds!

Scaling and maintenance

  • Before: a multi-week effort to manually distribute shards and place production service run files to achieve high availability
  • After: just add the new node to the configuration file and Vespa automatically distributes data to meet desired redundancy levels. The bulk of our operations don’t require any manual intervention or restarting of any stateful nodes

Overall the development and maintenance aspect of the Vespa cluster has been a boon for much of OkCupid’s product roadmap. Since the end of January 2020, we’ve productionized our Vespa cluster and serve all of our recommendations through it. We’ve also added dozens of new fields, ranking expressions, and use cases that support major product releases this year like Stacks. And unlike our previous matching system, we are now using machine learning models live at query time.

What’s next?

For us one of the biggest selling points of Vespa is its direct support for ranking with tensors and integration with models trained with frameworks like TensorFlow. This capability is one of the major features we hope to continue to leverage in the coming months. We’re already making use of tensors for certain use cases and we’re excited to soon look into integrating more machine learning models that we hope will better predict outcomes and match our users.

Additionally, Vespa recently released support for high-dimensional approximate nearest neighbor indexes that are fully real time, concurrently searchable, and dynamically updatable. We look forward to exploring other use cases with real-time nearest neighbor searches.

OkCupid x Vespa. Ship it!

Many people have heard of or worked with Elasticsearch, but there is not as big a community around Vespa. We believe there are many other applications that have been built with Elasticsearch that would be better served with Vespa. Vespa has been a great match for OkCupid’s use cases and we’re happy that we made that investment. This new architecture has allowed us to move faster and deliver new features much more rapidly. We’re a relatively small team so it’s also great to not have to worry so much about operational complexities. Now we are much more poised to horizontally scale our search capacity. We certainly would not have been able to make the progress we’ve achieved in the last year without Vespa. For more Vespa information and technical capabilities, be sure to check out E-commerce search and recommendation with Vespa AI by @jobergum.

We made the first move in liking and sending the Vespa team a message. They messaged us back and it was a match! We could not have done this without the help of the Vespa team. Special thanks to @jobergum and @geirst for providing guidance on querying and ranking and a super special shoutout to @kkraune and @vekterli for all their support. The level of support and effort the team provided us was truly awesome, from digging deeply into our use case to diagnosing performance issues to making enhancements in the Vespa engine in short time. @vekterli even flew out to our office in NYC to work directly with us for a week to make sure our integration and use cases could be met. Thank you so much to the team at Vespa!

In closing, we’ve only touched upon a few aspects about our use of Vespa but none of this would have been possible without the tremendous work by our backend and operations teams over the last year. There have been numerous unique challenges we’ve faced in bridging the gap between our existing systems and a more modern technology stack but those are blog posts for another time.

If you’re interested in the challenges the backend team at OkCupid is working on, we’re hiring!

Originally published at https://tech.okcupid.com on September 29, 2020.
