Some time over the summer, we began thinking about converting OkCupid's desktop webapp to a more SPA-style experience. "Why not?" we asked. "It's strange we hadn't started doing this earlier!" was probably something someone else said.
Maybe a day later, I responded "Oh."
In order to handle the code splitting and lazy loading required of any self-respecting SPA, we'd need to include content hashes in our compiled filenames.
Before everyone gets mad at me, let's back up for a second:
How Deployment Worked
Until recently, our workflow looked something like this:
- Write some changes and commit them to our frontend repository.
- Run webpack to compile them, using a hand-rolled plugin to send the compiled files to our dev server and primary repository via NFS.
- SSH into the dev server.
- Commit the compiled files to the primary repository.
- Hand-edit lines in a JSON file to handle cachebusting, one value per compiled file.
- Commit the JSON file to the primary repository.
- Deploy the compiled files.
- Deploy the JSON file separately, where a filename's new value would serve as a fresh query string when included on the page.
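To make the last two steps concrete, here is a minimal sketch of query-string cachebusting driven by a JSON file. The file contents, the `v` parameter name, the `/static` base path, and the `assetUrl` helper are all assumptions for illustration, not OkCupid's actual code.

```javascript
// Hypothetical shape of the hand-edited cachebusting file: one value per compiled file.
const cacheVersions = {
  "common.min.js": 41,
  "common.min.css": 17,
};

// Build the URL a page would include; the file's value becomes a fresh query string.
function assetUrl(baseUrl, filename, versions) {
  if (!Object.prototype.hasOwnProperty.call(versions, filename)) {
    throw new Error(`No cachebusting value for ${filename}`);
  }
  return `${baseUrl}/${filename}?v=${versions[filename]}`;
}

console.log(assetUrl("/static", "common.min.css", cacheVersions));
// prints "/static/common.min.css?v=17"

// Bumping the number by hand is what forces browsers to refetch the file.
cacheVersions["common.min.css"] += 1;
console.log(assetUrl("/static", "common.min.css", cacheVersions));
// prints "/static/common.min.css?v=18"
```

The filename never changes in this scheme, only the query string does, which is why every deploy needed a matching manual edit.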
It took me about a month to develop a complete mental model of our deploy process. If anything broke, a quick rollback often meant using git to revert to out-of-date versions of the compiled files, leaving our two repositories in a state that's stressing me out as I type this. It was time-consuming, annoying, and error-prone, but it also worked most of the time, so I'm sure you understand our dilemma.
I began casing the project for some low-hanging fruit, and found Webpack-Manifest-Plugin. By generating unique content hashes for each compiled file, the plugin could replace the portion of our process where—to reiterate—we hand-incremented numbers in a big JSON file and could take the app down on all platforms by forgetting a comma or whatever.
A JS bundle and its extracted Common.min.css are sort of the same file, says the plugin, until they're not.
We were going to need something to generate separate content hashes of the CSS files. Extract-Text-Plugin includes a feature to include them in the filenames themselves, but we weren't yet ready for that.
(Deep sigh) We were going to need a way to pull the content hashes from the filenames. Fortunately, Webpack-Manifest-Plugin contains a map method that lets us mess with any entry in the manifest however we see fit. For example, we could swap out a CSS file's JS-identical content hash with the accurate one derived from its filename.
(We're aware Webpack 4 can do this much more sensibly, but the CommonsChunk/SplitChunks schism would have collided so directly with OkCupid's usage of Webpack that upgrading fell out of scope pretty much immediately. If you don't know what I'm talking about, I am very jealous of you.)
With the corrected manifest entry in place, we were free to regex the hash out of the CSS filename. By juggling all this before sending the compiled files to our dev server, we could deploy without hand-editing any JSON.
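A rough sketch of that hash swap, written as a standalone function so it runs without Webpack. The entry shape (`name`, `path`, `hash`), the regex, and the `fixCssEntry` name are assumptions for illustration; the plugin's real entry objects may look different.

```javascript
// Matches names like "common.49313671.min.css": base name, hash, then the extension.
const HASHED_CSS = /^(.+)\.([0-9a-f]{8,})\.(min\.css|css)$/;

// Replace an entry's chunk-level hash with the one embedded in its CSS filename.
function fixCssEntry(entry) {
  const filename = entry.path.split("/").pop();
  const match = HASHED_CSS.exec(filename);
  if (!match) {
    return entry; // JS files and unhashed CSS pass through untouched
  }
  const [, base, hash, extension] = match;
  return { ...entry, name: `${base}.${extension}`, hash };
}

const cssEntry = {
  name: "common.min.css",
  path: "css/common.49313671.min.css",
  hash: "0a1b2c3d", // hypothetical hash shared with the JS file from the same chunk
};

console.log(fixCssEntry(cssEntry).hash);
// prints "49313671"
```

A function like this is the kind of thing that could be handed to the plugin's map hook, with the same caveat about the entry shape.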
Somehow, it worked. We were on our way.
With the cache now being busted programmatically, our deploys could be automated end to end. This meant we could begin work on getting the files out of version control, which would in turn allow us to include content hashes within filenames, start creating an SPA, and move on with our lives.
There was one question on everyone's mind: If the flat files weren't going to be hosted on our servers, where were they going to be hosted? A bunch of us shrugged and said "S3" at the same time, and a bucket was created soon afterwards.
One advantage of relying on S3 was the existence of S3-Plugin-Webpack, which chucks the results of a Webpack compilation into the S3 bucket of the user's choice. Hats off to its developers—setting it up was a breeze, except for when I forgot S3 isn't a real filesystem and thus totally freaks out if you try to upload to it using relative paths.
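S3 object keys are flat strings rather than filesystem paths, so `.` and `..` segments are not resolved on upload. One way to sidestep that, sketched here with a hypothetical `toS3Key` helper (not part of S3-Plugin-Webpack), is to normalize the path before using it as a key.

```javascript
// Turn a possibly relative local path into a flat S3 object key.
function toS3Key(prefix, relativePath) {
  const segments = [];
  for (const part of `${prefix}/${relativePath}`.split(/[\\/]+/)) {
    if (part === "" || part === ".") {
      continue; // drop empty and "current directory" segments
    }
    if (part === "..") {
      segments.pop(); // step up one level instead of uploading a literal ".."
      continue;
    }
    segments.push(part);
  }
  return segments.join("/");
}

console.log(toS3Key("assets", "./css/../css/common.min.css"));
// prints "assets/css/common.min.css"
```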
With that question answered, we could move on to a second one: If we weren't going to keep track of our builds in version control, where were we going to keep track of them? Unfortunately, this one took way more than two characters to answer:
How Deployment Works Now
- Same as it ever was: Write, commit, compile.
- Send the flat files and Webpack manifests to S3 instead of our dev server.
- Each Webpack build is given a version, which is a string consisting of the build's most recent git commit concatenated to a timestamp.
3a. A row in a deploy_log database table is created for that version, which corresponds to a Webpack manifest that lives in S3.
- If a version is flagged as active in the database, OkCupid will refer to its corresponding manifest.
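The steps above can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: the `commit-timestamp` format, the `DeployLog` class standing in for the deploy_log table, and the `Map` standing in for manifests stored in S3.

```javascript
// Build a version string from a commit hash and a timestamp (the format is a guess).
function makeVersion(commitHash, date) {
  return `${commitHash}-${date.getTime()}`;
}

// Stand-in for the deploy_log table: one row per version, at most one active.
class DeployLog {
  constructor() {
    this.rows = [];
  }

  record(version) {
    this.rows.push({ version, active: false });
  }

  activate(version) {
    if (!this.rows.some((row) => row.version === version)) {
      throw new Error(`Unknown version: ${version}`);
    }
    for (const row of this.rows) {
      row.active = row.version === version;
    }
  }

  getActive() {
    return this.rows.find((row) => row.active) || null;
  }
}

// Stand-in for S3: maps each version to its Webpack manifest.
const manifests = new Map();
const deployLog = new DeployLog();

const firstVersion = makeVersion("3f2a9c1", new Date(1000));
const secondVersion = makeVersion("b7e4d02", new Date(2000));
manifests.set(firstVersion, { "common.min.css": "common.11111111.min.css" });
manifests.set(secondVersion, { "common.min.css": "common.49313671.min.css" });
deployLog.record(firstVersion);
deployLog.record(secondVersion);

deployLog.activate(secondVersion);
console.log(manifests.get(deployLog.getActive().version)["common.min.css"]);
// prints "common.49313671.min.css"

// A rollback is just flipping the flag back; no files move and nothing is reverted in git.
deployLog.activate(firstVersion);
console.log(manifests.get(deployLog.getActive().version)["common.min.css"]);
// prints "common.11111111.min.css"
```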
With that system in place, and a web-based dashboard for managing our deploys, we were able to include our files with the classic common.49313671.min.css naming convention.
And there we had it: A nearly one-for-one replacement for what had once been a big confusing bummer. The "nearly," however, raised enough eyebrows that an experiment was proposed: Half of our users would get assets served via S3, and half would get them the old-fashioned way. In a perfect world, the stats between those groups of users would be equal.
If you've read this far, you probably know both of these facts already, but: The world's not perfect and extra HTTP requests make a difference. Fetching the Webpack manifest from S3 introduced one such request, which increased average load times by enough milliseconds to tank our overall number of votes/swipes by about 1.5%. Not sustainable.
If fetching the manifests via HTTP wasn't going to be an option, the next easiest solution seemed to be storing the hundred-or-so lines of JSON alongside the version in the database and fetching it that way. "Why not?" we asked. "It's strange we hadn't started doing this earlier!" was probably something someone else said.
About five minutes after I deployed that change, I received a tap on my desk from Erwan, our systems architect. After pointing at his monitor, where various graphs were taking off like fireworks, he explained that if I didn't roll back whatever I'd just done within about twenty minutes, the database would start to become unresponsive.
I rolled it back.
As it turned out, the backend service for our deploy log implemented caching on a per-method basis. The method we'd been using, called get_active_deploy, returned the version but no additional information. In order to fetch that deploy's manifest, a second method was required. On every page view. On every platform.
You see where I'm going with this. In my frustration over the experiment not working, I'd shot from the hip and increased the volume of OkCupid database calls by about 30% in less than ten minutes.
With that facet of my idiocy understood, and some incredible last-minute help from our backend engineer Josh, a new method was created, complete with a more comprehensive response and all the caching we could ever ask for. We ran the experiment back, and this time it succeeded (read: nothing happened). Flat files were removed from the repository, our deploy process's potential for human error was dramatically reduced, and everything was fine.
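For anyone who wants the caching problem in miniature, here is a toy simulation. It assumes the second method had no cache configured, and every name in it apart from get_active_deploy (rendered here as `getActiveDeploy`) is made up, including the combined `getActiveDeployWithManifest`. The cache also never expires, which a real one would.

```javascript
// Stand-in for the database: counts how many queries actually reach it.
function makeDatabase() {
  const db = { queries: 0 };
  db.activeVersion = () => {
    db.queries += 1;
    return "3f2a9c1-1000";
  };
  db.manifestFor = (version) => {
    db.queries += 1;
    return { version, "common.min.css": "common.49313671.min.css" };
  };
  return db;
}

// Per-method cache: each wrapped method remembers its results by argument list.
function cached(fn) {
  const store = new Map();
  return (...args) => {
    const key = JSON.stringify(args);
    if (!store.has(key)) {
      store.set(key, fn(...args));
    }
    return store.get(key);
  };
}

// First attempt: the version lookup is cached, the manifest lookup is not.
function twoMethods(db) {
  const getActiveDeploy = cached(() => db.activeVersion());
  const getManifest = (version) => db.manifestFor(version);
  return () => getManifest(getActiveDeploy());
}

// The fix: one cached method that returns the version and manifest together.
function oneMethod(db) {
  const getActiveDeployWithManifest = cached(() => {
    const version = db.activeVersion();
    return { version, manifest: db.manifestFor(version) };
  });
  return () => getActiveDeployWithManifest().manifest;
}

// Simulate a number of page views and report how many queries hit the database.
function countQueries(buildService, pageViews) {
  const db = makeDatabase();
  const loadManifest = buildService(db);
  for (let i = 0; i < pageViews; i += 1) {
    loadManifest();
  }
  return db.queries;
}

console.log(countQueries(twoMethods, 1000)); // prints 1001
console.log(countQueries(oneMethod, 1000)); // prints 2
```

The first number grows with traffic and the second doesn't, which is roughly the difference between graphs taking off like fireworks and nothing happening.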
Much of OkCupid's technical debt is a result of having been slightly ahead of our time—decisions were made to hand-roll solutions to problems that would be solved in a more widely-adopted fashion soon afterwards. The extent to which we'd made our own bed meant nobody had ever had these infrastructure problems, much less solved them. Getting used to a byzantine and awkward system can make it difficult to imagine how things might work outside of it, and the decision to dedicate some precious developer time to only potentially R&D'ing a solution wasn't made lightly. A few months later, though, the project had succeeded, and our web team's quality of life had improved significantly.
Shortly after the month it took me to complete my mental model of our build process, I was assigned a ticket called something like "Don't Check Compiled Files In." After about a year and a half, I'd completed it. (The ticket had gotten pretty well buried during that time, though, and after about five minutes I gave up on trying to close it.)
If nothing else, I hope this post motivates some folks to play Jenga with some legacy code. If this turns out to be bad advice, feel free to hassle me about it on Twitter, or—even better—come work here and clown on me in person.