For years now, the web community has lauded the benefits of the Single Page Application (SPA) architecture. Go to virtually any web conference or popular tech blog, and you're bound to encounter a plethora of discussions about them. It makes sense why we're so into SPAs: waiting to load and initialize a full page of HTML, JavaScript, and stylesheets every time we navigate to a different page of a website is slow; loading small bits of JavaScript on demand is fast. SPAs also let us avoid that dreaded flash of blank screen between page loads, and show more user-friendly loading screens instead. This makes SPAs better suited than traditional architectures for connectivity-scarce environments like mobile web, or even desktop web in certain countries. If optimized correctly, even the very first page load can be fairly lean; and I won't even get into the well-known financial incentives for delivering a fast browsing experience. The short version is: gotta go fast.

However, what many SPA discussions don’t take into account is just how arduous a process it can be to migrate to a single page architecture.

This is especially true when you’re dealing with a mature application with millions of users and heaps of poorly understood code. It’s understandable why this isn't talked about much: many of the most vocal tech companies fall into one of two categories. Either they're relatively new startups with proportionally little tech debt, or they're established FAANG-type companies with virtually unlimited resources to tackle tech debt.

At OkCupid, we’ve been around since 2004, with a fairly small (~30 people) engineering team powering things throughout. Our web codebase predates Ruby on Rails, and you might be surprised (or appropriately horrified) to learn that our tech stack relies on a home-grown PHP-like framework for server-side rendering that we call Pub. Until very recently, our product was a patchwork of Pub, vanilla JS, jQuery, CoffeeScript, React, Reflux, Redux, and more, largely relying on server-side hydrated state in place of proper APIs. Additionally, there was a myriad of possibly dead code paths and conditionally injected third-party scripts of questionable usefulness. So, when we talk about legacy code and tech debt, this is what we’re bringing to the table.

And as many of you know, it’s often difficult to make a business justification for a ground-up rewrite of a product. So how do you take a web application older than some JavaScript programmers and bring it into the modern era? Well, we’ve actually been making significant progress towards this goal for a while now. The fact that React can easily render into an existing app was crucial to helping us start adopting it, as early as mid-2016—and today our product is thankfully almost entirely React-powered. I’m also happy to report that all of our product pages are API-driven (though we’re not quite at the GraphQL-powered utopia we’re striving towards—more on that later). Both of these were slow, gradual transitions that we've been working at for years, and that made our eventual move to a SPA significantly easier.

But despite all this great progress, until very recently we were still relying on Pub to generate each of our HTML pages on the fly with every request. This was preventing us from doing cool things, like not requiring our programmers to learn a made-up language (very useful for recruiting). So about 6 months ago I started a project to move us off of Pub. We began this transition with our mobile web product, and brought those learnings to desktop web 3 months later. The task list for both platforms was broad but fairly straightforward:

  • Generate an HTML file on compile to serve our JavaScript bundles.
  • Ensure a fast (enough) first paint.
  • Use code splitting to avoid ending up with a bloated bundle.
  • Identify better abstractions for boilerplate logic like third-party scripts.
  • Update legacy vanilla JavaScript tools that rely on direct DOM manipulation so they work with React.
  • Eliminate any unused code I find along the way.

In the process of this massive migration, I made a few observations and designed a couple of abstractions that I hope you'll find useful, both for cleaning up old code and for writing new code.

Tracking Down Dead Code

So let's start addressing those bullet points in the most logical way: alphabetically. Eliminating unused code was a huge goal when migrating off of our legacy infrastructure. Partially this was idealistic (yay deleting code!), partially this was pragmatic (yay less code to translate!). But an incredibly annoying challenge was discerning between legacy code that was critical to keeping our site working and code that could safely be deleted because it hadn't been relevant since the days when OkCupid hosted journals and forums (yes, we used to do that; no, I can't tell you why we thought it was a good idea). There was, however, one very basic tool in our tool belt that proved invaluable in figuring out what code was running and what wasn't: analytics!

Sweet, sweet silence.

There was plenty of code I stumbled upon that my engineering forebears had had the foresight to add analytics events to. When I found these, I'd check whether they had fired within the last year or so, and at what volume, and make a judgement call as to whether the offending code could be excised. This honestly made me want to add an analytics event to every single React component I ever write. Someone talk me out of it.
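To make that judgement call a little more repeatable, the check can be reduced to a small function. This is a hypothetical sketch, not OkCupid's tooling: it assumes you can export the timestamps at which a given analytics event fired. The `isDeletionCandidate` name and its `windowDays` and `minCount` thresholds are illustrative defaults to tune per event.

```javascript
// Hypothetical helper: decide whether a legacy code path looks dead, based on
// the timestamps (ms since epoch) at which its analytics event fired.
const DAY_MS = 24 * 60 * 60 * 1000;

function isDeletionCandidate(eventTimestamps, { now = Date.now(), windowDays = 365, minCount = 1 } = {}) {
  const cutoff = now - windowDays * DAY_MS;
  // Count only the events that fired inside the look-back window.
  const recent = eventTimestamps.filter((t) => t >= cutoff).length;
  // Too few recent hits means the code path is worth a closer look.
  return recent < minCount;
}

// Example: an event last seen about two years ago, checked against a one-year window.
const now = Date.UTC(2020, 0, 1);
const twoYearsAgo = now - 730 * DAY_MS;
console.log(isDeletionCandidate([twoYearsAgo], { now })); // true
console.log(isDeletionCandidate([now - DAY_MS], { now })); // false
```

A zero count only means the event didn't fire, so it is worth confirming the event was actually wired up before deleting the code behind it.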

If only everything were as simple as firing an analytic event!

Optimizing First Paint

Another difficult aspect of migrating to a single page app was that much of our web application depended on certain data being guaranteed to be available at all times within our React app. While this was easy to ensure with server-side data hydration, it was much more challenging with a static HTML template. Getting rid of all of these dependencies would have been extremely tedious and difficult to QA, so we had to develop strategies to work around the problem. In the end, we settled on something that looked like this:

// index.js - our main application entry point
import React from "react";
import ReactDOM from "react-dom";

import API from "./api";
import Helpers from "./helpers";
import App from "./app";
import AppError from "./app_error";

const root = document.getElementById("root");

// Load the absolute necessities.
API.loadCriticalData()
    .then(() => Helpers.initializeGlobals())
    .then(() => ReactDOM.render(<App />, root))
    .catch((error) => ReactDOM.render(<AppError error={error} />, root));

This first loads the data that's absolutely critical to our application. Then it makes that data available to the rest of the application by initializing some global libraries. Finally, it renders the application to the DOM. If anything in this process fails, we fall back to an error state instead.
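Because the startup sequence in index.js is just a promise chain, its ordering and error handling can be checked without React or a browser. In the sketch below, `bootstrap` is a hypothetical wrapper. Its injected functions stand in for `API.loadCriticalData`, `Helpers.initializeGlobals`, and the two `ReactDOM.render` calls. Unlike the original, it passes the loaded data to `initializeGlobals` explicitly.

```javascript
// Illustrative version of the index.js startup chain with injectable steps.
function bootstrap({ loadCriticalData, initializeGlobals, render, renderError }) {
  return loadCriticalData()
    .then((data) => initializeGlobals(data))
    .then(() => render())
    // Any rejection or throw above (network failure, bad data, an error during
    // init or a synchronous error in render) ends up here, so the user sees
    // an error state instead of a frozen app shell.
    .catch((error) => renderError(error));
}

// Usage with stub steps that record the order they ran in.
const calls = [];
bootstrap({
  loadCriticalData: () => Promise.resolve({ user: "alice" }),
  initializeGlobals: (data) => { calls.push(`init:${data.user}`); },
  render: () => { calls.push("render"); },
  renderError: (error) => { calls.push(`error:${error.message}`); },
}).then(() => console.log(calls)); // [ 'init:alice', 'render' ]
```

Because the `catch` sits at the end of the chain, a failure at any step skips the remaining steps and goes straight to the error state, which matches the behavior of the original index.js.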

There's one significant downside to this approach, though: what gets rendered while we wait for the data to load? We wouldn't want to show the user a blank screen while this happens; even if it only happens once per session, a blank screen is never ideal. To combat this, we populate our template file with an app shell:

// index.html
<html>
    <head>
        <title>OkCupid</title>
    </head>
    <body>
        <div id="root">
            <div class="some-loading-state">
            	<div class="some-flashy-loader"></div>
            </div>
            <noscript>
            	Turn on JavaScript, jerk!
            </noscript>
        </div>
    </body>
</html>

React will eventually render into the "root" div and replace everything inside of it. Until that happens, we can put anything we'd like within that div, including an app shell to show the user in the meantime, or a noscript element with UI for users who have JavaScript turned off. This ensures that while the JavaScript is loading over the wire, parsing, compiling, and then loading its requisite data, the user isn't stuck in front of a blank screen. Learn more about JavaScript Startup Performance here.

Code Splitting

Another critical aspect of ensuring a fast first paint is making sure our JavaScript bundle doesn't balloon in size as our Single Page App adds more and more routes. Fortunately, React now offers some great out-of-the-box tools to prevent this from happening: namely, React.lazy and React.Suspense.

If you're not familiar with React.lazy, it's a great addition that was released fairly recently in react@16.6.0, and it allows us to dynamically load any component on demand. No longer do you need a third-party library to code split based on the current route; instead, you can do something like this:

// Routes.jsx
import React from "react";
import { Switch, Route } from "react-router-dom";

import Loading from "./Loading"; // Some loading state UI.

const Home = React.lazy(() => import("pages/home"));
const Login = React.lazy(() => import("pages/login"));
const Signup = React.lazy(() => import("pages/signup"));

const Routes = () => (
    <React.Suspense fallback={<Loading />}>
        <Switch>
            <Route exact path="/" component={Home} />
            <Route exact path="/login" component={Login} />
            <Route exact path="/signup" component={Signup} />
        </Switch>
    </React.Suspense>
);

Here I'm using React Router's Switch and Route components to render a component at a given path. They're completely agnostic to the lazy loading logic happening around them, and you can use any other routing library, or none at all, in their place.

The Switch component decides which Route to render based on the current path. When it finds a match, that Route renders the component passed to its component prop. What the router doesn't know is that each of these components is loaded dynamically, only when it's about to be rendered. React.Suspense holds off on rendering until the component has loaded, and renders whatever you pass to its fallback prop in the meantime. This means that adding more routes to our SPA comes at little to no cost to the main bundle!
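This is cheap because React.lazy doesn't call its `import()` factory until the component first renders, and it reuses the result afterwards. The plain-JavaScript sketch below imitates that behavior outside React to show why unvisited routes add nothing to the initial download. `lazy`, `matchRoute`, and the stub loaders are hypothetical stand-ins; a real app would pass something like `() => import("pages/home")`. This is not how React implements lazy internally.

```javascript
// Hypothetical stand-in for React.lazy: defer calling `factory` until first use,
// then reuse the same promise so each chunk is only requested once.
function lazy(factory) {
  let promise = null;
  return () => {
    if (promise === null) promise = factory();
    return promise;
  };
}

// Like <Switch>, pick the first route whose path matches exactly.
function matchRoute(routes, pathname) {
  return routes.find((route) => route.path === pathname) || null;
}

// Stub loaders record which "chunks" were fetched.
const fetched = [];
const stub = (name) => () => {
  fetched.push(name);
  return Promise.resolve({ default: name });
};

const routes = [
  { path: "/", load: lazy(stub("home")) },
  { path: "/login", load: lazy(stub("login")) },
  { path: "/signup", load: lazy(stub("signup")) },
];

const route = matchRoute(routes, "/login");
route.load();
route.load(); // the second call reuses the cached promise
console.log(fetched); // [ 'login' ]
```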

In the future, React will likely let us use React.Suspense to pause rendering while data is being fetched (Relay, Facebook's GraphQL framework for React, already implements this) or while any other asynchronous operation completes. But it's already stable enough to use for code splitting in production today, as we do! Learn more about code splitting with React here.

Third Party Scripts with useScript

I won't pretend I know what Google Analytics is doing here. But, really?

One of the most common things I found myself doing during the SPA migration was pasting in third-party scripts, almost all of which looked something like this:

<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-XXXXX-Y', 'auto');
ga('send', 'pageview');
</script>
<!-- End Google Analytics -->

These usually come pre-minified as above, but the gist of literally every single one is the same: add a script tag to the page, load some third-party JavaScript from a CDN, and run some related code.

But we only wanted to load some of these scripts in certain cases (e.g. only for logged-in users, only for users who see ads, only for users with better connectivity), and a static HTML template leaves little room for that kind of nuance. Fortunately, React Hooks turned out to be a great tool for this! We ended up with something that looked like this:
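Whatever ends up loading the scripts, the decision about which scripts a user needs can live in one plain function that is easy to read and test. Everything in this sketch is hypothetical: the user fields (`isLoggedIn`, `hasAds`), the example.com URLs, and the rules themselves. In a browser, `connectionType` could come from `navigator.connection.effectiveType` (Network Information API). Not every browser supports that API, so the sketch takes the value as an argument.

```javascript
// Hypothetical rules: each entry names a script and the condition that gates it.
const SCRIPT_RULES = [
  {
    url: "https://www.google-analytics.com/analytics.js",
    shouldLoad: () => true, // every user
  },
  {
    url: "https://ads.example.com/sdk.js", // placeholder URL
    shouldLoad: ({ user }) => user.isLoggedIn && user.hasAds,
  },
  {
    url: "https://widgets.example.com/chat.js", // placeholder URL
    // Skip heavier extras unless the connection looks fast
    // (effectiveType is one of "slow-2g", "2g", "3g", "4g").
    shouldLoad: ({ connectionType }) => connectionType === "4g",
  },
];

// Return the URLs of every script whose rule passes for this context.
function selectScripts(context) {
  return SCRIPT_RULES.filter((rule) => rule.shouldLoad(context)).map((rule) => rule.url);
}

console.log(selectScripts({ user: { isLoggedIn: true, hasAds: false }, connectionType: "3g" }));
// [ 'https://www.google-analytics.com/analytics.js' ]
```

Each returned URL could then be handed to a script-loading hook, keeping the "who gets what" policy out of the components themselves.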

// useScript.js
import { useRef, useEffect } from "react";

/**
* Dynamically load a JavaScript file.
* @param {string} url - the URL of the script to load.
* @param {object} [options={}] - any overrides for the script element.
*   Pass a stable object (e.g. from useRef or useMemo); a new object on every
*   render re-runs the effect on every render.
*/
function useScript(url, options = {}) {
    // Create the script element once; useRef keeps the first element, and
    // elements created on later renders are simply discarded.
    const scriptRef = useRef(document.createElement("script"));
    useEffect(() => {
        // Basic setup.
        const script = scriptRef.current;
        script.type = "text/javascript";
        script.src = url;

        // Advanced modification, and/or subscribing to events.
        Object.keys(options)
            .forEach((key) => { script[key] = options[key]; });

        // Add to the body if necessary.
        if (!document.body.contains(script)) {
            document.body.appendChild(script);
        }
    }, [url, options]);
}

export default useScript;
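As written, useScript gives each caller its own script element, so two components that need the same SDK would each add a script tag for it. A common alternative keeps one promise per URL. It is sketched below and is not part of OkCupid's code. To keep it runnable outside a browser, the DOM work is injected as a hypothetical `appendScript(url, onload, onerror)` callback, and a browser version is shown in a comment.

```javascript
// Sketch: load each script URL at most once and share the result.
// In a browser, appendScript might be:
//   (url, onload, onerror) => {
//     const script = document.createElement("script");
//     script.src = url;
//     script.onload = onload;
//     script.onerror = onerror;
//     document.body.appendChild(script);
//   }
function createScriptLoader(appendScript) {
  const cache = new Map(); // url -> Promise<void>

  return function loadScript(url) {
    if (!cache.has(url)) {
      const promise = new Promise((resolve, reject) => {
        appendScript(url, resolve, () => reject(new Error(`Failed to load ${url}`)));
      });
      // Forget failures so that a later call can retry.
      promise.catch(() => cache.delete(url));
      cache.set(url, promise);
    }
    return cache.get(url);
  };
}

// Fake appendScript for Node: counts requests and "loads" asynchronously.
let requests = 0;
const loadScript = createScriptLoader((url, onload) => {
  requests += 1;
  setTimeout(onload, 0);
});

Promise.all([loadScript("/a.js"), loadScript("/a.js")]).then(() => {
  console.log(requests); // 1
});
```

Inside a hook, a component could call `loadScript(url).then(...)` in an effect and flip a `hasLoaded` state flag, much like the onload option in the Google Analytics hook below.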

Because these scripts often require a bit more setup, my usual approach was to wrap them like so:

// useGoogleAnalytics.js
import { useState, useEffect, useRef } from "react";
import useScript from "./useScript";

const GOOGLE_ANALYTICS_SDK_URL = "//www.google-analytics.com/analytics.js";

/**
 * Load and use the Google Analytics SDK.
 * @param {object} location - the current user location.
 */
function useGoogleAnalytics(location = window.location) {
    // Track load state.
    const [hasLoaded, setHasLoaded] = useState(false);

    // Load the SDK. useRef keeps the options object stable across renders,
    // so useScript's effect doesn't re-run on every render.
    const sdkOptions = useRef({ onload: () => setHasLoaded(true) });
    useScript(GOOGLE_ANALYTICS_SDK_URL, sdkOptions.current);

    // Initialize Google Analytics.
    useEffect(() => {
        // Exit if the SDK hasn't loaded yet.
        if (!hasLoaded) {
            return;
        }

        window.ga("create", "UA-XXXXX-Y", "auto");
    }, [hasLoaded]);

    // Track page views.
    const path = location.pathname;
    useEffect(() => {
        // Exit if the SDK hasn't loaded yet.
        if (!hasLoaded) {
            return;
        }

        // In a SPA the URL changes without a full page load, so update the
        // tracker's page field before sending the hit.
        window.ga("set", "page", path);
        window.ga("send", "pageview");
    }, [path, hasLoaded]);
}

export default useGoogleAnalytics;

That way, using one of them was as simple as this:

// Page.jsx
import React from "react";
import { useLocation } from "react-router-dom";
import useGoogleAnalytics from "./useGoogleAnalytics";

function BasicPage() {
    const location = useLocation();
    useGoogleAnalytics(location);

    return (
        <div>
            <header />
            <main />
            <footer />
        </div>
    );
}

export default BasicPage;

Note: I pass location as a parameter here, instead of placing the useLocation call within the hook. Frustratingly, I discovered pretty early on that React Router’s hooks throw a runtime error if you try to use them outside of a Router context, which proved troublesome for interoperability with our non-SPA pages. By passing location as a parameter instead of calling useLocation directly in the hook, we’re able to more easily reuse it across both the SPA and non-SPA code.

This approach to loading scripts scales well to any number of helper scripts, while remaining easy to read for developers scanning through code! To learn more about React Hooks, see my earlier blog post on the subject here.

Migrating Vanilla JavaScript Tools

Another problem we ran into was how to migrate some of our legacy UI tools that relied on a combination of vanilla JavaScript and manual DOM manipulation. Usually we’d have some tool that looked something like this:

// legacy_popover.js
const LegacyPopover = {
    init({ theme }) {
        const dom = document.getElementById("popover-dom");

        // some fun dom manipulation.
        this.applyTheme(dom, theme);
    },

    applyTheme(dom, theme) {
        /* additional logic here */
    },
};

And within our Pub template, we'd have some code that looked like this:

// index.html
<div id="popover-dom"></div>
<script>
    LegacyPopover.init({
        // ↓↓↓ magically available ↓↓↓ user data! Yay 🤦
        theme: "%{user.preferences.theme}"
    });
</script>

While there wasn’t a one-size-fits-all solution for migrating things like this, I did find an approach that worked well in 95% of cases, with some tweaks here and there. Again, the goal was usually to modify as little of the original code as possible, so as to reduce the surface area for potential bugs.

There are two approaches I could take with something like this: I could programmatically create any DOM the legacy utility required via document.createElement, or I could pass it a reference to a DOM node obtained elsewhere. I’d usually decide based on how complex the required DOM was. In this case, I’ll opt for the reference approach. The result looks something like this:

// legacy_popover.js
const LegacyPopover = {
    init({ dom, theme }) {
        // some fun dom manipulation.
        this.applyTheme(dom, theme);
    },

    applyTheme(dom, theme) {
        /* additional logic here */
    },
};

export default LegacyPopover;

So where does the dom node come from? React, of course!

// Popover.jsx
import React, { useRef, useEffect } from "react";
import get from "lodash/get";
import gql from "graphql-tag";
import { useQuery } from "@apollo/react-hooks";

import LegacyPopover from "./legacy_popover";

// Apollo Query to get requisite user data.
const THEME_QUERY = gql`
    query getUserTheme($userid: String!) {
        user(id: $userid) {
            id
            preferences {
                theme
            }
        }
    }
`;

const Popover = React.memo(({ userid }) => {
    // Load the user theme preference; query variables go under `variables`.
    const { data } = useQuery(THEME_QUERY, { variables: { userid } });
    const theme = get(data, "user.preferences.theme");

    // Keep a reference to the DOM node.
    const domRef = useRef(null);
    useEffect(() => {
        // Early exit if we're not ready.
        if (!domRef.current || !theme) {
            return;
        }

        // Initialize the LegacyPopover.
        const dom = domRef.current;
        LegacyPopover.init({ dom, theme });
    }, [theme, domRef]);

    return (
        <div id="popover-dom" ref={domRef} />
    );
});

export default Popover;

There are a few things to notice here. First, the legacy tool (imported as LegacyPopover) doesn’t get initialized until both the data it needs has loaded and the DOM node has rendered. Second, because of the way useEffect works, the initialization should only happen once per mount of the component (unless the theme changes). And finally, I memoize the component to avoid unnecessary re-renders that could mess with the DOM.

There were a few occasions where the legacy utility I was migrating handled some navigation logic. Within the SPA, it would be preferable to navigate using React Router instead of via anchor tags, as these don't give us our nice transitions. To accomplish this, I would pass the legacy utility a goToUrl function as a parameter, which would either navigate via React Router's history.push within the SPA or by manually setting window.location in non-SPA environments.
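That goToUrl parameter can be built with a small factory. In this sketch, `createGoToUrl` is a hypothetical name. Inside the SPA it would receive React Router's `history` object, whose `push` method is real. Elsewhere it falls back to a full page load by assigning `location.href`. Both dependencies are parameters so the function can be exercised without a browser.

```javascript
// Hypothetical factory for the goToUrl callback handed to legacy utilities.
// Inside the SPA, pass React Router's history; on non-SPA pages, pass null.
function createGoToUrl(history, location = globalThis.location) {
  return function goToUrl(url) {
    if (history) {
      history.push(url); // client-side transition, no full reload
    } else {
      location.href = url; // full page load for non-SPA pages
    }
  };
}

// Usage with stand-ins for history and window.location.
const pushed = [];
const fakeHistory = { push: (url) => pushed.push(url) };
createGoToUrl(fakeHistory)("/profile");

const fakeLocation = { href: "/" };
createGoToUrl(null, fakeLocation)("/login");

console.log(pushed, fakeLocation.href); // [ '/profile' ] /login
```

The legacy utility only ever calls `goToUrl(url)`, so it stays unaware of which environment it is running in.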

What’s Next?

While most of the OkCupid website is now being served through our new Single Page App architecture, there’s still a lot of work to be done, and a lot of bundle optimizations to explore now that we’ve migrated. Among these, we'd like to move to using GraphQL across most of our pages, so that we can share a normalized entity store across our product (i.e. sharing cached data across pages). Additionally, we’re already thinking ahead to Progressive Web App technologies like Service Workers for long-term local caching of resources, and continuing to optimize our product for more connectivity-scarce environments. We're confident this new architecture will allow us to move faster and deliver new features more regularly to our users. As for me personally, I’m looking forward to going back and deleting many, many lines of legacy code.
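A normalized store keeps exactly one copy of each entity, keyed by its type and ID. That way, a query on one page and a different query on another page read and update the same record. The toy `normalize` function below only illustrates that idea. It is hypothetical, skips arrays, and ignores the many details that Apollo Client's InMemoryCache handles, although that cache does, by default, identify objects by their `__typename` and `id`.

```javascript
// Toy normalized cache: every object with __typename and id is stored once,
// and parents hold a reference ({ __ref: key }) instead of a copy.
function normalize(value, store) {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return value; // primitives (and, in this toy, arrays) are kept as-is
  }
  const fields = {};
  for (const [key, child] of Object.entries(value)) {
    fields[key] = normalize(child, store);
  }
  if (value.__typename && value.id) {
    const key = `${value.__typename}:${value.id}`;
    // Merge so that fields fetched by different queries accumulate.
    store[key] = { ...store[key], ...fields };
    return { __ref: key };
  }
  return fields;
}

const store = {};
// Two "pages" fetch overlapping data about the same user.
normalize({ __typename: "User", id: "42", preferences: { theme: "dark" } }, store);
normalize({ __typename: "User", id: "42", displayName: "Sam" }, store);

console.log(Object.keys(store)); // [ 'User:42' ]
console.log(store["User:42"].displayName, store["User:42"].preferences.theme); // Sam dark
```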

Interested in the challenges the web team at OkCupid is working on? We're hiring!