Engineering Smart && Building Dumb: Building an Android Thin-Client at OkCupid
Blog post adapted from my talk at the New York Android Developers Meetup sponsored by OkCupid (We're hiring!)
“Thin-client? There hasn’t been a thin-client here for thirty years…”, cue spooky music. I’ll find a way to work that joke into nearly any situation, so consider its use here gratuitous. That being said, conceptually, it’s true. As mobile developers, we often hardcode everything from layouts to machine learning right onto the device. While this approach works for many situations, there are times when you’re going to want your app to be a bit more flexible. What I’m going to show you is a high-level walk-through of how I built a complex feature here at OkCupid, and along the way I’ll demonstrate a few design patterns that you can employ in your own app to create remotely configurable layouts and behavior on-the-fly.
Putting the Dumb in Thin — What is a thin-client?
A thin-client by any other name would be just as dumb. Thin-clients, also known as dumb clients, are typically computers optimized for remoting into a server, which in turn will do most of the work of display and processing. I use the term here to describe an app that takes its visual and behavioral cues from the server, allowing it to be configured remotely on-the-fly.
Thin-client apps are only concerned with two things.
When taking the Thin™ approach to building your app, the app itself is only concerned with two things: displaying the data and receiving the input. With regard to the first, I’ll show you a strategy for deserializing JSON into views. As for the second, we’ll cover a simple way to declare behavior in that same JSON so that you can inform your app what it should be doing with user interactions like clicks.
Case Study: Discovery
Face. Swipe. Face. Swipe. Face. Swipe. — Swiping as a means of exposing users and facilitating matches is a common design pattern, almost immediately understandable for anyone who has ever found themselves alone again on a Friday night Tindering with no foreseeable plan beyond binge watching an entire season of The Great British Baking Show. A few months ago, our product team got together and envisioned a new way for users to interact, one that could leverage the insights and data they’ve given us to create a more engaging experience.
The result is Discovery, a page full of modules that users can interact with to uncover great potential matches. Modules on the page feature everything from Instagram photos to annotated questions and answers. The greatest thing about Discovery? For the user, the sheer amount of dynamic information. The worst thing about Discovery? For the developer, the sheer amount of dynamic information. But with only a couple of months to take this feature from idea to production, it was time to get down to brass tacks and figure out how to architect it all.
OkCupid Discovery — Matches curated for you.
Cutting Out the Extra Calories — Dealing with Data.
When looking at Discovery, the first thing that one might be tempted to think is: look at the massive number of business models we’ll need to create. We’re showing a dynamic page of data, so we can assume that we’ll be throwing it all into a RecyclerView. But what is it? Questions, Users, Messages, MatchInfo, Interests, Albums. Data class after data class ad absurdum. Let’s take a pause. This is where Thinking Thin™ comes into play. Coupling your business logic so tightly with the display is almost certainly unnecessary and at times can make refactoring an absolute nightmare. Sure, you call them users today, but what if tomorrow your marketing department comes up with some other crazy term, like bagels? Now you’ve gotta go back and change all those references even though what you’re showing on the screen is effectively the exact same content.
Tip 1: Think Layouts, Not Users
Here’s a useful design pattern. Rather than having entity-centric models, why not have business-agnostic layout-based models? Models that describe the items on the page literally rather than semantically. Discovery doesn’t need to care that this is Erin, a 96% match and this is a picture of her face.
Our view only cares that there is a picture and two text fields that need to be filled out. Rather than receiving the entire User object from the server and parsing it on the client, we teach the server to only give us exactly as much information as we need to display on the screen. Think layouts, not users.
With this philosophy in mind, we created a system of naming that works for our team. The names of our data classes, while verbose, accurately describe what is being shown. Given that we’re building up a catalog of components that are conceptually shared between our design and engineering teams, it’s nice to inspect a spec at a glance and know exactly which layout we need to use. Names like DetailedLayout_New can only take you so far, so we stayed away from them entirely. We chose to name our data classes and fields using a descriptive taxonomy of positional attributes like top_text or bottom_border. Use whatever works for you. All of our layout models share a common base class of LayoutData which you can see defined at the top of the file, but we’ll come back to what’s happening there shortly.
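To make the idea concrete, here is a minimal sketch of what layout-based models could look like. Only LayoutData and the layout names PictureTitleSubtitle and AvatarThreeText come from this post; the field names and the layoutType property are assumptions for illustration, not our production classes.

```kotlin
// Hypothetical layout models: the fields describe positions on screen,
// not business entities like users or questions.
sealed class LayoutData {
    // Assumed discriminator naming which layout this payload fills in.
    abstract val layoutType: String
}

data class PictureTitleSubtitle(
    val pictureUrl: String,
    val titleText: String,
    val subtitleText: String
) : LayoutData() {
    override val layoutType: String = "PictureTitleSubtitle"
}

data class AvatarThreeText(
    val avatarUrl: String,
    val topText: String,
    val middleText: String,
    val bottomText: String
) : LayoutData() {
    override val layoutType: String = "AvatarThreeText"
}
```

Nothing in PictureTitleSubtitle knows whether it holds a match or an ad. The same class can carry "Erin" and "96% match" today and a sponsored message tomorrow.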
I find that when someone’s taking time to do something right in the present, they’re a perfectionist with no ability to prioritize, whereas when someone took time to do something right in the past, they’re a master artisan of great foresight. -xkcd
This reusable view data system allows the server to put whatever type of information it wants into the containers, and the app itself simply acts as a conduit for that information, receiving the raw models. We then inject these models into our viewmodels and bind them to our views using Android Databinding.
If we should decide that, tomorrow, we actually want to use that layout for displaying ads or some other manner of information — we wouldn’t require an APK update to do so. Rather we’d just flip the switches on the server. Sound like magic? Damn near is. Let’s talk briefly about the way the server works.
Dumb Client, Smart Server
Building a thin app is not entirely magic and it’s not going to save you or your team from writing complex code at some stage. The idea here is that we’ve moved a greater portion of the processing and business logic over to the server than we typically see in standard app development. When working on a real product with a multi-platform team, duplicated logic often (read: inevitably) means more bugs. Years ago, we introduced the concept of extended genders and orientations to the OkCupid platform. Now, while an entirely worthy and noble pursuit, the combinatorial explosion of all those options and their oft-complex matching logic made the code a petri dish for all types of bugs. Over a year, we played whack-a-mole on all three of our platforms (Android, iOS, Desktop) plugging the bugs. In the end, we decided that the best thing to do was have the server handle the heavy-lifting while the clients handled the display. Thus was born the dumb-client mantra: “Don’t do on the client what the server can do.”
I can’t stress enough the exercise of common sense here, not every single thing need be handled on the server — this isn’t 1970, we’re not sending clicks over the wire to the mainframe. We’re just ensuring that most of our business logic comes from the server itself so that it need not be written in duplicate.
Tip 2: Layouts as a Love Language
Building a solid API is as much a matter of technical functionality as it is about UX. Sitting down together as consumers and developers to create sustainable endpoints with sensible request and response parameters is imperative. But, as with all code, requirements will change over time. Endpoints are no less subject to refactors than the rest of your code. No two ways about it, versioning an API sucks.
We created a smart system for the Discovery endpoint. The client application sends an enumeration of the layouts it can support (e.g. PictureTitleSubtitle, AvatarThreeText, TwoImageOneText) — the server then prioritizes by layout and fills in the data.
As an example, if you have an old build that only shipped with support for the PictureTitleSubtitle layout, we’ll put your future date’s information into that more simplistic view. However, if you have our latest build, we can set up the server to prioritize sending future bae’s data in a supported visually richer layout. This way we can easily support many clients from different releases without forcing users to upgrade and creating a potential loss.
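The negotiation can be sketched in a few lines. This is written in Kotlin purely for illustration: the selection really happens on the server, and the function name and the priority list are assumptions, not our actual implementation.

```kotlin
// Hypothetical server-side helper: pick the richest layout the client says it supports.
// `priority` is ordered from most to least preferred; both collections hold layout names.
// Returns null when the client supports none of the candidate layouts.
fun chooseLayout(supportedByClient: Set<String>, priority: List<String>): String? =
    priority.firstOrNull { it in supportedByClient }
```

With a priority of AvatarThreeText then PictureTitleSubtitle, an old build that only advertises PictureTitleSubtitle gets the simpler layout, while a newer build that advertises both gets AvatarThreeText.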
Tip 3: Polymorphic Deserialization is Super()
Now we have a solid system for telling the server what kind of layouts we can support, but what happens when it sends us back a list of those fulfilled layouts? Our Retrofit network call only knows that it’s requesting LayoutData (this is just what we call our layout data class, you can choose whatever makes your heart content), but it doesn’t know the specific subtypes. JSON doesn’t have a concept of classes and hierarchy, so how do we teach our app to parse the JSON into a list of the correct subtypes? Polymorphic type handling.
An actual example of our LayoutData JSON
In practice, you specify a field on the JSON object as the discriminator and the deserializer will check that field and parse the whole object into the class specified there through reflection.
Gson handles polymorphic deserialization using a class called RuntimeTypeAdapterFactory. Once you register all the subtypes that you want to deserialize your JSON into, the adapter will do the rest of the work to deliver you a covariant list of your data. Most deserializers have a concept of this, from Gson to Jackson (I believe there’s a PR out for support in Moshi as well; if not, it’d be trivial to code it up yourself. Open source ftw.) They all effectively work the same way.
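To show the mechanism without tying it to any one library, here is a hand-rolled sketch of discriminator-based parsing. It assumes the JSON has already been parsed into plain maps, and the "type" key and field names are made up for illustration. In a real app you would let your deserializer do this for you.

```kotlin
sealed class LayoutData
data class PictureTitleSubtitle(val pictureUrl: String, val title: String, val subtitle: String) : LayoutData()
data class TwoImageOneText(val leftImageUrl: String, val rightImageUrl: String, val text: String) : LayoutData()

// Look at the discriminator field and build the matching subtype.
// The casts throw if a required field is missing or has the wrong type.
fun parseLayout(json: Map<String, Any?>): LayoutData? = when (json["type"]) {
    "PictureTitleSubtitle" -> PictureTitleSubtitle(
        json["picture_url"] as String,
        json["title"] as String,
        json["subtitle"] as String
    )
    "TwoImageOneText" -> TwoImageOneText(
        json["left_image_url"] as String,
        json["right_image_url"] as String,
        json["text"] as String
    )
    else -> null // unknown discriminator
}

// Skipping unknown layouts (rather than crashing) is a choice made for this sketch.
fun parseLayouts(items: List<Map<String, Any?>>): List<LayoutData> =
    items.mapNotNull { parseLayout(it) }
```

The result is a single List<LayoutData> whose elements are the concrete subtypes, which is exactly what the rest of the pipeline wants to consume.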
Tip 4: Data Don’t Mean Sh*t If You Can’t See It — Use Epoxy
We’re sending data back and forth, things are looking great. So how do we get it on the screen? Say hello to my little fren, Epoxy from Airbnb. Epoxy is essentially a RecyclerView on steroids. You provide Epoxy with a representation of what you want to display on the screen and it will handle a lot of the heavy lifting required to get it there. For Discovery, we model a state interpreted from the LayoutData that we got in our API response and feed that to the EpoxyController which will in turn diff the state and update the RecyclerView with any changes.
Pictured: The only reason I have my sanity after three months on this project
There are a lot of benefits to be gained when using RecyclerViews in this manner, even for relatively static data. RecyclerViews lazily load the view hierarchy so layout times are inherently faster with less memory usage, and newly loaded data won’t repeatedly invalidate the view hierarchy. Usage of Epoxy itself is relatively straightforward so I won’t cover that in depth here. The main gist is that we declare to Epoxy the resource identifiers for our DataBinding layouts and it generates models for us that we can declaratively add to the TypedEpoxyController which handles the RecyclerView.
On State: A Brief Synopsis
Here’s where I’m going to take you on a little journey through the building blocks of Discovery’s MVI-like architecture. In much of the development that we do, the conceptual state of what is happening on-screen is only maintained in the view. Say for instance we are loading some data and start a spinner that will run until our data operation has completed. The state of whether or not that spinner is active is only maintained in the Spinner widget itself and not in a field of a logic class like a Presenter or ViewModel. As such, accurately recreating this state on something like a configuration change, or persisting it to storage, would be a difficult task.
In the previous section, we saw how we can pass a “state” to Epoxy and use that to construct a visual representation of our data. So what is meant by state here? It’s an often overloaded term with various definitions, but what I am referring to is a centralized representation of the current status of all the components of our application. It’s no fancier than that. In Kotlin, we can represent it as we would any other model by using a data class. The important aspect to mind here is that the properties of the state be kept immutable. There should be no setters for the properties of your state class, and the properties themselves should be reflective of any data that you would want to be able to recreate, persist or report. Abiding by this rule will enable you to have a predictable state container that is not subject to the side effects of other operations.
For Discovery, our state mostly consists of instances of our layout data derived from the deserialization of our data. There are properties contained in the state, like currentPageIndex, that may not be from the JSON itself but are still descriptive of the application’s current status.
If the state of our application is an immutable data class (disclaimer: Kotlin vals are not definitionally immutable, but we needn’t be pedantic, right?), how then are we to modify it as the user interacts with our application? We’ve done a thorough walk-through of decoding the visual component of our dumb-client, so let’s look at the beeps and boops of decoding interaction.
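A state like the one described could be sketched as follows. Only currentPageIndex is named in this post; the other properties, and the placeholder LayoutData interface, are assumptions for illustration.

```kotlin
// Placeholder for the layout models; the real ones carry the fields to display.
interface LayoutData

// Everything is a val with a default: no setters, so the only way to "change"
// the state is to build a new instance with copy().
data class DiscoveryState(
    val layouts: List<LayoutData> = emptyList(),
    val currentPageIndex: Int = 0,
    // Assumed flag: lifts the "is the spinner showing?" question out of the widget.
    val isLoading: Boolean = false
)
```

Calling copy(currentPageIndex = 1) on a state returns a new DiscoveryState and leaves the original untouched, which is what makes the container predictable.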
Defining and Decoding InterACTION
Let’s suspend disbelief and allow ourselves to get hypothetical for a moment. If someone asked you to describe every interaction available in your app, would you know where to look? Most of us would not. Now say there was a way to easily define the set of all actions that could be taken within an application concisely and consistently? Spoiler alert: There is. Double spoiler alert: I didn’t come up with it.
FSA on the left, Kotlin version on the right.
Your actions should be easily read and written by humans, simple and straightforward descriptors of what is happening or needs to be done. They may either be objects with no attached information or data classes with relevant information contained inside. Either way, if we use this model we’ll see that we end up with a strong idea of everything the app can do. In other words, these actions are a powerful tool for abstracting and encoding behavior. Let’s get a look at how we can use them to modify our state.
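Here is a minimal sketch of such an action set in Kotlin, loosely modelled on the type-plus-payload shape of Flux Standard Actions (assuming that is the FSA meant above). The action names are hypothetical, not Discovery’s real ones.

```kotlin
// The subtype plays the role of an action's `type`; its properties are the payload.
sealed class Action {
    object Refresh : Action()                                // no attached information
    data class PageSelected(val pageIndex: Int) : Action()   // carries relevant data
    data class LikeUser(val userId: String) : Action()
}

// Because the class is sealed, the compiler forces this `when` to cover every action,
// so the file doubles as a catalog of everything the app can do.
fun describe(action: Action): String = when (action) {
    is Action.Refresh -> "refresh"
    is Action.PageSelected -> "select page ${action.pageIndex}"
    is Action.LikeUser -> "like user ${action.userId}"
}
```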
Be the Change You Want To See in the State
So with our not-so-hypothetical actions, how then do we deal with any one instance of interaction? We do the same thing that we naturally do when we receive actions from others: we make some decisions. We need to pass these actions through a control flow construct that will parse them and decide what to do with the information.
The reducer is a pure function, meaning that the returned value (the new state) is determined only by its input values (the current state and the action), with no observable side-effects. Different parts of our code can’t arbitrarily change our state from outside of the reducer instance. The reducer looks for actions and then uses the data contained therein to create an entirely new state.
One common sentiment you’ll often hear parroted is that copying the state is an expensive and slow operation. Let’s take a moment to pause and pragmatically reflect on this. We only copy the state on interaction, so unless you are creating a video game with hundreds of actions a second, the effect of copying a sanely built state with a properly normalized shape will be imperceptible. For reference, copying and diffing the entire state of Discovery with over 100 models and tons of layouts took on the order of 9ms, well under the roughly 16ms frame budget (at 60 frames per second) beyond which you start to see jank in your UI. It’s important to keep numbers like this in mind so we don’t run the risk of optimizing prematurely. On the other hand, having a single source of truth for the state of your app makes developing and debugging a hell of a lot easier.
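A reducer in this style can be sketched as a single function. The state fields and action names below are hypothetical and pared down to keep the example short.

```kotlin
data class DiscoveryState(
    val currentPageIndex: Int = 0,
    val likedUserIds: Set<String> = emptySet()
)

sealed class Action {
    data class PageSelected(val pageIndex: Int) : Action()
    data class UserLiked(val userId: String) : Action()
}

// Pure: the result depends only on the arguments, and the old state is never mutated.
fun discoveryReducer(state: DiscoveryState, action: Action): DiscoveryState = when (action) {
    is Action.PageSelected -> state.copy(currentPageIndex = action.pageIndex)
    is Action.UserLiked -> state.copy(likedUserIds = state.likedUserIds + action.userId)
}
```

Because the function is pure, replaying the same list of actions over the same starting state always lands on the same final state, which is a big part of why debugging gets easier.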
All this is well and good, but realistically we’ll need to run asynchronous actions from time to time that will eventually affect the state, so let’s look at a pattern for handling those.
Handling Side-Business, Creating Actions
If we have interactions or events in the app that necessarily must spawn longer running side-processes or async actions, we can pass them through what is sometimes known as an ActionCreator. The ActionCreator in form is very similar to the Reducer with the key differentiator being that it can have side-effects but has no direct access to the state itself.
Using a control flow, we parse the actions and then run async tasks using something like RxJava observables. When the side-business has completed, for example in the subscription lambda, we can then dispatch either another side-effect-inducing action or the final action that will be caught by the reducer and used to modify the state.
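The shape of an ActionCreator can be sketched without any libraries. To keep the example self-contained, plain callbacks stand in for the RxJava subscription here; DiscoveryApi, its method, and the action names are all hypothetical.

```kotlin
sealed class Action {
    object LoadRequested : Action()                                // triggers a side effect
    data class LoadSucceeded(val titles: List<String>) : Action()  // final action for the reducer
    data class LoadFailed(val message: String) : Action()
}

// Hypothetical async API; in our app this would be a network call wrapped in an Observable.
interface DiscoveryApi {
    fun fetchTitles(onSuccess: (List<String>) -> Unit, onError: (Throwable) -> Unit)
}

// Has side effects (it talks to the API) but never touches the state directly:
// all it can do is dispatch more actions.
class ActionCreator(
    private val api: DiscoveryApi,
    private val dispatch: (Action) -> Unit
) {
    fun handle(action: Action) {
        when (action) {
            is Action.LoadRequested -> api.fetchTitles(
                onSuccess = { titles -> dispatch(Action.LoadSucceeded(titles)) },
                onError = { error -> dispatch(Action.LoadFailed(error.message ?: "unknown error")) }
            )
            else -> Unit // nothing asynchronous to do; the reducer will handle it
        }
    }
}
```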
So now we have a complete idea of how to describe both the visual representation and behavior of our app. Remember Discovery? Let’s get back to that and take a look at how we encode and implement this behavior.
Final Steps: Parsing and Binding the Actions
We need to deserialize the actions from JSON in order for them to come along with the main payload running the feature and be remotely configurable. Using the same pattern that we used for layouts, we use polymorphic deserialization to parse the JSON actions into their subtypes, discriminating by a field in the JSON body of the object.
We must then bind the, now Kotlin, value-object to the view itself. In order to do so, we are again leveraging the strength and flexibility of Android DataBinding. The action itself is defined as a field on the view, parsed from the same JSON. On the view itself, we define the modes of interaction which will fire the as yet unknown action, i.e. clicks, long-presses, gestures. Once that user view interaction occurs, we signal to the ViewModel to fire the action through an RxJava subject. A parent ViewModel that holds reference to the State, Reducer, and ActionCreator dispatches that action and the state is modified accordingly.
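Putting the pieces together, the flow from JSON to state change can be sketched like this. Everything here is a simplified stand-in: the JSON is assumed to be pre-parsed into a map, ClickableItem replaces a data-bound view, Store replaces the parent ViewModel, and the ActionCreator is left out to keep the example short.

```kotlin
data class State(val likedUserIds: Set<String> = emptySet(), val openedProfile: String? = null)

sealed class Action {
    data class Like(val userId: String) : Action()
    data class OpenProfile(val userId: String) : Action()
}

// Parses the action shipped alongside a layout; the "type" and "user_id" keys are assumed.
fun parseAction(json: Map<String, Any?>): Action? = when (json["type"]) {
    "like" -> Action.Like(json["user_id"] as String)
    "open_profile" -> Action.OpenProfile(json["user_id"] as String)
    else -> null
}

fun reduceState(state: State, action: Action): State = when (action) {
    is Action.Like -> state.copy(likedUserIds = state.likedUserIds + action.userId)
    is Action.OpenProfile -> state.copy(openedProfile = action.userId)
}

// Holds the single source of truth; the reducer is the only thing that replaces it.
class Store(initial: State, private val reducer: (State, Action) -> State) {
    var state: State = initial
        private set

    fun dispatch(action: Action) {
        state = reducer(state, action)
    }
}

// The "view" has no idea what its action does; it just fires it on click.
class ClickableItem(private val action: Action?, private val dispatch: (Action) -> Unit) {
    fun onClick() {
        action?.let(dispatch)
    }
}
```

The server decides whether a given tile likes a user or opens a profile simply by changing the "type" it sends. The view code stays the same either way.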
The beauty of this is that once your app knows how to handle and transform its internal state for any given action — the views themselves are agnostic of both what they are doing and displaying. Congratulations, you’ve basically just made yourself a native component web browser! Either way, we now have a dumb-client system for which we can remotely configure the layouts and behavior on-the-fly.
A Few Final Thoughts
Hopefully there’s something in here for everyone. Some of the concepts are pretty complicated and it’s difficult to do them justice while looking at the feature from such a high-level, but if there’s enough interest I'll revisit those aspects in more depth in a follow-up post.
This system is not a panacea. As with all opinionated programming decisions, there are trade-offs. But one thing I would like to clarify is this: making your app operate as a thin-client in this manner doesn’t mean that it won’t be able to operate offline or cache responses. The reducer should still contain enough logic to run all internal operations that don’t need access to the internet/server. Anything else can be persisted on the client and queued until the connection is reestablished, just as you would with a thick-client.
It’s important to remember to not dogmatically adhere to any given architecture or design pattern, but instead to seek out creative solutions and drive innovative engineering. Remember that architecture is nothing more than a contract between you and your fellow developers, so do what works (just remember to document it!) Finally, anyone can copy code, but the true gift you can give to the Android community is your own perspective. I’d love to hear from you all, so feel free to follow and message me on Twitter @BrandonJF.
Cheers and happy coding!