Jordan Guggenheim

iOS Scroll Performance Tutorial

OkCupid is hiring for iOS! Click here to learn more.

At OkCupid, 30% of iPhone users have a device that we consider to be underpowered. This makes it extremely important that we code our app to perform optimally across all devices.

We monitor our user retention by device model and there have been times when poorly optimized features have caused users on older devices to have up to 7% lower retention. It reaffirms to us that the fit and finish of an app really matters!

Also, with the release of new iPads with 120 Hz displays, the bar has been raised to deliver ultra-high-performance scrolling experiences.

This tutorial

This tutorial will go in depth on how to successfully debug CPU bottlenecks using Apple's Time Profiler, walking through an example project plagued with scrolling performance issues. ☠️ ☠️ ☠️

The goal is that by the time you're done, you will have a much better understanding of how to spot deficiencies in your own app. I will also share a few optimization techniques that we employ here at OkCupid.

It's important to remember the following:

  • Optimizing your app is often a tradeoff between writing simple code and writing performant code. The key to success is striking the balance.

    For example, AsyncDisplayKit / Texture is a remarkably powerful framework for performant layouts, but do you know how to write Objective-C++? Can you troubleshoot it if you encounter a showstopper bug? What if an iOS upgrade breaks an integration?

    You should exhaust all options within the Apple APIs before adding an external dependency.

  • Profile early and often. The earlier you catch a performance regression, the easier the code will be to refactor.

  • To see performance issues, you must profile on an actual device, not the simulator.

Underpowered Devices

OkCupid's current list of underpowered devices includes the iPhone 5 family and the iPhone 6+ (3x rendering strains the CPU). You will get the most out of this tutorial if you have one of these devices handy, but you can still follow along with a newer phone. I have created a layout that will make even the best devices drop frames.

Getting Started

Clone the git repo located here and open the sample project. In the project's target, set your team under the "Signing" section. You will notice when you run it that the app is already fully optimized for 60 fps scrolling. However, you can trigger the app's performance issues by setting kIsOptimized = false on Line 9 in the AppDelegate.

AppDelegate

Go ahead and do that and then rerun the project. Notice the difference on your phone as you scroll? The scrolling will appear choppy as it drops frames trying to render the content. Hit stop in Xcode.

Navigate to OKConversationMessageClient Line 30 and replace "emojis" with "mixedCharacters" like so:

Client

You will see much less stuttering when you rerun the app.

If you change "mixedCharacters" to "alphabet" you will see completely smooth scrolling on even the oldest devices.

Important Lesson #1:

Consider many different scenarios when you're testing your app for performance.

In the sample app, short one-line messages like "hey" will put many more cells on screen than paragraph-long messages. This means that all those cells must be laid out on the main thread in quick succession. Even worse than cells instantiating one after another is multiple cells coming on screen at the same time.

Beyond the quantity of cells, think about what variations of content are likely.

Let's look at text rendering for example. Do you use complicated attributed text with link highlighting? How about supporting languages such as Arabic, Hebrew or Mandarin? Will your text contain emojis? (probably, since it's 2017... 🙈). All of these factors impact the drawing performance of UILabel, TTTAttributedLabel (which is a fantastic and performant open source framework) and UITextView.

With that in mind, let's switch OKConversationMessageClient Line 30 back to "emojis". But this time instead of running the app, let's profile it and identify the non-performant code!

Time Profiler

Time Profiler is a very helpful instrument that samples the phone's CPU a thousand times per second. Each sample is a recording of a backtrace, which is a summary of what the CPU was executing and how it got there.

It then symbolicates the backtrace and displays how many times each symbol (method) occurred and the "time" it spent running on the phone's CPU.

The reason that "time" is in quotes is that the milliseconds number shown to you is actually calculated as:
(count of symbol's occurrence in the samples) x (time between samples).

So in other words... Time Profiler is NOT measuring how long each individual call takes to run. A symbol ranks high either because a single call runs for a long time or because a fast call fires many times, and the sample count alone cannot tell the two apart.
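To make that arithmetic concrete, here is a minimal sketch in Swift. The function name and the 1 ms default interval are my own assumptions for illustration (the interval follows from the thousand samples per second mentioned above); none of this is an Instruments API.

```swift
/// Hypothetical helper (not part of Instruments): reproduces the displayed
/// "time" as (count of samples) x (time between samples).
func displayedMilliseconds(sampleCount: Int, sampleIntervalMs: Double = 1.0) -> Double {
    return Double(sampleCount) * sampleIntervalMs
}

// One slow call that stayed on the stack for 100 consecutive samples...
let slowMethod = displayedMilliseconds(sampleCount: 100)

// ...and a quick method that fired often enough to be caught in 100 separate samples.
let chattyMethod = displayedMilliseconds(sampleCount: 100)

// Both report 100.0, even though the two methods behave very differently.
print(slowMethod, chattyMethod)
```

This is why a high number in the profiler is a pointer to where to look, not a stopwatch reading for a single call.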

But regardless of this limitation, the profiler provides valuable insight on which lines in your code are consuming the CPU and therefore the phone's battery as well. I highly recommend you check out these resources below if you're interested in going in depth on the inner workings of Apple's Time Profiler:

Time Profiler at WWDC 16
Time Profiler User Guide

Facebook built a tool that can profile a device in the field and send the data to a server, but it is up to you to send and receive the data and afterwards symbolicate it. For our purposes, Apple's Time Profiler is exactly what we want.

So let's try it out!

In Xcode hold down the "Run" button and select Profile.

Profiler

Xcode will immediately begin building and when that completes the Instruments window will open. Select "Time Profiler" and hit OK.

Instruments

When the profiler opens, hit the red record button and the app will launch. Once launched, scroll down and watch as the Time Profiler records the data. After you have scrolled for 5-10 seconds hit the stop button.

Your profiler should look something like this:

Time Profiler Instrument

Annotations:

  1. This is the graph of the CPU usage over time. You can highlight sections of the graph and the details section below will filter on just the traces that occurred over the highlighted period. You can also pinch to zoom for more granular highlighting.

  2. These three buttons at the top toggle how you view CPU usage, which can be particularly helpful for narrowing down issues.

  3. The detail section displays the backtraces broken out by thread. Remember, the time is calculated as the (count of samples) x (time between samples).

  4. The call tree options provide meaningful ways of grouping and filtering the data.

  5. The process search bar allows you to filter the trace on symbol or process names.

  6. The heaviest stack trace shows the trace that occurred the most in the samples.

Now that we have our recording, highlight the portion of the graph that occurred when you scrolled through the messages. This will filter out the heavy costs associated with launching the app, setting up the view controller and animating the cells.

Take a look at the heaviest stack trace.

You will see a lot of symbols that you do not recognize, but as you scroll down you will notice [UILabel drawTextInRect:] ... Bingo! As our initial swapping of the "alphabet" and "emojis" conversations showed us, the UILabel drawing is extremely expensive. What you can see here is that it accounts for 31.6% of all symbols sampled in the highlighted results. Note: Your own trace will show different percentages because of the randomized string values in the message objects, your iOS version, your device model, how fast you scrolled, etc.

Time Profiler Instrument

For now, let's comment out the setting of the attributed string in OKMessageCell Line 79 and rerun the time profiler to prove that we have correctly identified the issue.

Time Profiler Instrument

If we are right, [UILabel drawTextInRect:] should no longer be the heaviest stack trace in the results. Removing it will also clear a lot of noise from the trace, so we can see whether anything else can be optimized.

Go through the same motions as before... In Xcode hit profile, select Time Profiler, hit record, when the app launches scroll through the messages for a few seconds and then hit stop.

Great news! [UILabel drawTextInRect:] is no longer the heaviest stack trace. Instead... our OKMessageCell.layoutCell() now comes up in the heaviest stack trace recording 28.6% of all symbols in the highlighted results.

Time Profiler Instrument

But that symbol is made up of multiple components:

Time Profiler Instrument

Note: You may see [UILabel _intrinsicSizeWithinSize:] as the heaviest stack, but again that depends on a variety of factors. In the screenshot above, it is apparent that UIImageView is doing a lot of work. This may be due to our use of stretchable images with cap insets or even the tint we are applying for incoming or outgoing bubble color.

A super handy trick to cut through the noise is to use the "Involves Symbol" search function.

Time Profiler Instrument

It takes a little bit of guesswork to map the private Apple APIs to functions called in your app, but it's not rocket science either!

If you type in "tint" you get this:
Time Profiler Instrument

If you type in "resizable" you get this:
Time Profiler Instrument

While this is not a foolproof way to determine whether setting the tint is our culprit, the 51 ms trace for [UIView setTintColor:] vs the 1 ms trace for _setContentStretchForImage gives us the confidence to try to fix the tint first.

Reviewing the other nodes in the OKMessageCell.layoutCell() tree shows that instantiating images, sizing labels ([UILabel _intrinsicSizeWithinSize:]) and creating attributed strings ([NSConcreteAttributedString initWithString:attributes:]) all incur costs on the main thread as well.

Let's profile the app one last time. When it launches, rotate the display. Notice that the app hangs for a few seconds before the rotation completes.

Stop the profiler and highlight the spike that occurred while rotating. You will see the following:
Time Profiler Instrument

Important Lesson #2:

UICollectionViewFlowLayout asks its delegate for sizeForItemAtIndexPath for every cell when reloadData or a rotation occurs. It needs those sizes to calculate the content size of the collection view. Make sure you are doing as little as you can in this function.

Also, note that sizeForItemAtIndexPath is called inside an animation block on rotation. This can lead to unintended consequences. In our case, the transform called in OKMessageCell.configureBubbleImageView(with: OKMessage) is animatable and on rotation it completely blocks the main thread.
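One way to keep that function cheap is to make it a pure lookup into sizes that were computed ahead of time. The sketch below is an assumption about how such a cache could look; `MessageSizeCache`, its method names and the message IDs are hypothetical and are not taken from the sample project.

```swift
import Foundation

/// Hypothetical size cache; names are illustrative, not from the sample project.
final class MessageSizeCache {
    private struct Key: Hashable {
        let messageID: String
        let width: Int // rounded container width, so each orientation gets its own entries
    }

    private var sizes: [Key: CGSize] = [:]
    private let lock = NSLock() // the cache may be filled off the main thread and read on it

    func store(_ size: CGSize, forMessageID id: String, containerWidth: CGFloat) {
        lock.lock(); defer { lock.unlock() }
        sizes[Key(messageID: id, width: Int(containerWidth.rounded()))] = size
    }

    func size(forMessageID id: String, containerWidth: CGFloat) -> CGSize? {
        lock.lock(); defer { lock.unlock() }
        return sizes[Key(messageID: id, width: Int(containerWidth.rounded()))]
    }
}

let cache = MessageSizeCache()
cache.store(CGSize(width: 200, height: 44), forMessageID: "m1", containerWidth: 320)

// Inside collectionView(_:layout:sizeForItemAt:) the delegate would then do
// nothing more than a dictionary lookup, for example:
//   return cache.size(forMessageID: message.id,
//                     containerWidth: collectionView.bounds.width) ?? fallbackSize
```

Keying on the container width means a rotation finds its own pre-computed sizes instead of measuring text inside the animation block.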

In the screenshot above, if you double click on [UIView setTransform:] it will take you to the part of the code being sampled. Remember, all that "1598x" tells us is the number of times that symbol showed up in the highlighted samples.

Time Profiler Instrument

Performance Fixes

Now it is time to fix our performance issues. For those of you following along in the sample project, search for instances of the kIsOptimized bool and note how it enables or disables performance improvements.

First, you can see in OKConversationAssetFactory and OKConversationSizingFactory that kIsOptimized is preventing the use of our caching layer.

Asset Factory

Also, note that in OKConversationAssetFactory we set both the tint color and the transform on the image itself instead of applying it on the main thread at the view level.

Asset Factory
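As a rough idea of what baking the tint into the image can look like, here is a minimal sketch. It is not the code from OKConversationAssetFactory; the function name is hypothetical, and I am assuming the bubble is a template-style image whose alpha channel defines its shape.

```swift
import UIKit

/// Hypothetical helper: returns a copy of `image` filled with `color`,
/// keeping its alpha channel and its stretchable cap insets.
func tintedImage(_ image: UIImage, with color: UIColor) -> UIImage {
    let format = UIGraphicsImageRendererFormat()
    format.scale = image.scale
    let renderer = UIGraphicsImageRenderer(size: image.size, format: format)
    let rect = CGRect(origin: .zero, size: image.size)

    let tinted = renderer.image { context in
        image.draw(in: rect)
        color.setFill()
        // .sourceIn keeps the fill color only where the bubble already has pixels.
        context.fill(rect, blendMode: .sourceIn)
    }
    // Drawing flattens the image, so restore the cap insets afterwards.
    return tinted.resizableImage(withCapInsets: image.capInsets)
}
```

Because the result is a plain UIImage, it can be produced once per bubble color, cached, and simply assigned to the image view while scrolling.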

Also in OKConversationAssetFactory, we create an image representation of the message's attributed string, drawn at the label size that we feed in from the OKConversationSizingFactory.

Asset Factory
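Rendering an attributed string into an image can be sketched like this. Again, this is an illustration under my own assumptions, not the sample project's implementation: the function name is hypothetical, and `size` is assumed to be the non-zero label size that was computed beforehand.

```swift
import UIKit

/// Hypothetical helper: draws `text` into an image of an already computed label size.
func renderedImage(of text: NSAttributedString, size: CGSize) -> UIImage {
    let renderer = UIGraphicsImageRenderer(size: size)
    return renderer.image { _ in
        // Same drawing options a multi-line label would typically be measured with.
        text.draw(with: CGRect(origin: .zero, size: size),
                  options: [.usesLineFragmentOrigin, .usesFontLeading],
                  context: nil)
    }
}

let preview = renderedImage(of: NSAttributedString(string: "hey 🙈"),
                            size: CGSize(width: 80, height: 20))
```

The expensive text drawing now happens once, when the image is created, instead of every time the cell comes on screen.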

Important Lesson #3:

Apple's Safari app uses a text rendering technique similar to our caching of the attributed string as an image. While a user is scrolling, a cached lower quality image of the content is displayed on screen. This helps scroll performance. When the user finishes scrolling, that cached image is swapped out for a higher quality, interactive view.

If you take a look at OKConversationViewController Line 287 you will see the code where we listen for a scroll to stop and swap out our image rendering for the tappable label.

Scrolling
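Detecting that a scroll has stopped usually takes two UIScrollViewDelegate callbacks. The sketch below shows the general pattern; the class name and the `onScrollStopped` closure are hypothetical, and the sample project may wire this up differently.

```swift
import UIKit

/// Hypothetical observer: reports when a scroll view has come to rest.
final class ConversationScrollObserver: NSObject, UIScrollViewDelegate {
    /// Called once scrolling has stopped, e.g. to swap cached images for live labels.
    var onScrollStopped: (() -> Void)?

    func scrollViewDidEndDragging(_ scrollView: UIScrollView, willDecelerate decelerate: Bool) {
        // The finger lifted with no momentum, so scrolling is already over.
        if !decelerate { onScrollStopped?() }
    }

    func scrollViewDidEndDecelerating(_ scrollView: UIScrollView) {
        // Momentum scrolling has finished.
        onScrollStopped?()
    }
}
```

Handling both callbacks matters: a slow drag that ends without momentum never triggers the deceleration callback.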

Bringing it all together

All the caching changes are great and all... but they will be effectively useless at solving our performance issues if we don't generate the attributed strings, images and sizes on a background thread.

For your own app, this should be done after an API call returns JSON, but before you update your collection view's data source. See OKConversationViewController Line 163 for how the sample app performs the caching.

Scrolling
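The general shape of that flow (do the expensive work in the background, then hand the finished results to the main thread) might look like the sketch below. The `Message` and `PreparedMessage` types and the `prepare` function are hypothetical stand-ins, and the whitespace trimming is only a placeholder for the real work of building attributed strings, images and sizes.

```swift
import Foundation
import Dispatch

/// Hypothetical stand-ins for the parsed model and its cached layout data.
struct Message { let id: String; let text: String }
struct PreparedMessage { let id: String; let displayText: String }

/// Does the per-message work on a background queue, then passes the finished
/// results to `completion` on `callbackQueue` (the main queue by default).
func prepare(_ messages: [Message],
             callbackQueue: DispatchQueue = .main,
             completion: @escaping ([PreparedMessage]) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // Placeholder for building attributed strings, images and sizes.
        let prepared = messages.map {
            PreparedMessage(id: $0.id,
                            displayText: $0.text.trimmingCharacters(in: .whitespacesAndNewlines))
        }
        callbackQueue.async { completion(prepared) }
    }
}

// Usage in a view controller: only touch the data source once everything is cached.
//   prepare(parsedMessages) { prepared in
//       self.messages = prepared
//       self.collectionView.reloadData()
//   }
```

By the time reloadData runs, every cell only has to read cached values, so nothing expensive is left for the main thread to do during scrolling.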

Important Lesson #4:

iPhone CPUs have multiple cores, but the main thread (or any thread for that matter) can only execute on one core at a time. If you instead split the work into tasks that run simultaneously on background threads, it can finish up to twice as fast on a 2-core CPU, but the CPU % will be up to twice as high. This sounds bad for battery life, right?

Actually not so! Letting the CPU finish its work quickly and enter a low power mode is better for the phone's battery than executing the same operations on one core over a longer period of time.
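If the work can be split into independent pieces, Grand Central Dispatch can spread it across the available cores. The sketch below uses `DispatchQueue.concurrentPerform`; it is one possible approach rather than what the sample project does, and `expensiveWork` is a made-up stand-in for something like sizing a single message.

```swift
import Foundation
import Dispatch

/// Stand-in for an expensive, independent unit of work (e.g. sizing one message).
func expensiveWork(_ input: Int) -> Int {
    return (0..<1_000).reduce(input) { ($0 &+ $1) % 7_919 }
}

let inputs = Array(0..<64)
var results = [Int](repeating: 0, count: inputs.count)

results.withUnsafeMutableBufferPointer { buffer in
    // Each iteration writes only to its own slot, so no locking is needed.
    DispatchQueue.concurrentPerform(iterations: inputs.count) { index in
        buffer[index] = expensiveWork(inputs[index])
    }
}
```

Note that `concurrentPerform` blocks the calling thread until every iteration has finished, so in an app it should itself be called from a background queue, never from the main thread.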

Let's talk about the elephant in the room...

I am not using Auto Layout constraints in the view classes. Don't get me wrong. Auto Layout constraints are amazing in many ways. Let's go over the pros/cons:

Pros

  • Constraints can figure out highly complicated layouts where there are multiple views that influence the position of views around them.

  • Constraints can be constructed quickly in nibs, which not only keeps your class code clean but also gives you a visual editor to see how the layout will work. This includes size class niceties as well.

  • Constraints can communicate with view classes that have an intrinsic size, sparing you code for sizeThatFits: logic.

  • Constraints are easy for members of your team to read and support.

Cons

  • Constraints in nibs cannot be copied to another nib in your app.

  • Constraints in nibs are hard to track in Git if changes are made.

  • Constraints perform all intrinsic size calculations (sizeThatFits) on the main thread each time the cell comes on screen.

  • Constraints require outlets, which can add a lot of code to your class, not only as outlet properties, but also anywhere you manipulate them. Take a look at OKMessageCell and you will see that the operation to switch from messageLabel to messageLabelImageView is easily done just by setting one frame to .zero and the other to the cached label frame.

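  • Constraints that are not explicit add CPU overhead to calculate. Pinning an edge to a constant is cheap, while an inequality that depends on another view gives the layout engine more work to do. For example: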
    ✅ view.trailing == constant
    ❌ view.trailing <= otherView.leading
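The frame swap mentioned in the outlets bullet above can be sketched as follows. The two view names follow the ones mentioned for OKMessageCell, but the class and the method are hypothetical and only illustrate the idea of collapsing one view's frame to zero instead of juggling constraints.

```swift
import UIKit

/// Hypothetical cell fragment; illustrative only, not the sample project's OKMessageCell.
final class MessageContentView: UIView {
    let messageLabel = UILabel()
    let messageLabelImageView = UIImageView()

    override init(frame: CGRect) {
        super.init(frame: frame)
        addSubview(messageLabel)
        addSubview(messageLabelImageView)
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) is not supported in this sketch")
    }

    /// Shows either the live label or its pre-rendered image by giving one view
    /// the cached frame and collapsing the other to zero.
    func showLiveLabel(_ live: Bool, cachedLabelFrame: CGRect) {
        messageLabel.frame = live ? cachedLabelFrame : .zero
        messageLabelImageView.frame = live ? .zero : cachedLabelFrame
    }
}
```

With manual frames there are no outlets to connect and no constraints to activate or deactivate; the switch is two assignments.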

A few notes on nibs specifically...

  • Nibs add to the compile time of an app.
  • Nibs cannot be worked on by two developers at the same time without causing messy diffs in Git.

In summary

Well I hope you learned something new from this tutorial! To summarize the main points, always remember:

  • Test your app on older devices.
  • Time Profile early and often.
  • Only add external dependencies as a last resort.
  • When testing app performance, always consider many different scenarios.
  • Cache on a background thread to improve performance and battery life.
  • sizeForItemAtIndexPath is called for every cell in a UICollectionView, so make sure your sizing logic is as minimal and optimized as possible.
  • Constraints can often be a performance bottleneck if used improperly.


