Bartosz Ciechanowski

Mesh Transforms

I’m a huge fan of transform property. Combining rotations, translations and scalings is on of the easiest way to modify a shape of a UIView or a CALayer. While easy to use, regular transforms are quite limited in what they can achieve – a rectangular shape of a layer can be transformed into an arbitrary quadrilateral. It’s nothing to sneeze at, but there are much more powerful toys out there.

This article is focused on mesh transforms. The core idea of a mesh transform is very straightforward: you introduce a set of vertices in the layer then you move them around deforming the entire contents:

A mesh transform

The major part of this post is dedicated to a private Core Animation API that’s been part of the framework since iOS 5.0. If at this point you’re leaving this page to prevent spoiling your mind with deliciousness of private API, fear not. In the second section of the article I present an equivalent, open-sourced alternative.

CAMeshTransform

The first time I saw iOS-runtime headers I was mesmerized. The descriptions of so many private classes and hidden properties were extremely eyeopening. One of my most intriguing findings was CAMeshTransform class and a corresponding meshTransform property on CALayer. I badly wanted to figure it all out and recently I finally did. While seemingly complex, the concepts behind a mesh transform are easy to grasp. Here’s how a convenience construction method of CAMeshTransform looks like:

+ (instancetype)meshTransformWithVertexCount:(NSUInteger)vertexCount
                                    vertices:(CAMeshVertex *)vertices
                                   faceCount:(NSUInteger)faceCount
                                       faces:(CAMeshFace *)faces
                          depthNormalization:(NSString *)depthNormalization;

This method clearly describes the basic components of CAMeshTransform – vertices, faces and a a string describing depth normalization. We will tackle these components one by one.

Disclaimer: unfortunately, names of fields inside a struct are lost during compilation so I had to come up with reasonable descriptions on my own. While original names are most likely different, their intention remains the same.

A Vertex

CAMeshVertex is a simple struct with two fields:

typedef struct CAMeshVertex {
    CGPoint from;
    CAPoint3D to;
} CAMeshVertex;

CAPoint3D is very similar to a regular CGPoint – it’s trivially extended by a missing z coordinate:

typedef struct CAPoint3D {
    CGFloat x;
    CGFloat y;
    CGFloat z;
} CAPoint3D;

With that in mind the purpose of a CAMeshVertex is easily inferred: it describes the mapping between the flat point on the surface of a layer and the transformed point located in a 3D space. CAMeshVertex defines the following action: “take this point from the layer and move it to that position”. Since CAPoint3D field has x, y and z components, a mesh transform doesn’t have to be flat:

A vertex moves from a 2D point on a layer to a 3D point in a space

A Face

CAMeshFace is simple as well:

typedef struct CAMeshFace {
    unsigned int indices[4];
    float w[4];
} CAMeshFace;

The indices array describes which four vertices a face is spanned on. Since CAMeshTransform is defined by an array of vertices, a CAMeshFace can reference vertices it’s build with by their indexes in vertices array. This is a standard paradigm in a computer graphics and it’s very convenient – many faces can point at the same vertex. This not only removes the problem of data duplication, but also makes it easy to continuously modify the shape of all the attached faces:

Faces are defined by their vertices

As for the w field of CAMeshFace, we’ll temporarily postpone its discussion.

Coordinates

With an overview of the vertices and faces at hand it’s still not obvious what values should we put inside a CAMeshVertex. While the vast majority of CALayer’s properties are defined in points, there are a few that make use of unit coordinates, the anchorPoint being probably the most popular one. CAMeshVertex makes use of unit coordinates as well. The from point of {0.0, 0.0} corresponds to a top left corner of the layer and {1.0, 1.0} point corresponds to bottom right corner of the layer. The to point uses exactly the same coordinate system:

Vertices are defined in unit coordinates

The reason for using unit coordinates is introduced in Core Animation Programming Guide:

Unit coordinates are used when the value should not be tied to screen coordinates because it is relative to some other value

The best thing about unit coordinates is that they’re size invariant. You can reuse the exact same mesh to transform both small and large views and it will all work just fine. I believe this was the main reason for choosing them as units of CAMeshTransform.

Modifying Mesh Transforms

One of the drawbacks of creating a regular CAMeshTransform is that it’s immutable and all the vertices and faces have to be defined before the transform is created. Thankfully, a mutable subclass named CAMutableMeshTransform also exists and this one does allow adding, removing and replacing vertices and faces at any time.

Both mutable and immutable mesh transform have a subdivisionSteps property that describes how many mesh subdivisions should be performed by the framework when layer gets rendered on screen. Number of splits grows exponentially, so setting the value of property to 3 will divide each edge to 8 pieces. The default value of subdivisionSteps is -1 and it usually makes the meshes look smooth. I assume it tries to automatically adjust the number of subdivisons to make the final result look good.

What’s not obvious, for non 0 values of subdivisionSteps generated mesh doesn’t go through all of its vertices:

Shape of subdivided mesh vs its vertices

In fact, the vertices are control points of a surface and by observing how they influence the shape I suspect CAMeshTransform actually defines a cubic NURBS surface. Here’s where w field of CAMeshFace comes back. Setting a value at one of the four indices of the w array seems to influence the weight of the corresponding vertex. The factor doesn’t seem to be the weight as defined in NURBS equation. Unfortunately, I couldn’t force myself to get through literally hundreds lines of floating point assembly to figure out what’s going on.

Even though NURBS surfaces are extremely powerful, the fact that they don’t go through the defining vertices is quite a kicker. When I was designing my meshes I wanted to have total control over what’s going on and how the generated mesh looked like so I usually set subdivisionSteps property to 0.

Applying Mesh Transform

On its own CAMeshTransform is of little use, but it can be easily assigned to a private property of a CALayer:

@property (copy) CAMeshTransform *meshTransform;

The following piece of code creates a wavy mesh transform. It’s excessively verbose for the purpose of demonstrating how all the pieces fit together. With a bunch of convenience methods, the same effect can be created within literally few lines of code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
- (CAMeshTransform *)wavyTransform
{
    const float Waves = 3.0;
    const float Amplitude = 0.15;
    const float DistanceShrink = 0.3;

    const int Columns = 40;

    CAMutableMeshTransform *transform = [CAMutableMeshTransform meshTransform];
    for (int i = 0; i <= Columns; i++) {

        float t = (float)i / Columns;
        float sine = sin(t * M_PI * Waves);

        CAMeshVertex topVertex = {
            .from = {t, 0.0},
            .to   = {t, Amplitude * sine * sine + DistanceShrink * t, 0.0}
        };
        CAMeshVertex bottomVertex = {
            .from = {t, 1.0},
            .to   = {t, 1.0 - Amplitude + Amplitude * sine * sine - DistanceShrink * t, 0.0}
        };

        [transform addVertex:topVertex];
        [transform addVertex:bottomVertex];
    }

    for (int i = 0; i < Columns; i++) {
        unsigned int topLeft     = 2 * i + 0;
        unsigned int topRight    = 2 * i + 2;
        unsigned int bottomRight = 2 * i + 3;
        unsigned int bottomLeft  = 2 * i + 1;

        [transform addFace:(CAMeshFace){.indices = {topLeft, topRight, bottomRight, bottomLeft}}];
    }

    transform.subdivisionSteps = 0;

    return transform;
}

Here’s a UILabel with a mesh transform applied:

Mesh-transformed UILabel

It’s worth pointing out that you might often see different results depending on whether the app is run on simulator or device. Apparently the iOS simulator version of Core Animation uses a software renderer for its 3D stuff and it’s a different software renderer than the one used for OpenGL ES. This is especially visible with patterned textures.

Leaky Abstractions

When you look closer at the mesh-transformed UILabel on a retina device, you’ll notice that its text quality is subpar. It turns out it can be easily improved with a single property assignment:

label.layer.rasterizationScale = [UIScreen mainScreen].scale;

This is a giveaway of how it all might work under the hood. The contents of layer and all its sublayers get rasterized into a single texture that later gets applied to the vertex mesh. In theory, the rasterization process could be avoided by generating the correct meshes for all the sublayers of the transformed layer by making them perfectly overlap their respective superlayers. In general case however, vertices of the sublayers would be placed in-between vertices of parent layer which would surely cause a nasty z-fighting. Rasterization looks like a good solution.

The other problem I noticed has its roots in hardware. CAMeshTransform provides a nice abstraction of a face which is nothing more than a quadrilateral, also known as a quad. However, modern GPUs are only interested in rendering triangles. Before it is sent to the GPU a quad has to be split into two triangles. This process can be done in two distinct ways by choosing either diagonal as a separating edge:

Two different divisions of same quad into triangles

It might not seem like a big deal, but performing a seemingly similar transform can produce vastly different results:

Symmetrical meshes, asymmetrical results

Notice that the shapes of mesh transforms are perfectly symmetrical, yet the result of their action is not. In the left mesh only one of the triangles actually gets transformed. In the right mesh both triangles do. It shouldn’t be hard to guess which diagonal does Core Animation use for its quad subdivision. Note that the effect will also happen for the exact meshes if you change the order of indices inside their respective faces.

Even though the small issues caused by rasterization and triangulation are leaky abstractions and can’t be completely ignored, they seem to be the only viable solutions to the complexity they try to mask.

Adding Depth

The unit coordinates are a neat idea and they work great for both width and height. However, we don’t have any way to define a third dimension – a size field of layer’s bounds has merely two dimensions. One unit of width is equal to bounds.size.width points and height works correspondingly. How can one specify how many points does one unit of depth have? Authors of Core Animation have solved this problem in a very simple but surprisingly effective way.

A depthNormalization property of CAMeshTransform is an NSString that can legally by set to one of the six following constants:

extern NSString * const kCADepthNormalizationNone;
extern NSString * const kCADepthNormalizationWidth;
extern NSString * const kCADepthNormalizationHeight;
extern NSString * const kCADepthNormalizationMin;
extern NSString * const kCADepthNormalizationMax;
extern NSString * const kCADepthNormalizationAverage;

Here’s the trick: CAMeshTransform evaluates the depth normalization as a function of the other two dimensions. The constant names are self-explanatory, but let’s get through a quick example. Let’s assume the depthNormalization is set to kCADepthNormalizationAverage and the layer bounds are equal to CGRectMake(0.0, 0.0, 100.0, 200.0). Since we picked the average normalization, one unit of depth will map to 150.0 points. A CAMeshVertex with to coordinates of {1.0, 0.5, 1.5} will map to a 3D point with coordinates equal to {100.0, 100.0, 225.0}:

Converting from units to points

Why go through the trouble of converting unit coordinates to points? It’s because of a transform property of a CALayer and its type – CATransform3D. Components of CATransform3D are defined in terms of points. You can actually apply any transform to the layer itself and it will influence its vertices as well. The z coordinate translation and a perspective transform come to mind as a major beneficiaries of this feature.

At this point we could create another example, this time with depthNormalization not equal to the default kCADepthNormalizationNone. The results would be quite disappointing – everything would look flat. The depth added by non-zero z coordinates of vertices is very unconvincing. We can skip this step altogether and add a missing component that would emphasize the slopes and curvatures of the mesh – the shading.

Meeting Prometheus

Since we’ve already opened Pandora’s box of private Core Animation classes, we might as well use another one. At this point it should come as no surprise that a class named CALight exists and it’s actually very useful since CALayer has a private, NSArray-typed lights property.

A CALight is created with + (id)lightWithType:(NSString *)lightType convenience method and the lightType argument can be one of the following four values:

extern NSString * const kCALightTypeAmbient;
extern NSString * const kCALightTypeDirectional;
extern NSString * const kCALightTypePoint;
extern NSString * const kCALightTypeSpot;

I’m not going to discuss CALight in details, so let’s jump straight to an example. This time we’re going to use two hand-made CAMutableMeshTransform convenience methods. The first one, identityMeshTransformWithNumberOfRows:numberOfColumns:, creates a mesh with uniformly spread vertices that don’t introduce any disturbances. Then we’re going to modify those vertices by mapVerticesUsingBlock: method that maps all vertices to some other vertices.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
CALight *light = [CALight new]; // directional light by default
[label.superview.layer setLights:@[light]]; // has to be applied to superlayer

CAMutableMeshTransform *meshTransform = [CAMutableMeshTransform identityMeshTransformWithNumberOfRows:50 numberOfColumns:50];
[meshTransform mapVerticesUsingBlock:^CAMeshVertex(CAMeshVertex vertex, NSUInteger vertexIndex) {
    float x = vertex.from.x - 0.5f;
    float y = vertex.from.y - 0.5f;

    float r = sqrtf(x * x + y * y);

    vertex.to.z = 0.05 * sinf(r * 2.0 * M_PI * 4.0);

    return vertex;
}];
label.layer.meshTransform = meshTransform;

CATransform3D transform = CATransform3DMakeRotation(-M_PI_4, 1, 0, 0);
transform.m34 = -1.0/800.0; // some perspective
label.layer.transform = transform;

And here’s the result of applying the code to a square UILabel:

CALight, CAMeshTransform, and CATransform3D all working together

While the lighting looks a little bit cheesy, it certainly is impressive how easy it is to do quite complex effects.

CALight seems to have tuneable ambient, diffuse and specular intensities – a standard set of coefficients of Phong reflection model. Moreover, CALayer has corresponding surface reflectance properties. I played with these for a few minutes and I didn’t really get anywhere, but I cleaned-up the private headers so it should be much easier to test the lighting capabilities of Core Animation.

Private for a Reason

One of the most important reasons for keeping an API private is that it doesn’t have to be bullet proof and CAMeshTransform certainly is not. There are a few ways to get hurt.

To begin with, assigning 20 to subdivisionSteps property is probably the easiest way to programatically reboot your device. A set of memory warnings spilled into console is a clear indication of what’s going on. This is certainly annoying, but can be easily avoided – don’t touch the property or set it to 0.

If one of the faces you provide is degenerated, e.g. all of its indices point to the same vertex, you will hang your device. Everything will stop working, including the hardware buttons (!) and only a hard restart will help (long press home + power buttons). The framework doesn’t seem to be prepared for malformed input.

Why do these problems happen? It’s because of the backboardd – a process that’s, among other activities, acting as a render server for Core Animation. Technically, it’s not the app itself that makes the system crash, it’s the indirect misuse of one of the core components of iOS that causes all the troubles.

Missing Features

The idea of a general purpose mesh-transformable layer is complex enough that Core Animation team had to cut some corners and skip some of the potential features.

Core Animation allows mesh-transformed layers to have an alpha channel. Rendering semitransparent objects correctly is not a trivial problem. It’s usually done with a Painter’s algorithm. The z-sorting step is not hard to implement and indeed the code does seem to execute a radix sort call which is quite clever, since floats can be sorted with radix sort as well. However, it’s not enough to sort the triangles as some of them may overlap or intersect.

The usual solution to this problem is to divide the triangles so that all the edge cases are removed. This part of the algorithm seems to be not implemented. Granted, correct, good-looking meshes should rarely overlap in a tricky way, but sometimes it does happen and the mesh-transformed layer may look glitchy.

Another feature that’s been completely ignored is hit testing – the layer behaves as if it hasn’t been mesh-transformed at all. Since neither CALayer’s nor UIView’s hitTest: method are aware of mesh, the hit test area of all the controls will rarely match their visual representation:

Hit test area of an embedded UISwitch is not affected by mesh transform

The solution to this problem would be to shoot a ray through the space, figure out which triangle has been hit, project the hit point from the 3D space back into the 2D space of the layer and then do the regular hit testing. Doable, but not easy.

Replacing Private API

Taking into account all the drawbacks of CAMeshTransform one could argue it’s a faulty product. It’s not. It’s just amazing. It opens up the entire new spectrum of interaction and animation on iOS. It’s a breath of fresh air in a world of plain old transforms, fade-ins and blurs. I badly wanted to mesh-transform everything, but I can’t consciously rely on that many lines of private API calls. So I wrote an open-sourced and very closely matching replacement.

In the spirit of CAMeshTransform I created a BCMeshTransform which copies almost every feature of the original class. My intention was clear: if CAMeshTransform ever ships, you should be able to use the exact same mesh transforms on any CALayer and achieve extremely similar, if not exact, results. The only required step would be to find and replace BC class prefix with CA.

With a transform class in hand the only thing that’s missing is a target of a mesh transform. For this purpose I created a BCMeshTransformView, a UIView subclass that has a meshTransform property.

Without direct, public access to Core Animation render server I was forced to use OpenGL for my implementation. This is not a perfect solution as it introduces some drawbacks the original class didn’t have, but it seems to be the only currently available option.

A Few Tricks

When I was creating the classes I encountered a few challenges and it probably won’t hurt to discuss my solutions to these problems.

Animating with UIView Animation Block

It turns out it’s not that hard to write a custom animatable property of any class. David Rönnqvist has pointed out in his presentation on UIView animations that a CALayer asks its delegate (a UIView owning this layer) for an action when any of its animatable properties is set.

If we’re inside an animation block then UIView will return an animation as a result of an actionForKey:method call. With a CAAnimation in hand we can check its properties to figure out what animation parameters does the block based animation have.

My initial implementation looked like this:

1
2
3
4
5
6
7
8
9
10
- (void)setMeshTransform:(BCMeshTransform *)meshTransform
{
    CABasicAnimation *animation = [self actionForLayer:self.layer forKey:@"opacity"];
    if ([animation isKindOfClass:[CABasicAnimation class]]) {
      // we're inside an animation block
      NSTimeInterval duration = animation.duration;
      ...
    }
    ...
}

I quickly realized it was an invalid approach – the completion callback did not fire. When a block based animation is made, UIKit creates an instance of UIViewAnimationState and sets it as a delegate of any CAAnimation created within the block. What I suspect also happens, UIViewAnimationState waits for all the animations it owns to finish or get cancelled before firing the completion block. Since I was obtaining the animation just for the purpose of reading its properties, it hasn’t been added to any layer and thus it never finished.

The solution for this problem was much less complicated than I expected. I added a dummy view as a subview of BCMeshTransformView itself. Here’s the code I’m currently using:

1
2
3
4
5
6
7
8
9
10
11
12
- (void)setMeshTransform:(BCMeshTransform *)meshTransform
{
    [self.dummyAnimationView.layer removeAllAnimations];
    self.dummyAnimationView.layer.opacity = 1.0;
    self.dummyAnimationView.layer.opacity = 0.0;
    CAAnimation *animation = [self.dummyAnimationView.layer animationForKey:@"opacity"];

    if ([animation isKindOfClass:[CABasicAnimation class]]) {
      // we're inside UIView animation block
    }
    ...
}

The double opacity assignment is needed to ensure the property changes it value. The animation will not be added to a layer if it’s already in the destination state. Moreover, a layer has to be in view hierarchy of any UIWindow, otherwise its properties won’t get animated.

As for animating the meshes themselves, it’s possible to force Core Animation to interpolate any float values by packing them in NSNumber, shoving them into NSArray, implementing needsDisplayForKey: class method and responding to presentation layer changes inside setValue:forKey: method. While very convenient, this approach has some serious performance issues. Meshes with 25x25 faces were not animated with 60 FPS, even on the iPad Air. The cost of packing and unpacking is very high.

Instead of pursuing the Core Animation way, I used a very simple animation engine powered by CADisplayLink. This approach is much more performant, handling 100x100 faces with butter smooth 60 FPS. It’s not a perfect solution, we’re loosing many conveniences of CAAnimation, but I believe the 16x speed boost is worth the trouble.

Rendering Content

The core purpose of BCMeshTransformView is to display its mesh-transformed subviews. The view hierarchy has to be rendered into a texture before its submitted to OpenGL. The textured vertex mesh then gets displayed by GLKView which is the the main workhorse of BCMeshTransformView. This high level overview is straightforward, but it doesn’t mention the problem of snapshotting the subview hierarchy.

We don’t want to snapshot the GLKView itself as this would quickly create a mirror-tunnel like effect. On top of that, we don’t want to display the other subviews directly – they’re supposed to be visible inside the OpenGL world, not within the UIKit view hierarchy. They can’t be put beneath the GLKView as it has to be non opaque. To solve these issues I came up with a concept of a contentView, similarly to how UITableViewCell handles its user defined subviews. Here’s how a view hierarchy looks:

The view hierarchy of BCMeshTransformView

The contentView is embedded inside a containerView. The containerView has a frame of CGRectZero and clipsToBounds property set to YES, making it invisible to the user but still within the reach of BCMeshTransformView. Every subview that should get mesh-transformed must be added to contentView.

A content of contentView is rendered into a texture using drawViewHierarchyInRect:afterScreenUpdates:. The entire process of snapshotting and uploading texture is very fast, but unfortunately for larger views it takes more than 16 milliseconds. This is too much to render the hierarchy on every frame. Even though BCMeshTransformView automatically observes changes of its contentView subviews and re-renders the texture on its own, it doesn’t support animations inside the meshed subviews.

Final Words

Without a doubt, a mesh transform is a fantastic concept, yet it seems so unexplored in the world of interfaces. It certainly adds more playfulness to otherwise dull screens. In fact, you can experience mesh transforms today, on your iOS device, by launching Game Center and watching the subtle deformations of bubbles. This is CAMeshTransform working its magic.

I encourage you to check out the demo app I made for BCMeshTransformView. It contains a few ideas of how a mesh transform can be used to enrich interaction, like my very simple, but functional take on that famous Dribbble. For more inspiration on some sweet meshes, Experiments by Marcus Eckert is a great place to start.

I wholeheartedly hope BCMeshTransformView becomes obsolete on the first day of WWDC 2014. The Core Animation implementation of mesh transforms has more features and is more tightly integrated with the system. Although it currently doesn’t handle all the edge cases correctly, with a bit of polishing it surely could. Fingers crossed for June 2.

Exposing NSDictionary

Hash tables are just awesome. To this day I find it fascinating that one can fetch an object corresponding to an arbitrary key in a constant time. Although iOS 6.0 introduced an explicit hash table, it is NSDictionary that’s almost exclusively used for associative storage.

NSDictionary doesn’t make any promise of its internal implementation. It would make little sense for a dictionary to store its data in a completely random fashion. However, this assumption doesn’t answer the key question: does NSDictionary make use of a hash table? This is what I’ve decided to investigate.

Why not to tackle the full featured NSMutableDictionary? A mutable dictionary is, understandably, much more complex and the amount of disassembly I would have had to go through was terrifying. Regular NSDictionary still provided a nontrivial ARM64 deciphering challenge. Despite being immutable, the class has some very interesting implementation details which should make the following ride enjoyable.

This blog post has a companion repo which contains discussed pieces of code. While the entire investigation has been based on the iOS 7.1 SDK targeting a 64-bit device, neither iOS 7.0 nor 32-bit devices impact the findings.

The Class

Plenty of Foundation classes are class clusters and NSDictionary is no exception. For quite a long time NSDictionary used CFDictionary as its default implementation, however, starting with iOS 6.0 things have changed:

1
2
(lldb) po [[NSDictionary new] class]
__NSDictionaryI

Similarly to __NSArrayM, __NSDictionaryI rests within the CoreFoundation framework, in spite of being publicly presented as a part of Foundation. Running the library through class-dump generates the following ivar layout:

1
2
3
4
5
@interface __NSDictionaryI : NSDictionary
{
    NSUIngeter _used:58;
    NSUIngeter _szidx:6;
}

It’s surprisingly short. There doesn’t seem to be any pointer to either keys or objects storage. As we will soon see, __NSDictionary literally keeps its storage to itself.

The Storage

Instance Creation

To understand where __NSDictionaryI keeps its contents, let’s take a quick tour through the instance creation process. There is just one class method that’s responsible for spawning new instances of __NSDictionaryI. According to class-dump, the method has the following signature:

+ (id)__new:(const id *)arg1:(const id *)arg2:(unsigned long long)arg3:(_Bool)arg4:(_Bool)arg5;

It takes five arguments, of which only the first one is named. Seriously, if you were to use it in a @selector statement it would have a form of @selector(__new:::::). The first three arguments are easily inferred by setting a breakpoint on this method and peeking into the contents of x2, x3 and x4 registers which contain the array of keys, array of objects and number of keys (objects) respectively. Notice, that keys and objects arrays are swapped in comparison to the public facing API which takes a form of:

+ (instancetype)dictionaryWithObjects:(const id [])objects forKeys:(const id <NSCopying> [])keys count:(NSUInteger)cnt;

It doesn’t matter whether an argument is defined as const id * or const id [] since arrays decay into pointers when passed as function arguments.

With three arguments covered we’re left with the two unidentified boolean parameters. I’ve done some assembly digging with the following results: the fourth argument governs whether the keys should be copied, and the last one decides whether the arguments should not be retained. We can now rewrite the method with named parameters:

+ (id)__new:(const id *)keys :(const id *)objects :(unsigned long long)count :(_Bool)copyKeys :(_Bool)dontRetain;

Unfortunately, we don’t have explicit access to this private method, so by using the regular means of allocation the last two arguments are always set to YES and NO respectively. It is nonetheless interesting that __NSDictionaryI is capable of a more sophisticated keys and objects control.

Indexed ivars

Skimming through the disassembly of + __new::::: reveals that both malloc and calloc are nowhere to be found. Instead, the method calls into __CFAllocateObject2 passing the __NSDictionaryI class as first argument and requested storage size as a second. Stepping down into the sea of ARM64 shows that the first thing __CFAllocateObject2 does is call into class_createInstance with the exact same arguments.

Fortunately, at this point we have access to the source code of Objective-C runtime which makes further investigation much easier.

The class_createInstance(Class cls, size_t extraBytes) function merely calls into _class_createInstanceFromZone passing nil as a zone, but this is the final step of object allocation. While the function itself has many additional checks for different various circumstances, its gist can be covered with just three lines:

_class_createInstanceFromZone(Class cls, size_t extraBytes, void *zone)
{
    ...
    size_t size = cls->alignedInstanceSize() + extraBytes;
    ...
    id obj = (id)calloc(1, size);
    ...
    return obj;
}

The extraBytes argument couldn’t have been more descriptive. It’s literally the number of extra bytes that inflate the default instance size. As an added bonus, notice that it’s the calloc call that ensures all the ivars are zeroed out when the object gets allocated.

The indexed ivars section is nothing more than an additional space that sits at the end of regular ivars:

Allocating objects

Allocating space on its own doesn’t sound very thrilling so the runtime publishes an accessor:

void *object_getIndexedIvars(id obj)

There is no magic whatsoever in this function, it just returns a pointer to the beginning of indexed ivars section:

Indexed ivars section

There are few cool things about indexed ivars. First of all, each instance can have different amount of extra bytes dedicated to it. This is exactly the feature __NSDictionaryI makes use of.

Secondly, they provide faster access to the storage. It all comes down to being cache-friendly. Generally speaking, jumping to random memory locations (by dereferencing a pointer) can be expensive. Since the object has just been accessed (somebody has called a method on it), it’s very likely that its indexed ivars have landed in cache. By keeping everything that’s needed very close, the object can provide as good performance as possible.

Finally, indexed ivars can be used as a crude defensive measure to make object’s internals invisible to the utilities like class-dump. This is a very basic protection since a dedicated attacker can simply look for object_getIndexedIvars calls in the disassembly or randomly probe the instance past its regular ivars section to see what’s going on.

While powerful, indexed ivars come with two caveats. First of all, class_createInstance can’t be used under ARC, so you’ll have to compile some parts of your class with -fno-objc-arc flag to make it shine. Secondly, the runtime doesn’t keep the indexed ivar size information anywhere. Even though dealloc will clean everything up (as it calls free internally), you should keep the storage size somewhere, assuming you use variable number of extra bytes.

Looking for Key and Fetching Object

Analyzing Assembly

Although at this point we could poke the __NSDictionaryI instances to figure out how they work, the ultimate truth lies within the assembly. Instead of going through the entire wall of ARM64 we will discuss the equivalent Objective-C code instead.

The class itself implements very few methods, but I claim the most important is objectForKey: – this is what we’re going to discuss in more detail. Since I made the assembly analysis anyway, you can read it on a separate page. It’s dense, but the thorough pass should convince you the following code is more or less correct.

The C Code

Unfortunately, I don’t have access to the Apple’s code base, so the reverse-engineered code below is not identical to the original implementation. On the other hand, it seems to be working well and I’ve yet to find an edge case that behaves differently in comparison to the genuine method.

The following code is written from the perspective of __NSDictionaryI class:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
- (id)objectForKey:(id)aKey
{
    NSUInteger sizeIndex = _szidx;
    NSUInteger size = __NSDictionarySizes[sizeIndex];

    id *storage = (id *)object_getIndexedIvars(dict);

    NSUInteger fetchIndex = [aKey hash] % size;

    for (int i = 0; i < size; i++) {
        id fetchedKey = storage[2 * fetchIndex];

        if (fetchedKey == nil) {
            return nil;
        }

        if (fetchedKey == aKey || [fetchedKey isEqual:aKey]) {
            return storage[2 * fetchIndex + 1];
        }

        fetchIndex++;

        if (fetchIndex == size) {
            fetchIndex = 0;
        }
    }

    return nil;
}

When you take a closer look at the C code you might notice something strange about key fetching. It’s always taken from even offsets, while the returned object is at the very next index. This is the dead giveaway of __NSDictionaryI’s internal storage: it keeps keys and objects alternately:

Keys and objects are stored alternately

Update: Joan Lluch provided a very convincing explanation for this layout. The original code could use an array of very simple structs:

struct KeyObjectPair {
    id key;
    id object;
};

The objectForKey: method is very straightforward and I highly encourage you to walk through it in your head. It’s nonetheless worth pointing out a few things. First of all, the _szidx ivar is used as an index into the __NSDictionarySizes array, thus it most likely stands for “size index”.

Secondly, the only method called on the passed key is hash. The reminder of dividing key’s hash value by dictionary’s size is used to calculate the offset into the index ivars section.

If the key at the offset is nil, we simply return nil, the job is done:

When the key slot is empty, nil is returned

However, if the key at the offset is non nil, then the two cases may occur. If the keys are equal, then we return the adjacent object. If they’re not equal then the hash collision occurred and we have to keep looking further. __NSDictionaryI simply keeps looking until either match or nil is found:

Key found after one collision

This kind of searching is known as linear probing. Notice how __NSDictionaryI wraps the fetchIndex around when the storage end is hit. The for loop is there to limit the number of checks – if the storage was full and the loop condition was missing we’d keep looking forever.

__NSDictionarySizes & __NSDictionaryCapacities

We already know __NSDictionarySizes is some kind of array that stores different possible sizes of __NSDictionaryI. We can reason that it’s an array of NSUIntegers and indeed, if we ask Hopper to treat the values as 64-bit unsigned integers it suddenly makes a lot of sense:

                ___NSDictionarySizes:
0x00000000001577a8         dq         0x0000000000000000
0x00000000001577b0         dq         0x0000000000000003
0x00000000001577b8         dq         0x0000000000000007
0x00000000001577c0         dq         0x000000000000000d
0x00000000001577c8         dq         0x0000000000000017
0x00000000001577d0         dq         0x0000000000000029
0x00000000001577d8         dq         0x0000000000000047
0x00000000001577e0         dq         0x000000000000007f
...

In a more familiar decimal form it presents as a beautiful list of 64 primes starting with the following sequence: 0, 3, 7, 13, 23, 41, 71, 127. Notice, that those are not consecutive primes which begs the question: what’s the average ratio of the two neighboring numbers? It’s actually around 1.637 – a very close match to the 1.625 which was the growth factor for NSMutableArray. For details of why primes are used for the storage size this Stack Overflow answer is a good start.

We already know how much storage can __NSDictionaryI have, but how does it know which size index to pick on initialization? The answer lies within the previously mentioned + __new::::: class method. Converting some parts of the assembly back to C renders the following code:

1
2
3
4
5
6
7
8
9
10
int szidx;
for (szidx = 0; szidx < 64; szidx++) {
    if (__NSDictionaryCapacities[szidx] >= count) {
        break;
    }
}

if (szidx == 64) {
    goto fail;
}

The method looks linearly through __NSDictionaryCapacities array until count fits into the size. A quick glance in Hopper shows the contents of the array:

                ___NSDictionaryCapacities:
0x00000000001579b0         dq         0x0000000000000000
0x00000000001579b8         dq         0x0000000000000003
0x00000000001579c0         dq         0x0000000000000006
0x00000000001579c8         dq         0x000000000000000b
0x00000000001579d0         dq         0x0000000000000013
0x00000000001579d8         dq         0x0000000000000020
0x00000000001579e0         dq         0x0000000000000034
0x00000000001579e8         dq         0x0000000000000055
...

Converting to base-10 provides 0, 3, 6, 11, 19, 32, 52, 85 and so on. Notice that those are smaller numbers than the primes listed before. If you were to fit 32 key-value pairs into __NSDictionaryI it will allocate space for 41 pairs, conservatively saving quite a few empty slots. This helps reducing the number of hash collisions, keeping the fetching time as close to constant as possible. Apart from trivial case of 3 elements, __NSDictionaryI will never have its storage full, on average filling at most 62% of its space.

As a trivia, the last nonempty value of __NSDictionaryCapacities is 0x11089481C742 which is 18728548943682 in base-10. You’d have to try really hard not to fit in within the pairs count limit, at least on 64-bit architecture.

Non Exported Symbols

If you were to use __NSDictionarySizes in your code by declaring it as an extern array, you’d quickly realize it’s not that easy. The code wouldn’t compile due to a linker error – the __NSDictionarySizes symbol is undefined. Inspecting the CoreFoundation library with nm utility:

nm CoreFoundation | grep ___NSDictionarySizes

…clearly shows the symbols are there (for ARMv7, ARMv7s and ARM64 respectively):

00139c80 s ___NSDictionarySizes
0013ac80 s ___NSDictionarySizes
0000000000156f38 s ___NSDictionarySizes

Unfortunately the nm manual clearly states:

If the symbol is local (non-external), the symbol’s type is instead represented by the corresponding lowercase letter.

The symbols for __NSDictionarySizes are simply not exported – they’re intended for internal use of the library. I’ve done some research to figure out if it’s possible to link with non-exported symbols, but apparently it is not (please tell me if it is!). We can’t access them. That is to say, we can’t access them easily.

Sneaking in

Here’s an interesting observation: in both iOS 7.0 an 7.1 the kCFAbsoluteTimeIntervalSince1904 constant is laid out directly before __NSDictionarySizes:

                _kCFAbsoluteTimeIntervalSince1904:
0x00000000001577a0         dq         0x41e6ceaf20000000
                ___NSDictionarySizes:
0x00000000001577a8         dq         0x0000000000000000

The best thing about kCFAbsoluteTimeIntervalSince1904 is that it is exported! We’re going to add 8 bytes (size of double) to the address of this constant and reinterpret the result as pointer to NSUInteger:

NSUInteger *Explored__NSDictionarySizes = (NSUInteger *)((char *)&kCFAbsoluteTimeIntervalSince1904 + 8);

Then we can access its values by convenient indexing:

1
2
3
4
5
6
(lldb) p Explored__NSDictionarySizes[0]
(NSUInteger) $0 = 0
(lldb) p Explored__NSDictionarySizes[1]
(NSUInteger) $1 = 3
(lldb) p Explored__NSDictionarySizes[2]
(NSUInteger) $2 = 7

This hack is very fragile and will most likely break in the future, but this is the test project so it’s perfectly fine.

__NSDictionaryI Characteristics

Now that we’ve discovered the internal structure of __NSDictionaryI we can use this information to figure out why things work they way they work and what unforeseen consequences does the present implementation of __NSDictionaryI introduce.

Printout Code

To make our investigation a little bit easier we will create a helper NSDictionary category method that will print the contents of the instance

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
- (NSString *)explored_description
{
    assert([NSStringFromClass([self class]) isEqualToString:@"__NSDictionaryI"]);

    BCExploredDictionary *dict = (BCExploredDictionary *)self;

    NSUInteger count = dict->_used;
    NSUInteger sizeIndex = dict->_szidx;
    NSUInteger size = Explored__NSDictionarySizes[sizeIndex];

    __unsafe_unretained id *storage = (__unsafe_unretained id *)object_getIndexedIvars(dict);

    NSMutableString *description = [NSMutableString stringWithString:@"\n"];

    [description appendFormat:@"Count: %lu\n", (unsigned long)count];
    [description appendFormat:@"Size index: %lu\n", (unsigned long)sizeIndex];
    [description appendFormat:@"Size: %lu\n", (unsigned long)size];

    for (int i = 0; i < size; i++) {
        [description appendFormat:@"[%d] %@ - %@\n", i, [storage[2*i] description], [storage[2*i + 1] description]];
    }

    return description;
}

Order of keys/objects on enumeration is the same as order of keys/objects in storage

Let’s create a simple dictionary containing four values:

1
2
3
4
5
6
NSDictionary *dict = @{@1 : @"Value 1",
                       @2 : @"Value 2",
                       @3 : @"Value 3",
                       @4 : @"Value 4"};

NSLog(@"%@", [dict explored_description]);

The output of the explored description is:

Count: 4
Size index: 2
Size: 7
[0] (null) - (null)
[1] 3 - Value 3
[2] (null) - (null)
[3] 2 - Value 2
[4] (null) - (null)
[5] 1 - Value 1
[6] 4 - Value 4

With that in mind let’s do a quick enumeration over dictionary:

1
2
3
[dict enumerateKeysAndObjectsUsingBlock:^(id key, id obj, BOOL *stop) {
    NSLog(@"%@ - %@", key, obj);
}];

And the output:

3 - Value 3
2 - Value 2
1 - Value 1
4 - Value 4

Enumeration seems to simply walk through the storage, ignoring the nil keys and calling the block only for non-empty slots. This is also the case for fast enumeration, keyEnumerator, allKeys and allValues methods. It makes perfect sense. The NSDictionary is not ordered, so it doesn’t really matter in what sequence the keys and values are provided. Using the internal layout is the easiest and probably the fastest option.

If you mess up, __NSDictionaryI may return something for nil key!

Let’s consider an example. Imagine we’re building a simple 3D strategy game set in space. The entire universe is split into cube-like sectors that imaginary factions can fight over. A sector can be referenced by its i, j and k indexes. We shouldn’t use a 3D array to store the sectors info – the game space is huge and most of it is empty, so we would waste memory storing nil pointers. Instead, we’re going to use a sparse storage in a form of NSDictionary with a custom key class that will make it super easy to query if there is something at a given location.

Here’s the interface for key, a BC3DIndex class:

1
2
3
4
5
6
7
@interface BC3DIndex : NSObject <NSCopying>

@property (nonatomic, readonly) NSUInteger i, j, k; // you actually can do that

- (instancetype)initWithI:(NSUInteger)i j:(NSUInteger)j k:(NSUInteger)k;

@end

And its equally trivial implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
@implementation BC3DIndex

- (instancetype)initWithI:(NSUInteger)i j:(NSUInteger)j k:(NSUInteger)k
{
    self = [super init];
    if (self) {
        _i = i;
        _j = j;
        _k = k;
    }
    return self;
}

- (BOOL)isEqual:(BC3DIndex *)other
{
    return other.i == _i && other.j == _j && other.k == _k;
}

- (NSUInteger)hash
{
    return _i ^ _j ^ _k;
}

- (id)copyWithZone:(NSZone *)zone
{
    return self; // we're immutable so it's OK
}

@end

Notice how we’re being a good subclassing citizen: we implemented both isEqual: and hash methods and made sure that if two 3D-indexes are equal then their hash values are equal as well. The object equality requirements are fulfilled.

Here’s a trivia: what will the following code print?

1
2
3
4
5
NSDictionary *indexes = @{[[BC3DIndex alloc] initWithI:2 j:8 k:5] : @"A black hole!",
                          [[BC3DIndex alloc] initWithI:0 j:0 k:0] : @"Asteroids!",
                          [[BC3DIndex alloc] initWithI:4 j:3 k:4] : @"A planet!"};

NSLog(@"%@", [indexes objectForKey:nil]);

It should be (null) right? Nope:

Asteroids!

To investigate this further let’s grab a dictionary’s description:

Count: 3
Size index: 1
Size: 3
[0] <BC3DIndex: 0x17803d340> - A black hole!
[1] <BC3DIndex: 0x17803d360> - Asteroids!
[2] <BC3DIndex: 0x17803d380> - A planet!

It turns out __NSDictionaryI doesn’t check if the key passed into objectForKey: is nil (and I’d argue this is a good design decision). Calling hash method on nil returns 0, which causes the class to compare key at index 0 with nil. This is important: it is the stored key that executes the isEqual: method, not the passed in key.

The first comparison fails, since i index for “A black hole!” is 2 whereas for nil it’s zero. The keys are not equal which causes the dictionary to keep looking, hitting another stored key: the one for “Asteroids!”. This key has all three i, j, and k properties equal to 0 which is also what nil will return when asked for its properties (by the means of nil check inside objc_msgSend).

This is the crux of the problem. The isEqual: implementation of BC3DIndex may, under some conditions, return YES for nil comparison. As you can see, this is a very dangerous behavior that can mess things up easily. Always ensure that your object is not equal to nil.

A Helper Key Class

For the next two tests we’re going to craft a special key class that will have a configurable hash value and will print stuff to the console when executing hash and isEqual: method.

Here’s the interface:

1
2
3
4
5
6
7
@interface BCNastyKey : NSObject <NSCopying>

@property (nonatomic, readonly) NSUInteger hashValue;

+ (instancetype)keyWithHashValue:(NSUInteger)hashValue;

@end

And the implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
@implementation BCNastyKey

+ (instancetype)keyWithHashValue:(NSUInteger)hashValue
{
    return [[BCNastyKey alloc] initWithHashValue:hashValue];
}

- (instancetype)initWithHashValue:(NSUInteger)hashValue
{
    self = [super init];
    if (self) {
        _hashValue = hashValue;
    }
    return self;
}

- (id)copyWithZone:(NSZone *)zone
{
    return self;
}

- (NSUInteger)hash
{
    NSLog(@"Key %@ is asked for its hash", [self description]);

    return _hashValue;
}

- (BOOL)isEqual:(BCNastyKey *)object
{
    NSLog(@"Key %@ equality test with %@: %@", [self description], [object description], object == self ? @"YES" : @"NO");

    return object == self;
}

- (NSString *)description
{
    return [NSString stringWithFormat:@"(&:%p #:%lu)", self, (unsigned long)_hashValue];
}

@end

This key is awful: we’re only equal to self, but we’re returning an arbitrary hash. Notice that this doesn’t break the equality contract.

isEqual doesn’t have to be called to match the key

Let’s create a key and a dictionary:

1
2
BCNastyKey *key = [BCNastyKey keyWithHashValue:3];
NSDictionary *dict = @{key : @"Hello there!"};

The following call:

[dict objectForKey:key];

Prints this to the console:

Key (&:0x17800e240 #:3) is asked for its hash

As you can see, the isEqual: method has not been called. This is very cool! Since the vast majority of keys out there are NSString literals, they share the same address in the entire application. Even if the key is a very long literal string, the __NSDictionaryI won’t run the potentially time consuming isEqual: method unless it absolutely has to. And since 64-bit architectures introduced tagged pointers, some instances of NSNumber, NSDate and, apparently, NSIndexPath benefit from this optimization as well.

Worst case performance is linear

Let’s create a very simple test case:

1
2
3
4
5
6
7
8
9
BCNastyKey *targetKey = [BCNastyKey keyWithHashValue:36];

NSDictionary *b = @{[BCNastyKey keyWithHashValue:1] : @1,
                    [BCNastyKey keyWithHashValue:8] : @2,
                    [BCNastyKey keyWithHashValue:15] : @3,
                    [BCNastyKey keyWithHashValue:22] : @4,
                    [BCNastyKey keyWithHashValue:29] : @5,
                    targetKey : @6
                    };

A single killer line:

NSLog(@"Result: %@", [[b objectForKey:targetKey] description]);

Reveals the disaster:

Key (&:0x170017640 #:36) is asked for its hash
Key (&:0x170017670 #:1) equality test with (&:0x170017640 #:36): NO
Key (&:0x170017660 #:8) equality test with (&:0x170017640 #:36): NO
Key (&:0x170017680 #:15) equality test with (&:0x170017640 #:36): NO
Key (&:0x1700176e0 #:22) equality test with (&:0x170017640 #:36): NO
Key (&:0x170017760 #:29) equality test with (&:0x170017640 #:36): NO
Result: 6

This is extremely pathological case – every single key in the dictionary has ben equality tested. Even though each hash was different, it still collided with every other key, because the keys’ hashes were congruent modulo 7, which turned out to be the storage size of the dictionary.

As mentioned before, notice that the last isEqual: test is missing. The __NSDictionaryI simply compared the pointers and figured out it must be the same key.

Should you care for this linear time fetching? Absolutely not. I’m not that into probabilistic analysis of hash distribution, but you’d have to be extremely lucky for all your hashes to be modulo congruent to dictionary’s size. Some collisions here and there will always happen, this is the nature of hash tables, but you will probably never run into the linear time issue. That is, unless you mess up your hash function.

Final Words

I’m fascinated how simple the __NSDictionaryI turned out to be. Needless to say, the class certainly serves its purpose and there’s no need to make things excessively complex. For me, the most beautiful aspect of the implementation is the key-object-key-object layout. This is a brilliant idea.

If you were to take one tip from this article then I’d go with watching out for your hash and isEqual: methods. Granted, one rarely writes custom key classes to be used in a dictionary, but those rules apply to NSSet as well.

I’m aware that at some point in time NSDictionary will change and my findings will become obsolete. Internalizing the current implementation details may become a burden in the future, when the memorized assumptions will no longer apply. However, right here and right now it’s just so much fun to see how things work and hopefully you share my excitement.

Exposing NSMutableArray

I’ve always wondered how NSMutableArray works internally. Don’t get me wrong, immutable arrays certainly provide enormous benefits. Not only are they thread safe but also copying them is essentially free. It doesn’t change the fact that they’re quite dull – their contents can’t be modified. I find the actually memory manipulation details fascinating and this is why this article focuses on mutable arrays.

Since I more or less describe the complete process I used to investigate NSMutableArray, this post is fairly technical. There is an entire section discussing the ARM64 assembly, so if you find that boring then do not hesitate to skip it. Once we’re through the low level details, I present the non obvious characteristics of the class.

Implementation details of NSMutableArray are private for a reason. They’re subject to change at any time, both in terms of underlying subclasses and their ivar layouts, as well as underpinning algorithms and data structures. Regardless of those caveats, it’s worth peeking under the hood of NSMutableArray and figuring out how it works and what can we expect of it. The following study is based on iOS 7.0 SDK.

As usual, you can find the accompanying Xcode project on my GitHub.

The Problem of Plain Old C Arrays

Every self-respecting programmer knows how C arrays work. It boils down to a continuous segment of memory that can be easily read from and written into. While arrays and pointers are not the same (see Expert C Programming or this article), it’s not much of an abuse to consider “malloc-ed block of memory” as an array.

One of the most obvious drawbacks of using a linear memory is that inserting an element at index 0 requires shifting all the other elements via means of memmove:

Inserting to C array at index 0

Similarly, removing the first element requires shifting action as well, assuming one wants to keep the same memory pointer as an address of first element:

Removing from C array at index 0

For very large arrays this quickly becomes a problem. Obviously, direct pointer access is not necessarily the highest level of abstraction in the world of arrays. While C-style arrays are often useful, the daily bread and butter of Obj-C programmers wanting a mutable, indexed container is NSMutableArray.

NSMutableArray

Diving in

Even though Apple publishes source code for many libraries, Foundation and its NSMutableArray is not open sourced. However, there are some tools that make unrevealing its mysteries a little bit easier. We begin our journey at the highest possible level, reaching into the lower layers to get otherwise inaccessible details.

Getting & Dumping Class

NSMutableArray is a class cluster – its concrete implementations are actually subclasses of NSMutableArray itself. Instances of which class does actually +[NSMutableArray new] return? With LLDB we don’t even have to write any code to figure that out:

1
2
(lldb) po [[NSMutableArray new] class]
__NSArrayM

With class name in hand we reach for class-dump. This handy utility forges the class headers obtained by analyzing provided binaries. Using the following one-liner we can extract the ivar layout we’re interested in:

1
./class-dump --arch arm64 /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.0.sdk/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation | pcregrep -M "^[@\w\s]*__NSArrayM[\s\w:{;*]*}"

I suck at regular expressions so the one used above is probably not a masterpiece, but it provides fruitful results:

1
2
3
4
5
6
7
8
9
10
11
12
@interface __NSArrayM : NSMutableArray
{
    unsigned long long _used;
    unsigned long long _doHardRetain:1;
    unsigned long long _doWeakAccess:1;
    unsigned long long _size:62;
    unsigned long long _hasObjects:1;
    unsigned long long _hasStrongReferences:1;
    unsigned long long _offset:62;
    unsigned long long _mutations;
    id *_list;
}

The bitfields in the original output are typed with unsigned int, but obviously you can’t fit 62 bits into 32-bit integer – class-dump hasn’t been patched yet to parse ARM64 libraries correctly. Despite those minor flaws, one can tell quite a lot about the class just by looking at its ivars.

Disassembling Class

The most important tool in my investigation was Hopper. I’m in love with this disassembler. It’s an essential tool for curious souls who have to know how everything works. One of the best features of Hopper is that it generates C-like pseudocode that very often is clear enough to grasp the gist of the implementation.

The crucial method for understanding __NSArrayM is - objectAtIndex:. While Hopper does great job providing pseudocode for ARMv7, this feature doesn’t work for ARM64 yet. I figured it would be an excellent exercise to do this manually, with some hints from its ARMv7 counterpart.

Dissecting the Method

With ARMv8 Instruction Set Overview (registration required) in one hand and a bunch of educated guesses in another I think I’ve deciphered the assembly correctly. However, you should not treat the following analysis as an ultimate source of wisdom. I’m new at this.

Passing Arguments

As a start point let’s note that every Obj-C method is actually a C function with two additional parameters. The first one is self which is a pointer to the object being receiver of the method call. The second one is _cmd which represents the current selector.

One might say that the equivalent C-style declaration of the - objectAtIndex: function is:

id objectAtIndex(NSArray *self, SEL _cmd, NSUInteger index);

Since on ARM64 these types of parameters are passed in the consecutive registers, we can expect the self pointer to be in x0 register, the _cmd in x1 register and the index of object in x2 register. For details of parameter passing please reference the ARM Procedure Call Standard, noting that Apple’s iOS version has some divergences.

Analizing Assembly

This will look scary. Since analyzing a large chunk of assembly at once is not sane, we’re going to walk through the following code step by step, figuring out what each line does.

0xc2d4         stp        x29, x30, [sp, #0xfffffff0]!
0xc2d8         mov        x29, sp
0xc2dc         sub        sp, sp, #0x20
0xc2e0         adrp       x8, #0x1d1000
0xc2e4         ldrsw      x8, [x8, #0x2c]
0xc2e8         ldr        x8, [x0, x8]
0xc2ec         cmp        x8, x2
0xc2f0         b.ls       0xc33c

0xc2f4         adrp       x8, #0x1d1000
0xc2f8         ldrsw      x8, [x8, #0x30]
0xc2fc         ldr        x8, [x0, x8]
0xc300         lsr        x8, x8, #0x2
0xc304         adrp       x9, #0x1d1000
0xc308         ldrsw      x9, [x9, #0x34]
0xc30c         ldr        x9, [x0, x9]
0xc310         add        x9, x2, x9, lsr #2
0xc314         cmp        x8, x9
0xc318         csel       x8, xzr, x8, hi
0xc31c         sub        x8, x9, x8
0xc320         adrp       x9, #0x1d1000
0xc324         ldrsw      x9, [x9, #0x38]
0xc328         ldr        x9, [x0, x9]
0xc32c         ldr        x0, [x9, x8, lsl #3]
0xc330         mov        sp, x29
0xc334         ldp        x29, x30, [sp], #0x10
0xc338         ret

The Setup

We begin with what seems to be an ARM64 function prologue. We’re saving x29 and x30 registers on stack then we move the current stack pointer to x29 register:

0xc2d4         stp        x29, x30, [sp, #0xfffffff0]!
0xc2d8         mov        x29, sp

We make some room on the stack (subtracting, since stack grows downwards):

0xc2dc         sub        sp, sp, #0x20

The path code we’re interested in doesn’t seem to be using this space. However, the “out of bounds” exception throwing code does call some other functions thus the prologue has to facilitate both options.

Fetching Count

The next two lines perform program counter relative addressing. The specifics of addresses encoding are quite complicated and the literature is scarce, however, Hopper automatically calculates the more sane offsets:

0xc2e0         adrp       x8, #0x1d1000
0xc2e4         ldrsw      x8, [x8, #0x2c]

The above two lines will fetch contents of memory located at 0x1d102c and store it into x8 register. What’s over there? Hopper is kind enough to help us:

_OBJC_IVAR_$___NSArrayM._used:
0x1d102c         dd         0x00000008

This is the offset of _used ivar inside the __NSArrayM class. Why go through the trouble of additional fetch, instead of simply putting the value 8 into the assembly? It’s because of fragile base class problem. The modern Objective-C runtime handles this by giving itself an option to overwrite the value at 0x1d102c (and all the other ivar offsets for that matter). If either NSObject or NSArray or NSMutableArray adds new ivars, the old binaries will still work.

The runtime can modify the ivars’ offsets dynamically without breaking compatibility

While the CPU has to do an additional memory fetch, this is a beautiful solution as explained in more details on Hamster Emporium and Cocoa with Love.

At this point we know the offset of _used within the class. And since the Obj-C objects are nothing more than structs, and we’ve got pointer to this struct in x0, all we have to do is fetch the value:

0xc2e8         ldr        x8, [x0, x8]

The C equivalent of the above code would be:

unsigned long long newX8 = *(unsigned long long *)((char *)(__bridge void *)self + x8);

I prefer the assembly version. Quick analysis of disassembled - count method of __NSArrayM reveals that the _used ivar contains the number of elements in __NSArrayM, and as of this moment we have this value in x8 register.

Checking Bounds

Having requested index in x2 and count in x8 the code compares the two:

0xc2ec         cmp        x8, x2
0xc2f0         b.ls       0xc33c

When the value of x8 is lower or same as x2 then we jump to the code at 0xc33c, which handles the exception throwing. This is essentially the bounds checking. If we fail the test (count is lower or equal to index), we throw exception. I’m not going to discuss those parts of disassembly since they don’t really introduce anything new. If we pass the test (count is greater than index) then we simply keep executing instructions in order.

Calculating Memory Offset

We’ve seen this before, this time we fetch the offset of _size ivar located at 0x1d1030:

0xc2f4         adrp       x8, #0x1d1000
0xc2f8         ldrsw      x8, [x8, #0x30]

Then we retrieve its contents and shift it to the right by two bits:

0xc2fc         ldr        x8, [x0, x8]
0xc300         lsr        x8, x8, #0x2

What’s the shift for? Let’s take a look at the dumped header:

unsigned long long _doHardRetain:1;
unsigned long long _doWeakAccess:1;
unsigned long long _size:62;

It turns out, all three bitfields share the same storage, so to get the actual value of _size, we have to shift value to the right, discarding the bits for _doHardRetain and _doWeakAccess. Ivar offsets for _doHardRetain and _doWeakAccess are exactly the same, but their bit accessing code is obviously different.

Going further it’s the same drill, we get the contents of _offset ivar (at 0x1d1034) into x9 register:

0xc304         adrp       x9, #0x1d1000
0xc308         ldrsw      x9, [x9, #0x34]
0xc30c         ldr        x9, [x0, x9]

In the next line we add the requested index stored in x2 to the 2-bits-right-shifted _offset (it’s also 62 bits bitfield) and then we store it all back in x9. Isn’t assembly just amazing?

0xc310         add        x9, x2, x9, lsr #2

The next three lines are the most important ones. First of all, we compare the _size (in x8) with _offset + index (in x9)

0xc314         cmp        x8, x9

Then we conditionally select value of one register based on the result of previous comparison.

0xc318         csel       x8, xzr, x8, hi

This is more or less equal to ?: operator in C:

x8 = hi ? xzr : x8;      // csel       x8, xzr, x8, hi

The xzr register is a zero register which contains value 0, and the hi is the name of condition code the csel instruction should inspect. In this case we’re checking whether the result of comparison was higher (if value of x8 was greater than that of x9).

Finally we subtract the new value of x8 from the _offset + index (in x9) and we store it in x8 again

0xc31c         sub        x8, x9, x8

So what just happened? First of all let’s look into equivalent C code:

int tempIndex = _offset + index;        // add        x9, x2, x9, lsr #2
BOOL isInRange = _size > tempIndex;     // cmp        x8, x9
int diff = isInRange ? 0 : _size;       // csel       x8, xzr, x8, hi
int fetchIndex = tempIndex - diff;      // sub        x8, x9, x8

In the C code we don’t have to shift neither _size nor _offset to the right, since compiler does this automatically for bitfields access.

Getting the Data

We’re almost there. Let’s fetch the contents of _list ivar (0x1d1038) into x9 register:

0xc320         adrp       x9, #0x1d1000
0xc324         ldrsw      x9, [x9, #0x38]
0xc328         ldr        x9, [x0, x9]

At this moment the x9 points to the beginning of memory segment containing the data.

Finally, shift the value of fetch index stored in x8 to the left by 3 bits, add it to the x9 and put the contents of memory at that location into x0

0xc32c         ldr        x0, [x9, x8, lsl #3]

Two things are important here. First of all, every data offset is in bytes. Shifting a value to the left by 3 is equivalent to multiplying it by 8, which is the size of pointer on a 64-bit architecture. Secondly, the results goes into x0 which is the register which stores the return value of a function returning NSUInteger.

At this point we are done. We have fetched the correct value stored in array.

Function Epilog

There rest are some boilerplate operations that restore the state of registers and stack pointer before the call. We’re reversing the function’s prologue and returning:

0xc330         mov        sp, x29
0xc334         ldp        x29, x30, [sp], #0x10
0xc338         ret

Putting it all together

I explained what this code does, but the question we will now answer is why?

The Meaning of ivars

Let’s do a quick summary of what each ivar means:

  • _used is count
  • _list is pointer to buffer
  • _size is size of the buffer
  • _offset is index of first element of array in buffer

The C code

With ivars in mind and the disassembly analyzed we now can write an equivalent Objective-C code that performs the very same actions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
- (id)objectAtIndex:(NSUInteger)index
{
    if (_used <= index) {
        goto ThrowException;
    }

    NSUInteger fetchOffset = _offset + index;
    NSUInteger realOffset = fetchOffset - (_size > fetchOffset ? 0 : _size);

    return _list[realOffset];

ThrowException:
    // exception throwing code
}

Assembly was definitely much more long-wided.

Memory Layout

The most crucial part is deciding whether the realOffset should be equal to fetchOffset (subtracting zero) or fetchOffset minus _size. Since staring at dry code doesn’t necessarily paint the perfect picture, let’s consider two examples of how object fetching works.

_size > fetchOffset

In this example, the offset is relatively low:

A simple example

To fetch object at index 0 we calculate the fetchOffset as 3 + 0. Since _size is larger than fetchOffset the realOffset is equal to 3 as well. The code returns value of _list[3]. Fetching object at index 4 makes fetchOffset equal to 3 + 4 and the code returns _list[7].

_size <= fetchOffset

What happens when offset is large?

A more difficult example

Fetching object at index 0 makes fetchOffset equal to 7 + 0 and the call returns _list[7] as expected. However, fetching object at index 4 makes fetchOffset equal to 7 + 4 = 11, which is larger than _size. Obtaining realOffset requires subtracting _size value from fetchOffset which makes it 11 - 10 = 1 and the method returns _list[1].

We’re essentially doing modulo arithmethic, circling back to the other end of the buffer when crossing the buffer boundary.

Data Structure

As you might have guessed, __NSArrayM makes use of circular buffer. This data structure is extremely simple, but a little bit more sophisticated than regular array/buffer. The contents of circular buffer can wrap around when either end is reached.

Circular buffer has some very cool properties. Notably, unless the buffer is full, insertion/deletion from either end doesn’t require any memory to be moved. Let’s analyze how the class utilizes circular buffer to be superior in its behavior in comparison to C array.

__NSArrayM Characteristics

While reverse-engineering the rest of the disassembled methods would provide the definite explanation of __NSArrayM internals, we can employ the discovered points of data to investigate the class at a higher level.

Inspecting at Runtime

To inspect __NSArrayM at runtime we can’t simply paste in the dumped header. First of all, the test application won’t link without providing at least empty @implementation block for the __NSArrayM class. Adding this @implementation block is probably not a good idea. While the app builds and actually runs, I’m not entirely sure how does the runtime decide which class to use (if you do know, please let me know). For the sake of safety I’ve renamed the class name to something unique – BCExploredMutableArray.

Secondly, ARC won’t let us compile the id *_list ivar without specifying its ownership. We’re not going to write into the ivar, so prepending id with __unsafe_unretained should provide the least interference with memory mangement. However, I’ve chosen to declare the ivar as void **_list and it will soon become clear why.

Printout code

We can create a category on NSMutableArray that will print the contents of ivars as wells as the list of all the pointers contained within the array:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
- (NSString *)explored_description
{
    assert([NSStringFromClass([self class]) isEqualToString:@"__NSArrayM"]);

    BCExploredMutableArray *array = (BCExploredMutableArray *)self;

    NSUInteger size = array->_size;
    NSUInteger offset = array->_offset;

    NSMutableString *description = [NSMutableString stringWithString:@"\n"];

    [description appendFormat:@"Size: %lu\n", (unsigned long)size];
    [description appendFormat:@"Count: %llu\n", (unsigned long long)array->_used];
    [description appendFormat:@"Offset: %lu\n", (unsigned long)offset];
    [description appendFormat:@"Storage: %p\n", array->_list];

    for (int i = 0; i < size; i++) {
        [description appendFormat:@"[%d] %p\n", i, array->_list[i]];
    }

    return description;
}

The Results

Insertion and deletion is fast at both ends

Let’s consider a very simple example:

1
2
3
4
5
6
7
8
9
10
NSMutableArray *array = [NSMutableArray array];

for (int i = 0; i < 5; i++) {
    [array addObject:@(i)];
}

[array removeObjectAtIndex:0];
[array removeObjectAtIndex:0];

NSLog(@"%@", [array explored_description]);

The output shows that removing object at index 0 twice simply clears the pointers and moves the _offset ivar accordingly:

1
2
3
4
5
6
7
8
9
10
Size: 6
Count: 3
Offset: 2
Storage: 0x178245ca0
[0] 0x0
[1] 0x0
[2] 0xb000000000000022
[3] 0xb000000000000032
[4] 0xb000000000000042
[5] 0x0

Here’s the visual interpretation of what’s going on:

Removing object at index 0 twice

What about adding? Let’s run another test on a brand new array:

1
2
3
4
5
6
NSMutableArray *array = [NSMutableArray array];

for (int i = 0; i < 4; i++) {
    [array addObject:@(i)];
}
[array insertObject:@(15) atIndex:0];

Inserting object at index 0 uses the circular buffer magic to put the newly inserted object at the end of the buffer:

1
2
3
4
5
6
7
8
9
10
Size: 6
Count: 5
Offset: 5
Storage: 0x17004a560
[0] 0xb000000000000002
[1] 0xb000000000000012
[2] 0xb000000000000022
[3] 0xb000000000000032
[4] 0x0
[5] 0xb0000000000000f2

And visually:

Adding object at index 0

This is fantastic news! It means that __NSArrayM can be processed from either side. You can use __NSArrayM as either stack or queue without any performance hits.

On a side note, you can see how on 64-bit architectures NSNumber uses tagged pointers to do its storage.

Non integral growth factor

OK, for this one I’ve cheated a little bit. While I’ve done some empirical testing as well, I wanted to have the exact value and I’ve peeked into the disassembly of insertObject:atIndex:. Whenever the buffer gets full, it is reallocated with 1.625 times larger size. I was quite astonished for it not to be equal to 2.

Update: Mike Curtiss has provided a very good explanation of why making resizing factor equal to 2 is suboptimal.

Once grown, doesn’t shrink

This is a shocker – __NSArrayM never reduces its size! Let’s run the following test code:

1
2
3
4
5
6
NSMutableArray *array = [NSMutableArray array];

for (int i = 0; i < 10000; i++) {
    [array addObject:[NSObject new]];
}
[array removeAllObjects];

Even though at this point the array is empty, it still keeps the large buffer:

1
Size: 14336

This is not something you should worry about unless you use the NSMutableArray to load in some enormous amounts of data and then clear the array with the intention of freeing the space.

Initial capacity almost doesn’t matter

Let’s allocate new arrays with initial capacity set to consecutive powers of two:

1
2
3
for (int i = 0; i < 16; i++) {
    NSLog(@"%@", [[[NSMutableArray alloc] initWithCapacity:1 << i] explored_description]);
}

Surprise surprise:

1
2
3
4
5
6
7
8
9
Size:2     // requested capacity - 1
Size:2     // requested capacity - 2
Size:4     // requested capacity - 4
Size:8     // requested capacity - 8
Size:16    // requested capacity - 16
Size:16    // requested capacity - 32
Size:16    // requested capacity - 64
Size:16    // requested capacity - 128
... // Size:16 all the way down

It doesn’t clean up its pointers on deletion

This is barely important, but I found it nonetheless interesting:

1
2
3
4
5
6
7
8
NSMutableArray *array = [NSMutableArray array];

for (int i = 0; i < 6; i++) {
    [array addObject:@(i)];
}
[array removeObjectAtIndex:1];
[array removeObjectAtIndex:1];
[array removeObjectAtIndex:1];

And the output:

1
2
3
4
5
6
7
8
9
10
Size: 6
Count: 3
Offset: 3
Storage: 0x17805be10
[0] 0xb000000000000002
[1] 0xb000000000000002
[2] 0xb000000000000002
[3] 0xb000000000000002
[4] 0xb000000000000042
[5] 0xb000000000000052

The __NSArrayM doesn’t bother clearing previous space when moving its objects forward. However, the objects do get deallocated. It’s not the NSNumber doing its magic either, NSObject acts correspondingly.

This explains why I’ve opted for defining _list ivar as void **. If the _list was declared as id * then the following loop would crash on the object assignment:

1
2
3
4
for (int i = 0; i < size; i++) {
    id object = array->_list[i];
    NSLog("%p", object);
}

ARC implicitly inserts a retain/release pair and it accesses deallocated objects. While prepending id object with __unsafe_unretained fixes the problem, I absolutely didn’t want anything/anyone to call any methods on this bunch of stray pointers. That’s my void ** reasoning.

Worst case scenario is adding/removing from the middle

In these two examples we will remove elements from roughly the middle of the array:

1
2
3
4
5
6
NSMutableArray *array = [NSMutableArray array];

for (int i = 0; i < 6; i++) {
    [array addObject:@(i)];
}
[array removeObjectAtIndex:3];

In output we see that the top part shifts down, where down are lower indexes (notice the stray pointer at [5])

1
2
3
4
5
6
[0] 0xb000000000000002
[1] 0xb000000000000012
[2] 0xb000000000000022
[3] 0xb000000000000042
[4] 0xb000000000000052
[5] 0xb000000000000052
Removing object at index 3

However, when we call [array removeObjectAtIndex:2] instead, the bottom part shifts up, where up are higher indexes:

1
2
3
4
5
6
[0] 0xb000000000000002
[1] 0xb000000000000002
[2] 0xb000000000000012
[3] 0xb000000000000032
[4] 0xb000000000000042
[5] 0xb000000000000052
Removing object at index 2

Inserting object in the middle has very similar results. The reasonable explanation is __NSArrayM tries to minimize the amount of memory moved, thus it will move at most half of its elements.

Being a Good Subclassing Citizen

As disccussed in NSMutableArray Class Reference every NSMutableArray subclass must implement the following seven methods:

  • - count
  • - objectAtIndex:
  • - insertObject:atIndex:
  • - removeObjectAtIndex:
  • - addObject:
  • - removeLastObject
  • - replaceObjectAtIndex:withObject:

Unsurprisingly, __NSArrayM fulfills this requirement. However, the list of all the methods implemented by __NSArrayM is quite short and doesn’t contain 21 additional methods listed in NSMutableArray header. Who’s responsible for executing those methods?

It turns out they’re all part of the NSMutableArray class itself. This is extremely convenient – any subclass of NSMutableArray can implement only the seven most rudimentary methods. All the other higher-level abstractions are built on top of them. For instance - removeAllObjects method simply iterates backwards, calling - removeObjectAtIndex: one by one. Here’s pseudo code:

1
2
3
4
5
6
7
8
9
10
11
12
// we actually know this is safe, since count is stored on 62 bits
// and casting to NSInteger will *not* overflow
NSInteger count = (NSInteger)[self count];
if (count == 0) {
  return;
}

count--;
do {
  [self removeObjectAtIndex:count];
  count--;
} while (count >= 0);

However, where it makes sense __NSArrayM does reimplement some of its superclasses’ methods. For instance, while NSArray provides the default implementation for - countByEnumeratingWithState:objects:count: method of NSFastEnumeration protocol, __NSArrayM has its own code-path as well. Knowing its internal storage __NSArrayM can provide much more efficient implementation.

Foundations

I’ve always had this idea of Foundation being a thin wrapper on CoreFoundation. My argument was simple – there is no need to reinvent the wheel with brand new implementations of NS* classes when the CF* counterparts are available. I was shocked to realize neither NSArray nor NSMutableArray have anything in common with CFArray.

CFArray

The best thing about CFArray is that it’s open sourced. This will be a very quick overview, since the source code is publicly available, eagerly waiting to be read. The most important function in CFArray is _CFArrayReplaceValues. It’s called by:

Basically, CFArray moves the memory around to accommodate the changes in the most efficient fashion, similarly to how __NSArrayM does its job. However, the CFArray does not use a circular buffer! Instead it has a larger buffer padded with zeros from both ends which makes enumeration and fetching the correct object much easier. Adding elements at either end simply eats up the remaining padding.

Final Words

Even though CFArray has to serve slightly more general purposes, I find it fascinating that its internals don’t work the same way as __NSArrayM does. While I suppose it would make sense to find common ground and make a single, canonical implementation, perhaps there are some other factors that influenced the separation.

What those two have in common? They’re the concrete implementation of abstract data type known as deque. Despite its name, NSMutableArray is an array on steroids, deprived of C-style counterpart’s drawbacks.

Personally, I’m most delighted with constant-time performance of insertion/deletion from either end. I no longer have to question myself using NSMutableArray as queue. It works perfectly fine.