Thursday, January 9, 2020

Remote iPhone Exploitation Part 2: Bringing Light into the Darkness -- a Remote ASLR Bypass

Posted by Samuel Groß, Project Zero

This post is the second in a series about a remote, interactionless iPhone exploit over iMessage. The first blog post, which introduced the exploited vulnerability, can be found here.

The initial primitive gained from the vulnerability is an absolute address dereference in which the read value is afterwards used as an ObjC object. As such, some knowledge of the target address space is required in order to exploit this vulnerability for remote code execution. This blog post describes a way to defeat ASLR remotely without any additional information disclosure vulnerabilities. 

First off, the effectiveness of an old technique, heap spraying, is evaluated. Afterwards, a technique is described through which it is possible to infer the base address of the dyld shared cache region given only a memory corruption bug. The released code implements the presented attack and can infer the shared cache base address remotely on vulnerable devices within a couple of minutes.

Heap Spraying on iOS

The goal of heap spraying is to put data of the attacker’s choosing at a known address so that it can be referenced during the exploitation process. Heap spraying should be less effective today than it was 15 years ago due to ASLR and a 64-bit address space. However, it turns out that on iOS the technique is still fairly usable. In fact, only about 256MB of data need to be sprayed to put controlled data at a known address. This can easily be demonstrated with the following code snippet:

    const size_t size = 0x4000;
    const size_t count = (256 * 1024 * 1024) / size;
    for (size_t i = 0; i < count; i++) {
        // Allocate a chunk and mark its first word; the chunks are never freed
        int* chunk = malloc(size);
        *chunk = 0x41414141;
    }
    // Now look at the memory at 0x110000000

Running this on iOS (e.g. as part of a custom App) will put 0x41414141 at address 0x110000000:

(lldb) x/gx 0x110000000
0x110000000: 0x0000000041414141

Heap Spraying via iMessage

The next question then is how to best spray a few hundred megabytes of data remotely over iMessage. Basically, there are two ways to do that:

  1. By abusing a memory leak (not an information leak!), a bug in which a chunk of memory is “forgotten” and never freed, and triggering it multiple times until the desired amount of memory has been leaked.
  2. By finding and abusing an “amplification gadget”: a piece of code that takes an existing chunk of data and copies it, potentially multiple times, thus allowing the attacker to spray a large amount of memory by only sending a relatively small number of bytes.

As it turns out, the NSKeyedUnarchiver API provides both primitives and this exploit actually combines the two: it sends messages of around 100kB, which seems to be the maximum size of inline data, and with that sprays around 32MB of heap data. It then leaks those 32MB and repeats the procedure a couple of times to perform the full heap spray.

However, it is likely possible to perform the entire spray in a single message by using downloaded attachments (the pbdi key). That way the memory leak wouldn’t be necessary and the bug could be triggered multiple times if necessary. This is left as an exercise for the reader.

There are many places in the NSKeyedUnarchiver subsystem where memory is leaked. One example is cyclic object graphs, which will never be cleaned up as the objects keep each other alive through a reference cycle. There is also another way: __NSKeyedCoderOldStyleArrays, which are used during decoding of NSValues, and can, due to a quirk in the NSKeyedUnarchiver, always be deserialized regardless of the set of allowed classes. A __NSKeyedCoderOldStyleArray stores its size and a pointer to a malloced chunk containing a number of same-type values, e.g. integers, C strings, or ObjC objects. Interestingly, when the __NSKeyedCoderOldStyleArray is destroyed, it only frees its backing memory but does not recursively free the objects contained in it. As such, if the array contains pointers to other memory chunks, such as ObjC objects, memory is leaked. This is normally fine, as a __NSKeyedCoderOldStyleArray is only used to temporarily decode the array values of an NSValue. However, as it can also be decoded as a standalone object, it becomes possible to leak a large number of ObjC objects with it.

As for the amplification gadget, ACZeroingString instances appear to be the perfect candidate, having already been (ab)used in Natalie’s exploit for CVE-2019-8646. As part of its initWithCoder, an ACZeroingString will take an existing NSData object and copy its content into a newly malloc’ed memory chunk.

With that, the object graph to spray around 32MB of heap data looks like this:
Here, each ACZeroingString instance copies the NSData’s content into a new malloc buffer and the __NSKeyedCoderOldStyleArray keeps all those ACZeroingString instances alive even after the current message has been fully processed. After sending around 8 of these messages, controlled data will be located at 0x110000000.
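As a rough sanity check, the approximate figures quoted above can be combined into an amplification factor and a message count. This is only back-of-the-envelope arithmetic; the constant names are illustrative and the sizes are the rounded values from the text, not measurements.

```python
# Back-of-the-envelope numbers for the spray. All inputs are the approximate
# figures quoted in the text; the variable names are illustrative only.
MESSAGE_SIZE = 100 * 1024             # ~100kB of inline data per message
SPRAY_PER_MESSAGE = 32 * 1024 * 1024  # ~32MB sprayed (and leaked) per message
TARGET_SPRAY = 256 * 1024 * 1024      # ~256MB needed to reach 0x110000000

amplification = SPRAY_PER_MESSAGE / MESSAGE_SIZE
messages_needed = TARGET_SPRAY // SPRAY_PER_MESSAGE

print(f"amplification factor: ~{amplification:.0f}x")
print(f"messages needed: {messages_needed}")
```

Under these assumptions each sent byte turns into a few hundred sprayed bytes, and the message count agrees with the "around 8" stated above.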

Having controlled data at a known address is a first step, but likely not enough. Given the existing exploit primitive, it would now be possible to create fake ObjC objects in the heapspray region and have them be used in [NSSharedKeySet indexForKey:]. However, as the heapspray content is not executable, it is necessary to know the address of code pages before faking objects really becomes possible. How this can be achieved will be discussed next, after a short introduction to some ObjC internals.

Objective-C for the ASLR Bypasser

Shown next is the relevant code snippet from [NSSharedKeySet indexForKey:] in which the read from a controlled address happens:

      // index is fully controlled, _keys is nullptr
      id candidate = self->_keys[index];
      if (candidate != nil) {
        if ([key isEqual:candidate]) {
          return prevLength + index;
        }
      }

The id type that is used here represents a reference to an ObjC object for which no more type information is available, not unlike a void* in C. However, unlike C, ObjC has runtime type information, so it is always possible to determine the exact type of an object at runtime, for example through the isKindOfClass method. 
Further, ObjC supports pointer tagging and so a pointer to an ObjC object, for example an id, can generally be one of two things:
  1. An actual pointer to an ObjC object
  2. A pointer-sized value containing both the type and value information

The layout of objects will be discussed in the next post, where it will be relevant for gaining code execution. However, for this blog post it is already necessary to take a closer look at ObjC tagged pointers.

NSNumbers and very short NSStrings are examples of tagged pointers. On Arm64, an id is considered a tagged pointer if its MSB is set. In that case, the corresponding class is stored in a global table, and an index into that table is encoded in the tagged pointer. With that, an NSNumber instance storing a 32-bit integer (in ObjC: `NSNumber* n = @42`) would roughly be represented as

(1 011 00...001010100010)

Here, the MSB is 1, indicating a tagged pointer, and the following 3 bits identify the index of the class, in this case 3, corresponding to __NSCFNumber. In the case of __NSCFNumber, the 32-bit value is stored in bits 8..40 while the lowest byte indicates the specific number type, in this case kCFNumberSInt32Type.

The APIs operating on ObjC id’s (objc_msgSend, objc_retain, objc_xyz) will usually first check the tag bit and proceed accordingly:

    if (arg & 0x8000000000000000) {
        // handle tagged pointer
    } else {
        // handle real pointer
    }

Likely in order to break exploit techniques that abuse tagged pointers, tagged pointer values are now also XORed with a per-process random value. As such, the actual value used at runtime would now be

0xb0000000000002a2 ^ objc_debug_taggedpointer_obfuscator

Where objc_debug_taggedpointer_obfuscator seems to be a fully random number except for the MSB, which must be zero (to preserve the pointer tagging bit). The following experiment with lldb and a simple iOS app demonstrates this:

(lldb) p n
(__NSCFNumber *) $0 = 0xf460034a00975a82 (int)42
(lldb) p objc_debug_taggedpointer_obfuscator
(void *) $1 = 0x4460034a00975820
(lldb) p/x (uintptr_t)n ^ (uintptr_t)objc_debug_taggedpointer_obfuscator
(unsigned long) $9 = 0xb0000000000002a2
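The same decoding can be reproduced offline. The following is a minimal Python sketch that uses only the two values from the lldb session above; the function name `decode_tagged_pointer` is made up for illustration and is not part of the ObjC runtime.

```python
# Values taken from the lldb session above.
TAG_BIT = 1 << 63

def decode_tagged_pointer(raw, obfuscator):
    """Return (is_tagged, class_index, plain_value) for a 64-bit id value."""
    if not raw & TAG_BIT:
        return (False, None, raw)      # a real pointer, nothing to decode
    plain = raw ^ obfuscator           # undo the per-process XOR
    class_index = (plain >> 60) & 0x7  # the 3 bits following the MSB
    return (True, class_index, plain)

raw = 0xf460034a00975a82
obfuscator = 0x4460034a00975820
is_tagged, class_index, plain = decode_tagged_pointer(raw, obfuscator)
print(is_tagged, class_index, hex(plain))
```

Since the obfuscator's MSB is zero, the XOR never changes the tag bit, which is why the tag check can be done on the raw value before deobfuscation.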

The Dyld Shared Cache

On iOS (as well as on macOS), most system libraries are prelinked into one giant binary blob, called the dyld_shared_cache. Amongst other benefits, this improves program load times as it reduces the runtime overhead of symbol resolution. One security relevant aspect of the shared cache is that it is mapped at the same address in every process, with its exact location only being randomized once during device boot. This is likely due to the shared cache being mapped into all userspace processes (thus reducing overall system memory usage) but also containing absolute pointers to itself, making it not position independent. As such, once the base address of the shared cache is known, the addresses of pretty much all libraries in any userspace process on that device, including thousands of ROP gadgets, all ObjC Classes, various strings, and much more, are also known. This is sufficient for an RCE exploit.

On the latest iOS versions, the dyld_shared_cache is mapped somewhere in the address range 0x180000000 to 0x280000000. As the cache itself is around 1GB in size and the page size (the smallest granularity for the ASLR shift) is 0x4000 bytes, this leaves roughly 200000 possible base addresses. As such, it would be possible to find the base address by sheer brute force. However, every wrong guess would likely cause a crash, and launchd limits how quickly a repeatedly crashing service is restarted, so the targeted imagent process would soon only come back after a delay. This can be avoided by only crashing once roughly every 10 seconds. With that, a full brute force attack would take around 3-4 weeks. While such an attack is not completely impossible given that mobile devices are rarely rebooted, it is probably not a realistic one either. The remaining part of this blog post describes how the base address can be inferred within 5 minutes instead.
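The brute-force estimate can be checked with a few lines of arithmetic. This sketch assumes the cache is exactly 1GiB (the text only says "around 1GB") and that one guess is made every 10 seconds; the constant names are illustrative.

```python
REGION_START = 0x180000000
REGION_END = 0x280000000
CACHE_SIZE = 1 << 30      # "around 1GB", approximated as exactly 1GiB here
PAGE_SIZE = 0x4000        # granularity of the ASLR slide
SECONDS_PER_GUESS = 10    # one crash roughly every 10 seconds

# The cache has to fit entirely into the region, so the base can only be
# placed in the first (region size - cache size) bytes, at page granularity.
possible_bases = (REGION_END - REGION_START - CACHE_SIZE) // PAGE_SIZE
worst_case_days = possible_bases * SECONDS_PER_GUESS / 86400

print(possible_bases, round(worst_case_days, 1))
```

Under these assumptions the worst case comes out at a little over three weeks, in line with the estimate above.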

Defeating ASLR with a Crash Oracle

One outstanding aspect of CVE-2019-8646 is that it creates an additional communication channel between the victim device and the attacker. It appears that this is a rare situation which should also be mitigatable to a large degree by properly sandboxing the process in question to prevent it from doing any kind of network activity. As such, one of the main questions motivating this research was the following: given only a remote memory corruption vulnerability, would it be possible to infer the base address of the shared cache somehow? To achieve this, some kind of communication channel had to first be found, which exists in the form of iMessage receipts.

iMessage supports two different types of message receipts: delivery receipts and read receipts. The latter one can be disabled in Settings while the former will always be sent. The following screenshot from an iMessage chat shows how these receipts are used.
Here, the sender received a delivery receipt and a read receipt for the message “Foo”, a delivery receipt for the message “Bar”, and no receipt at all for the message “Baz”. Delivery receipts are interesting as they are sent automatically without any user interaction. Even more interesting is the time they are sent out by imagent, as shown in the following pseudocode snippet of imagent’s logic for processing incoming iMessages:

    msgPlist = decodeIntoPlist(message)
    # extract some values from the plist ...
    atiData = msgPlist['ATI']
    ati = NSKeyedUnarchive(atiData)              [1]
    # more stuff ...
    # the delivery receipt is only sent here, after [1] has completed
    sendDeliveryReceipt()
    # yet more stuff ...

Any bug in the NSKeyedUnarchiver API will trigger at [1]. With that it is possible to build a “crash oracle”: if a crash is triggered during the NSKeyedUnarchiving, no delivery receipt will be sent, otherwise one will be sent. This in turn allows the sender to infer whether or not the payload caused a crash in imagent on the remote device. What remains now is to turn the bug into an oracle function so that a useful bit of information can be extracted from the outcome of each oracle query. 

The ideal oracle would be

def oracle(addr):
    if isMapped(addr):
        nocrash()
    else:
        crash()

Given that, breaking ASLR remotely would be rather straightforward:
  1. Perform a linear search between 0x180000000 and 0x280000000 in ~500MB steps, which would take at most 8 oracle queries
  2. Perform a binary search between the found address and the found address minus the step size. This would again only take a few queries as it runs in logarithmic time.

This could take as little as 10 iMessages to break ASLR and likely no more than 20. 
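To make the two steps concrete, here is a small simulation against an idealized oracle. It is a sketch under stated assumptions: the oracle is simulated locally (nothing is sent anywhere), the cache is modelled as a single contiguous 1GiB mapping, and the names `make_ideal_oracle` and `find_base` are made up for illustration.

```python
PAGE_SIZE = 0x4000
REGION_START = 0x180000000
REGION_END = 0x280000000
STEP = 0x20000000            # 512MB stride, smaller than the ~1GB cache

def make_ideal_oracle(base, size):
    """Simulated oracle: True (no crash) iff addr lies inside the mapping."""
    queries = []
    def oracle(addr):
        queries.append(addr)
        return base <= addr < base + size
    return oracle, queries

def find_base(oracle):
    # Step 1: linear scan in STEP-sized strides until a mapped address is hit.
    hit = next(a for a in range(REGION_START, REGION_END, STEP) if oracle(a))
    # Step 2: binary search for the lowest mapped page in (hit - STEP, hit].
    lo, hi = hit - STEP, hit         # invariant: lo is unmapped, hi is mapped
    while hi - lo > PAGE_SIZE:
        mid = lo + ((hi - lo) // PAGE_SIZE // 2) * PAGE_SIZE
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return hi

oracle, queries = make_ideal_oracle(base=0x1bf2b4000, size=1 << 30)
print(hex(find_base(oracle)), len(queries))
```

At page granularity the binary search over a 512MB window needs 15 queries, so for the base address used here the simulation issues 3 + 15 = 18 queries in total.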

In practice, however, it is unlikely that a memory corruption vulnerability will yield this perfect oracle function, so a more general version of the above algorithm is necessary. In any case, the vulnerability likely first needs a bit of exploit engineering to result in a usable oracle function. This is also the case for CVE-2019-8641.

First off, the bug trigger given in part 1 of this series unfortunately crashes regardless of whether the given address is valid or not: (line numbers reference the code snippets from part 1) the ObjC id that is read from an attacker controlled address is afterwards compared to the key that is currently being looked up (line 13 in -[NSSharedKeySet indexForKey:]). As it most likely does not match, the lookup will fail and -[NSSharedKeySet initWithCoder:] will attempt to recreate the NSSharedKeySet from scratch (lines 20 through 23). For that it will call [NSSharedKeySet allKeys] on its subKeySet (line 22). Unfortunately, as the subKeySet is not yet fully initialized (this is the bug), the allKeys method will definitely crash when accessing the _keys array as that is still nullptr. Luckily, it is possible to work around this with the following object graph:
The trick here is to add a new tail KeySet (SharedKeySet3) which will always be able to look up the second key (“k2”). However, as this KeySet is now the subKeySet of SharedKeySet1, SharedKeySet2 must be unarchived in some other way. This is only possible by unarchiving a new SharedKeyDictionary first, which in turn is only possible through the _keys array of SharedKeySet1. Unfortunately, the class whitelist used to unarchive _keys does not include NSDictionary. Fortunately though, the __NSLocalizedString class (itself an NSString and as such allowed) has the following piece of code in its initWithCoder implementation to decode its config dictionary:

NSSet* classes = [NSSet setWithObjects:[NSDictionary class], ...];
NSDictionary* configDict = [coder decodeObjectOfClasses:classes
                                                  forKey:...];

As such, an NSSharedKeyDictionary can be decoded during unarchiving of _keys by “wrapping” it into a __NSLocalizedString.

With that, unarchiving the shown payload will only crash in one of the following two cases:
  1. if the address (in this case 0x41414140 as the index read from _rankTable is multiplied by 8) is not readable (i.e. not mapped)
  2. if calling [key isEqual:candidate] with the value read from the address crashes

If the value is zero, [key isEqual:] will not be called (line 11). Otherwise, if the value does not have the MSB set, it will be treated as a pointer to an ObjC object and methods will be called on it, surely leading to a crash unless the value pointed to is in fact an ObjC object. Finally, if the value has the MSB set, it will be treated as a tagged pointer and will likely have its class fetched. This happens by first XORing the tagged pointer with the random obfuscator value, then extracting an index into the class table from the upper bits and using that. As not all entries of the class table are populated, this step can cause a crash if the index is invalid. As the obfuscator value is unknown, it is not normally possible to predict in advance whether a given value will cause a crash. However, there is a situation in which even invalid tagged pointer values will not cause a crash: in the implementation of [NSCFString isEqual:], which is used when the key that is looked up is a string (as is the case in the previous object graph examples). That implementation has the following special casing for tagged pointer values:

if (a3 & 0x8000000000000000) {
  // Extract class index from tagged pointer
  v5 = ((a3 ^ objc_debug_taggedpointer_obfuscator) >> 60) & 7;
  if ( v5 == 7 )
    // Use extended class index in bits 52 - 60
    v5 = (((a3 ^ objc_debug_taggedpointer_obfuscator) >> 52) & 0xFF) + 8;

  // Check class index equals the one for NSString
  if ( v5 == 2 )
    // If yes, extract string content from lower bits and compare
    return _NSTaggedPointerStringEqualCFString(a3, self);

  // If not, just return false directly
  return 0;
}

Given this isEqual method, any value that has the MSB set can now be used as argument without causing a crash.

Finally, with all that, the resulting oracle function is now roughly

def oracle(addr):
  if isMapped(addr) and
     (isZero(*addr) or hasMSBSet(*addr) or pointsToObjCObject(*addr)):
    nocrash()
  else:
    crash()
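The behaviour of this oracle can be modelled against a toy address space. The sketch below is purely a local simulation: `memory` and `objc_objects` are hypothetical stand-ins for the target's address space and for "values that happen to point to a valid ObjC object", and `make_oracle` is a made-up helper name.

```python
MSB = 1 << 63

def make_oracle(memory, objc_objects):
    """memory: dict mapping 8-byte-aligned mapped addresses to 64-bit values.
    objc_objects: set of addresses assumed to hold valid ObjC objects.
    The returned function is True when probing addr would NOT crash."""
    def oracle(addr):
        if addr not in memory:
            return False              # unmapped: the read itself faults
        value = memory[addr]
        if value == 0:
            return True               # isEqual: is never called
        if value & MSB:
            return True               # tagged pointer path returns early
        return value in objc_objects  # real pointer: must be an object
    return oracle

# Toy address space: four mapped 8-byte slots and one fake "object".
memory = {
    0x1000: 0,                    # zero
    0x1008: 0xb0000000000002a2,   # MSB set
    0x1010: 0x4141414141414141,   # pointer-like value, not an object
    0x1018: 0x2000,               # pointer to the (assumed) object below
}
objc_objects = {0x2000}
oracle = make_oracle(memory, objc_objects)
for addr in (0x1000, 0x1008, 0x1010, 0x1018, 0x3000):
    print(hex(addr), "no crash" if oracle(addr) else "crash")
```

Only the third slot and the unmapped address 0x3000 are reported as crashing in this model.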

Given this oracle, it is then necessary to build a “profile” of the target device’s shared cache, which is essentially a bitmap where a zero indicates that an access would crash and a one indicates that it would not. As the shared cache binary is the same on all iPhones of the same hardware model and iOS version, this can simply be done by running a custom app on a device similar to the target device and scanning over the shared cache region in memory. In practice, the profile should also support a third, “unknown” state in which both outcomes are possible. This would for example be used for writable memory regions in the shared cache as their runtime content is unknown. However, for simplicity, the following explanation will assume a two-state profile. Adapting the algorithm to work with three-state profiles is straightforward and is implemented in the released source code.

A bitmap for a two-state profile might look as follows:
The next step is then to use the oracle to again perform a linear search between 0x180000000 and 0x280000000 until no crash is observed. Afterwards, the set of all possible shared cache base addresses can be computed simply by iterating over the profile in page-sized steps and computing the base address for any offset that would not yield a crash when probed. In practice, this step will result in roughly 30000-40000 different base addresses (candidates).
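The candidate computation can be sketched as follows. This is a simplified model, not the released code: the profile is represented as a plain set of non-crashing offsets, the probed address is assumed to be page-aligned (as the linear-scan addresses are), and `candidate_bases` is a hypothetical helper name.

```python
PAGE_SIZE = 0x4000

def candidate_bases(no_crash_offsets, cache_size, valid_addr, region_start):
    """Return every base address consistent with `valid_addr` not crashing.

    no_crash_offsets: set of cache offsets whose probe would not crash
    (a stand-in for the profile bitmap)."""
    bases = []
    for offset in range(0, cache_size, PAGE_SIZE):
        # If the cache were mapped at `base`, valid_addr would sit at `offset`.
        base = valid_addr - offset
        if base >= region_start and offset in no_crash_offsets:
            bases.append(base)
    return bases

# Toy profile: an 8-page cache in which pages 0, 3 and 4 start with a
# value that survives the probe.
toy_profile = {0 * PAGE_SIZE, 3 * PAGE_SIZE, 4 * PAGE_SIZE}
bases = candidate_bases(toy_profile, 8 * PAGE_SIZE, 0x1c0000000, 0x180000000)
print([hex(b) for b in bases])
```

Every offset that survives the probe yields exactly one candidate, which is why the size of the candidate set is tied to the number of non-crashing page offsets in the profile.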

Next, a search algorithm has to be used to efficiently determine the correct base address with a minimal number of additional oracle queries, since each oracle query takes around 10s in order to not crash imagent too quickly (which would soon cause launchd to delay restarting the service). The following picture shows the shared cache (hypothetically) mapped at each of the possible base addresses (in this case 5) in memory:
The goal is now to find a new address that, when probed through the crash oracle, will allow roughly half of the remaining candidates to be discarded. In the above picture that would, for example, be the address 0x19020c028 (green line). If a crash occurs when querying the oracle for that address, then only the first and last candidate remain, otherwise the middle three candidates are kept. Given the candidates and the address to probe, it is also possible to compute the probability of a crash (⅖ in this case), and with that the expected number of remaining candidates (E) after the oracle query, in this case:

E = ⅗ * 3 + ⅖ * 2 = 2.6

To efficiently find the correct base address, the algorithm will now greedily pick the address with the smallest value for E in every iteration. Ideally, if the ones and zeroes in the profile are roughly balanced (which is more or less the case here), this will be half the current number of candidates. In that case the correct base address will be found in logarithmic time, or roughly 2-5 minutes.
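The expected value from the five-candidate example can be reproduced with exact fractions. This is a minimal sketch; `expected_remaining` is an illustrative name.

```python
from fractions import Fraction

def expected_remaining(n_crash, n_nocrash):
    """Expected number of candidates left after one oracle query.

    n_crash / n_nocrash: how many candidates predict a crash / no crash
    for the probed address (two-state profile, each candidate in one set)."""
    total = n_crash + n_nocrash
    p_crash = Fraction(n_crash, total)
    return p_crash * n_crash + (1 - p_crash) * n_nocrash

# The example above: 2 candidates predict a crash, 3 predict no crash.
print(float(expected_remaining(2, 3)))
# A perfectly balanced split leaves half of the candidates on average.
print(float(expected_remaining(50, 50)))
```

The value is minimized when both sets have the same size, which is why a balanced profile gives logarithmic convergence.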

When implementing this algorithm, a slight performance problem occurs as a full search for the ideal address to probe would take shared_cache_size/8 * num_candidates operations, easily reaching one trillion (10^12) operations. However, in practice it is good enough to approximate the ideal solution by randomly testing 100 different addresses. Another minor complication that arises when using three-state profiles as discussed above is that the algorithm is conservative and will treat writable memory pages as both potentially crashing and not crashing (as it is unknown what value will be there at runtime). As such, the probability of a crash can only be approximated from the read-only pages. However, again, this approximation of the real probability works just fine in practice as it does not affect the correctness of the algorithm.

Pseudocode for the final algorithm is shown next.

import random

candidates = [...]   # all base addresses that are still possible
while len(candidates) > 1:
  best_address = 0x0
  best_E = len(candidates)
  remaining_candidates_on_crash = None
  remaining_candidates_on_nocrash = None

  # Approximate the ideal address by sampling 100 random addresses
  for _ in range(0, 100):
    addr = random.randrange(minbase, maxbase, 8)
    crashset = []
    nocrashset = []
    for profile in candidates:
      # With three-state profiles a candidate can end up in both sets
      if profile.addr_will_crash(addr):
        crashset.append(profile)
      if profile.addr_will_not_crash(addr):
        nocrashset.append(profile)

    crash_prob = len(crashset) / len(candidates)
    nocrash_prob = 1.0 - crash_prob
    E = crash_prob * len(crashset) + nocrash_prob * len(nocrashset)
    if E < best_E:
      best_E = E
      best_address = addr
      remaining_candidates_on_crash = crashset
      remaining_candidates_on_nocrash = nocrashset

  # oracle() returns true if no crash was observed
  if oracle(best_address):
    candidates = remaining_candidates_on_nocrash
  else:
    candidates = remaining_candidates_on_crash
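The algorithm can be tried out end to end against a synthetic target. The following is a self-contained simulation, not the released exploit code: the profile is random, the "cache" is only 256 pages large, the candidate set is a hand-picked window of 64 page-aligned bases, and the oracle is answered locally from the same profile. All names (`PROFILE`, `will_crash`, `find_base`) are made up for this sketch.

```python
import random

PAGE_SIZE = 0x4000
SLOT = 8                                  # the oracle probes 8-byte values
CACHE_SIZE = 256 * PAGE_SIZE              # toy cache, far smaller than ~1GB

rng = random.Random(1234)                 # fixed seed for reproducibility
# Two-state profile: PROFILE[i] is True if offset i * SLOT does not crash.
PROFILE = [rng.random() < 0.5 for _ in range(CACHE_SIZE // SLOT)]

def will_crash(base, addr):
    """Predict the oracle outcome if the cache were mapped at `base`."""
    offset = addr - base
    if not 0 <= offset < CACHE_SIZE:
        return True                       # outside the mapping: unmapped
    return not PROFILE[offset // SLOT]

def find_base(candidates, oracle, samples=100):
    """Greedy search; oracle(addr) returns True if no crash was observed."""
    queries = 0
    minbase, maxbase = min(candidates), max(candidates) + CACHE_SIZE
    while len(candidates) > 1:
        best = None                       # (E, addr, crashset, nocrashset)
        n = len(candidates)
        for _ in range(samples):
            addr = rng.randrange(minbase, maxbase, SLOT)
            crashset = [c for c in candidates if will_crash(c, addr)]
            nocrashset = [c for c in candidates if not will_crash(c, addr)]
            # Same E as in the pseudocode, written as a single fraction.
            E = (len(crashset) ** 2 + len(nocrashset) ** 2) / n
            if E < n and (best is None or E < best[0]):
                best = (E, addr, crashset, nocrashset)
        if best is None:
            continue                      # no sampled address split the set
        _, addr, crashset, nocrashset = best
        candidates = nocrashset if oracle(addr) else crashset
        queries += 1
    return candidates[0], queries

true_base = 0x1bf2b4000
candidates = [true_base + k * PAGE_SIZE for k in range(-20, 44)]
oracle = lambda addr: not will_crash(true_base, addr)
found, queries = find_base(candidates, oracle)
print(hex(found), queries)
```

An address is only accepted if it actually splits the candidate set (E smaller than the current count), so every query discards at least one candidate and the true base is never discarded as long as the profile matches the target.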

The following snippet shows the output of the part of the exploit that infers the shared cache’s base address on the target device. The printed “score” value corresponds to the computed expected number of remaining candidates after the displayed address is queried.

> ./
[!] Note: this exploit *deliberately* displays notifications to the target
[*] Trying to find a valid address...
[*] Testing address 0x180000000...
[*] Testing address 0x188000000...
[*] Testing address 0x190000000...
[*] Testing address 0x198000000...
[*] Testing address 0x1a0000000...
[*] Testing address 0x1a8000000...
[*] Testing address 0x1b0000000...
[*] Testing address 0x1b8000000...
[*] Testing address 0x1c0000000...
[+] 0x1c0000000 is valid!
[*] Have 34353 potential candidates for the dyld_shared_cache slide
[*] Shared cache is mapped somewhere between 0x181948000 and 0x203d64000
[*] Now determining exact base address of shared cache...
[*] 34353 candidates remaining...
[*] Best (approximated) address to probe is 0x1b12070d0 with a score of 17208.40
[*] 17906 candidates remaining...
[*] Best (approximated) address to probe is 0x1b8a353d8 with a score of 9144.48
[*] 9656 candidates remaining...
[*] Best (approximated) address to probe is 0x1bcb23de0 with a score of 5093.02
[*] 5104 candidates remaining...
[*] Best (approximated) address to probe is 0x1e172e3f8 with a score of 2754.83
[*] 2682 candidates remaining...
[*] Best (approximated) address to probe is 0x1b363c658 with a score of 1454.06
[*] 1728 candidates remaining...
[*] Best (approximated) address to probe is 0x1e0301200 with a score of 929.21
[*] 915 candidates remaining...
[*] Best (approximated) address to probe is 0x1b0c04368 with a score of 497.63
[*] 593 candidates remaining...
[*] Best (approximated) address to probe is 0x1e0263068 with a score of 319.15
[*] 326 candidates remaining...
[*] Best (approximated) address to probe is 0x1bec43868 with a score of 163.84
[*] 156 candidates remaining...
[*] Best (approximated) address to probe is 0x1c15ab0e8 with a score of 78.21
[*] 82 candidates remaining...
[*] Best (approximated) address to probe is 0x1c49efe90 with a score of 41.02
[*] 40 candidates remaining...
[*] Best (approximated) address to probe is 0x1befd60f8 with a score of 20.00
[*] 20 candidates remaining...
[*] Best (approximated) address to probe is 0x1c14089d0 with a score of 10.00
[*] 10 candidates remaining...
[*] Best (approximated) address to probe is 0x1c428d450 with a score of 5.00
[*] 5 candidates remaining...
[*] Best (approximated) address to probe is 0x1df0939f0 with a score of 2.60
[*] 2 candidates remaining...
[*] Best (approximated) address to probe is 0x1c3d255f8 with a score of 1.00
[+] Shared cache is mapped at 0x1bf2b4000

As a final note, it seems likely that a similar oracle function can be constructed from different memory corruption vulnerabilities. As an example, any vulnerability that allows the attacker to corrupt or fake an ObjC object (e.g. through an OOB read such as CVE-2019-8641 or a use-after-free like CVE-2019-8647 and CVE-2019-8662) could potentially be turned into a similar oracle in the following way: whenever a reference to an ObjC object is dropped, objc_release is called with the object as argument. This function will, amongst others, perform the following steps: it first checks if the object has “custom retain release” functionality by checking a bit in the Class object associated with the released object (every ObjC object references its Class object through a pointer stored at offset zero, called the “ISA” (Is-a) pointer). If the object has no custom retain release, an inline refcount will be decremented and, if the result is not zero, nothing more happens. Otherwise the object’s destructor is invoked and the memory chunk freed.

As such, if the Class pointer of an object can be corrupted to point into the shared cache region, and at the same time the object’s inline refcount is > 1, then objc_release will only crash if a specific bit is set at an offset from the pointed-to value. With that, a crash oracle can again be constructed.

Supporting Different iOS versions and Hardware Models

One seeming limitation of the presented technique is that it requires up-front knowledge of the target device’s hardware model and iOS version to build the correct shared cache profile. However, this is not necessarily the case: if the model number or version are unknown, then the attacker can build shared cache profiles for all possible combinations of hardware model and iOS version that could be used. This would then likely result in a few million candidates after the initial linear scan is finished. However, due to the magic of logarithms, even with millions of candidates this attack will still only require sending a few dozen messages to determine the correct base address and with it the correct shared cache, model number, and version. This has, however, not been implemented during this research.
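The "magic of logarithms" can be made explicit with a short calculation. It assumes the ideal case in which every oracle query halves the candidate set, so it is a lower bound rather than a prediction; the candidate counts in the millions are hypothetical examples of "a few million".

```python
import math

def min_queries(num_candidates):
    """Lower bound on oracle queries if every query halves the candidate set."""
    return math.ceil(math.log2(num_candidates))

# 34353 is the candidate count from the example run; the larger values are
# hypothetical stand-ins for "a few million" candidates.
for n in (34353, 1_000_000, 5_000_000):
    print(n, min_queries(n))
```

Going from tens of thousands to millions of candidates therefore adds only a handful of queries on top of the initial linear scan.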

A Note about Noisiness 

Another seeming limitation is the inherent noisiness of this attack. While crashing imagent dozens of times is not noticeable to the user, it would cause crash reports to be sent to Apple if “Share iPhone Analytics” is enabled on the device. As such, this technique might seem less useful for a real-world attacker as it could theoretically inform Apple of the vulnerability that is being exploited. However, it appears that iOS stops collecting crash logs for a process after 25 crashes. This can be observed by repeatedly crashing an iOS process (e.g. imagent) and monitoring the device log from a Mac connected to the device. Eventually the following messages should appear:

default 14:54:42.957547 +0200 ReportCrash Formulating report for corpse[597] imagent
default 14:54:42.977891 +0200 ReportCrash Report of type '109(<private>)' not saved because the limit of 25 logs has been reached

As such, it should be possible to first use a separate, unexploitable DoS bug (e.g. an ObjC exception or a stack overflow due to too much recursion) to crash imagent around 25 times so that no more crash logs are collected for it, and only then start the actual exploitation procedure. This has, however, also not been verified in practice.

The Issue with Automatic Delivery Receipts

As was demonstrated in this blog post, it is possible to bypass ASLR by creating a side channel that abuses automatic delivery receipts sent by the target device to the attacker. The fix for this at first appears simple: send the delivery receipt before doing any complex message parsing so that it will be sent regardless of whether the payload causes a crash. However, this is not sufficient, as the following attack likely still works:

  1. send the “oracle query” message (the one that might cause a crash) 2-3 times in a row
  2. send a "normal" message that will never cause a crash
  3. measure the time it takes for the last delivery receipt to arrive. If that is longer than a few seconds, then imagent likely crashed repeatedly during step 1 and is now subject to a restart delay by launchd

This attack now exploits the restart delay enforced by launchd on iOS. A similar mechanism might exist on other platforms as well though. With that, it appears that any kind of automatic message sent from the recipient side (or any other action that could be observable by the attacker, for example accessing a website) is potentially dangerous as it can be exploited in a similar way. As such, ideally, no automatic messages should be sent out at all, or at least not to senders with which the user never had any previous interaction.

This concludes the second part of this blog post series. The third and final part of the series will detail how, even in the presence of pointer authentication (PAC) on A12 and newer devices, remote code execution can now be achieved.
