Monday, August 3, 2020

Exploiting Android Messengers with WebRTC: Part 1

Posted by Natalie Silvanovich, Project Zero

This is a three-part series on exploiting messenger applications using vulnerabilities in WebRTC. This series highlights what can go wrong when applications don't apply WebRTC patches and when the communication and notification of security issues breaks down. Part 2 is scheduled for August 5 and Part 3 is scheduled for August 6.

Part 1: First Attempts

WebRTC is an open source video conferencing solution used by a variety of software including browsers, messaging clients and streaming services. While Project Zero has reported several vulnerabilities in WebRTC in the past, it was not clear whether these bugs were exploitable, especially outside of browsers. I investigated whether two recent bugs are exploitable in popular Android messaging applications.

The Bugs


I started off by trying to exploit two bugs, CVE-2020-6389 and CVE-2020-6387.

Both of these vulnerabilities are in WebRTC’s Real-time Transport Protocol (RTP) processing. RTP is the protocol WebRTC uses to transport audio and video content from peer to peer. RTP supports extensions, which are extra pieces of data that can be included in each packet to tell the destination peer how to display or process the data. For example, there is an extension that contains information about the screen orientation of the sending device, and one that contains the volume level. Both of these vulnerabilities occurred in extensions that had been implemented in WebRTC in 2019.
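As a rough illustration of the data the parser is dealing with, here is a minimal sketch of walking the "one-byte header" extension format from RFC 8285, which carries extensions like these. This is illustrative only (the function name is mine), not WebRTC's actual parsing code.

#include <cstddef>
#include <cstdint>

// Sketch of iterating RFC 8285 one-byte-header extension elements.
// `data`/`len` cover the extension block of an RTP packet.
void ForEachExtension(const uint8_t* data, size_t len) {
  size_t pos = 0;
  while (pos < len) {
    uint8_t header = data[pos];
    if (header == 0) { pos++; continue; }      // Padding byte.
    uint8_t id = header >> 4;                  // 4-bit extension ID.
    size_t payload_len = (header & 0x0F) + 1;  // Field stores length - 1.
    if (id == 15 || pos + 1 + payload_len > len)
      break;                                   // Reserved ID or truncated.
    const uint8_t* payload = &data[pos + 1];
    (void)payload;  // A real parser would dispatch on `id` here.
    pos += 1 + payload_len;
  }
}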

CVE-2020-6389 occurred in the frame marking extension, which contains information on how video content is split into frames. The bug is in how it processes layer information: WebRTC only supports five layers, but the layer number is a three-bit field in the extension, which means it can go as high as seven. This leads to an out-of-bounds write in the following code. temporal_idx is set from the layer number in the extension. 

if (layer_info_it->second[temporal_idx] != -1 &&
    AheadOf<uint16_t>(layer_info_it->second[temporal_idx],
                      frame->id.picture_id)) {
  // Not a newer frame. No subsequent layer info needs update.
  break;
}
...
layer_info_it->second[temporal_idx] = frame->id.picture_id;

The final line of code is where the out-of-bounds write occurs, as the array only contains five elements. This bug also has some limitations that are not obvious from the above code. To start, there is a check before the write: the write only occurs if the current value of the memory, cast to a 16-bit unsigned integer, is ahead of the current sequence number. In practice, this wasn’t much of a limitation; a crash usually occurred after two or three attempts when I tested it. A more serious limitation is that the layer_info_it->second field has a 64-bit integer type, but frame->id.picture_id is a 16-bit integer. This means that while this bug allows an attacker to write up to three 64-bit integers outside of a fixed-size heap buffer, the values that can be written are very limited, and are too small to represent pointers.
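To make those constraints concrete, here is a simplified model of the primitive; the names are illustrative, not WebRTC's actual code.

#include <array>
#include <cstdint>

// Five 64-bit slots, but the 3-bit layer field in the frame marking
// extension lets temporal_idx reach 7.
std::array<int64_t, 5> layer_info;

void ModelWrite(unsigned temporal_idx, uint16_t picture_id) {
  // In the real code, an AheadOf() check on the existing 16-bit value
  // guards this write, so a few attempts are usually needed to land it.
  // For temporal_idx values 5-7 this writes one of up to three 64-bit
  // values past the array, but the value written is a zero-extended
  // 16-bit picture_id: far too small to represent a pointer.
  layer_info[temporal_idx] = picture_id;
}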

CVE-2020-6387 is a bug in how the video timing extension is processed by Forward Error Correction (FEC). FEC copies incoming RTP packets, and then clears certain extensions when attempting to correct errors. This vulnerability occurs because extensions of the video timing type are not verified to be of the expected length before they are cleared. The code causing this bug is as follows:

case RTPExtensionType::kRtpExtensionVideoTiming: {
  // Nullify 3 last entries: packetization delay and 2 network timestamps.
  // Each of them is 2 bytes.
  uint8_t* p = WriteAt(extension.offset) +
               VideoSendTiming::kPacerExitDeltaOffset;
  memset(p, 0, 6);
  break;
}

The value of VideoSendTiming::kPacerExitDeltaOffset is 7, so this code writes six zeros from offset 7 to offset 13 from the start of the extension in the packet. However, there is no check that the extension data is at least 13 bytes long, or even that the packet has this number of bytes left. The result of this bug is that an attacker can write up to six zeros to the heap at an offset of up to seven bytes from a variable-sized heap buffer. This bug is better than CVE-2020-6389 in some ways and worse in others. It is better in that the heap buffer that can be overflowed is variable size, which gives a lot more options for what can be overwritten by this bug on the heap. The offset also offers some flexibility on where the zeros are written, and the write does not have to be aligned, whereas CVE-2020-6389 requires 64-bit alignment. This bug is worse in that the value written has to be zero, and the size of the area that can be written is smaller (six bytes versus 24).
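For reference, the clearing code could guard against this with a length check along the following lines. This is only a sketch of the idea, assuming the parsed extension records its data length in a field like extension.length; it is not the actual upstream patch.

case RTPExtensionType::kRtpExtensionVideoTiming: {
  // Offset 7 plus three 2-byte fields requires at least 13 bytes of
  // extension data; refuse to clear the fields otherwise.
  if (extension.length < VideoSendTiming::kPacerExitDeltaOffset + 6)
    break;
  uint8_t* p = WriteAt(extension.offset) +
               VideoSendTiming::kPacerExitDeltaOffset;
  memset(p, 0, 6);
  break;
}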

Moving the Instruction Pointer


I started off by seeing if it was possible to use either of these bugs to move the instruction pointer. Modern Android uses jemalloc, a slab allocator which doesn’t use inline heap headers, so corrupting heap metadata was not an option. Instead, I compiled WebRTC for Android with symbols, and loaded it in IDA. I then went through the available object types to see if there was anything that could obviously be used to move the instruction pointer or improve the capabilities of the bug. I didn’t find anything.

I thought maybe I could use CVE-2020-6389 to overwrite a length and cause a larger overflow, but this had some problems. To start, the bug writes a 64-bit integer, while many length fields are 32-bit integers, which means the write also clobbers the adjacent field, and can only write a non-zero value to a length that is 64-bit aligned. The location of the bug in processing is also problematic, as the overwrite happens near the end of processing the incoming packet, meaning that many objects are not accessed again after this point, so any overwritten memory would never be used again. CVE-2020-6389 also overwrites a heap buffer of fixed size 80, which limits the object types that can be affected by this bug. I didn’t think CVE-2020-6387 would be viable for this purpose either, as it can only write zeros, which can only make a length smaller.

I wasn’t sure where to go at this point, so I triggered CVE-2020-6389 a few dozen times on Android to see if there were any crashes at an address wider than 16 bits, hoping they might give me ideas of ways this bug could influence the behavior of the code other than by overwriting a pointer with an invalid 16-bit value. To my surprise, about one in 20 times it crashed with the instruction pointer set to a value that had clearly been read off the heap.

Analyzing the crash, it turned out that a StunMessage object was being allocated after the overflowed region. The members of the StunMessage class are as follows.

protected:
  std::vector<std::unique_ptr<StunAttribute>> attrs_;
 ...
 private:
  ...
  uint16_t type_;
  uint16_t length_;
  std::string transaction_id_;
  uint32_t reduced_transaction_id_;
  uint32_t stun_magic_cookie_;

So after the vtable, the first member is a vector. How are vectors laid out in memory? It turns out that in libc++, a vector’s first two members are as follows.

  pointer __begin_;
  pointer __end_;

These pointers point to the beginning and the end of the vector’s contents in memory. During the crash, the __end_ member had been overwritten with a small 16-bit integer. Vector iteration works by starting at the __begin_ pointer and incrementing until the __end_ pointer is reached, so this change means that the next time the vector is iterated over, usually in the destructor, it will go out of bounds. Since this vector contains virtual objects of type StunAttribute, a virtual call is performed on each element to invoke its destructor. This virtual call on out-of-bounds memory was what was moving the instruction pointer.
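A simplified model of why this happens (illustrative, not libc++'s actual implementation): destruction walks from __begin_ to __end_ and makes a virtual call on every element, so a corrupted __end_ sends the loop through attacker-groomed heap memory.

struct StunAttribute {
  virtual ~StunAttribute() {}
};

struct SimplifiedVector {
  StunAttribute** __begin_;
  StunAttribute** __end_;

  // Roughly what the vector's destructor does. If the overflow
  // rewrites __end_ with a small 16-bit value, this loop runs past the
  // real contents, loading "vtable pointers" from whatever follows on
  // the heap and making virtual calls through them.
  void DestroyAll() {
    for (StunAttribute** p = __begin_; p != __end_; ++p)
      delete *p;  // Virtual call through (*p)'s vtable.
  }
};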

This seemed like a reasonable way to control the instruction pointer, except for one problem: in a typical configuration, it is not possible for an attacker at one end of a WebRTC connection to send STUN messages to the user at the other end; instead, each peer communicates with its own STUN server. I asked Philipp Hancke of webrtchacks if he knew of a way. He suggested this method, which involves specifying a TCP server controlled by the attacker as a potential routable path between two peers, called an ICE candidate. Both the attacker and the target device will then communicate through this server, including STUN messages.

This allowed me to send STUN messages with an unusually large number of attributes. This was necessary because, in order to control the instruction pointer, I needed to control what showed up in memory after the STUN attribute vector. jemalloc groups allocations of similar sizes, determined by predefined size classes, into contiguous memory runs. The less used a size class is, the more likely it is that two objects of that size class will be allocated one after the other.

Typically, STUN messages have a small number of attributes, which translates to a vector buffer size of 32 or 64 bytes, which are both very frequently used size classes. Instead, I sent STUN messages with 128 attributes, which translated to a vector buffer size of 1024 bytes, which happens to be an infrequently used size class in WebRTC. By sending many STUN messages with this number of attributes, while at the same time sending RTP packets of size 1024 containing the desired pointer value, interspersed with packets containing the bug, I was able to get a virtual call on that pointer value about one in five times. This was good enough for use in an exploit, and I decided to move on to breaking ASLR.
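The arithmetic behind the 1024-byte figure, as a sanity check (assuming a 64-bit process and libc++'s one-pointer std::unique_ptr):

#include <memory>

// Each vector element is one pointer on LP64.
static_assert(sizeof(std::unique_ptr<int>) == 8,
              "unique_ptr is a single pointer");
// 128 attributes -> a 1024-byte backing buffer for attrs_.
static_assert(128 * sizeof(std::unique_ptr<int>) == 1024,
              "128 elements land in the 1024-byte size class");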

Breaking ASLR


There were two possible approaches for breaking ASLR in this exploit. One was to use one of the above bugs to read memory and somehow send it back to the attacker device or TCP server; the other was to use some sort of crash oracle to determine the memory layout.

I started off by seeing whether it was possible to use one of the bugs to read memory remotely from the target device. Mark Brand suggested that it might be possible to use CVE-2020-6387 to accomplish this by setting the low bytes of a pointer to outgoing data to zero, causing out-of-bounds data to be sent instead of the actual data. This seemed like a promising approach, so I used IDA to look for potential objects.

It turned out there were quite a few, and they all had problems. I spent some time on SendPacketMessageData and DataReceivedMessageData. These objects are used to store pointers to outgoing RTP data while it is queued. They contain a CopyOnWriteBuffer object, whose first member is a ref-counted pointer to an rtc::Buffer object. It was possible to set the bottom bytes of this pointer to zero using CVE-2020-6387. Unfortunately, the structure of rtc::Buffer, shown below, made revealing memory this way challenging.

RefCountedObject vtable;
size_t size_;
size_t capacity_;
std::unique_ptr<T[]> data_;

I was hoping that it would be possible to make the clipped pointer to this structure point to some other object on the heap that had a pointer in the location of the data_ pointer, so that data would get sent instead. However, it turned out that in the process of sending data, all four members of the object above get accessed and need to be reasonably valid. I went through all the available objects in the same size class as the rtc::Buffer class, but couldn’t find one with these exact properties.

I then considered that instead of using a different object, I could use an rtc::Buffer object that had already been freed, with a specific backing buffer size that could be replaced with an object containing pointers using heap manipulation. This didn’t work out either. This was largely an issue of reliability. To start off, an rtc::Buffer object is 36 bytes, which translates to size class 48 in jemalloc, meaning 48 bytes get allocated. Imagining some contiguous allocations of this type, the addresses would be as follows.

0x[...]0000      buffer 0
0x[...]0030      buffer 1
0x[...]0060      buffer 2
0x[...]0090      buffer 3
0x[...]00c0      buffer 4
0x[...]00f0      buffer 5
0x[...]0120      buffer 6
...
   
If the first byte of buffers 0 through 5 is set to zero, the pointer still lands on a valid buffer, but if buffer 6’s is, it does not, because 48 does not divide evenly into 256. The end result is that every time the bug hits the SendPacketMessageData object, there is only about a one in three chance it will end up pointing to a valid rtc::Buffer. Hitting the object in the first place is also unreliable, because there are many other allocations of a similar size being made by WebRTC. It’s possible to increase the number of these objects on the heap, and the amount of time before they are sent, by using the TCP server to make the connection very slow, but even then I could only hit the structure less than 10% of the time. Having to manipulate the heap so that there are many freed rtc::Buffer objects in a row in the first place, with their backing buffers replaced by something containing pointers, added even more unreliability. I eventually abandoned this approach because I didn’t think I could get it reliable enough to use in an exploit with a reasonable amount of effort, though I think it’s probably possible. The crash behavior of the application being attacked also matters a lot: this approach would probably work on an application that respawns immediately after a crash, but would be a lot less practical on an application that stops respawning unless there is a certain delay, which is common on Android.
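A quick toy calculation makes the slot arithmetic above concrete (a sketch, not exploit code):

#include <cstdio>

int main() {
  for (unsigned slot = 0; slot <= 6; slot++) {
    unsigned long addr = slot * 48UL;        // buffer N at offset N * 48
    unsigned long clipped = addr & ~0xFFUL;  // low byte zeroed by the bug
    bool valid = clipped % 48 == 0;          // still a region start?
    printf("buffer %u: 0x%04lx -> 0x%04lx (%s)\n",
           slot, addr, clipped, valid ? "valid" : "invalid");
  }
  return 0;
}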

I also looked a lot at how outgoing packets are generated by WebRTC, especially the Real-time Transport Control Protocol (RTCP), which a peer always sends, even if it is only receiving audio or video. However, most outgoing packets are generated on the stack, so it is not possible to alter them using heap corruption bugs.

I also considered using a crash oracle to break ASLR, but I felt it was unlikely to succeed with these specific bugs. To start, hitting a heap allocation with them is unreliable, so it would be difficult to tell whether a crash had occurred due to a specific condition, or just because the bug had failed. I was also unsure whether it would even be possible to create detectable conditions considering the limited capabilities of these bugs.

I also thought about using CVE-2020-6387 to alter a vtable or a function pointer in order to read memory, cause behavior detectable by a crash oracle or perform offset-based exploitation that doesn’t require ASLR to be broken. I decided not to pursue this path, because the end result would depend on which functions and vtables are loaded at locations ending in zero, which varies greatly between builds. An exploit written using this method would require a large amount of modification to work on even slightly different versions of WebRTC, and there is no guarantee it would work at all.

I decided at this point that I needed to look for new bugs that could break ASLR, as neither of the ones I’d found recently could do it easily.

Stay tuned for Part 2: A Better Bug, which is scheduled for Wednesday, August 5.

Friday, July 31, 2020

The core of Apple is PPL: Breaking the XNU kernel's kernel

Posted by Brandon Azad, Project Zero

While doing research for the one-byte exploit technique, I considered several ways it might be possible to bypass Apple's Page Protection Layer (PPL) using just a physical address mapping primitive, that is, before obtaining kernel read/write or defeating PAC. Given that PPL is even more privileged than the rest of the XNU kernel, the idea of compromising PPL "before" XNU was appealing. In the end, though, I wasn't able to think of a way to break PPL using the physical mapping primitive alone.

PPL's goal is to prevent an attacker from modifying a process's executable code or page tables, even after obtaining kernel read/write/execute privileges. It does this by leveraging APRR to create something of a "kernel inside the kernel" that protects page tables. During normal kernel execution, page tables and page table metadata are read-only, and code that modifies page tables is non-executable; the only way for the kernel to modify page tables is to enter PPL by calling a "PPL routine", which is analogous to a syscall from XNU into PPL. This limits the entry points into the kernel code that can modify page tables to just those PPL routines.

I considered several ideas to bypass PPL using the one-byte technique's physical mapping primitive, including mapping page tables directly, mapping a DART to allow modifying physical memory from a coprocessor, and mapping the I/O addresses used to control clock gating to power down certain components of the system. Unfortunately, none of these ideas panned out.

However, it's not the Project Zero way to leave any mitigation unbroken. So, having exhausted my search for design flaws, I returned to the ever-faithful technique of memory corruption. Sure enough, decompiling a few PPL functions in IDA was sufficient to find some memory corruption.

[Image: decompiler output showing a call to pmap_remove_range_options(). Some memory corruption in pmap_remove_options_internal(): using a kernel function calling primitive, both va_start and size are controlled.]

The function pmap_remove_options_internal() is a PPL routine, one of the "PPL syscalls" from the XNU kernel to the even more privileged PPL. It is called by invoking pmap_remove_options() in XNU, which validates arguments and then calls pmap_remove_options_internal() in PPL. Its purpose is to unmap the supplied virtual address range from the physical memory map (pmap) of a process.

MARK_AS_PMAP_TEXT static int
pmap_remove_options_internal(
        pmap_t pmap,
        vm_map_address_t start,
        vm_map_address_t end,
        int options)

The actual work of removing the translation table entries (TTEs) that map the supplied virtual address range is done by calling pmap_remove_range_options(), which takes pointers to the beginning and end of the TTE range to remove from the level 3 (leaf) translation table.

static int
pmap_remove_range_options(
        pmap_t pmap,
        pt_entry_t *bpte,   // The first L3 TTE to remove
        pt_entry_t *epte,   // The end of the TTEs
        uint32_t *rmv_cnt,
        int options)

Unfortunately, when pmap_remove_options_internal() calls pmap_remove_range_options(), it seems to assume that the supplied virtual address range will not cross an L3 translation table boundary, because if it does then the calculated TTE range will span out-of-bounds memory:

remove_count = pmap_remove_range_options(
                   pmap,
                   &l3_table[(va_start >> 14) & 0x7FF],
                   (u64 *)((char *)&l3_table[(va_start >> 14) & 0x7FF]
                         + ((size >> 11) & 0x1FFFFFFFFFFFF8LL)),
                   &rmv_spte,
                   options);

This means that if we have an arbitrary kernel function calling primitive, we can invoke the PPL-entering wrapper function directly and get pmap_remove_options_internal() called with an improper virtual address range, which makes pmap_remove_range_options() try to remove "TTEs" read from out-of-bounds memory while in PPL mode. And since the removed TTEs are zeroed out, this means that we can corrupt PPL-protected memory.
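Concretely, with the constants read off the decompiled call, 16 KB pages (va >> 14) and a 2048-entry L3 table (the 0x7FF mask) mean one L3 translation table covers 2048 * 16 KB = 32 MB of virtual address space. A sketch of the missing bounds condition, as a hypothetical helper:

#include <cstdint>

// Would the TTE range computed by pmap_remove_options_internal()
// stay inside a single L3 translation table?
bool TteRangeStaysInBounds(uint64_t va_start, uint64_t size) {
  uint64_t first = (va_start >> 14) & 0x7FF;  // index of the first L3 TTE
  uint64_t count = size >> 14;                // one TTE per 16 KB page
  // pmap_remove_range_options() walks &l3_table[first] up to
  // &l3_table[first + count]; nothing clamps this to the table, so any
  // range crossing a 32 MB L3 boundary runs off the end of the page.
  return first + count <= 2048;
}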

Calling pmap_remove_options_internal() with an address range spanning an L2 TTE boundary (that is, the address range requires two L2 TTEs to map it) will cause the processed TTE array to run off the end of the L3 translation table page, resulting in out-of-bounds TTEs being removed.


But zeroing out-of-bounds TTEs would be a rather annoying primitive to try to leverage for a PPL bypass. Much of the data we'd like to corrupt has probably already been allocated far away from our page tables, and PPL isn't a large enough code base that we're guaranteed to find something interesting we can do just by zeroing memory. And that's to say nothing of the accounting in PPL that would probably detect an attempt to unmap non-existent TTEs!

So instead I chose to focus on a side effect of this out-of-bounds processing: improper TLB invalidation.

Later on in pmap_remove_options_internal(), after the TTEs have been removed, the translation lookaside buffer (TLB) needs to be invalidated in order to ensure that the process cannot continue to access the unmapped pages through stale TLB entries.

    flush_mmu_tlb_region_asid_async(va_start, size, pmap);

This TLB flush occurs on the supplied virtual address range, not the removed TTEs. Thus, there could be a disagreement between the TLB entries invalidated and the L3 TTEs removed if the out-of-bounds TTEs were from a separate region of the process's address space, leaving stale TLB entries for those out-of-bounds TTEs.

By carefully controlling the layout of translation tables, it's possible to transform the out-of-bounds TTE removal into a different bug: improper TLB invalidation. This is because the out-of-bounds TTEs can correspond to discontiguous parts of the virtual address space, causing the set of TTEs removed to differ from the set of TLB entries flushed.


A stale TLB entry would allow a process to continue accessing the physical page after that page has been unmapped and potentially reused for page tables. So if we had a stale TLB entry for an L3 translation table, then we could insert L3 TTEs to map arbitrary PPL-protected pages as writable.

That's pretty much exactly how the PPL bypass works:

  1. Call the kernel function cpm_allocate() to allocate 2 pages of contiguous physical memory called A and B.
  2. Call pmap_mark_page_as_ppl_page() to insert pages A and B at the head of the ppl_page_list so they can be reused for page tables.
  3. Fault in pages for virtual addresses P and Q so that A and B are allocated as L3 TTs for mapping P and Q, respectively. P and Q are discontiguous but have TTEs that are contiguous.
  4. Start a spinner thread bound to a CPU core that reads from page Q in a loop to keep the TLB entry alive (sketched after this list).
  5. Call pmap_remove_options() to remove 2 pages starting from virtual address P (which does not include Q). The vulnerability means that TTEs for both P and Q are removed, but only the TLB entry for P is invalidated.
  6. Call pmap_mark_page_as_ppl_page() to insert page Q at the head of the ppl_page_list so it can be reused for page tables.
  7. Fault in a page for virtual address R so that page Q is allocated as an L3 TT for R, even while we continue to have a stale TLB entry for Q.
  8. Using the stale TLB entry, write to page Q to insert an L3 TTE which maps Q itself as writable.
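For step 4, the "spinner" can be as simple as the following sketch (the function name is mine). TLBs are per-core, so the real exploit also pins the thread to one CPU core via a platform-specific affinity call, omitted here.

#include <atomic>
#include <cstdint>

// Keep the stale TLB entry for page Q warm by touching it in a loop.
void SpinKeepTlbEntryAlive(volatile const uint8_t* page_q,
                           const std::atomic<bool>& stop) {
  while (!stop.load(std::memory_order_relaxed)) {
    (void)*page_q;  // Any load keeps the translation cached on this core.
  }
}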

[Animation: the progression of the exploit over time. The vulnerability is used to establish a stale TLB entry for an unmapped page Q, which then gets reallocated as an L3 translation table. The stale TLB entry for Q allows us to modify it and insert an L3 TTE mapping Q itself, which can then be used to modify page tables even after the stale TLB entry has been cleared.]


This bypass was reported as Project Zero issue 2035 and fixed in iOS 13.6; you can find a POC that demonstrates how to map arbitrary physical addresses into EL0 there. Also, for a much more detailed look at exploiting improper TLB invalidation, check out Jann Horn's excellent blog post on the topic.

This bug demonstrates a common problem when creating a security boundary where none existed before. It's easy for code to make subtle assumptions about the security model (such as where argument validation occurs or what functionality is exposed vs. private) that no longer hold true under the new model. I wouldn't be surprised to see more bugs along this line in PPL.

Overall, though, I came away from this exercise impressed with the design of PPL. I think it's a sound mitigation with a clear security boundary that doesn't introduce more attack surface. My biggest criticism is that the value-add proposition of PPL is still not yet clear to me: What real-world attacks does PPL mitigate? Is it simply laying the groundwork for more sophisticated and powerful mitigations to come? Whatever the answer may be, I still prefer having it. Kudos to Apple for an interesting and well-thought-out mitigation.