Monday, March 28, 2016

Life After the Isolated Heap

Posted by Natalie Silvanovich, Mourner of Lost Exploits

Over the past few months, Adobe has introduced a number of changes to the Flash Player heap with the goal of reducing the exploitability of certain types of vulnerabilities in Flash, especially use-after-frees. I wrote an exploit involving two bugs discovered after the Isolated Heap was implemented to explore how it impacts their exploitability.

The Isolated Heap


The Flash heap, MMgc, is a garbage collected heap that also supports unmanaged fixed allocations. In the past, there have been many exploits in the wild that used certain properties of the heap to aid exploitation. In particular, many exploits used the allocation properties of Vectors to gain read/write access to the entire Flash memory space via heap memory corruption bugs. Exploits that use other object types, such as ByteArray and BitmapData have also been seen in the wild.

MMgc was originally implemented as a type and size bucketed allocator. When memory is requested, the allocator that is called depends on the type of memory that is needed. This is related to the garbage collection properties of the memory. If it is not garbage collected, the Fixed allocator is used, otherwise the Garbage-Collected (GC) allocator is used. Within the GC allocator, there are about eight subtypes of memory that can be allocated, related to whether the memory contains pointers and whether those pointers have custom finalizers or GC routines that need to be called. Within each type, the request is sorted by size, and the memory is allocated on a heap page for that size. Large requests are allocated on their own page.

The Isolated Heap introduces partitioning to the heap, essentially a third factor which determines where memory is allocated. There is separate memory for each partition, which is then split into subsections for different types and sizes. The goal of partitioning is to allocate objects that are likely to contain memory corruption bugs in a different area of memory than objects that are likely to be useful in exploiting memory corruption bugs, and generally add more entropy to the heap.

There are currently three partitions on the heap. The first partition is generally used for objects that contain pointers: script objects, their backing GC-memory and certain pointer arrays. The second partition is used for objects that contain non-pointer data, mostly arrays of primitive types. The third partition is used for a small number of objects that have a history of being used in exploits. These are typically variable-sized data buffer objects. Outside of the Isolated Heap, checksumming has also been implemented to detect and abort if certain sensitive objects are ever altered.

CVE-2016-0998


CVE-2016-0998 was discovered by Mateusz Jurczyk and I while fuzzing the Flash Player (full code for the exploit can be found attached to this bug). It was reported to Adobe on February 3, 2016 and fixed by Adobe on March 10, 2016. It is a good example of a bug that the Isolated Heap makes more difficult to exploit.

The bug is an uninitialized variable in the fix to an ActionScript 2 use-after-free bug. Roughly 80 of these types of issues have been fixed by Adobe in the past year, and two uninitialized variable issues were introduced in the fixes.

This issue is fairly easy to reproduce, a proof-of-concept for this issue in its entirety is:

var o = {};
o.unwatch();

The bug occurs because the use-after-free check in the unwatch method attempts to convert its first parameter to a string by calling toString on it before continuing with the part of the method where toString  could cause problems by freeing an object. However, Flash does not check that this parameter exists before calling toString on it. In pseudo-code, the rough behaviour of this method is:

void* args = alloca( args_size );
for( int i = 0; i < args_size; i++){
// Init args
}

if ( ((int) args[0]) & 6 == 6 )
args[0] = call_toString( args[0] );

if ( args_size < 1)
exit();

There’s a few interesting things to note about this bug. First, on Flash, alloca(0) allocates 16 bytes (the minimum allowed size), but the initialization loop doesn’t run, so this memory contains whatever was on the stack the last time this memory was used, which is not part of the current call. Second, the vulnerable behaviour only occurs if the object on the stack ends in 6. The purpose of this behaviour is to ensure that the parameter is a ScriptObject -- Flash arguments can be many types, such a strings, integers, objects, etc., and the last three bits of the value indicates its type, with 6 indicating a ScriptObject. Finally, this bug bails pretty quickly if the argument array is too small. There’s only one function, call_toString that’s called on the uninitialized value. This function searches through the ScriptObject’s variables for a method called toString, and then calls it, calling some virtual methods in the process.

With the above constraints, there are a few ways to exploit this bug:

  1. Put an absolute pointer value on the stack. The benefit of this is that you can guarantee that it ends in 6. The downside is that you need a separate bug to bypass ASLR, because there’s no way to get your bearings otherwise
  2. Put a pointer to some type of object or buffer that is not a ScriptObject on the stack, and use type confusion for the exploit. This is somewhat challenging for this particular bug, because valid pointers that end in 6 are unusual on the stack, as most of the time they are aligned.  The only situation where unaligned pointers are typically on the stack is when manipulating data buffers such as strings where each byte is accessed individually.
  3. Make this bug into a use-after-free. Put a stale pointer to a ScriptObject on the stack and wait for it to be freed and reallocated, and then use type confusion for the exploit

Option 2 seemed the most practical up front, but the Isolated Heap posed some challenges. As noted above, if a call puts an unaligned pointer on the stack, it is probably manipulating some sort of byte data that does not contain pointers, as pointer access needs to be aligned. So it is probably possible to make args[0] point to a buffer type, such as a ByteArray in ActionScript, but ASLR is still a problem, because call_toString calls virtual functions on args[0] , and without knowing the location of any code addresses, the address in the buffer that will be treated as a vtable can’t be set to a reasonable value. One possible way of solving this problem would be to have args[0] point to a buffer, and then realloc it to be something else that has a valid vtable, but all script-controllable byte buffers are allocated in partition 3, which is not used for any other data types that contain pointers, so this isn’t possible with the Isolated Heap.

I then tried Option 3, and tried to reallocate a different object in the place of a ScriptObject. This bug more amenable to this than a lot of other bugs, because there’s no limitation to when the object needs to be reallocated, other than it needs to be reallocated after the pointer to it is outside of the valid stack (i.e. the stack pointer is higher than the address of the value), and it needs to remain allocated until it is used by the bug. These constraints aren’t very limiting, as basically any object in the Flash Player can be allocated in this window. That said, only allocations with the same partition, type and size as a ScriptObject will be allocated in the freed memory. Looking at object allocations in Flash, only about 10 other objects have these properties, and they all extend the same class, the AS3 ScriptObject class (which is different from the AS2 ScriptObject that is freed). Unfortunately though, the first virtual function that this bug calls on the reallocated buffer maps to ScriptObject::getDescendants, which immediately throws an ActionScript 3 exception, which leads to a null pointer crash, because exception handlers haven’t been properly initialized. So in this case, there isn’t an appropriate object that can be allocated in the place of an AS3 ScriptObject that can make this bug exploitable as a use-after-free.

At this point, I didn’t think it was very likely that this bug would be exploitable without a second information leak vulnerability, so I tried exploiting it with a second bug.

CVE-2016-0984


CVE-2016-0984 is a use-after-free in sound processing in which the freed buffer can only be read. I reported CVE-2016-0984 on January 11, 2016 and Adobe released a patch on February 16, 2016.

A proof-of-concept for the bug is as follows:

var s = new Sound();
var b = new ByteArray();
for( var i = 0; i < 1600; i++){
b.writeByte(1);
}
b.position = 0;
s.loadPCMFromByteArray(b, 100, "float", false, 2.0);
var c = new ByteArray();
for(var i = 0; i < 2; i++){
c.writeByte(1);
}
c.position = 0;
try{
s.loadPCMFromByteArray(c, 1, "float", false, 2.0);
}catch(e:Error){
trace(e.message);
}

var d = new ByteArray();
s.extract(d, 1, 0);

This bug is related to exception handling in the loadPCMFromByteArray method. This method loads sound data from an array that is provided by ActionScript, and then processes it, and stores it internally in the Sound object. The general flow of the function is as follows:

if ( input_size < needed_size ){ // needed_size is wrong
throwASException();
}

delete[] m_pcm;
char* sound_data = new char[input_size];

for( int i = 0; i < input_size; i++){
sound_data = inputArray.readStuff(); // can throw exception
}

m_pcm = sound_data;

The code attempts to check that the array is the right size and throws an exception before it does any pointer manipulation, but there is an arithmetic error in how the size is calculated, so some situations in which the array is too small will get through (see the tracker for exact details on how to trigger this condition). In this case, the input array will throw an exception when it is read, which means that m_pcm will be freed but not reallocated. This is a fairly versatile bug, in that the array that is freed can be of any size, though it is always a character array in partition 2, the data heap.

The first step was to use this bug to obtain the address of a vtable to break ASLR. It wasn’t immediately obvious how to do this, as the heap partition the array is allocated if generally used for primitive arrays. There were two exceptions to this I was aware of. First, arrays of pointer that aren’t void pointers are allocated on this heap, but this isn’t particularly helpful for this bug, as they tend to be pointers to other primitive data types, and even if they were pointers to objects, there’s no way to use this bug to iterate through pointers, it can only be used to read values off the heap without knowing their location. Another property I noticed is that object arrays are also allocated in this partition, so if an array of objects that call virtual methods (or contain function pointers) is allocated, you could read the code pointers off of the heap. When I looked though, I couldn’t find a single array of virtual objects allocated in a way that is script-controllable in Flash, as arrays of pointers are usually used instead.

Eventually, I discovered that the ActionScript JIT LIR implements its own basic heap, and allocates new pages as large char arrays, which are allocated in the same heap partition as other char arrays. These pages have variable sizes, and often contain objects with vtables. By selecting a PCM array allocation size that lines up with a frequent LIR allocation, I was able to read a vtable off the heap.

The next step was to get a pointer to a buffer I could control in script. It is possible to also use CVE-2016-0998 for this, but I suspected that using this bug could do it more simply and reliably. Since arrays of char pointers are allocated on the same heap partition as sound PCM data, char pointers could be easily read. I used the AS3 function LocaleID.determinePreferredLocales to allocate a char* array based on the String vector that is provided as input. Unfortunately though, ActionScript strings aren’t ideal for exploits. They are immutable after they are allocated, and worse, they terminate as soon as a NULL character is reached, which means on 64-bit systems, they can only ever contain one pointer. The best solution to this would be to allocate an array of pointers to something more controllable, such as a byte or int Array, but unfortunately in ActionScript, arrays of these types of pointers are fairly unusual, and when they exist, their size is usually not controllable from script. So instead, I reallocated the string data that the char pointers pointed to, so that they were integer arrays instead. This was possible in part because the strings allocated internal to the LocaleID.determinePreferedLocales method are not true Flash strings that are accessible via script, but character arrays, so they show up in partition 2 of the heap (the data partition) as opposed to partition 3 (the exploitable objects partition).

I used the BitmapData.paletteMap method to allocate integer arrays in the place of the character arrays allocated by LocaleID.determinePreferedLocales. This method accepts input of four integer arrays of size 256, and copies them to a larger int array of length 1024. These arrays are not mutable, but since the vtable pointer, and the pointer to the buffer have been determined at this point, it doesn’t matter, as all values that need to be in the array can be calculated before the array is created. Another undesirable property of this method is that the arrays are freed at the end, but there is a trick for getting around it. If any element of the arrays provided to the method is not an integer, Flash will attempt to call valueOf on it to convert it to a number. So if a late element (for example, the last element of the fourth array) has a valueOf that throws an exception, execution of the method will stop. In this case, the large int array will be allocated, and most of the input will be copied to it, but the part of the function that processes and frees the arrays will be skipped. This is a useful trick to avoid objects being freed that works in a lot of methods.

At this point, we have a pointer to a vtable, and a pointer to a buffer we control, so it’s fairly straightforward to gain code execution using CVE-2016-0998

Putting it all together


For CVE-2016-0998 to use the allocated buffer, a pointer to it with the final three bits set to 6 needs to be put on the stack. There are a few ways to do this, one of them being UTF conversion. UTF-8 to UTF-16 conversion is done on the stack if the length of resulting UTF-16 string is less than 256 bytes, which leads to the character values of the string being written to the stack.

To start off, my exploit grows the stack. This is just to avoid any values that have already been written from causing problems. This isn’t as straightforward as one would expect. The ActionScript stack is not the same as the C++ stack (where this bug occurs), so calling a function recursively in ActionScript won’t grow it. Instead, there needs to be a loop in C++. To cause this, I triggered a situation where toString would be called recursively, which would be done in C++. In ActionScript:

_global.v = 31;
_global.mc = this;
var o = {toString : f};
this.swapDepths(o);

function f(){
if(_global.v > 0){
_global.v = _global.v - 1;
_global.mc.swapDepths(this);
}else{

var t = _global.mc;
t.swapDepths(d, 2);
}
return 
"ffffffffffffffffffffffffffffffffffffffffffffffffffffffff";
}

This code calls swapDepths, which calls toString from C++, which then recursively calls swapDepths again, growing the stack.

The value that is returned from this type of function then gets converted from UTF-8 to UTF-16, so its contents get put on the stack. This can be used to put the pointer to the the controllable buffer on the stack. Unfortunately, conversion only happens if the String is encoded statically in the SWF, a string generated during AS execution won’t work. So a SWF needs to dynamically be created as a part of the exploit.

This process can be seen in test.cgi. The first part of the exploit, soundPCM.swf uses CVE-2016-0984 to break ASLR and create a buffer, and then it passes the location of the buffer to test.cgi via URL parameters in JavaScript. This calls into Python, and adds the correct address into a static string in a SWF called new.swf. Note that a UTF-8 converter is implemented in Python, this is because standard UTF encoding leads to characters of different lengths, and putting different lengths of strings into the SWF causes “movement” in memory when it is loaded, which can cause problems with the exploit which relies on static offsets. The implemented converter always creates three-byte UTF-8 encodings even if a shorter one is possible based on the specific buffer pointer value. Also note that the full string in the SWF will not get fully converted, because the 0x0000 value at the beginning of the 64-bit pointer will be treated as a null, and processing will stop. This isn’t ideal, it means that only one pointer can ever be copied to the stack, but it is a constraint that can be worked around.

At this point, we have a SWF that puts a pointer to the buffer on the stack, we now just need to trigger the exploit. I encoded this into the same SWF for simplicity. Once the SWF has been created, it is loaded once with URL parameter num=15, which sets up the stack, and then with URL parameter num=14, which triggers the bug, causing native toString to be called on the buffer that’s provided.

What’s in the buffer

The last step is to figure out what to put in the buffer. Running through the native call, there’s a few pointers that need to be set to something valid to avoid crashes (I pointed them back to various locations in the same buffer), and then a virtual call is made to the buffer. Setting the memory at the head of the buffer the call sees as the vtable to a location later in the buffer, and creating a fake vtable in that memory, it’s possible to make a call into a gadget. This exploit uses the following one:

mov rdi, rax
call [rax + 0x28]

This sets rdi to the head of the vtable, which is set to a string command, and then rax + 0x28 (0x28 bytes into the vtable) is set a location which calls system in the Flash plug-in, which triggers a call to system.

Conclusion


The Isolated Heap made exploiting CVE-2016-0998 more difficult and time consuming, and also made exploitation require a separate information leak bug, which probably would not have been required before the heap changes. There are a couple weaknesses in the Isolated heap, especially the use of the data partition for JIT allocation, and allocating pointer arrays on the data heap. We are working with Adobe to implement improvements to the Isolated Heap in future versions of Flash. It is challenging to harden a heap against exploitation, especially in the face of high-quality bugs, but the Isolated Heap is a substantial improvement.

Tuesday, March 22, 2016

Race you to the kernel!

Posted by Ian Beer of Google Project Zero

The OS X and iOS kernel code responsible for loading a setuid root binary invalidates the old task port after first swapping the new virtual memory map pointer into the old task object, leaving a short race window where you can manipulate the memory of an euid 0 process before the old task port is invalidated. Going a step further this also allows you to also gain any entitlement and, on OS X, load an unsigned kernel extension.


I reported this bug to Apple on December 16th 2015 and it was patched in OS X 10.11.4/iOS 9.3 as CVE-2016-1757. For more technical details see the original bug report where you can also grab the updated exploit.

Task Ports

If you’ve ever tried to use ptrace on OS X you were probably sorely disappointed. Whilst the syscall does exists it’s severely limited in what it can do. For example there’s no support for peeking and poking memory, a pretty fundamental use-case.


This is one of many areas of XNU where the duality of its kernel is evident; along with ptrace there’s a parallel abstraction layer in the Mach side of the kernel which supports building debugging tools via a set of special mach task ports


Each process executing on OS X has a task port. For example, to allocate a page of memory in your process you might use code like this:


mach_vm_address_t addr = 0;
mach_vm_allocate(mach_task_self(),
                &addr,
                0x1000,
                VM_FLAGS_ANYWHERE);


mach_vm_allocate is a MIG method implemented by the kernel (and defined in mach_vm.defs in the xnu kernel source.) The first argument to this method is the current task port; if you happened to have send rights to another task’s task port and you were to call this method passing that other task’s task port then the allocation would end up in that other task’s virtual address space, even though your task made the call. Similarly the mach_vm_read and mach_vm_write methods allow you to read and write the memory of other processes if you  have their task ports, effectively giving you complete control over them.


By default tasks only have send rights to their own task ports, but it’s easy to give other processes send rights to your task port using the standard mach messaging mechanisms.

execve

Usually new task ports will only be created when new processes are created (via fork or posix_spawn). Just execing a new binary won’t reset the task port. This leads to the obvious question: since having a send right to a process’s task port give you complete control over that process, how do they deal with SUID binaries where the exec call will increase the privileges of the process?


Looking through the exec code we find this snippet:


/*
* Have mach reset the task and thread ports.
* We don't want anyone who had the ports before
* a setuid exec to be able to access/control the
* task/thread after.
*/
ipc_task_reset(p->task);


Sounds sensible. But execve is a really complex syscall and there’s a lot going on here:



This diagram outline the four major phases of the exec syscall which we’re interested in along with the lifetimes of some data structures.


In phase 1 the old task port is still valid, and via the task struct gives us access to the old vm_map which still contains the virtual address space of the task prior to the exec. As part of the load_machfile method the kernel will create a new vm_map into which it will load the new executable image. Note though that at this point the new vm_map isn’t matched up with any task; it’s just a virtual address space with no process.


In phase 2 the parse_machfile method does the actual parse and load of the MachO image into the new virtual address space. It’s also during this phase that any code signatures which the binary has are checked; we’ll come back to that detail later.


In phase 3 load_machfile calls swap_task_map; this sets the task struct to point to the new vm_map into which the target executable was just loaded. From this point on any mach_vm* methods invoked on the old task port will target the new vm_map.


In phase 4 exec_handle_sugid will check if this binary is SUID, and if it is, will reset the task port, at which point the old task port will no longer work.

Somethings not right there…

From this high-level view it’s clear that there’s a fundamental race condition. Between phase 3, when the vm_map is swapped, and phase 4, when the task port is invalidated, we will have full read/write access via the old task port to the new vm_map containing the loaded binary which is about to be executed, even if it’s SUID which means that we can just use mach_vm_write to overwrite its text segment with code we control!


From the diagram it’s hard to tell but there’s actually quite a long period between those two phases, certainly enough time to sneak in a few mach kernel MIG calls.

Building a reliable exploit

There’s one small stumbling block to overcome though: although we can now try to write to the new vm_map, where should we write? Userspace ASLR on OS X (for binaries, not dynamic libraries) is randomized per-exec so we first have to figure out where the kernel loaded the image.


Luckily with the task port we can do a lot more than just blindly read and write the new vm_map; we can also API’s like mach_vm_region to ask about properties of memory regions in the map. In this case we just loop constantly asking for the address of the lowest mapping. When it changes we know that it changed to the base address of the image in the new vm_map and  that we’re now in the race window between phases 3 and 4.


Using this base address we can work out the address of the binary’s entrypoint in the new vm_map, mark that page as RWX using mach_vm_protect and overwrite it with shellcode using mach_vm_write.


Run root_shell.sh in the attached exploit to see this in action. It should work out of the box on any semi-recent OS X system, using otool to extract the entrypoint of the traceroute6 suid-root executable.

Going further: codesigning and entitlements

In the exec phases diagram I mentioned that parse_machfile (in phase 2) is responsible for both loading the MachO as well as verifying any code signatures present. You might ask why this is relevant on OS X since, unlike iOS, there’s no mandatory userspace code signing and we can quite happily execute unsigned code anyway.


But OS X does use userspace code signing to enforce entitlements which are just xml blobs stored in your MachO and signed as part of your code signature.


Kernel code can check whether user space processes have particular entitlements (via IOTaskHasEntitlement) and the kernel kext loading code will now only accept load requests from processes which are both running as root and also have the com.apple.rootless.kext-management entitlement.


This is particularly important since on OS X kext code signatures are enforced by userspace. This means that the kext-management entitlement is a vital part of the separation between root in userspace and kernel code execution.


Two system binaries which do have the kext-management entitlement are kextd and kextload; so in order to defeat kernel code signing we just need to first exec a suid-root binary to get root then exec either kextload or kextd to get the right entitlement.

Patching kextload

kextload is probably the easier target for a binary patch since it’s a standalone tool which can load a kext without talking to to kextd. We just need to patch the checkAccess function so that kextload thinks it can’t reach kextd and will try to load the kext itself. Then patch out the calls to checkKextSignature and checkSignaturesOfDependents such that kextload thinks they succeeded and it will now load any kext, signed or unsigned :-)

The released exploit contain a binary patch file to remove the signature checks in kextload on OS X 10.11.3 as well as a simple HelloWorld kext and load_kext.sh and unload_kext.sh scripts to load and unload it. These can be run as a regular user. Look in Console.app to see the kernel debug printf output.