Friday, May 10, 2019

Trashing the Flow of Data

Posted by Stephen Röttger

In this blog post I want to present crbug.com/944062, a vulnerability in Chrome’s JavaScript compiler TurboFan that was discovered independently by Samuel (saelo@) via fuzzing with Fuzzilli, and by myself via manual code auditing. The bug was found in beta and was fixed before it made it into the stable release of Chrome, but I think it’s interesting for a few reasons and decided to write about it. The issue was in TurboFan’s handling of the Array.indexOf builtin. At first it looked like an info leak at best, and it was not at all obvious that it could be turned into an arbitrary write primitive. Besides that, it’s an instance of a common bug pattern in JIT compilers: the code was making assumptions at compile time without inserting the corresponding runtime checks for those assumptions.

The Vulnerability

The vulnerable code was in JSCallReducer::ReduceArrayIndexOfIncludes, part of one of TurboFan’s optimization passes, which replaces Array.prototype.indexOf with an optimized version when the elements kind of the array is known. If you’re not familiar with elements kinds in V8, I recommend checking out https://v8.dev/blog/elements-kinds. In short, array elements can be stored in different ways: for example as floats or as tagged pointers in fixed-size C++ arrays, or in a dictionary in the worst case. Thus, if the elements kind is known, the compiler can emit optimized code for that specific case when accessing the elements.
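To make this concrete, here’s a small sketch of how an array’s elements kind degrades as the array is mutated. The kind itself isn’t observable from plain JavaScript (you’d need d8 with --allow-natives-syntax and %DebugPrint to see it), so the comments describe what V8 tracks internally; the kind names come from the blog post linked above.

```javascript
const arr = [1, 2, 3];   // PACKED_SMI_ELEMENTS: small integers in a fixed array.
arr.push(4.5);           // PACKED_DOUBLE_ELEMENTS: unboxed doubles in a fixed array.
arr.push("x");           // PACKED_ELEMENTS: tagged pointers.
delete arr[1];           // HOLEY_ELEMENTS: a hole was punched into the array.
arr[100000] = 1;         // Sparse enough that V8 gives up on flat storage and
                         // switches to dictionary (hash table) elements.

console.log(arr.length); // 100001
```

Transitions only go in one direction, from more specific to more generic, which is exactly why a compiler that caches a specific elements kind must re-check it at runtime.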

Let’s take a look at the vulnerable function:

// For search_variant == kIndexOf:
// ES6 Array.prototype.indexOf(searchElement[, fromIndex])
// #sec-array.prototype.indexof
// For search_variant == kIncludes:
// ES7 Array.prototype.inludes(searchElement[, fromIndex])
// #sec-array.prototype.includes
Reduction JSCallReducer::ReduceArrayIndexOfIncludes(
   SearchVariant search_variant, Node* node) {
 CallParameters const& p = CallParametersOf(node->op());
 if (p.speculation_mode() == SpeculationMode::kDisallowSpeculation) {
   return NoChange();
 }

 Node* receiver = NodeProperties::GetValueInput(node, 1);
 Node* effect = NodeProperties::GetEffectInput(node);
 Node* control = NodeProperties::GetControlInput(node);

 ZoneHandleSet<Map> receiver_maps;
 NodeProperties::InferReceiverMapsResult result =
     NodeProperties::InferReceiverMaps(broker(), receiver, effect,
                                       &receiver_maps);
 if (result == NodeProperties::kNoReceiverMaps) return NoChange();

 [...]

For Array.p.indexOf, the compiler infers the maps for the array using the InferReceiverMaps function. The map represents the shape of the object, e.g. which property is stored where and how the elements are stored. As you can see at the end of the snippet, the code bails out and doesn’t optimize the call any further if the maps are not known (result == kNoReceiverMaps).
At first glance this seems reasonable, until you notice that InferReceiverMaps has three possible return values:

enum InferReceiverMapsResult {
 kNoReceiverMaps,         // No receiver maps inferred.
 kReliableReceiverMaps,   // Receiver maps can be trusted.
 kUnreliableReceiverMaps  // Receiver maps might have changed (side-effect).
};

The returned maps can be unreliable, meaning side effects of previous operations could have changed the map, for example turning float elements into dictionary elements. However, the instance type is mostly reliable: an object can’t turn into a number.

That means that if the maps are unreliable and you want to depend on anything besides the instance type, you need to guard against changes with a runtime check. There are two ways to add runtime checks for your assumptions. You can, for example, insert a CheckMaps node in the effect chain that will bail out at runtime if the map is not the expected one.

Alternatively, there’s a feature called compilation dependencies, which adds a check not to the code that relies on the assumption but to the code that invalidates it. In this case, if the returned maps are marked as stable, meaning no object with one of these maps ever transitioned out of it to a different map, you could add a StableMapDependency. This registers a callback that deoptimizes the current function, triggered if an object ever transitions out of that stable map. There’s potential for bugs here either if the compiler relies on such an assumption without registering the corresponding compilation dependency, or if some code path invalidates the assumption without triggering the callbacks (check out this SSD Advisory for an interesting example).

The vulnerability in this case was that the compiler didn’t add any runtime checks at all in the case of unreliable maps, giving us an elements kind confusion. If you change the elements kind at runtime from float to dictionary, Array.p.indexOf will read out of bounds, since a dictionary with an element at index 100000 is much shorter than a packed array of 100000 floating point values. You can see this behavior in Samuel’s proof-of-concept. Here’s a slightly cleaned up version of it:

function f(idx, arr) {
 // Transition to dictionary mode in the final invocation.
 arr.__defineSetter__(idx, ()=>{});
 // Will then read OOB.
 return arr.includes(1234);
}
f('', []);
f('', []);
%OptimizeFunctionOnNextCall(f);
f('1000000', []);

Information Leak

The first step towards exploitation is to turn this bug into an info leak. Note that Array.p.indexOf checks for strict equality on the elements, which mostly boils down to a pointer comparison for heap objects or a value comparison for primitives. Since the only return value we get back is the index of an element in the array, our only option seems to be to brute force a pointer.
The idea is to turn a float array into dictionary mode to get indexOf to read out of bounds, and to brute force the searchElement parameter until it returns the OOB index of the pointer we want to leak, in particular the pointer to the backing store of an ArrayBuffer. We also need to place our search value in memory behind the leaked pointer so that the unsuccessful attempts don’t run off the end of the heap and trigger a segfault. Our memory layout will look roughly like this:


Our ArrayBuffer pointer will be page-aligned, which reduces the brute force a bit. But to speed up the attack further, we can use another trick that allows us to brute force the upper 32 bits separately. V8 has an optimization that allows it to store small integers (Smis) and heap objects in the same place by using a tagged representation for the latter. On x64, all pointers are marked with a one in the least significant bit, while Smis have the lowest bit set to zero and store their actual value in the upper 32 bits. By checking the least significant bit, V8 can distinguish between a Smi and a pointer and doesn’t have to wrap the former in a heap object.
That means that if we use an array of tagged objects and Smis for our OOB access, we can brute force the upper 32 bits of our pointer by using a Smi as our searchElement parameter.
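The tagging scheme itself can be modeled in a few lines. This is a sketch using BigInt to stand in for raw 64-bit words; the constants follow the x64 scheme described above, not any real V8 API.

```javascript
// Model of V8's x64 value tagging. A Smi keeps its 32-bit payload in the
// upper half of the word with the least significant bit clear; a heap
// pointer has the least significant bit set.
const tagSmi = v => BigInt(v) << 32n;       // Smi: payload in the upper 32 bits
const untagSmi = w => Number(w >> 32n);
const isHeapPointer = w => (w & 1n) === 1n; // tag bit in the LSB

// A Smi search element is therefore compared as a full 64-bit word whose
// payload sits in the upper half -- which is what makes it possible to
// attack the upper 32 bits of a leaked word separately.
const word = tagSmi(0x1337);
```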

When implementing this attack, I kept triggering crashes in which the OOB read ran off the end of the heap without hitting the boundary value I had placed in memory. It seemed difficult to get the right memory layout for my two-step brute force to work. After debugging for a bit, however, the real problem turned out to be a different one: after around 0x2000000 tries of my float brute force, the garbage collector kicked in and moved my objects around, breaking the required heap layout. Fortunately, this problem is easy to avoid: if the brute force was not successful after 0x1000000 tries, reallocate the ArrayBuffer and start over until the pointer ends up in the right range.
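The search loop can’t be reproduced without the bug, but its structure can be simulated. The sketch below models the out-of-bounds memory as a plain array of numbers (all names and values here are hypothetical) and shows the page-stepped brute force together with the reallocate-and-retry logic.

```javascript
// Hypothetical OOB region: index 3 plays the role of the ArrayBuffer
// backing-store pointer we want to leak; the sentinel behind it is the
// boundary value that keeps failed guesses from running off the heap.
function makeOobMemory() {
  const fakePointer = 0x7000; // page-aligned by construction
  return [1.5, 2.5, 3.5, fakePointer, -1 /* sentinel */];
}

function bruteForcePointer(memory, maxTries) {
  // Page-aligned candidates only: stepping by 0x1000 shrinks the search space.
  for (let i = 1; i <= maxTries; i++) {
    const guess = i * 0x1000;
    const idx = memory.indexOf(guess); // stands in for the OOB Array.p.indexOf
    if (idx >= 0) return { pointer: guess, index: idx };
  }
  return null; // give up: GC may have moved our objects
}

let leak = null;
while (leak === null) {
  const memory = makeOobMemory();     // "reallocate" until the layout works
  leak = bruteForcePointer(memory, 0x1000000);
}
```

In the real exploit the returned index tells us where the pointer sits relative to our corrupted array, and the guess that matched is the leaked pointer value.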

Arbitrary Write

Now how can this bug be turned into an arbitrary write? When I said that the strict equality check mostly boils down to a pointer comparison, I didn’t mention a special case: strings. Strings in V8 have multiple representations. A regular string, consisting of a length and the raw bytes in memory, is represented by the SeqString class. But there’s also SlicedString, which is a slice of another string, ConsString, which is the concatenation of two other strings, and more.
If V8 compares two strings, even with strict equality, it will first check that the length and the first character are the same; if they are, it flattens the two strings to SeqStrings to be able to compare them easily.
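The representations are invisible from JavaScript, but the operations that create them are ordinary ones. A sketch follows; the comments describe what V8 typically builds internally, which can vary by version and heuristics.

```javascript
const left = "a".repeat(20);
const right = "b".repeat(20);
const cons = left + right;            // typically a ConsString(left, right)
const slice = cons.substring(5, 30);  // may be a SlicedString into cons

// Strict equality first compares the lengths and the first characters; only
// if those match does V8 flatten a ConsString into a sequential SeqString
// so the remaining bytes can be compared directly.
const other = "a".repeat(20) + "b".repeat(20);
const equal = cons === other;         // true, and flattens along the way
```

This is why a fake string object only needs a plausible length and first character to get the flattening machinery running on it.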

Since we already know the address of controlled memory through the pointer to our ArrayBuffer, we can create a fake string object in there and use the OOB read bug to trigger a type confusion between a float and a heap object. This allows us to call flatten on our fake string. My first idea was to overflow the buffer that is allocated for the result by crafting ConsStrings in which the left length plus the right length is larger than the total length. This works, and I believe it might be possible to exploit it this way; however, after the overflow I always ended up in a situation in which the next write would happen at an offset close to INT_MIN or INT_MAX and trigger a segfault.

There’s a more generic way to turn this into an arbitrary write by abusing the marking phase of the garbage collector (recommended reading: https://v8.dev/blog/trash-talk). But to understand it, let’s take a step back and look at the memory layout in V8. The heap is organized in multiple so-called spaces. Most new allocations end up in the new space, which itself is split into two halves. Memory is allocated linearly here, which makes it very easy to control relative offsets between objects. A round of garbage collection will move objects from one half of the new space into the other, and if they survive two rounds of collection they get moved into the old space. Besides that, there are special spaces for certain objects, for example for large objects, for maps or for code. At the time of writing, spaces are always aligned to 2<<18 or 2<<19 bytes, so you can find space-specific metadata for a given heap object by masking off the lower bits of its pointer.
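The masking trick is just pointer arithmetic. Here is a sketch with BigInt, using the 2<<18 (512 KiB) alignment mentioned above; the address is made up.

```javascript
const CHUNK_ALIGNMENT = 2n << 18n;                  // 512 KiB chunks
const chunkBase = p => p & ~(CHUNK_ALIGNMENT - 1n); // metadata lives here
const offsetInChunk = p => p & (CHUNK_ALIGNMENT - 1n);

const objectPtr = 0x23450badc0den;                  // hypothetical heap address
// chunkBase(objectPtr)     === 0x23450ba80000n
// offsetInChunk(objectPtr) === 0x5c0den
```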

So what happens if the garbage collector kicks in while V8 is processing our fake string object? It will traverse all the live objects and mark them as such. For that, it starts with all globals, all objects currently on the stack and, depending on the type of GC run, optionally all objects in the old space that have pointers into the new space. Since our fake string object currently lives on the stack, the GC will try to mark it as live. To do so, it masks off the lower bits of the pointer to find the space metadata, takes the pointer to the marking bitmap from there and sets the bit at the position that corresponds to the offset of our object in the space. However, since the backing store of our ArrayBuffer is not a V8 heap object and thus doesn’t live in a space itself, its “space metadata” can live in user-controlled memory too. By setting the bitmap pointer to an arbitrary value, we gain a primitive that sets a bit at any chosen address. The final memory layout will look roughly like this:


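The bit-setting primitive comes down to a little arithmetic. Here’s a sketch of the computation the marker performs; the one-mark-bit-per-8-byte-word granularity and the addresses are assumptions for illustration, not V8’s exact layout.

```javascript
// Given the chunk's marking-bitmap pointer and an object's (8-byte-aligned)
// offset inside the chunk, compute which byte and bit the marker will set.
// If we control bitmapBase through fake chunk metadata, this sets a bit at
// an address of our choosing.
function markBitLocation(bitmapBase, objectOffset) {
  const bitIndex = objectOffset / 8;                    // one bit per 8-byte word
  return {
    byteAddress: bitmapBase + Math.floor(bitIndex / 8), // byte holding the bit
    mask: 1 << (bitIndex % 8),                          // bit within that byte
  };
}

const hit = markBitLocation(0x41414000, 0x40); // hypothetical values
// hit.byteAddress === 0x41414001, hit.mask === 1
```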
As the target for the overwrite, I chose the length of an array backing store since that gives a very reliable primitive that is easy to turn into an object leak and arbitrary write. The only remaining obstacle is to leak the address of the length field. We could try to brute force that address again but this would take a bit longer since the pointer won’t be page aligned this time.

Instead, we can use the string flattening operation to get a pointer into the new space if we trigger it right. We create a fake ConsString to be flattened, using our OOB primitive again. As a reminder, we just need to set the first character and the length correctly to start the flattening. The allocation will then end up either in the new or the old space, depending on whether the original string itself was in the new or old space. We control this by setting a flag in our fake space metadata. After the flattening has happened, V8 replaces the left pointer of our ConsString with a pointer to the flattened string and the right side with an empty string, and we can read these pointers back through our ArrayBuffer. Since allocations happen linearly in the new space, the array we want to corrupt will be at a predictable offset.

After corrupting the length field of the array, we can use it to overwrite other data on the heap and gain the primitives we want. There are many write-ups online describing where to go from here; for example, check out this great blog post by Andrea Biondo (@anbiondo). You can find the exploit up to the array length corruption here. It’s written against a manual build of Chrome 74.0.3728.0.

Conclusion

As has been shown many times before, bugs that don’t seem exploitable at first can often be turned into an arbitrary read/write. In this case in particular, triggering the garbage collector while our fake pointer was on the stack gave us a very strong exploitation primitive. The V8 team fixed the bug very quickly, as usual. More importantly, they’re planning to refactor the InferReceiverMaps function to prevent similar bugs in the future. When I noticed this function in the past, I was convinced that one of the callers would get it wrong, and I audited all the call sites. Back then I couldn’t find a vulnerability, but a few months later I stumbled over this newly introduced code that didn’t add the right runtime checks. In hindsight, it would have been worthwhile to point out this dodgy API to the team even without vulnerabilities to show for it.

Besides that, a proactive refactoring like this could have been done by an external contributor and would have been a great candidate for the Patch Reward Program. So if you have an idea for how to prevent a bug pattern, even one behind vulnerabilities found by other researchers, consider running it by the V8 team and contributing the code; it might be eligible for a patch reward.

Tuesday, April 16, 2019

Windows Exploitation Tricks: Abusing the User-Mode Debugger

Posted by James Forshaw, Google Project Zero

I've recently been adding native user-mode debugger support to NtObjectManager. Whenever I add new functionality I have to do some research and reverse engineering to better understand how it works. In this case I wondered what access you need to debug an existing running process and through that noticed an interesting security mismatch between what the user-mode APIs expose and what the kernel actually does which I thought would be interesting to document. What I’ll describe is not a security vulnerability, but it is useful for exploiting certain classes of bugs, which is why it fits in my series on exploitation tricks.

I'm not going to go into any great depth about how the user-mode debugger works under the hood -- if you want to know more, Alex Ionescu wrote 3 whitepapers (1, 2, 3) over 12 years ago about the internals on Windows XP, and the internals haven't really changed much since. Given that observation, while I'm documenting the behavior on Windows 10 1809, I'm confident these techniques work on earlier releases of Windows.

Attaching to a Running Process

Debugging on modern versions of Windows NT is based around the Debug kernel object, which you can create by calling the NtCreateDebugObject system call. This object acts as a bridge to assign processes for debugging and wait for debug events to be returned. The creation of this object is hidden from you by the Win32 APIs and each thread has its own debug object stored in its TEB.

If you want to attach to an existing process you can call the Win32 API DebugActiveProcess which has the prototype:

BOOL DebugActiveProcess(_In_ DWORD dwProcessId);

This ultimately calls the native API, NtDebugActiveProcess, which has the following prototype:

NTSTATUS NtDebugActiveProcess(
   _In_ HANDLE ProcessHandle,
   _In_ HANDLE DebugObjectHandle);

The DebugObjectHandle comes from the TEB, but requiring a process handle rather than a PID results in a mismatch between the call semantics of the two APIs. This means the Win32 API is responsible for opening a handle to a process, then passing that to the native API.

The question which immediately comes to mind when I see code like this is: what access does the Win32 API require, and what does the kernel API actually enforce? This isn't just idle musing; a common place for security vulnerabilities to manifest is in mismatched assumptions between interface layers. A good example is the discovery that creating an NTFS hardlink from a Win32 application using CreateHardlink requires write access to the target, because the user-mode API opens the target file with WRITE_ATTRIBUTES access. The kernel APIs, however, don't require any access (see my blog post about the hard link issue here), which allows you to hardlink to any file you can open for any access. To find out what the Win32 API and kernel APIs enforce, we need to look at the code in a disassembler (or look at Alex's original write-ups, which have the code RE'd as well). The code in DebugActiveProcess calls a helper, ProcessIdToHandle, to get the process handle, which looks like the following:

HANDLE ProcessIdToHandle(DWORD dwProcessId) {
 NTSTATUS Status;
 OBJECT_ATTRIBUTES ObjectAttributes;
 CLIENT_ID ClientId;
 HANDLE ProcessHandle;
 DWORD DesiredAccess = PROCESS_CREATE_THREAD |
                       PROCESS_VM_OPERATION |
                       PROCESS_VM_READ |
                       PROCESS_VM_WRITE |
                       PROCESS_QUERY_INFORMATION |
                       PROCESS_SUSPEND_RESUME;

 ClientId.UniqueProcess = dwProcessId;
 InitializeObjectAttributes(&ObjectAttributes, NULL, ...)
 
 Status = NtOpenProcess(&ProcessHandle,
                        DesiredAccess,
                        &ObjectAttributes,
                        &ClientId);
 if (NT_SUCCESS(Status))
   return ProcessHandle;
 BaseSetLastNTError(Status);
 return NULL;
}

Nothing shocking about this code: a process is opened by its PID using NtOpenProcess. The code requires all the access rights that a debugger would expect:
  • Creating new threads, which is how the initial break point is injected.
  • Access to read and modify memory to write break points and inspect the running state of the process.
  • Suspending and resuming the entire process.
Once the kernel gets the process handle, it needs to convert it back to a process object pointer using ObReferenceObjectByHandle. That API takes a desired access mask which is checked against the handle's granted access, and it only returns the pointer if the check succeeds. Here's the relevant snippet of code:

NTSTATUS NtDebugActiveProcess(HANDLE ProcessHandle,
                             HANDLE DebugObjectHandle) {
 PEPROCESS Process;
 NTSTATUS status = ObReferenceObjectByHandle(
            ProcessHandle,
            PROCESS_SUSPEND_RESUME,
            PsProcessType,
            KeGetCurrentThread()->PreviousMode,
            &Process);
 // ...
}

Here’s a pretty big mismatch in security expectations. The user-mode API opens the process with much higher access than the kernel API requires: the kernel only enforces access to suspend and resume the target process. This makes sense from a kernel perspective (as Raymond Chen might say, looking at it through Kernel Colored Glasses), as the immediate effect of attaching a process to a debug object is to suspend the process. You'd assume that without other access such as VM Read/Write there's not much debugging going on, but from a kernel perspective that's irrelevant. All you need to use the kernel APIs is the ability to suspend/resume the process (through NtDebugContinue) and to read events from the debug object. The fact that you might get memory addresses in the debug events which you can’t access isn't that important from a design perspective.

Where's the problem, you might ask? With only PROCESS_SUSPEND_RESUME access you can "debug" a process with limited access, but without the rest of the access rights you'd not be able to do that much. So what can we do if we have only PROCESS_SUSPEND_RESUME access?

Accessing Debug Events

The answer to the question lies in the first debug event you receive, which carries a CREATE_PROCESS_DEBUG_INFO structure. Note the native structure is slightly different, but it's close enough for our purposes.

The create process event is received whenever you connect to an active process. It allows the debugger to synchronize its state by providing a set of events, such as process creation, thread creation and loaded modules, that you'd have received if you had started the process under the debugger directly. In fact, NtDebugActiveProcess calls the method DbgkpPostFakeProcessCreateMessages, whose name gives away how these events are produced.

What's interesting about CREATE_PROCESS_DEBUG_INFO is that it contains HANDLE parameters: hFile, hProcess and hThread, corresponding to a file handle to the process executable, a handle to the process being debugged and finally a handle to the initial thread. Checking DbgkpPostFakeProcessCreateMessages, the objects are captured there; however, the handles are not generated at that point. Instead, the handles are created inside the NtWaitForDebugEvent system call, specifically in DbgkpOpenHandles. See if you can spot the problem in the following code snippet from that function:

NTSTATUS DbgkpOpenHandles(PDEBUG_EVENT Event,
                         PEPROCESS DebugeeProcess,
                         PETHREAD DebugeeThread) {

 // Handle other event types first...
 if (Event->DebugEventCode == CREATE_PROCESS_DEBUG_EVENT) {
   if (ObOpenObjectByPointer(DebugeeThread, 0, NULL,
         THREAD_ALL_ACCESS, PsThreadType,
         KernelMode, &Event->CreateProcess.hThread) < 0) {
     Event->CreateProcess.hThread = NULL;
   }

   if (ObOpenObjectByPointer(DebugeeProcess, 0, NULL,
         PROCESS_ALL_ACCESS, PsProcessType,
         KernelMode, &Event->CreateProcess.hProcess) < 0) {
     Event->CreateProcess.hProcess = NULL;
   }
   
   ObDuplicateObject(PsGetCurrentProcess(),
                     Event->CreateProcess.hFile,
                     PsGetCurrentProcess(),
                     &Event->CreateProcess.hFile, 0, 0,
                     DUPLICATE_SAME_ACCESS, KernelMode);
 }
 // ...
}

Did you spot the problem? This code uses the ObOpenObjectByPointer API to convert the debugee process and thread objects back into handles, which in itself is fine. The problem is that the API is called with the access mode set to KernelMode, which means the call does not perform any access checking. That's not great, but again it wouldn't be an issue if the code wasn't also requesting additional access above PROCESS_SUSPEND_RESUME. This code is effectively giving the caller of NtWaitForDebugEvent all access rights to the debugged target process.

The result of this behavior is that, given a process handle with PROCESS_SUSPEND_RESUME, it's possible to get full access to the process and initial thread even if the objects wouldn't grant the caller that access. It might be argued that "you've attached the debugger to the process, what did you expect?". Well, I would expect the caller to have opened suitable process and thread handles before attaching the debugger and to use them to access the target, or, if the kernel has to create new handles, to at least access check them. This leads us to our first exploitation trick:

Exploitation trick: given a process handle with PROCESS_SUSPEND_RESUME access you can convert it to a full access process handle through the debugger APIs.

It’s probably rare you’ll encounter this type of bug as PROCESS_SUSPEND_RESUME is considered a write access to a process. Anything which would leak this access to a privileged process would also have leaked the other write accesses such as PROCESS_CREATE_THREAD or PROCESS_VM_WRITE and it’d be game over. To prove this exploit trick works the following is a simple PowerShell script which takes a process ID, opens the process for PROCESS_SUSPEND_RESUME, attaches it to a debugger, steals the handle and returns the full access handle.

param(
   [Parameter(Mandatory)]
   [int]$ProcessId
)

# Get a privileged process handle with only PROCESS_SUSPEND_RESUME.
Import-Module NtObjectManager

Use-NtObject($dbg = New-NtDebug -ProcessId $ProcessId) {
   Use-NtObject($e = Start-NtDebugWait $dbg -Infinite) {
       $dbg.Detach($ProcessId)
       [NtApiDotNet.NtProcess]::DuplicateFrom($e.Process, -1)
   }
}

What about the third handle in CREATE_PROCESS_DEBUG_INFO, the handle to the process executable file? This one behaves differently: instead of opening a raw object pointer, the kernel duplicates an existing handle. If you look at the code, it seems to be duplicating from the caller's process back into the caller's process; why would it need to do that if the handle was already there? The key is the final parameter: again it's KernelMode, which means ObDuplicateObject will actually duplicate a kernel handle into the current process. The file handle is opened when attaching to the process, using the following code:

HANDLE DbgkpSectionToFileHandle(PSECTION Section) {
 HANDLE FileHandle;
 UNICODE_STRING Name;
 OBJECT_ATTRIBUTES ObjectAttributes;
 IO_STATUS_BLOCK IoStatusBlock;

 MmGetFileNameForSection(Section, &Name);

 InitializeObjectAttributes(&ObjectAttributes,
     &Name,
     OBJ_CASE_INSENSITIVE |
     OBJ_FORCE_ACCESS_CHECK |
     OBJ_KERNEL_HANDLE);

 ZwOpenFile(&FileHandle, GENERIC_READ | SYNCHRONIZE,
            &ObjectAttributes, &IoStatusBlock,
            FILE_SHARE_ALL, FILE_SYNCHRONOUS_IO_NONALERT);
 return FileHandle;
}

This code is careful to pass OBJ_FORCE_ACCESS_CHECK to the file open call to ensure it doesn’t give the debugger access to arbitrary files. The file handle is stored away to be reclaimed by the later call to NtWaitForDebugEvent. This leads us to our second, and final exploitation trick:

Exploitation trick: with an arbitrary kernel handle closing bug you can steal kernel handles.

The rationale behind this exploitation trick is that once the handle is captured, it's stored indefinitely, at least while the process still exists, and can be retrieved at any arbitrary point in time. This gives you a much bigger window to exploit a handle-closing bug. An example of this bug class is one I found in a Novell driver, issue 274. In that case the driver wouldn't check whether ZwOpenFile succeeded when writing a log entry, and so would reuse the stale handle value stored on the stack when it called ZwClose. This results in an arbitrary kernel handle being closed. To exploit the Novell bug using the debugger you'd do the following:

  1. Generate a log entry to create a kernel handle which is then closed. The value on the stack is not overwritten.
  2. Debug a process to get the file handle allocated. Handle allocation is predictable so there’s a good chance that the same handle value will be reused as the one used with the bug.
  3. Trigger the handling closing bug, in this case it’ll close the existing value on the stack which is now allocated by the debugger resulting in a dangling handle value.
  4. Exercise code in the kernel to get the now unused handle value reallocated again. For example, SepSetTokenCachedHandles, called through NtCreateLowBoxToken, will happily duplicate other kernel handles (although since I reported issue 483 there are fairly strict checks on what handles you can use).
  5. Get the debugger to return you the handle.

Handle-closing bugs do exist, though they're perhaps rare. You also have to be careful, as closing an already closed kernel handle can typically result in a bug check.

Wrapping Up


The behavior of user-mode debugging is another case where there are unexpected consequences to the design of the functionality. Nothing I’ve described here is a security vulnerability, but the behavior is interesting and it’s worth looking out for cases where it could be used.