Wednesday, December 5, 2018

Adventures in Video Conferencing Part 2: Fun with FaceTime

Posted by Natalie Silvanovich, Project Zero

FaceTime is Apple’s video conferencing application for iOS and Mac. It is closed source, and does not appear to use any third-party libraries for its core functionality. I wondered whether fuzzing the contents of FaceTime’s audio and video streams would lead to similar results as WebRTC.

Fuzzing Set-up


Philipp Hancke performed an excellent analysis of FaceTime’s architecture in 2015. It is similar to WebRTC, in that it exchanges signalling information in SDP format and then uses RTP for audio and video streams. Looking at the FaceTime implementation on a Mac, it seemed the bulk of the calling functionality of FaceTime is in a daemon called avconferenced. Opening up the binary that supports its functionality, AVConference in IDA, it contains a function called SRTPEncryptData. This function then calls CCCryptorUpdate, which appeared to encrypt RTP packets below the header.

To do a quick test of whether fuzzing was likely to be effective, I hooked this function and altered the underlying encrypted data. Normally, this can be done by setting the DYLD_INSERT_LIBRARIES environment variable, but since avconferenced is a daemon that restarts automatically when it dies, there wasn’t an easy way to set an environment variable. I eventually used insert_dylib to alter the AVConference binary to load a library on startup, and restarted the process. The library loaded used DYLD_INTERPOSE to replace CCCryptorUpdate with a version that fuzzed every input buffer (using fuzzer q from Part 1) before it was processed. This implementation had a lot of problems: it fuzzed both encryption and decryption, it affected every call to CCCryptorUpdate from avconferenced, not just ones involved in SRTP and there was no way to reproduce a crash. But using the modified FaceTime to call an iPhone led to video output that looked corrupted, and the phone crashed in a few minutes. This confirmed that this function was indeed where FaceTime calls are encrypted, and that fuzzing was likely to find bugs.

I made a few changes to the function that hooked CCCryptorUpdate to attempt to solve these problems. I limited fuzzing the input buffer to the two threads that write audio and video output to RTP, which also solved the problem of decrypted packets being fuzzed, as these threads only ever encrypt. I then added functionality that wrote the encrypted, fuzzed contents of each packet to a series of log files, so that test cases could be replayed. This required altering the sandbox of avconferenced so that it could write files to the log location, and adding spinlocks to the hook, as calling CCCryptorUpdate is thread safe, but logging packets isn’t.

Call Replay


I then wrote a second library that hooks CCCryptorUpdate and replays packets logged by the first library by copying the logged packets in sequence into the packet buffers passed into the function. Unfortunately, this required a small modification to the AVConference binary, as the SRTPEncryptData function does not respect the length returned by CCCryptorUpdate; instead, it assumes that the length of the encrypted data is the same as the length as the plaintext data, which is reasonable when CCCryptorUpdate isn’t being hooked. Since SRTPEncryptData always uses a large fixed-size buffer for encryption, and encryption is in-place, I changed the function to retrieve the length of the encrypted buffer from the very end of the buffer, which was set in the hooked CCCryptorUpdate call. This memory is unlikely to be used for other purposes due to the typical shorter length of RTP packets. Unfortunately though, even though the same encrypted data was being replayed to the target, it wasn’t being processed correctly by the receiving device.

To understand why requires an explanation of how RTP works. An RTP packet has the following format.


It contains several fields that impact how its payload is interpreted. The SSRC is a random identifier that identifies a stream. For example, in FaceTime the audio and video streams have different SSRCs. SSRCs can also help differentiate between streams in a situation where a user could potentially have an unlimited number of streams, for example, multiple participants in a video call. RTP packets also have a payload type (PT in the diagram) which is used to differentiate different types of data in the payload. The payload type for a certain data type is consistent across calls. In FaceTime, the video stream has a single payload type for video data, but the audio stream has two payload types, likely one for audio data and the other for synchronization. The marker (M in the diagram) field of RTP is also used by FaceTime to represent when a packet is fragmented, and needs to be reassembled.

From this it is clear that simply copying logged data into the current encrypted packet won’t function correctly, because the data needs to have the correct SSRC, payload type and marker, or it won’t be interpreted correctly. This wasn’t necessary in WebRTC, because I had enough control over WebRTC that I could create a connection with a single SSRC and payload type for fuzzing purposes. But there is no way to do this in FaceTime, even muting a video call leads to silent audio packets being sent as opposed to the audio stream shutting down. So these values needed to be manually corrected.

An RTP feature called extensions made correcting these fields difficult. An extension is an optional header that can be added to an RTP packet. Extensions are not supposed to depend on the RTP payload to be interpreted, and extensions are often used to transmit network or display features. Some examples of supported extensions include the orientation extension, which tells the endpoint the orientation of the receiving device and the mute extension, which tells the endpoint whether the receiving device is muted.

Extensions mean that even if it is possible to determine the payload type, marker and SSRC of data, this is not sufficient to replay the exact packet that was sent. Moreover, FaceTime creates extensions after the packet is encrypted, so it is not possible to create the complete RTP packet by hooking CCCryptorUpdate, because extensions could be added later.

At this point, it seemed necessary to hook sendmsg as well as CCCryptorUpdate. This would allow the outgoing RTP header to be modified once it is complete. There were a few challenges in doing this. To start, audio and video packets are sent by different threads in FaceTime, and can be reordered between the time they are encrypted and the time they are sent by sendmsg. So I couldn’t assume that if sendmsg received an RTP packet that it was necessarily the last one that was encrypted. There was also the problem that SSRCs are dynamic, so replaying an RTP packet with the same SSRC it is recorded with won’t work, it needs to have the new SSRC for the audio or video stream.

Note that in MacOS Mojave, FaceTime can call sendmsg via either the AVConference binary or the IDSFoundation binary, depending on the network configuration. So to capture and replay unencrypted RTP traffic on newer systems, it is necessary  to hook CCCryptorUpdate in AConference and sendmsg in IDSFoundation (AVConference calls into IDSFoundation when it calls sendmsg). Otherwise, the process is the same as on older systems.

I ended up implementing a solution that recorded packets by recording the unencrypted payload, and then recorded its RTP header, and using a snippet of the encrypted payload to pair headers with the correct unencrypted payload. Then to replay packets, the packets encrypted in CCCryptorUpdate were replaced with the logged packets, and once the encrypted payload came through to sendmsg, the header was replaced with the logged one for that payload. Fortunately, the two streams with unique SSRCs used by FaceTime do not share any payload types, so it was possible to determine the new SSRC for each stream by waiting for an incoming packet with the correct payload type. Then in each subsequent packet, the SSRC was replaced with the correct one.

Unfortunately, this still did not replay a FaceTime call correctly, and calls often experienced decryption failures. I eventually determined that audio and video on FaceTime are encrypted with different keys, and updated the replay script to queue the CCCryptor used by CCCryptorUpdate function based on whether it was audio or video content. Then in sendmsg, the entire logged RTP packet, including the unencrypted payload, was copied into the outgoing packet, the SSRC was fixed, and then the payload encrypted with the next CCCryptor out of the appropriate queue. If a CCCryptor wasn’t available, outgoing packets were dropped until a new one was created. At this point, it was possible to stop using the modified AVConference binary, as all the packet modification was now happening in sendmsg. This implementation still had reliability problems.

Digging more deeply into how FaceTime encryption works, packets are encrypted in CTS mode, which requires a counter. The counter is initialized to a unique value for each packet that is sent. During the initialization of the RTP stream, the peers exchange two 16-byte random tokens, one for audio and one for video. The counter value for each packet is then calculated by exclusive or-ing the token with several values found in the packet, including the SSRC and the sequence number. Only one value in this calculation, the sequence number, changes between each packet. So it is possible to calculate the counter value for each packet by knowing the initial counter value and sequence number, which can be retrieved by hooking CCCryptorCreateWithMode. The sequence number is xor-ed with the random token at index 0x12 when FaceTime constructs a counter, so by xor-ing this location with the initial sequence number and then a packet’s sequence number, the counter value for that packet can be calculated. The key can also be retrieved by hooking CCCryptorCreateWithMode.This allowed me to dispense with queuing cryptors, as I now had all the information I needed to construct a cryptor for any packet. This allowed for packets to be encrypted faster and more accurately.

Sequence numbers still posed a problem though, as the initial sequence number of an RTP stream is randomly generated at the beginning of the call, and is different between subsequent calls. Also, sequence numbers are used to reconstruct video streams in order, so they need to be correct. I altered the replay tool determine the starting sequence number of each stream, and then calculate the difference between the starting sequence number of each logged stream and the sequence number of the logged packet and then add it to this value. These two changes finally made the replay tool work, though replay gets slower and slower as a stream is replayed due to dropped packets.   

Results


Using this setup, I was able to fuzz FaceTime calls and reproduce the crashes. I reported three bugs in FaceTime based on this work. All these issues have been fixed in recent updates.

CVE-2018-4366 is an out-of-bounds read in video processing that occurs on Macs only.

CVE-2018-4367 is a stack corruption vulnerability that affects iOS and Mac. There are a fair number of variables on the stack of the affected function before the stack cookie, and several fuzz crashes due to this issue caused segmentation faults as opposed to stack_chk crashes, so it is likely exploitable.

CVE-2018-4384 is a kernel heap corruption issue in video processing that affects iOS. It is likely similar to this vulnerability found by Adam Donenfeld of Zimperium.

All of these issues took less than 15 minutes of fuzzing to find on a live device. Unfortunately, this was the limit of fuzzing that could be performed on FaceTime, as it would be difficult to create a command line fuzzing tool with coverage like we did for WebRTC as it is closed source.

In Part 3, we will look at video calling in WhatsApp.

Tuesday, December 4, 2018

Adventures in Video Conferencing Part 1: The Wild World of WebRTC

Posted by Natalie Silvanovich, Project Zero

Over the past five years, video conferencing support in websites and applications has exploded. Facebook, WhatsApp, FaceTime and Signal are just a few of the many ways that users can make audio and video calls across networks. While a lot of research has been done into the cryptographic and privacy properties of video conferencing, there is limited information available about the attack surface of these platforms and their susceptibility to vulnerabilities. We reviewed the three most widely-used video conferencing implementations. In this series of blog posts, we describe what we found.

This part will discuss our analysis of WebRTC. Part 2 will cover our analysis of FaceTime. Part 3 will discuss how we fuzzed WhatsApp. Part 4 will describe some attacks against WhatsApp that didn’t work out. And finally, Part 5 will discuss the future of video conferencing and steps that  developers can take to improve the security of their implementation.

Typical Video Conferencing Architecture


All the video conferencing implementations we investigated allow at least two peers anywhere on the Internet to communicate through audiovisual streams. Implementing this capability so that it is reliable and has good audio and video quality presents several challenges. First, the peers need to be able to find each other and establish a connection regardless of NATs or other network infrastructure. Then they need to be able to communicate, even though they could be on different platforms, application versions or browsers. Finally, they need to maintain audio and video quality, even if the connection is low-bandwidth or noisy.

Almost all video conferencing solutions have converged on a single architecture. It assumes that two peers can communicate via a secure, integrity checked channel which may have low bandwidth or involve an intermediary server, and it allows them to create a faster, higher-bandwidth peer-to-peer channel.

The first stage in creating a connection is called signalling. It is the process through which the two peers exchange the information they will need to create a connection, including network addresses, supported codecs and cryptographic keys. Usually, the calling peer sends a call request including information about itself to the receiving peer, and then the receiving peer responds with similar information. SDP is a common protocol for exchanging this information, but it is not always used, and most implementations do not conform to the specification. It is common for mobile messaging apps to send this information in a specially formatted message, sent through the same channel text messages are sent. Websites that support video conferencing often use WebSockets to exchange information, or exchange it via HTTPS using the webserver as an intermediary.

Once signalling is complete, the peers find a way to route traffic to each other using the STUN, TURN and ICE protocols. Based on what these protocols determine, the peers can create UDP, UDP-over-STUN and occasionally TCP connections based of what is favorable for the network conditions.

Once the connection has been made, the peers communicate using Real-time Transport Protocol. Though this protocol is standardized, most implementations deviate somewhat from the standard. RTP can be encrypted using a protocol called Secure RTP (SRTP), and some implementations also encrypt streams using DTLS. Under the encryption envelope, RTP supports features that allow multiple streams and formats of data to be exchanged simultaneously. Then, based on how RTP classifies the data, it is passed on to other processing, such as video codecs.  Stream Control Transmission Protocol (SCTP) is also sometimes used to exchange small amounts of data (for example a text message on top of a call) during video conferencing, but it is less commonly used than RTP.

Even when it is encrypted, RTP often doesn’t include integrity protection, and if it does, it usually doesn’t discard malformed packets. Instead, it attempts to recover them using strategies such as Forward Error Correction (FEC). Most video conferencing solutions also detect when a channel is noisy or low-bandwidth and attempt to handle the situation in a way that leads to the best audio and video quality, for example, sending fewer frames or changing codecs. Real Time Control Protocol (RTCP) is used to exchange statistics on network quality and coordinate adjusting properties of RTP streams to adapt to network conditions.

WebRTC


WebRTC is an open source project that enables video conferencing. It is by far the most commonly used implementation. Chrome, Safari, Firefox, Facebook Messenger, Signal and many other mobile applications use WebRTC. WebRTC seemed like a good starting point for looking at video conferencing as it is heavily used, open source and reasonably well-documented.

WebRTC Signalling


I started by looking at WebRTC signalling, because it is an attack surface that does not require any user interaction. Protocols like RTP usually start being processed after a user has picked up the video call, but signalling is performed before the user is notified of the call. WebRTC uses SDP for signalling.

I reviewed the WebRTC SDP parser code, but did not find any bugs. I also compiled it so it would accept an SDP file on the commandline and fuzzed it, but I did not find any bugs through fuzzing either. I later discovered that WebRTC signalling is not implemented consistently across browsers anyhow. Chrome uses the main WebRTC implementation, Safari has branched slightly and Firefox uses their own implementation. Most mobile applications that use WebRTC implement their own signalling in a protocol that is not SDP as well. So it is not likely that a bug in WebRTC signalling would affect a wide variety of targets.

RTP Fuzzing


I then decided to look at how RTP is processed in WebRTC. While RTP is not an interaction-less attack surface because the user usually has to answer the call before RTP traffic is processed, picking up a call is a reasonable action to expect a user to take. I started by looking at the WebRTC source, but it is very large and complex, so I decided fuzzing would be a better approach.

The WebRTC repository contains fuzzers written for OSS-Fuzz for every protocol and codec supported by WebRTC, but they do not simulate the interactions between the various parsers, and do not maintain state between test cases, so it seemed likely that end-to-end fuzzing would provide additional coverage.

Setting up end-to-end fuzzing was fairly time intensive, so to see if it was likely to find many bugs, I altered Chrome to send malformed RTP packets. I changed the srtp_protect function in libsrtp so that it ran the following fuzzer on every packet:

void fuzz(char* buf, int len){

int q = rand()%10;

if (q == 7){
int ind = rand()%len;
buf[ind] = rand();
}

if(q == 5){
for(int i = 0; i < len; i++)
buf[i] = rand();

}
RTP fuzzer (fuzzer q)

When this version was used to make a WebRTC call to an unmodified instance of Chrome, it crashed roughly every 30 seconds.

Most of the crashes were due to divide-by-zero exceptions, which I submitted patches for, but there were three interesting crashes. I reproduced them by altering the WebRTC source in Chrome so that it would generate the packets that caused the same crashes, and then set up a standalone build of WebRTC to reproduce them, so that it was not necessary to rebuild Chrome to reproduce the issues.

The first issue, CVE-2018-6130 is an out-of-bounds memory issue related to the use of std::map find in processing VP9 (a video codec). In the following code, the value t10_pic_idx is pulled out of an RTP packet unverified (GOF stands for group of frames).

if (frame->frame_type() == kVideoFrameKey) {
   ...
   GofInfo info = gof_info_.find(codec_header.tl0_pic_idx)->second;
   FrameReceivedVp9(frame->id.picture_id, &info);
   UnwrapPictureIds(frame);
   return kHandOff;
 }


If this value does not exist in the gof_info_ array, std::map::find returns the end value of the map, which points to one element past the allocated values for the map. Depending on memory layout, dereferencing this iterator will either crash or return the contents of unallocated memory.

The second issue, CVE-2018-6129 is a more typical out-of-bounds read issue, where the index of a field is read out of an RTP packet, and not verified before it is used to index a vector.

The third issue, CVE-2018-6157 is a type confusion issue that occurs when a packet that looks like a VP8 packet is sent to the H264 parser. The packet will eventually be treated like an H264 packet even though it hasn’t gone through the necessary checks for H264. The impact of this issue is also limited to reading out of bounds.
There are a lot of limitations to the approach of fuzzing in a browser. It is very slow, the issues are difficult to reproduce, and it is difficult to fuzz a variety of test cases, because each call needs to be started manually, and certain properties, such as the default codec, can’t change in the middle of the call. After I reported these issues, the WebRTC team suggested that I use the video_replay tool, which can be used to replay RTP streams recorded in a patched browser. The tool was not able to reproduce a lot of my issues because they used non-default WebRTC settings configured through signalling, so I added the ability to load a configuration file alongside the RTP dump to this tool. This made it possible to quickly reproduce vulnerabilities in WebRTC.

This tool also had the benefit of enabling much faster fuzzing, as it was possible to fuzz RTP by fuzzing the RTP dump file and loading it into video_replay. There were some false positives, as it was also possible that fuzzing caused bugs in parsing the RTP dump file format, but most of the bugs were actually in RTP processing.
Fuzzing with the video_replay tool with code coverage and ASAN enabled led to four more bugs. We ran the fuzzer on 50 cores for about two weeks to find these issues.

CVE-2018-6156 is probably the most exploitable bug uncovered. It is a large overflow in FEC. The buffer WebRTC uses to process FEC packets is 1500 bytes, but it does no size checking of these packets once they are extracted from RTP. Practically, they can be up to about 2000 bytes long.

CVE-2018-6155 is a use-after-free in a video codec called VP8. It is interesting because it affects the VP8 library, libvpx as opposed to code in WebRTC, so it has the potential to affect software that uses this library other than WebRTC. A generic fix for libvpx was released as a result of this bug.

CVE-2018-16071 is a use-after-free in VP9 processing that is somewhat similar to CVE-2018-6130. Once again, an untrusted index is pulled out of a packet, but this time it is used as the upper bounds of a vector erase operation, so it is possible to delete all the elements of the vector before it is used.

CVE-2018-16083 is an out-of-bounds read in FEC that occurs due to a lack of bounds checking.

Overall, end-to-end fuzzing found a lot of bugs in WebRTC, and a few were fairly serious. They have all now been fixed. This shows that end-to-end fuzzing is an effective approach for finding vulnerabilities in this type of video conferencing solution. In Part 2, we will try a similar approach on FaceTime. Stay tuned!

Friday, November 30, 2018

Injecting Code into Windows Protected Processes using COM - Part 2

Posted by James Forshaw, Project Zero

In my previous blog I discussed a technique which combined numerous issues I’ve previously reported to Microsoft to inject arbitrary code into a PPL-WindowsTCB process. The techniques presented don’t work for exploiting the older, stronger Protected Processes (PP) for a few different reasons. This blog seeks to remedy this omission and provide details of how I was able to also hijack a full PP-WindowsTCB process without requiring administrator privileges. This is mainly an academic exercise, to see whether I can get code executing in a full PP as there’s not much more you can do inside a PP over a PPL.

As a quick recap of the previous attack, I was able to identify a process which would run as PPL which also exposed a COM service. Specifically, this was the “.NET Runtime Optimization Service” which ships with the .NET framework and uses PPL at CodeGen level to apply cached signing levels to Ahead-of-Time compiled DLLs to allow them to be used with User-Mode Code Integrity (UMCI). By modifying the COM proxy configuration it was possible to induce a type confusion which allowed me to load an arbitrary DLL by hijacking the KnownDlls configuration. Once running code inside the PPL I could abuse a bug in the cached signing feature to create a DLL signed to load into any PPL and through that escalate to PPL-WindowsTCB level.

Finding a New Target

My first thought to exploit full PP would be to use the additional access we were granted from having code running at PPL-WindowsTCB. You might assume you could abuse the cached signed DLL to bypass security checks to load into a full PP. Unfortunately the kernel’s Code Integrity module ignores cached signing levels for full PP. How about KnownDlls in general? If we have administrator privileges and code running in PPL-WindowsTCB we can directly write to the KnownDlls object directory (see another of my blog posts link for why you need to be PPL) and try to get the PP to load an arbitrary DLL. Unfortunately, as I mentioned in the previous blog, this also doesn’t work as full PP ignores KnownDlls. Even if it did load KnownDlls I don’t want to require administrator privileges to inject code into the process.

I decided that it’d make sense to rerun my PowerShell script from the previous blog to discover which executables will run as full PP and at what level. On Windows 10 1803 there’s a significant number of executables which run as PP-Authenticode level, however only four executables would start with a more privileged level as shown in the following table.

Path
Signing Level
C:\windows\system32\GenValObj.exe
Windows
C:\windows\system32\sppsvc.exe
Windows
C:\windows\system32\WerFaultSecure.exe
WindowsTCB
C:\windows\system32\SgrmBroker.exe
WindowsTCB

As I have no known route from PP-Windows level to PP-WindowsTCB level like I had with PPL, only two of the four executables are of interest, WerFaultSecure.exe and SgrmBroker.exe. I correlated these two executables against known COM service registrations, which turned up no results. That doesn’t mean these executables don’t expose a COM attack surface, the .NET executable I abused last time also doesn’t register its COM service, so I also performed some basic reverse engineering looking for COM usage.

The SgrmBroker executable doesn’t do very much at all, it’s a wrapper around an isolated user mode application to implement runtime attestation of the system as part of Windows Defender System Guard and didn’t call into any COM APIs. WerFaultSecure also doesn’t seem to call into COM, however I already knew that WerFaultSecure can load COM objects, as Alex Ionescu used my original COM scriptlet code execution attack to get PPL-WindowsTCB level though hijacking a COM object load in WerFaultSecure. Even though WerFaultSecure didn’t expose a service if it could initialize COM perhaps there was something that I could abuse to get arbitrary code execution? To understand the attack surface of COM we need to understand how COM implements out-of-process COM servers and COM remoting in general.

Digging into COM Remoting Internals

Communication between a COM client and a COM server is over the MSRPC protocol, which is based on the Open Group’s DCE/RPC protocol. For local communication the transport used is Advanced Local Procedure Call (ALPC) ports. At a high level communication occurs between a client and server based on the following diagram:


In order for a client to find the location of a server the process registers an ALPC endpoint with the DCOM activator in RPCSS ①. This endpoint is registered alongside the Object Exporter ID (OXID) of the server, which is a 64 bit randomly generated number assigned by RPCSS. When a client wants to connect to a server it must first ask RPCSS to resolve the server’s OXID value to an RPC endpoint ②. With the knowledge of the ALPC RPC endpoint the client can connect to the server and call methods on the COM object ③.

The OXID value is discovered either from an out-of-process (OOP) COM activation result or via a marshaled Object Reference (OBJREF) structure. Under the hood the client calls the ResolveOxid method on RPCSS’s IObjectExporter RPC interface. The prototype of ResolveOxid is as follows:

interface IObjectExporter {
  // ...
  error_status_t ResolveOxid(
    [in] handle_t hRpc,
    [in] OXID* pOxid,
    [in] unsigned short cRequestedProtseqs,
    [in] unsigned short arRequestedProtseqs[],
    [out, ref] DUALSTRINGARRAY** ppdsaOxidBindings,
    [out, ref] IPID* pipidRemUnknown,
    [out, ref] DWORD* pAuthnHint
);

In the prototype we can see the OXID to resolve is being passed in the pOxid parameter and the server returns an array of Dual String Bindings which represent RPC endpoints to connect to for this OXID value. The server also returns two other pieces of information, an Authentication Level Hint (pAuthnHint) which we can safely ignore and the IPID of the IRemUnknown interface (pipidRemUnknown) which we can’t.

An IPID is a GUID value called the Interface Process ID. This represents the unique identifier for a COM interface inside the server, and it’s needed to communicate with the correct COM object as it allows the single RPC endpoint to multiplex multiple interfaces over one connection. The IRemUnknown interface is a default COM interface every COM server must implement as it’s used to query for new IPIDs on an existing object (using RemQueryInterface) and maintain the remote object’s reference count (through RemAddRef and RemRelease methods). As this interface must always exist regardless of whether an actual COM server is exported and the IPID can be discovered through resolving the server’s OXID, I wondered what other methods the interface supported in case there was anything I could leverage to get code execution.

The COM runtime code maintains a database of all IPIDs as it needs to lookup the server object when it receives a request for calling a method. If we know the structure of this database we could discover where the IRemUnknown interface is implemented, parse its methods and find out what other features it supports. Fortunately I’ve done the work of reverse engineering the database format in my OleViewDotNet tool, specifically the command Get-ComProcess in the PowerShell module. If we run the command against a process which uses COM, but doesn’t actually implement a COM server (such as notepad) we can try and identify the correct IPID.


In this example screenshot there’s actually two IPIDs exported, IRundown and a Windows.Foundation interface. The Windows.Foundation interface we can safely ignore, but IRundown looks more interesting. In fact if you perform the same check on any COM process you’ll discover they also have IRundown interfaces exported. Are we not expecting an IRemUnknown interface though? If we pass the ResolveMethodNames and ParseStubMethods parameters to Get-ComProcess, the command will try and parse method parameters for the interface and lookup names based on public symbols. With the parsed interface data we can pass the IPID object to the Format-ComProxy command to get a basic text representation of the IRundown interface. After cleanup the IRundown interface looks like the following:

[uuid("00000134-0000-0000-c000-000000000046")]
interface IRundown : IUnknown {
   HRESULT RemQueryInterface(...);
   HRESULT RemAddRef(...);
   HRESULT RemRelease(...);
   HRESULT RemQueryInterface2(...);
   HRESULT RemChangeRef(...);
   HRESULT DoCallback([in] struct XAptCallback* pCallbackData);
   HRESULT DoNonreentrantCallback([in] struct XAptCallback* pCallbackData);
   HRESULT AcknowledgeMarshalingSets(...);
   HRESULT GetInterfaceNameFromIPID(...);
   HRESULT RundownOid(...);
}

This interface is a superset of IRemUnknown, it implements the methods such as RemQueryInterface and then adds some more additional methods for good measure. What really interested me was the DoCallback and DoNonreentrantCallback methods, they sound like they might execute a “callback” of some sort. Perhaps we can abuse these methods? Let’s look at the implementation of DoCallback based on a bit of RE (DoNonreentrantCallback just delegates to DoCallback internally so we don’t need to treat it specially):

struct XAptCallback {
 void* pfnCallback;
 void* pParam;
 void* pServerCtx;
 void* pUnk;
 void* iid;
 int   iMethod;
 GUID  guidProcessSecret;
};

HRESULT CRemoteUnknown::DoCallback(XAptCallback *pCallbackData) {
 CProcessSecret::GetProcessSecret(&pguidProcessSecret);
 if (!memcmp(&pguidProcessSecret,
             &pCallbackData->guidProcessSecret, sizeof(GUID))) {
   if (pCallbackData->pServerCtx == GetCurrentContext()) {
     return pCallbackData->pfnCallback(pCallbackData->pParam);
   } else {
     return SwitchForCallback(
                  pCallbackData->pServerCtx,
                  pCallbackData->pfnCallback,
                  pCallbackData->pParam);
   }
 }
 return E_INVALIDARG;
}

This method is very interesting, it takes a structure containing a pointer to a method to call and an arbitrary parameter and executes the pointer. The only restrictions on calling the arbitrary method is you must know ahead of time a randomly generated GUID value, the process secret, and the address of a server context. The checking of a per-process random value is a common security pattern in COM APIs and is typically used to restrict functionality to only in-process callers. I abused something similar in the Free-Threaded Marshaler way back in 2014.

What is the purpose of DoCallback? The COM runtime creates a new IRundown interface for every COM apartment that’s initialized. This is actually important as calling methods between apartments, say calling a STA object from a MTA, you need to call the appropriate IRemUnknown methods in the correct apartment. Therefore while the developers were there they added a few more methods which would be useful for calling between apartments, including a general “call anything you like” method. This is used by the internals of the COM runtime and is exposed indirectly through methods such as CoCreateObjectInContext. To prevent the DoCallback method being abused OOP the per-process secret is checked which should limit it to only in-process callers, unless an external process can read the secret from memory.

Abusing DoCallback

We have a primitive to execute arbitrary code within any process which has initialized COM by invoking the DoCallback method, which should include a PP. In order to successfully call arbitrary code we need to know four pieces of information:

  1. The ALPC port that the COM process is listening on.
  2. The IPID of the IRundown interface.
  3. The initialized process secret value.
  4. The address of a valid context, ideally the same value that GetCurrentContext returns to call on the same RPC thread.

Getting the ALPC port and the IPID is easy, if the process exposes a COM server, as both will be provided during OXID resolving. Unfortunately WerFaultSecure doesn’t expose a COM object we can create so that angle wouldn’t be open to us, leaving us with a problem we need to solve. Extracting the process secret and context value requires reading the contents of process memory. This is another problem, one of the intentional security features of PP is preventing a non-PP process from reading memory from a PP process. How are we going to solve these two problems?

Talking this through with Alex at Recon we came up with a possible attack if you have administrator access. Even being an administrator doesn’t allow you to read memory directly from a PP process. We could have loaded a driver, but that would break PP entirely, so we considered how to do it without needing kernel code execution.

First and easiest, the ALPC port and IPID can be extracted from RPCSS. The RPCSS service does not run protected (even PPL) so this is possible to do without any clever tricks other than knowing where the values are stored in memory. For the context pointer, we should be able to brute force the location as there’s likely to be only a narrow range of memory locations to test, made slightly easier if we use the 32 bit version of WerFaultSecure.

Extracting the secret is somewhat harder. The secret is initialized in writable memory and therefore ends up in the process’ working set once it’s modified. As the page isn’t locked it will be eligible for paging if the memory conditions are right. Therefore if we could force the page containing the secret to be paged to disk we could read it even though it came from a PP process. As an administrator, we can perform the following to steal the secret:

  1. Ensure the secret is initialized and the page is modified.
  2. Force the process to trim its working set, this should ensure the modified page containing the secret ends up paged to disk (eventually).
  3. Create a kernel memory crash dump file using the NtSystemDebugControl system call. The crash dump can be created by an administrator without kernel debugging being enabled and will contain all live memory in the kernel. Note this doesn’t actually crash the system.
  4. Parse the crash dump for the Page Table Entry of the page containing the secret value. The PTE should disclose where in the paging file on disk the paged data is located.
  5. Open the volume containing the paging file for read access, parse the NTFS structures to find the paging file and then find the paged data and extract the secret.

After coming up with this attack it seemed far too much like hard work and needed administrator privileges which I wanted to avoid. I needed to come up with an alternative solution.

Using WerFaultSecure for its Original Purpose

Up to this point I’ve been discussing WerFaultSecure as a process that can be abused to get arbitrary code running inside a PP/PPL. I’ve not really described why the process can run at the maximum PP/PPL levels. WerFaultSecure is used by the Windows Error Reporting service to create crash dumps from protected processes. Therefore it needs to run at elevated PP levels to ensure it can dump any possible user-mode PP. Why can we not just get WerFaultSecure to create a crash dump of itself, which would leak the contents of process memory and allow us to extract any information we require?

The reason we can’t use WerFaultSecure is it encrypts the contents of the crash dump before writing it to disk. The encryption is done in a way to only allow Microsoft to decrypt the crash dump, using asymmetric encryption to protect a random session key which can be provided to the Microsoft WER web service. Outside of a weakness in Microsoft’s implementation or a new cryptographic attack against the primitives being used getting the encrypted data seems like a non-starter.

However, it wasn’t always this way. In 2014 Alex presented at NoSuchCon about PPL and discussed a bug he’d discovered in how WerFaultSecure created encrypted dump files. It used a two step process, first it wrote out the crash dump unencrypted, then it encrypted the crash dump. Perhaps you can spot the flaw? It was possible to steal the unencrypted crash dump. Due to the way WerFaultSecure was called it accepted two file handles, one for the unencrypted dump and one for the encrypted dump. By calling WerFaultSecure directly the unencrypted dump would never be deleted which means that you don’t even need to race the encryption process.

There’s one problem with this, it was fixed in 2015 in MS15-006. After that fix WerFaultSecure encrypted the crash dump directly, it never ends up on disk unencrypted at any point. But that got me thinking, while they might have fixed the bug going forward what prevents us from taking the old vulnerable version of WerFaultSecure from Windows 8.1 and executing it on Windows 10? I downloaded the ISO for Windows 8.1 from Microsoft’s website (link), extracted the binary and tested it, with predictable results:


We can take the vulnerable version of WerFaultSecure from Windows 8.1 and it will run quite happily on Windows 10 at PP-WindowsTCB level. Why? It’s unclear, but due to the way PP is secured all the trust is based on the signed executable. As the signature of the executable is still valid the OS just trusts it can be run at the requested protection level. Presumably there must be some way that Microsoft can block specific executables, although at least they can’t just revoke their own signing certificates. Perhaps OS binaries should have an EKU in the certificate which indicates what version they’re designed to run on? After all Microsoft already added a new EKU when moving from Windows 8 to 8.1 to block downgrade attacks to bypass WinRT UMCI signing so generalizing might make some sense, especially for certain PP levels.

After a little bit of RE and reference to Alex’s presentation I was able to work out the various parameters I needed to be passed to the WerFaultSecure process to perform a dump of a PP:

Parameter
Description
/h
Enable secure dump mode.
/pid {pid}
Specify the Process ID to dump.
/tid {tid}
Specify the Thread ID in the process to dump.
/file {handle}
Specify a handle to a writable file for the unencrypted crash dump
/encfile {handle}
Specify a handle to a writable file for the encrypted crash dump
/cancel {handle}
Specify a handle to an event to indicate the dump should be cancelled
/type {flags}
Specify MIMDUMPTYPE flags for call to MiniDumpWriteDump

This gives us everything we need to complete the exploit. We don’t need administrator privileges to start the old version of WerFaultSecure as PP-WindowsTCB. We can get it to dump another copy of WerFaultSecure with COM initialized and use the crash dump to extract all the information we need including the ALPC Port and IPID needed to communicate. We don’t need to write our own crash dump parser as the Debug Engine API which comes installed with Windows can be used. Once we’ve extracted all the information we need we can call DoCallback and invoke arbitrary code.

Putting it All Together

There’s still two things we need to complete the exploit, how to get WerFaultSecure to start up COM and what we can call to get completely arbitrary code running inside the PP-WindowsTCB process.

Let’s tackle the first part, how to get COM started. As I mentioned earlier, WerFaultSecure doesn’t directly call any COM methods, but Alex had clearly used it before so to save time I just asked him. The trick was to get WerFaultSecure to dump an AppContainer process, this results in a call to the method CCrashReport::ExemptFromPlmHandling inside the FaultRep DLL resulting in the loading of CLSID {07FC2B94-5285-417E-8AC3-C2CE5240B0FA}, which resolves to an undocumented COM object. All that matters is this allows WerFaultSecure to initialize COM.

Unfortunately I’ve not been entirely truthful during my description of how COM remoting is setup. Just loading a COM object is not always sufficient to initialize the IRundown interface or the RPC endpoint. This makes sense, if all COM calls are to code within the same apartment then why bother to initialize the entire remoting code for COM. In this case even though we can make WerFaultSecure load a COM object it doesn’t meet the conditions to setup remoting. What can we do to convince the COM runtime that we’d really like it to initialize? One possibility is to change the COM registration from an in-process class to an OOP class. As shown in the screenshot below the COM registration is being queried first from HKEY_CURRENT_USER which means we can hijack it without needing administrator privileges.


Unfortunately looking at the code this won’t work, a cut down version is shown below:

HRESULT CCrashReport::ExemptFromPlmHandling(DWORD dwProcessId) {
 CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
 IOSTaskCompletion* inf;
 HRESULT hr = CoCreateInstance(CLSID_OSTaskCompletion,
     NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&inf));
 if (SUCCEEDED(hr)) {
   // Open process and disable PLM handling.
 }
}

The code passes the flag, CLSCTX_INPROC_SERVER to CoCreateInstance. This flag limits the lookup code in the COM runtime to only look for in-process class registrations. Even if we replace the registration with one for an OOP class the COM runtime would just ignore it. Fortunately there’s another way, the code is initializing the current thread’s COM apartment as a STA using the COINIT_APARTMENTTHREADED flag with CoInitializeEx. Looking at the registration of the COM object its threading model is set to “Both”. What this means in practice is the object supports being called directly from either a STA or a MTA.

However, if the threading model was instead set to “Free” then the object only supports direct calls from an MTA, which means the COM runtime will have to enable remoting, create the object in an MTA (using something similar to DoCallback) then marshal calls to that object from the original apartment. Once COM starts remoting it initializes all remote features including IRundown. As we can hijack the server registration we can just change the threading model, this will cause WerFaultSecure to start COM remoting which we can now exploit.

What about the second part, what can we call inside the process to execute arbitrary code? Anything we call using DoCallback must meet the following criteria, to avoid undefined behavior:

  1. Only takes one pointer sized parameter.
  2. Only the lower 32 bits of the call are returned as the HRESULT if we need it.
  3. The callsite is guarded by CFG so it must be something which is a valid indirect call target.

As WerFaultSecure isn’t doing anything special then at a minimum any DLL exported function should be a valid indirect call target. LoadLibrary clearly meets our criteria as it takes a single parameter which is a pointer to the DLL path and we don’t really care about the return value so the truncation isn’t important. We can’t just load any DLL as it must be correctly signed, but what about hijacking KnownDlls?

Wait, didn’t I say that PP can’t load from KnownDlls? Yes they can’t but only because the value of the LdrpKnownDllDirectoryHandle global variable is always set to NULL during process initialization. When the DLL loader checks for the presence of a known DLL if the handle is NULL the check returns immediately. However if the handle has a value it will do the normal check and just like in PPL no additional security checks are performed if the process maps an image from an existing section object. Therefore if we can modify the LdrpKnownDllDirectoryHandle global variable to point to a directory object inherited into the PP we can get it to load an arbitrary DLL.

The final piece of the puzzle is finding an exported function which we can call to write an arbitrary value into the global variable. This turns out to be harder than expected. The ideal function would be one which takes a single pointer value argument and writes to that location with no other side effects. After a number of false starts (including trying to use gets) I settled on the pair, SetProcessDefaultLayout and GetProcessDefaultLayout in USER32. The set function takes a single value which is a set of flags and stores it in a global location (actually in the kernel, but good enough). The get method will then write that value to an arbitrary pointer. This isn’t perfect as the values we can set and therefore write are limited to the numbers 0-7, however by offsetting the pointer in the get calls we can write a value of the form 0x0?0?0?0? where the ? can be any value between 0 and 7. As the value just has to refer to the handle inside a process under our control we can easily craft the handle to meet these strict requirements.

Wrapping Up

In conclusion to get arbitrary code execution inside a PP-WindowsTCB without administrator privileges process we can do the following:

  1. Create a fake KnownDlls directory, duplicating the handle until it meets a pattern suitable for writing through Get/SetProcessDefaultLayout. Mark the handle as inheritable.
  2. Create the COM object hijack for CLSID {07FC2B94-5285-417E-8AC3-C2CE5240B0FA} with the ThreadingModel set to “Free”.
  3. Start Windows 10 WerFaultSecure at PP-WindowsTCB level and request a crash dump from an AppContainer process. During process creation the fake KnownDlls must be added to ensure it’s inherited into the new process.
  4. Wait until COM has initialized then use Windows 8.1 WerFaultSecure to dump the process memory of the target.
  5. Parse the crash dump to discover the process secret, context pointer and IPID for IRundown.
  6. Connect to the IRundown interface and use DoCallback with Get/SetProcessDefaultLayout to modify the LdrpKnownDllDirectoryHandle global variable to the handle value created in 1.
  7. Call DoCallback again to call LoadLibrary with a name to load from our fake KnownDlls.

This process works on all supported versions of Windows 10 including 1809. It’s worth noting that invoking DoCallback can be used with any process where you can read the contents of memory and the process has initialized COM remoting. For example, if you had an arbitrary memory disclosure vulnerability in a privileged COM service you could use this attack to convert the arbitrary read into arbitrary execute. As I don’t tend to look for memory corruption/memory disclosure vulnerabilities perhaps this behavior is of more use to others.

That concludes my series of attacking Windows protected processes. I think it demonstrates that preventing a user from attacking processes which share resources, such as registry and files is ultimately doomed to fail. This is probably why Microsoft do not support PP/PPL as a security boundary. Isolated User Mode seems a much stronger primitive, although that does come with additional resource requirements which PP/PPL doesn’t for the most part.  I wouldn’t be surprised if newer versions of Windows 10, by which I mean after version 1809, will try to mitigate these attacks in some way, but you’ll almost certainly be able to find a bypass.