Tuesday, September 23, 2014

Exploiting CVE-2014-0556 in Flash

Posted by Chris Evans, Kidnapper of RIP

A couple of weeks ago, Adobe released security bulletin APSB14-21, including 8 fixes for bugs reported by Project Zero. Full details of these bugs are now public in our bug tracker. Some of the more interesting ones are a double free in the RTMP protocol, or an integer overflow concatenating strings. Again, we’d like to thank Adobe for a response time well ahead of our standard 90-day disclosure deadline.

Before we get started, though, it’s worth briefly noting why there is so much value in writing an exploit. Finding and eliminating bugs obviously improves software correctness, but writing exploits is always a significant learning opportunity. Throughout the history of the security industry, there’s a long track record of offense driving defense, leading to technologies such as stack canaries, NX support in processors and ASLR.

Project Zero is not just a bug hunting initiative. We’re doing our part to continue the tradition of practical and public research of exploitation techniques -- and deriving defenses from them. For example, our glibc defensive patch was accepted as a follow-on from our glibc exploit.

The case of this particular exploit starts with some irony on account of my overly hasty initial triage of the bug based on instincts which were later proved wrong by a more in-depth analysis of exploitation opportunities. In the bug history, you can see the claim “almost certainly 64-bit only” (wrong!) and then “does not work in Chrome 64-bit Linux”. We learned not to declare anything as unexploitable in our previous post about exploiting a subtle condition in glibc. Therefore, I had to declare shenanigans on myself and tackle the challenge: exploit this bug on Chrome 64-bit Linux.

The bug
The bug is triggered by calling BitmapData.copyPixelsToByteArray() with a reference to a ByteArray that has its position property set very large -- close to 2^32. This results in an integer overflow in 32-bit arithmetic. This occurs even on 64-bit because the relevant positions and length variables are (quite reasonably) stored in 32-bit variables. The code then believes that it can copy the pixels, starting to write them at position, and stay within the bounds of the buffer. Instead, a buffer overflow occurs. On 32-bit, the out-of-bounds write will be written before the start of the buffer because the pointer will wrap. On 64-bit, things are not as kind to the attacker. On a typical 64-bit Linux process setup with a 1MB buffer, the situation will look like this:

… | buffer: 1MB | heap, libs, binary |                 !!

The out-of-bounds write (in red) is at approximately buffer + 4GB. This will not wrap around the massive 64-bit address space, leading to a write way off in unmapped space. Insta-crash. The most obvious way to avoid the crash is to make the buffer massive, almost 4GB, leading to this situation:

… | buffer: 4GB                                      | !! heap, libs, binary |

This is readily exploitable. However, 64-bit Chrome on Linux has a defensive measure where the amount of mapped address space is limited to 4GB. So the large buffer allocation will fail and prevent that particular attack.

The heap groom
We’re going to need a trick to exploit this without slamming into the 4GB address space limit. The breakthrough -- that did not occur to me before attempting to develop an exploit -- comes when we realize that we don’t need to have the address space contiguously mapped. The out-of-bounds write will happily still go ahead even if it “jumps over” a hole in the address space. By having a hole in the address space, perhaps we can usefully trigger the corruption with less than 4GB mapped.

But how do we put this hole where we want it? Looking at how the Flash allocator works using the strace system tool, we see that very large allocations are serviced using unhinted mmap(). The Linux standard algorithm for servicing unhinted mmap() calls is to stack them adjacent and downwards in address space, as long as there isn’t a hole that can satisfy the request. So let’s see what happens when we allocate two 1GB chunks:

… | buffer2: 1GB | buffer1: 1GB | heap, libs, binary |

And the free the first one (a direct munmap() call is seen):

… | buffer2: 1GB |   1GB hole   | heap, libs, binary |

And then allocate a 2GB buffer (too big to fit in the hole):

… | buffer3: 2GB        | buffer2: 1GB |   1GB hole   | !! heap, libs, binary |

Aha! We’ve managed to engineer a situation where we’ve never had more than 4GB of address space mapped at any given moment, and at the end, a corruption at buffer3 + 4GB will land right in a writable region: the heap.

The corruption target
Now that we have a reasonably controlled memory corruption situation, we need to pick something to corrupt. As is pretty standard in modern heap buffer overflow exploitation in a scripting environment, we’re going to try and clobber a length of an array-like object. If we clobber any such length to be larger, we will then be able to read and write arbitrary relative heap memory. Once we’ve achieved such a powerful primitive, it’s essentially game over. Successful exploitation is pretty much assured: defeat ASLR by reading the value of a vtable and then write a new vtable that causes execution redirection to a sequence of opcodes that we choose.

We decide to corrupt a Vector.<uint> buffer object. This is a fairly standard, documented technique. I recommend Haifei Li’s excellent paper as background reading. Corrupting this buffer object is an obvious target because of three properties it possesses:

  • The attacker can choose arbitrary sizes for these objects, meaning there is a lot of control over where in the heap they are placed relative to the pending heap corruption.
  • The object starts with a length field, and corrupting it results in arbitrary heap relative read/write being exposed to script.
  • The object is resilient to corruption in general. Aside from the length field, there is just a single pointer and trashing this pointer does not affect the ability to use the Vector, or otherwise cause noticeable stability issues during the course of exploitation. (We could even restore its value post-exploitation if we wished.)

To proceed, we simply create many (32) Vector.<uint> objects, all with buffers sized at about 2MB. These typically end up being stacked downwards at the top of the 1GB hole. In reality, the 1GB and 2GB allocations end up being a little larger than expected under the covers. This means that the corruption address of buffer3 + 4GB actually ends up corrupting objects within the 1GB hole instead of after it. This is ideal because we can make sure that only our large buffers are corrupted. In terms of the actual data to write, we just use the default values in an empty BitmapData, which are 0xffffffff (white pixels with a full alpha channel). 0xffffffff is a plenty large enough length to proceed with the exploit!

Proceeding onwards
There is nothing particularly exciting or unique about how the exploit proceeds to demonstrate code execution, so we’ll skip the lengthy explanation here. I’ve made an attempt to fully comment the exploit source code, so if you want to continue to follow along I recommend you read the materials attached to the public bug.

The only part I’d flag as mildly interesting -- because it differs from the previously quoted paper -- is how we get known data at a known heap address. We do it with a Vector.<uint> object again. Each of these is in fact a pair of objects: a script object, which is a fixed sized and contains metadata; and the buffer object which contains the arbitrary data prefixed by the length. The script object forms a distinct pattern in memory and also contains a pointer to the buffer object. By locating any Vector.<uint> script object, we can then use a raw memory edit to change a property of the object. This property change will be visible to ActionScript so we then know which handle corresponds to a buffer at which raw address.

Conclusions, and turning what we’ve learned into generic defenses
Various technologies would have changed the exploitation landscape here, and can now be investigated in more detail:

  • Randomized placement of large memory chunks. Non-deterministic placement of large allocations would have broken the heap grooming aspect of the exploit.
  • Isolation of Vector.<uint> buffers. As we’ve seen, corruption of these buffers is an extremely dangerous condition. Some of the most recent advances in memory corruption defenses have been “isolated” or “partitioned” heaps. These technologies seem applicable here. (They would need to be applied not just to the Vector buffers, but to the general case: partitioning off read/write objects where the attacker controls both the size and the content.)

Given the open-source nature of the ActionScript engine, and the open-source nature of some potentially helpful technologies, a prototype of a generic defense is now on the Project Zero TODO list!


  1. burp,

    imo it would be more reliable to fill the 1GB hole with two Vector.<uint> objects, say v1, v2 and one Vector.<Object>, say v3, each with a size of about 1/3GB. Then you can do the following:

    1. You can corrupt the size field of the first (v1) and set it to 0xffffffff.

    2. You can now read the elements of v3. Before the actual data in v3 begins, the metadata of the vector holds the *absolute* address where v3's data begins, say x. The base address of v3 is x & 0xfffff000 and once you corrupt its size field you can achieve memory r/w from/to absolute memory addresses.

    3. Read v1[v1_size + 1] which will give you the address of v2's GC object.

    You can now:

    1. Parse the GC heap and locate 100% reliably any object you like and write any of its fields.

    2. You can automatically build ROP chains (hint hint longjmp is a neat gadget).

    3. $$.$$$$


    1. Thanks, I'll have a look a Vector.Object again. I didn't see the pointer you describe but I will re-check.

      Any links to the longjmp trick? I did consider it but it does not restore "caller saved" registers like RDI, RSI.

    2. Also, I think the data in Vector.Object is not raw. Isn't it tagged pointers? If so, the read/write data won't be arbitrary?

    3. > Thanks, I'll have a look a Vector.Object again. I didn't see the pointer you describe but I will re-check.

      Sorry but I only have a 32bit installation here to play with, I can check it later at home for 64bits if you're interested. I have neither reversed the relevant code in Flash Player nor checked the corresponding source code, I just noticed the following by pure observation.

      0:007> dd 0x0a000000 L?2c
      0a000000 01010400 00000018 0aa24000 03638530
      0a000010 0a001000 09ff1e00 09fff000 00000000
      0a000020 00000000 00000000 01000000 0a000040
      0a000030 00000000 00000000 00000000 00000000
      0a000040 6e490170 00000001 0980e748 09841718
      0a000050 00000000 00000000 6e490170 00000001
      0a000060 0980e748 09841718 00000000 00000000
      0a000070 6e490170 00000001 0980e748 09841718
      0a000080 00000000 00000000 6e490170 00000001
      0a000090 0980e748 09841718 00000000 00000000
      0a0000a0 6e490170 ...

      The first few dwords look like a GC heap block header, so, these are "managed" allocations. Notice the dword at 0x0a00002c (0x0a000040), it points to the first vtable in the vector (ScriptObject maybe?).

      > Also, I think the data in Vector.Object is not raw. Isn't it tagged pointers? If so, the read/write data won't be arbitrary?

      Well... you're right. The fact that v3 is Vector.<Object> will indeed cause some problems, but once you know the base address of v3 you can easily calculate v1's address too (assuming an ordered placement in the heap of course). Then you can r/w any memory address x using the following formula:

      v1[(x - v1_base) / 4] = 0xdeadbeef; for x >= v1_base
      ...or sth similar for x < v1_base.

      It gets really easy because v1 holds uints.

      > Any links to the longjmp trick? I did consider it but it does not restore "caller saved" registers like RDI, RSI.

      The interesting thing about longjmp is that you can very easily do a stack pivot since rsp/esp is restored by a user controlled location. On Windows 64bit Flash Player, rsp is restored by [rcx+x] which will most of the times point somewhere in an object you control. On 32bits, to do a stack pivot you can point EIP to a "push ecx; call [ecx+x];" gadget and make "[ecx+x]" point to longjmp. Both are more or less equivalent to calling "longjmp(this);" or sth similar... it has worked great for me so far.


  2. This comment has been removed by the author.