Friday, January 2, 2015

Finding and exploiting ntpd vulnerabilities

Posted by Stephen Röttger, Time Lord


[Foreword by Chris Evans: this post by Stephen represents the first Project Zero guest blog post. From time to time, we’ll be featuring guest blog posts for top-tier security research. In this instance, we’ve been impressed by the remotely exploitable nature of these vulnerabilities, as well as the clever chain of bugs and quirks that eventually leads to remote code execution. You’ve probably seen the recent ntpd vulnerability disclosures and this blog post tells the story from one of the researchers who discovered the issues. Over to Stephen…]

A few months ago I decided to get started on fuzzing. I chose the reference implementation of the Network Time Protocol (NTP), ntpd, as my first target, since I have some background with NTP and the protocol seemed simple enough to be a good learning experience. Also, ntpd is available for many platforms and widely used; it is even part of the default OS X installation.
While looking at the source to get a better understanding of the protocol, I noticed that its processing is far more complex than I expected. Besides the time synchronization packets, ntpd supports symmetric and asymmetric (Autokey) authentication and so-called private and control mode packets that let you query the daemon for stats or perform configuration changes (if I’m not mistaken, these are the protocols spoken by ntpdc and ntpq respectively). I quickly stumbled over a bug in the code processing Autokey protocol messages and decided to dig deeper and perform a manual code review of the other parts as well. This resulted in finding CVE-2014-9295 and writing my first ever OS X exploit, for which I will present a write-up today.

tl;dr: a global buffer overflow can be triggered on common configurations by an attacker on the local network through an IPv6 packet with a spoofed ::1 source. If your ntpd is not patched yet, add nomodify or noquery to every restrict line in your config, even the ones for localhost.

But enough of that, let's jump into the details.

The Bug
The most severe bug that turned out to be exploitable on OS X Mavericks is a buffer overflow in the code which handles control packets. Control mode responses are fragmented if they exceed the size of the buffer used to store them, as implemented in the following function:

static void
ctl_putdata(
    const char *dp,
    unsigned int dlen,
    int bin /* set to 1 when data is binary */
    )
{
    //[...]

    /*
     * Save room for trailing junk
     */
    if (dlen + overhead + datapt > dataend) {
        /*
         * Not enough room in this one, flush it out.
         */
        ctl_flushpkt(CTL_MORE);
    }
    memmove((char *)datapt, dp, (unsigned)dlen);
    datapt += dlen;
    datalinelen += dlen;
}

As you can see, if the data to be written doesn't fit into the remaining buffer space, <ctl_flushpkt> is called, which sends out the current packet and resets datapt to point to the beginning of the buffer. However, memmove is called in any case, and if dlen is bigger than the total buffer size it will overflow the buffer. Note that the overflow happens in a global buffer and thus stack cookies won’t help in this case. So let's see if we can find a code path that will trigger this.
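
To make the flaw concrete, here is a minimal model of the logic above, not the actual ntpd code or its official patch. The names out_buf, flush, unpatched_would_overflow and put_data_bounded are made up for illustration, the buffer is shrunk to 16 bytes, and the "overhead" term is ignored. The first helper shows that a flush alone cannot save a write that is longer than the whole buffer; the second shows one possible bounded variant that copies in chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16   /* tiny stand-in for the real response buffer */

struct out_buf {
    char   data[BUF_SIZE];
    size_t used;      /* plays the role of datapt minus the buffer start */
    size_t flushes;   /* number of packets "sent" so far */
};

/* Stand-in for ctl_flushpkt(): pretend to send the packet, then reset. */
static void flush(struct out_buf *b)
{
    b->flushes++;
    b->used = 0;
}

/* Returns 1 if the unpatched logic would write past the end of the buffer. */
static int unpatched_would_overflow(size_t used, size_t dlen)
{
    if (used + dlen > BUF_SIZE)
        used = 0;                    /* the flush resets the write position ... */
    return used + dlen > BUF_SIZE;   /* ... but all dlen bytes are still copied */
}

/* Bounded variant: copy in chunks and flush whenever the buffer is full. */
static void put_data_bounded(struct out_buf *b, const char *dp, size_t dlen)
{
    assert(b != NULL && dp != NULL);
    while (dlen > 0) {
        size_t room = BUF_SIZE - b->used;
        if (room == 0) {
            flush(b);
            room = BUF_SIZE;
        }
        size_t n = dlen < room ? dlen : room;
        memcpy(b->data + b->used, dp, n);
        b->used += n;
        dp += n;
        dlen -= n;
    }
}
```

In this model, a 10 byte write into a buffer that already holds 10 bytes is fine (the flush makes room), while any write longer than BUF_SIZE overflows regardless of how empty the buffer is.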

In most invocations, the data to be written comes from a fixed size buffer that is smaller than the output buffer and thus won't overflow.
The function <configure>, which handles ntp.conf style remote configurations sent by a privileged client, will send any error messages back to the client using <ctl_putdata>. By sending a configuration with enough errors, the error message string will exceed the buffer size.
However, the fact that the written data is restricted to a fixed set of error messages makes exploitation difficult.

A more powerful overwrite can be found in <read_variables>. The NTP daemon keeps a list of name=value variables that can be set through the configuration and read back with a control mode packet. If a variable bigger than the output buffer is read back, it will overflow and corrupt whatever is stored behind the buffer.

Setting Variables
So how can we set variables? As mentioned before, there is a control mode packet through which we can send configuration commands to ntpd and thereby set any variable we want. But this is obviously a privileged operation and protected by two mechanisms:

  1. Access to private and control mode queries can be restricted in ntp.conf based on the source IP. Default installations usually prohibit these queries for every source IP except for 127.0.0.1 and ::1. This is what e.g. Ubuntu, Debian and OS X do.
  2. The packet needs to be authenticated with a MAC for which the shared key has to be specified in ntp.conf, which again shouldn't be set on default installations.

Bypassing the first one is actually not that hard if you’re on the same network. As we all know, IP addresses can be spoofed. But can we spoof the address of localhost? It turns out OS X and the Linux kernel behave similarly in this case. Any IP packet arriving on an external interface with the source IP 127.0.0.1 will be dropped immediately. But if we use IPv6 instead, we can actually spoof ::1 and send control mode packets to the daemon (some Linux distributions have firewall rules in place that protect against this, e.g. Red Hat). Thus, if we are on the same local network, we can send spoofed packets to the link-local address of the target and bypass the IP restrictions. But what about requirement number 2? This one sounds tough: how can you have a valid MAC if no key is specified?

Quest for the Key
Let’s back up and discuss a little bit of background first. Through ntp.conf you can specify multiple keys and assign key ids to them. These key ids can then be assigned to different roles, i.e., a requestkey can be used to authenticate private mode packets and a controlkey is used for control mode packets. We need a controlkey to send our configuration requests but a requestkey would actually suffice since a private mode packet exists that will set the controlkey id to a specified value.

And that’s where another bug comes into play that was discovered by Neel Mehta. Let’s take a look at what ntpd does if no requestkey was specified in the config:

/* if doesn't exist, make up one at random */
if (authhavekey(req_keyid)) {
    //[...]
} else {
    unsigned int rankey;

    rankey = ntp_random();
    req_keytype = NID_md5;
    req_hashlen = 16;
    MD5auth_setkey(req_keyid, req_keytype,
        (u_char *)&rankey, sizeof(rankey));
    authtrust(req_keyid, 1);
}

That’s right, if no key was specified, a random 31 bit key will be generated, which means we can brute force it by sending 2^31 packets to the vulnerable daemon with a 68 byte payload each. But wait, there’s more! The random key is created by a custom random number generator implementation that is seeded with a 32 bit value and we can get the output of this generator through standard time synchronization requests. Part of the receive timestamp that we get by querying the time from the daemon is a random value from this generator and each query allows us to recover around 12 bits of the output which we can use to brute force the seed offline. However, the feasibility of a naive brute force approach highly depends on the uptime of ntpd since the number of random values that have been created will increase the search space. To give an idea of the time complexity, my single core implementation takes a few hours on my laptop even if I limit the search space to the first 1024 random values, but you can throw more cores at the problem or precompute as much as possible and build a lookup table.
At this point, we have an overflow in a global buffer that can be triggered remotely on standard configurations. Neat!
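
The offline seed search can be sketched as follows. This is only an illustration of the idea: toy_next is a made-up 32 bit linear congruential generator and toy_leak simply exposes its top 12 bits; neither reproduces ntpd's actual generator or the way its output ends up in the receive timestamp. The structure of the search is the point: replay the generator for every candidate seed and keep the one whose outputs agree with all leaked fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy generator (hypothetical, NOT ntpd's ntp_random()): a 32 bit LCG. */
static uint32_t toy_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* The top 12 bits of an output stand in for what one time query leaks. */
static uint32_t toy_leak(uint32_t output)
{
    return output >> 20;
}

/*
 * Try every seed in [0, limit) and return the first one whose leaked bits
 * match all n observations. Returns 1 and stores the seed on success,
 * 0 if no candidate matches.
 */
static int recover_seed(const uint32_t *leaks, size_t n,
                        uint32_t limit, uint32_t *seed_out)
{
    assert(leaks != NULL && seed_out != NULL);
    for (uint32_t seed = 0; seed < limit; seed++) {
        uint32_t state = seed;
        size_t i = 0;
        while (i < n && toy_leak(toy_next(&state)) == leaks[i])
            i++;
        if (i == n) {
            *seed_out = seed;
            return 1;
        }
    }
    return 0;
}
```

A real attack additionally has to account for the unknown number of values the daemon has already drawn since startup, which is what makes the uptime matter.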

The Overflow
Now that we have the key, we can send configuration commands and write arbitrary variables. When reading them back from the daemon, you can optionally specify the variables that you’re interested in. ntpd will iterate through them, write them (separated by a comma) to the global buffer through the function <ctl_putdata> and finally flush them out with <ctl_flushpkt>. There are still some restrictions on this overflow that make exploitation notably harder.

  1. We can’t write 0x00, 0x22 (") and 0xff.
  2. Some data will be appended after our overwrite. That is, “, ” between two variable writes and “\r\n” on the final flush.
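
The first restriction is easy to check mechanically before sending a payload. The helper names below are made up for this sketch; the forbidden byte values are the three listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte can be carried inside a variable value. */
static int byte_is_writable(uint8_t b)
{
    return b != 0x00 && b != 0x22 && b != 0xff;
}

/* Returns the index of the first forbidden byte, or len if there is none. */
static size_t first_forbidden(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (!byte_is_writable(buf[i]))
            return i;
    return len;
}
```

For example, feeding it the little-endian bytes of a 64 bit pointer shows how many low bytes can be written before a forbidden value gets in the way.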

How to proceed from here depends on which OS/distribution/architecture you target since protection mechanisms and the memory layout of the global data structures will differ. A few examples:

  • On x64, the inability to write null bytes prevents us from completely overwriting pointers since the most significant bytes are null bytes. This poses a problem since “\r\n” is appended to our data, which will limit the control over partial pointer overwrites. On x86 however, this shouldn’t be an issue.
  • At least on Debian, some compile time protections are not enabled for ntpd: the executable is not position independent and the global offset table (GOT) is writable during runtime.
  • On OS X Mavericks, the datapt variable which points to the current position in the buffer is located after the buffer itself while on Debian and Ubuntu the pointer is in front of the buffer and can’t be overwritten.

I chose to try my luck on a 64 bit OS X Mavericks. Since I have no prior experience with OS X, please bear with me if I missed something obvious or use the wrong nomenclature :).
The environment looks like this:

  • The binary, stack, heap and shared libraries are individually randomized with 16 bits of entropy.
  • The address of the shared libraries is randomized at boot time.
  • On a crash, ntpd is restarted automatically with approximately 10 seconds delay.
  • ntpd is compiled with stack cookies (which doesn’t matter in our case since we overflow a global buffer).
  • The global offset table (GOT) is writable during runtime.

For a reliable exploit we will have to bypass ASLR somehow, so let’s leak some pointers. This one is actually quite easy since the datapt variable, which as you might remember points to the current write location, is located after the buffer itself:
We just have to overwrite the two least significant bytes of the datapt variable and as a consequence, ntpd will miscalculate the length and send you data after the buffer which leaks a pointer into the ntpd binary as well as a heap pointer. After that, the datapt variable is conveniently reset to point to the beginning of the buffer again.
Note that usually “\r\n” would get appended to our data and corrupt the partial pointer overwrite. But since we overwrite the write pointer itself, the newline sequence will be written to the new destination instead.

With the same trick, we can turn the bug into a slightly restricted write-what-where primitive: partially overwrite the datapt variable to point to where you want to write (minus a few bytes to make room for the separator) and then write arbitrary data with a second ntpd variable. Again, the fact that garbage is appended to our data is no issue for the first write since it will be written to the new location instead and won’t corrupt the pointer. Note that we can only write arbitrary data in front of the buffer since a higher address will trigger a flush and reset the datapt (after writing the separator, so this might still be used to corrupt a length field).
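
The effect of such a partial overwrite on a stored pointer can be modelled in a few lines. This is a sketch with a made-up helper name; it encodes the value as little-endian explicitly, which is the byte order on x86-64, and the pointer value in the usage note is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Overwrite the n lowest-addressed bytes of a stored 64 bit pointer value,
 * as a byte-wise overflow running into it would on a little-endian machine.
 */
static uint64_t partial_overwrite(uint64_t stored, const uint8_t *bytes, size_t n)
{
    uint8_t raw[sizeof stored];

    assert(bytes != NULL && n <= sizeof raw);
    /* Serialize as little-endian so the result does not depend on the host. */
    for (size_t i = 0; i < sizeof raw; i++)
        raw[i] = (uint8_t)(stored >> (8 * i));
    memcpy(raw, bytes, n);   /* the overflow reaches the low bytes first */

    uint64_t out = 0;
    for (size_t i = 0; i < sizeof raw; i++)
        out |= (uint64_t)raw[i] << (8 * i);
    return out;
}
```

With a hypothetical datapt value of 0x000000010641d2a0, writing the two bytes 0x10 0xc0 yields 0x000000010641c010: the upper six bytes survive, so the new target stays inside the same 64 KiB window.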

Unfortunately, the appended bytes still pose a problem. If we try to do a partial pointer overwrite through this, the “\r\n” sequence will always corrupt the pointer before it is used. Well, almost always. The GOT, and this took me way too long to figure out, is actually writable and used twice before our overwrite gets corrupted by the addition of “\r\n”. Between writing a variable and flushing the packet, <strlen> and <free> are called. That means, if we partially overwrite the GOT entry of either of those functions, the pointer will be used before it gets corrupted and we control rip.

Info leak, again
Since we know the base address of the binary and can overwrite GOT entries we can just find a nice gadget in the binary and jump to it, right? Unfortunately, that doesn’t work. To see why, let’s take a look at a couple of example addresses from the binary and libsystem_c:

0x000000010641c000 /usr/sbin/ntpd
0x00007fff88791000 /usr/lib/system/libsystem_c.dylib

The addresses of system libraries have two null bytes as their most significant bytes while the binary address starts with three null bytes. Thus, if we overwrite the GOT entry of <strlen> with an address from the binary, there will still be a 0x7f byte left from the library address (remember: we can’t write null bytes).
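
The leftover byte can be reproduced with a small model. The helper names are made up, and the gadget address in the usage note is an invented address inside the example binary mapping, used only to show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Number of zero bytes at the most significant end of a 64 bit address. */
static unsigned leading_null_bytes(uint64_t addr)
{
    unsigned n = 0;
    for (int shift = 56; shift >= 0 && ((addr >> shift) & 0xff) == 0; shift -= 8)
        n++;
    return n;
}

/*
 * Result of overwriting old_val with the non-null low bytes of new_val only,
 * since the leading null bytes of new_val cannot be written.
 */
static uint64_t overwrite_without_nulls(uint64_t old_val, uint64_t new_val)
{
    unsigned n = 8 - leading_null_bytes(new_val);   /* bytes we can write */

    assert(n <= 8);
    if (n == 8)
        return new_val;
    uint64_t mask = (UINT64_C(1) << (8 * n)) - 1;
    return (old_val & ~mask) | (new_val & mask);
}
```

Overwriting the library pointer 0x00007fff88792720 with a hypothetical binary address 0x000000010641c123 gives 0x00007f010641c123: five bytes are replaced, and the 0x7f above them stays in place.
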
To obtain the address of a system library we could try to turn our overwrite into a better leak, e.g. by overwriting some length field. But there is a lazier approach due to a weakness of ASLR on OS X Mavericks.
The most common libraries are loaded in the split library region (as “man vmmap” calls it) which is shared by all processes in the system. The load address of this region is randomized during boot. This means that the addresses stay the same if a program is restarted and that even libraries which are not used by the program are loaded in its address space and can be used for ROP gadgets. This and the fact that ntpd is restarted automatically when it crashes makes it possible to brute force the library addresses for <strlen> (libsystem_c) or <free> (libsystem_malloc) bytewise.
If you reboot your system a few times, you can observe that the load address of the split library region is always of the form 0x00007fff8XXXX000, providing 16 bits of entropy, or 17 bits in our case since the region can extend to 0x00007fff9XXXX000. Let’s use the libsystem_c address from the example before: 0x00007fff88791000. We know that <strlen> is located at offset 0x1720 and thus 0x00007fff88792720 is the address we’re trying to brute force.
We start by brute forcing the upper 4 bits of the second least significant byte. We overwrite the GOT entry of <strlen> with 0x0720, resulting in the new entry 0x00007fff88790720. Since we didn’t hit the correct address ntpd will crash and won’t send us any replies anymore. In that case, we increment the address to 0x1720 and try it again. If ntpd does send us a reply, which will happen at 0x2720, we know that we found the correct byte and continue with the next one (0x012720).
This way, we can recover the libsystem_c address in at most 304 tries (2^4 + 2^8 + 2^5 = 16 + 256 + 32). OS X will restart ntpd approximately every 10 seconds, but you will need to brute force the key anew for every try, so bring your supercomputer. Also, if you’re unlucky, you will run into an endless loop and ntpd has to be killed manually.
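
The staged search can be simulated against a mock target to check the try counts. Everything here is a model under simplifying assumptions: the struct and function names are made up, a wrong guess is treated as a crash after which the restarted daemon has its original GOT entry again, and the forbidden-byte restriction on the written values is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Mock target: ntpd "survives" only if the GOT entry still equals strlen. */
struct target {
    uint64_t real_strlen;   /* unknown to the attacker */
    uint64_t got_entry;     /* starts out equal to real_strlen */
    unsigned tries;
};

/* Overwrite the low nbytes of the GOT entry; returns 1 if ntpd still replies. */
static int try_overwrite(struct target *t, uint64_t value, unsigned nbytes)
{
    assert(nbytes >= 1 && nbytes <= 7);
    uint64_t mask = (UINT64_C(1) << (8 * nbytes)) - 1;
    uint64_t patched = (t->got_entry & ~mask) | (value & mask);

    t->tries++;
    /* A wrong guess crashes the daemon; the restart restores the entry. */
    return patched == t->real_strlen;
}

/* Recover the low 32 bits of strlen's address, given its low 12 bits. */
static uint64_t brute_force_strlen(struct target *t, uint64_t low12)
{
    uint64_t known = low12;

    for (uint64_t g = 0; g < 16; g++)           /* bits 12..15: 2^4 guesses */
        if (try_overwrite(t, known | (g << 12), 2)) { known |= g << 12; break; }
    for (uint64_t g = 0; g < 256; g++)          /* bits 16..23: 2^8 guesses */
        if (try_overwrite(t, known | (g << 16), 3)) { known |= g << 16; break; }
    for (uint64_t g = 0x80; g < 0xa0; g++)      /* byte 0x80..0x9f: 2^5 guesses */
        if (try_overwrite(t, known | (g << 24), 4)) { known |= g << 24; break; }
    return known;
}
```

For the example address 0x00007fff88792720 the model needs 3 + 122 + 9 = 134 tries; the worst case of 304 is reached when every stage has to run through all of its candidates.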

Arbitrary Code Execution
If it wasn’t for the fact that ntpd runs in a sandbox we would be finished now. Just overwrite the GOT entry of <strlen> with the address of <system> and execute arbitrary commands since it will get called with a user controlled string. But all you get out of this is the following line in /var/log/system.log:

sandboxd[405] ([41]): ntpd(41) deny process-exec /bin/sh

Instead, we need to find a nice gadget to control the stack pointer and make it point to a ROP chain. The usual way to do this would be a stack pivot but the data we control on the stack is limited.
On the stack, we control data in 3 locations which we can fill with arbitrary pointers, this time without any restrictions. Besides that, we completely control the contents of a global buffer at a known address in the binary and if we can get the stack pointer (rsp) to point to this buffer we can execute an arbitrary ROP chain.
Since our exploit overwrites the GOT, we only control the instruction pointer once, i.e. we can’t chain multiple calls. Thus, our first gadget needs to increment the stack pointer by either 0x80, 0x90 or 0xb8 so that it will use one of our addresses on return and do something useful at the same time. Fortunately, I found the following gadget in libsystem_c.dylib:

add rsp, 0x88
pop rbx
pop r12
pop r13
pop r14
pop r15
pop rbp
ret
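
The stack slots this gadget consumes follow from simple arithmetic, sketched here with made-up helper names. Each pop and the final ret read one 8 byte slot, starting right after the adjusted stack pointer.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOT_SIZE = 8 };   /* one stack slot on x86-64 */

/* Shape of a gadget: "add rsp, adjust", then npops pops, then ret. */
struct pivot_gadget {
    size_t adjust;
    size_t npops;
};

/* Offset from the entry rsp of the slot read by the k-th pop (k = 0 is first). */
static size_t pop_offset(struct pivot_gadget g, size_t k)
{
    assert(k < g.npops);
    return g.adjust + k * SLOT_SIZE;
}

/* Offset from the entry rsp of the return address taken by the final ret. */
static size_t ret_offset(struct pivot_gadget g)
{
    return g.adjust + g.npops * SLOT_SIZE;
}
```

With adjust = 0x88 and six pops, the second pop (r12) reads from rsp+0x90 and the ret takes its target from rsp+0xb8.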

This gadget returns to our address at rsp+0xb8 and at the same time loads the value from rsp+0x90 into r12. Since we now control a register, we can chain gadgets that end in a call qword [reg+n] where reg points to the global buffer that we control. For example, the second gadget looks like this:

mov rdi, r12
mov rsi, r14
mov rdx, r13
call qword [r12+0x10]

With a few gadgets of this kind, we control rsi and can load it into rsp:

push rsi
pop rsp
xor eax, eax
pop rbp
ret

And with that, we’re done. This will crash on a ret instruction with rsp pointing to user-controlled data, and thus arbitrary code execution is straightforward. Since we control the stack, we can build a ROP chain that loads and executes shellcode and from there try to break out of the sandbox by attacking the kernel or IPC channels. But that is left as an exercise for the reader :).

Exploit Summary
  1. Send a bunch of regular time synchronization requests to leak random values.
  2. Brute force the seed and calculate the requestkey (which has the keyid 65535).
  3. Send a private mode packet signed with the requestkey and with a spoofed source IP of ::1 to the server to set the controlkey id to 65535.
  4. Send a configuration change to lift all restrictions for our IP address.
  5. Add our IP to get async notifications (we have to do this, since we later overwrite a flag that determines whether responses are sent directly or asynchronously).
  6. Trigger the overflow by setting a long variable and reading it back and leak the binary base address.
  7. Use the overflow again as a write-what-where primitive to brute force the address of <strlen> bytewise.
  8. Prepare the data on the stack and in the global buffer.
  9. Call the gadgets to control rsp and execute a ROP chain.

Mitigation
In case your ntpd is not patched yet, you can effectively protect against these bugs through changes in your ntp.conf. The vulnerable <ctl_putdata> function is used by the processing of control mode packets and this can be blocked completely by adding “noquery” to every restrict line in the configuration. As explained before, it is important to also add “noquery” to the restrict lines for localhost, since the IP based access restrictions can often be bypassed through spoofing. But note that this will prevent ntpq from working and you won’t be able to query for peer information and other stats anymore.

For example, if your configuration includes multiple “restrict” lines:

restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1

make sure that “noquery” is included in all of those:

restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1 noquery
restrict -6 ::1 noquery
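
A quick way to audit a configuration is to flag every restrict line that lacks the keyword. The following is a rough sketch with made-up function names; it only does whitespace tokenization and does not handle comments or other ntp.conf syntax.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if line contains word as a whitespace-delimited token. */
static int has_token(const char *line, const char *word)
{
    assert(line != NULL && word != NULL);
    size_t wlen = strlen(word);
    const char *p = line;

    while (*p) {
        while (*p && isspace((unsigned char)*p))
            p++;
        const char *start = p;
        while (*p && !isspace((unsigned char)*p))
            p++;
        if (wlen > 0 && (size_t)(p - start) == wlen &&
            strncmp(start, word, wlen) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 for a restrict line that lacks noquery, 0 for anything else. */
static int restrict_line_needs_noquery(const char *line)
{
    const char *p = line;

    while (*p && isspace((unsigned char)*p))
        p++;
    if (strncmp(p, "restrict", 8) != 0)
        return 0;
    if (p[8] != '\0' && !isspace((unsigned char)p[8]))
        return 0;   /* some longer word that merely starts with "restrict" */
    return !has_token(p, "noquery");
}
```

Run over the first example above, this flags the two localhost lines and accepts the two default lines.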

11 comments:

  1. Is this an attack on the NTP server or the NTP client? Please let me know.

  2. The ntpd daemon runs as a peer (both client and server, with query types allowed defined by the configuration).

    1. I am fairly new to NTP and have done some reading, but I still do not understand how this affects an NTP client. Please let me know. Thanks in advance.

  5. I checked the ntpd code and found out that it doesn't support fragmented control messages:

    ntpd/ntp_control.c

    Line 697:

    /*
     * If the length is less than required for the header, or
     * it is a response or a fragment, ignore this.
     */
    if (rbufp->recv_length < CTL_HEADER_LEN
        || pkt->r_m_e_op & (CTL_RESPONSE|CTL_MORE|CTL_ERROR)
        || pkt->offset != 0) {
        DPRINTF(1, ("invalid format in control packet\n"));
        if (rbufp->recv_length < CTL_HEADER_LEN)
            numctltooshort++;
        if (pkt->r_m_e_op & CTL_RESPONSE)
            numctlinputresp++;
        if (pkt->r_m_e_op & CTL_MORE)
            numctlinputfrag++;
        if (pkt->r_m_e_op & CTL_ERROR)
            numctlinputerr++;
        if (pkt->offset != 0)
            numctlbadoffset++;
        return;
    }


    My question is, how did you manage to make the variable big enough? I guess it takes more than one packet, since I tried one with the maximum value that fits into a single packet and it isn't enough to overflow the buffer.

  6. I think I know how: via readvar sys_var_list, after first setting variables with long names or long values. That seems to do the job.

  7. Actually, it didn't work. The biggest dlen in ctl_putdata() I was able to get was 466, but the global buffer size, as I understand it, is 504 bytes.

    I tried to set a long variable name in order to trigger the overflow.

    :config setvar bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb = 1

    Then read it via

    :rv 0 sys_var_list

    It displays blank, but this is what the stack looks like:

    (gdb) x/400x 0x684b0c
    0x684b0c : 0x5f737973 0x5f726176 0x7473696c 0x656c223d
    0x684b1c : 0x732c7061 0x74617274 0x702c6d75 0x69636572
    0x684b2c : 0x6e6f6973 0x6f6f722c 0x6c656474 0x722c7961
    0x684b3c : 0x64746f6f 0x2c707369 0x69666572 0x65722c64
    0x684b4c : 0x6d697466 0x63742c65 0x6565702c 0x666f2c72
    0x684b5c : 0x74657366 0x6572662c 0x6e657571 0x732c7963
    0x684b6c : 0x6a5f7379 0x65747469 0x6c632c72 0x696a5f6b
    0x684b7c : 0x72657474 0x6f6c632c 0x702c6b63 0x65636f72
    0x684b8c : 0x726f7373 0x7379732c 0x2c6d6574 0x73726576
    0x684b9c : 0x2c6e6f69 0x5f6b6c63 0x646e6177 0x732c7265
    0x684bac : 0x765f7379 0x6c5f7261 0x2c747369 0x2c696174
    0x684bbc : 0x7061656c 0x2c636573 0x69707865 0x6d2c6572
    0x684bcc : 0x63746e69 0x6561642c 0x5f6e6f6d 0x73726576
    0x684bdc : 0x2c6e6f69 0x74746573 0x6f656d69 0x79616466
    0x684bec : 0x6363612c 0x5f737365 0x696c6f70 0x612c7963
    0x684bfc : 0x6262622c 0x62626262 0x62626262 0x62626262
    0x684c0c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684c1c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684c2c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684c3c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684c4c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684c5c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684c6c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684c7c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684c8c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684c9c : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684cac : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684cbc : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684ccc : 0x62626262 0x62626262 0x62626262 0x62626262
    0x684cdc : 0x0a0d2262 0x00000000 0x00000000 0x00000000
    0x684cec : 0x00000000 0x00000000 0x00000000 0x00000000
    0x684cfc : 0x00000000 0x00000000 0x00000401 0x00000000
    0x684d0c: 0x00000000 0x00000000 0x00000000 0x00000000
    0x684d1c: 0x00000000 0x756e694c 0x00000078 0x00000000


    How did you overflow it then?

  8. OK, figured it out :) I created the packet by hand instead of using the ntpq client.

  9. It is mentioned that data is controlled on the stack, but the origin of this data is not explained. What is the data that is controllable on the stack and where does it come from?
