Thursday, August 2, 2018

Adventures in vulnerability reporting

Posted by Natalie Silvanovich, Project Zero

At Project Zero, we spend a lot of time reporting security bugs to vendors. Most of the time, this is a fairly straightforward process, but we occasionally encounter challenges getting information about vulnerabilities into the hands of vendors. Since it is important to user security that software vendors fix reported vulnerabilities in a timely matter, and vendors need to actually receive the report for this to happen, we have decided to share some of our experiences. We hope to show that good practices by software vendors can avoid delays in vulnerability reporting.

Effective Vulnerability Reporting Processes
There are several aspects of a bug reporting process that make reporting vulnerabilities easier from the bug reporter’s perspective. To start off, it’s important for a bug reporting process to be easy to find and use. We sometimes have difficulty figuring out how to report a vulnerability in a piece of software if the vulnerability reporting process is not documented on the project or vendor’s website, or if outdated material is not removed and instructions for reporting vulnerabilities are inconsistent. This can lead to delays in reporting. Effective vulnerability reporting processes are clearly documented, and the documentation is easy to find.

We also appreciate when the process for reporting a vulnerability is short and straightforward. Occasionally, we report dozens of vulnerabilities in a vendor’s products, and it is helpful when reporting does not require a lot of clicks and reading. Reporting processes that use email or bug trackers are usually the easiest, though webforms can be easy if they are not excessively long. While Project Zero will always report a vulnerability, even if reporting it is very time consuming, this is not necessarily the case for other bug reporters. Long bug reporting processes can cause bug reporters to report bugs more slowly, spend less time working on a piece of software or even give up on reporting a bug. The easier a bug reporting process is, the more likely it is that someone will go through with it.

It’s also important for bug reporting processes to be well-tested. While the majority we encounter are, we’ve occasionally had bug reporting email addresses bounce, webforms reject necessary information (like the reporter’s name) and security issues go unnoticed in bug trackers for months despite following the documented process. Vendors with good processes usually test that their process and any systems it involves works correctly on a regular basis.

Mandatory legal agreements in the reporting process are another problem that we encounter every so often. If a legal agreement contains language about disclosure or any other subject we don’t feel comfortable entering an agreement about on behalf of our company, deciding whether to enter the agreement can require a lengthy discussion, delaying the bug report. While legal agreements are sometimes necessary for rewards programs and code contributions, good vulnerability reporting processes allow bug reporters to report bugs without them.

It is also helpful when vendors confirm that vulnerability reports have been received in a timely manner. Since bug reports can get lost for a number of reasons, including bugs in the reporting interface and human error, it is a good idea to let reporters know that their report has been received, even if it won’t be processed right away. This lets the reporter know that they’ve reported the bug correctly, and don’t need to spend any more time reporting it, and makes it more likely that bug reporters will reach out if a bug report gets lost, as they will be expecting a confirmation.

Finally, even if good practices are followed in creating the bug reporting process, it is still possible that a bug reporting process has problems, so it is very helpful if vendors provide a way to give feedback on the process. It’s very rare for vendors to intentionally make bug reporting difficult, but unexpected problems happen fairly frequently, so it is good to provide a way bug reporters can reach out for help as a last resort if a reporting a bug fails for any reason.

Examples
One example of a bug we had difficulty reporting due to a vendor not following the practices described above is CVE-2018-10751.  CVE-2018-10751 is a remote memory corruption vulnerability in OMACP affecting the Samsung S7 Edge. The issue can be triggered by sending a single SMS to the target device, and does not require any user interaction. The payload can be sent from an app on an Android device without root access or any special equipment. It is similar to CVE-2016-7990, which is described in detail here.

Samsung’s Vulnerability Reporting Process
CVE-2018-10751 is a serious vulnerability, and I wanted to report it immediately. I started off by reading Samsung Mobile’s Security Reporting page. This page has a button to create a bug report.


Pressing the button led to a sign-up page. I didn’t have a Samsung account, so I tried to sign up. Unfortunately, it led to this page:


Not speaking Korean, I wasn’t sure what to do here. I eventually went back to the previous page and tried the ‘Sign-in’ button.

This brought me to an English sign-up page, which then brought me to the account creation page. According to this page, I had to read and agree to some terms. Clicking the links led to over twenty separate agreements, most of which had nothing to do with vulnerability reporting.
https://account.samsung.com Accessed February 22, 2018

That’s a lot of text to read and review. Let’s just say I skimmed a bit. Once I clicked ‘Agree’, I was taken to a page where I could enter account information. The page required my birthdate and zip code, which I wasn’t thrilled to have to provide to report a vulnerability, but I wanted to get the issue reported, so I entered them. Finally, my account was created! I logged in, hoping to start reporting the bug, only to be greeted with more conditions.

https://account.samsung.com Accessed February 22, 2018

These ones were in Korean, and I couldn’t figure out how to change the language. Eventually, I just selected confirm. Finally, I got to the form where I could report bugs!


I filled out the vulnerability information, and scrolled down, and there was one more set of terms to agree to:

These terms included:

- You MUST hold off disclosing the vulnerability in reasonable time, and you MUST get Samsung’s  consent or inform Samsung about the date before disclosing the vulnerability.
- In some cases, Samsung may request not to disclose the vulnerability at all.

I was not able to submit this form without agreeing to allow Samsung some level of control over disclosure of reported vulnerability. I looked around Samsung’s security page to see if they provided an email address I could report the issue to, but they did not provide one. I was not comfortable reporting this bug through the mechanisms Samsung provides for vulnerability reporting on their website.

Problems with Vulnerability Reporting Processes

I encountered several problems while trying to report the above vulnerability—most of which have been since resolved by Samsung.

To start off, Samsung’s bug reporting process did not seem adequately tested. The many times that Korean text showed up while attempting to report this vulnerability suggests that it was not tested in English. As described above, is important for vendors to test vulnerability reporting processes, including for internationalization issues. The workflow is also excessively long, and requires the reporter to agree to a very large number of agreements, many of which have nothing to do with vulnerability reports. I suspect that the people testing this interface might have already had accounts, and not seen how long the process is for someone who just wants to report a bug.

This isn’t an uncommon problem. The Android security reporting template requires creating a GMail account, which can require clicking through many screens and verification via SMS in some circumstances. As a result of our feedback, the Android Security team has improved the documentation that vulnerability reports can be filed via email (security@android.com), although using the web form is still required to participate in the Android Security rewards program.

Another problem was that in order to report a bug, a reporter had to agree to the terms of the rewards program. This is an issue that Project Zero has been seeing increasingly often. When software vendors start rewards programs, they often remove existing mechanisms for reporting vulnerabilities, leaving bug reporters with no way to report vulnerabilities without entering into agreements.

This also occurred when Tavis Ormandy attempted to report the vulnerability he reluctantly dubbed CloudBleed. Cloudflare’s vulnerability reporting process is tied to its rewards program with HackerOne, and there is no clear way to report a vulnerability without creating a HackerOne account in their Vulnerability Disclosure Policy. The policy even states “We agree with their disclosure philosophy, and if you do too, please submit your vulnerability reports here” without providing an alternative for vulnerability reporters who don’t agree or don’t want to participate in the program for whatever reason. In Project Zero’s case, our disclosure deadline is 90 days meanwhile HackerOne’s deadline is 180 days. This vulnerability was also very urgent as it was actively leaking user data onto the Internet, and we didn’t want to delay reporting the issue while we read through HackerOne’s terms to determine whether they were compatible with our disclosure policy.

We find that vendors generally don’t intend to prevent bug reports from anyone who won’t agree to their disclosure rules, but this was the end result of Samsung and Cloudflare replacing their bug reporting process with a rewards program.

The specific terms of Samsung’s agreement were also fairly vague. In particular, it wasn’t clear what the consequences of breaking the terms would be. For example:

- You MUST hold off disclosing the vulnerability in reasonable time, and you MUST get Samsung’s  consent or inform Samsung about the date before disclosing the vulnerability.

Does this mean that if someone discloses a vulnerability without permission, they are not eligible for a reward? Does it mean that if someone discloses the vulnerability without permission, Samsung can take legal action against them? While requiring that bug reporters not disclose vulnerabilities to receive rewards is a policy with debatable benefit, I would have been much more comfortable agreeing to these terms if they had spelled out that violating them would simply mean I would not receive a reward, as opposed to other legal consequences.
Overall, the issues of poorly tested bug reporting interfaces and requiring legal agreements to report vulnerabilities have come up multiple times, and led to delays of Project Zero reporting vulnerabilities. We recommend that vendors test their vulnerability reporting interfaces from the perspective of someone who’s never reported a bug from outside of their corporate network, and make sure to do localized testing. It is also important to allow bug reports without requiring the reporter to enter into excessive legal agreements.

While only accepting vulnerability reports via web forms can reduce the number of invalid reports, which is a major challenge for teams accepting vulnerability reports, they can also be unreliable and prevent vulnerability reporting in situations that were not expected by those designing them, unless they are very well tested. Having an alternate email address that vulnerability reporters can use to report bugs if they encounter problems is a good way to prevent this type of problem.

Reporting the Bug
I eventually contacted some members of the Knox security team at Samsung that I had worked with on previous bugs and they recommended reporting the issue to mobile.security@samsung.com. This email is not documented on the Samsung website, except for a single blog post from 2015.

The difficulty I encountered reporting this serious vulnerability delayed my report one week. It might have caused a longer delay if I did not have contacts at Samsung who could help.

Samsung started rolling out updates for CVE-2018-10751 (Samsung’s identifier SVE-2018-11463) in their April maintenance release.

Samsung has updated their account creation page so that it always displays English text if the language is set to English. Also, the vulnerability report form can now be submitted without agreeing to the terms for the Samsung’s rewards program, though the user still has to agree to two other agreements. They have also updated their bug reporting page to provide an email address as well as a webform. We appreciate the changes they have made to make reporting vulnerabilities in Samsung products easier for everyone.

Conclusion
Project Zero has occasionally had difficulty reporting vulnerabilities, leading to delays in reporting the bug. Usually, these are due to problems in the reporting process that were not intended or expected by the vendor. A difficult vulnerability reporting process can have a negative impact on user security due to delays in vulnerability reports, lost vulnerability reports and even bug reporters choosing not to report a vulnerability. We appreciate when vendors do the following to make their bug reporting processes easier for bug reporters:

  • Vendors should regularly test their vulnerability reporting interfaces in all supported languages
  • Vendors should streamline their vulnerability reporting processing as much as possible, and remove excessive clicks and legal agreements
  • Vendors should regularly solicit feedback on their vulnerability reporting mechanisms from vulnerability reporters and people they think are likely to report vulnerabilities

Thursday, July 26, 2018

Drawing Outside the Box: Precision Issues in Graphic Libraries

By Mark Brand and Ivan Fratric, Google Project Zero

In this blog post, we are going to write about a seldom seen vulnerability class that typically affects graphic libraries (though it can also occur in other types of software). The root cause of such issues is using limited precision arithmetic in cases where a precision error would invalidate security assumptions made by the application.

While we could also call other classes of bugs precision issues, namely integer overflows, the major difference is: with integer overflows, we are dealing with arithmetic operations where the magnitude of the result is too large to be accurately represented in the given precision. With the issues described in this blog post, we are dealing with arithmetic operations where the magnitude of the result or a part of the result is too small to be accurately represented in the given precision.

These issues can occur when using floating-point arithmetic in operations where the result is security-sensitive, but, as we’ll demonstrate later, can also occur in integer arithmetic in some cases.

Let’s look at a trivial example:

 float a = 100000000;
 float b = 1;
 float c = a + b;

If we were making the computation with arbitrary precision, the result would be 100000001. However, since float typically only allows for 24 bits of precision, the result is actually going to be 100000000. If an application makes the normally reasonable assumption that a > 0 and b > 0 implies that a + b > a, then this could lead to issues.

In the example above, the difference between a and b is so significant that b completely vanishes in the result of the calculation, but precision errors also happen if the difference is smaller, for example

 float a = 1000;
 float b = 1.1111111;
 float c = a + b;

The result of the above computation is going to be 1001.111084 and not 1001.1111111 which would be the accurate result. Here, only a part of b is lost, but even such results can sometimes have interesting consequences.

While we used the float type in the above examples, and in these particular examples using double would result in more accurate computation, similar precision errors can happen with double as well.

In the remainder of this blog post, we are going to show several examples of precision issues with security impact. These issues were independently explored by two Project Zero members: Mark Brand, who looked at SwiftShader, a software OpenGL implementation used in Chrome, and Ivan Fratric, who looked at the Skia graphics library, used in Chrome and Firefox.

SwiftShader

SwiftShader is “a high-performance CPU-based implementation of the OpenGL ES and Direct3D 9 graphics APIs”. It’s used in Chrome on all platforms as a fallback rendering option to work around limitations in graphics hardware or drivers, allowing universal use of WebGL and other advanced javascript rendering APIs on a far wider range of devices.

The code in SwiftShader needs to handle emulating a wide range of operations that would normally be performed by the GPU. One operation that we commonly think of as essentially “free” on a GPU is upscaling, or drawing from a small source texture to a larger area, for example on the screen. This requires computing memory indexes using non-integer values, which is where the vulnerability occurs.

As noted in the original bug report, the code that we’ll look at here is not quite the code which is actually run in practice - SwiftShader uses an LLVM-based JIT engine to optimize performance-critical code at runtime, but that code is more difficult to understand than their fallback implementation, and both contain the same bug, so we’ll discuss the fallback code. This code is the copy-loop used to copy pixels from one surface to another during rendering:

 source->lockInternal((int)sRect.x0, (int)sRect.y0, sRect.slice, sw::LOCK_READONLY, sw::PUBLIC);
 dest->lockInternal(dRect.x0, dRect.y0, dRect.slice, sw::LOCK_WRITEONLY, sw::PUBLIC);

 float w = sRect.width() / dRect.width();
 float h = sRect.height() / dRect.height();

 const float xStart = sRect.x0 + 0.5f * w;
 float y = sRect.y0 + 0.5f * h;
 float x = xStart;

 for(int j = dRect.y0; j < dRect.y1; j++)
 {
   x = xStart;

   for(int i = dRect.x0; i < dRect.x1; i++)
   {
     // FIXME: Support RGBA mask
     dest->copyInternal(source, i, j, x, y, options.filter);

     x += w;
   }

   y += h;
 }

 source->unlockInternal();
 dest->unlockInternal();
}

So - what highlights this code as problematic? We know prior to entering this function that all the bounds-checking has already been performed, and that any call to copyInternal with (i, j) in dRect and (x, y) in sRect will be safe.

The examples in the introduction above show cases where the resulting precision error means that a rounding-down occurs - in this case that wouldn’t be enough to produce an interesting security bug. Can we cause floating-point imprecision to result in a larger-than-correct value, leading to (x, y) values that are larger than expected?

If we look at the code, the intention of the developers is to compute the following:

 for(int j = dRect.y0; j < dRect.y1; j++)
 {
   for(int i = dRect.x0; i < dRect.x1; i++)
   {
     x = xStart + (i * w);
     Y = yStart + (j * h);
     dest->copyInternal(source, i, j, x, y, options.filter);
   }
 }

If this approach had been used instead, we’d still have precision errors - but without the iterative calculation, there’d be no propagation of the error, and we could expect the eventual magnitude of the precision error to be stable, and in direct proportion to the size of the operands. With the iterative calculation as performed in the code, the errors start to propagate/snowball into a larger and larger error.

There are ways to estimate the maximum error in floating point calculations; and if you really, really need to avoid having extra bounds checks, using this kind of approach and making sure that you have conservative safety margins around those maximum errors might be a complicated and error-prone way to solve this issue. It’s not a great approach to identifying the pathological values that we want here to demonstrate a vulnerability; so instead we’ll take a brute-force approach.

Instinctively, we’re fairly sure that the multiplicative implementation will be roughly correct, and that the implementation with iterative addition will be much less correct. Given that the space of possible inputs is small (Chrome disallows textures with width or height greater than 8192), we can just run a brute force over all ratios of source width to destination width, comparing the two algorithms, and seeing where the results are most different. (Note that SwiftShader also limits us to even numbers). This leads us to the values of 5828, 8132; and if we compare the computations in this case (left side is the iterative addition, right side is the multiplication):

0:    1.075012 1.075012
1:    1.791687 1.791687
...
1000: 717.749878 717.749878   Up to here (at the precision shown) the values are still identical
1001: 718.466553 718.466553
...
2046: 1467.391724 1467.391724 At this point, the first significant errors start to occur, but note
2047: 1468.108398 1468.108521 that the "incorrect" result is smaller than the more precise one.
...
2856: 2047.898315 2047.898438
2857: 2048.614990 2048.614990 Here our two computations coincide again, briefly, and from here onwards
2858: 2049.331787 2049.331787 the precision errors consistently favour a larger result than the more
2859: 2050.048584 2050.048340 precise calculation.
...
8129: 5827.567871 5826.924805
8130: 5828.284668 5827.641602
8131: 5829.001465 5828.358398 The last index is now sufficiently different that int conversion results in an oob index.

(Note also that there will also be error in the “safe” calculation; it’s just that the lack of error propagation means that that error will remain directly proportional to the size of the input error, which we expect to be “small.”)

We can indeed see that, the multiplicative algorithm would remain within bounds; but that the iterative algorithm can return an index that is outside the bounds of the input texture!

As a result, we read an entire row of pixels past the end of our texture allocation - and this can be easily leaked back to javascript using WebGL. Stay tuned for an upcoming blog post in which we’ll use this vulnerability together with another unrelated issue in SwiftShader to take control of the GPU process from javascript.

Skia

Skia is a graphics library used, among other places, in Chrome, Firefox and Android. In the web browsers it is used for example when drawing to a canvas HTML element using CanvasRenderingContext2D or when drawing SVG images. Skia is also used when drawing various other HTML elements, but canvas element and SVG images are more interesting from the security perspective because they enable more direct control over the objects being drawn by the graphic library.

The most complex type of object (and therefore, most interesting from the security perspective) that Skia can draw is a path. A path is an object that consists of elements such as lines, but also more complex curves, in particular quadratic or cubic splines.

Due to the way software drawing algorithms work in Skia, the precision issues are very much possible and quite impactful when they happen, typically leading to out-of-bounds writes.

To understand why these issues can happen, let’s assume you have an image in memory (represented as a buffer with size = width x height x color size). Normally, when drawing a pixel with coordinates (x, y) and color c, you would want to make sure that the pixel actually falls within the space of the image, specifically that 0 <= x < width and 0 <= y < height. Failing to check this could result in attempting to write the pixel outside the bounds of the allocated buffer. In computer graphics, making sure that only the objects in the image region are being drawn is called clipping.

So, where is the problem? Making a clip check for every pixel is expensive in terms of CPU cycles and Skia prides itself on speed. So, instead of making a clip check for every pixel, what Skia does is, it first makes the clip check on an entire object (e.g. line, path or any other type of object being drawn). Depending on the clip check, there are three possible outcomes:

  1. The object is completely outside of the drawing area: The drawing function doesn’t draw anything and returns immediately.

  1. The object is partially inside the drawing area: The drawing function proceeds with per-pixel clip enabled (usually by relying on SkRectClipBlitter).

  1. The entire object is in the drawing area: The drawing function draws directly into the buffer without performing per-pixel clip checks.

The problematic scenario is c) where the clip check is performed only per-object and the more precise, per-pixel checks are disabled. This means, if there is a precision issue somewhere between the per-object clip check and the drawing of pixels and if the precision issue causes the pixel coordinates to go outside of the drawing area, this could result in a security vulnerability.

We can see per-object clip checks leading to dropping per-pixel checks in several places, for example:

  • In hair_path (function for drawing a path without filling), clip is initially set to null (which disables clip checks). The clip is only set if the bounds of the path, rounded up and extended by 1 or 2 depending on the drawing options don’t fit in the drawing area. Extending the path bounds by 1 seems like a pretty large safety margin, but it is actually the least possible safe value because drawing objects with antialiasing on will sometimes result in drawing to nearby pixels.

  • In SkScan::FillPath (function for filling a path with antialiasing turned off), the bounds of the path are first extended by kConservativeRoundBias and rounded to obtain the “conservative” path bounds. A SkScanClipper object is then created for the current path. As we can see in the definition of SkScanClipper, it will only use SkRectClipBlitter if the x coordinates of the path bounds are outside the drawing area or if irPreClipped is true (which only happens when path coordinates are very large).

Similar patterns can be seen in other drawing functions.

Before we take a closer look at the issues, it is useful to quickly go over various number formats used by Skia:

  • SkScalar is a 32-bit floating point number

  • SkFDot6 is defined as an integer, but it is actually a fixed-point number with 26 bits to the left and 6 bits to the right of the decimal point. For example, SkFDot6 value of 0x00000001 represents the number 1/64.

  • SkFixed is also a fixed-point number, this time with 16 bits to the left and 16 bits to the right of the decimal point. For example, SkFixed value of 0x00000001 represents 1/(2**16)

Precision error with integer to float conversion

We discovered the initial problem when doing DOM fuzzing against Firefox last year. This issue where Skia wrote out-of-bounds caught our eye so we investigated further. It turned out the root cause was a discrepancy in the way Skia converted floating point to ints in several places. When making the per-path clip check, the lower coordinates (left and top of the bounding box) were rounded using this function:

static inline int round_down_to_int(SkScalar x) {
   double xx = x;
   xx -= 0.5;
   return (int)ceil(xx);
}

Looking at the code you see that it will return a number greater or equal to zero (which is necessary for passing the path-level clip check) for numbers that are strictly larger than -0.5. However, in another part of the code, specifically SkEdge::setLine if SK_RASTERIZE_EVEN_ROUNDING is defined (which is the case in Firefox), floats are rounded to integers differently, using the following function:

inline SkFDot6 SkScalarRoundToFDot6(SkScalar x, int shift = 0)
{
   union {
       double fDouble;
       int32_t fBits[2];
   } tmp;
   int fractionalBits = 6 + shift;
   double magic = (1LL << (52 - (fractionalBits))) * 1.5;

   tmp.fDouble = SkScalarToDouble(x) + magic;
#ifdef SK_CPU_BENDIAN
   return tmp.fBits[1];
#else
   return tmp.fBits[0];
#endif
}

Now let’s take a look at what these two functions return for a number -0.499. For this number, round_down_to_int returns 0 (which always passes the clipping check) and SkScalarRoundToFDot6 returns -32 which corresponds to -0.5, so we actually end up with a number that is smaller than the one we started with.

That’s not the only problem, though, because there’s another place where a precision error occurs in SkEdge::setLine.

Precision error when multiplying fractions

SkEdge::setLine calls SkFixedMul which is defined as:

static inline SkFixed(SkFixed a, SkFixed b) {
   return (SkFixed)((int64_t)a * b >> 16);
}

This function is for multiplying two SkFixed numbers. An issue comes up when using this function to multiply negative numbers. Let’s look at a small example. Let’s assume a = -1/(2**16) and b = 1/(2**16). If we multiply these two numbers on paper, the result is -1/(2**32). However, due to the way SkFixedMul works, specifically because the right shift is used to convert the result back to SkFixed format, the result we actually end up with is 0xFFFFFFFF which is SkFixed for  -1/(2**16). Thus, we end up with a result with a magnitude much larger than expected.

As the result of this multiplication is used by SkEdge::setLine to adjust the x coordinate of the initial line point here, we can use the issue in SkFixedMul to cause an additional error up to 1/64 of a pixel to go outside of the drawing area bounds.

By combining the previous two issues, it was possible to get the x coordinate of a line sufficiently small (smaller than -0.5), so that, when a fractional representation was rounded to an integer here, Skia attempted to draw at coordinates with x = -1, which is clearly outside the image bounds. This then led to an out-of-bounds write as can be seen in the original bug report. This bug could be exploited in Firefox by drawing an SVG image with coordinates as described in the previous section.

Floating point precision error when converting splines to line segments

When drawing paths, Skia is going to convert all non-linear curves (conic shapes, quadratic and cubic splines) to line segments. Perhaps unsurprisingly, these conversions suffer from precision errors.

The conversion of splines into line segments happen in several places, but the most susceptible to floating-point precision errors are hair_quad (used for drawing quadratic curves) and hair_cubic (used for drawing cubic curves). Both of these functions are called from hair_path, which we already mentioned above. Because (unsurprisingly), larger precision errors occur when dealing with cubic splines, we’ll only consider the cubic case here.

When approximating the spline, first the cubic coefficients are computed in SkCubicCoeff. The most interesting part is:

fA = P3 + three * (P1 - P2) - P0;
fB = three * (P2 - times_2(P1) + P0);
fC = three * (P1 - P0);
fD = P0;

Where P1, P2 and P3 are input points and fA, fB, fC and fD are output coefficients. The line segment points are then computed in hair_cubic using the following code

const Sk2s dt(SK_Scalar1 / lines);
Sk2s t(0);

...

Sk2s A = coeff.fA;
Sk2s B = coeff.fB;
Sk2s C = coeff.fC;
Sk2s D = coeff.fD;
for (int i = 1; i < lines; ++i) {
   t = t + dt;
   Sk2s p = ((A * t + B) * t + C) * t + D;
   p.store(&tmp[i]);
}

Where p is the output point and lines is the number of line segments we are using to approximate the curve. Depending on the length of the spline, a cubic spline can be approximated with up to 512 lines.

It is obvious that the arithmetic here is not going to be precise. As identical computations happen for x and y coordinates, let’s just consider the x coordinate in the rest of the post.

Let’s assume the width of the drawing area is 1000 pixels. Because hair_path is used for drawing path with antialiasing turned on, it needs to make sure that all points of the path are between 1 and 999, which is done in the initial, path-level clip check. Let’s consider the following coordinates that all pass this check:

p0 = 1.501923
p1 = 998.468811
p2 = 998.998779
p3 = 999.000000

For these points, the coefficients are as follows

a = 995.908203
b = -2989.310547
c = 2990.900879
d = 1.501923

If you do the same computation in larger precision, you’re going to notice that the numbers here aren’t quite correct. Now let’s see what happens if we approximate the spline with 512 line segments. This results in 513 x coordinates:

0: 1.501923
1: 7.332130
2: 13.139574
3: 18.924301
4: 24.686356
5: 30.425781
...
500: 998.986389
501: 998.989563
502: 998.992126
503: 998.994141
504: 998.995972
505: 998.997314
506: 998.998291
507: 998.999084
508: 998.999695
509: 998.999878
510: 999.000000
511: 999.000244
512: 999.000000

We can see that the x coordinate keeps growing and at point 511 clearly goes outside of the “safe” area and grows larger than 999.

As it happens, this isn’t sufficient to trigger an out-of-bounds write, because, due to how drawing antialiased lines works in Skia, we need to go at least 1/64 of a pixel outside of the clip area for it to become a security issue. However, an interesting thing about the precision errors in this case is that the larger the drawing area, the larger the error that can happen.

So let’s instead consider a drawing area of 32767 pixels (maximum canvas size in Chrome). The initial clipping check then checks that all path points are in the interval [1, 32766]. Now let’s consider the following points:

p0 = 1.7490234375
p1 = 32765.9902343750
p2 = 32766.000000
p3 = 32766.000000

The corresponding coefficients

a = 32764.222656
b = -98292.687500
c = 98292.726562
d = 1.749023

And the corresponding line approximation

0: 1.74902343
1: 193.352295
2: 384.207123
3: 574.314941
4: 763.677246
5: 952.295532
505: 32765.925781
506: 32765.957031
507: 32765.976562
508: 32765.992188
509: 32766.003906
510: 32766.003906
511: 32766.015625
512: 32766.000000

You can see that we went out-of-bounds significantly more at index 511.

Fortunately for Skia and unfortunately for aspiring attackers, this bug can’t be used to trigger memory corruption, at least not in the up-to-date version of skia. The reason is SkDrawTiler. Whenever Skia draws using SkBitmapDevice (as opposed to using a GPU device) and the drawing area is larger than 8191 pixels in any dimension, instead of drawing the whole image at once, Skia is going to split it into tiles of size (at most) 8191x8191 pixels. This change was made in March, not for security reasons, but to be able to support larger drawing surfaces. However, it still effectively prevented us from exploiting this issue and will also prevent exploiting other cases where a surface larger than 8191 is required to reach the precision error of a sufficient magnitude.

Still, this bug was exploitable before March and we think it nicely demonstrates the concept of precision errors.

Integer precision error when converting splines to line segments

There is another place where splines are approximated as line segments when drawing (in this case: filling) paths that was also affected by a precision error, in this case an exploitable one. Interestingly, here the precision error wasn’t in floating-point but rather in fixed-point arithmetic.

The error happens in SkQuadraticEdge::setQuadraticWithoutUpdate and SkCubicEdge::setCubicWithoutUpdate. For simplicity, we are again going to concentrate just on the cubic spline version and, again, only on the x coordinate.

In SkCubicEdge::setCubicWithoutUpdate, the curve coordinates are first converted to SkFDot6 type (integer with 6 bits used for fraction). After that, parameters corresponding to the first, second and third derivative of the curve at the initial point are going to be computed:

SkFixed B = SkFDot6UpShift(3 * (x1 - x0), upShift);
SkFixed C = SkFDot6UpShift(3 * (x0 - x1 - x1 + x2), upShift);
SkFixed D = SkFDot6UpShift(x3 + 3 * (x1 - x2) - x0, upShift);

fCx     = SkFDot6ToFixed(x0);
fCDx    = B + (C >> shift) + (D >> 2*shift);    // biased by shift
fCDDx   = 2*C + (3*D >> (shift - 1));           // biased by 2*shift
fCDDDx  = 3*D >> (shift - 1);                   // biased by 2*shift

Where x0, x1, x2 and x3 are x coordinates of the 4 points that define the cubic spline and shift and upShift depend on the length of the curve (this corresponds to the number of linear segments the curve is going to be approximated in). For simplicity, we can assume shift = upShift = 6 (maximum possible values).

Now let’s see what happens for some very simple input values:

x0 = -30
x1 = -31
x2 = -31
x3 = -31

Note that x0, x1, x2 and x3 are of the type SkFDot6 so value -30 corresponds to -0.46875 and -31 to -0.484375. These are close to -0.5 but not quite and are thus perfectly safe when rounded. Now let’s examine the values of the computed parameters:

B = -192
C = 192
D = -64

fCx = -30720
fCDx = -190
fCDDx = 378
fCDDDx = -6

Do you see where the issue is? Hint: it’s in the formula for fCDx.

When computing fCDx (first derivation of a curve), the value of D needs is right-shifted by 12. However, D is too small to do that precisely, and since D is negative, the right shift

D >> 2*shift

Is going to result in -1, which is larger in magnitude than the intended result. (Since D is of type SkFixed its actual value is -0.0009765625 and the shift, when interpreted as division by 4096, would result in -2.384185e-07). Because of this, the whole fCDx ends up as a larger negative value than it should (-190 vs. -189.015).

Afterwards, the value of fCDx gets used when calculating the x value of line segments. This happens in SkCubicEdge::updateCubic on this line:

newx    = oldx + (fCDx >> dshift);

The x values, when approximating the spline with 64 line segments (maximum for this algorithm), are going to be (expressed as index, integer SkFixed value and the corresponding floating point value):

index raw      interpretation
0:    -30720   -0.46875
1:    -30768   -0.469482
2:    -30815   -0.470200
3:    -30860   -0.470886
4:    -30904   -0.471558
5:    -30947   -0.472214
...
31:   -31683   -0.483444
32:   -31700   -0.483704
33:   -31716   -0.483948
34:   -31732   -0.484192
35:   -31747   -0.484421
36:   -31762   -0.484650
37:   -31776   -0.484863
38:   -31790   -0.485077
...
60:   -32005   -0.488358
61:   -32013   -0.488480
62:   -32021   -0.488602
63:   -32029   -0.488724
64:   -32037   -0.488846

You can see that for the 35th point, the x value (-0.484421) ends up being smaller than the smallest input point (-0.484375) and the trend continues for the later points. This value would still get rounded to 0 though, but there is another problem.

The x values computed in SkCubicEdge::updateCubic are passed to SkEdge::updateLine, where they are converted from SkFixed type to SkFDot6 on the following lines:

x0 >>= 10;
x1 >>= 10;

Another right shift! And when, for example, SkFixed value -31747 gets shifted we end up with SkFDot6 value of -32 which represents -0.5.

At this point we can use the same trick described above in the “Precision error when multiplying fractions” section to go smaller than -0.5 and break out of the image bounds. In other words, we can make Skia draw to x = -1 when drawing a path.

But, what can we do with it?

In general, given that Skia allocates image pixels as a single allocation that is organized row by row (as most other software would allocate bitmaps), there are several cases of what can happen with precision issues. If we assume an width x height image and that we are only able to go one pixel out of bounds:

  1. Drawing to y = -1 or y = height immediately leads to heap out-of-bounds write
  2. Drawing to x = -1 with y = 0 immediately leads to a heap underflow of 1 pixel
  3. Drawing to x = width with y = height - 1 immediately leads to heap overflow of 1 pixel
  4. Drawing to x = -1 with y > 0 leads to a pixel “spilling” to the previous image row
  5. Drawing to x = height with y < height-1 leads to a pixel “spilling” to the next image row

What we have here is scenario d) - unfortunately we can’t draw to x = 1 with y = 0 because the precision error needs to accumulate over the growing values of y.

Let’s take a look at the following example SVG image:

<svg width="100" height="100" xmlns="http://www.w3.org/2000/svg">
<style>
body {
margin-top: 0px;
margin-right: 0px;
margin-bottom: 0px;
margin-left: 0px
}
</style>
<path d="M -0.46875 -0.484375 C -0.484375 -0.484375, -0.484375 -0.484375, -0.484375 100 L 1 100 L 1 -0.484375" fill="red" shape-rendering="crispEdges" />
</svg>

If we render this in an unpatched version of Firefox what we see is shown in the following image. Notice how the SVG only contains coordinates on the left side of the screen, but some of the red pixels get drawn on the right. This is because, due to the way images are allocated, drawing to x = -1 and y = row is equal to drawing to x = width - 1 and y = row - 1.


Opening an SVG image that triggers a Skia precision issue in Firefox. If you look closely you’ll notice some red pixels on the right side of the image. How did those get there? :)

Note that we used Mozilla Firefox and not Google Chrome because, due to SVG drawing internals (specifically: Skia seems to draw the entire image at once, while Chrome uses additional tiling) it is easier to demonstrate the issue in Firefox. However, both Chrome and Firefox were equally affected by this issue.

But, other than drawing a funny image, is there real security impact to this issue? Here, SkARGB32_Shader_Blitter comes to the rescue (SkARGB32_Shader_Blitter is used whenever shader effects are applied to a color in Skia). What is specific about SkARGB32_Shader_Blitter is that it allocates a temporary buffer of the same size as a single image row. When SkARGB32_Shader_Blitter::blitH is used to draw an entire image row, if we can make it draw from x = -1 to x = width - 1 (alternately from x = 0 to x = width), it will need to write width + 1 pixels into a buffer that can only hold width pixels, leading to a buffer overflow as can be seen in the ASan log in the bug report.

Note how the PoCs for Chrome and Firefox contain SVG images with a linearGradient element - the linear gradient is used specifically to select SkARGB32_Shader_Blitter instead of drawing pixels to the image directly, which would only result in pixels spilling to the previous row.

Another specific of this issue is that it can only be reached when drawing (more specifically: filling) paths with antialiasing turned off. As it is not currently possible to draw paths to a HTML canvas elements with antialiasing off (there is an imageSmoothingEnabled property but it only applies to drawing images, not paths), an SVG image with shape-rendering="crispEdges" must be used to trigger the issue.

All precision issues we reported in Skia were fixed by increasing kConservativeRoundBias. While the current bias value is large enough to cover the maximum precision errors we know about, we should not dismiss the possibility of other places where precision issues can occur.

Conclusion

While precision issues, such as described in this blog post, won’t be present in most software products, where they are present they can have quite serious consequences. To prevent them from occurring:

  • Don’t use floating-point arithmetic in cases where the result is security-sensitive. If you absolutely have to, then you need to make sure that the maximum possible precision error cannot be larger than some safety margin. Potentially, interval arithmetic could be used to determine the maximum precision error in some cases. Alternately, perform security checks on the result rather than input.

  • With integer arithmetic, be wary of any operations that can reduce the precision of the result, such as divisions and right shifts.

When it comes to finding such issues, unfortunately, there doesn’t seem to be a great way to do it. When we started looking at Skia, initially we wanted to try using symbolic execution on the drawing algorithms to find input values that would lead to drawing out-of-bounds, as, on the surface, it seemed this is a problem symbolic execution would be well suited for. However, in practice, there were too many issues: most tools don’t support floating point symbolic variables and, even when running against just the integer parts of the simplest line drawing algorithm, we were unsuccessful in completing the run in a reasonable time (we were using KLEE with STP and Z3 backends).

In the end, what we ended up doing was a combination of the more old-school methods: manual source review, fuzzing (especially with values close to image boundaries) and, in some cases, when we already identified potentially problematic areas of code, even bruteforcing the range of all possible values.

Do you know of other instances where precision errors resulted in security issues? Let us know about them in the comments.