Quantcast
Channel: Symbolic Execution – Trail of Bits Blog
Viewing all articles
Browse latest Browse all 10

Fuzzing Unit Tests with DeepState and Eclipser

$
0
0

If unit tests are important to you, there’s now another reason to use DeepState, our Google-Test-like property-based testing tool for C and C++. It’s called Eclipser, a powerful new fuzzer very recently presented in an ICSE 2019 paper. We are proud to announce that Eclipser is now fully integrated into DeepState.

Eclipser provides many of the benefits of symbolic execution in fuzzing, without the high computational and memory overhead usually associated with symbolic execution. It combines “the best of both white-box and grey-box fuzzing” using only lightweight instrumentation and, most critically, never calling an expensive SMT or SAT solver. Eclipser is the first in what we hope (perhaps with your help) to make a series of push-button front-ends to promising tools that require more work to apply than AFL or libFuzzer. Eclipser allows DeepState to quickly detect more hard-to-reach bugs.

What Makes Eclipser Special?

Traditional symbolic execution, supported by DeepState through tools such as Manticore and angr, keeps track of path constraints: conditions on a program’s input such that the program will take a particular path given an input satisfying the constraint. Unfortunately, solving such conditions is difficult and expensive, especially since many constraints are infeasible: they cannot be solved.

Many workarounds for the high cost of solving path constraints have been proposed, but most symbolic-execution based tools are still limited in scalability and prone to failure when asked to produce long paths or handle complex code. Eclipser builds on ideas developed in KLEE and MAYHEM to substitute approximate path constraints for path constraints. These conditions are (as the name suggests) less precise, but much easier to solve. Critically, they don’t require a slow solver. Eclipser still has to solve these approximate, “easy” constraints, but it can assume they are either simple and linear (in which case inexpensive techniques suffice) or at least monotonic, in which case Eclipser uses a binary search instead of a solver call. If the real constraint is neither linear nor monotonic, Eclipser will not be able to generate relevant inputs, but fuzzing may let it make progress despite this failure. In practice, symbolic execution will also often fail because of such constraints, but with a solver timeout, after wasting considerable computational effort. Eclipser will produce some input much more quickly (though not necessarily one satisfying the too-hard-to-solve conditions).

Why Should You Care?

Eclipser is interesting primarily because the authors report that it performed better in terms of code coverage on coreutils than KLEE, better in terms of bugs detected on LAVA-M benchmarks than AFLFast, LAF-intel, VUzzer, and Steelix and, most compellingly, better in terms of bugs detected on real Debian packages than AFLFast and LAF-intel. The Debian experiments produced eight new CVEs.

Given this promising performance, we decided to integrate Eclipser into DeepState, making it easy to apply the Eclipser approach to your unit testing. Out of the box, DeepState could already be used with Eclipser. The fuzzer works with any binary that takes a file as input. DeepState works with all file-based fuzzers we have tried. However, it is important to use the right arguments to DeepState with Eclipser, or else Eclipser’s QEMU-based instrumentation will not work. It also takes some manual effort to produce standalone test cases and crashing inputs for DeepState, since Eclipser stores tests in a custom format not usable by other tools. We therefore added a simple front-end to make your life (and our life) easier.

The Eclipser Paper Example

The DeepState examples directory has the code for a DeepState-ized version of the main example used in the Eclipser paper:

#include <deepstate/DeepState.hpp>
using namespace deepstate;
#include <assert.h>

int vulnfunc(int32_t intInput, char * strInput) {
   if (2 * intInput + 1 == 31337)
      if (strcmp(strInput, "Bad!") == 0)
         assert(0);
   return 0;
}

TEST(FromEclipser, CrashIt) {
   char *buf = (char*)DeepState_Malloc(9);
   buf[8] = 0;
   vulnfunc(*((int32_t*) &buf[0]), &buf[4]);
}

The easiest way to try this example out is to build the DeepState docker image (yes, DeepState now makes it easy to create a full-featured docker image):

$ git clone https://github.com/trailofbits/deepstate
$ cd deepstate
$ docker build -t deepstate . -f docker/Dockerfile
$ docker run -it deepstate bash

Building the docker image will take a while: DeepState, AFL, and libFuzzer are quick, but building Eclipser is a fairly involved process.

Once you are inside the DeepState docker image:

$ cd deepstate/build/examples
$ deepstate-eclipser ./FromEclipser --timeout 30 --output_test_dir eclipser-FromEclipser

Eclipser doesn’t need the full 30 seconds; it produces a crashing input almost immediately, saving it in eclipser-FromEclipser/crash-0. The other fuzzers we tried, AFL and libFuzzer, fail to find a crashing input even if given four hours to generate tests. They generate and execute, respectively, tens and hundreds of millions of inputs, but none that satisfy the conditions to produce a crash. Even using libFuzzer’s value profiles does not help.

Running the experiments yourself is easy:

$ mkdir foo; echo foo > foo/foo
$ afl-fuzz -i foo -o afl-FromEclipser -- ./FromEclipser_AFL --input_test_file @@ --no_fork --abort_on_fail

and

$ mkdir libFuzzer-FromEclipser
$ export LIBFUZZER_EXIT_ON_FAIL=TRUE
$ ./FromEclipser_LF libFuzzer-FromEclipser -use_value_profile=1

You’ll want to interrupt both of these runs, when you get tired of waiting.

Both angr and Manticore find this crashing input in a few seconds. The difference is that while Eclipser is able to handle this toy example as well as a binary analysis tool, the binary analysis tools fail to scale to complex problems like testing an ext3-like file system, testing Google’s leveldb, or code requiring longer tests to hit interesting behavior, like a red-black-tree implementation. Eclipser is exciting because it outperforms libFuzzer on both the file system and the red-black-tree, but can still solve “you need symbolic execution” problems like FromEclipser.cpp.

Behind the Scenes: Adding Eclipser Support to DeepState

As noted above, in principle there’s literally “nothing to” adding support for Eclipser, or most file-based fuzzers. DeepState makes it easy for a fuzzer that uses files as a way to communicate inputs to a program to generate values for parameterized unit tests. However, figuring out the right DeepState arguments to use with a given fuzzer can be difficult. At first we thought Eclipser wasn’t working because it doesn’t, if DeepState forks to run tests. Once we ran DeepState with no_fork, everything went smoothly. Part of our goal in producing front-ends like deepstate-eclipser is to make sure you never have to deal with such mysterious failures. The full code for setting up Eclipser runs, parsing command line options (translating DeepState tool argument conventions into Eclipser’s arguments), and getting Eclipser to produce standalone test files from the results takes only 57 lines of code. We’d love to see users submit more simple “front-ends” to other promising fuzzers that require a little extra setup to use with DeepState!

So, Is This the Best Fuzzer?

Will some advance in test generation technology, like Eclipser, obsolete DeepState’s goal of supporting many different back-ends? The answer is “not likely.” While Eclipser is exciting, our preliminary tests indicate that it performs slightly worse than everyone’s favorite workhorse fuzzer, AFL, on both the file system and red-black-tree. In fact, even with the small set of testing problems we’ve explored in some depth using DeepState, we see instances where Eclipser performs best, instances where libFuzzer performs best, and instances where AFL performs best. Some bugs in the red black tree required a specialized symbolic execution test harness to find (and Eclipser doesn’t help, we found out). Moreover, even when one fuzzer performs best overall for an example, it may not be best at finding some particular bug for that example.

The research literature and practical wisdom of fuzzer use repeatedly show that, even when a fuzzer is good enough to “beat” other fuzzers (and thus get a paper published at ICSE), it will always have instances where it performs worse than an “old,” “outdated” fuzzer. In fuzzing, diversity is not just helpful, it’s essential, if you really want the best chance to find every last bug. No fuzzer will be best for all programs under test, or for all bugs in a given real-world program.

The authors of the Eclipser paper recognize this, and note that their technique is complimentary to that used in the Angora fuzzer. Angora shares some of Eclipser’s goals, but relies on metaheuristics about branch distances, rather than approximate path conditions, and uses fine-grained taint analysis to penetrate some branches Eclipser cannot handle. Angora also requires source code. One big advantage of Eclipser is that unlike AFL (in non-QEMU mode) or libFuzzer, it doesn’t require you to rebuild any libraries you want to test with DeepState with additional instrumentation. At the time the Eclipser paper was written, Angora was not available to compare with, but it was recently released and is another good candidate for full integration with DeepState.

Eclipser is a great horse to add to your fuzzer stable, but it won’t win every race. As new and exciting fuzzers emerge, DeepState’s ability to support many fuzzers will only become more important. Using a diverse array of fuzzers is easy if it’s a matter of changing a variable and doing FUZZER=FOO make; deepstate-foo ./myprogram, and practically impossible if it requires rewriting your tests for every tool. In the near future, we plan to make life even easier, and support an automated ensemble mode where DeepState makes use of multiple fuzzers to test your code even more aggressively, without any effort on your part other than deciding how many cores you want to use.


Viewing all articles
Browse latest Browse all 10

Trending Articles