Each year, Trail of Bits runs a month-long winter internship “winternship” program. This year we were happy to host 4 winterns who contributed to 3 projects. This is the first in a series of blog posts covering the 2019 Wintern class.
Our first report is from Vaibhav Sharma (@vbsharma), a PhD student at the University of Minnesota. Vaibhav’s research focuses on improving symbolic executors and he took a crack at introducing a new optimization to Manticore:
Symbolic Path Merging in Manticore
My project was about investigating the use of path-merging techniques in Manticore, a symbolic execution engine that supports symbolic exploration of binaries compiled for X86, X64, and Ethereum platforms. A significant barrier for symbolic exploration of many practical programs is path explosion. As a symbolic executor explores a program, it encounters branch instructions with two feasible sides. The symbolic executor needs to explore both sides of the branch instruction. Manticore explores such branch instructions by forking the path that reached the branch instruction into two paths each of which explores a feasible side. A linear increase in the number of branch instructions with both sides feasible causes an exponential increase in the number of paths Manticore needs to explore through the program. If we hit enough of these branch conditions, Manticore may never finish exploring all the states.
Path merging reduces the number of paths to be explored. The central idea is to merge paths at the same program location that are similar. Manticore uses the notion of a “state” object to capture the processor, memory, and file system information into a single data structure at every point of symbolic exploration through a program. Hence, path merging can be specialized to “state merging” in Manticore where merging similar states that are at the same program location leads to an exponential reduction in the number of paths to explore. With a simple program, I observed Manticore could cut its number of explored execution paths by 33% if it merged similar states at the same program location.
State merging can be implemented statically or dynamically. Static state merging explores the control-flow graph of the subject program in topological order and merges states at the same program location when possible. Veritesting is a path-merging technique that is similar to static state merging, it requires paths to be at same program location to merge them. Dynamic state merging does not require two states to be at the same program location for them to be considered for merging. Given two states a1, a2 at different program locations l1, l2 respectively, if a transitive successor a1′ of a1 has a high and beneficial similarity to a2, dynamic state merging fast-forwards a1 to a1′ and merges it with a2. The fast-forwarding involves overriding the symbolic executor’s search heuristic to reach l2. Dynamic state merging uses the intuition that if two states are similar, their successors within a few steps are also likely to be similar.
While it is possible to implement either state merging technique in Manticore, I chose dynamic state merging as described by Kuznetsov et al. as a better fit for Manticore’s use of state-based instead of path-based symbolic executors. Also, static state merging is less suited for symbolic exploration guided towards a goal and more suited for exhaustive exploration of a subject program. Since static state merging can only merge states at the same program location, when directed towards a goal it tends to cover less code than dynamic state merging in the same time budget. This was also a conclusion of Kuznetsov et al (see Figure 8 from their paper below). Since we often tend to use symbolic execution to reach an exploration goal, static state merging is less suited to our needs.
Engineering Challenges
Both static and dynamic state merging require the use of an external static analysis tool like Binary Ninja to find the topological ordering of program locations. Given the short duration of my winternship, I chose to implement opportunistic state merging which only merges states that happen to be at the same program location. While this approach does not give the full benefit of dynamic state merging, it is easier to implement because it does not rely on integration with an external static analysis tool to obtain topological ordering. This approach is also easily extensible to dynamic state merging since it uses many of the same primitive operations like state comparison and state merging.
Implementation
I implemented opportunistic state merging for Manticore. The implementation checks if two states at the same program location have semantically equivalent input, output socket buffers, memory, and system call traces in an “isMergeable” predicate. If this predicate is satisfied, the implementation merges CPU register values that are semantically inequivalent.
Results
I used a simple example where I could see two states saved in Manticore’s queue that are at the same program location making them good candidates to be merged. I present the partial CFG of this example program below.
The two basic blocks highlighted in red cause control flow to merge at the basic block highlighted in green. The first highlighted red block causes control flow to jump directly to the green block. The second highlighted red block moves a constant (0x4a12dd
) to the edi
register and then jumps to the green block. To explore this example, Manticore creates two states, one which explores the first red block and jumps to the green block, and another state that explores the second red block and jumps to the green block. Since the only difference between these two states which are at the same program location (the green block) is the value present in their edi
register, Manticore can merge these two states into a single state with the value for edi
set to be an if-then-else expression. This if-then-else expression will use the condition that chooses which side of the branch (jbe 0x40060d
) gets taken. If the condition is satisfied, the if-then-else expression will evaluate to the value that is present in edi
in the first red block. If the condition is not satisfied, it will evaluate to 0x4a12dd
(the constant set in the second red block). Thus, Manticore merges two control flow paths into one path opportunistically which eventually leads to Manticore cutting its number of execution paths by 33% on the binary compiled with the -Os
optimization option with gcc and by 20% if the binary is compiled with the -O3
optimization option.
Directions for future improvement:
- This implementation can be extended to get the full benefits of dynamic state merging by integrating Manticore with a tool that can provide a topological ordering of program locations.
- State merging always creates new symbolic data since it converts all concrete writes in a region of code to symbolic writes.
- Check if new symbolic data introduced by state merging causes more branching later during exploration. We need to implement re-interpret heuristics such as query count estimation by Kuznetsov et al. so that we may use dynamic state merging only when it is most useful.
Path merging is a technique that needs to be re-interpreted to fit the needs of a symbolic executor. This winternship allowed me to understand the inner workings of Manticore, a state-based symbolic executor, and re-interpret path merging to better fit the use-case of binary symbolic execution with Manticore. My implementation of opportunistic state merging merges similar states if they are at the same program location. The implementation can be used in a Python script by registering a plugin called Merger with Manticore. basic_statemerging.py under examples/script is an example of such use of state merging.