Symbolic Execution Debugger (SED) - Tutorial

The Symbolic Execution Debugger (SED) is a language independent extension of the Eclipse debug platform for symbolic execution. By default SED comes with a symbolic execution engine based on KeY supporting sequential Java annotated with JML specifications. The use of the SED platform using KeY as symbolic execution engine is explained in detail in this tutorial.

The tutorial is structured as follows:

  1. Installation
  2. Example Project
  3. Symbolic Execution and Basic Tool Usage
  4. Locate the Origin of a Bug
  5. Review Code and Specifications Usage
  6. KeY as Symbolic Execution Engine

Steps to perform are illustrated via screenshots (may differ from the latest release) and are surrounded by paragraphs providing details and background knowledge. For new users we recommend to follow all sections of this tutorial in the given order.

1 Installation

This section describes how to add the Symbolic Execution Debugger (SED) to an existing Eclipse installation. Eclipse Indigo (3.7) and newer are supported. The screenshots are created using Eclipse Mars (4.5).

2 Example Project

This sections explains how to create the "SED Examples", an example Java project to try out SED (with KeY as symbolic execution engine). The following sections of this tutorial work with this project. Its content is organized as follows:

  • src - The source code used to explain the functionality of the SED
  • lib_specs - Stubs for Java API classes and methods referenced by the source code (required for symbolic execution with KeY). This folder is added to KeY's class path in the project properties page "KeY". For more details see section KeY as symbolic execution engine.
  • Readme.txt - A file explaining the SED Examples

3 Symbolic Execution and Basic Tool Usage

We use class Number as a first example to explain symbolic execution and the basic functionality of SED (using KeY as symbolic execution engine). It has an instance field content and method equals, which compares the current Number instance with the given one.

Symbolic execution means to execute a program with symbolic values in lieu of concrete values performed by a symbolic execution engine. Consequently, it is usually not possible to decide which execution path to take. Hence, execution splits into multiple branches to cover all feasible execution paths. The result is a so called symbolic execution tree.

The SED allows to start symbolic execution directly at any method or even to execute only selected statements.

View "Symbolic Debug" provides all views relevant for symbolic execution:

  • Debug - Allows to switch between debug sessions and to control symbolic execution
  • Symbolic Execution Tree - Visualizes the symbolic execution trees of selected debug sessions
  • Symbolic Execution Tree (Thumbnail) - Miniature diagram for navigation purpose
  • Variables - Shows the symbolic state of the selected node in the symbolic execution tree
  • Breakpoints - Allows to manage breakpoints
  • Symbolic Execution Settings - Allows to customize the symbolic execution engine
  • Properties - Provides additional details about the selected node
The selection of view "Debug" is always in sync with the one of view "Symbolic Execution Tree". The corresponding part in the source code is highlighted as well.

Symbolic execution can be done stepwise or resumed until a breakpoint is hit. If no breakpoint is available or hit, symbolic execution continues until a maximal number of steps has been performed on each path. The limit is 1000 by default and can be changed on the preference page "Run/Debug, Symbolic Execution Debugger (SED), KeY".

Different node kinds are used to highlight the meaning of symbolic execution tree nodes. The root of a symbolic execution tree is always a start node representing the program fragment to execute.

The new selected node is a method call node indicating that method equals will be inline executed next.

The selected branch statement node represents the program state before the first statement of method equals is executed. The SED separates between branch, loop and normal statements not supposed to branch.

Inspecting the state of the branch statement node in view "Variables" shows that at this point nothing is known about parameter n. Thus it can be null or a reference to an existing object. If multiple values are possible the SED shows the condition under which each value would be taken. The Java this reference is here named self. The instance field content of self and n (if not null) have symbolic values representing any number. The exc variable is used by KeY to separate between normal and exceptional termination.

It depends on the actual state how the if guard is evaluated. The symbolic state does not restrict any of the possible outcomes. Consequently, symbolic execution splits into several branches to capture all feasible execution paths. Here, a first case distinction has to be made whether parameter n is null or not. Branch condition nodes show the condition under which each path is taken. The right most branch where parameter n is null terminates with an uncaught NullPointerException as shown by the exceptional termination node. The left branch splits again to distinguish the cases where the content field of both instance has the same value or different values.

Symbolic execution is always continued on all not terminated child branches of the selected node. Here, two method return nodes that show the return value of method equals have been added. If multiple return values are possible, all of them are shown together with the condition when each is returned.

The left most branch terminates normally without a thrown exception.

Resume continues execution until a breakpoint is hit on any branch or until the maximal number of performed steps per path is reached. Here, the last branch terminates also normally without a thrown exception.

SED supports the set of breakpoints provided by JDT (except class breakpoints) and extends them by the "KeY Watchpoint", which is a line independent state definition. In classical debugging using concrete execution, the user is responsible to start the program under inspection in an initial state, which leads to the state hitting a given breakpoint. Using symbolic execution all execution paths are explored simultaneously, in particular, execution paths which lead to a symbolic state hitting the breakpoint. No input values need to be provided!

Inspecting the symbolic state in view "Variables" shows that instance field content of self and n has the same symbolic value. This knowledge was discovered during symbolic execution by the branch conditions. The path condition limiting the initial state in order to take the path is shown in view "Properties". It is a conjunction over all branch conditions on that path.

Here, the path condition is satisfied, if self and n are different objects with the same content. But it is also true, if self and n point to the same object. Unintended aliasing is a source of bugs. SED helps to find these source of bugs by determining and visualizing all possible memory layouts w.r.t. the path condition.

The SED illustrates a memory layout as symbolic object diagram. Similar to a UML object diagram, a symbolic object diagram shows dependencies between objects, the values of object fields and the local variables of the current state.

Here, local variables self and n point to two different objects. The value of instance field content is the same for both objects.

In the shown alternative memory layout the variables self and n are aliases, i.e., they point to the same object.

Selecting initial state shows how the memory layout must have looked before the execution in order to end up in the current state. Here, the memory layouts for the initial and the current state are identical, because the executed code is side effect free, i.e. no location has been changed.

4 Locate the Origin of a Bug

The second example shows how SED can be used to find the origin of a bug. Class Mergesort (modified version of the algorithm presented at javabeginners.de) provides a buggy implementation of the Mergesort algorithm. For our scenario, assume that we experienced a StackOverflowError, while running a large application. It was possible to reproduce the error with a few steps implemented by class MergesortExample.

The stack trace shows that method sortRange calls itself until the stack is full. Leveraging that using SED a debug session can start close to the suspected location of the bug, namely, at method sort.

Symbolic execution starts in an arbitrary state. The discovered NullPointerException is spurious as method parameter as intArr should not be null. Such knowledge can be provided as precondition in the debug configuration to restrict the initial state.

The explicit rendering of different control flow branches in the SE tree constitutes a major advantage over traditional debuggers. Unexpected or missing expected branches are good candidates for possible sources of bugs.

Here, we would actually expect that the branch statement node splits symbolic execution and are surprised that this is not the case. This is suspicious and deserves closer attention. Inspecting the if guard shows that the comparison should have been l < r and the source of the bug is found. Without the precondition the bug can be observed as well, but a little later.

5 Review Code and Specifications Usage

The third example shows how specifications can be considered during symbolic execution. By default, method calls are treated by inlining possible method implementations and loops are unrolled. This causes in general symbolic execution trees of infinite depth. The SED continues symbolic execution until the maximal number of applied steps per path is reached.

Symbolic execution trees of finite depth can be guaranteed by applying method contracts and loop invariants instead of inlining and unrolling. The use of method contracts allows also to deal with methods for which no implementation is available.

Method indexOf of class ArrayUtil returns the first index in the given array accepted by the filter. For interface Filter is no implementation available, but its method accept is specified with JML. Its contract allows to call the method in any state that satisfies the precondition (here: true), and implicitly, in which the invariant (a property limiting states of an object) of the Filter implementation is satisfied. The contract guarantees then that the method terminates without throwing an exception or changing any location.

The method contract of indexOf requires that the method is called in a state where the invariant of filter is fulfilled and ensures that no exception will be thrown. Method indexOf invokes the accept method on the instance given as filter, using contracts KeY needs to ensure that at invocation time the method's precondition and the invariant of the filter object are satisfied. To ensure this, the invariant for filter must already be required to hold when invoking method indexOf. Consequently, \invariant_for(filter) must be added as part of the precondition of method indexOf. Note, in general this has to be done for all object parameters and fields on which a method is called.

For the while loop a loop invariant is provided, which ensures termination and limits the range of local variable i. A loop invariant is a property which has to be preserved by each loop iteration. Thus, if the loop invariant is satisfied before the loop is entered, it is also satisfied after an arbitrary number of loop iterations when execution is continued after the loop. Notice that the provided loop invariant is not strong enough for verification, but sufficient for debugging.

Debugging when providing a method contract differs from debugging a method or some statements without a contract for two reasons: First, the precondition of the method contract is used to limit the initial state. This includes also that JML defaults will be used. In particular, fields are considered as not null as long as not explicitly specified as nullable. Second, the postcondition is proven for the termination nodes. If not satisfied, the termination node indicates this by a red crossed icon.

To apply method contracts and loop invariants during symbolic execution, the symbolic execution engine has to be configured before performing a step or resuming execution.

That a loop is treated by applying a loop invariant instead of unrolling it is shown by a loop invariant node. If the loop invariant is initially not valid the node will indicate it. Applying a loop invariant splits symbolic execution into two branches.

The "Body Preserves Invariant" branch represents an arbitrary loop iteration and stops when the method body is completely executed (shown by a Loop Body Termination node). Here, we can see that the left most branch does not preserve the loop invariant. The reason is that i is not increased which is required by the decreasing clause of the loop invariant. The bug can be solved by moving the statement i++ after the if statement. Note that in case of a thrown exception or a jump outside the loop (break, continue or return) the loop invariant does not have to hold and execution is continued outside the loop.

The "Use Case" branch continues execution after the loop is exited. Even if the loop invariant is not preserved, we can see in the method return nodes that the value of i is returned instead of the found index.

6 KeY as Symbolic Execution Engine

KeY is a semi-automatic verification tool that proves the correctness of sequential Java programs (without garbage collection, dynamic class loading and floats) annotated with JML specifications. A symbolic execution engine based on KeY is developed as part of the SED supporting the same set of Java and JML features than KeY.

KeY and its symbolic execution engine operates on source code and not on compiled byte code. This requires to provide the source code of all used types or at least stubs of called methods. Method stubs can be specified with JML. KeY itself provides such stubs and specifications only for a very small part of the Java API.

The SED Examples make use of Java API classes like System or PrintStream, which are not contained in the stubs provided by KeY. Thus it is required to provide the necessary method stubs and to place them in the folder "lib_specs". A folder containing such stubs should not be part of the class path used by Eclipse's JDT. Instead it has to be added to KeY's class path defined in the project properties on page "KeY". For the SED examples this is already done automatically by the wizard (see Example Project).

KeY separates between two different kinds of class paths:

  • Boot class path - A single path providing source code and specifications of classes. Full method implementations will be considered. If not specified, the minimal fraction of KeY's specifications for Java API classes will be used.
  • Class path - Multiple paths providing method stubs and specifications. Only method specifications will be considered and not method implementations.

The following steps explain how to check and modify KeY's class path:

KeY provides some general settings, so called "Taclet Options", which influence symbolic execution and proofs performed by KeY in general. Of special interest for symbolic execution are:

  • JavaCard - Enable or disable support of JavaCard
  • assertions - Treatment of Java assert statements
  • bigint - Support for JML datatype \bigint
  • intRules - Used integer semantic (mathematical (default) or overflow sensitive)
  • runtimeExceptions - How to deal with runtime exceptions
Taclet options have to be changed before a debug session is started. However, the default settings should be sufficient in most cases and most likely their is no need to change them.

The following steps explain how to change the "Taclet Options":

Webmaster
02-Sep-2024