CS计算机代考程序代写 python x86 c/c++ compiler c++ Fortran Haskell arm algorithm interpreter Building Program Reasoning Tools using LLVM and Z3

Building Program Reasoning Tools using LLVM and Z3
Elizabeth Dinella, Pardis Pashakhanloo,
Anthony Canino, Mayur Naik

POPL’20 Tutorial – January 20, 2020
‹#›

Tutorial Overview
Learn to build program reasoning tools using LLVM and Z3 via three examples:
A Static Dataflow Analyzer
A Symbolic Execution Engine
A Static Assertion Verifier
These examples capture the core functionality of corresponding full-fledged tools Clang Static Analyzer, KLEE, and Seahorn.
Primary focus: how to encode diverse program reasoning tasks via constraints extracted using LLVM and solved using Z3; we will NOT delve into Z3 internals!
‹#›

Tooling for Program Reasoning
LLVM Intermediate Representation
Encoding Program Facts using Relations
Using Z3 via C++ API Bindings
Solving Fixpoint Constraints using Z3
Principles of Program Reasoning
Static Dataflow Analysis
Specifying Fixpoint Constraints in Datalog
Dynamic Symbolic Execution
Verification Condition Generation
Constrained Horn Clauses

Learning Objectives
‹#›

Getting Started
Follow step-by-step instruction guide from the tutorial website:
docker pull petablox/popl2020-tutorial
docker run -it –name popl2020 petablox/popl2020-tutorial
git pull origin master
wget http://rightingcode.org/tutorials/popl20/popl2020-soln.zip
Prepare to use your favorite editor!
Pre-installed: vim, emacs
VSCode: install the docker plugin from https://code.visualstudio.com/docs/azure/docker
Github
git clone https://github.com/petablox/popl2020-tutorial

‹#›

LLVM Architecture
Clang C/C++/ObjC Frontend
llvm-gcc Frontend
GHC Frontend
LLVM Optimizer
LLVM X86 Backend
PowerPC Backend
LLVM ARM Backend
C
Fortran
Haskell
X86
PowerPC
ARM
LLVM IR
LLVM IR
‹#›

LLVM Architecture
Clang C/C++/ObjC Frontend
llvm-gcc Frontend
GHC Frontend
LLVM Optimizer
LLVM X86 Backend
PowerPC Backend
LLVM ARM Backend
C
Fortran
Haskell
X86
PowerPC
ARM
LLVM IR
LLVM IR
Tools built as
Optimization passes

‹#›

LLVM IR
int x = 10
int y;
if (x > 5)
y = 0;
else
y = 1;
return y;
‹#›

LLVM IR
int x = 10
int y;
if (x > 5)
y = 0;
else
y = 1;
return y;
%x = alloca
store 10 %x
%y = alloca
%0 = load %x
%cmp = icmp sgt %0 5
br %cmp %then %else
then:
store 0 %y
br %end
else:
store 1 %y
br %end
end:
%1 = load %y
ret %1

‹#›

store 10 %x
Value Operand
Pointer Operand
‹#›

‹#›

%0 = load %x
Instruction Result
Value Operand
‹#›

%cmp = icmp sgt %0 0
Predicate
Operand 1
Operand 2
‹#›

br %cmp %then %else
Condition
True Branch
False Branch
‹#›

‹#›

LLVM Pass Structure
bool runOnModule(Module &M) {
for (auto FuncItr = M.begin(); FuncItr != M.end(); FuncItr++) {
for (auto InstItr = FuncItr->begin(); InstItr != FuncItr->end(); InstItr++) {
Instruction *I = &*InstItr;
if (StoreInst *SI = dyn_cast(I)) {
cout << “Storing ” << SI->getValueOperand() << “ to “ << SI->getPointerOperand() << endl; } else if (LoadInst *LI = dyn_cast(I)) {
cout << “Loading” << LI->getPointerOperand() << “ to “ << LI << endl; } } } } ‹#› Illustrate how LI itself holds the value that will get loaded Talk about dyn_cast Z3 Architecture Solvers SMT Fixedpoint NLSat SAT QSAT Tactics Preprocessing Cube & Conquer Tacticals C++ Bindings C API SMT2LIB Optimization Python Bindings ‹#› Part 1: Static Dataflow Analysis ‹#› In this section we will use static analysis (don’t run the code) to infer program properties using sets of rules or CONSTRAINTS Static Dataflow Analysis Static analysis reasoning about the flow of data in program Different kinds of data: constants, variables, expressions, ... We will use a dominant approach called constraint-based analysis, which has two parts: Constraint generation: program text → LLVM → constraints Constraint resolution: constraints → Z3 → analysis result ‹#› TRUE or FALSE? The assignment y := 1 reaches P1 TRUE The assignment y := 1 reaches P2 FALSE The assignment y := x*y reaches P1 TRUE Reaching Definition Analysis A definition d reaches a point p if there exists a path from d to p such that d is not overwritten along that path x := y y := 1 while ( x != 1 ) P1 y := x * y x := x - 1 P2 ‹#› Reaching Definition Analysis x := 1 P1 y := x P2 if ( x > 0 ) P3
x := y P4

P2: { , }

P3: { , }

before P4: { , }

after P4: { , }

IN(n) = set of facts at the entry of node n
OUT(n) = set of facts at the exit of node n
‹#›

At each point in the program there is a set of definitions that reach that program point

What definitions reach just before P2?
What definitions reach P3?
What definitions reach P4? Just after P4?

A transfer function abstracts execution: OUT(n) = f( IN(n) )
Simulate execution in the abstract domain until we reach a fixed point

Reaching Definition Analysis
Relations:
Def(X,Y): Variable X is defined at instruction Y
Use(X,Y): Variable X is used at instruction Y
Next(X,Y): Instruction Y is an immediate successor of instruction X
Gen(X,Y): Definition Y is generated by instruction X
Kill(X,Y): Definition X is killed by instruction Y

‹#›

Reaching Definition Analysis – Example

n IN(n) OUT(n)
1 — {,}
2 {,}
3
4
5
6
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} GEN(X,2)
3
4
5
6
7

Definition X is generated by node 2
‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,}
4
5
6
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} GEN(Y,3)
4
5
6
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,}
and OUT(6)
5
6
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,}
and OUT(6) {,}
and OUT(6)
5 {,}
and OUT(6)
6
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,}
and OUT(6) {,}
and OUT(6)
5 {,}
and OUT(6) KILL(Y,5)
6
7

Definition Y is killed by node 5
‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,}
and OUT(6) {,}
and OUT(6)
5 {,}
and OUT(6) GEN(Y,5)
6
7

Definition Y is generated by node 5
‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,}
and OUT(6) {,}
and OUT(6)
5 {,}
and OUT(6) {,}
and OUT(6)
6 {,} and OUT(6)
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,}
and OUT(6) {,}
and OUT(6)
5 {,}
and OUT(6) {,}
and OUT(6)
6 {,} and OUT(6) KILL(X,6) and GEN(X,6)
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,}
and OUT(6) {,}
and OUT(6)
5 {,}
and OUT(6) {,}
and OUT(6)
6 {,} and OUT(6) {,}
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,}
and OUT(6) {,}
and OUT(6)
5 {,}
and OUT(6) {,}
and OUT(6)
6 {,} and OUT(6) {,}
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,,,} {,,,}
5 {,,,} {,,}
6 {,, } {,}
7

‹#›

Reaching Definition Analysis – Example
n IN(n) OUT(n)
1 — {,}
2 {,} {,}
3 {,} {,}
4 {,,,} {,,,}
5 {,,,} {,,}
6 {,,} {,}
7 {,,,} —

‹#›

Continue until we reach a fixed point

Defining Rules in Datalog
Declarative logic programming language
Specify “If … then … “ logic

0
1
2
3
4
Input Relations:
edge(x, y)

Output Relations:
path(x,y)

Rules:
path(x, x)
path(x, z) :- path(x, y), edge(y, z)
Is there a path from node x to node y?
If there is a path from x to y, and there is an edge from y to z, then there is a path from x to z.
‹#›

Rules:
Kill(I, J) :- Definition I is killed by instruction J
Instruction I
Instruction J
x := y
…

x := 15 + z
‹#›

Rules:
Kill(Y, Z) :- Def(X, Y) & Def(X, Z)

Instruction X
Instruction Y
x := y
…

x := 15 + z
‹#›

Rules:
Kill(Y, Z) :- Def(X, Y) & Def(X, Z)
Out(X, Y) :- Definition Y may reach the program point just after instruction X
‹#›

Rules:
Kill(Y, Z) :- Def(X, Y) & Def(X, Z)
Out(X, Y) :- Gen(X, Y)
Out(X, Y) :- In(X, Y) & !Kill(X,Y)

‹#›

Rules:
Kill(Y, Z) :- Def(X, Y) & Def(X, Z)
Out(X, Y) :- Gen(X, Y)
Out(X, Y) :- In(X, Y) & !Kill(X,Y)
In(X, Y) :- Definition Y may reach the program point just before instruction X

‹#›

Rules:
Kill(Y, Z) :- Def(X, Y) & Def(X, Z)
Out(X, Y) :- Gen(X, Y)
Out(X, Y) :- In(X, Y) & !Kill(X,Y)
In(X, Y) :- Out(Z,Y) & Next(Z, X)

‹#›

Taint Analysis
Any variable that can be set or modified by a user poses a potential
security risk
We call these variables tainted

The tainted value reaches y and z which are now also tainted
x := input()
y := x
z := 4 / y
‹#›

Taint Analysis
Any variable that can be set or modified by a user poses a potential
security risk
We call these variables tainted

This poses a security risk as attackers could intentionally set x to zero to cause an exploitable divide by zero bug
x := input()
y := x
z := 4 / y
‹#›

x := tainted_input()
y := sanitizer(x)
z := 4 / y

y := not_sanitizer(x)
z := 4 / y
Taint Analysis – is there an exploitable bug?
Sanitizes the tainted input
SAFE
ALARM
ALARM
if (argc > 2):
x := 0
y := x
else:
x := tainted_input()
y := x

z := 4 / y
‹#›

Taint Analysis Rules
Relations:
Def(X,Y): Variable X is defined at instruction Y
Use(X,Y): Variable X is used at instruction Y
In(X,Y): Instruction Y may reach the program point just before instruction X
Taint(X): There exists a function call at instruction X that reads a tainted input
Sanitizer(X): There exists a function call at instruction X that sanitizes a tainted input
Div(X,Y): There exists a division operation at instruction Y whose divisor is variable X

Rules:
Edge(I, J) :- There exists an immediate data-flow from instruction I to J
x := tainted_input()
y := x
‹#›

Rules:
Edge(Y, Z) :- Def(X, Y) & Use(X, Z) & In(Z,Y)
Instruction Y
Instruction Z

x := tainted_input()
y := x
‹#›

Rules:
Edge(Y, Z) :- Def(X, Y) & Use(X, Z) & In(Z,Y)
Path(I, J) :- There exists a transitive tainted data-flow from instruction I to J
‹#›

Rules:
Edge(Y, Z) :- Def(X, Y) & Use(X, Z) & In(Z,Y)
Path(I, J) :- Edge(I,J) & Taint(X)

‹#›

Rules:
Edge(Y, Z) :- Def(X, Y) & Use(X, Z) & In(Z,Y)
Path(I, J) :- Edge(I,J) & Taint(I)
Path(I, K) :- Path(I,J) & Edge(J,K) & !Sanitizer(J)
Instruction I
Instruction J
Instruction K

x := tainted_input()
y := x
z := y + 15
‹#›

Rules:
Edge(Y, Z) :- Def(X, Y) & Use(X, Z) & In(Z,Y)
Path(I, J) :- Edge(I,J) & Taint(I)
Path(I, K) :- Path(I,J) & Edge(J,K) & !Sanitizer(J)
Alarm( I ) :- There exists a potential exploitable divide-by-zero error at instruction I
ALARM
x := input()
y := x
z := 4 / y
‹#›

Rules:
Edge(I, J) :- Def(X, I) & Use(X, J) & In(J, I)
Path(I, J) :- Edge(I,J) & Taint(I)
Path(I, K) :- Path(I,J) & Edge(J,K) & !Sanitizer(J)
Alarm( K ) :- Path(I, J) & Div(K,Y)
ALARM

x := input()
y := x
z := 4 / y
‹#›

For rule #4 Path(I,J) implies that Y is a tainted variable because in LLVM the instructions are referenced by the var names

Goal:
Implement a constraint-based static dataflow analysis to
find exploitable divide-by-zero errors.
Steps:
Write Datalog rules to define the reaching and taint analyses
Extract Datalog facts from LLVM IR Instructions
Feed constraints to Z3
‹#›

Now, we’ve talked about how to write data log rules and how to extract Datalog facts from the program, but what happens when we feed them to Z3

x := input()
y := x
z := 4 / y

Edge(“%call = call i32 (…) @tainted_input()”, “store i32 %call, i32* %x, align 4”)
Edge(“store i32 %call, i32* %x, align 4”, “%0 = load i32, i32* %x, align 4”)
Edge(“%0 = load i32, i32* %x, align 4”, “store i32 %0, i32* %y, align 4”)
Edge(“store i32 %0, i32* %y, align 4”, “%1 = load i32, i32* %y, align 4”)
Edge(“%1 = load i32, i32* %y, align 4”, “%div = sdiv i32 4, %1”)
….

Path(“%call = call i32 (…) @tainted_input()”, “%div = sdiv i32 4, %1”)

….

Query Z3 is there an alarm?
Alarm(“%div = sdiv i32 4, %1”)

‹#›

We build up all of these facts and then for each instruction query the solver and ask can Alarm(I) be satisfiable? Based on the rules we have already built up

Step 1. Writing Analysis Rules
Write the rules we just defined in the initialize function in Extractor.cpp
z3::expr X = C.bv_const(“X”, 32);
z3::expr Y = C.bv_const(“Y”, 32);
z3::expr Z = C.bv_const(“Z”, 32);
z3::expr kill_rule = z3::forall(X, Y, Z, z3::implies(Def(X, Y) && Def(X, Z), Kill(Y, Z)));

If X is defined at instruction Y and X is defined at instruction Z, then the definition at instruction Y is killed by the definition at instruction Z
Solver->add_rule(kill_rule, C.str_symbol(“kill_rule”));
Add the kill_rule to the Z3 solver
‹#›

Step 1. Writing Analysis Rules
Do this for all of the rules
z3::expr kill_rule = z3::forall(X, Y, Z, z3::implies(Def(X, Y) && Def(X, Z), Kill(Y, Z)));
Z3::expr out_rule1 = …
Z3::expr out_rule2 = …
Z3::expr in_rule = …

z3::expr edge_rule = …
z3::expr path_rule1 = …
z3::expr path_rule2 = …
z3::expr alarm_rule = …

And add them to the Z3 solver!
‹#›

Step 2. Extract Datalog Facts from LLVM IR
Write the extractConstraints function in Extractor.cpp

if (StoreInst *SI = dyn_cast(I)) {
Value *From = SI->getValueOperand();
Value *To = SI->getPointerOperand();
addGen(InstMap, SI, SI);
addDef(InstMap, To, SI);
addUse(InstMap, From, SI);

}

int x = 0;

x = alloca i32
store i32 0, i32* x

‹#›

Step 2. Extract Datalog Facts from LLVM IR
Write the extractConstraints function in Extractor.cpp

if (StoreInst *SI = dyn_cast(I)) {
Value *From = SI->getValueOperand();
Value *To = SI->getPointerOperand();
addGen(InstMap, SI, SI);
addDef(InstMap, To, SI);
addUse(InstMap, From, SI);

}

int x = 0;

x = alloca i32
store i32 0, i32* x

we provide these functions
‹#›

Step 2. Extract Datalog Facts from LLVM IR
Write the extractConstraints function in Extractor.cpp

if (StoreInst *SI = dyn_cast(I)) {
Value *From = SI->getValueOperand();
Value *To = SI->getPointerOperand();
addGen(InstMap, SI, SI);
addDef(InstMap, To, SI);
addUse(InstMap, From, SI);

}

int x = 0;

x = alloca i32
store i32 0, i32* x

we provide these functions
‹#›

Step 2. Extract Datalog Facts from LLVM IR
Write the extractConstraints function in Extractor.cpp

if (StoreInst *SI = dyn_cast(I)) {
Value *From = SI->getValueOperand();
Value *To = SI->getPointerOperand();
addGen(InstMap, SI, SI);
addDef(InstMap, To, SI);
addUse(InstMap, From, SI);

}

int x = 0;

x = alloca i32
store i32 0, i32* x

InstMap encodes each LLVM instruction as an integer
initialized in the main function

we provide these functions
‹#›

Step 2. Extract Datalog Facts from LLVM IR
Write the extractConstraints function in Extractor.cpp

if (StoreInst *SI = dyn_cast(I)) {
Value *From = SI->getValueOperand();
Value *To = SI->getPointerOperand();
addGen(InstMap, SI, SI);
addDef(InstMap, To, SI);
addUse(InstMap, From, SI);

}

int x = 0;

x = alloca i32
store i32 0, i32* x

Definition SI is generated by instruction SI
InstMap encodes each LLVM instruction as an integer
initialized in the main function

we provide these functions
‹#›

Step 2. Extract Datalog Facts from LLVM IR
Write the extractConstraints function in Extractor.cpp

if (StoreInst *SI = dyn_cast(I)) {
Value *From = SI->getValueOperand();
Value *To = SI->getPointerOperand();
addGen(InstMap, SI, SI);
addDef(InstMap, To, SI);
addUse(InstMap, From, SI);

}

int x = 0;

x = alloca i32
store i32 0, i32* x

Variable To is defined at instruction SI
InstMap encodes each LLVM instruction as an integer
initialized in the main function

we provide these functions
‹#›

Step 2. Extract Datalog Facts from LLVM IR
Write the extractConstraints function in Extractor.cpp

if (StoreInst *SI = dyn_cast(I)) {
Value *From = SI->getValueOperand();
Value *To = SI->getPointerOperand();
addGen(InstMap, SI, SI);
addDef(InstMap, To, SI);
addUse(InstMap, From, SI);

}

int x = 0;

x = alloca i32
store i32 0, i32* x

we provide these functions
InstMap encodes each LLVM instruction as an integer
initialized in the main function

Variable From is used at instruction SI
‹#›

Building and Running the Analysis

$ export LD_LIBRARY_PATH = ex1/build
$ cd ex1/test
$ clang -emit-llvm -S -fno-discard-value-names -c loop0.c
$ ../build/constraint loop0.ll

Output:
Potential divide-by-zero points:
%div = sdiv i32 4, %3

‹#›

Part 2: Symbolic Execution Engine
‹#›

Symbolic Execution

Automated testing technique that symbolically executes a program.
Use symbolic execution to explore all program paths to find latent bugs.

‹#›

Symbolic Execution
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 22
y = 7
x = x0
y = y0
‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 22
y = 7
z = 14
x = x0
y = y0
z = 2*y0
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 22
y = 7
z = 14
x = x0
y = y0
z = 2*y0
2*y0 != x0
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 22
y = 7
z = 14
x = x0
y = y0
z = 2*y0
2*y0 != x0
Solve: 2*y0 == x0

Solution: x0 = 2, y0 = 1
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 2
y = 1
x = x0
y = y0
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 2
y = 1
z = 2
x = x0
y = y0
z = 2*y0
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 2
y = 1
z = 2
x = x0
y = y0
z = 2*y0
2*y0 == x0
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 2
y = 1
z = 2
x = x0
y = y0
z = 2*y0
2*y0 == x0
x0 <= y0 + 10 int foo(int v) { return 2*v; } void test_me(int x, int y) { int z = foo(y); if (z == x) if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 2
y = 1
z = 2
x = x0
y = y0
z = 2*y0
2*y0 == x0
x0 <= y0 + 10 Solve: 2*y0 == x0 and x0 > y0 + 10

Solution: x0 = 30, y0 = 15
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 30
y = 15
x = x0
y = y0
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 30
y = 15
z = 30
x = x0
y = y0
z = 2*y0
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 30
y = 15
z = 30
x = x0
y = y0
z = 2*y0
2*y0 == x0
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Symbolic Execution

Concrete Execution
Symbolic
Execution
concrete
state
symbolic
state
path
condition
x = 30
y = 15
z = 30
x = x0
y = y0
z = 2*y0
2*y0 == x0
x0 > y0 + 10
Program Error!
int foo(int v) {
return 2*v;
}
void test_me(int x, int y) {
int z = foo(y);
if (z == x)
if (x > y+10)
ERROR;
}

‹#›

Goal:
Implement a symbolic execution engine to find crashing inputs for C programs:
Implement symbolic execution and instrument each LLVM function
Implement a search strategy to explore program paths
‹#›

Exercise Overview
int main() {
int x; DSE_Input(x);
int y; DSE_Input(y);
int z; DSE_Input(z);

if (x == 0) {
if (y == z) {
x = x / (y-z);
}
}

return 0;
}

test/branch0.c
Creates symbolic input for DSE
Find x, y, and z that cause divide-by-zero
./dse ./branch0 N
Read input.txt for x, y, and z
Symbolically run
Done
Apply search strategy
Feed constraints to Z3
Crash?
Generate new input.txt
Yes
No
‹#›

Exercise Overview
DSE.cpp
Executes program and forward constraints to Z3
SymbolicInterpreter.{h,cpp}
Our dynamic symbolic execution engine
Runtime.{h,cpp}
Placeholders for LLVM IR transfer functions
Instrument.{h,cpp}
Scaffolding for LLVM pass to instrument dynamic symbolic execution engine code
Strategy.{h,cpp}
Placeholder for search strategy

For you to use
For you to modify
‹#›

Building and Running
Start Soln:
$ cd /popl2020-tutorial
$ unzip popl2020-soln.zip
$ cp soln/ex2/start/* ex2/src

Build:
$ cd ex2 && mkdir build && cd ex2/build
$ cmake ..
$ make

Run:
$ export LD_LIBRARY_PATH=/popl2020-tutorial/ex2/build
$ cd ex2/test
$ make
$ ../build/dse ./simple0 1

Output:
Crashing input found after N iterations

‹#›

int x = 1
int y = x;
%x = alloca
%y = alloca
store 1, %x
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory

DSE API
‹#›

%x = alloca
%y = alloca
store 1, %x
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
DSE API
‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
store 1, %x
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
store 1, %x
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
Reg(%x) => 0x1000
‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
store 1, %x
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
Reg(%x) => 0x1000
‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
DSE_Alloca(getRegisterId(%y),%y)
store 1, %x
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
Reg(%x) => 0x1000
‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
DSE_Alloca(getRegisterId(%y),%y)
store 1, %x
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
Reg(%x) => 0x1000
Reg(%y) => 0x1004
‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
DSE_Alloca(getRegisterId(%y),%y)
store 1, %x
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
DSE_Alloca(getRegisterId(%y),%y)
store 1, %x
DSE_Const(1)
DSE_Store(getRegisterId(%x))
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
DSE_Alloca(getRegisterId(%y),%y)
store 1, %x
DSE_Const(1)
DSE_Store(getRegisterId(%x))
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
Const(1)
‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
DSE_Alloca(getRegisterId(%y),%y)
store 1, %x
DSE_Const(1)
DSE_Store(getRegisterId(%x))
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
0x1000 => 1

‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
DSE_Alloca(getRegisterId(%y),%y)
store 1, %x
DSE_Const(1)
DSE_Store(getRegisterId(%x))
%a = load %x
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
%a : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
0x1000 => 1

‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
DSE_Alloca(getRegisterId(%y),%y)
store 1, %x
DSE_Const(1)
DSE_Store(getRegisterId(%x))
%a = load %x
DSE_Load(getRegisterId(%a), %x)
store %a, %y

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
%a : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
0x1000 => 1

‹#›

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
%a : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
0x1000 => 1
Reg(%a) => 1

‹#›

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
%a : 1
0x1004 : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
0x1000 => 1
Reg(%a) => 1

‹#›

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
%a : 1
0x1004 : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
0x1000 => 1
Reg(%a) => 1

‹#›

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
%a : 1
0x1004 : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
0x1000 => 1
Reg(%a) => 1
Reg(%a)
‹#›

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
%a : 1
0x1004 : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
0x1000 => 1
Reg(%a) => 1
0x1004 => 1

‹#›

DSE API
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
%y = alloca
DSE_Alloca(getRegisterId(%y),%y)
call void @DSE_Input(%x)
%a = load %x
DSE_Load(getRegisterId(%a), %x)
store %a, %y
DSE_Register(getRegisterId(%a))
DSE_Store(getRegisterId(%y))

DSE Stack
DSE Memory
Concrete
Memory
%x : 0x1000
%y : 0x1004
0x1000 : 1
%a : 1
0x1004 : 1
Reg(%x) => 0x1000
Reg(%y) => 0x1004
0x1000 => X0
Reg(%a) => X0
0x1004 => X0
int x; DSE_Input(x);
y = x;

‹#›

DSE API
DSE Memory
Reg(%x) => 0x1000
0x1000 => X0
Reg(%a) => X0
Reg(%b) => (+ X0 1)
%x = alloca
DSE_Alloca(getRegisterId(%x),%x)
call void @DSE_Input(%x)
%a = load %x
DSE_Load(getRegisterId(%a), %x)
%b = add %a, 1
…

Symbolic value
Z3 expr
int x; DSE_Input(x);
y = x + 1;
‹#›

Instrumentation machinery and transfer functions for alloca, store, and
load is implemented.

Attempt binary operators (BinaryOperator) and comparison (ICmpInst)
1. Implement symbolic execution and instrument each LLVM function
‹#›

Let’s look at the code

Instrumentation machinery and transfer functions for alloca, store, and
load is implemented.

Attempt binary operators (BinaryOperator) and comparison (ICmpInst)
1. Implement symbolic execution and instrument each LLVM function
‹#›

Let’s look at the code

Instrumentation machinery and transfer functions for alloca, store, and
load is implemented.

Attempt binary operators (BinaryOperator) and comparison (ICmpInst)

Attempt branch instruction (BranchInst)

1. Implement symbolic execution and instrument each LLVM function
‹#›

3. Implement search strategy
void generateInput() {
z3::expr_vector Vec = Ctx.parse_file(FormulaFile);

while (true) {
searchStrategy(Vec);

for (const auto &E : Vec) {
Solver.add(E);
}
z3::check_result Result = Solver.check();
if (Result == z3::sat) {
storeInput();
printNewPathCondition(Vec);
break;
} }

Vector of
path constraints
Define how to adjust paths
‹#›

Part 3: Static Assertion Verifier
‹#›

In this section,

Program Verification
Program verification aims to use formal proofs to demonstrate that
programs behave according to formal specifications.

‹#›

Program Verification
Program verification aims to use formal proofs to demonstrate that
programs behave according to formal specifications.

More concretely:
Program verification aims to prove assertions in a given input program.
‹#›

Program Verification: Building Blocks
Automated Theorem Prover
Verification Condition Generator

Compiler

‹#›

Program Verification: Building Blocks
Automated Theorem Prover
Verification Condition Generator

Compiler

LLVM
Z3
LLVM IR
CHC
‹#›

Program Verification: Building Blocks
Automated Theorem Prover
Verification Condition Generator

Compiler

LLVM
Z3
LLVM IR
CHC

‹#›

Constrained Horn Clause (CHC)
Constrained horn clause (CHC) is a fragment of first-order logic that is expressive enough to describe many program properties. It has the following general form:

P’s are predicates.
Φ is a constraint. A constraint is a logical formula in a background theory.

‹#›

First order theory: just a set of first-order sentences in a first-order language.

Example
In the following program, generate the verification conditions as CHCs.
int x = input();
assume(x > 1);
assert(x > 0);
return x;
‹#›

Let’s look at a C program verification example
And remind ourselves what each piece mean

Example
In the following program, generate the verification conditions as CHCs.
int x = input();
assume(x > 1);
assert(x > 0);
return x;
assume() imposes a constraint which will hold afterward.
‹#›

Example
int x = input();
assume(x > 1);
assert(x > 0);
return x;
int x = input();
assume(x > 1);
if (x > 0)
// continue normal execution
else
__assert_fail(…);
return x;
unfold
In the following program, generate the verification conditions as CHCs.
assert introduces a branch instruction
‹#›

First, note that assert() is just a macro that is unfolded as a if-then-else. The else-branch executes an assertion failure.
Assert: The assert function introduces a branch instruction such that if the condition holds, the program execution flows to the then-branch, otherwise the else-branch. Note that there is a function call to __assert_fail in the else-branch.

Example
int x = input();
assume(x > 1);
if (x > 0)
// continue normal execution
else
__assert_fail(…);
return x;
In the following program, generate the verification conditions as CHCs.
‹#›

Ok, so far we have this piece of C source code. Let’s compile it to LLVM IR since we want to encode it.

Example
int x = input();
assume(x > 1);
if (x > 0)
// continue normal execution
else
__assert_fail(…);
return x;
entry:
%call = @input()
%cmp = icmp sgt %call 1
%conv = zext %cmp
assume(%conv)
%cmp1 = icmp sgt %call 0
br %cmp1 %then %else
then:
br %end
else:
@__assert_fail(…)
end:
ret %call
LLVM IR
In the following program, generate the verification conditions as CHCs.
‹#›

so
there is no store and load
each variable (LLVM register) is assigned only once throughout its enclosing function
—
sgt: greater than
zext: zero extension

Overview of the Encoding Algorithm
For every CFG node, we find all the variables that are defined so far. These will be the parameters of the relations that correspond to each node.
Then, we will encode the instructions that are in the node if they are of these kinds: ICmp, BinaryOperator, Cast. Their conjunction is the encoding of the node.
Finally, for every CFG edge N1 → N2, the rule will have the following form:
Conjunction over the instructions of N1 N2
implies

‹#›

Number of parameters can be further optimized using a reach-def analysis.

In the following program, generate the verification conditions as CHCs.
else
end
‹#›

Now let’s continue with our example.
The parameters are not fully listed on the slide, but there are more.
—-
sgt: greater than
zext: zero extension

Example (continued)
entry:
%call = @input()
%cmp = icmp sgt %call 1
%conv = zext %cmp
assume(%conv)
%cmp1 = icmp sgt %call 0
br %cmp1 %then %else
then:
br %end
else:
@__assert_fail(…)
end:
ret %call
CHC
R2:
Note that rule R1 and R2 trigger different conclusions depending on the value of %call
In the following program, generate the verification conditions as CHCs.
entry
then
else
end

‹#›

Optimization: else does not really need any parameters.

In the following program, generate the verification conditions as CHCs.
entry
then
else
end
‹#›

“Then” node does not have any of the instructions of interest. so…

Once the VCs are established in form of CHCs, we query the reachability of else
In the following program, generate the verification conditions as CHCs.
‹#›

Once the VCs are established in form of CHC, we will query the reachability of else that contains __assert_fail.

Exercise Overview
/ex3/src/Utils.cpp
Implement IsAssume() and IsAssertFail()

/ex3/src/Verifier.cpp
Parses and manipulates files, calls Z3.

/ex3/src/Extractor.cpp
For you to modify
For you to use

‹#›

Goal
Implement an assertion verifier such that
all values are integers
there is no function call except for input, assume, and assert.
If the assertion in the input holds, print unsat. Otherwise, print sat.

CHC formulae will be written to formula.smt2 which is then solved by Z3.
‹#›

Unsat: assert fail is unreachable

Step 0: Compute the Set of Variables
void Extractor::initialize(Function &F) { … }
In this exercise, initialize() has been implemented to take care of finding the set of variables for each CFG node.
‹#›

You can later look at the code to see how it is done.
In the worst unoptimized case, we collect all the variables for all of the nodes.
But we can further improve this by only propagating variables from the predecessors until reaching a fixpoint.
One step further is to do a reaching definition analysis.

Step 1: Extract Instruction Constraints
void Extractor::extractConstraints(Instruction *I, z3::expr_vector &Assertions) {
…
if (CallInst *CI = dyn_cast(I)) { … }
else if (ICmpInst *II = dyn_cast(I)) { … }
else if (BinaryOperator *BO = dyn_cast(I)) { … }
else if (CastInst *CI = dyn_cast<>(I)) { … }
}
Encode instructions
‹#›

Step 1: Extract Instruction Constraints
if (ICmpInst *II = dyn_cast(I)) {
z3::expr X = eval(C, II);
z3::expr Y = eval(C, II->getOperand(0));
z3::expr Z = eval(C, II->getOperand(1));

switch (II->getPredicate()) {
case CmpInst::ICMP_EQ: Assertions.push_back(X == (Y == Z)); break;
case CmpInst::ICMP_NE: Assertions.push_back(X == (Y != Z)); break;
…

}
}
eval: given a Context and a LLVM Value, evaluate it as a Z3 expression depending on its type (const, int, bool, etc.)
‹#›

eval function: given the context and an llvm value, evaluates the values as z3 expressions depending on its type: constant, int, bool, etc.

Step 1: Extract Instruction Constraints
Your turn!

else if (ICmpInst *II = dyn_cast(I)) { … }
else if (BinaryOperator *BO = dyn_cast(I)) { … }
void Extractor::extractConstraints(Instruction *I, z3::expr_vector &Assertions) {

}
Complete the switch-statement for ICmpInst
Complete the BinaryOperator case
‹#›

Step 2: Extract Basic Block Constraints
void Extractor::extractConstraints(BasicBlock *BB) {
…
for (auto &I : *BB) { extractConstraints(&I, Assertions); }
z3::expr And = z3::mk_and(Assertions);

if (BranchInst *BI = dyn_cast(&BB->back())) { … }
}
Encode CFG nodes (i.e., basic blocks) as CHCs
Form the conjunction of the instruction of the node.

‹#›

What is actually important here is encoding the control flow. In other words, branches are encoded.
—
back() returns a reference to the last element in the vector.

Step 2: Extract Basic Block Constraints
if (BranchInst *BI = dyn_cast(&BB->back())) {
if (BI->isUnconditional()) {
BasicBlock *Succ = BI->getSuccessor(0);
z3::func_decl &SuccRel = BBRelations.at(Succ);
z3::expr_vector Propagation = transition(BB, Succ);
z3::expr SuccTuple = SuccRel(Propagation);

z3::expr R0
= z3::forall(allVariableVector, z3::implies(BBTuple && And, SuccTuple));

Solver->add_rule(R0, C.str_symbol(“”));
}
}
BB
Succ
Relation corresponding to BB
Conjunction of instructions of BB
Relation corresponding to Succ
‹#›

Unconditional : br
Is a jump.

z3::expr R0
= z3::forall(allVariableVector, z3::implies(BBTuple && And, SuccTuple));

Solver->add_rule(R0, C.str_symbol(“”));
}
}
BB
Succ
Finally, add the rule to the solver.
‹#›

Unconditional : br
Is a jump.
—
Finally add this rule to the solver

Your turn!

if (BI->isUnconditional()) {
// we completed together
} else {

// True branch
// False branch

}

Step 2: Extract Basic Block Constraints
Complete the conditional case.
What is different in this case?
‹#›

Unconditional : br
answer : && cond
&& !cond
will be explicitly in the encoding

Building and Running
Build:
$ cd ex3 && mkdir build && cd ex3/build
$ cmake ..
$ make

Run:
$ cd ex3/test
$ clang -emit-llvm -S -fno-discard-value-names -Xclang -disable-O0-optnone -c simple0.c
$ opt -mem2reg -S simple0.ll -o simple0.opt.ll
$ ../build/verifier simple0.opt.ll

Output:
unsat

‹#›

Related Posts