CIS 547: Software Analysis
Lab 1: LLVM and Analysis Tools (Summer 2021)
Synopsis
Understanding the LLVM IR and applying analysis tools AFL and IKOS to small C programs.
Objective
The objective of this lab is two-fold:
· Understanding the representation of C programs that we will use in all the labs. Called LLVM IR, it is the intermediate representation used by LLVM, a popular compiler framework for a variety of programming languages.
· Applying and comparing the results of two program analysis tools on small C programs. In particular, we will use the AFL dynamic analyzer and the IKOS static analyzer to uncover “division-by-zero” bugs in those programs.
Pre-Requisites
· Watch the video lectures corresponding to the module on “Introduction to Software Analysis”. The lectures introduce various terminology used throughout this lab such as control-flow graphs, program invariants, static and dynamic analysis, abstract domains, ground truth, false positives/negatives, and F1 scores.
· Read the LLVM Primer: Part I (Overview of LLVM) and Part II (Structure of LLVM IR). This is required for the first part of this lab wherein you will manually convert a few small LLVM IR programs given to you into the corresponding C programs.
· Read the article A Menagerie of Program Abstractions which surveys different abstract domains used by static analysis. This is required for the second part of this lab wherein you will run a static analyzer using two of those domains (interval domain and DBM domain) on a few small C programs, including those written by you above.
Setup
Step 1. Set up the course VM by following the instructions outlined in Course VM and Lab Instructions (skip this step if you have already completed it). The skeleton code for Lab 1 is located under /cis547vm/lab1/. We will refer to this top-level directory for Lab 1 simply as lab1 when describing file locations for the lab.
Step 2. Run the following command under /cis547vm/ to obtain the latest changes to the lab:
~$ cd /cis547vm
/cis547vm$ git pull
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 14 (delta 11), reused 11 (delta 8), pack-reused 0
Unpacking objects: 100% (14/14), done.
…
Step 3. You will use the Make command-line tool to run LLVM and program analysis tools AFL and IKOS. If you are not familiar with Make, read the Makefile tutorial first, and then peruse file lab1/Makefile. Future labs will use CMake to generate a Makefile, so ensure that you are comfortable with using Make in this lab.
Step 4. Run the following command and ensure that you get the corresponding output:
/cis547vm$ cd lab1
/cis547vm/lab1$ make
make: *** No rule to make target ‘test1’, needed by ‘all’. Stop.
Upon successful completion of the lab, running the make command should produce the following files under the lab1/results/ directory:
|– afl_logs/
| |– test1/
| | |– out.txt
| | |– afl_output/
| | `– test1
| |– … // similar for test2..test7
|
`– ikos_logs/
|– test1_dbm_out.txt
|– test1_int_out.txt
|– … // similar for test2..test7
Lab Instructions
The lab consists of two parts to be completed in order. In the first part, you will manually convert the given LLVM IR programs into the corresponding C programs. In the second part, you will run two analysis tools on a suite of C programs including the ones you converted in the first part, study the tools’ results, and report your findings.
Part 1: Understanding the LLVM IR
Step 1. Study the LLVM Primer to understand the structure of the LLVM IR. The primer shows how to run LLVM on a sample C program to generate the corresponding LLVM IR program. You can use the sandbox/ directory in the course VM for this purpose:
/cis547vm$ cd sandbox/test
/cis547vm/sandbox/test$ clang -emit-llvm -S -fno-discard-value-names -c simple0.c
Step 2. Write by hand the C programs corresponding to the LLVM IR programs under the lab1/ir_programs directory and place them under the lab1/c_programs/ directory. Ensure that running the above command on your hand-written C programs generates the exact LLVM IR programs provided as we will auto-grade them. You can do so by using the diff command-line utility to check if your files are the same.
/cis547vm$ cd lab1/c_programs
/cis547vm/lab1/c_programs$ clang -emit-llvm -S -fno-discard-value-names -c test1.c
/cis547vm/lab1/c_programs$ diff test1.ll ../ir_programs/test1.ll
Part 2: Applying Two Analysis Tools
Step 1. Run the provided analysis tools, AFL and IKOS, on all C programs located under the lab1/c_programs/ directory, including those written by you above. To do so, simply run the following command, which first runs AFL with a timeout of 30 seconds per program, and then runs IKOS with the two different abstractions as mentioned previously.
/cis547vm$ cd lab1
/cis547vm/lab1$ make
mkdir -p results/afl_logs/test1
AFL_DONT_OPTIMIZE=1 afl-gcc c_programs/test1.c -o …
afl-cc 2.52b by < >
afl-as 2.52b by < >
[+] Instrumented 1 locations (64-bit, non-hardened mode, ratio 100%).
timeout 30s afl-fuzz -i afl_input -o …
Makefile:8: recipe for target ‘results/afl_logs/test1/out.txt’ failed
make: [results/afl_logs/test1/out.txt] Error 124 (ignored)
mkdir -p results/ikos_logs
ikos –opt none -a dbz -d interval c_programs/test1.c …
…
mkdir -p results/ikos_logs
ikos –opt none -a dbz -d dbm c_programs/test1.c …
…
Ignore the error reported by Make above; it is normal because AFL keeps running until it is forcibly terminated by the timeout command. Feel free to experiment by changing the timeout which is set to 30 seconds. We do not expect everyone to report the same solutions since AFL is non-deterministic anyway.
Step 2. Determine the ground truth (right vs. wrong) of the C programs with respect to division-by-zero errors. In particular, for each division instruction in each program, determine by inspecting the program whether it can result in a division-by-zero error on some program input. Write your answers in file lab1/answers.txt in the “ground truth” column of the table for test1.c thru test6.c, and under Question 1 for test7.c.
Step 3. Study the output of AFL and IKOS and determine if they accept or reject each program. Write your answers in file lab1/answers.txt in the corresponding columns of the table for test1.c thru test6.c, and under Questions 2 and 3 for test7.c.
The crashing inputs discovered by AFL are stored in separate files under directories lab1/ results/afl_logs/test[1-7]/afl_output/crashes/. The files have idiosyncratic names of the form “id:000000,sig:08,src:000000,op:arith8,pos:2,val:-8”[footnoteRef:0]. It is the *contents* of these files that AFL used as the input to the test program when it encountered a crash. [0: The file name encodes various things such as the ID of the crashing input, the crashing signal, the non-crashing seed input from which this crashing input was produced — which in our case is always the file lab1/afl_input/seed.txt, and the operations by which the non-crashing seed input was transformed into this crashing input.]
Step 4. Using your entries from Steps 2 and 3, calculate the Precision, Recall, and F1 Score of each column. Enter them in the corresponding rows in lab1/answers.txt
Items to Submit
Run the below command to produce file lab1/submission.zip and submit that file.
/cis547vm/lab1$ make submit
mkdir -p submission
rm -rf submission/*
cp -r answers.txt results c_programs submission
zip -r submission.zip submission/ 2>&1 >/dev/null