CIS 547: Software Analysis

Lab 2: Dynamic Analysis (Summer 2021)

Synopsis

Building a “division-by-zero” dynamic analyzer for the C language using the LLVM framework.

Objective

In this lab, you will build a dynamic analyzer to check for division-by-zero errors in C programs
at runtime. You will create an LLVM pass that will instrument C code with additional
instructions to perform runtime checks, thus creating a sanitizer, a form of lightweight dynamic
analysis. You will also implement a code coverage mechanism that will serve two purposes: to
evaluate the quality of test inputs you will provide to your analyzer in this lab and to guide an
automated test input generator you will build in a subsequent lab.

Pre-Requisites

● Watch the video lectures corresponding to the module on “Software Specifications”. The
lectures introduce various terminology used throughout this lab such as sanitizers and
code coverage.

● Read relevant parts of the LLVM Primer: Part II (Structure of LLVM IR) to understand
how to write an LLVM pass and Part III (the LLVM API) to determine which APIs to use
in writing your LLVM pass.

● Read the article on Hardening C/C++ Code with Clang Sanitizers which surveys
pre-existing sanitizers that target common kinds of programming errors. The dynamic
analyzer you will build is a sanitizer that targets “division-by-zero” errors.

Setup

Step 1. In this and future labs, we will use CMake, a modern tool for managing the build process.
If you are unfamiliar with CMake, you are strongly advised to read the CMake tutorial first
(especially Step 1 and Step 2 in the tutorial). Running cmake produces a Makefile that you might
be more familiar with. If not, read the Makefile tutorial before proceeding. Once a Makefile is
generated, you need only call make to rebuild your project after editing the source files.

https://www.cis.upenn.edu/~cis547/llvm.pdf
https://docs.google.com/document/d/1ZnwzfWnw5EsqpqtAVyPQ5DgJY6xfoN3aBeLhLmvel5k/edit?usp=sharing
https://cmake.org/cmake/help/latest/guide/tutorial/index.html
https://www.gnu.org/software/make/manual/html_node/Simple-Makefile.html#Simple-Makefile


The skeleton code for Lab 2 is located under /cis547vm/lab2/. We will refer to the top-level
directory for Lab 2 as lab2 when describing file locations for the lab. Run the following
commands to set up the lab:

/cis547vm$ cd lab2
/cis547vm/lab2$ mkdir build && cd build
/cis547vm/lab2/build$ cmake ..
/cis547vm/lab2/build$ make

You should see several files created in the current directory. Among other files, this builds an
LLVM pass named InstrumentPass.so from code that we have provided in lab2/src/Instrument.cpp
(which you will modify in this lab), and an auxiliary runtime library named libruntime.so that
contains functionality to help you complete the lab.

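The remaining steps walk through the workflow in order: compile the C program to LLVM IR,
instrument the IR with your pass, link the result against the runtime library, and run the executable.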

Step 2. As noted in Step 1, you will implement the functionality of this lab as an LLVM pass
called InstrumentPass. LLVM passes (https://llvm.org/docs/WritingAnLLVMPass.html) are modular
components of the LLVM framework that usually perform transformations, optimizations, or
analyses on programs. Each pass operates on the LLVM IR representation of the input program.
So, to exercise this lab on an input C program, you must first compile the program to LLVM IR,
as you did in Lab 1:

/cis547vm$ cd lab2/test
/cis547vm/lab2/test$ clang -emit-llvm -S -fno-discard-value-names -c -o simple0.ll simple0.c -g

clang is a compiler front-end for C that uses LLVM as a back-end. The clang user manual
(https://releases.llvm.org/8.0.0/tools/clang/docs/UsersManual.html#command-line-options) has a
useful reference to its command-line options. Briefly, -S instructs clang to perform preprocessing
and compilation steps only, -emit-llvm instructs the compiler to generate LLVM IR (which will be
saved to simple0.ll), -fno-discard-value-names preserves names of values in the generated LLVM
IR to improve readability, and -g emits the debug information (source line and column numbers)
that this lab relies on.

Step 3. Next, we use opt to run our dummy Instrument pass on the compiled C program:

/cis547vm/lab2/test$ opt -load ../build/InstrumentPass.so -Instrument -S simple0.ll -o simple0.instrumented.ll

opt is an LLVM tool that performs analyses and optimizations on LLVM IR. The option -load
loads our LLVM pass library while -Instrument instructs opt to run the pass on simple0.ll.
(Libraries can and often do contain multiple LLVM passes.) Consult the documentation of opt
(http://releases.llvm.org/8.0.0/docs/CommandGuide/opt.html) to understand the potential ways to
use the tool; it may help you build and debug your solutions. Because the skeleton pass does not
yet change anything, simple0.instrumented.ll should be identical to simple0.ll except for its
module ID, but this will cease to be the case once you implement the functionality of this lab:

/cis547vm/lab2/test$ diff simple0.instrumented.ll simple0.ll
1c1
< ; ModuleID = 'simple0.ll'
---
> ; ModuleID = 'simple0.c'

Step 4. Next, compile the instrumented program and link it with the provided runtime library to
produce a standalone executable named simple0:

/cis547vm/lab2/test$ clang -o simple0 -L../build -lruntime simple0.instrumented.ll

Step 5. Finally, run the executable on the empty input; note that you may have to manually
provide test input for programs that expect non-empty input:

/cis547vm/lab2/test$ ./simple0
Floating point exception

Indeed, our sample program has a division-by-zero error. In this lab, you will complete the
Instrument pass to catch this error at runtime, as well as report code coverage of the test run. In
particular, your output on the above test program should be:

Divide-by-zero detected at line 4 and col 13

and code coverage information will be written to a file named EXE.cov, where EXE is the
name of the executable that is run (in the above case, look for simple0.cov). Our auxiliary
functions will handle the creation of the file; your instrumented code should populate it with
line,col information. If implemented correctly, you will see the following lines in simple0.cov,
which indicate the executed source locations in the program:

2,7
2,7
3,7
3,11
3,7
4,7
4,11
4,15

You will see some duplicates in EXE.cov because a single source location (line,col) in the C code
can correspond to more than one instruction in the LLVM IR, and each such instruction is instrumented.
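To gauge how much of a program a test run covered, it can help to summarize an EXE.cov file by its distinct source locations. The following is a minimal sketch in C, assuming the one "line,col" entry per line format shown above; count_unique_locations is a hypothetical helper, not part of the provided runtime.

```c
#include <stdio.h>

#define MAX_LOCS 4096

typedef struct { int line, col; } Loc;

/* Hypothetical helper: counts distinct (line,col) pairs in the text of an
 * EXE.cov file, assuming one "line,col" entry per line. */
int count_unique_locations(const char *log) {
  Loc seen[MAX_LOCS];
  int n = 0;
  const char *p = log;
  int line, col, consumed;
  /* " %d,%d%n" skips leading whitespace (including newlines), reads one
   * entry, and stores how many characters were consumed in `consumed`. */
  while (sscanf(p, " %d,%d%n", &line, &col, &consumed) == 2) {
    p += consumed;
    int dup = 0;
    for (int i = 0; i < n; i++) {
      if (seen[i].line == line && seen[i].col == col) { dup = 1; break; }
    }
    if (!dup && n < MAX_LOCS) {
      seen[n].line = line;
      seen[n].col = col;
      n++;
    }
  }
  return n;
}
```

For the simple0.cov listing above, the sketch reports 6 distinct locations out of 8 entries.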

Lab Instructions

In this lab, you will build a dynamic analyzer to catch division-by-zero errors at runtime. A key
aspect of dynamic analysis involves inspecting a running program for information about its state
and behavior. We will develop an LLVM pass to insert runtime checking and monitoring code
into a given program. Our instrumentation will perform division-by-zero error checking and
record coverage information for a running program. In Lab 3, we will build upon this lab to
develop an automated testing framework. In Labs 7 and 8, we will use this testing framework to
develop tools for debugging.

Instrumentation Primer. Consider the following code snippet where we have two potential
divide-by-zero errors, one at Line A, the other at Line B.

int main() {
  int x1 = input();
  int y = 13 / x1;  // Line A
  int x2 = input();
  int z = 21 / x2;  // Line B
  return 0;
}

If we wanted to program a bit more defensively, we would manually insert checks before these
divisions, and print out an error if the divisor is 0:

#include <stdio.h>
#include <stdlib.h>

int main() {
  int x1 = input();
  if (x1 == 0) { printf("Detected divide-by-zero error!\n"); exit(1); }
  int y = 13 / x1;
  int x2 = input();
  if (x2 == 0) { printf("Detected divide-by-zero error!\n"); exit(1); }
  int z = 21 / x2;
  return 0;
}

Of course, there is nothing stopping us from encapsulating this repeated check into some
function, call it __sanitize__, for reuse.

#include <stdio.h>
#include <stdlib.h>

void __sanitize__(int divisor) {
  if (divisor == 0) {
    printf("Detected divide-by-zero error!\n");
    exit(1);
  }
}

int main() {
  int x1 = input();
  __sanitize__(x1);
  int y = 13 / x1;
  int x2 = input();
  __sanitize__(x2);
  int z = 21 / x2;
  return 0;
}

We have transformed our unsafe version of the program in the first example to a safe one by
instrumenting all division instructions with some code that performs a divisor check. In this lab,
you will automate this process at the LLVM IR level using an LLVM pass.
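With the three-argument __sanitize__ that the lab's runtime provides (listed under Instrumentation Pass below), the instrumented primer program behaves roughly like the following C sketch. The __sanitize__ body here is a hypothetical stand-in that prints a message in the format from Step 5 and then exits; the real runtime may behave differently, and the line/col arguments are illustrative rather than taken from real debug information.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime's __sanitize__(divisor, line, col):
 * reports the source location of a zero divisor and exits (an assumption). */
void __sanitize__(int divisor, int line, int col) {
  if (divisor == 0) {
    printf("Divide-by-zero detected at line %d and col %d\n", line, col);
    exit(1);
  }
}

/* Source-level picture of the instrumented divisions: each check receives
 * the divisor plus the (line, col) of the division it guards. */
int divide_both(int x1, int x2) {
  __sanitize__(x1, 3, 14);
  int y = 13 / x1;  /* Line A */
  __sanitize__(x2, 5, 14);
  int z = 21 / x2;  /* Line B */
  return y + z;
}
```

Your pass produces the equivalent of these calls at the LLVM IR level, so the C source itself never changes.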

Code Coverage Primer. Code coverage is a measure of the fraction of a program’s code that is
executed in a particular run. In this lab, you will implement the mechanism underlying modern
code coverage tools, such as LLVM’s source-based code coverage tool
(https://clang.llvm.org/docs/SourceBasedCodeCoverage.html) and gcov
(https://gcc.gnu.org/onlinedocs/gcc/Gcov.html). The mechanism instruments the program’s LLVM
IR instructions at compile time to record the line and column number of each source-level
instruction that is executed at run time. This seemingly primitive information enables powerful
software analysis use-cases. We will explore two such use-cases. In this lab, you will use the
information to improve your test suite by adding tests that cover more code and thereby uncover
crashing bugs. In Lab 4, you will use the same information to guide an automated test input
generator, thereby realizing the architecture of modern industrial-strength fuzzers.

Figure: A sample report produced by LLVM’s source-based code coverage tool.

Debug Location Primer. When you compile a C program with the -g option, LLVM will include
debug information for LLVM IR instructions. Using the aforementioned instrumentation
techniques, your LLVM pass can gather this debug information for an Instruction, and forward
it to __sanitize__ to report the location at which a divide-by-zero error occurs. We will discuss
the specifics of this interface in the following sections.

Instrumentation Pass. We have provided a framework from which you can build your LLVM
instrumentation pass. You will need to edit the lab2/src/Instrument.cpp file to implement
your divide-by-zero sanitizer, as well as the code coverage mechanism. File lab2/lib/runtime.c
contains functions that you will use in your lab:

– void __sanitize__(int divisor, int line, int col)
    – Output an error for line,col if divisor is 0
– void __coverage__(int line, int col)
    – Append coverage information for line,col in a file for the current executing process

As you will create a runtime sanitizer, your pass should instrument the code with calls to these
functions. In particular, you will modify the runOnFunction method in Instrument.cpp to
perform this instrumentation for all LLVM instructions encountered inside a function.
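The runtime's file handling is already provided, but it can help to picture what __coverage__ produces. The following is a hypothetical sketch, not the contents of lab2/lib/runtime.c: cov_filename and coverage_append are illustrative names that derive the EXE.cov name from the executable path and append one line,col record per call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: derive "EXE.cov" from the executable's path,
 * e.g. "./simple0" -> "simple0.cov". */
void cov_filename(const char *exe_path, char *buf, size_t size) {
  const char *slash = strrchr(exe_path, '/');
  const char *base = slash ? slash + 1 : exe_path;
  snprintf(buf, size, "%s.cov", base);
}

/* Hypothetical: append one "line,col" record in the EXE.cov format. */
void coverage_append(FILE *out, int line, int col) {
  fprintf(out, "%d,%d\n", line, col);
  /* Flush immediately: a later crash (e.g. SIGFPE) would otherwise
   * discard records still sitting in the stdio buffer. */
  fflush(out);
}
```

The per-call flush is a deliberate choice in this sketch, since the programs under test are expected to crash.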

Note that our runOnFunction method returns true. Since we are instrumenting the input code
with additional functionality, we return true to indicate that the pass modifies (transforms) the
LLVM IR it traverses.

In short, the lab consists of the following tasks:

1. Implement the instrumentSanitize function to insert a __sanitize__ check for a supplied
Instruction.

2. Modify runOnFunction to instrument all division instructions with the sanitizer for a given
block of code.

3. Implement the instrumentCoverage function to insert __coverage__ checks for all debug
locations.

4. Modify runOnFunction to instrument all instructions with the coverage check.
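The four tasks boil down to a single walk over each function's instructions. The C program below is a toy model of that walk, not LLVM API code: each Inst stands in for an LLVM instruction, line == 0 stands in for a missing debug location, only integer sdiv and udiv are treated as divisions, and plan_instrumentation (a hypothetical name) records which runtime call would be inserted before which instruction. In the real pass, the __sanitize__ call would also receive the division's divisor, which is operand 1 of the instruction.

```c
#include <stddef.h>

/* Toy stand-ins for LLVM instructions; NOT the LLVM API. */
typedef enum { OP_LOAD, OP_STORE, OP_SDIV, OP_UDIV, OP_RET } Opcode;
typedef enum { CALL_SANITIZE, CALL_COVERAGE } CallKind;

/* line == 0 models an instruction without a debug location. */
typedef struct { Opcode op; int line; int col; } Inst;
/* `before` is the index of the instruction the call is inserted before. */
typedef struct { CallKind kind; int line; int col; size_t before; } Call;

static int is_division(Opcode op) { return op == OP_SDIV || op == OP_UDIV; }

/* Returns the number of planned calls written to `out` (at most `cap`). */
size_t plan_instrumentation(const Inst *insts, size_t n, Call *out, size_t cap) {
  size_t k = 0;
  for (size_t i = 0; i < n; i++) {
    const Inst *I = &insts[i];
    /* Tasks 1-2: guard every division with a sanitizer call. */
    if (is_division(I->op) && k < cap)
      out[k++] = (Call){CALL_SANITIZE, I->line, I->col, i};
    /* Tasks 3-4: record coverage only where a debug location exists. */
    if (I->line != 0 && k < cap)
      out[k++] = (Call){CALL_COVERAGE, I->line, I->col, i};
  }
  return k;
}
```

Placing the sanitizer call ahead of the coverage call for a division is one possible ordering chosen for this sketch, not a requirement stated by the lab.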

Inserting Instructions into LLVM code. By now you are familiar with the BasicBlock and
Instruction classes and working with LLVM instructions in general. For this lab you will need
to use the LLVM API to insert additional instructions into the code when traversing a
BasicBlock. There are many ways to do this in LLVM. One common pattern when working
with LLVM is to create a new instruction and insert it directly before an existing instruction.
For example, in the following code snippet:

Instruction* Pi = …;
auto *NewInst = new Instruction(…, Pi);

A new instruction (NewInst) will get created and implicitly inserted before Pi; you do not need to
do anything further with NewInst. Subclasses of Instruction have similar methods for doing
this (see the LLVM Programmer’s Manual:
http://releases.llvm.org/8.0.0/docs/ProgrammersManual.html#creating-and-inserting-new-instructions).
In particular, you will only need to create and insert new call instructions (CallInst), as
discussed below.
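The "insert before" behavior can be pictured with an ordinary doubly linked list, which is roughly how a BasicBlock holds its instructions. In this toy C model (Node, List, and insert_before are hypothetical names), linking a new node before pos mirrors what new Instruction(…, Pi) does, which is why a check created at a division executes before that division.

```c
#include <stddef.h>

/* Toy instruction list; NOT LLVM's ilist. */
typedef struct Node {
  const char *name;
  struct Node *prev, *next;
} Node;

typedef struct { Node *head; } List;

/* Link `n` into `list` immediately before `pos`. */
void insert_before(List *list, Node *n, Node *pos) {
  n->next = pos;
  n->prev = pos->prev;
  if (pos->prev)
    pos->prev->next = n;
  else
    list->head = n; /* `pos` was the first node */
  pos->prev = n;
}
```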

Loading C functions into LLVM code. We have provided the definitions of the auxiliary
functions __sanitize__ and __coverage__ for you, but you have to insert calls to them into the
code as LLVM instructions. Keep in mind that both of these functions are only used for logging
purposes. __sanitize__ logs all the occurrences of a divisor being equal to zero, and
__coverage__ logs any executed line of the code.

Before a function can be called within a Module, it has to be loaded into the Module using the
appropriate API Module::getOrInsertFunction. One way to do this is illustrated below:

Value* NewValue = M->getOrInsertFunction("function_name", return_type,
                                         arg1_type, arg2_type, …, argN_type);
Function* NewFunction = cast<Function>(NewValue);
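Module::getOrInsertFunction has get-or-insert semantics: it returns the existing declaration with the given name or adds a new one, so requesting __sanitize__ for every function you instrument does not create duplicate declarations. Below is a toy C model of that lookup; ToyModule and toy_get_or_insert are hypothetical names, and the real API additionally handles a name that already exists with a different type, which this toy ignores.

```c
#include <string.h>

#define MAX_FUNCS 64

typedef struct { const char *name; int nparams; } FuncDecl;
typedef struct { FuncDecl funcs[MAX_FUNCS]; int count; } ToyModule;

/* Return the declaration named `name`, adding it first if it is missing.
 * Returns NULL if the toy module is full. */
FuncDecl *toy_get_or_insert(ToyModule *m, const char *name, int nparams) {
  for (int i = 0; i < m->count; i++)
    if (strcmp(m->funcs[i].name, name) == 0)
      return &m->funcs[i]; /* already declared: reuse it */
  if (m->count == MAX_FUNCS)
    return NULL;
  m->funcs[m->count] = (FuncDecl){name, nparams};
  return &m->funcs[m->count++];
}
```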

Next, the function that you have created must be called. So you will have to create a call
instruction before instruction I using CallInst::Create as illustrated below:

CallInst *Call = CallInst::Create(NewFunction, Args, "", &I);
Call->setCallingConv(CallingConv::C);
Call->setTailCall(true);

You should populate std::vector<Value *> Args with appropriate values for arguments.

Debug Locations. As we alluded to in the primer, LLVM will store code location information of
the original C program for LLVM instructions when compiled with -g. This is done through the
DebugLoc class:

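Instruction* I1 = …;
const DebugLoc &Debug = I1->getDebugLoc();
if (Debug) {  // instructions without debug info have an empty DebugLoc
  printf("Line No: %u, Col: %u\n", Debug.getLine(), Debug.getCol());
}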

You will need to gather and forward this information to the sanitizer functions. As a final hint,
not every single LLVM instruction corresponds to a specific line in its source C code. You will
have to check which instructions have debug information. Use this to help build the code
coverage metric instrumentation.

https://llvm.org/doxygen/classllvm_1_1Module.html
https://llvm.org/doxygen/classllvm_1_1CallInst.html#a850d8262cd900958b3153c4aa080b2bb
https://llvm.org/doxygen/classllvm_1_1DebugLoc.html


Items to Submit

Submit only your modified file Instrument.cpp.
