Introduction to Fuzzing
Xiaogang University of Technology
Xiaogang University of Technology
Copyright By PowCoder代写 加微信 powcoder
Vulnerability & Bug Basic Knowledge
Vulnerability & Bug
Vulnerability & Bug Examples of Vulnerabilities The History of a Vulnerability
Vulnerabilities VS. Bugs
vA software bug is an error,
flaw, failure or fault in a computer
program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways.
vA vulnerability is a bug that can be exploited by an attacker.
o The variable ‘b’ isn’t initialized, but it cannot be exploited by attackers.
int minval(int *A, int n) { int b; // Missing initialization
for (int i=0; i
test %rdi,%rdi
je 403c0a
mov %r12,%rdi
Basic Knowledge: Basic Block
vbasic block (BB): a straight-line code sequence with no branches in except to the entry and no branches out except at the exit.
one entry and one exit
Basic Knowledge: Edge
vEdge: The connection between basic blocks true false
if (a > 5)
b = a + 3;
b = a * 3;
Basic Knowledge: Path
vPath: The connections between edges path
Basic Knowledge: Control Flow Graph
vA Control Flow Graph (CFG) is the graphical representation of control flow or computation during the execution of programs or applications.
basic blocks edges
Basic Knowledge: Code Coverage
Ø Code Coverage measures how much code has been executed.
vInstruction level vBasic block level
vEdge level vPath level
Basic Knowledge: Instruction Coverage
vinstruction coverage: identify which instructions have been executed
Basic Knowledge: BB Coverage
vbasic block coverage: identify which BBs have been executed
Basic Knowledge: Edge Coverage
vedge coverage: identify which edges have been executed
Basic Knowledge: Path Coverage
vpath coverage: identify which paths have been executed
Basic Knowledge: Instrumentation
vIn the context of computer programming, instrumentation refers to an ability to monitor or measure the level of a product’s performance, to diagnose errors, and to write trace information.
int number = 0;
number ++;
Basic Knowledge: Instrumentation
ØStatic instrumentation
§ instrument the target program before it runs
int main(){ …
if ( count > 100){
executable binary file
int main(){ …
original program file
insert extra code
Instrumentation
run in a process
Basic Knowledge: Instrumentation
ØDynamic instrumentation
§ instrument the target program while it is running
extra code
instrumentation
int main(){ …
executable binary file
run in a process
result of execution
Basic Knowledge: Instrumentation tools
vIntel Pin (dynamic)
• Intel, binary
• https://software.intel.com/en-us/articles/pin-a-dynamic- binary-instrumentation-tool
vDyninst (static & dynamic)
• Dyninst group, binary
• https://github.com/dyninst/dyninst
vDynamoRIO (dynamic)
• MIT & Hewlett-Packard, binary
• https://www.dynamorio.org/ v…
General Fuzzing Steps
vIt generates inputs to test the target program
vAnd hope to trigger errors in the program
• files (.pdf, .png, …)
• network based (http, …)
v generate inputs for programs and test them
• the programs will be tested for many times
• the generated inputs may be invalid
• monitor the execution for exceptions (e.g., crashes), which are the indicators of vulnerabilities
generate inputs
Random inputs
Output strings
“a<=5” “b<18” “c<=9”
“a>5” “b<18” “c>9”
void func (){
int a,b,c; scanf(“%d,%d,%d”,&a,&b,&c); if (a > 5) printf(“a>5\n”);
else printf(“a<=5\n”);
if (b < 18) printf(“b<18\n”); else fault(); //exit
if (c > 9) printf(“c>9\n”); } else printf(“c<=9\n”);
If a tester knows nothing about the program,
the tester has to guess:
void main(){
// get a character char a = getchar(); if ( a == ’Q’){
printf (“ hello world\n”); }
printf (“no\n”); }
Q hello world r no
t no ... no
void main(){
// get a character char a = getchar(); if ( a == ’Q’){
printf (“ hello world\n”);
printf (“no\n”); }
If a tester knows nothing about the program,
the tester has to guess:
Q hello world
find a bug
r no t no ... no
vHow to generate inputs?
• how to set initial inputs?
• how to mutate the initial inputs and generate new inputs?
v What if a program needs complicated inputs? e.g., PDF file, image or HTML file
• How to generate complicated inputs? vWhen to save an input?
• How to know the inner logic of the target program?
General Steps
Crash/Error Bug Detector
bugs Bug Filter
Real Bugs/ Vulnerabilities
target program
initial input: 5,17,9
seed: 5,17,9
generation: mutate the seed, +1
new input: 6,18,10
void func (){
int a,b,c; scanf("%d,%d,%d",&a,&b,&c); if (a > 5) printf(“a>5\n”);
else printf(“a<=5\n”);
if (b < 18) printf(“b<18\n”); else bug(); //crash or error
if (c > 9) printf(“c>9\n”); } else printf(“c<=9\n”);
Example: different mutation operator
initial input: 5,17,9
void func (){
int a,b,c; scanf("%d,%d,%d",&a,&b,&c); if (a > 5) printf(“a>5\n”);
else printf(“a<=5\n”);
seed: 5,17,9 A lot of new inputs are generated
if (b < 18) printf(“b<18\n”); generation: mutate the seed,
else bug(); //crash or error if (c > 9) printf(“c>9\n”);
to test the program
} else printf(“c<=9\n”); new input: 15,27,19
Fuzzing: Initial Inputs
target program
Initial Inputs
vHow to create the initial inputs?
• Choose any input • Expert Knowledge • Grammar
vThe initial inputs will be seeds, which are the inputs that are chosen to be mutated, and are chosen to generate new inputs.
Initial Inputs: Random
vChoose any input
• “hello world”, 10, 100 • easy to create
• less effective
It will never get to the other parts of the program unless the input for ‘a’ equals to 23333. But generating 23333 from randomly chosen seed costs much time, even for months or years.
void func (){
int a,b,c; scanf("%d,%d,%d",&a,&b,&c); if (a != 23333) return;
else printf(“a<=5\n”);
if (b < 18) printf(“b<18\n”); else bug(); //crash or error
if (c > 9) printf(“c>9\n”); } else printf(“c<=9\n”);
Initial Inputs: Random
vExample: PDF Fuzzing, test the PDF viewer
1. search on the internet and download lots of
2. grab one of the PDF files
3. mutate the file
4. send the mutated file to the PDF viewer
5. record if it crashed (and the input that crashed it)
6. grab another PDF file
7. gotostep3
Fuzz PDF viewer
v What if we don’t feed the PDF viewer a PDF file? üPDF requires a specific format;
ü If an input breaches the format, PDF viewer will terminate at early stage and refuse to process the input
üThe input has to be qualified
pdfviewer:
if not a pdf:
return; process pdf file;
Initial Inputs: Expert
vExpert Knowledge
• create initial inputs that may lead the target
program to crash/error
• need experience
• know the target program
• more effective
Initial Inputs: Expert
vExpert Knowledge: ImageMagick • CVE-2019-9956
• In ImageMagick 7.0.8-35 Q16, there is a stack-based buffer overflow in the function PopHexPixel of coders/ps.c, which allows an attacker to cause a denial of service via a crafted image file.
vcreate such images as initial inputs to test the software ImageMagick
Initial Inputs: Grammar
vUse grammar to create initial inputs
• create effective initial inputs that can reach
much coverage
• grammar: the rules to create a required input
• complicated
• more effective
Initial Inputs: Grammar
vExample: The e Language (XSL)
• refuse to process further if the code is not XSL • special format
Initial Inputs: Grammar
prolog element
encoding=”utf-8″
version=”1.0″
xmlns:xsl=“…”
Fuzzing: Energy Assignment
target program
Energy Assignment
vFuzzing tests target programs repeatedly vWhich seed to be assigned more times of
• closer to the target location (AFLGO) • less-frequent seeds (AFLFast)
• larger untouched coverage (CollAFL)
mutations?
Energy Assignment: closer to target location vThe target: block ’F’
vtwo different seeds:
• one seed reaches the
path ‘ABCEG’
• another seed reaches
the path ‘AIKMG’
control flow graph
Energy Assignment: closer to target location vHow to define ‘close’? e.g., the
• the shortest path of the blocks in the seed path to the target block
v Many other definitions about distance
control flow graph
Energy Assignment: closer to target location
v For path ‘ABCEG’:
• AàF: 3 edges
• BàF: 2 edges
• CàF: 1 edges
• the shortest one: CàF
• distance of the path to the target: 1
v For path ‘AIKMG’:
• AàF: 3 edges
• IàF: 2 edges
• the shortest one: IàF
• distance of the path to the target: 2
v path ‘ABCEG’ is closer to the target.
v The seed reaching path ‘ABCEG’ has more energy
control flow graph
Energy Assignment: less-frequent seeds
vRecall: The target program will be tested by fuzzing for many times
vThe same seed will be chosen to be mutated for many times
vWhy less-frequent seeds?
• find more coverage as fast as possible
• it spends too much time on generating inputs from more-frequent seeds, and finds no more coverage;
• try to generate inputs from the less-frequent seeds, and they may reach more coverage
Energy Assignment: less-frequent seeds
• seed ‘s1’ examines path ‘ABCEG’
• mutate ‘s1’ and generate new inputs for 10,000 times;
• the new inputs may reach the same path ‘ABCEG’ because there’re no branches in the path
control flow graph
• seed ‘s2’ examines path ‘AIKMG’
• mutate ‘s2’ and generate new
inputs for 10 times;
• More energy for ’s2’ because it has been chosen for fewer times
Energy Assignment: larger untouched coverage
vThe larger untouched coverage indicates that the corresponding seed has a higher chance to generate new inputs that can reach the untouched coverage
• seed ‘s1’ examines path ‘ABCEG’
• seed ‘s2’ examines path ‘AIKMG’
• along path ‘AIKMG’, more untouched blocks (coverage) can be reached, i.e., block J, F, L
• More energy for the ‘s2’ to be mutated
control flow graph
Fuzzing: Generation
target program
Input generation
vHow to mutate a seed?
• coarse mutation • fine mutation
Input generation: Coarse mutation
vcombine different parts from different seeds/inputs
• seed 1: “hello Swinburne”
• seed 2: “nihao Students”
• new input: “hello Students”
vRandomly choose a part of the input to be replaced by prepared inputs
• seed: “hello Swinburne”
• prepared inputs: ”magic”, “greatjob” • new input 1: “hemagicwinburne”
• new input 2: ”hello Sgreatjob”
Input generation: Fine Mutation
vonly change one byte for once and change in small steps
• seed: “hello Swinburne”
• new input 1: “iello Swinburne”
• new input 2: “hfllo Swinburne” • new input 3: “hemlo Swinburne”
Many other mutation schemes can be chosen
Fuzzing: Execution Monitoring
target program
Execution Monitoring
vcrash/errors • Linux signals
• basic blocks • edges
vOther information
vOnly choose the information the fuzzing needs
Execution Monitoring
vcrash/errors: Linux signals
Signal Name
Signal Number
Description
Hang up detected on controlling terminal or death of controlling process
Issued if the user sends an interrupt signal (Ctrl + C)
Issued if an illegal mathematical operation is attempted
If a process gets this signal it must quit immediately and will not perform any clean-up operations
Software termination signal (sent by kill by default)
Execution Monitoring
control flow graph
• How many basic blocks does the
execution cover?
• How many edges does the execution cover?
• How many paths have the executions covered?
Execution Monitoring
vOther execution information • path constraints
• if (x == 0x2333) error();
• data flow: the data changes by time
input data (getchar()): a=5 5à8à40
void func (){
int a,b,c;
a = getchar();
b = getchar();
a = a + 3;
c = a * 5;
Execution Monitoring
vHow to get the internal execution information? • the understanding of the target program:
• white-box fuzzing • black-box fuzzing • grey-box fuzzing
Execution Monitoring
vwhite-box fuzzing:
• all of the information of the target program is known, e.g., the source code, the grammar, the functionality, …
• get as much information as you want
functionality
source code
1.#include
2.int main()
4.int number;
5.// printf() dislpays the formatted output
6.printf(“Enter an integer: “);
7.// scanf() reads the formatted input and stores them 8.scanf(“%d”, &number);
9.// printf() displays the formatted output 10.printf(“You entered: %d”, number);
11.return 0;
In this program, integer entered by the user is stored in a variable. Then, that variable is displayed on the screen using printf() function.
Execution Monitoring
vblack-box fuzzing:
• know nothing about the target program
• get very few information
• firmware in IoT devices; • commercial software; •…
executable program
Execution Monitoring
vgrey-box fuzzing:
• know information between the white-box and black-box
• instrument to get some information
• most fuzzing are grey-box fuzzing
executable program
global _main extern _printf section .text
push message call _printf add esp, 4
db ‘Hello, World!’, 10, 0
low-level programming language
instrument
Coverage …
some information
Fuzzing: Save A
target program
vWhat input will be chosen as a seed? • when it reaches new coverage
• new blocks, edges, or new paths
blocks edges paths
vOther criterion to save a new seed?
vWhy does fuzzing save such inputs as seeds?
• seeds will be mutated to generate new inputs
• it has higher chance that such seeds can generate new inputs to reach more coverage
if (x[0] > 5) A
x[0] =3, x[1] =20 à AC
x[0] =7, x[1] =20, x[2] =10, x[3] =6, x[4] =7 à new coverage: ABE
x[0] =7, x[1] =15, x[2] =10, x[3] =6, x[4] =7 à new coverage: BD
B if(x[1]==15) true
if(x[2]<25) C false
if(x[3]<5) D if(x[4]<5) E ......
Fuzzing: When will it stop?
qcan run forever qunless the user
target program
One big Challenge
vPath Constraints
if (a == 0x23456){
It’s hard for fuzzing to randomly generate the exact value 0x23456. So the bug() is hard to trigger.
if (x[0] == 0x23456){
if (x[1] == 0x23333){
It’s harder to trigger the bug() because fuzzing has to generate two exact value at the same time.
How to resolve the path constraints?
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com