Static Analysis
Static Analysis
Slide deck courtesy of Prof. Michael Hicks, University of Maryland, College Park (UMD)
Static Analysis
for Secure Development
•
•
Static Analysis
for Secure Development
Introduction
Static analysis: What, and why?
•
•
•
•
Static Analysis
for Secure Development Introduction
Static analysis: What, and why?
Basic analysis
Example: Flow analysis
•
•
•
•
•
•
Static Analysis
for Secure Development Introduction
Static analysis: What, and why? Basic analysis
Example: Flow analysis
Increasing precision
Context-, flow-, and path sensitivity
•
•
•
•
•
•
•
•
Static Analysis
for Secure Development Introduction
Static analysis: What, and why? Basic analysis
Example: Flow analysis Increasing precision
Context-, flow-, and path sensitivity
Scaling it up
Pointers, arrays, information flow, …
Current Practice for Software Assurance
•
–
Testing
Make sure program runs correctly on set of inputs
Current Practice for Software Assurance
•
–
Testing
Make sure program runs correctly on set of inputs
inputs
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
program
•
–
Testing
Current Practice
for Software Assurance
Make sure program runs correctly on set of inputs
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
inputs
outputs
program
•
–
Testing
Make sure program runs correctly on set of inputs
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
inputs outputs program
oracle
•
–
Testing
Make sure program runs correctly on set of inputs
inputs outputs Is it correct? program oracle
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
•
–
Testing
Make sure program runs correctly on set of inputs
inputs outputs Is it correct? program oracle
Benefits: Concrete failure proves issue, aids in fix
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
–
•
–
Testing
Make sure program runs correctly on set of inputs
inputs outputs Is it correct? program oracle
Benefits: Concrete failure proves issue, aids in fix Drawbacks: Expensive, difficult, hard to cover all
code paths, no guarantees
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
– –
Current Practice (cont’d)
•
–
Code Auditing
Convince someone else your source code is correct
Current Practice (cont’d)
•
–
Code Auditing
Convince someone else your source code is correct
Current Practice (cont’d)
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
extern void checksmtpattack __P((volatile int *, int, char *, ENVELOPE *));
if (fileno(OutChannel) != fileno(stdout)) {
/* arrange for debugging output to go to remote host */
(void) dup2(fileno(OutChannel), fileno(stdout)); }
settime(e);
peerhostname = RealHostName; if (peerhostname == NULL)
peerhostname = “localhost”; CurHostName = peerhostname; CurSmtpClient = macvalue(‘_’, e); if (CurSmtpClient == NULL)
CurSmtpClient = CurHostName;
setproctitle(“server %s startup”, CurSmtpClient); #if DAEMON
if (LogLevel > 11) {
/* log connection information */ sm_syslog(LOG_INFO, NOQID,
“SMTP connect from %.100s (%.100s)”,
CurSmtpClient, anynet_ntoa(&RealHostAddr)); }
#endif
/* output the first line, inserting “ESMTP” as second word */ expand(SmtpGreeting, inp, sizeof inp, e);
p = strchr(inp, ‘\n’);
if (p != NULL)
*p++ = ‘\0’;
id = strchr(inp, ‘ ‘); if (id == NULL)
id = &inp[strlen(inp)];
cmd = p == NULL ? “220 %.*s ESMTP%s” : “220-%.*s ESMTP%s”; message(cmd, id – inp, inp, id);
/* output remaining lines */
while ((id = p) != NULL && (p = strchr(id, ‘\n’)) != NULL) {
*p++ = ‘\0’;
if (isascii(*id) && isspace(*id))
cmd < &cmdbuf[sizeof cmdbuf - 2]) *cmd++ = *p++;
*cmd = '\0';
/* throw away leading whitespace */ while (isascii(*p) && isspace(*p))
p++;
/* decode command */
for (c = CmdTab; c->cmdname != NULL; c++) {
if (!strcasecmp(c->cmdname, cmdbuf)) break;
}
/* reset errors */ errno = 0;
/*
** Process command.
**
** If we are running as a null server, return 550 ** to everything.
*/
if (nullserver) {
switch (c->cmdcode) {
case CMDQUIT:
case CMDHELO: case CMDEHLO: case CMDNOOP:
/* process normally */ break;
default:
if (++badcommands > MAXBADCOMMANDS)
sleep(1);
usrerr(“550 Access denied”); continue;
while (isascii(*p) && isspace(*p)) p++;
if (*p == ‘\0’) break;
kp = p;
/* skip to the value portion */
while ((isascii(*p) && isalnum(*p)) || *p == ‘-‘)
p++;
if (*p == ‘=’) {
*p++ = ‘\0’; vp = p;
/* skip to the end of the value */ while (*p != ‘\0’ && *p != ‘ ‘ &&
!(isascii(*p) && iscntrl(*p)) &&
*p != ‘=’) p++;
}
if (*p != ‘\0’) *p++ = ‘\0’;
if (tTd(19, 1))
printf(“RCPT: got arg %s=\”%s\”\n”, kp,
vp == NULL ? “
rcpt_esmtp_args(a, kp, vp, e); if (Errors > 0)
break; }
if (Errors > 0) break;
/* save in recipient list after ESMTP mods */ a = recipient(a, &e->e_sendqueue, 0, e);
if (Errors > 0)
break;
/* no errors during parsing, but might be a duplicate */ e->e_to = a->q_paddr;
if (!bitset(QBADADDR, a->q_flags))
{
message(“250 Recipient ok%s”, bitset(QQUEUEUP, a->q_flags) ?
” (will queue)” : “”); nrcpts++;
} else {
} }
/*
switch (c->cmdcode) {
non-null server */
case CMDMAIL: case CMDEXPN: case CMDVRFY:
/* punt — should keep message in ADDRESS…. */
•
– –
Code Auditing
Convince someone else your source code is correct Benefit: humans can generalize beyond single runs
Current Practice (cont’d)
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
extern void checksmtpattack __P((volatile int *, int, char *, ENVELOPE *));
if (fileno(OutChannel) != fileno(stdout)) {
/* arrange for debugging output to go to remote host */
(void) dup2(fileno(OutChannel), fileno(stdout)); }
settime(e);
peerhostname = RealHostName; if (peerhostname == NULL)
peerhostname = “localhost”; CurHostName = peerhostname; CurSmtpClient = macvalue(‘_’, e); if (CurSmtpClient == NULL)
CurSmtpClient = CurHostName;
setproctitle(“server %s startup”, CurSmtpClient); #if DAEMON
if (LogLevel > 11) {
/* log connection information */ sm_syslog(LOG_INFO, NOQID,
“SMTP connect from %.100s (%.100s)”,
CurSmtpClient, anynet_ntoa(&RealHostAddr)); }
#endif
/* output the first line, inserting “ESMTP” as second word */ expand(SmtpGreeting, inp, sizeof inp, e);
p = strchr(inp, ‘\n’);
if (p != NULL)
*p++ = ‘\0’;
id = strchr(inp, ‘ ‘); if (id == NULL)
id = &inp[strlen(inp)];
cmd = p == NULL ? “220 %.*s ESMTP%s” : “220-%.*s ESMTP%s”; message(cmd, id – inp, inp, id);
/* output remaining lines */
while ((id = p) != NULL && (p = strchr(id, ‘\n’)) != NULL) {
*p++ = ‘\0’;
if (isascii(*id) && isspace(*id))
cmd < &cmdbuf[sizeof cmdbuf - 2]) *cmd++ = *p++;
*cmd = '\0';
/* throw away leading whitespace */ while (isascii(*p) && isspace(*p))
p++;
/* decode command */
for (c = CmdTab; c->cmdname != NULL; c++) {
if (!strcasecmp(c->cmdname, cmdbuf)) break;
}
/* reset errors */ errno = 0;
/*
** Process command.
**
** If we are running as a null server, return 550 ** to everything.
*/
if (nullserver) {
switch (c->cmdcode) {
case CMDQUIT:
case CMDHELO: case CMDEHLO: case CMDNOOP:
/* process normally */ break;
default:
if (++badcommands > MAXBADCOMMANDS)
sleep(1);
usrerr(“550 Access denied”); continue;
while (isascii(*p) && isspace(*p)) p++;
if (*p == ‘\0’) break;
kp = p;
/* skip to the value portion */
while ((isascii(*p) && isalnum(*p)) || *p == ‘-‘)
p++;
if (*p == ‘=’) {
*p++ = ‘\0’; vp = p;
/* skip to the end of the value */ while (*p != ‘\0’ && *p != ‘ ‘ &&
!(isascii(*p) && iscntrl(*p)) &&
*p != ‘=’) p++;
}
if (*p != ‘\0’) *p++ = ‘\0’;
if (tTd(19, 1))
printf(“RCPT: got arg %s=\”%s\”\n”, kp,
vp == NULL ? “
rcpt_esmtp_args(a, kp, vp, e); if (Errors > 0)
break; }
if (Errors > 0) break;
/* save in recipient list after ESMTP mods */ a = recipient(a, &e->e_sendqueue, 0, e);
if (Errors > 0)
break;
/* no errors during parsing, but might be a duplicate */ e->e_to = a->q_paddr;
if (!bitset(QBADADDR, a->q_flags))
{
message(“250 Recipient ok%s”, bitset(QQUEUEUP, a->q_flags) ?
” (will queue)” : “”); nrcpts++;
} else {
} }
/*
switch (c->cmdcode) {
non-null server */
case CMDMAIL: case CMDEXPN: case CMDVRFY:
/* punt — should keep message in ADDRESS…. */
•
– – –
Code Auditing
Convince someone else your source code is correct Benefit: humans can generalize beyond single runs Drawbacks: Expensive, hard, no guarantees
Current Practice (cont’d)
???
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
extern void checksmtpattack __P((volatile int *, int, char *, ENVELOPE *));
if (fileno(OutChannel) != fileno(stdout)) {
/* arrange for debugging output to go to remote host */
(void) dup2(fileno(OutChannel), fileno(stdout)); }
settime(e);
peerhostname = RealHostName; if (peerhostname == NULL)
peerhostname = “localhost”; CurHostName = peerhostname; CurSmtpClient = macvalue(‘_’, e); if (CurSmtpClient == NULL)
CurSmtpClient = CurHostName;
setproctitle(“server %s startup”, CurSmtpClient); #if DAEMON
if (LogLevel > 11) {
/* log connection information */ sm_syslog(LOG_INFO, NOQID,
“SMTP connect from %.100s (%.100s)”,
CurSmtpClient, anynet_ntoa(&RealHostAddr)); }
#endif
/* output the first line, inserting “ESMTP” as second word */ expand(SmtpGreeting, inp, sizeof inp, e);
p = strchr(inp, ‘\n’);
if (p != NULL)
*p++ = ‘\0’;
id = strchr(inp, ‘ ‘); if (id == NULL)
id = &inp[strlen(inp)];
cmd = p == NULL ? “220 %.*s ESMTP%s” : “220-%.*s ESMTP%s”; message(cmd, id – inp, inp, id);
/* output remaining lines */
while ((id = p) != NULL && (p = strchr(id, ‘\n’)) != NULL) {
*p++ = ‘\0’;
if (isascii(*id) && isspace(*id))
cmd < &cmdbuf[sizeof cmdbuf - 2]) *cmd++ = *p++;
*cmd = '\0';
/* throw away leading whitespace */ while (isascii(*p) && isspace(*p))
p++;
/* decode command */
for (c = CmdTab; c->cmdname != NULL; c++) {
if (!strcasecmp(c->cmdname, cmdbuf)) break;
}
/* reset errors */ errno = 0;
/*
** Process command.
**
** If we are running as a null server, return 550 ** to everything.
*/
if (nullserver) {
switch (c->cmdcode) {
case CMDQUIT:
case CMDHELO: case CMDEHLO: case CMDNOOP:
/* process normally */ break;
default:
if (++badcommands > MAXBADCOMMANDS)
sleep(1);
usrerr(“550 Access denied”); continue;
while (isascii(*p) && isspace(*p)) p++;
if (*p == ‘\0’) break;
kp = p;
/* skip to the value portion */
while ((isascii(*p) && isalnum(*p)) || *p == ‘-‘)
p++;
if (*p == ‘=’) {
*p++ = ‘\0’; vp = p;
/* skip to the end of the value */ while (*p != ‘\0’ && *p != ‘ ‘ &&
!(isascii(*p) && iscntrl(*p)) &&
*p != ‘=’) p++;
}
if (*p != ‘\0’) *p++ = ‘\0’;
if (tTd(19, 1))
printf(“RCPT: got arg %s=\”%s\”\n”, kp,
vp == NULL ? “
rcpt_esmtp_args(a, kp, vp, e); if (Errors > 0)
break; }
if (Errors > 0) break;
/* save in recipient list after ESMTP mods */ a = recipient(a, &e->e_sendqueue, 0, e);
if (Errors > 0)
break;
/* no errors during parsing, but might be a duplicate */ e->e_to = a->q_paddr;
if (!bitset(QBADADDR, a->q_flags))
{
message(“250 Recipient ok%s”, bitset(QQUEUEUP, a->q_flags) ?
” (will queue)” : “”); nrcpts++;
} else {
} }
/*
switch (c->cmdcode) {
non-null server */
case CMDMAIL: case CMDEXPN: case CMDVRFY:
/* punt — should keep message in ADDRESS…. */
If You’re Worried about Security…
If You’re Worried about Security…
A malicious adversary is trying to exploit anything you miss!
What more can we do?
•
Analyze program’s code without running it
Static analysis
In a sense, we are asking a computer to do what a
•
human might do during a code review
•
Analyze program’s code without running it
Static analysis
In a sense, we are asking a computer to do what a Benefit is (much) higher coverage
•
human might do during a code review
•
–
– –
Reason about many possible runs of the program Sometimes all of them, providing a guarantee
Reason about incomplete programs (e.g., libraries)
•
Analyze program’s code without running it
•
• • •
Static analysis
In a sense, we are asking a computer to do what a Benefit is (much) higher coverage
•
human might do during a code review
•
–
– –
Reason about many possible runs of the program Sometimes all of them, providing a guarantee
Reason about incomplete programs (e.g., libraries)
Drawbacks
Can only analyze limited properties
May miss some errors, or have false alarms Can be time consuming to run
Impact
•
–
–
Impact
Thoroughly check limited but useful properties
Eliminate categories of errors
Developers can concentrate on deeper reasoning
•
–
–
•
–
–
–
Impact
Thoroughly check limited but useful properties
Eliminate categories of errors
Developers can concentrate on deeper reasoning Encourages better development practices
Develop programming models that avoid mistakes in the first place
Encourage programmers to think about and make manifest their assumptions
Using annotations that improve tool precision
•
–
–
•
–
–
–
•
Impact
Thoroughly check limited but useful properties
Eliminate categories of errors
Developers can concentrate on deeper reasoning Encourages better development practices
Develop programming models that avoid mistakes in the first place
Encourage programmers to think about and make manifest their assumptions
Using annotations that improve tool precision Seeing increased commercial adoption
What is
Static Analysis?
The Halting Problem
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/
The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem
•
program P and inputs to it, P will terminate
•
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/
The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem
program P
•
program P and inputs to it, P will terminate
•
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/
•
The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem
program P analyzer
•
program P and inputs to it, P will terminate
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/
The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem
•
program P and inputs to it, P will terminate
•
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
Always terminates?
program P analyzer
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/
•
The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem
Always terminates?
program P analyzer
Unfortunately, the halting problem is undecidable
That is, it is impossible to write such an analyzer: it will fail to produce an answer for at least some programs (and/or some inputs)
•
program P and inputs to it, P will terminate
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
•
•
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/
Other properties?
•
•
Perhaps security-related properties are feasible E.g., that all accesses a[i] are in bounds
Other properties?
•
•
Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the
•
halting problem by transforming the program
•
•
Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the
•
halting problem by transforming the program
I.e., a perfect array bounds checker could solve the
•
halting problem, which is impossible!
•
•
Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the I.e., a perfect array bounds checker could solve the
•
halting problem by transforming the program
•
Other undecidable properties (Rice’s theorem)
•
halting problem, which is impossible!
•
•
Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the I.e., a perfect array bounds checker could solve the
•
halting problem by transforming the program
•
–
Other undecidable properties (Rice’s theorem) Does this SQL string come from a tainted source?
•
halting problem, which is impossible!
•
•
Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the I.e., a perfect array bounds checker could solve the
•
halting problem by transforming the program
•
–
–
Other undecidable properties (Rice’s theorem) Does this SQL string come from a tainted source? Is this pointer used after its memory is freed?
•
halting problem, which is impossible!
•
•
Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the I.e., a perfect array bounds checker could solve the
•
halting problem by transforming the program
•
–
– –
Other undecidable properties (Rice’s theorem) Does this SQL string come from a tainted source? Is this pointer used after its memory is freed?
Do any variables experience data races?
•
halting problem, which is impossible!
Halting ≈ Index in Bounds
Proof by transformation
•
Halting ≈ Index in Bounds
Proof by transformation
Change indexing expressions a[i] to exit
(i >= 0 && i < a.length) ? a[i] : exit()
Now all array bounds errors instead result in termination
•
•
-
-
Halting ≈ Index in Bounds
Proof by transformation
Change indexing expressions a[i] to exit
(i >= 0 && i < a.length) ? a[i] : exit()
Now all array bounds errors instead result in termination
Change program exit points to out-of-bounds accesses
a[a.length+10]
•
•
-
- •
-
Halting ≈ Index in Bounds
Proof by transformation
Change indexing expressions a[i] to exit
(i >= 0 && i < a.length) ? a[i] : exit()
Now all array bounds errors instead result in termination
Change program exit points to out-of-bounds accesses
a[a.length+10]
Now if the array bounds checker
•
•
-
•
- •
-
Halting ≈ Index in Bounds
Proof by transformation
Change indexing expressions a[i] to exit
(i >= 0 && i < a.length) ? a[i] : exit()
Now all array bounds errors instead result in termination
Change program exit points to out-of-bounds accesses
a[a.length+10]
Now if the array bounds checker
... finds an error, then the original program halts
•
•
-
•
•
- •
-
Halting ≈ Index in Bounds
Proof by transformation
Change indexing expressions a[i] to exit
(i >= 0 && i < a.length) ? a[i] : exit()
Now all array bounds errors instead result in termination
Change program exit points to out-of-bounds accesses
a[a.length+10]
Now if the array bounds checker
... finds an error, then the original program halts
•
•
-
•
•
- •
-
... claims there are no such errors, then the original
•
program does not halt
Halting ≈ Index in Bounds
Proof by transformation
Change indexing expressions a[i] to exit
(i >= 0 && i < a.length) ? a[i] : exit()
Now all array bounds errors instead result in termination
Change program exit points to out-of-bounds accesses
a[a.length+10]
Now if the array bounds checker
... finds an error, then the original program halts
... contradiction!
with undecidability of the halting problem
•
•
-
•
•
- •
-
... claims there are no such errors, then the original
•
program does not halt
•
-
Static analysis is impossible?
Static analysis is impossible?
Perfect static analysis is not possible
•
Static analysis is impossible?
Perfect static analysis is not possible
Useful static analysis is perfectly possible, despite
• •
Static analysis is impossible?
Perfect static analysis is not possible
Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or
• •
Static analysis is impossible?
Perfect static analysis is not possible
Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or
2. False alarms - claimed errors are not really errors, or
• •
Static analysis is impossible?
Perfect static analysis is not possible
Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or
2. False alarms - claimed errors are not really errors, or 3. Missed errors - no error reports ≠ error free
• •
Static analysis is impossible?
Perfect static analysis is not possible
Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or
2. False alarms - claimed errors are not really errors, or 3. Missed errors - no error reports ≠ error free
• •
Nonterminating analyses are confusing, so tools tend
•
to exhibit only false alarms and/or missed errors
Static analysis is impossible?
Perfect static analysis is not possible
Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or
2. False alarms - claimed errors are not really errors, or 3. Missed errors - no error reports ≠ error free
Fall somewhere between soundness and completeness
• •
Nonterminating analyses are confusing, so tools tend
•
to exhibit only false alarms and/or missed errors
•
Soundness
If analysis says that X is true, then X is true.
Soundness
If analysis says that X is true, then X is true.
Things I say
Soundness
If analysis says that X is true, then X is true.
True things
Trivially Sound: Say nothing
Things I say
Soundness Completeness
If analysis says that X is If X is true, then analysis true, then X is true. says X is true.
True things
Trivially Sound: Say nothing
Things I say
Soundness Completeness
If analysis says that X is If X is true, then analysis true, then X is true. says X is true.
True things
Trivially Sound: Say nothing
Things I say
Things I say
Soundness Completeness
If analysis says that X is true, then X is true.
True things
Trivially Sound: Say nothing
If X is true, then analysis says X is true.
Things I say
True things
Things I say
Trivially Complete: Say everything
Soundness Completeness
If analysis says that X is If X is true, then analysis true, then X is true. says X is true.
ThruinegsthIinsgasy are all True things
Sound and Complete:
Say exactly the set of true things
Stepping back
Stepping back
Soundness: if the program is claimed to be error Alarms do not imply erroneousness
•
free, then it really is
•
•
Stepping back
Soundness: if the program is claimed to be error Alarms do not imply erroneousness
•
free, then it really is
Completeness: if the program is claimed to be Silence does not imply error freedom
•
erroneous, then it really is
•
•
Stepping back
Soundness: if the program is claimed to be error Alarms do not imply erroneousness
•
free, then it really is
Completeness: if the program is claimed to be Silence does not imply error freedom
•
erroneous, then it really is
•
•
•
•
Essentially, most interesting analyses
are neither sound nor complete (and not both)
... usually lean toward soundness (“soundy”) or completeness
The Art of Static Analysis
The Art of Static Analysis
Analysis design tradeoffs
•
The Art of Static Analysis
Analysis design tradeoffs
•
Precision: Carefully model program behavior, to
•
minimize false alarms
The Art of Static Analysis
Analysis design tradeoffs
•
Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs
•
minimize false alarms
•
The Art of Static Analysis
Analysis design tradeoffs
Scalability: Successfully analyze large programs Understandability: Error reports should be actionable
•
Precision: Carefully model program behavior, to
•
minimize false alarms
• •
The Art of Static Analysis
Analysis design tradeoffs
•
• •
•
Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs
•
minimize false alarms
Understandability: Error reports should be actionable Observation: Code style is important
The Art of Static Analysis
Analysis design tradeoffs
•
•
•
Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs
•
minimize false alarms
• •
Observation: Code style is important Aim to be precise for “good” programs
Understandability: Error reports should be actionable
The Art of Static Analysis
Analysis design tradeoffs
•
Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs
•
minimize false alarms
• •
•
•
•
False alarms viewed positively: reduces complexity
•
Understandability: Error reports should be actionable Observation: Code style is important
Aim to be precise for “good” programs It’s OK to forbid yucky code in the name of safety
The Art of Static Analysis
Analysis design tradeoffs
•
Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs
•
minimize false alarms
• •
•
•
•
False alarms viewed positively: reduces complexity
•
Understandability: Error reports should be actionable Observation: Code style is important
Aim to be precise for “good” programs It’s OK to forbid yucky code in the name of safety
Code that is more understandable to the analysis is more
•
understandable to humans
Flow Analysis
Flow Analysis
Tainted Flow Analysis
•
Tainted Flow Analysis
The root cause of many attacks is trusting unvalidated input
•
•
Tainted Flow Analysis
The root cause of many attacks is trusting unvalidated input
Input from the user is tainted
•
• •
Tainted Flow Analysis
The root cause of many attacks is trusting unvalidated input
Input from the user is tainted
Various data is used, assuming it is untainted
•
• •
•
•
Tainted Flow Analysis
The root cause of many attacks is trusting unvalidated input
Input from the user is tainted
Various data is used, assuming it is untainted
Examples expecting untainted data
source string of strcpy (≤ target buffer size)
•
• •
•
•
•
Tainted Flow Analysis
The root cause of many attacks is trusting unvalidated input
Input from the user is tainted
Various data is used, assuming it is untainted
Examples expecting untainted data
source string of strcpy (≤ target buffer size)
format string of printf (contains no format specifiers)
•
• •
•
•
• •
Tainted Flow Analysis
The root cause of many attacks is trusting unvalidated input
Input from the user is tainted
Various data is used, assuming it is untainted
Examples expecting untainted data
source string of strcpy (≤ target buffer size)
format string of printf (contains no format specifiers)
form field used in constructed SQL query (contains no SQL commands)
Recall: Format String Attack
Recall: Format String Attack
Adversary-controlled format string
•
char *name = fgets(..., network_fd); printf(name); // Oops
Recall: Format String Attack
Adversary-controlled format string
•
char *name = fgets(..., network_fd); printf(name); // Oops
•
Attacker sets name = “%s%s%s” to crash program
Recall: Format String Attack
Adversary-controlled format string
•
char *name = fgets(..., network_fd); printf(name); // Oops
•
•
-
Attacker sets name = “%s%s%s” to crash program
Attacker sets name = “...%n...” to write to memory Yields code injection exploits
Recall: Format String Attack
Adversary-controlled format string
•
char *name = fgets(..., network_fd); printf(name); // Oops
•
•
Attacker sets name = “%s%s%s” to crash program Attacker sets name = “...%n...” to write to memory
Yields code injection exploits
These bugs still occur in the wild
Too restrictive to forbid non-constant format strings
•
•
-
•
The problem, in types Specify our requirement as a type qualifier
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
•
The problem, in types Specify our requirement as a type qualifier
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
•
tainted = possibly controlled by adversary
•
The problem, in types Specify our requirement as a type qualifier
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
• •
tainted = possibly controlled by adversary untainted = must not be controlled by adversary
•
The problem, in types Specify our requirement as a type qualifier
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
• •
tainted = possibly controlled by adversary untainted = must not be controlled by adversary
tainted char *name = fgets(...,network_fd); printf(name); // FAIL: tainted ≠ untainted
•
No tainted data flows: For all possible inputs, prove that tainted data will never be used where untainted data is expected
Analysis problem
•
•
Analysis problem
No tainted data flows: For all possible inputs, prove
that tainted data will never be used where untainted
data is expected
untainted annotation: indicates a trusted sink
•
• •
Analysis problem
No tainted data flows: For all possible inputs, prove
that tainted data will never be used where untainted
data is expected
untainted annotation: indicates a trusted sink tainted annotation: an untrusted source
•
• • •
Analysis problem
No tainted data flows: For all possible inputs, prove
that tainted data will never be used where untainted
data is expected
untainted annotation: indicates a trusted sink tainted annotation: an untrusted source
no annotation means: not sure (analysis figures it out)
•
• • •
•
Analysis problem
No tainted data flows: For all possible inputs, prove
that tainted data will never be used where untainted
data is expected
untainted annotation: indicates a trusted sink tainted annotation: an untrusted source
no annotation means: not sure (analysis figures it out)
A solution requires inferring flows in the program
•
• • •
•
•
Analysis problem
No tainted data flows: For all possible inputs, prove
that tainted data will never be used where untainted
data is expected
untainted annotation: indicates a trusted sink tainted annotation: an untrusted source
no annotation means: not sure (analysis figures it out)
A solution requires inferring flows in the program What sources can reach what sinks
•
• • •
•
•
Analysis problem
No tainted data flows: For all possible inputs, prove
that tainted data will never be used where untainted
data is expected
untainted annotation: indicates a trusted sink tainted annotation: an untrusted source
no annotation means: not sure (analysis figures it out)
A solution requires inferring flows in the program What sources can reach what sinks
If any flows are illegal, i.e., whether a tainted source
•
may flow to an untainted sink
•
• • •
•
•
•
Analysis problem
No tainted data flows: For all possible inputs, prove
that tainted data will never be used where untainted
data is expected
untainted annotation: indicates a trusted sink tainted annotation: an untrusted source
no annotation means: not sure (analysis figures it out)
A solution requires inferring flows in the program What sources can reach what sinks
If any flows are illegal, i.e., whether a tainted source We will aim to develop a sound analysis
•
may flow to an untainted sink
Legal Flow
Legal Flow
void f(tainted int);
untainted int a = ...;
f(a);
Legal Flow
void f(tainted int);
untainted int a = ...;
f(a);
f accepts tainted or untainted data
Legal Flow Illegal Flow
void f(tainted int);
untainted int a = ...;
f(a);
f accepts tainted or untainted data
Legal Flow Illegal Flow
void f(tainted int);
untainted int a = ...;
f(a);
void g(untainted int);
tainted int b = ...;
g(b);
f accepts tainted or untainted data
Legal Flow
Illegal Flow
void f(tainted int);
untainted int a = ...;
f(a);
void g(untainted int);
tainted int b = ...;
g(b);
f accepts tainted or untainted data
g accepts only untainted data
Legal Flow f accepts tainted or
untainted data
Allowed flow as a
lattice
Illegal Flow
void f(tainted int);
untainted int a = ...;
f(a);
void g(untainted int);
tainted int b = ...;
g(b);
g accepts only untainted data
untainted < tainted
Legal Flow Illegal Flow
void f(tainted int);
untainted int a = ...;
f(a);
void g(untainted int);
tainted int b = ...;
g(b);
untainted ≤ tainted
Allowed flow as a
lattice
tainted ≤ untainted untainted < tainted
Legal Flow
Illegal Flow
void f(tainted int);
untainted int a = ...;
f(a);
void g(untainted int);
tainted int b = ...;
g(b);
untainted ≤ tainted
Allowed flow as a
lattice
tainted ≤ untainted tainted
untainted
•
•
Analysis Approach
Think of flow analysis as a kind of type inference If no qualifier is present, we must infer it
•
•
•
•
Analysis Approach Think of flow analysis as a kind of type inference
If no qualifier is present, we must infer it
Steps:
Create a name for each missing qualifier (e.g., α, β)
•
•
•
•
Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate
Analysis Approach Think of flow analysis as a kind of type inference
If no qualifier is present, we must infer it Steps:
•
constraints (of the form q1 ≤ q2) on possible solutions
•
•
•
•
Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate
Analysis Approach Think of flow analysis as a kind of type inference
If no qualifier is present, we must infer it Steps:
•
constraints (of the form q1 ≤ q2) on possible solutions
Statement x = y generates constraint qy ≤ qx where qy is y’s qualifier
-
and qx is x’s qualifier
•
•
•
•
Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate
Analysis Approach Think of flow analysis as a kind of type inference
If no qualifier is present, we must infer it Steps:
Statement x = y generates constraint qy ≤ qx where qy is y’s qualifier Solve the constraints to produce solutions for α, β, etc.
•
constraints (of the form q1 ≤ q2) on possible solutions
•
-
and qx is x’s qualifier
•
•
•
•
Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate
Analysis Approach Think of flow analysis as a kind of type inference
If no qualifier is present, we must infer it Steps:
•
constraints (of the form q1 ≤ q2) on possible solutions
•
-
Statement x = y generates constraint qy ≤ qx where qy is y’s qualifier Solve the constraints to produce solutions for α, β, etc.
A solution is a substitution of qualifiers (like tainted or untainted) for names (like α and β) such that all of the constraints are legal flows
-
and qx is x’s qualifier
•
•
•
•
Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate
•
Analysis Approach Think of flow analysis as a kind of type inference
If no qualifier is present, we must infer it Steps:
•
constraints (of the form q1 ≤ q2) on possible solutions
Statement x = y generates constraint qy ≤ qx where qy is y’s qualifier Solve the constraints to produce solutions for α, β, etc.
A solution is a substitution of qualifiers (like tainted or untainted) for names (like α and β) such that all of the constraints are legal flows
If there is no solution, we (may) have an illegal flow
•
-
-
and qx is x’s qualifier
Example Analysis
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
char *name = fgets(..., network_fd);
char *x = name;
printf(x);
Example Analysis
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x = name; printf(x);
Example Analysis
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x = name; printf(x);
tainted ≤ α
Example Analysis
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x = name; printf(x);
tainted ≤ α α≤β
Example Analysis
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x = name; printf(x);
tainted ≤ α α≤β
β ≤ untainted
Example Analysis
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x = name; printf(x);
tainted ≤ α ≤ β ≤ untainted
Example Analysis
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x = name; printf(x);
tainted ≤ α ≤ β ≤ untainted
First constraint requires α = tainted
To satisfy the second constraint implies β = tainted
But then the third constraint is illegal: tainted ≤ untainted
Example Analysis
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x = name; printf(x);
tainted ≤ α ≤ β ≤ untainted
Illegal flow!
No possible solution for α and β
Flow Analysis: Adding Sensitivity
Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
if (...) x = name; else x = “hello!”; printf(x);
→
Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
if (...) x = name; else x = “hello!”; printf(x);
tainted ≤ α
Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
if (...) x = name; else x = “hello!”; printf(x);
→
tainted ≤ α α≤β
Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
if (...) x = name; else x = “hello!”; printf(x);
→
tainted ≤ α α≤β
untainted ≤ β
Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
if (...) x = name; else x = “hello!”; printf(x);
→
tainted ≤ α α≤β
untainted ≤ β β ≤ untainted
Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
if (...) x = name; else x = “hello!”; printf(x);
→
tainted ≤ α α≤β
β ≤ untainted
Constraints still unsolvable
Illegal flow
Dropping the Conditional
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
x = name;
x = “hello!”; printf(x);
Dropping the Conditional
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
→
α char *name = fgets(..., network_fd);
β char *x;
x = name;
x = “hello!”; printf(x);
tainted ≤ α
Dropping the Conditional
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
x = name;
x = “hello!”; printf(x);
→
tainted ≤ α α≤β
Dropping the Conditional
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
x = name;
x = “hello!”; printf(x);
→
tainted ≤ α α≤β
untainted ≤ β
Dropping the Conditional
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x;
x = name;
x = “hello!”; printf(x);
→
tainted ≤ α α≤β
untainted ≤ β β ≤ untainted
Same constraints, different semantics!
False Alarm
•
Flow Sensitivity Our analysis is flow insensitive
•
•
Flow Sensitivity Our analysis is flow insensitive
Each variable has one qualifier which abstracts the taintedness of all values it ever contains
•
•
Flow Sensitivity Our analysis is flow insensitive
Each variable has one qualifier which abstracts the taintedness of all values it ever contains
A flow sensitive analysis would account for
•
variables whose contents change
•
•
Flow Sensitivity Our analysis is flow insensitive
Each variable has one qualifier which abstracts the taintedness of all values it ever contains
A flow sensitive analysis would account for Allow each assigned use of a variable to have a
•
variables whose contents change
•
different qualifier
E.g., α1 is x’s qualifier at line 1, but α2 is the qualifier at line 2, where
-
α1 and α2 can differ
•
•
Flow Sensitivity Our analysis is flow insensitive
Each variable has one qualifier which abstracts the taintedness of all values it ever contains
A flow sensitive analysis would account for Allow each assigned use of a variable to have a
•
variables whose contents change
•
different qualifier
•
-
E.g., α1 is x’s qualifier at line 1, but α2 is the qualifier at line 2, where Could implement this by transforming the program to
assign to a variable at most once Called static single assignment (SSA) form
-
α1 and α2 can differ
Reworked Example
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x1, γ *x2; x1 = name;
x2 = “%s”; printf(x2);
→
Reworked Example
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x1, γ *x2; x1 = name;
x2 = “%s”; printf(x2);
tainted ≤ α
Reworked Example
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x1, γ *x2; x1 = name;
x2 = “%s”; printf(x2);
→
tainted ≤ α α≤β
Reworked Example
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x1, γ *x2; x1 = name;
x2 = “%s”; printf(x2);
→
tainted ≤ α α≤β
untainted ≤ γ
Reworked Example
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
α char *name = fgets(..., network_fd);
β char *x1, γ *x2; x1 = name;
x2 = “%s”; printf(x2);
→
tainted ≤ α α ≤ β
untainted ≤ γ γ ≤ untainted
No Alarm
Good solution exists:
γ = untainted α = β = tainted
Multiple Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
void f(int x) { α char *y;
if (x) y = “hello!”;
else y = fgets(..., network_fd);
if (x) printf(y);
}
Multiple Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
void f(int x) { α char *y;
if (x) y = “hello!”;
else y = fgets(..., network_fd);
if (x) printf(y);
}
→
untainted ≤ α
Multiple Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
void f(int x) { α char *y;
if (x) y = “hello!”;
else y = fgets(..., network_fd);
if (x) printf(y);
}
→
untainted ≤ α tainted ≤ α
Multiple Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
void f(int x) { α char *y;
if (x) y = “hello!”;
else y = fgets(..., network_fd);
if (x) printf(y);
}
→
untainted ≤ α tainted ≤ α
α ≤ untainted
Multiple Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
void f(int x) { α char *y;
if (x) y = “hello!”;
else y = fgets(..., network_fd);
if (x) printf(y);
}
→
tainted ≤ α ≤ untainted
no solution for α
Multiple Conditionals
int printf(untainted char *fmt, ...);
tainted char *fgets(...);
void f(int x) { α char *y;
if (x) y = “hello!”;
else y = fgets(..., network_fd);
if (x) printf(y);
}
X
→
no solution for α False Alarm!
tainted ≤ α ≤ untainted
(and flow sensitivity won’t help)
Path Sensitivity
An analysis may consider path feasibility. E.g.,
•
f(x) can execute path
void f(int x) {
char *y;
1if (x) 2y = “hello!”; else 3y = fgets(...);
4if (x) 5printf(y); 6}
Path Sensitivity
An analysis may consider path feasibility. E.g.,
1-2-4-5-6 when x is not 0, or
•
f(x) can execute path
void f(int x) {
char *y;
1if (x) 2y = “hello!”; else 3y = fgets(...);
4if (x) 5printf(y); 6}
•
Path Sensitivity
An analysis may consider path feasibility. E.g.,
1-2-4-5-6 when x is not 0, or 1-3-4-6 when x is 0. But,
•
f(x) can execute path
void f(int x) {
char *y;
1if (x) 2y = “hello!”; else 3y = fgets(...);
4if (x) 5printf(y); 6}
• •
Path Sensitivity
An analysis may consider path feasibility. E.g.,
1-2-4-5-6 when x is not 0, or 1-3-4-6 when x is 0. But,
path 1-3-4-5-6 infeasible
•
f(x) can execute path
void f(int x) {
char *y;
1if (x) 2y = “hello!”; else 3y = fgets(...);
4if (x) 5printf(y); 6}
•
• •
•
• •
Path Sensitivity
An analysis may consider path feasibility. E.g.,
1-2-4-5-6 when x is not 0, or 1-3-4-6 when x is 0. But,
path 1-3-4-5-6 infeasible
•
f(x) can execute path
void f(int x) {
char *y;
1if (x) 2y = “hello!”; else 3y = fgets(...);
4if (x) 5printf(y); 6}
A path sensitive analysis checks feasibility, e.g.,
•
by qualifying each constraint with a path condition
• • •
x ≠ 0 ⟹ untainted ≤ α x = 0 ⟹ tainted ≤ α
x ≠ 0 ⟹ α ≤ untainted
(segment 1-2) (segment 1-3)
(segment 4-5)
Why not flow/path sensitivity?
Why not flow/path sensitivity?
Flow sensitivity adds precision, and path sensitivity
•
adds even more, which is good
Why not flow/path sensitivity?
Flow sensitivity adds precision, and path sensitivity But both of these make solving more difficult
•
adds even more, which is good
•
•
Flow sensitivity also increases the number of nodes in the constraint graph
Path sensitivity requires more general solving
•
procedures to handle path conditions
Why not flow/path sensitivity?
•
•
Flow sensitivity also increases the number of nodes in the constraint graph
In short: precision (often) trades off scalability Ultimately, limits the size of programs we can analyze
•
•
Flow sensitivity adds precision, and path sensitivity But both of these make solving more difficult
•
adds even more, which is good
Path sensitivity requires more general solving
•
procedures to handle path conditions
Handling Function Calls
char *id( char *x) {
return x; }
α char *a = fgets(...); β char *b = id(a);
Handling Function Calls
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
•
Names for arguments and return value
Handling Function Calls
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
•
•
•
•
Names for arguments and return value
Calls create flows
from caller’s data to callee’s arguments, from callee’s result to caller’s returned value
Handling Function Calls
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
Handling Function Calls
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
tainted ≤ α
Handling Function Calls
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
tainted ≤ α α≤γ
Handling Function Calls
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
tainted ≤ α α≤γ
γ≤δ
Handling Function Calls
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
tainted ≤ α α≤γ
γ≤δ δ≤β
Function Call Example
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = “hi”; printf(c);
→
Function Call Example
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = “hi”; printf(c);
tainted ≤ α
Function Call Example
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = “hi”; printf(c);
→
tainted ≤ α α≤γ
γ≤δ δ≤β
Function Call Example
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = “hi”; printf(c);
→
tainted ≤ α α≤γ
γ≤δ
δ≤β untainted ≤ ω
→
Function Call Example
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = “hi”; printf(c);
tainted ≤ α α≤γ
γ≤δ
δ≤β untainted ≤ ω
No Alarm
Good solution exists:
ω = untainted
α = β = γ = δ = tainted
ω ≤ untainted
Two Calls to Same Function
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = id(“hi”); printf(c);
Two Calls to Same Function
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = id(“hi”); printf(c);
→
tainted ≤ α α≤γ
γ≤δ δ≤β
Two Calls to Same Function
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = id(“hi”); printf(c);
→
tainted ≤ α α≤γ
γ≤δ
δ≤β untainted ≤ γ
δ≤ω
Two Calls to Same Function
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = id(“hi”); printf(c);
→
tainted ≤ α α≤γ
γ≤δ
δ≤β untainted ≤ γ
δ≤ω
ω ≤ untainted
Two Calls to Same Function
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = id(“hi”); printf(c);
→
tainted ≤ α ≤ γ ≤ δ ≤ ω ≤ untainted False Alarm!
No solution, and yet no true tainted flow
Two Calls to Same Function
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = id(“hi”); printf(c);
tainted ≤ α ≤ γ ≤ δ ≤ ω ≤ untainted False Alarm!
No solution, and yet no true tainted flow
Two Calls to Same Function
δ char *id(γ char *x) { return x;
}
α char *a = fgets(...); β char *b = id(a);
ω char *c = id(“hi”); printf(c);
Problematic constraints represent an infeasible path
False Alarm!
No solution, and yet no true tainted flow
tainted ≤ α ≤ γ ≤ δ ≤ ω ≤ untainted
•
•
Context (In)sensitivity
This is a problem of context insensitivity All call sites are “conflated” in the graph
•
•
•
Context (In)sensitivity
This is a problem of context insensitivity All call sites are “conflated” in the graph
Context sensitivity solves this problem by
•
•
•
•
Context (In)sensitivity
This is a problem of context insensitivity All call sites are “conflated” in the graph
Context sensitivity solves this problem by distinguishing call sites in some way
•
•
•
•
-
Context (In)sensitivity
This is a problem of context insensitivity All call sites are “conflated” in the graph
Context sensitivity solves this problem by
distinguishing call sites in some way
We can give them a label i, e.g., the line number in the program
•
•
•
•
- •
Context (In)sensitivity
This is a problem of context insensitivity All call sites are “conflated” in the graph
Context sensitivity solves this problem by distinguishing call sites in some way
We can give them a label i, e.g., the line number in the program matching up calls with the corresponding returns
•
•
•
•
- •
-
Context (In)sensitivity
This is a problem of context insensitivity All call sites are “conflated” in the graph
Context sensitivity solves this problem by distinguishing call sites in some way
We can give them a label i, e.g., the line number in the program
matching up calls with the corresponding returns Label call and return edges
•
•
•
•
- •
- -
Context (In)sensitivity
This is a problem of context insensitivity All call sites are “conflated” in the graph
Context sensitivity solves this problem by distinguishing call sites in some way
We can give them a label i, e.g., the line number in the program matching up calls with the corresponding returns
Label call and return edges
Allow flows if the labels and polarities match
•
•
•
•
- •
- - -
Context (In)sensitivity
This is a problem of context insensitivity All call sites are “conflated” in the graph
Context sensitivity solves this problem by distinguishing call sites in some way
We can give them a label i, e.g., the line number in the program matching up calls with the corresponding returns
Label call and return edges
Allow flows if the labels and polarities match Use index -i for argument passing, i.e., q1 ≤-i q2
•
•
•
•
- •
- - - -
Context (In)sensitivity
This is a problem of context insensitivity All call sites are “conflated” in the graph
Context sensitivity solves this problem by distinguishing call sites in some way
We can give them a label i, e.g., the line number in the program matching up calls with the corresponding returns
Label call and return edges
Allow flows if the labels and polarities match Use index -i for argument passing, i.e., q1 ≤-i q2 Use index +i for returned values, i.e., q1 ≤+i q2
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
tainted ≤ α
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
tainted ≤ α α ≤-1 γ
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
tainted ≤ α α ≤-1 γ
γ≤δ
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
tainted ≤ α α ≤-1 γ
γ≤δ
δ ≤+1 β
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
tainted ≤ α α ≤-1 γ
γ≤δ
δ ≤+1 β untainted ≤-2 γ
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
tainted ≤ α α ≤-1 γ
γ≤δ
δ ≤+1 β untainted ≤-2 γ
δ ≤+2 ω
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
tainted ≤ α α ≤-1 γ
γ≤δ
δ ≤+1 β untainted ≤-2 γ
δ ≤+2 ω
ω ≤ untainted
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
tainted ≤ α α ≤-1 γ
γ≤δ
δ ≤+2 ω
ω ≤ untainted
Two Calls to Same Function
α char *a = fgets(...); β char *b = id1(a);
ω char *c = id2(“hi”); printf(c);
δ char *id(γ char *x) { return x;
}
tainted ≤ α α ≤-1 γ
γ ≤ δ
Indexes don’t match up
Infeasible flow not allowed
No Alarm
δ ≤+2 ω
ω ≤ untainted
•
Discussion Context sensitivity is a tradeoff again
•
•
-
Discussion Context sensitivity is a tradeoff again
Precision vs. scalability
O(n) insensitive algorithm becomes O(n3) sensitive algorithm
•
•
- •
-
Discussion Context sensitivity is a tradeoff again
Precision vs. scalability
O(n) insensitive algorithm becomes O(n3) sensitive algorithm
But: sometimes higher precision improves performance Eliminates infeasible paths from consideration (makes n smaller)
Discussion Context sensitivity is a tradeoff again
Precision vs. scalability
O(n) insensitive algorithm becomes O(n3) sensitive algorithm
•
•
-
•
•
- •
Only some call sites treated sensitively Rest conflated
But: sometimes higher precision improves performance Eliminates infeasible paths from consideration (makes n smaller)
-
Compromises possible
Discussion Context sensitivity is a tradeoff again
Precision vs. scalability
O(n) insensitive algorithm becomes O(n3) sensitive algorithm
•
•
-
•
-
•
•
- •
Only some call sites treated sensitively Rest conflated
Conflate groups of call sites Give them the same index
But: sometimes higher precision improves performance Eliminates infeasible paths from consideration (makes n smaller)
-
Compromises possible
Discussion Context sensitivity is a tradeoff again
Precision vs. scalability
O(n) insensitive algorithm becomes O(n3) sensitive algorithm
•
•
-
•
-
•
-
•
•
- •
Only some call sites treated sensitively Rest conflated
Conflate groups of call sites Give them the same index
Sensitivity only up to a certain call depth Don’t do exact matching of edges beyond that depth
But: sometimes higher precision improves performance Eliminates infeasible paths from consideration (makes n smaller)
-
Compromises possible
Flow Analysis:
Scaling it up
to a complete language and problem set
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
→
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
→
untainted ≤ α
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
→
untainted ≤ α α≤β
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
→
untainted ≤ α α≤β
β≤γ
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
→
untainted ≤ α α≤β
β≤γ tainted ≤ ω
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
→
untainted ≤ α α≤β
β≤γ tainted ≤ ω
ω≤γ
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
→
untainted ≤ α α≤β
β≤γ tainted ≤ ω
ω≤γ
β ≤ untainted
→
untainted ≤ α α≤β
β≤γ tainted ≤ ω
ω≤γ
β ≤ untainted
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
Solution exists:
α = β = untainted ω = γ = tainted
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
→
Solution exists:
α = β = untainted ω = γ = tainted
Misses illegal flow!
• p and q are aliases
-so writing tainted data to q -makes p’s contents tainted
untainted ≤ α α≤β
β≤γ tainted ≤ ω
ω≤γ
β ≤ untainted
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
Solution exists:
α = β = untainted ω = γ = tainted
untainted ≤ α α≤β
β≤γ tainted ≤ ω
ω≤γ
β ≤ untainted
untainted ≤ α α≤β
β≤γ γ≤β
tainted ≤ ω ω≤γ
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
β ≤ untainted
Solution exists:
α = β = untainted ω = γ = tainted
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
untainted ≤ α α≤β
β≤γ γ≤β
tainted ≤ ω ω≤γ
β ≤ untainted
Solution exists:
α = β = untainted ω = γ = tainted
Pointers
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
untainted ≤ α α≤β
β≤γ γ≤β
tainted ≤ ω ω≤γ
β ≤ untainted
Solution exists:
α = β = untainted ω = γ = tainted
Pointers
Solution exists:
α = β = untainted ω = γ = tainted
α char *a = “hi”;
(β char *)*p = &a; (γ char *)*q = p;
ω char *b = fgets(...); *q = b;
printf(*p);
tainted ≤ ω ≤ γ
γ ≤ β ≤ untainted
•
•
•
An assignment via a pointer “flows both ways” Ensures that aliasing constraints are sound But can lead to false alarms
Flow and pointers
•
•
•
•
An assignment via a pointer “flows both ways” Ensures that aliasing constraints are sound But can lead to false alarms
Reducing alarms
Flow and pointers
•
•
•
•
•
An assignment via a pointer “flows both ways” Ensures that aliasing constraints are sound But can lead to false alarms
Reducing alarms
If pointers are never assigned to (const) then backward flow is not needed (sound)
Flow and pointers
•
•
•
•
•
•
-
Flow and pointers
An assignment via a pointer “flows both ways” Ensures that aliasing constraints are sound But can lead to false alarms
Reducing alarms
If pointers are never assigned to (const) then backward flow is not needed (sound)
Drop backward flow edge anyway
Trades false alarms for missed errors (unsoundness)
Implicit flows
void copy(tainted char *src,
untainted char *dst,
int len) {
untainted int i;
for (i = 0; i
Why not information flow?
Tracking implicit flows with a pc label can lead to false alarms
E.g., ignores values
Extra constraints also hurt performance
•
•
•
tainted int src;
α int dst;
if (src > 0) dst = 0; else dst = 0;
Why not information flow?
Tracking implicit flows with a pc label can lead to false alarms
E.g., ignores values
Extra constraints also hurt performance
Our copying example is pathological
We typically don’t write programs like this Implicit flows will have little overall influence
•
•
•
•
•
•
tainted int src;
α int dst;
if (src > 0) dst = 0; else dst = 0;
Why not information flow?
Tracking implicit flows with a pc label can lead to false alarms
E.g., ignores values
Extra constraints also hurt performance
Our copying example is pathological
We typically don’t write programs like this Implicit flows will have little overall influence
So: tainting analyses tend to ignore implicit flows
•
•
•
•
•
• •
tainted int src;
α int dst;
if (src > 0) dst = 0; else dst = 0;
Other challenges
•
•
Other challenges
Taint through operations
tainted a; untainted b; c=a+b — is c tainted? (yes, probably)
•
•
•
• •
Other challenges
Taint through operations
tainted a; untainted b; c=a+b — is c tainted? (yes, probably) Function pointers
What function can this call go to?
Can flow analysis to compute possible targets
•
•
•
• •
Other challenges
Taint through operations
tainted a; untainted b; c=a+b — is c tainted? (yes, probably) Function pointers
What function can this call go to?
Can flow analysis to compute possible targets
Struct fields
Track the taintedness of the whole struct, or each field?
Taintedness for each struct instance, or shared among all of them (or
something in between)?
Note: objects ≈ structs + function pointers
•
• •
–
•
•
•
• •
•
• •
–
•
Arrays
Other challenges
Taint through operations
tainted a; untainted b; c=a+b — is c tainted? (yes, probably) Function pointers
What function can this call go to?
Can flow analysis to compute possible targets
Struct fields
Track the taintedness of the whole struct, or each field?
Taintedness for each struct instance, or shared among all of them (or something in between)?
Note: objects ≈ structs + function pointers
Keep track of taintedness of each array element, or one element
•
representing the whole array?
Refining taint analysis
•
•
Refining taint analysis
Can label additional sources and sinks
Array bounds accesses: must have untainted index
•
•
•
•
Refining taint analysis
Can label additional sources and sinks
Array bounds accesses: must have untainted index
Can expand taint analysis to handle sanitizers Functions to convert tainted data to untainted data
•
•
•
•
•
•
– •
Refining taint analysis
Can label additional sources and sinks
Array bounds accesses: must have untainted index
Can expand taint analysis to handle sanitizers Functions to convert tainted data to untainted data
Other application: Leaking confidential data Don’t want secret sources to go to public sinks
Implicit flows more relevant in this setting Dual of tainting
Other kinds of analysis
•
•
Other kinds of analysis
Pointer Analysis (“points-to” analysis)
Determine whether pointers point to the same locations
Shares many elements of flow analysis. Really
•
advanced in the last 10 years.
•
•
•
Other kinds of analysis
Pointer Analysis (“points-to” analysis)
Determine whether pointers point to the same locations
Data Flow Analysis
Shares many elements of flow analysis. Really
•
advanced in the last 10 years.
Invented in the early 1970’s. Flow sensitive, tracks
•
“data flow facts” about variables in the program
•
•
•
•
Invented in the early 1970’s. Flow sensitive, tracks
Other kinds of analysis
Pointer Analysis (“points-to” analysis)
Determine whether pointers point to the same locations
Data Flow Analysis
Abstract interpretation
Shares many elements of flow analysis. Really
•
advanced in the last 10 years.
•
“data flow facts” about variables in the program
Invented in the late 1970’s as a theoretical foundation Associated with certain analysis algorithms
•
for data flow analysis, and static analysis generally.
•
Static analysis in practice
Caveat: appearance in the above list is not an implicit endorsement, and these are only a sample of available offerings
Static analysis in practice
Commercial products
Fortify
Caveat: appearance in the above list is not an implicit endorsement, and these are only a sample of available offerings
Static analysis in practice
Commercial products
Fortify
Open source tools
FindBugs
clang
analyzer &
KLEE
Caveat: appearance in the above list is not an implicit endorsement, and these are only a sample of available offerings
Learning more
Learning more
•
Secure Programming with Static Analysis, by Brian Chess, goes into more depth about how static analysis tools work, and can aid secure software development
•
Learning more
•
Secure Programming with Static Analysis, by Brian Chess, goes into more depth about how static analysis tools work, and can aid secure software development
Principles of Program Analysis, by Nielson, Nielson, and Hankin, is a formal, mathematical presentation of different analysis methods
A bit dense for the casual reader, but
•
good for introducing the academic field