程序代写代做代考 graph algorithm go C Static Analysis

Static Analysis
Static Analysis
Slide deck courtesy of Prof. Michael Hicks, University of Maryland, College Park (UMD)

Static Analysis
for Secure Development



Static Analysis
for Secure Development
Introduction
Static analysis: What, and why?





Static Analysis
for Secure Development Introduction
Static analysis: What, and why?
Basic analysis
Example: Flow analysis







Static Analysis
for Secure Development Introduction
Static analysis: What, and why? Basic analysis
Example: Flow analysis
Increasing precision
Context-, flow-, and path sensitivity









Static Analysis
for Secure Development Introduction
Static analysis: What, and why? Basic analysis
Example: Flow analysis Increasing precision
Context-, flow-, and path sensitivity
Scaling it up
Pointers, arrays, information flow, …

Current Practice for Software Assurance



Testing
Make sure program runs correctly on set of inputs
Current Practice for Software Assurance



Testing
Make sure program runs correctly on set of inputs
inputs
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
program



Testing
Current Practice
for Software Assurance
Make sure program runs correctly on set of inputs
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
inputs
outputs
program



Testing
Make sure program runs correctly on set of inputs
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
inputs outputs program
oracle



Testing
Make sure program runs correctly on set of inputs
inputs outputs Is it correct? program oracle
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .



Testing
Make sure program runs correctly on set of inputs
inputs outputs Is it correct? program oracle
Benefits: Concrete failure proves issue, aids in fix
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .



Testing
Make sure program runs correctly on set of inputs
inputs outputs Is it correct? program oracle
Benefits: Concrete failure proves issue, aids in fix Drawbacks: Expensive, difficult, hard to cover all
code paths, no guarantees
Current Practice for Software Assurance
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
– –

Current Practice (cont’d)



Code Auditing
Convince someone else your source code is correct
Current Practice (cont’d)



Code Auditing
Convince someone else your source code is correct
Current Practice (cont’d)
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
extern void checksmtpattack __P((volatile int *, int, char *, ENVELOPE *));
if (fileno(OutChannel) != fileno(stdout)) {
/* arrange for debugging output to go to remote host */
(void) dup2(fileno(OutChannel), fileno(stdout)); }
settime(e);
peerhostname = RealHostName; if (peerhostname == NULL)
peerhostname = “localhost”; CurHostName = peerhostname; CurSmtpClient = macvalue(‘_’, e); if (CurSmtpClient == NULL)
CurSmtpClient = CurHostName;
setproctitle(“server %s startup”, CurSmtpClient); #if DAEMON
if (LogLevel > 11) {
/* log connection information */ sm_syslog(LOG_INFO, NOQID,
“SMTP connect from %.100s (%.100s)”,
CurSmtpClient, anynet_ntoa(&RealHostAddr)); }
#endif
/* output the first line, inserting “ESMTP” as second word */ expand(SmtpGreeting, inp, sizeof inp, e);
p = strchr(inp, ‘\n’);
if (p != NULL)
*p++ = ‘\0’;
id = strchr(inp, ‘ ‘); if (id == NULL)
id = &inp[strlen(inp)];
cmd = p == NULL ? “220 %.*s ESMTP%s” : “220-%.*s ESMTP%s”; message(cmd, id – inp, inp, id);
/* output remaining lines */
while ((id = p) != NULL && (p = strchr(id, ‘\n’)) != NULL) {
*p++ = ‘\0’;
if (isascii(*id) && isspace(*id))
cmd < &cmdbuf[sizeof cmdbuf - 2]) *cmd++ = *p++; *cmd = '\0'; /* throw away leading whitespace */ while (isascii(*p) && isspace(*p)) p++; /* decode command */ for (c = CmdTab; c->cmdname != NULL; c++) {
if (!strcasecmp(c->cmdname, cmdbuf)) break;
}
/* reset errors */ errno = 0;
/*
** Process command.
**
** If we are running as a null server, return 550 ** to everything.
*/
if (nullserver) {
switch (c->cmdcode) {
case CMDQUIT:
case CMDHELO: case CMDEHLO: case CMDNOOP:
/* process normally */ break;
default:
if (++badcommands > MAXBADCOMMANDS)
sleep(1);
usrerr(“550 Access denied”); continue;
while (isascii(*p) && isspace(*p)) p++;
if (*p == ‘\0’) break;
kp = p;
/* skip to the value portion */
while ((isascii(*p) && isalnum(*p)) || *p == ‘-‘)
p++;
if (*p == ‘=’) {
*p++ = ‘\0’; vp = p;
/* skip to the end of the value */ while (*p != ‘\0’ && *p != ‘ ‘ &&
!(isascii(*p) && iscntrl(*p)) &&
*p != ‘=’) p++;
}
if (*p != ‘\0’) *p++ = ‘\0’;
if (tTd(19, 1))
printf(“RCPT: got arg %s=\”%s\”\n”, kp,
vp == NULL ? “” : vp);
rcpt_esmtp_args(a, kp, vp, e); if (Errors > 0)
break; }
if (Errors > 0) break;
/* save in recipient list after ESMTP mods */ a = recipient(a, &e->e_sendqueue, 0, e);
if (Errors > 0)
break;
/* no errors during parsing, but might be a duplicate */ e->e_to = a->q_paddr;
if (!bitset(QBADADDR, a->q_flags))
{
message(“250 Recipient ok%s”, bitset(QQUEUEUP, a->q_flags) ?
” (will queue)” : “”); nrcpts++;
} else {
} }
/*
switch (c->cmdcode) {
non-null server */
case CMDMAIL: case CMDEXPN: case CMDVRFY:
/* punt — should keep message in ADDRESS…. */


– –
Code Auditing
Convince someone else your source code is correct Benefit: humans can generalize beyond single runs
Current Practice (cont’d)
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
extern void checksmtpattack __P((volatile int *, int, char *, ENVELOPE *));
if (fileno(OutChannel) != fileno(stdout)) {
/* arrange for debugging output to go to remote host */
(void) dup2(fileno(OutChannel), fileno(stdout)); }
settime(e);
peerhostname = RealHostName; if (peerhostname == NULL)
peerhostname = “localhost”; CurHostName = peerhostname; CurSmtpClient = macvalue(‘_’, e); if (CurSmtpClient == NULL)
CurSmtpClient = CurHostName;
setproctitle(“server %s startup”, CurSmtpClient); #if DAEMON
if (LogLevel > 11) {
/* log connection information */ sm_syslog(LOG_INFO, NOQID,
“SMTP connect from %.100s (%.100s)”,
CurSmtpClient, anynet_ntoa(&RealHostAddr)); }
#endif
/* output the first line, inserting “ESMTP” as second word */ expand(SmtpGreeting, inp, sizeof inp, e);
p = strchr(inp, ‘\n’);
if (p != NULL)
*p++ = ‘\0’;
id = strchr(inp, ‘ ‘); if (id == NULL)
id = &inp[strlen(inp)];
cmd = p == NULL ? “220 %.*s ESMTP%s” : “220-%.*s ESMTP%s”; message(cmd, id – inp, inp, id);
/* output remaining lines */
while ((id = p) != NULL && (p = strchr(id, ‘\n’)) != NULL) {
*p++ = ‘\0’;
if (isascii(*id) && isspace(*id))
cmd < &cmdbuf[sizeof cmdbuf - 2]) *cmd++ = *p++; *cmd = '\0'; /* throw away leading whitespace */ while (isascii(*p) && isspace(*p)) p++; /* decode command */ for (c = CmdTab; c->cmdname != NULL; c++) {
if (!strcasecmp(c->cmdname, cmdbuf)) break;
}
/* reset errors */ errno = 0;
/*
** Process command.
**
** If we are running as a null server, return 550 ** to everything.
*/
if (nullserver) {
switch (c->cmdcode) {
case CMDQUIT:
case CMDHELO: case CMDEHLO: case CMDNOOP:
/* process normally */ break;
default:
if (++badcommands > MAXBADCOMMANDS)
sleep(1);
usrerr(“550 Access denied”); continue;
while (isascii(*p) && isspace(*p)) p++;
if (*p == ‘\0’) break;
kp = p;
/* skip to the value portion */
while ((isascii(*p) && isalnum(*p)) || *p == ‘-‘)
p++;
if (*p == ‘=’) {
*p++ = ‘\0’; vp = p;
/* skip to the end of the value */ while (*p != ‘\0’ && *p != ‘ ‘ &&
!(isascii(*p) && iscntrl(*p)) &&
*p != ‘=’) p++;
}
if (*p != ‘\0’) *p++ = ‘\0’;
if (tTd(19, 1))
printf(“RCPT: got arg %s=\”%s\”\n”, kp,
vp == NULL ? “” : vp);
rcpt_esmtp_args(a, kp, vp, e); if (Errors > 0)
break; }
if (Errors > 0) break;
/* save in recipient list after ESMTP mods */ a = recipient(a, &e->e_sendqueue, 0, e);
if (Errors > 0)
break;
/* no errors during parsing, but might be a duplicate */ e->e_to = a->q_paddr;
if (!bitset(QBADADDR, a->q_flags))
{
message(“250 Recipient ok%s”, bitset(QQUEUEUP, a->q_flags) ?
” (will queue)” : “”); nrcpts++;
} else {
} }
/*
switch (c->cmdcode) {
non-null server */
case CMDMAIL: case CMDEXPN: case CMDVRFY:
/* punt — should keep message in ADDRESS…. */


– – –
Code Auditing
Convince someone else your source code is correct Benefit: humans can generalize beyond single runs Drawbacks: Expensive, hard, no guarantees
Current Practice (cont’d)
???
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
extern void checksmtpattack __P((volatile int *, int, char *, ENVELOPE *));
if (fileno(OutChannel) != fileno(stdout)) {
/* arrange for debugging output to go to remote host */
(void) dup2(fileno(OutChannel), fileno(stdout)); }
settime(e);
peerhostname = RealHostName; if (peerhostname == NULL)
peerhostname = “localhost”; CurHostName = peerhostname; CurSmtpClient = macvalue(‘_’, e); if (CurSmtpClient == NULL)
CurSmtpClient = CurHostName;
setproctitle(“server %s startup”, CurSmtpClient); #if DAEMON
if (LogLevel > 11) {
/* log connection information */ sm_syslog(LOG_INFO, NOQID,
“SMTP connect from %.100s (%.100s)”,
CurSmtpClient, anynet_ntoa(&RealHostAddr)); }
#endif
/* output the first line, inserting “ESMTP” as second word */ expand(SmtpGreeting, inp, sizeof inp, e);
p = strchr(inp, ‘\n’);
if (p != NULL)
*p++ = ‘\0’;
id = strchr(inp, ‘ ‘); if (id == NULL)
id = &inp[strlen(inp)];
cmd = p == NULL ? “220 %.*s ESMTP%s” : “220-%.*s ESMTP%s”; message(cmd, id – inp, inp, id);
/* output remaining lines */
while ((id = p) != NULL && (p = strchr(id, ‘\n’)) != NULL) {
*p++ = ‘\0’;
if (isascii(*id) && isspace(*id))
cmd < &cmdbuf[sizeof cmdbuf - 2]) *cmd++ = *p++; *cmd = '\0'; /* throw away leading whitespace */ while (isascii(*p) && isspace(*p)) p++; /* decode command */ for (c = CmdTab; c->cmdname != NULL; c++) {
if (!strcasecmp(c->cmdname, cmdbuf)) break;
}
/* reset errors */ errno = 0;
/*
** Process command.
**
** If we are running as a null server, return 550 ** to everything.
*/
if (nullserver) {
switch (c->cmdcode) {
case CMDQUIT:
case CMDHELO: case CMDEHLO: case CMDNOOP:
/* process normally */ break;
default:
if (++badcommands > MAXBADCOMMANDS)
sleep(1);
usrerr(“550 Access denied”); continue;
while (isascii(*p) && isspace(*p)) p++;
if (*p == ‘\0’) break;
kp = p;
/* skip to the value portion */
while ((isascii(*p) && isalnum(*p)) || *p == ‘-‘)
p++;
if (*p == ‘=’) {
*p++ = ‘\0’; vp = p;
/* skip to the end of the value */ while (*p != ‘\0’ && *p != ‘ ‘ &&
!(isascii(*p) && iscntrl(*p)) &&
*p != ‘=’) p++;
}
if (*p != ‘\0’) *p++ = ‘\0’;
if (tTd(19, 1))
printf(“RCPT: got arg %s=\”%s\”\n”, kp,
vp == NULL ? “” : vp);
rcpt_esmtp_args(a, kp, vp, e); if (Errors > 0)
break; }
if (Errors > 0) break;
/* save in recipient list after ESMTP mods */ a = recipient(a, &e->e_sendqueue, 0, e);
if (Errors > 0)
break;
/* no errors during parsing, but might be a duplicate */ e->e_to = a->q_paddr;
if (!bitset(QBADADDR, a->q_flags))
{
message(“250 Recipient ok%s”, bitset(QQUEUEUP, a->q_flags) ?
” (will queue)” : “”); nrcpts++;
} else {
} }
/*
switch (c->cmdcode) {
non-null server */
case CMDMAIL: case CMDEXPN: case CMDVRFY:
/* punt — should keep message in ADDRESS…. */

If You’re Worried about Security…

If You’re Worried about Security…
A malicious adversary is trying to exploit anything you miss!
What more can we do?


Analyze program’s code without running it
Static analysis
In a sense, we are asking a computer to do what a

human might do during a code review


Analyze program’s code without running it
Static analysis
In a sense, we are asking a computer to do what a Benefit is (much) higher coverage

human might do during a code review


– –
Reason about many possible runs of the program Sometimes all of them, providing a guarantee
Reason about incomplete programs (e.g., libraries)


Analyze program’s code without running it

• • •
Static analysis
In a sense, we are asking a computer to do what a Benefit is (much) higher coverage

human might do during a code review


– –
Reason about many possible runs of the program Sometimes all of them, providing a guarantee
Reason about incomplete programs (e.g., libraries)
Drawbacks
Can only analyze limited properties
May miss some errors, or have false alarms Can be time consuming to run

Impact




Impact
Thoroughly check limited but useful properties
Eliminate categories of errors
Developers can concentrate on deeper reasoning








Impact
Thoroughly check limited but useful properties
Eliminate categories of errors
Developers can concentrate on deeper reasoning Encourages better development practices
Develop programming models that avoid mistakes in the first place
Encourage programmers to think about and make manifest their assumptions
Using annotations that improve tool precision









Impact
Thoroughly check limited but useful properties
Eliminate categories of errors
Developers can concentrate on deeper reasoning Encourages better development practices
Develop programming models that avoid mistakes in the first place
Encourage programmers to think about and make manifest their assumptions
Using annotations that improve tool precision Seeing increased commercial adoption

What is
Static Analysis?

The Halting Problem
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/

The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem

program P and inputs to it, P will terminate

Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/

The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem
program P

program P and inputs to it, P will terminate

register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/


The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem
program P analyzer

program P and inputs to it, P will terminate
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/

The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem

program P and inputs to it, P will terminate

register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .
Always terminates?
program P analyzer
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/


The Halting Problem
Can we write an analyzer that can prove, for any Doing so is called the halting problem
Always terminates?
program P analyzer
Unfortunately, the halting problem is undecidable
That is, it is impossible to write such an analyzer: it will fail to produce an answer for at least some programs (and/or some inputs)

program P and inputs to it, P will terminate
register char *q;
char inp[MAXLINE];
char cmdbuf[MAXLINE];
extern ENVELOPE BlankEnvelope;
extern void help __P((char *));
extern void settime __P((ENVELOPE *));
extern bool enoughdiskspace __P((long));
extern int runinchild __P((char *, ENVELOPE *));
. . .


Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/

Other properties?



Perhaps security-related properties are feasible E.g., that all accesses a[i] are in bounds
Other properties?



Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the

halting problem by transforming the program



Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the

halting problem by transforming the program
I.e., a perfect array bounds checker could solve the

halting problem, which is impossible!



Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the I.e., a perfect array bounds checker could solve the

halting problem by transforming the program

Other undecidable properties (Rice’s theorem)

halting problem, which is impossible!



Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the I.e., a perfect array bounds checker could solve the

halting problem by transforming the program


Other undecidable properties (Rice’s theorem) Does this SQL string come from a tainted source?

halting problem, which is impossible!



Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the I.e., a perfect array bounds checker could solve the

halting problem by transforming the program



Other undecidable properties (Rice’s theorem) Does this SQL string come from a tainted source? Is this pointer used after its memory is freed?

halting problem, which is impossible!



Other properties? Perhaps security-related properties are feasible
E.g., that all accesses a[i] are in bounds
But these properties can be converted into the I.e., a perfect array bounds checker could solve the

halting problem by transforming the program


– –
Other undecidable properties (Rice’s theorem) Does this SQL string come from a tainted source? Is this pointer used after its memory is freed?
Do any variables experience data races?

halting problem, which is impossible!

Halting ≈ Index in Bounds
Proof by transformation

Halting ≈ Index in Bounds
Proof by transformation
Change indexing expressions a[i] to exit
(i >= 0 && i < a.length) ? a[i] : exit() Now all array bounds errors instead result in termination • • - - Halting ≈ Index in Bounds Proof by transformation Change indexing expressions a[i] to exit (i >= 0 && i < a.length) ? a[i] : exit() Now all array bounds errors instead result in termination Change program exit points to out-of-bounds accesses a[a.length+10] • • - - • - Halting ≈ Index in Bounds Proof by transformation Change indexing expressions a[i] to exit (i >= 0 && i < a.length) ? a[i] : exit() Now all array bounds errors instead result in termination Change program exit points to out-of-bounds accesses a[a.length+10] Now if the array bounds checker • • - • - • - Halting ≈ Index in Bounds Proof by transformation Change indexing expressions a[i] to exit (i >= 0 && i < a.length) ? a[i] : exit() Now all array bounds errors instead result in termination Change program exit points to out-of-bounds accesses a[a.length+10] Now if the array bounds checker ... finds an error, then the original program halts • • - • • - • - Halting ≈ Index in Bounds Proof by transformation Change indexing expressions a[i] to exit (i >= 0 && i < a.length) ? a[i] : exit() Now all array bounds errors instead result in termination Change program exit points to out-of-bounds accesses a[a.length+10] Now if the array bounds checker ... finds an error, then the original program halts • • - • • - • - ... claims there are no such errors, then the original • program does not halt Halting ≈ Index in Bounds Proof by transformation Change indexing expressions a[i] to exit (i >= 0 && i < a.length) ? a[i] : exit() Now all array bounds errors instead result in termination Change program exit points to out-of-bounds accesses a[a.length+10] Now if the array bounds checker ... finds an error, then the original program halts ... contradiction! with undecidability of the halting problem • • - • • - • - ... claims there are no such errors, then the original • program does not halt • - Static analysis is impossible? Static analysis is impossible? Perfect static analysis is not possible • Static analysis is impossible? Perfect static analysis is not possible Useful static analysis is perfectly possible, despite • • Static analysis is impossible? Perfect static analysis is not possible Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or • • Static analysis is impossible? Perfect static analysis is not possible Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or 2. False alarms - claimed errors are not really errors, or • • Static analysis is impossible? Perfect static analysis is not possible Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or 2. False alarms - claimed errors are not really errors, or 3. Missed errors - no error reports ≠ error free • • Static analysis is impossible? Perfect static analysis is not possible Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or 2. False alarms - claimed errors are not really errors, or 3. Missed errors - no error reports ≠ error free • • Nonterminating analyses are confusing, so tools tend • to exhibit only false alarms and/or missed errors Static analysis is impossible? Perfect static analysis is not possible Useful static analysis is perfectly possible, despite 1. Nontermination - analyzer never terminates, or 2. False alarms - claimed errors are not really errors, or 3. Missed errors - no error reports ≠ error free Fall somewhere between soundness and completeness • • Nonterminating analyses are confusing, so tools tend • to exhibit only false alarms and/or missed errors • Soundness If analysis says that X is true, then X is true. Soundness If analysis says that X is true, then X is true. Things I say Soundness If analysis says that X is true, then X is true. True things Trivially Sound: Say nothing Things I say Soundness Completeness If analysis says that X is If X is true, then analysis true, then X is true. says X is true. True things Trivially Sound: Say nothing Things I say Soundness Completeness If analysis says that X is If X is true, then analysis true, then X is true. says X is true. True things Trivially Sound: Say nothing Things I say Things I say Soundness Completeness If analysis says that X is true, then X is true. True things Trivially Sound: Say nothing If X is true, then analysis says X is true. Things I say True things Things I say Trivially Complete: Say everything Soundness Completeness If analysis says that X is If X is true, then analysis true, then X is true. says X is true. ThruinegsthIinsgasy are all True things Sound and Complete: Say exactly the set of true things Stepping back Stepping back Soundness: if the program is claimed to be error Alarms do not imply erroneousness • free, then it really is • • Stepping back Soundness: if the program is claimed to be error Alarms do not imply erroneousness • free, then it really is Completeness: if the program is claimed to be Silence does not imply error freedom • erroneous, then it really is • • Stepping back Soundness: if the program is claimed to be error Alarms do not imply erroneousness • free, then it really is Completeness: if the program is claimed to be Silence does not imply error freedom • erroneous, then it really is • • • • Essentially, most interesting analyses are neither sound nor complete (and not both) ... usually lean toward soundness (“soundy”) or completeness The Art of Static Analysis The Art of Static Analysis Analysis design tradeoffs • The Art of Static Analysis Analysis design tradeoffs • Precision: Carefully model program behavior, to • minimize false alarms The Art of Static Analysis Analysis design tradeoffs • Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs • minimize false alarms • The Art of Static Analysis Analysis design tradeoffs Scalability: Successfully analyze large programs Understandability: Error reports should be actionable • Precision: Carefully model program behavior, to • minimize false alarms • • The Art of Static Analysis Analysis design tradeoffs • • • • Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs • minimize false alarms Understandability: Error reports should be actionable Observation: Code style is important The Art of Static Analysis Analysis design tradeoffs • • • Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs • minimize false alarms • • Observation: Code style is important Aim to be precise for “good” programs Understandability: Error reports should be actionable The Art of Static Analysis Analysis design tradeoffs • Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs • minimize false alarms • • • • • False alarms viewed positively: reduces complexity • Understandability: Error reports should be actionable Observation: Code style is important Aim to be precise for “good” programs It’s OK to forbid yucky code in the name of safety The Art of Static Analysis Analysis design tradeoffs • Precision: Carefully model program behavior, to Scalability: Successfully analyze large programs • minimize false alarms • • • • • False alarms viewed positively: reduces complexity • Understandability: Error reports should be actionable Observation: Code style is important Aim to be precise for “good” programs It’s OK to forbid yucky code in the name of safety Code that is more understandable to the analysis is more • understandable to humans Flow Analysis Flow Analysis Tainted Flow Analysis • Tainted Flow Analysis The root cause of many attacks is trusting unvalidated input • • Tainted Flow Analysis The root cause of many attacks is trusting unvalidated input Input from the user is tainted • • • Tainted Flow Analysis The root cause of many attacks is trusting unvalidated input Input from the user is tainted Various data is used, assuming it is untainted • • • • • Tainted Flow Analysis The root cause of many attacks is trusting unvalidated input Input from the user is tainted Various data is used, assuming it is untainted Examples expecting untainted data source string of strcpy (≤ target buffer size) • • • • • • Tainted Flow Analysis The root cause of many attacks is trusting unvalidated input Input from the user is tainted Various data is used, assuming it is untainted Examples expecting untainted data source string of strcpy (≤ target buffer size) format string of printf (contains no format specifiers) • • • • • • • Tainted Flow Analysis The root cause of many attacks is trusting unvalidated input Input from the user is tainted Various data is used, assuming it is untainted Examples expecting untainted data source string of strcpy (≤ target buffer size) format string of printf (contains no format specifiers) form field used in constructed SQL query (contains no SQL commands) Recall: Format String Attack Recall: Format String Attack Adversary-controlled format string • char *name = fgets(..., network_fd); printf(name); // Oops Recall: Format String Attack Adversary-controlled format string • char *name = fgets(..., network_fd); printf(name); // Oops • Attacker sets name = “%s%s%s” to crash program Recall: Format String Attack Adversary-controlled format string • char *name = fgets(..., network_fd); printf(name); // Oops • • - Attacker sets name = “%s%s%s” to crash program Attacker sets name = “...%n...” to write to memory Yields code injection exploits Recall: Format String Attack Adversary-controlled format string • char *name = fgets(..., network_fd); printf(name); // Oops • • Attacker sets name = “%s%s%s” to crash program Attacker sets name = “...%n...” to write to memory Yields code injection exploits These bugs still occur in the wild Too restrictive to forbid non-constant format strings • • - • The problem, in types Specify our requirement as a type qualifier int printf(untainted char *fmt, ...); tainted char *fgets(...); • The problem, in types Specify our requirement as a type qualifier int printf(untainted char *fmt, ...); tainted char *fgets(...); • tainted = possibly controlled by adversary • The problem, in types Specify our requirement as a type qualifier int printf(untainted char *fmt, ...); tainted char *fgets(...); • • tainted = possibly controlled by adversary untainted = must not be controlled by adversary • The problem, in types Specify our requirement as a type qualifier int printf(untainted char *fmt, ...); tainted char *fgets(...); • • tainted = possibly controlled by adversary untainted = must not be controlled by adversary tainted char *name = fgets(...,network_fd); printf(name); // FAIL: tainted ≠ untainted • No tainted data flows: For all possible inputs, prove that tainted data will never be used where untainted data is expected Analysis problem • • Analysis problem No tainted data flows: For all possible inputs, prove that tainted data will never be used where untainted data is expected untainted annotation: indicates a trusted sink • • • Analysis problem No tainted data flows: For all possible inputs, prove that tainted data will never be used where untainted data is expected untainted annotation: indicates a trusted sink tainted annotation: an untrusted source • • • • Analysis problem No tainted data flows: For all possible inputs, prove that tainted data will never be used where untainted data is expected untainted annotation: indicates a trusted sink tainted annotation: an untrusted source no annotation means: not sure (analysis figures it out) • • • • • Analysis problem No tainted data flows: For all possible inputs, prove that tainted data will never be used where untainted data is expected untainted annotation: indicates a trusted sink tainted annotation: an untrusted source no annotation means: not sure (analysis figures it out) A solution requires inferring flows in the program • • • • • • Analysis problem No tainted data flows: For all possible inputs, prove that tainted data will never be used where untainted data is expected untainted annotation: indicates a trusted sink tainted annotation: an untrusted source no annotation means: not sure (analysis figures it out) A solution requires inferring flows in the program What sources can reach what sinks • • • • • • Analysis problem No tainted data flows: For all possible inputs, prove that tainted data will never be used where untainted data is expected untainted annotation: indicates a trusted sink tainted annotation: an untrusted source no annotation means: not sure (analysis figures it out) A solution requires inferring flows in the program What sources can reach what sinks If any flows are illegal, i.e., whether a tainted source • may flow to an untainted sink • • • • • • • Analysis problem No tainted data flows: For all possible inputs, prove that tainted data will never be used where untainted data is expected untainted annotation: indicates a trusted sink tainted annotation: an untrusted source no annotation means: not sure (analysis figures it out) A solution requires inferring flows in the program What sources can reach what sinks If any flows are illegal, i.e., whether a tainted source We will aim to develop a sound analysis • may flow to an untainted sink Legal Flow Legal Flow void f(tainted int); untainted int a = ...; f(a); Legal Flow void f(tainted int); untainted int a = ...; f(a); f accepts tainted or untainted data Legal Flow Illegal Flow void f(tainted int); untainted int a = ...; f(a); f accepts tainted or untainted data Legal Flow Illegal Flow void f(tainted int); untainted int a = ...; f(a); void g(untainted int); tainted int b = ...; g(b); f accepts tainted or untainted data Legal Flow Illegal Flow void f(tainted int); untainted int a = ...; f(a); void g(untainted int); tainted int b = ...; g(b); f accepts tainted or untainted data g accepts only untainted data Legal Flow f accepts tainted or untainted data Allowed flow as a lattice Illegal Flow void f(tainted int); untainted int a = ...; f(a); void g(untainted int); tainted int b = ...; g(b); g accepts only untainted data untainted < tainted Legal Flow Illegal Flow void f(tainted int); untainted int a = ...; f(a); void g(untainted int); tainted int b = ...; g(b); untainted ≤ tainted Allowed flow as a lattice tainted ≤ untainted untainted < tainted Legal Flow Illegal Flow void f(tainted int); untainted int a = ...; f(a); void g(untainted int); tainted int b = ...; g(b); untainted ≤ tainted Allowed flow as a lattice tainted ≤ untainted tainted untainted • • Analysis Approach Think of flow analysis as a kind of type inference If no qualifier is present, we must infer it • • • • Analysis Approach Think of flow analysis as a kind of type inference If no qualifier is present, we must infer it Steps: Create a name for each missing qualifier (e.g., α, β) • • • • Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate Analysis Approach Think of flow analysis as a kind of type inference If no qualifier is present, we must infer it Steps: • constraints (of the form q1 ≤ q2) on possible solutions • • • • Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate Analysis Approach Think of flow analysis as a kind of type inference If no qualifier is present, we must infer it Steps: • constraints (of the form q1 ≤ q2) on possible solutions Statement x = y generates constraint qy ≤ qx where qy is y’s qualifier - and qx is x’s qualifier • • • • Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate Analysis Approach Think of flow analysis as a kind of type inference If no qualifier is present, we must infer it Steps: Statement x = y generates constraint qy ≤ qx where qy is y’s qualifier Solve the constraints to produce solutions for α, β, etc. • constraints (of the form q1 ≤ q2) on possible solutions • - and qx is x’s qualifier • • • • Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate Analysis Approach Think of flow analysis as a kind of type inference If no qualifier is present, we must infer it Steps: • constraints (of the form q1 ≤ q2) on possible solutions • - Statement x = y generates constraint qy ≤ qx where qy is y’s qualifier Solve the constraints to produce solutions for α, β, etc. A solution is a substitution of qualifiers (like tainted or untainted) for names (like α and β) such that all of the constraints are legal flows - and qx is x’s qualifier • • • • Create a name for each missing qualifier (e.g., α, β) For each statement in the program, generate • Analysis Approach Think of flow analysis as a kind of type inference If no qualifier is present, we must infer it Steps: • constraints (of the form q1 ≤ q2) on possible solutions Statement x = y generates constraint qy ≤ qx where qy is y’s qualifier Solve the constraints to produce solutions for α, β, etc. A solution is a substitution of qualifiers (like tainted or untainted) for names (like α and β) such that all of the constraints are legal flows If there is no solution, we (may) have an illegal flow • - - and qx is x’s qualifier Example Analysis int printf(untainted char *fmt, ...); tainted char *fgets(...); char *name = fgets(..., network_fd); char *x = name; printf(x); Example Analysis int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x = name; printf(x); Example Analysis int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x = name; printf(x); tainted ≤ α Example Analysis int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x = name; printf(x); tainted ≤ α α≤β Example Analysis int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x = name; printf(x); tainted ≤ α α≤β β ≤ untainted Example Analysis int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x = name; printf(x); tainted ≤ α ≤ β ≤ untainted Example Analysis int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x = name; printf(x); tainted ≤ α ≤ β ≤ untainted First constraint requires α = tainted To satisfy the second constraint implies β = tainted But then the third constraint is illegal: tainted ≤ untainted Example Analysis int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x = name; printf(x); tainted ≤ α ≤ β ≤ untainted Illegal flow! No possible solution for α and β Flow Analysis: Adding Sensitivity Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; if (...) x = name; else x = “hello!”; printf(x); → Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; if (...) x = name; else x = “hello!”; printf(x); tainted ≤ α Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; if (...) x = name; else x = “hello!”; printf(x); → tainted ≤ α α≤β Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; if (...) x = name; else x = “hello!”; printf(x); → tainted ≤ α α≤β untainted ≤ β Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; if (...) x = name; else x = “hello!”; printf(x); → tainted ≤ α α≤β untainted ≤ β β ≤ untainted Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; if (...) x = name; else x = “hello!”; printf(x); → tainted ≤ α α≤β β ≤ untainted Constraints still unsolvable Illegal flow Dropping the Conditional int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; x = name; x = “hello!”; printf(x); Dropping the Conditional int printf(untainted char *fmt, ...); tainted char *fgets(...); → α char *name = fgets(..., network_fd); β char *x; x = name; x = “hello!”; printf(x); tainted ≤ α Dropping the Conditional int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; x = name; x = “hello!”; printf(x); → tainted ≤ α α≤β Dropping the Conditional int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; x = name; x = “hello!”; printf(x); → tainted ≤ α α≤β untainted ≤ β Dropping the Conditional int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x; x = name; x = “hello!”; printf(x); → tainted ≤ α α≤β untainted ≤ β β ≤ untainted Same constraints, different semantics! False Alarm • Flow Sensitivity Our analysis is flow insensitive • • Flow Sensitivity Our analysis is flow insensitive Each variable has one qualifier which abstracts the taintedness of all values it ever contains • • Flow Sensitivity Our analysis is flow insensitive Each variable has one qualifier which abstracts the taintedness of all values it ever contains A flow sensitive analysis would account for • variables whose contents change • • Flow Sensitivity Our analysis is flow insensitive Each variable has one qualifier which abstracts the taintedness of all values it ever contains A flow sensitive analysis would account for Allow each assigned use of a variable to have a • variables whose contents change • different qualifier E.g., α1 is x’s qualifier at line 1, but α2 is the qualifier at line 2, where - α1 and α2 can differ • • Flow Sensitivity Our analysis is flow insensitive Each variable has one qualifier which abstracts the taintedness of all values it ever contains A flow sensitive analysis would account for Allow each assigned use of a variable to have a • variables whose contents change • different qualifier • - E.g., α1 is x’s qualifier at line 1, but α2 is the qualifier at line 2, where Could implement this by transforming the program to assign to a variable at most once Called static single assignment (SSA) form - α1 and α2 can differ Reworked Example int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x1, γ *x2; x1 = name; x2 = “%s”; printf(x2); → Reworked Example int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x1, γ *x2; x1 = name; x2 = “%s”; printf(x2); tainted ≤ α Reworked Example int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x1, γ *x2; x1 = name; x2 = “%s”; printf(x2); → tainted ≤ α α≤β Reworked Example int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x1, γ *x2; x1 = name; x2 = “%s”; printf(x2); → tainted ≤ α α≤β untainted ≤ γ Reworked Example int printf(untainted char *fmt, ...); tainted char *fgets(...); α char *name = fgets(..., network_fd); β char *x1, γ *x2; x1 = name; x2 = “%s”; printf(x2); → tainted ≤ α α ≤ β untainted ≤ γ γ ≤ untainted No Alarm Good solution exists: γ = untainted α = β = tainted Multiple Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); void f(int x) { α char *y; if (x) y = “hello!”; else y = fgets(..., network_fd); if (x) printf(y); } Multiple Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); void f(int x) { α char *y; if (x) y = “hello!”; else y = fgets(..., network_fd); if (x) printf(y); } → untainted ≤ α Multiple Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); void f(int x) { α char *y; if (x) y = “hello!”; else y = fgets(..., network_fd); if (x) printf(y); } → untainted ≤ α tainted ≤ α Multiple Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); void f(int x) { α char *y; if (x) y = “hello!”; else y = fgets(..., network_fd); if (x) printf(y); } → untainted ≤ α tainted ≤ α α ≤ untainted Multiple Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); void f(int x) { α char *y; if (x) y = “hello!”; else y = fgets(..., network_fd); if (x) printf(y); } → tainted ≤ α ≤ untainted no solution for α Multiple Conditionals int printf(untainted char *fmt, ...); tainted char *fgets(...); void f(int x) { α char *y; if (x) y = “hello!”; else y = fgets(..., network_fd); if (x) printf(y); } X → no solution for α False Alarm! tainted ≤ α ≤ untainted (and flow sensitivity won’t help) Path Sensitivity An analysis may consider path feasibility. E.g., • f(x) can execute path void f(int x) { char *y; 1if (x) 2y = “hello!”; else 3y = fgets(...); 4if (x) 5printf(y); 6} Path Sensitivity An analysis may consider path feasibility. E.g., 1-2-4-5-6 when x is not 0, or • f(x) can execute path void f(int x) { char *y; 1if (x) 2y = “hello!”; else 3y = fgets(...); 4if (x) 5printf(y); 6} • Path Sensitivity An analysis may consider path feasibility. E.g., 1-2-4-5-6 when x is not 0, or 1-3-4-6 when x is 0. But, • f(x) can execute path void f(int x) { char *y; 1if (x) 2y = “hello!”; else 3y = fgets(...); 4if (x) 5printf(y); 6} • • Path Sensitivity An analysis may consider path feasibility. E.g., 1-2-4-5-6 when x is not 0, or 1-3-4-6 when x is 0. But, path 1-3-4-5-6 infeasible • f(x) can execute path void f(int x) { char *y; 1if (x) 2y = “hello!”; else 3y = fgets(...); 4if (x) 5printf(y); 6} • • • • • • Path Sensitivity An analysis may consider path feasibility. E.g., 1-2-4-5-6 when x is not 0, or 1-3-4-6 when x is 0. But, path 1-3-4-5-6 infeasible • f(x) can execute path void f(int x) { char *y; 1if (x) 2y = “hello!”; else 3y = fgets(...); 4if (x) 5printf(y); 6} A path sensitive analysis checks feasibility, e.g., • by qualifying each constraint with a path condition • • • x ≠ 0 ⟹ untainted ≤ α x = 0 ⟹ tainted ≤ α x ≠ 0 ⟹ α ≤ untainted (segment 1-2) (segment 1-3) (segment 4-5) Why not flow/path sensitivity? Why not flow/path sensitivity? Flow sensitivity adds precision, and path sensitivity • adds even more, which is good Why not flow/path sensitivity? Flow sensitivity adds precision, and path sensitivity But both of these make solving more difficult • adds even more, which is good • • Flow sensitivity also increases the number of nodes in the constraint graph Path sensitivity requires more general solving • procedures to handle path conditions Why not flow/path sensitivity? • • Flow sensitivity also increases the number of nodes in the constraint graph In short: precision (often) trades off scalability Ultimately, limits the size of programs we can analyze • • Flow sensitivity adds precision, and path sensitivity But both of these make solving more difficult • adds even more, which is good Path sensitivity requires more general solving • procedures to handle path conditions Handling Function Calls char *id( char *x) { return x; } α char *a = fgets(...); β char *b = id(a); Handling Function Calls δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); • Names for arguments and return value Handling Function Calls δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); • • • • Names for arguments and return value Calls create flows from caller’s data to callee’s arguments, from callee’s result to caller’s returned value Handling Function Calls δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); Handling Function Calls δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); tainted ≤ α Handling Function Calls δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); tainted ≤ α α≤γ Handling Function Calls δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); tainted ≤ α α≤γ γ≤δ Handling Function Calls δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); tainted ≤ α α≤γ γ≤δ δ≤β Function Call Example δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = “hi”; printf(c); → Function Call Example δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = “hi”; printf(c); tainted ≤ α Function Call Example δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = “hi”; printf(c); → tainted ≤ α α≤γ γ≤δ δ≤β Function Call Example δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = “hi”; printf(c); → tainted ≤ α α≤γ γ≤δ δ≤β untainted ≤ ω → Function Call Example δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = “hi”; printf(c); tainted ≤ α α≤γ γ≤δ δ≤β untainted ≤ ω No Alarm Good solution exists: ω = untainted α = β = γ = δ = tainted ω ≤ untainted Two Calls to Same Function δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = id(“hi”); printf(c); Two Calls to Same Function δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = id(“hi”); printf(c); → tainted ≤ α α≤γ γ≤δ δ≤β Two Calls to Same Function δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = id(“hi”); printf(c); → tainted ≤ α α≤γ γ≤δ δ≤β untainted ≤ γ δ≤ω Two Calls to Same Function δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = id(“hi”); printf(c); → tainted ≤ α α≤γ γ≤δ δ≤β untainted ≤ γ δ≤ω ω ≤ untainted Two Calls to Same Function δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = id(“hi”); printf(c); → tainted ≤ α ≤ γ ≤ δ ≤ ω ≤ untainted False Alarm! No solution, and yet no true tainted flow Two Calls to Same Function δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = id(“hi”); printf(c); tainted ≤ α ≤ γ ≤ δ ≤ ω ≤ untainted False Alarm! No solution, and yet no true tainted flow Two Calls to Same Function δ char *id(γ char *x) { return x; } α char *a = fgets(...); β char *b = id(a); ω char *c = id(“hi”); printf(c); Problematic constraints represent an infeasible path False Alarm! No solution, and yet no true tainted flow tainted ≤ α ≤ γ ≤ δ ≤ ω ≤ untainted • • Context (In)sensitivity This is a problem of context insensitivity All call sites are “conflated” in the graph • • • Context (In)sensitivity This is a problem of context insensitivity All call sites are “conflated” in the graph Context sensitivity solves this problem by • • • • Context (In)sensitivity This is a problem of context insensitivity All call sites are “conflated” in the graph Context sensitivity solves this problem by distinguishing call sites in some way • • • • - Context (In)sensitivity This is a problem of context insensitivity All call sites are “conflated” in the graph Context sensitivity solves this problem by distinguishing call sites in some way We can give them a label i, e.g., the line number in the program • • • • - • Context (In)sensitivity This is a problem of context insensitivity All call sites are “conflated” in the graph Context sensitivity solves this problem by distinguishing call sites in some way We can give them a label i, e.g., the line number in the program matching up calls with the corresponding returns • • • • - • - Context (In)sensitivity This is a problem of context insensitivity All call sites are “conflated” in the graph Context sensitivity solves this problem by distinguishing call sites in some way We can give them a label i, e.g., the line number in the program matching up calls with the corresponding returns Label call and return edges • • • • - • - - Context (In)sensitivity This is a problem of context insensitivity All call sites are “conflated” in the graph Context sensitivity solves this problem by distinguishing call sites in some way We can give them a label i, e.g., the line number in the program matching up calls with the corresponding returns Label call and return edges Allow flows if the labels and polarities match • • • • - • - - - Context (In)sensitivity This is a problem of context insensitivity All call sites are “conflated” in the graph Context sensitivity solves this problem by distinguishing call sites in some way We can give them a label i, e.g., the line number in the program matching up calls with the corresponding returns Label call and return edges Allow flows if the labels and polarities match Use index -i for argument passing, i.e., q1 ≤-i q2 • • • • - • - - - - Context (In)sensitivity This is a problem of context insensitivity All call sites are “conflated” in the graph Context sensitivity solves this problem by distinguishing call sites in some way We can give them a label i, e.g., the line number in the program matching up calls with the corresponding returns Label call and return edges Allow flows if the labels and polarities match Use index -i for argument passing, i.e., q1 ≤-i q2 Use index +i for returned values, i.e., q1 ≤+i q2 Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } tainted ≤ α Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } tainted ≤ α α ≤-1 γ Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } tainted ≤ α α ≤-1 γ γ≤δ Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } tainted ≤ α α ≤-1 γ γ≤δ δ ≤+1 β Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } tainted ≤ α α ≤-1 γ γ≤δ δ ≤+1 β untainted ≤-2 γ Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } tainted ≤ α α ≤-1 γ γ≤δ δ ≤+1 β untainted ≤-2 γ δ ≤+2 ω Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } tainted ≤ α α ≤-1 γ γ≤δ δ ≤+1 β untainted ≤-2 γ δ ≤+2 ω ω ≤ untainted Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } tainted ≤ α α ≤-1 γ γ≤δ δ ≤+2 ω ω ≤ untainted Two Calls to Same Function α char *a = fgets(...); β char *b = id1(a); ω char *c = id2(“hi”); printf(c); δ char *id(γ char *x) { return x; } tainted ≤ α α ≤-1 γ γ ≤ δ Indexes don’t match up Infeasible flow not allowed No Alarm δ ≤+2 ω ω ≤ untainted • Discussion Context sensitivity is a tradeoff again • • - Discussion Context sensitivity is a tradeoff again Precision vs. scalability O(n) insensitive algorithm becomes O(n3) sensitive algorithm • • - • - Discussion Context sensitivity is a tradeoff again Precision vs. scalability O(n) insensitive algorithm becomes O(n3) sensitive algorithm But: sometimes higher precision improves performance Eliminates infeasible paths from consideration (makes n smaller) Discussion Context sensitivity is a tradeoff again Precision vs. scalability O(n) insensitive algorithm becomes O(n3) sensitive algorithm • • - • • - • Only some call sites treated sensitively Rest conflated But: sometimes higher precision improves performance Eliminates infeasible paths from consideration (makes n smaller) - Compromises possible Discussion Context sensitivity is a tradeoff again Precision vs. scalability O(n) insensitive algorithm becomes O(n3) sensitive algorithm • • - • - • • - • Only some call sites treated sensitively Rest conflated Conflate groups of call sites Give them the same index But: sometimes higher precision improves performance Eliminates infeasible paths from consideration (makes n smaller) - Compromises possible Discussion Context sensitivity is a tradeoff again Precision vs. scalability O(n) insensitive algorithm becomes O(n3) sensitive algorithm • • - • - • - • • - • Only some call sites treated sensitively Rest conflated Conflate groups of call sites Give them the same index Sensitivity only up to a certain call depth Don’t do exact matching of edges beyond that depth But: sometimes higher precision improves performance Eliminates infeasible paths from consideration (makes n smaller) - Compromises possible Flow Analysis: Scaling it up to a complete language and problem set Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); → Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); → untainted ≤ α Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); → untainted ≤ α α≤β Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); → untainted ≤ α α≤β β≤γ Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); → untainted ≤ α α≤β β≤γ tainted ≤ ω Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); → untainted ≤ α α≤β β≤γ tainted ≤ ω ω≤γ Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); → untainted ≤ α α≤β β≤γ tainted ≤ ω ω≤γ β ≤ untainted → untainted ≤ α α≤β β≤γ tainted ≤ ω ω≤γ β ≤ untainted Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); Solution exists: α = β = untainted ω = γ = tainted Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); → Solution exists: α = β = untainted ω = γ = tainted Misses illegal flow! • p and q are aliases -so writing tainted data to q -makes p’s contents tainted untainted ≤ α α≤β β≤γ tainted ≤ ω ω≤γ β ≤ untainted Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); Solution exists: α = β = untainted ω = γ = tainted untainted ≤ α α≤β β≤γ tainted ≤ ω ω≤γ β ≤ untainted untainted ≤ α α≤β β≤γ γ≤β tainted ≤ ω ω≤γ Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); β ≤ untainted Solution exists: α = β = untainted ω = γ = tainted Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); untainted ≤ α α≤β β≤γ γ≤β tainted ≤ ω ω≤γ β ≤ untainted Solution exists: α = β = untainted ω = γ = tainted Pointers α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); untainted ≤ α α≤β β≤γ γ≤β tainted ≤ ω ω≤γ β ≤ untainted Solution exists: α = β = untainted ω = γ = tainted Pointers Solution exists: α = β = untainted ω = γ = tainted α char *a = “hi”; (β char *)*p = &a; (γ char *)*q = p; ω char *b = fgets(...); *q = b; printf(*p); tainted ≤ ω ≤ γ γ ≤ β ≤ untainted • • • An assignment via a pointer “flows both ways” Ensures that aliasing constraints are sound But can lead to false alarms Flow and pointers • • • • An assignment via a pointer “flows both ways” Ensures that aliasing constraints are sound But can lead to false alarms Reducing alarms Flow and pointers • • • • • An assignment via a pointer “flows both ways” Ensures that aliasing constraints are sound But can lead to false alarms Reducing alarms If pointers are never assigned to (const) then backward flow is not needed (sound) Flow and pointers • • • • • • - Flow and pointers An assignment via a pointer “flows both ways” Ensures that aliasing constraints are sound But can lead to false alarms Reducing alarms If pointers are never assigned to (const) then backward flow is not needed (sound) Drop backward flow edge anyway Trades false alarms for missed errors (unsoundness) Implicit flows void copy(tainted char *src, untainted char *dst, int len) { untainted int i; for (i = 0; i 0) dst = 0; else dst = 0;

Why not information flow?
Tracking implicit flows with a pc label can lead to false alarms
E.g., ignores values
Extra constraints also hurt performance



tainted int src;
α int dst;
if (src > 0) dst = 0; else dst = 0;

Why not information flow?
Tracking implicit flows with a pc label can lead to false alarms
E.g., ignores values
Extra constraints also hurt performance
Our copying example is pathological
We typically don’t write programs like this Implicit flows will have little overall influence






tainted int src;
α int dst;
if (src > 0) dst = 0; else dst = 0;

Why not information flow?
Tracking implicit flows with a pc label can lead to false alarms
E.g., ignores values
Extra constraints also hurt performance
Our copying example is pathological
We typically don’t write programs like this Implicit flows will have little overall influence
So: tainting analyses tend to ignore implicit flows





• •
tainted int src;
α int dst;
if (src > 0) dst = 0; else dst = 0;

Other challenges



Other challenges
Taint through operations
tainted a; untainted b; c=a+b — is c tainted? (yes, probably)




• •
Other challenges
Taint through operations
tainted a; untainted b; c=a+b — is c tainted? (yes, probably) Function pointers
What function can this call go to?
Can flow analysis to compute possible targets




• •
Other challenges
Taint through operations
tainted a; untainted b; c=a+b — is c tainted? (yes, probably) Function pointers
What function can this call go to?
Can flow analysis to compute possible targets
Struct fields
Track the taintedness of the whole struct, or each field?
Taintedness for each struct instance, or shared among all of them (or
something in between)?
Note: objects ≈ structs + function pointers

• •




• •

• •


Arrays
Other challenges
Taint through operations
tainted a; untainted b; c=a+b — is c tainted? (yes, probably) Function pointers
What function can this call go to?
Can flow analysis to compute possible targets
Struct fields
Track the taintedness of the whole struct, or each field?
Taintedness for each struct instance, or shared among all of them (or something in between)?
Note: objects ≈ structs + function pointers
Keep track of taintedness of each array element, or one element

representing the whole array?

Refining taint analysis



Refining taint analysis
Can label additional sources and sinks
Array bounds accesses: must have untainted index





Refining taint analysis
Can label additional sources and sinks
Array bounds accesses: must have untainted index
Can expand taint analysis to handle sanitizers Functions to convert tainted data to untainted data







– •
Refining taint analysis
Can label additional sources and sinks
Array bounds accesses: must have untainted index
Can expand taint analysis to handle sanitizers Functions to convert tainted data to untainted data
Other application: Leaking confidential data Don’t want secret sources to go to public sinks
Implicit flows more relevant in this setting Dual of tainting

Other kinds of analysis



Other kinds of analysis
Pointer Analysis (“points-to” analysis)
Determine whether pointers point to the same locations
Shares many elements of flow analysis. Really

advanced in the last 10 years.




Other kinds of analysis
Pointer Analysis (“points-to” analysis)
Determine whether pointers point to the same locations
Data Flow Analysis
Shares many elements of flow analysis. Really

advanced in the last 10 years.
Invented in the early 1970’s. Flow sensitive, tracks

“data flow facts” about variables in the program





Invented in the early 1970’s. Flow sensitive, tracks
Other kinds of analysis
Pointer Analysis (“points-to” analysis)
Determine whether pointers point to the same locations
Data Flow Analysis
Abstract interpretation
Shares many elements of flow analysis. Really

advanced in the last 10 years.

“data flow facts” about variables in the program
Invented in the late 1970’s as a theoretical foundation Associated with certain analysis algorithms

for data flow analysis, and static analysis generally.

Static analysis in practice
Caveat: appearance in the above list is not an implicit endorsement, and these are only a sample of available offerings

Static analysis in practice
Commercial products
Fortify
Caveat: appearance in the above list is not an implicit endorsement, and these are only a sample of available offerings

Static analysis in practice
Commercial products
Fortify
Open source tools
FindBugs
clang
analyzer &
KLEE
Caveat: appearance in the above list is not an implicit endorsement, and these are only a sample of available offerings

Learning more

Learning more

Secure Programming with Static Analysis, by Brian Chess, goes into more depth about how static analysis tools work, and can aid secure software development


Learning more

Secure Programming with Static Analysis, by Brian Chess, goes into more depth about how static analysis tools work, and can aid secure software development
Principles of Program Analysis, by Nielson, Nielson, and Hankin, is a formal, mathematical presentation of different analysis methods
A bit dense for the casual reader, but

good for introducing the academic field