Penumbra:
Automatically Identifying
Failure-Relevant Inputs
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
Supported in part by:
NSF awards CCF-0725202 and CCF-0541080
to Georgia Tech
Automated Debugging
Code-centric
• Gupta and colleagues ’05
• Jones and colleagues ’02
• Korel and Laski ’88
• Liblit and colleagues ’05
• Nainar and colleagues ’07
• Renieris and Reiss ’03
• Seward and Nethercote ’05
• Tucek and colleagues ’07
• Weiser ’81
• Zhang and colleagues ’05
• Zhang and colleagues ’06
• ...
What about inputs which cause the failure?
Data-centric Techniques
• Chan and Lakhotia ’98
• Zeller and Hildebrandt ’02
• Misherghi and Su ’06
Delta Debugging
Requires:
1. Multiple executions
2. Large amounts of manual effort (oracle creation, setup)
Penumbra
Requires:
1. Single execution
2. Reduced manual effort
Comparable performance
Intuition and Terminology
Failure-revealing input vector
Failure-relevant subset
(inputs which are useful for investigating the failure)
Approximate failure-relevant subsets by identifying inputs that reach the failure along program dependencies.
Motivating Example: fileinfo
int main(int argc, char **argv) {
1.    int verbose, i, total_size = 0;
2.    struct stat buf;
3.    verbose = atoi(argv[1]);
4.    for(i = 2; i < argc; i++) {
5.      int fd = open(argv[i], O_RDONLY);
6.      fstat(fd, &buf);
7.      char *out = malloc(60);
8.      sprintf(out, "%d", buf.st_size);
9.      if(verbose) {
10.       char *pview = malloc(51);
11.       read(fd, pview, 50);
12.       pview[50] = '\0';
13.       strcat(out, pview);
14.     }
15.     printf("%s: %s\n", argv[i], out);
16.     total_size += buf.st_size;
17.   }
18.   printf("total: %d\n", total_size);
}

Input vector:
• Command line arguments (flag, list of file names)
• File statistics, for each file (size, last modified date, ...)
• File contents, for each file (first 50 characters)

Failure conditions (overflow of out):
• buf.st_size ≥ 1GB
• verbose is true
• read 50 characters

1. Many more inputs than lines of code.
2. Understanding the failure requires tracing interactions between inputs from multiple sources.
3. Only a small percentage of all inputs are relevant for the failure.
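The overflow on line 13 can be checked with quick arithmetic: out must hold the decimal digits of st_size, the 50-character preview, and a terminating NUL, all in 60 bytes. A sketch of that check (the helper below is illustrative, not part of fileinfo):

```python
def bytes_needed(st_size, preview_len=50):
    """Bytes fileinfo writes into `out`: size digits + preview + NUL."""
    return len(str(st_size)) + preview_len + 1

BUF = 60  # out = malloc(60)

# A 512-byte file: 3 digits + 50 + 1 = 54 bytes -- fits.
assert bytes_needed(512) <= BUF

# The largest sub-1GB size (999,999,999) needs exactly 60 bytes -- still fits.
assert bytes_needed(999_999_999) == BUF

# A 1.5GB file: st_size has 10 digits, so 10 + 50 + 1 = 61 > 60 -- overflow.
assert bytes_needed(1_500_000_000) > BUF
```

This matches the slide's failure condition: any st_size of 1GB or more has ten or more decimal digits, which is exactly when the buffer is overrun.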
Penumbra Overview
fileinfo run: files Foo (512B), Bar (1KB), Baz (1.5GB)
Output: foo: 512 ... bar: 1024 ... baz: 150... total: 150...

Relevant context:
1. When the failure occurs.
2. Which data are involved in the failure.
Here: line 13, strcat(out, pview);
In general, it is chosen using traditional debugging methods.

1 Taint inputs
2 Propagate taint marks
3 Identify relevant inputs
Taint marks 0, 8, and 9 identify the relevant inputs:
• verbose is true
• read 50 characters
• buf.st_size ≥ 1GB
Outline
• Penumbra approach
1. Tainting inputs
2. Propagating taint marks
3. Identifying relevant inputs
• Evaluation
• Conclusions and future work
1: Tainting Inputs
Assign a taint mark to each input as it enters the application. When a taint mark is assigned to an input, log the input’s value and where the input was read from.

Per-byte (e.g., data read from files): assign a unique taint mark to each byte.
• Precise identification
• Unnecessarily expensive

Per-entity (e.g., argv, argc, fstat, ...): assign the same taint mark to related bytes.
• Maintains per-byte precision
• Increases scalability

Domain specific: assign taint marks based on user-provided information.
• Maintains per-byte precision
• Further increases scalability
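The two main granularities can be sketched as follows. This is a minimal illustration of the idea, not Penumbra's implementation; all names (TaintState, taint_file_read, ...) are hypothetical:

```python
class TaintState:
    """Toy taint store: tracks marks per address plus a log for step 3."""
    def __init__(self):
        self.next_mark = 0
        self.log = {}    # mark -> (source, value), written at assignment time
        self.marks = {}  # address -> set of taint marks

    def fresh_mark(self, source, value):
        m = self.next_mark
        self.next_mark += 1
        self.log[m] = (source, value)  # record value + where it was read from
        return m

    # Per-byte: one unique mark per byte read from a file.
    def taint_file_read(self, filename, addr, data):
        for i, byte in enumerate(data):
            self.marks[addr + i] = {self.fresh_mark(f"{filename}[{i}]", byte)}

    # Per-entity: one shared mark for all bytes of a related value (e.g. argv[i]).
    def taint_entity(self, name, addr, data):
        m = self.fresh_mark(name, data)
        for i in range(len(data)):
            self.marks[addr + i] = {m}

state = TaintState()
state.taint_file_read("foo", 0x1000, b"ab")   # two marks: one per byte
state.taint_entity("argv[1]", 0x2000, b"-v")  # one mark shared by both bytes
```

Per-entity tainting trades nothing in this setting: a flag like argv[1] is relevant or irrelevant as a whole, so one mark suffices and the mark space stays small.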
2: Propagating Taint Marks

Data-flow Propagation (DF): taint marks flow along only data dependencies.
C = A + B;   // A carries mark 1 and B carries mark 2, so C receives marks {1, 2}

Data- and control-flow Propagation (DF + CF): taint marks flow along data and control dependencies.
if(X) {        // X carries mark 3
  C = A + B;   // C receives marks {1, 2, 3}
}

The effectiveness of each option depends on the particular failure.
3: Identifying Relevant Inputs
1. The relevant context indicates which data are involved in the considered failure.
2. Identify which taint marks are associated with the data indicated by the relevant context.
3. Use the recorded logs to reconstruct the inputs identified by those taint marks.
Prototype Implementation
Trace generator: takes the input vector and the executable and produces an execution trace. Implemented using Dytan, a generic x86 tainting framework developed in previous work [Clause and Orso 2007].
Trace processor: takes the trace and the relevant context and produces the failure-relevant input subsets (DF and DF + CF).
Evaluation
Study 1: Effectiveness for debugging real failures
Study 2: Comparison with Delta Debugging

Subjects:
Application | Version | KLoC  | Fault location
bc          | 1.06    | 10.5  | more_arrays:177
gzip        | 1.24    | 6.3   | get_istat:828
ncompress   | 4.24    | 1.4   | comprexx:896
pine        | 4.44    | 239.1 | rfc822_cat:260
squid       | 2.3     | 69.9  | ftpBuildTitleUrl:1024

We selected a failure-revealing input vector for each subject.
Data Generation

Penumbra:
• Setup (manual): choose a relevant context.
  - Location: the statement where the failure occurs.
  - Data: any data read by that statement.
• Execution (automated): use the prototype tool to identify failure-relevant inputs (DF and DF + CF).

Delta Debugging:
• Setup (manual): create an automated oracle.
  - Use gdb to inspect the stack trace and program data.
  - One-second timeout to prevent incorrect results.
• Execution (automated): use the standard Delta Debugging implementation to minimize inputs.
Study 1: Effectiveness
Is the information that
Penumbra provides helpful for
debugging real failures?
Study 1 Results: gzip & ncompress
Crash when a file name is longer than 1,024 characters.
Input vector: the ./gzip command line and the contents & attributes of the files foo, bar, and long filename[ ].
# Inputs: 10,000,056
# Relevant (DF): 1
# Relevant (DF + CF): 3
Study 1 Results: pine
Crash when a “from” field contains 22 or more double quote characters.
...
From clause@boar Tue Feb 20 11:49:53 2007
Return-Path: <clause@boar>
X-Original-To: clause
Delivered-To: clause@boar
Received: by boar (Postfix, from userid 1000)
id 88EDD1724523; Tue, 20 Feb 2007 11:49:53 -0500 (EST)
To: clause@boar
Subject: test
Message-Id: <20070220164953.88EDD1724523@boar>
Date: Tue, 20 Feb 2007 11:49:53 -0500 (EST)
From: """"""""""""""""""""""""""""""""@host.fubar
X-IMAPbase: 1172160370 390
Status: O
X-Status:
X-Keywords:
X-UID: 5
...
The DF-relevant inputs are the double-quote characters in the From field.
# Inputs: 15,103,766
# Relevant (DF): 26
# Relevant (DF + CF): 15,100,344
Study 1: Conclusions
1. Data-flow propagation is always effective; data- and control-flow propagation is sometimes effective.
➡ Use data-flow first; then, if necessary, add control-flow.
2. The inputs identified by Penumbra correspond to the failure conditions.
➡ Our technique is effective in assisting the debugging of real failures.
Study 2: Comparison with Delta Debugging
RQ1: How much manual effort
does each technique require?
RQ2: How long does it take to
fix a considered failure given
the information provided by
each technique?
RQ1: Manual Effort
Use setup time as a proxy for manual (developer) effort.
[Chart: setup times in seconds for Penumbra and Delta Debugging on gzip, ncompress, bc, pine, and squid; Delta Debugging requires roughly 1,800 to 12,600 seconds of setup per subject, while Penumbra requires far less in every case.]
Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).
RQ2: Debugging Effort
Use the number of relevant inputs as a proxy for debugging effort.

Subject   | Penumbra (DF) | Penumbra (DF + CF) | Delta Debugging
bc        | 209           | 743                | 285
gzip      | 1             | 3                  | 1
ncompress | 1             | 3                  | 1
pine      | 26            | 15,100,344         | 90
squid     | 89            | 2,056              | —

• Penumbra (DF) is comparable to (slightly better than) Delta Debugging.
• Penumbra (DF + CF) is likely less effective for bc, pine, and squid.
Conclusions & Future Work
• Novel technique for identifying failure-relevant
inputs.
• Overcomes limitations of existing approaches
• Single execution
• Minimal manual effort
• Comparable effectiveness
• Combine Penumbra with existing code-centric
techniques.

pre processor and file handling in c language ppt
File Handling in C Programming for Beginners
OS_Compilation_Makefile_kt4jerb34834343553
Engineering Computers L34-L35-File Handling.pptx

More from James Clause (11)

PDF
Investigating the Impacts of Web Servers on Web Application Energy Usage (GRE...
PDF
Energy-directed Test Suite Optimization (GREENS 2013)
PDF
Enabling and Supporting the Debugging of Field Failures (Job Talk)
PDF
Leakpoint: Pinpointing the Causes of Memory Leaks (ICSE 2010)
PDF
Debugging Field Failures by Minimizing Captured Executions (ICSE 2009: NIER e...
PDF
A Technique for Enabling and Supporting Debugging of Field Failures (ICSE 2007)
PDF
Demand-Driven Structural Testing with Dynamic Instrumentation (ICSE 2005)
PDF
Initial Explorations on Design Pattern Energy Usage (GREENS 12)
PDF
Effective Memory Protection Using Dynamic Tainting (ASE 2007)
PDF
Advanced Dynamic Analysis for Leak Detection (Apple Internship 2008)
PDF
Camouflage: Automated Anonymization of Field Data (ICSE 2011)
Investigating the Impacts of Web Servers on Web Application Energy Usage (GRE...
Energy-directed Test Suite Optimization (GREENS 2013)
Enabling and Supporting the Debugging of Field Failures (Job Talk)
Leakpoint: Pinpointing the Causes of Memory Leaks (ICSE 2010)
Debugging Field Failures by Minimizing Captured Executions (ICSE 2009: NIER e...
A Technique for Enabling and Supporting Debugging of Field Failures (ICSE 2007)
Demand-Driven Structural Testing with Dynamic Instrumentation (ICSE 2005)
Initial Explorations on Design Pattern Energy Usage (GREENS 12)
Effective Memory Protection Using Dynamic Tainting (ASE 2007)
Advanced Dynamic Analysis for Leak Detection (Apple Internship 2008)
Camouflage: Automated Anonymization of Field Data (ICSE 2011)

Recently uploaded (20)

PDF
Build Real-Time ML Apps with Python, Feast & NoSQL
PDF
The-2025-Engineering-Revolution-AI-Quality-and-DevOps-Convergence.pdf
PPTX
AQUEEL MUSHTAQUE FAKIH COMPUTER CENTER .
PPTX
Presentation - Principles of Instructional Design.pptx
PDF
giants, standing on the shoulders of - by Daniel Stenberg
PDF
substrate PowerPoint Presentation basic one
PDF
Introduction to MCP and A2A Protocols: Enabling Agent Communication
PDF
Co-training pseudo-labeling for text classification with support vector machi...
PDF
Identification of potential depression in social media posts
PDF
The-Future-of-Automotive-Quality-is-Here-AI-Driven-Engineering.pdf
PPTX
Rise of the Digital Control Grid Zeee Media and Hope and Tivon FTWProject.com
PDF
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
PDF
Decision Optimization - From Theory to Practice
PDF
Transform-Your-Factory-with-AI-Driven-Quality-Engineering.pdf
PDF
Early detection and classification of bone marrow changes in lumbar vertebrae...
PPTX
Information-Technology-in-Human-Society.pptx
PDF
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
PPTX
How to use fields_get method in Odoo 18
PDF
A symptom-driven medical diagnosis support model based on machine learning te...
PDF
NewMind AI Journal Monthly Chronicles - August 2025
Build Real-Time ML Apps with Python, Feast & NoSQL
The-2025-Engineering-Revolution-AI-Quality-and-DevOps-Convergence.pdf
AQUEEL MUSHTAQUE FAKIH COMPUTER CENTER .
Presentation - Principles of Instructional Design.pptx
giants, standing on the shoulders of - by Daniel Stenberg
substrate PowerPoint Presentation basic one
Introduction to MCP and A2A Protocols: Enabling Agent Communication
Co-training pseudo-labeling for text classification with support vector machi...
Identification of potential depression in social media posts
The-Future-of-Automotive-Quality-is-Here-AI-Driven-Engineering.pdf
Rise of the Digital Control Grid Zeee Media and Hope and Tivon FTWProject.com
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
Decision Optimization - From Theory to Practice
Transform-Your-Factory-with-AI-Driven-Quality-Engineering.pdf
Early detection and classification of bone marrow changes in lumbar vertebrae...
Information-Technology-in-Human-Society.pptx
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
How to use fields_get method in Odoo 18
A symptom-driven medical diagnosis support model based on machine learning te...
NewMind AI Journal Monthly Chronicles - August 2025

Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

  • 1. Penumbra: Automatically Identifying Failure-Relevant Inputs James Clause and Alessandro Orso College of Computing Georgia Institute of Technology Supported in part by: NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech
  • 2. Automated Debugging • Gupta and colleagues ’05 • Jones and colleagues ’02 • Korel and Laski ’88 • Liblit and colleagues ’05 • Nainar and colleagues ’07 • Renieris and Reiss ’03 • Seward and Nethercote ’05 • Tucek and colleagues ’07 • Weiser ’81 • Zhang and colleagues ’05 • Zhang and colleagues ’06 • ...
  • 3. Automated Debugging Code-centric • Gupta and colleagues ’05 • Jones and colleagues ’02 • Korel and Laski ’88 • Liblit and colleagues ’05 • Nainar and colleagues ’07 • Renieris and Reiss ’03 • Seward and Nethercote ’05 • Tucek and colleagues ’07 • Weiser ’81 • Zhang and colleagues ’05 • Zhang and colleagues ’06 • ...
  • 4. Automated Debugging Code-centric • Gupta and colleagues ’05 • Jones and colleagues ’02 • Korel and Laski ’88 • Liblit and colleagues ’05 • Nainar and colleagues ’07 • Renieris and Reiss ’03 • Seward and Nethercote ’05 • Tucek and colleagues ’07 • Weiser ’81 • Zhang and colleagues ’05 • Zhang and colleagues ’06 • ... What about inputs which cause the failure?
  • 5. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Data-centric Techniques
  • 6. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques
  • 7. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques Requires: 1. Multiple executions 2. Large amounts of manual effort (oracle creation, setup)
  • 8. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques Requires: 1. Multiple executions 2. Large amounts of manual effort (oracle creation, setup) Penumbra
  • 9. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques Requires: 1. Multiple executions 2. Large amounts of manual effort (oracle creation, setup) Penumbra Comparable performance
  • 10. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques Requires: 1. Multiple executions 2. Large amounts of manual effort (oracle creation, setup) Requires: 1. Single execution 2. Reduced manual effort Penumbra Comparable performance
  • 12. Intuition and Terminology Failure-revealing input vector Failure-relevant subset (inputs which are useful for investigating the failure)
  • 13. Intuition and Terminology Failure-revealing input vector Failure-relevant subset (inputs which are useful for investigating the failure) Approximate failure-relevant subsets by identifying inputs that reach the failure along program dependencies.
  • 14. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 15. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo Command line arguments (flag, list of file names)
  • 16. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo File statistics (for each file) (size, last modified date, ...) Command line arguments (flag, list of file names)
  • 17. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo File statistics (for each file) (size, last modified date, ...) File contents (for each file) (first 50 characters) Command line arguments (flag, list of file names)
  • 18. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo File statistics (for each file) (size, last modified date, ...) File contents (for each file) (first 50 characters) Command line arguments (flag, list of file names) Input vector
  • 19. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 20. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo Overflow out
  • 21. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo buf.st_size ≥ 1GB Overflow out
  • 22. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo buf.st_size ≥ 1GB verbose is true Overflow out
  • 23. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo buf.st_size ≥ 1GB verbose is true Overflow out read 50 characters
  • 24. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 25. 1. Many more inputs than lines of code. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 26. 1. Many more inputs than lines of code. 2. Understanding the failure requires tracing interactions between inputs from multiple sources. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 27. 1. Many more inputs than lines of code. 2. Understanding the failure requires tracing interactions between inputs from multiple sources. 3. Only a small percentage of all inputs are relevant for the failure. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 28. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB
  • 29. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB Relevant context: 1. When the failure occurs. 2. Which data are involved in the failure.
  • 30. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 13. strcat(out, pview); In general, it is chosen using traditional debugging methods.
  • 31. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB
  • 32. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs
  • 33. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 1 2 3 4 5 6 7 8 9 0
  • 34. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 1 2 3 4 5 6 7 8 9 0
  • 35. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks
  • 36. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs
  • 37. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs 0 8 9
  • 38. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs 0 8 9
  • 39. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs 0 8 9 verbose is true read 50 characters buf.st_size ≥ 1GB
  • 40. Outline • Penumbra approach 1. Tainting inputs 2. Propagating taint marks 3. Identifying relevant inputs • Evaluation • Conclusions and future work
  • 41. 1: Tainting Inputs Assign a taint mark to each input as it enters the application.
  • 42. 1: Tainting Inputs Assign a taint mark to each input as it enters the application. Per-byte Per-entity Domain specific
  • 43. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Per-byte Per-entity Domain specific
  • 44. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Per-byte Per-entity Domain specific
  • 45. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Per-byte Per-entity Domain specific
  • 46. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Per-byte Per-entity Domain specific
  • 47. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Per-byte Per-entity Domain specific
  • 48. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific
  • 49. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific
  • 50. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific Maintains per-byte precision
  • 51. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific Maintains per-byte precision Further increases scalability
  • 52. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. When a taint mark is assigned to an input, log the input's value and where the input was read from. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific Maintains per-byte precision Further increases scalability
  • 54. 2: Propagating Taint Marks Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 55. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 56. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 57. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; 1 2 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 58. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; 1 21 2 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 59. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; 1 21 2 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 60. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; if(X) { C = A + B; } 1 21 2 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 61. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; if(X) { C = A + B; } 1 21 2 1 2 3 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 62. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; if(X) { C = A + B; } 1 21 2 1 2 3 1 2 3 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 63. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; if(X) { C = A + B; } 1 21 2 1 2 3 1 2 3 The effectiveness of each option depends on the particular failure. Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 64. 3: Identifying Relevant Inputs 1. Relevant context indicates which data is involved in the considered failure. 2. Identify which taint marks are associated with the data indicated by the relevant context. 3. Use recorded logs to reconstruct the inputs identified by those taint marks. Baz 1.5GB
  • 66. input vector executable trace relevant context Prototype Implementation Trace Processor Trace generator
  • 67. input vector executable trace relevant context Prototype Implementation Trace Processor Trace generator Implemented using Dytan, a generic x86 tainting framework developed in previous work [Clause and Orso 2007].
  • 68. input vector executable trace relevant context Prototype Implementation Trace Processor Trace generator
  • 69. input vector executable trace relevant context Prototype Implementation Trace Processor Trace generator input subset (DF) input subset (DF+CF)
  • 70. Evaluation Study 1: Effectiveness for debugging real failures Study 2: Comparison with Delta Debugging
  • 71. Evaluation Study 1: Effectiveness for debugging real failures Study 2: Comparison with Delta Debugging Subjects: bc 1.06 (10.5 KLoC, fault at more_arrays:177); gzip 1.24 (6.3 KLoC, get_istat:828); ncompress 4.24 (1.4 KLoC, comprexx:896); pine 4.44 (239.1 KLoC, rfc822_cat:260); squid 2.3 (69.9 KLoC, ftpBuildTitleUrl:1024)
  • 72. Evaluation Study 1: Effectiveness for debugging real failures Study 2: Comparison with Delta Debugging Subjects: bc 1.06 (10.5 KLoC, fault at more_arrays:177); gzip 1.24 (6.3 KLoC, get_istat:828); ncompress 4.24 (1.4 KLoC, comprexx:896); pine 4.44 (239.1 KLoC, rfc822_cat:260); squid 2.3 (69.9 KLoC, ftpBuildTitleUrl:1024) We selected a failure-revealing input vector for each subject.
  • 73. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 74. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 75. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs. • Location: statement where the failure occurs. • Data: any data read by that statement
  • 76. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 77. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 78. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs. • Use gdb to inspect stack trace and program data. • One second timeout to prevent incorrect results.
  • 79. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 80. Study 1: Effectiveness Is the information that Penumbra provides helpful for debugging real failures?
  • 81. Study 1 Results: gzip & ncompress Crash when a file name is longer than 1,024 characters.
  • 82. Study 1 Results: gzip & ncompress Contents & Attributes Contents & Attributes bar Contents & Attributes foo./gzip Crash when a file name is longer than 1,024 characters. # Inputs: 10,000,056 long filename[ ]
  • 83. Study 1 Results: gzip & ncompress Contents & Attributes Contents & Attributes bar Contents & Attributes foo./gzip Crash when a file name is longer than 1,024 characters. # Inputs: 10,000,056 # Relevant (DF): 1 long filename[ ]
  • 84. Study 1 Results: gzip & ncompress Contents & Attributes Contents & Attributes bar Contents & Attributes foo./gzip Crash when a file name is longer than 1,024 characters. # Relevant (DF + CF): 3 # Inputs: 10,000,056 # Relevant (DF): 1 long filename[ ]
  • 85. Study 1 Results: pine Crash when a “from” field contains 22 or more double quote characters.
  • 86. Study 1 Results: pine # Inputs: 15,103,766 ... From clause@boar Tue Feb 20 11:49:53 2007 Return-Path: <clause@boar> X-Original-To: clause Delivered-To: clause@boar Received: by boar (Postfix, from userid 1000) id 88EDD1724523; Tue, 20 Feb 2007 11:49:53 -0500 (EST) To: clause@boar Subject: test Message-Id: <20070220164953.88EDD1724523@boar> Date: Tue, 20 Feb 2007 11:49:53 -0500 (EST) From: """"""""""""""""""""""""""""""""@host.fubar X-IMAPbase: 1172160370 390 Status: O X-Status: X-Keywords: X-UID: 5 ... Crash when a “from” field contains 22 or more double quote characters.
  • 87-89. Study 1 Results: pine. [Highlighted in the message above: the run of double quote characters in the "From:" field.] # Inputs: 15,103,766. # Relevant (DF): 26. # Relevant (DF + CF): 15,100,344.
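The pine numbers show how control-flow (CF) propagation can over-approximate. A hypothetical sketch (not pine's actual code): the crash depends on a counter incremented only for quote bytes, so DF marks just those bytes; but which byte the parser examines next is control-dependent on everything it has already consumed, so CF marks essentially the whole mailbox:

```python
def scan_from_field(mailbox_bytes):
    """Toy parser: crashes once 22 double quotes have been seen.

    Tracks two relevance sets: df_relevant (data flow into the
    counter) and cf_relevant (control-flow influence on reaching
    this iteration at all).
    """
    df_relevant, cf_relevant = set(), set()
    quotes = 0
    for i, b in enumerate(mailbox_bytes):
        cf_relevant.add(i)        # CF rule: this iteration executed only
                                  # because earlier bytes steered the parser here
        if b == '"':
            quotes += 1           # DF rule: the counter's value flows
            df_relevant.add(i)    # only from the quote bytes themselves
        if quotes >= 22:
            return df_relevant, cf_relevant   # crash point
    return df_relevant, cf_relevant

df, cf = scan_from_field('From: ' + '"' * 22 + '@host')
assert len(df) == 22 and len(cf) > len(df)
```

Scaled up to a 15 MB mailbox, the CF set approaches the full input, which is exactly the 15,100,344-of-15,103,766 blow-up reported above.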
  • 92. Study 1: Conclusions. 1. Data-flow propagation is always effective; data- and control-flow propagation is only sometimes effective. ➡ Use data flow first and, if necessary, add control flow. 2. The inputs identified by Penumbra correspond to the failure conditions. ➡ Our technique is effective in assisting the debugging of real failures.
  • 93. Study 2: Comparison with Delta Debugging. RQ1: How much manual effort does each technique require? RQ2: How long does it take to fix each considered failure, given the information provided by each technique?
  • 94. RQ1: Manual effort Use setup-time as a proxy for manual (developer) effort.
  • 95-97. RQ1: Manual effort. [Bar chart: setup time (s) for Penumbra vs. Delta Debugging on gzip, ncompress, bc, pine, and squid; the Delta Debugging bars range from 1,800 to 12,600 seconds, while Penumbra's are far smaller.]
  • 98. RQ1: Manual effort. [Bar chart: setup time (s) for Penumbra vs. Delta Debugging on gzip, ncompress, bc, pine, and squid.] Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).
  • 99. RQ2: Debugging Effort Use number of relevant inputs as a proxy for debugging effort.
  • 100-102. RQ2: Debugging Effort.

      Subject      Penumbra (DF)   Penumbra (DF + CF)   Delta Debugging
      bc                 209                 743                  285
      gzip                 1                   3                    1
      ncompress            1                   3                    1
      pine                26          15,100,344                   90
      squid               89               2,056                    —

    • Penumbra (DF) is comparable to (slightly better than) Delta Debugging.
    • Penumbra (DF + CF) is likely less effective for bc, pine, and squid.
  • 103. Conclusions & Future Work. • Novel technique for identifying failure-relevant inputs. • Overcomes limitations of existing approaches: single execution, minimal manual effort, comparable effectiveness. • Future work: combine Penumbra with existing code-centric techniques.