Penumbra:
Automatically Identifying
Failure-Relevant Inputs
James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology
Supported in part by:
NSF awards CCF-0725202 and CCF-0541080
to Georgia Tech
Automated Debugging
Code-centric
• Gupta and colleagues ’05
• Jones and colleagues ’02
• Korel and Laski ’88
• Liblit and colleagues ’05
• Nainar and colleagues ’07
• Renieris and Reiss ’03
• Seward and Nethercote ’05
• Tucek and colleagues ’07
• Weiser ’81
• Zhang and colleagues ’05
• Zhang and colleagues ’06
• ...
What about inputs which cause the failure?
Data-centric Techniques
• Chan and Lakhotia ’98
• Zeller and Hildebrandt ’02
• Misherghi and Su ’06
Delta Debugging
Requires:
1. Multiple executions
2. Large amounts of manual effort (oracle creation, setup)
Penumbra
Requires:
1. Single execution
2. Reduced manual effort
Comparable performance
Intuition and Terminology
Failure-revealing input vector
Failure-relevant subset
(inputs which are useful for investigating the failure)
Approximate failure-relevant subsets by identifying inputs that reach the failure along program dependencies.
Motivating Example: fileinfo
int main(int argc, char **argv) {
1.    int verbose, i, total_size = 0;
2.    struct stat buf;
3.    verbose = atoi(argv[1]);
4.    for(i = 2; i < argc; i++) {
5.      int fd = open(argv[i], O_RDONLY);
6.      fstat(fd, &buf);
7.      char *out = malloc(60);
8.      sprintf(out, "%d", buf.st_size);
9.      if(verbose) {
10.       char *pview = malloc(51);
11.       read(fd, pview, 50);
12.       pview[50] = '\0';
13.       strcat(out, pview);
14.     }
15.     printf("%s: %s\n", argv[i], out);
16.     total_size += buf.st_size;
17.   }
18.   printf("total: %d\n", total_size);
}

Input vector:
• Command line arguments (flag, list of file names)
• File statistics, for each file (size, last modified date, ...)
• File contents, for each file (first 50 characters)

Failure conditions (overflow of out):
• buf.st_size ≥ 1GB
• verbose is true
• read 50 characters

1. Many more inputs than lines of code.
2. Understanding the failure requires tracing interactions between inputs from multiple sources.
3. Only a small percentage of all inputs are relevant for the failure.
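The overflow on line 13 can be checked with quick arithmetic: out must hold the decimal digits of st_size, the 50-character preview, and a terminating NUL, all in 60 bytes. A sketch of that check (the helper below is illustrative, not part of fileinfo):

```python
def bytes_needed(st_size, preview_len=50):
    """Bytes fileinfo writes into `out`: size digits + preview + NUL."""
    return len(str(st_size)) + preview_len + 1

BUF = 60  # out = malloc(60)

# A 512-byte file: 3 digits + 50 + 1 = 54 bytes -- fits.
assert bytes_needed(512) <= BUF

# The largest sub-1GB size (999,999,999) needs exactly 60 bytes -- still fits.
assert bytes_needed(999_999_999) == BUF

# A 1.5GB file: st_size has 10 digits, so 10 + 50 + 1 = 61 > 60 -- overflow.
assert bytes_needed(1_500_000_000) > BUF
```

This matches the slide's failure condition: any st_size of 1GB or more has ten or more decimal digits, which is exactly when the buffer is overrun.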
Penumbra Overview
fileinfo run: files Foo (512B), Bar (1KB), Baz (1.5GB)
Output: foo: 512 ... bar: 1024 ... baz: 150... total: 150...

Relevant context:
1. When the failure occurs.
2. Which data are involved in the failure.
Here: line 13, strcat(out, pview);
In general, it is chosen using traditional debugging methods.

1 Taint inputs
2 Propagate taint marks
3 Identify relevant inputs
Taint marks 0, 8, and 9 identify the relevant inputs:
• verbose is true
• read 50 characters
• buf.st_size ≥ 1GB
Outline
• Penumbra approach
1. Tainting inputs
2. Propagating taint marks
3. Identifying relevant inputs
• Evaluation
• Conclusions and future work
1: Tainting Inputs
Assign a taint mark to each input as it enters the application. When a taint mark is assigned to an input, log the input’s value and where the input was read from.

Per-byte (e.g., data read from files): assign a unique taint mark to each byte.
• Precise identification
• Unnecessarily expensive

Per-entity (e.g., argv, argc, fstat, ...): assign the same taint mark to related bytes.
• Maintains per-byte precision
• Increases scalability

Domain specific: assign taint marks based on user-provided information.
• Maintains per-byte precision
• Further increases scalability
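The two main granularities can be sketched as follows. This is a minimal illustration of the idea, not Penumbra's implementation; all names (TaintState, taint_file_read, ...) are hypothetical:

```python
class TaintState:
    """Toy taint store: tracks marks per address plus a log for step 3."""
    def __init__(self):
        self.next_mark = 0
        self.log = {}    # mark -> (source, value), written at assignment time
        self.marks = {}  # address -> set of taint marks

    def fresh_mark(self, source, value):
        m = self.next_mark
        self.next_mark += 1
        self.log[m] = (source, value)  # record value + where it was read from
        return m

    # Per-byte: one unique mark per byte read from a file.
    def taint_file_read(self, filename, addr, data):
        for i, byte in enumerate(data):
            self.marks[addr + i] = {self.fresh_mark(f"{filename}[{i}]", byte)}

    # Per-entity: one shared mark for all bytes of a related value (e.g. argv[i]).
    def taint_entity(self, name, addr, data):
        m = self.fresh_mark(name, data)
        for i in range(len(data)):
            self.marks[addr + i] = {m}

state = TaintState()
state.taint_file_read("foo", 0x1000, b"ab")   # two marks: one per byte
state.taint_entity("argv[1]", 0x2000, b"-v")  # one mark shared by both bytes
```

Per-entity tainting trades nothing in this setting: a flag like argv[1] is relevant or irrelevant as a whole, so one mark suffices and the mark space stays small.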
2: Propagating Taint Marks

Data-flow Propagation (DF): taint marks flow along only data dependencies.
C = A + B;   // A carries mark 1 and B carries mark 2, so C receives marks {1, 2}

Data- and control-flow Propagation (DF + CF): taint marks flow along data and control dependencies.
if(X) {        // X carries mark 3
  C = A + B;   // C receives marks {1, 2, 3}
}

The effectiveness of each option depends on the particular failure.
3: Identifying Relevant Inputs
1. The relevant context indicates which data are involved in the considered failure.
2. Identify which taint marks are associated with the data indicated by the relevant context.
3. Use the recorded logs to reconstruct the inputs identified by those taint marks.
Prototype Implementation
Trace generator: takes the input vector and the executable and produces an execution trace. Implemented using Dytan, a generic x86 tainting framework developed in previous work [Clause and Orso 2007].
Trace processor: takes the trace and the relevant context and produces the failure-relevant input subsets (DF and DF + CF).
Evaluation
Study 1: Effectiveness for debugging real failures
Study 2: Comparison with Delta Debugging

Subjects:
Application | Version | KLoC  | Fault location
bc          | 1.06    | 10.5  | more_arrays:177
gzip        | 1.24    | 6.3   | get_istat:828
ncompress   | 4.24    | 1.4   | comprexx:896
pine        | 4.44    | 239.1 | rfc822_cat:260
squid       | 2.3     | 69.9  | ftpBuildTitleUrl:1024

We selected a failure-revealing input vector for each subject.
Data Generation

Penumbra:
• Setup (manual): choose a relevant context.
  - Location: the statement where the failure occurs.
  - Data: any data read by that statement.
• Execution (automated): use the prototype tool to identify failure-relevant inputs (DF and DF + CF).

Delta Debugging:
• Setup (manual): create an automated oracle.
  - Use gdb to inspect the stack trace and program data.
  - One-second timeout to prevent incorrect results.
• Execution (automated): use the standard Delta Debugging implementation to minimize inputs.
Study 1: Effectiveness
Is the information that
Penumbra provides helpful for
debugging real failures?
Study 1 Results: gzip & ncompress
Crash when a file name is longer than 1,024 characters.
Input vector: the ./gzip command line and the contents & attributes of the files foo, bar, and long filename[ ].
# Inputs: 10,000,056
# Relevant (DF): 1
# Relevant (DF + CF): 3
Study 1 Results: pine
Crash when a “from” field contains 22 or more double quote characters.
...
From clause@boar Tue Feb 20 11:49:53 2007
Return-Path: <clause@boar>
X-Original-To: clause
Delivered-To: clause@boar
Received: by boar (Postfix, from userid 1000)
id 88EDD1724523; Tue, 20 Feb 2007 11:49:53 -0500 (EST)
To: clause@boar
Subject: test
Message-Id: <20070220164953.88EDD1724523@boar>
Date: Tue, 20 Feb 2007 11:49:53 -0500 (EST)
From: """"""""""""""""""""""""""""""""@host.fubar
X-IMAPbase: 1172160370 390
Status: O
X-Status:
X-Keywords:
X-UID: 5
...
The DF-relevant inputs are the double-quote characters in the From field.
# Inputs: 15,103,766
# Relevant (DF): 26
# Relevant (DF + CF): 15,100,344
Study 1: Conclusions
1. Data-flow propagation is always effective; data- and control-flow propagation is sometimes effective.
➡ Use data-flow first; then, if necessary, add control-flow.
2. The inputs identified by Penumbra correspond to the failure conditions.
➡ Our technique is effective in assisting the debugging of real failures.
Study 2: Comparison with Delta Debugging
RQ1: How much manual effort
does each technique require?
RQ2: How long does it take to
fix a considered failure given
the information provided by
each technique?
RQ1: Manual Effort
Use setup time as a proxy for manual (developer) effort.
[Chart: setup times in seconds for Penumbra and Delta Debugging on gzip, ncompress, bc, pine, and squid; Delta Debugging requires roughly 1,800 to 12,600 seconds of setup per subject, while Penumbra requires far less in every case.]
Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).
RQ2: Debugging Effort
Use the number of relevant inputs as a proxy for debugging effort.

Subject   | Penumbra (DF) | Penumbra (DF + CF) | Delta Debugging
bc        | 209           | 743                | 285
gzip      | 1             | 3                  | 1
ncompress | 1             | 3                  | 1
pine      | 26            | 15,100,344         | 90
squid     | 89            | 2,056              | —

• Penumbra (DF) is comparable to (slightly better than) Delta Debugging.
• Penumbra (DF + CF) is likely less effective for bc, pine, and squid.
Conclusions & Future Work
• Novel technique for identifying failure-relevant
inputs.
• Overcomes limitations of existing approaches
• Single execution
• Minimal manual effort
• Comparable effectiveness
• Combine Penumbra with existing code-centric
techniques.

pre processor and file handling in c language ppt
File Handling in C Programming for Beginners
OS_Compilation_Makefile_kt4jerb34834343553
Engineering Computers L34-L35-File Handling.pptx

More from James Clause (11)

PDF
Investigating the Impacts of Web Servers on Web Application Energy Usage (GRE...
PDF
Energy-directed Test Suite Optimization (GREENS 2013)
PDF
Enabling and Supporting the Debugging of Field Failures (Job Talk)
PDF
Leakpoint: Pinpointing the Causes of Memory Leaks (ICSE 2010)
PDF
Debugging Field Failures by Minimizing Captured Executions (ICSE 2009: NIER e...
PDF
A Technique for Enabling and Supporting Debugging of Field Failures (ICSE 2007)
PDF
Demand-Driven Structural Testing with Dynamic Instrumentation (ICSE 2005)
PDF
Initial Explorations on Design Pattern Energy Usage (GREENS 12)
PDF
Effective Memory Protection Using Dynamic Tainting (ASE 2007)
PDF
Advanced Dynamic Analysis for Leak Detection (Apple Internship 2008)
PDF
Camouflage: Automated Anonymization of Field Data (ICSE 2011)
Investigating the Impacts of Web Servers on Web Application Energy Usage (GRE...
Energy-directed Test Suite Optimization (GREENS 2013)
Enabling and Supporting the Debugging of Field Failures (Job Talk)
Leakpoint: Pinpointing the Causes of Memory Leaks (ICSE 2010)
Debugging Field Failures by Minimizing Captured Executions (ICSE 2009: NIER e...
A Technique for Enabling and Supporting Debugging of Field Failures (ICSE 2007)
Demand-Driven Structural Testing with Dynamic Instrumentation (ICSE 2005)
Initial Explorations on Design Pattern Energy Usage (GREENS 12)
Effective Memory Protection Using Dynamic Tainting (ASE 2007)
Advanced Dynamic Analysis for Leak Detection (Apple Internship 2008)
Camouflage: Automated Anonymization of Field Data (ICSE 2011)

Recently uploaded (20)

PDF
Build Real-Time ML Apps with Python, Feast & NoSQL
PDF
The-2025-Engineering-Revolution-AI-Quality-and-DevOps-Convergence.pdf
PPTX
AQUEEL MUSHTAQUE FAKIH COMPUTER CENTER .
PPTX
Presentation - Principles of Instructional Design.pptx
PDF
giants, standing on the shoulders of - by Daniel Stenberg
PDF
substrate PowerPoint Presentation basic one
PDF
Introduction to MCP and A2A Protocols: Enabling Agent Communication
PDF
Co-training pseudo-labeling for text classification with support vector machi...
PDF
Identification of potential depression in social media posts
PDF
The-Future-of-Automotive-Quality-is-Here-AI-Driven-Engineering.pdf
PPTX
Rise of the Digital Control Grid Zeee Media and Hope and Tivon FTWProject.com
PDF
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
PDF
Decision Optimization - From Theory to Practice
PDF
Transform-Your-Factory-with-AI-Driven-Quality-Engineering.pdf
PDF
Early detection and classification of bone marrow changes in lumbar vertebrae...
PPTX
Information-Technology-in-Human-Society.pptx
PDF
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
PPTX
How to use fields_get method in Odoo 18
PDF
A symptom-driven medical diagnosis support model based on machine learning te...
PDF
NewMind AI Journal Monthly Chronicles - August 2025
Build Real-Time ML Apps with Python, Feast & NoSQL
The-2025-Engineering-Revolution-AI-Quality-and-DevOps-Convergence.pdf
AQUEEL MUSHTAQUE FAKIH COMPUTER CENTER .
Presentation - Principles of Instructional Design.pptx
giants, standing on the shoulders of - by Daniel Stenberg
substrate PowerPoint Presentation basic one
Introduction to MCP and A2A Protocols: Enabling Agent Communication
Co-training pseudo-labeling for text classification with support vector machi...
Identification of potential depression in social media posts
The-Future-of-Automotive-Quality-is-Here-AI-Driven-Engineering.pdf
Rise of the Digital Control Grid Zeee Media and Hope and Tivon FTWProject.com
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
Decision Optimization - From Theory to Practice
Transform-Your-Factory-with-AI-Driven-Quality-Engineering.pdf
Early detection and classification of bone marrow changes in lumbar vertebrae...
Information-Technology-in-Human-Society.pptx
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
How to use fields_get method in Odoo 18
A symptom-driven medical diagnosis support model based on machine learning te...
NewMind AI Journal Monthly Chronicles - August 2025

Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

  • 1. Penumbra: Automatically Identifying Failure-Relevant Inputs James Clause and Alessandro Orso College of Computing Georgia Institute of Technology Supported in part by: NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech
  • 2. Automated Debugging • Gupta and colleagues ’05 • Jones and colleagues ’02 • Korel and Laski ’88 • Liblit and colleagues ’05 • Nainar and colleagues ’07 • Renieris and Reiss ’03 • Seward and Nethercote ’05 • Tucek and colleagues ’07 • Weiser ’81 • Zhang and colleagues ’05 • Zhang and colleagues ’06 • ...
  • 3. Automated Debugging Code-centric • Gupta and colleagues ’05 • Jones and colleagues ’02 • Korel and Laski ’88 • Liblit and colleagues ’05 • Nainar and colleagues ’07 • Renieris and Reiss ’03 • Seward and Nethercote ’05 • Tucek and colleagues ’07 • Weiser ’81 • Zhang and colleagues ’05 • Zhang and colleagues ’06 • ...
  • 4. Automated Debugging Code-centric • Gupta and colleagues ’05 • Jones and colleagues ’02 • Korel and Laski ’88 • Liblit and colleagues ’05 • Nainar and colleagues ’07 • Renieris and Reiss ’03 • Seward and Nethercote ’05 • Tucek and colleagues ’07 • Weiser ’81 • Zhang and colleagues ’05 • Zhang and colleagues ’06 • ... What about inputs which cause the failure?
  • 5. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Data-centric Techniques
  • 6. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques
  • 7. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques Requires: 1. Multiple executions 2. Large amounts of manual effort (oracle creation, setup)
  • 8. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques Requires: 1. Multiple executions 2. Large amounts of manual effort (oracle creation, setup) Penumbra
  • 9. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques Requires: 1. Multiple executions 2. Large amounts of manual effort (oracle creation, setup) Penumbra Comparable performance
  • 10. • Chan and Lakhotia ’98 • Zeller and Hildebrandt ’02 • Misherghi and Su ’06 Delta Debugging Data-centric Techniques Requires: 1. Multiple executions 2. Large amounts of manual effort (oracle creation, setup) Requires: 1. Single execution 2. Reduced manual effort Penumbra Comparable performance
  • 12. Intuition and Terminology Failure-revealing input vector Failure-relevant subset (inputs which are useful for investigating the failure)
  • 13. Intuition and Terminology Failure-revealing input vector Failure-relevant subset (inputs which are useful for investigating the failure) Approximate failure-relevant subsets by identifying inputs that reach the failure along program dependencies.
  • 14. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 15. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo Command line arguments (flag, list of file names)
  • 16. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo File statistics (for each file) (size, last modified date, ...) Command line arguments (flag, list of file names)
  • 17. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo File statistics (for each file) (size, last modified date, ...) File contents (for each file) (first 50 characters) Command line arguments (flag, list of file names)
  • 18. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo File statistics (for each file) (size, last modified date, ...) File contents (for each file) (first 50 characters) Command line arguments (flag, list of file names) Input vector
  • 19. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 20. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo Overflow out
  • 21. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo buf.st_size ≥ 1GB Overflow out
  • 22. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo buf.st_size ≥ 1GB verbose is true Overflow out
  • 23. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo buf.st_size ≥ 1GB verbose is true Overflow out read 50 characters
  • 24. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 25. 1. Many more inputs than lines of code. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 26. 1. Many more inputs than lines of code. 2. Understanding the failure requires tracing interactions between inputs from multiple sources. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 27. 1. Many more inputs than lines of code. 2. Understanding the failure requires tracing interactions between inputs from multiple sources. 3. Only a small percentage of all inputs are relevant for the failure. Motivating Example int main(int argc, char **argv) { 1. int verbose, i, total_size = 0; 2. struct stat buf; 3. verbose = atoi(argv[1]); 4. for(i = 2; i < argc; i++) { 5. int fd = open(argv[i], O_RDONLY); 6. fstat(fd, &buf); 7. char *out = malloc(60); 8. sprintf(out, "%d", buf.st_size); 9. if(verbose) { 10. char *pview = malloc(51); 11. read(fd, pview, 50); 12. pview[50] = '\0'; 13. strcat(out, pview); 14. } 15. printf("%s: %s\n", argv[i], out); 16. total_size += buf.st_size; 17. } 18. printf("total: %d\n", total_size); } fileinfo
  • 28. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB
  • 29. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB Relevant context: 1. When the failure occurs. 2. Which data are involved in the failure.
  • 30. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 13. strcat(out, pview); In general, it is chosen using traditional debugging methods.
  • 31. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB
  • 32. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs
  • 33. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 1 2 3 4 5 6 7 8 9 0
  • 34. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 1 2 3 4 5 6 7 8 9 0
  • 35. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks
  • 36. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs
  • 37. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs 0 8 9
  • 38. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs 0 8 9
  • 39. fileinfo Penumbra Overview foo: 512 ... bar: 1024 ... baz: 150... total: 150... Foo 512B Bar 1KB Baz 1.5GB 1 Taint inputs 2 Propagate taint marks 3 Identify relevant inputs 0 8 9 verbose is true read 50 characters buf.st_size ≥ 1GB
  • 40. Outline • Penumbra approach 1. Tainting inputs 2. Propagating taint marks 3. Identifying relevant inputs • Evaluation • Conclusions and future work
  • 41. 1: Tainting Inputs Assign a taint mark to each input as it enters the application.
  • 42. 1: Tainting Inputs Assign a taint mark to each input as it enters the application. Per-byte Per-entity Domain specific
  • 43. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Per-byte Per-entity Domain specific
  • 44. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Per-byte Per-entity Domain specific
  • 45. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Per-byte Per-entity Domain specific
  • 46. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Per-byte Per-entity Domain specific
  • 47. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Per-byte Per-entity Domain specific
  • 48. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific
  • 49. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific
  • 50. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific Maintains per-byte precision
  • 51. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific Maintains per-byte precision Further increases scalability
  • 52. 1: Tainting Inputs Assign a unique taint mark to each byte. (read from files) Assign the same taint mark to related bytes. (argv, argc, fstat, ...) Assign taint marks based on user-provided information. Assign a taint mark to each input as it enters the application. When a taint mark is assigned to an input, log the input's value and where the input was read from. Precise identification Unnecessarily expensive Maintains per-byte precision Increases scalability Per-byte Per-entity Domain specific Maintains per-byte precision Further increases scalability
  • 54. 2: Propagating Taint Marks Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 55. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 56. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 57. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; 1 2 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 58. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; 1 21 2 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 59. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; 1 21 2 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 60. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; if(X) { C = A + B; } 1 21 2 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 61. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; if(X) { C = A + B; } 1 21 2 1 2 3 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 62. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; if(X) { C = A + B; } 1 21 2 1 2 3 1 2 3 Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 63. 2: Propagating Taint Marks Taint marks flow along only data dependencies. Taint marks flow along data and control dependencies. C = A + B; if(X) { C = A + B; } 1 21 2 1 2 3 1 2 3 The effectiveness of each option depends on the particular failure. Data-flow Propagation (DF) Data- and control-flow Propagation (DF + CF)
  • 64. 3: Identifying Relevant Inputs 1. Relevant context indicates which data is involved in the considered failure. 2. Identify which taint marks are associated with the data indicated by the relevant context. 3. Use recorded logs to reconstruct the inputs identified by those taint marks. Baz 1.5GB
  • 66. input vector executable trace relevant context Prototype Implementation Trace Processor Trace generator
  • 67. input vector executable trace relevant context Prototype Implementation Trace Processor Trace generator Implemented using Dytan, a generic x86 tainting framework developed in previous work [Clause and Orso 2007].
  • 68. input vector executable trace relevant context Prototype Implementation Trace Processor Trace generator
  • 69. input vector executable trace relevant context Prototype Implementation Trace Processor Trace generator input subset (DF) input subset (DF+CF)
  • 70. Evaluation Study 1: Effectiveness for debugging real failures Study 2: Comparison with Delta Debugging
  • 71. Evaluation Study 1: Effectiveness for debugging real failures Study 2: Comparison with Delta Debugging Subjects: bc 1.06 (10.5 KLoC, fault at more_arrays:177); gzip 1.24 (6.3 KLoC, get_istat:828); ncompress 4.24 (1.4 KLoC, comprexx:896); pine 4.44 (239.1 KLoC, rfc822_cat:260); squid 2.3 (69.9 KLoC, ftpBuildTitleUrl:1024)
  • 72. Evaluation Study 1: Effectiveness for debugging real failures Study 2: Comparison with Delta Debugging Subjects: bc 1.06 (10.5 KLoC, fault at more_arrays:177); gzip 1.24 (6.3 KLoC, get_istat:828); ncompress 4.24 (1.4 KLoC, comprexx:896); pine 4.44 (239.1 KLoC, rfc822_cat:260); squid 2.3 (69.9 KLoC, ftpBuildTitleUrl:1024) We selected a failure-revealing input vector for each subject.
  • 73. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 74. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 75. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs. • Location: statement where the failure occurs. • Data: any data read by that statement
  • 76. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 77. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 78. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs. • Use gdb to inspect stack trace and program data. • One second timeout to prevent incorrect results.
  • 79. Data Generation Penumbra Delta Debugging Setup (manual) Execution (automated) Choose a relevant context Create an automated oracle Use prototype tool to identify failure-relevant inputs (DF and DF + CF) Use the standard Delta Debugging implementation to minimize inputs.
  • 80. Study 1: Effectiveness Is the information that Penumbra provides helpful for debugging real failures?
  • 81. Study 1 Results: gzip & ncompress Crash when a file name is longer than 1,024 characters.
  • 82. Study 1 Results: gzip & ncompress Contents & Attributes Contents & Attributes bar Contents & Attributes foo./gzip Crash when a file name is longer than 1,024 characters. # Inputs: 10,000,056 long filename[ ]
  • 83. Study 1 Results: gzip & ncompress Contents & Attributes Contents & Attributes bar Contents & Attributes foo./gzip Crash when a file name is longer than 1,024 characters. # Inputs: 10,000,056 # Relevant (DF): 1 long filename[ ]
  • 84. Study 1 Results: gzip & ncompress Contents & Attributes Contents & Attributes bar Contents & Attributes foo./gzip Crash when a file name is longer than 1,024 characters. # Relevant (DF + CF): 3 # Inputs: 10,000,056 # Relevant (DF): 1 long filename[ ]
  • 85. Study 1 Results: pine Crash when a “from” field contains 22 or more double quote characters.
  • 86. Study 1 Results: pine # Inputs: 15,103,766 ... From clause@boar Tue Feb 20 11:49:53 2007 Return-Path: <clause@boar> X-Original-To: clause Delivered-To: clause@boar Received: by boar (Postfix, from userid 1000) id 88EDD1724523; Tue, 20 Feb 2007 11:49:53 -0500 (EST) To: clause@boar Subject: test Message-Id: <20070220164953.88EDD1724523@boar> Date: Tue, 20 Feb 2007 11:49:53 -0500 (EST) From: """"""""""""""""""""""""""""""""@host.fubar X-IMAPbase: 1172160370 390 Status: O X-Status: X-Keywords: X-UID: 5 ... Crash when a “from” field contains 22 or more double quote characters.
  • 87-89. Study 1 Results: pine. [Highlighted in the message above: the run of double quote characters in the "From:" field.] # Inputs: 15,103,766. # Relevant (DF): 26. # Relevant (DF + CF): 15,100,344.
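The pine numbers show how control-flow (CF) propagation can over-approximate. A hypothetical sketch (not pine's actual code): the crash depends on a counter incremented only for quote bytes, so DF marks just those bytes; but which byte the parser examines next is control-dependent on everything it has already consumed, so CF marks essentially the whole mailbox:

```python
def scan_from_field(mailbox_bytes):
    """Toy parser: crashes once 22 double quotes have been seen.

    Tracks two relevance sets: df_relevant (data flow into the
    counter) and cf_relevant (control-flow influence on reaching
    this iteration at all).
    """
    df_relevant, cf_relevant = set(), set()
    quotes = 0
    for i, b in enumerate(mailbox_bytes):
        cf_relevant.add(i)        # CF rule: this iteration executed only
                                  # because earlier bytes steered the parser here
        if b == '"':
            quotes += 1           # DF rule: the counter's value flows
            df_relevant.add(i)    # only from the quote bytes themselves
        if quotes >= 22:
            return df_relevant, cf_relevant   # crash point
    return df_relevant, cf_relevant

df, cf = scan_from_field('From: ' + '"' * 22 + '@host')
assert len(df) == 22 and len(cf) > len(df)
```

Scaled up to a 15 MB mailbox, the CF set approaches the full input, which is exactly the 15,100,344-of-15,103,766 blow-up reported above.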
  • 92. Study 1: Conclusions. 1. Data-flow propagation is always effective; data- and control-flow propagation is only sometimes effective. ➡ Use data flow first and, if necessary, add control flow. 2. The inputs identified by Penumbra correspond to the failure conditions. ➡ Our technique is effective in assisting the debugging of real failures.
  • 93. Study 2: Comparison with Delta Debugging. RQ1: How much manual effort does each technique require? RQ2: How long does it take to fix each considered failure, given the information provided by each technique?
  • 94. RQ1: Manual effort Use setup-time as a proxy for manual (developer) effort.
  • 95-97. RQ1: Manual effort. [Bar chart: setup time (s) for Penumbra vs. Delta Debugging on gzip, ncompress, bc, pine, and squid; the Delta Debugging bars range from 1,800 to 12,600 seconds, while Penumbra's are far smaller.]
  • 98. RQ1: Manual effort. [Bar chart: setup time (s) for Penumbra vs. Delta Debugging on gzip, ncompress, bc, pine, and squid.] Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).
  • 99. RQ2: Debugging Effort Use number of relevant inputs as a proxy for debugging effort.
  • 100-102. RQ2: Debugging Effort.

      Subject      Penumbra (DF)   Penumbra (DF + CF)   Delta Debugging
      bc                 209                 743                  285
      gzip                 1                   3                    1
      ncompress            1                   3                    1
      pine                26          15,100,344                   90
      squid               89               2,056                    —

    • Penumbra (DF) is comparable to (slightly better than) Delta Debugging.
    • Penumbra (DF + CF) is likely less effective for bc, pine, and squid.
  • 103. Conclusions & Future Work. • Novel technique for identifying failure-relevant inputs. • Overcomes limitations of existing approaches: single execution, minimal manual effort, comparable effectiveness. • Future work: combine Penumbra with existing code-centric techniques.