DR. BEN LIVSHITS
IMPERIAL COLLEGE LONDON
GAP TESTING:
COMBINING
DIVERSE
TESTING
STRATEGIES FOR
FUN AND
PROFIT
MY BACKGROUND
 Professor at Imperial College London
 Industrial researcher
 Stanford Ph.D.
 Here to talk about some of the technologies
underlying testing
 Learn about industrial practice
 Work on a range of topics including
 Software reliability
 Program analysis
 Security and privacy
 Crowd-sourcing
 etc.
FOR FUNCTIONAL TESTING: MANY STRATEGIES
Human effort
 Test suites written by developers and/or
testers
 Field testing
 Crowd-based testing
 Penetration testing
Automation
 (Black box) Fuzzing
 White box fuzzing or symbolic execution
 We might even throw other automated
strategies, such as static analysis, into this
category
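Black-box fuzzing, the simplest automated strategy above, can be sketched in a few lines. This is a minimal illustration only: the `toy_parser` target and the mutation budget are hypothetical, not any real fuzzer's design.

```python
import random

def fuzz_once(target, seed, rng):
    """Mutate a seed input by overwriting random bytes, then run the target."""
    data = bytearray(seed)
    for _ in range(rng.randint(1, 4)):
        data[rng.randrange(len(data))] = rng.randrange(256)
    try:
        target(bytes(data))
        return None
    except Exception:
        return bytes(data)  # a crashing input worth triaging

def toy_parser(data):
    # hypothetical target: any input not starting with b"h" is "malformed"
    if not data.startswith(b"h"):
        raise ValueError("bad header")

rng = random.Random(42)
crashes = [c for c in (fuzz_once(toy_parser, b"hello", rng) for _ in range(200)) if c]
```

Note that the loop has no notion of coverage at all; that blindness is exactly the weakness discussed later.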
MANUAL VS. AUTOMATED
 My focus is on automation, generally
 However, ultimately, these two approaches should be
complementary to each other
 Case in point: consider the numerous companies that
do mobile app testing, e.g., Applause
 The general approach: upload an app binary, put a
crowd of testers on call, and have them exercise the
app, encounter bugs, and report them
 Generally, not many guarantees from this kind of
approach
 But it’s quite useful as the first level of testing
https://www.slideshare.net/IosifItkin/extent2016-the-future-of-software-testing
MANUAL VS. AUTOMATED: HOW DO THEY COMPARE?
 Fundamentally, a difficult question to answer
 What is our goal?
 Operational goals
 Make sure the application doesn’t crash at the start
 Make sure the application isn’t easy to hack into
 Development/design goals
 Make sure the coverage is high or 100%, for some
definition of what coverage is
 Make sure the application doesn’t crash, ever, or
violate assertions, ever?
Do we have to choose?
MULTIPLE, COMPETING, UNCOORDINATED
TECHNIQUES ARE NORMAL
 We would love a situation where one
solution delivers all the value
 Case in point: symbolic execution was advertised
as the best thing since sliced bread:
 Precision of runtime execution
 Coverage of static analysis
 How can this go wrong?
 The practice of symbolic execution is
unfortunately different
 Coverage numbers from KLEE and SAGE
SO, MAYBE ONE TECHNIQUE ALONE IS NOT GOOD ENOUGH
 What can we do?
 Well, let’s assume we have the compute cycles
(which we often do) and the money to hire
testers (which we often don’t)
 How do we combine these efforts?
 Fundamental challenges
 Overlap is significant; blind fuzzing is not so helpful
 Differences are hard to hit – for example, how do
we hit a specific code execution path to get closer
to 100% path coverage? Symbolic execution is a
heavy-weight, less-than-scalable answer
DEVELOPER-WRITTEN TESTS VS. IN-THE-FIELD EXECUTION
 Study four large open-source Java projects
 We find that developer-written test suites fail to accurately
represent field executions: the tests, on average, miss 6.2% of
the statements and 7.7% of the methods exercised in the field;
 The behavior exercised only in the field kills an extra 8.6% of the
mutants; finally, the tests miss 52.6% of the behavioral invariants
that occur in the field.
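The gap the study measures can be computed mechanically once both coverage sets are logged. A minimal sketch, with hypothetical method sets standing in for real instrumentation output:

```python
def coverage_gap(test_covered, field_covered):
    """Fraction of field-exercised items that the test suite misses."""
    missed = field_covered - test_covered
    return len(missed) / len(field_covered), missed

# hypothetical method sets collected from instrumentation logs
tests = {"open", "read", "close", "seek"}
field = {"open", "read", "close", "flush", "retry"}

ratio, missed = coverage_gap(tests, field)
# here the tests miss 2 of the 5 field-exercised methods
```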
LET’S FOCUS ON EXECUTION PATHS
 Need to coordinate our testing efforts
 Gap testing principles
 Avoid repeated, wasteful work
 Find ways to hit methods/statements/basic
blocks/paths that are not covered by other
techniques
 Common paths: covered multiple times; extra work is not warranted, yet extra testers are likely to hit exactly these
 Occasionally encountered paths: how do we cover them effectively?
 Rarely seen paths: how do we hit them without wasting effort?
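The triage above can be sketched as a bucketing step over observed path hit counts; the threshold separating "common" from the rest is an assumed parameter, not a principled cutoff:

```python
from collections import Counter

def triage_paths(hits, common_threshold=10):
    """Split observed execution paths by hit count, so extra testing
    effort is directed at occasional and rare paths, not common ones."""
    counts = Counter(hits)
    buckets = {"common": [], "occasional": [], "rare": []}
    for path, n in counts.items():
        if n >= common_threshold:
            buckets["common"].append(path)      # already well covered
        elif n > 1:
            buckets["occasional"].append(path)
        else:
            buckets["rare"].append(path)        # direct effort here
    return buckets

# hypothetical path observations from the field
hits = ["A"] * 12 + ["B"] * 3 + ["C"]
buckets = triage_paths(hits)
```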
TWO EXAMPLES OF MORE TARGETED TESTING
Crowd-based UI testing
aiming for 100% coverage
Targeted symbolic execution
aiming to hit interesting parts of the code
GAP TESTING FOR UI
 Testing Android apps
 Goal: to have 100% UI coverage
 How to define that is sometimes a little murky
 But let’s assume we have a notion of screen
coverage
 Move away from covered screens
 By shutting off parts of the app
 The aim is to get as close to 100% coverage as
possible by guiding crowd-sourced testers
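One plausible way to guide testers toward unexplored screens is to always assign the least-visited one; well-covered screens are then effectively "shut off" because they sort last. The screen names and visit counts here are hypothetical:

```python
def next_screen_to_test(all_screens, visit_counts):
    """Assign the least-visited screen next; heavily visited screens
    are effectively shut off because they always sort last."""
    return min(all_screens, key=lambda s: visit_counts.get(s, 0))

screens = ["login", "feed", "settings", "profile"]
visits = {"login": 40, "feed": 25, "profile": 3}  # hypothetical telemetry
choice = next_screen_to_test(screens, visits)     # never-visited screen wins
```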
CROWD OF TESTERS WITH THE SYSTEM
GUIDING THEM TOWARD UNEXPLORED PATHS
GUIDING SYMBOLIC EXECUTION
 Continue exploring the program until we find
something “interesting”
 That may be a crash or an alarm from a tool such
as AddressSanitizer, ThreadSanitizer, Valgrind,
etc.
 Suffers from exponential blow-up issues and
solver overhead
 If we instead know what we are looking for, for
example, a method in the code we want to see
called, we can direct our analysis better
 Prioritize branch outcomes so as to hit the
target
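Directed exploration is often driven by a static distance-to-target metric. The sketch below, over a toy control-flow graph, orders a best-first search by reverse-BFS distance; this is an assumption about how such guidance can work in general, not the actual algorithm of KLEE or SAGE:

```python
import heapq

def directed_search(cfg, start, target):
    """Explore CFG nodes nearest (by static distance) to the target first."""
    # reverse the edges, then BFS from the target to get distances
    rev = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            rev[s].append(n)
    dist, frontier = {target: 0}, [target]
    while frontier:
        nxt = []
        for n in frontier:
            for p in rev[n]:
                if p not in dist:
                    dist[p] = dist[n] + 1
                    nxt.append(p)
        frontier = nxt
    # best-first exploration guided by that distance
    order, seen = [], set()
    heap = [(dist.get(start, float("inf")), start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        if node == target:
            break
        for s in cfg[node]:
            heapq.heappush(heap, (dist.get(s, float("inf")), s))
    return order

# toy CFG: "b" is a dead end and never gets explored before the target
cfg = {"entry": ["a", "b"], "a": ["c"], "b": [], "c": ["target"], "target": []}
order = directed_search(cfg, "entry", "target")
```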
ULTIMATE VISION
 A portfolio of testing strategies that can be
invoked on demand
 Deployed together to improve the ultimate
outcome
 Sometimes manual testing is the right thing,
sometimes it’s not
 We’ve seen some examples of complementary
testing strategies
 The list is nowhere close to exhaustive…
OPTIMIZING TESTING EFFORTS
 How to get the most out of your
portfolio of testing approaches,
minimizing the time and money
spent
 It would be nice to be able to
estimate the efficacy of a particular
method and the cost in terms of
time, human involvement, and
machine cycles
 That’s actually possible with
machine learning-based predictive
models; e.g., mean time to the next
bug found is something we can estimate
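As a toy illustration of such a predictor, one could at least estimate mean time to the next bug from recent inter-discovery gaps; a real system would fit a proper reliability-growth model. The timestamps below are hypothetical:

```python
def mean_time_to_next_bug(timestamps, window=5):
    """Naive predictor: average the most recent inter-discovery gaps.
    (A real system would fit a reliability-growth model instead.)"""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    recent = gaps[-window:]
    return sum(recent) / len(recent)

# hypothetical bug discovery times, in hours into a testing campaign
found = [1, 2, 4, 7, 12, 20]
estimate = mean_time_to_next_bug(found)  # average of gaps 1, 2, 3, 5, 8
```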
THE END.
GAP TESTING: COMBINING DIVERSE TESTING STRATEGIES FOR
FUN AND PROFIT
We have seen a number of testing techniques such as fuzzing, symbolic execution, and crowd-sourced
testing emerge as viable alternatives to the more traditional strategies of developer-driven testing in the
last decade.
While there is a lot of excitement around many of these ideas, how to properly combine diverse testing
techniques to achieve a specific goal, e.g., maximizing statement-level coverage, remains unclear.
The goal of this talk is to illustrate how to combine different testing techniques by having them naturally
complement each other: e.g., if there is a set of methods that are not covered by automated testing, how
do we use a crowd of users and direct their efforts toward those methods while minimizing effort
duplication?
Can multiple testing strategies peacefully co-exist? When combined, can they add up to a
comprehensive strategy that gives us something that was impossible before, e.g., 100% test coverage?

EXTENT-2017: Gap Testing: Combining Diverse Testing Strategies for Fun and Profit
