Data-driven software engineering @Microsoft 
Michaela Greiler
Data-driven software engineering @Microsoft 
•How can we optimize the testing process? 
•Do code reviews make a difference? 
•Are coding velocity and quality always a tradeoff? 
•What’s the optimal way to organize work on a large team? 
MSR Redmond/TSE: Michaela Greiler, Jacek Czerwonka, Wolfram Schulte, Suresh Thummalapenta 
MSR Redmond: Christian Bird, Kathryn McKinley, Nachi Nagappan, Thomas Zimmermann 
MSR Cambridge: Brendan Murphy, Kim Herzig
{Chart: Code Coverage trigger of Checkins. Monthly percentage of check-ins (0 to 100%) from November 2010 through October 2013, split into % completely covered, % somewhat covered, and % not covered.}
Reviewer recommendation: Does experience matter?
Can we change with what we can measure? 
Michaela Greiler
YES
YES, that’s the danger!
What is measured? 
{Bar chart: number of bugs per engineer (Carl, Lisa, Rob, Danny), scale 0 to 8.}
What is changed? 
{Bar chart: number of bugs per engineer (Carl, Lisa, Rob, Danny), scale 0 to 2.5.}
Code Quality
SOCIO-TECHNICAL CONGRUENCE 
“Design and programming are human activities; forget that and all is lost” – Bjarne Stroustrup
So should we go without any measurements?
Interpretation 
Data Collection 
Usage 
Lessons learned 
No 
Garbage!
•What is CodeMine? What data does CodeMine have?
GQM vs. opportunistic data collection 
•Easily available ≠ what’s needed 
•Determine the needed data 
•Find proxy measures if needed 
•Know the analysis before collecting the data 
Otherwise, data is not usable for the intended purpose 
•Goal –Question –Metric 
•Check for completeness, cleanness/noise, and usefulness 
•Data background 
•How was data generated? 
•Why was it generated? 
•Who consumes the data? 
•What about outliers? 
•How was the data processed?
Interpretation needs domain knowledge
Tools, processes, 
practices and policies. 
Release schedule 
Time 
Engineers 
What roles exist? 
Who does what? 
Responsibilities? 
M1 
M2 
Beta 
Organization of code bases 
Team structure and culture.
You cannot compare 1:1
Engineers want to understand the nitty-gritty 
•How do you calculate the recommended reviewers? 
•Why was that person recommended? 
•Why is Lisa not recommended?
Simplicity first 
{Diagram: two sets of files, those with bugs and those without.}
Files without bugs: main contributor made > 50% of all edits 
Files with bugs: main contributor made < 60% of all edits 
Ownership metric: 
Proportion of all edits made by the contributor with the most edits 
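The ownership metric above can be sketched in a few lines; the function name and the example edit counts here are invented for illustration:

```python
def ownership(edits_per_contributor):
    """Ownership: share of all edits made by the top contributor."""
    total = sum(edits_per_contributor.values())
    return max(edits_per_contributor.values()) / total

# Hypothetical edit counts for one file: the main contributor
# made 6 of 10 edits, so ownership is 0.6.
print(ownership({"Carl": 6, "Lisa": 3, "Rob": 1}))  # 0.6
```

This is exactly the kind of simple, explainable measure the next slide argues for: an engineer can recompute it by hand.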
Reporting vs. Prediction 
Comprehension 
vs. automation 
If you can do it with a decision tree… do it…
Iterative process with very close involvement of product teams and domain experts. 
It’s a dialog 
It’s a back and forth
Mixed Method Research 
Is a research approach or methodology 
•for questions that call for real-life contextual understandings; 
•employing rigorous quantitative research assessing the magnitude and frequency of constructs; and 
•rigorous qualitative research exploring the meaning and understanding of constructs 
DR. MARGARET-ANNE STOREY 
Professor of Computer Science University of Victoria 
All methods are inherently flawed! 
Generalizability 
Precision 
Realism 
DR. ARIE VAN DEURSEN 
Professor of Software Engineering Delft University of Technology
Foundations of Mixed 
Methods Research 
Designing 
Social Inquiry 
Qualitative Research: Mixed Method Research 
•Interviews 
•Observations 
•Focus groups 
•Contextual Inquiry 
•Grounded Theory 
•…
A Grounded Theory Study 
Systematic procedure to discover a theory from (qualitative) data 
S. Adolph, W. Hall, Ph. Kruchten. Using grounded theory to study the experience of software development. Empirical Software Engineering, 2011. 
B. Glaser and J. Holton. Remodeling grounded theory. Forum Qualitative Res., 2004. 
Glaser and Strauss
Deductive versus inductive 
A deductive approach is concerned with developing a hypothesis (or hypotheses) based on existing theory, and then designing a research strategy to test the hypothesis (Wilson, 2010, p.7) 
An inductive approach starts with observations. Theories emerge towards the end of the research, as a result of careful examination of patterns in observations (Goddard and Melville, 2004). 
Theory 
Hypotheses 
Observation 
Confirm/Reject 
Observation 
Patterns 
Theory
All models are wrong but some are useful 
(George E. P. Box)
Theo: Test Effectiveness Optimization from History 
Kim Herzig*, Michaela Greiler+, Jacek Czerwonka+, Brendan Murphy* 
*Microsoft Research, Cambridge 
+Microsoft Corporation, US
Improving Development Processes 
Product / 
Service 
Legacy 
changes 
New product 
features 
Technology 
changes 
Development Environment 
Speed, Cost, Quality / Risk 
(should be well balanced) 
Microsoft aims for shorter release cycles 
Empirical data to support & drive decisions 
• Speed up development processes (e.g. code velocity) 
• More frequent releases 
• Maintaining / increasing product quality 
Joint effort by MSR & product teams 
• MSR Cambridge: Brendan Murphy, Kim Herzig 
• TSE Redmond: Jacek Czerwonka, Michaela Greiler 
• MSR Redmond: Tom Zimmermann, Chris Bird, Nachi Nagappan 
• Windows, Windows Phone, Office, Dynamics product teams
Software Testing for Windows 
Winmain (main branch) 
Quality gate 
(system testing) 
Quality gate 
(system & component testing) 
Quality gate 
(component testing) 
time 
Development branch 
Multiple area branches 
Multiple component branches 
Software testing is very expensive 
• Thousands of test suites and millions of test cases executed 
• On different branches, architectures, languages, etc. 
• We tend to repeat the same tests over and over again 
• Too many false alarms (failures due to test and infrastructure issues) 
• Each test failure slows down product development 
• Aims to find code issues as early as possible 
• At the cost of slower product development 
Actual problem 
Current process aims for maximal protection 
{Simplified illustration}
Software Testing for Office 
Software testing is very expensive 
• Thousands of test suites and millions of test cases executed 
• On different branches, architectures, languages, etc. 
• We tend to repeat the same tests over and over again 
• Too many false alarms (failures due to test and infrastructure issues) 
• Each test failure slows down product development 
• Aims to find code issues as early as possible 
• At the cost of slower product development 
Actual problem 
Current process aims for maximal protection 
Dev Inner Loop 
BVT and CVT 
on main 
Dog food 
Different 
• Branching structure 
• Development process 
• Testing process 
• Release schedules 
• … 
{Simplified illustration}
Goal 
Reduce the number of test executions … 
… without sacrificing code quality 
Dynamic, self-adaptive optimization model
Solution 
Reduce the number of test executions … 
•Run every test at least once before integrating a code change into the main branch (e.g., winmain). 
•We eventually find all code issues but take the risk of finding them later (on higher-level branches). 
… without sacrificing code quality 
High cost, unknown value: $$$$$ 
High cost, low value: $$$$ 
Low cost, low value: $ 
Low cost, good value: $$ 
How likely is a test causing: 
1) false positives, or 
2) finding code issues? 
Analyze historic data: 
- Test events 
- Builds 
- Code integrations 
Analyze past test results: 
- Passing tests, false alarms, detected code issues
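From such a history, per-test probabilities can be estimated by simple counting; the outcome labels and this estimator are my own sketch, not Theo's actual model, which, as the next slide notes, would also have to account for context:

```python
from collections import Counter

def failure_probabilities(history):
    """Estimate a test's false-alarm (Prob_FP) and defect-finding
    (Prob_TP) probabilities from past outcomes, where each outcome
    is 'pass', 'false_alarm', or 'code_issue'."""
    counts = Counter(history)
    runs = len(history)
    return counts["false_alarm"] / runs, counts["code_issue"] / runs

# Hypothetical history: 8 passes, 1 infrastructure flake, 1 real defect.
prob_fp, prob_tp = failure_probabilities(
    ["pass"] * 8 + ["false_alarm"] + ["code_issue"])
print(prob_fp, prob_tp)  # 0.1 0.1
```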
Bug finding capabilities change with context
Solution 
Using a cost function to model risk: 

Cost_Execution > Cost_Skip ? suspend : execute test 

Cost_Execution = Cost_Machine/Time × Time_Execution + "cost of a potential false alarm" 
               = Cost_Machine/Time × Time_Execution + (Prob_FP × Cost_Developer/Time × Time_Triage) 

Cost_Skip = "potential cost of finding a defect later" 
          = Prob_TP × Cost_Developer/Time × Time_FreezeBranch × #Developers_Branch 
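The skip-or-execute decision can be sketched directly from the cost function; all function names, parameter names, and the example numbers below are illustrative assumptions, not Theo's actual implementation:

```python
def cost_execution(cost_machine_per_time, time_execution,
                   prob_fp, cost_developer_per_time, time_triage):
    """Machine time plus the expected cost of triaging a false alarm."""
    return (cost_machine_per_time * time_execution
            + prob_fp * cost_developer_per_time * time_triage)

def cost_skip(prob_tp, cost_developer_per_time, time_freeze_branch,
              num_developers_branch):
    """Expected cost of finding the defect later: a broken integration
    freezes the branch for every developer working on it."""
    return (prob_tp * cost_developer_per_time
            * time_freeze_branch * num_developers_branch)

def decide(execution_cost, skip_cost):
    """Suspend the test when running it costs more than skipping it."""
    return "suspend" if execution_cost > skip_cost else "execute"

# Hypothetical numbers: a flaky test (30% false alarms) that rarely
# finds real defects (2%) on a branch with 50 developers.
exec_cost = cost_execution(0.5, 2.0, 0.30, 80.0, 1.0)  # 1.0 + 24.0 = 25.0
skip_cost = cost_skip(0.02, 80.0, 4.0, 50)             # 320.0
print(decide(exec_cost, skip_cost))  # execute
```

Even a flaky test stays enabled here because a missed defect would freeze a heavily populated branch; the same test on a small component branch would tip the other way.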
Test 
Cost to run a test. 
Value of output.
Current Results 
Simulated on Windows 8.1 development period (BVT only)
Dynamic, Self-Adaptive 
Decision points are connected to each other 
Skipping tests influences the risk factorsof higher level branches 
We re-enable testsif code quality drops (e.g. different milestone) 
{Chart: relative test reduction rate (0% to 70%) over time during Windows 8.1 development, with the training period marked.}
Bug Finding Performance of Tests 
How many test executions fail? 
{Chart: number of test executions and #failed test executions, by branch level.}
How many of the failed test executions result in bug reports? 
{Chart: failed test executions by branch level, split into FP, TP test-unspecific, and TP test-specific.}
Impact on Development Process 
Secondary Improvements 
•Machine setup: we may lower the number of machines allocated to the testing process 
•Developer satisfaction: removing false test failures increases confidence in the testing process 
…hard to estimate the speed improvement through simulation 
“We used the data […] to cut a bunch of bad content and are running a much leaner BVT system […] we’re panning out to scale about 4x and run in well under 2 hours” (Jason Means, Windows BVT PM)
Michaela Greiler 
@mgreiler 
www.michaelagreiler.com 
http://research.microsoft.com/en-us/projects/tse/
