Google Testing Blog: Jeff Listfield

Testing Blog

Where do our flaky tests come from?

Monday, April 17, 2017

author: Jeff Listfield

Flaky Tests at Google

Google has around 4.2 million tests that run on our continuous integration system. Of these, around 63 thousand have a flaky run over the course of a week. While this represents less than 2% of our tests, it still causes significant drag on our engineers.

post

Test size - Large tests are more likely to be flaky

We categorize our tests into three general sizes: small, medium and large. Every test has a size, but the choice of label is subjective. The engineer chooses the size when they initially write the test, and the size is not always updated as the test changes. For some tests it doesn't reflect the nature of the test anymore. Nonetheless, it has some predictive value. Over the course of a week, 0.5% of our small tests were flaky, 1.6% of our medium tests were flaky, and 14% of our large tests were flaky ^[1]. There's a clear increase in flakiness from small to medium and from medium to large. But this still leaves open a lot of questions. There's only so much we can learn looking at three sizes.

The larger the test, the more likely it will be flaky[2][3][4]

Correlation between metric and likelihood of test being flaky

Metric

r2

Binary size

0.82

RAM used

0.76

[5][6]

Certain tools correlate with a higher rate of flaky tests

Some tools get blamed for being the cause of flaky tests. For example, WebDriver tests (whether written in Java, Python, or JavaScript) have a reputation for being flaky ^[7]. For a few of our common testing tools, I determined the percentage of all the tests written with that tool that were flaky. Of note, all of these tools tend to be used with our larger tests. This is not an exhaustive list of all our testing tools, and represents around a third of our overall tests. The remainder of the tests use less common tools or have no readily identifiable tool.

Flakiness of tests using some of our common testing tools

Labels

TotT 104
GTAC 61
James Whittaker 42
Misko Hevery 32
Code Health 31
Anthony Vallone 27
Patrick Copeland 23
Jobs 18
Andrew Trenk 13
C++ 11
Patrik Höglund 8
JavaScript 7
Allen Hutchison 6
George Pirocanac 6
Zhanyong Wan 6
Harry Robinson 5
Java 5
Julian Harty 5
Adam Bender 4
Alberto Savoia 4
Ben Yu 4
Erik Kuefler 4
Philip Zembrod 4
Shyam Seshadri 4
Chrome 3
Dillon Bly 3
John Thomas 3
Lesley Katzen 3
Marc Kaplan 3
Markus Clermont 3
Max Kanat-Alexander 3
Sonal Shah 3
APIs 2
Abhishek Arya 2
Alan Myrvold 2
Alek Icev 2
Android 2
April Fools 2
Chaitali Narla 2
Chris Lewis 2
Chrome OS 2
Diego Salas 2
Dori Reuveni 2
Jason Arbon 2
Jochen Wuttke 2
Kostya Serebryany 2
Marc Eaddy 2
Marko Ivanković 2
Mobile 2
Oliver Chang 2
Simon Stewart 2
Stefan Kennedy 2
Test Flakiness 2
Titus Winters 2
Tony Voellm 2
WebRTC 2
Yiming Sun 2
Yvette Nameth 2
Zuri Kemp 2
Aaron Jacobs 1
Adam Porter 1
Adam Raider 1
Adel Saoud 1
Alan Faulkner 1
Alex Eagle 1
Amy Fu 1
Anantha Keesara 1
Antoine Picard 1
App Engine 1
Ari Shamash 1
Arif Sukoco 1
Benjamin Pick 1
Bob Nystrom 1
Bruce Leban 1
Carlos Arguelles 1
Carlos Israel Ortiz García 1
Cathal Weakliam 1
Christopher Semturs 1
Clay Murphy 1
Dagang Wei 1
Dan Maksimovich 1
Dan Shi 1
Dan Willemsen 1
Dave Chen 1
Dave Gladfelter 1
David Bendory 1
David Mandelberg 1
Derek Snyder 1
Diego Cavalcanti 1
Dmitry Vyukov 1
Eduardo Bravo Ortiz 1
Ekaterina Kamenskaya 1
Elliott Karpilovsky 1
Elliotte Rusty Harold 1
Espresso 1
Felipe Sodré 1
Francois Aube 1
Gene Volovich 1
Google+ 1
Goran Petrovic 1
Goranka Bjedov 1
Hank Duan 1
Havard Rast Blok 1
Hongfei Ding 1
Jason Elbaum 1
Jason Huggins 1
Jay Han 1
Jeff Hoy 1
Jeff Listfield 1
Jessica Tomechak 1
Jim Reardon 1
Joe Allan Muharsky 1
Joel Hynoski 1
John Micco 1
John Penix 1
Jonathan Rockway 1
Jonathan Velasquez 1
Josh Armour 1
Julie Ralph 1
Kai Kent 1
Kanu Tewary 1
Karin Lundberg 1
Kaue Silveira 1
Kevin Bourrillion 1
Kevin Graney 1
Kirkland 1
Kurt Alfred Kluever 1
Manjusha Parvathaneni 1
Marek Kiszkis 1
Marius Latinis 1
Mark Ivey 1
Mark Manley 1
Mark Striebeck 1
Matt Lowrie 1
Meredith Whittaker 1
Michael Bachman 1
Michael Klepikov 1
Mike Aizatsky 1
Mike Wacker 1
Mona El Mahdy 1
Noel Yap 1
Palak Bansal 1
Patricia Legaspi 1
Per Jacobsson 1
Peter Arrenbrecht 1
Peter Spragins 1
Phil Norman 1
Phil Rollet 1
Pooja Gupta 1
Project Showcase 1
Radoslav Vasilev 1
Rajat Dewan 1
Rajat Jain 1
Rich Martin 1
Richard Bustamante 1
Roshan Sembacuttiaratchy 1
Ruslan Khamitov 1
Sam Lee 1
Sean Jordan 1
Sebastian Dörner 1
Sharon Zhou 1
Shiva Garg 1
Siddartha Janga 1
Simran Basi 1
Stan Chan 1
Stephen Ng 1
Tejas Shah 1
Test Analytics 1
Test Engineer 1
Tim Lyakhovetskiy 1
Tom O'Neill 1
Vojta Jína 1
automation 1
dead code 1
iOS 1
mutation testing 1

Archive

▼ 2025 (1)
- ▼ Jan (1)
  - Arrange Your Code to Communicate Data Flow

► 2024 (13)
- ► Dec (1)
- ► Oct (1)
- ► Sep (1)
- ► Aug (1)
- ► Jul (1)
- ► May (3)
- ► Apr (3)
- ► Mar (1)
- ► Feb (1)

► 2023 (14)
- ► Dec (2)
- ► Nov (2)
- ► Oct (5)
- ► Sep (3)
- ► Aug (1)
- ► Apr (1)

► 2022 (2)
- ► Feb (2)

► 2021 (3)
- ► Jun (1)
- ► Apr (1)
- ► Mar (1)

► 2020 (8)
- ► Dec (2)
- ► Nov (1)
- ► Oct (1)
- ► Aug (2)
- ► Jul (1)
- ► May (1)

► 2019 (4)
- ► Dec (1)
- ► Nov (1)
- ► Jul (1)
- ► Jan (1)

► 2018 (7)
- ► Nov (1)
- ► Sep (1)
- ► Jul (1)
- ► Jun (2)
- ► May (1)
- ► Feb (1)

► 2017 (17)
- ► Dec (1)
- ► Nov (1)
- ► Oct (1)
- ► Sep (1)
- ► Aug (1)
- ► Jul (2)
- ► Jun (2)
- ► May (3)
- ► Apr (2)
- ► Feb (1)
- ► Jan (2)

► 2016 (15)
- ► Dec (1)
- ► Nov (2)
- ► Oct (1)
- ► Sep (2)
- ► Aug (1)
- ► Jun (2)
- ► May (3)
- ► Apr (1)
- ► Mar (1)
- ► Feb (1)

► 2015 (14)
- ► Dec (1)
- ► Nov (1)
- ► Oct (2)
- ► Aug (1)
- ► Jun (1)
- ► May (2)
- ► Apr (2)
- ► Mar (1)
- ► Feb (1)
- ► Jan (2)

► 2014 (24)
- ► Dec (2)
- ► Nov (1)
- ► Oct (2)
- ► Sep (2)
- ► Aug (2)
- ► Jul (3)
- ► Jun (3)
- ► May (2)
- ► Apr (2)
- ► Mar (2)
- ► Feb (1)
- ► Jan (2)

► 2013 (16)
- ► Dec (1)
- ► Nov (1)
- ► Oct (1)
- ► Aug (2)
- ► Jul (1)
- ► Jun (2)
- ► May (2)
- ► Apr (2)
- ► Mar (2)
- ► Jan (2)

► 2012 (11)
- ► Dec (1)
- ► Nov (2)
- ► Oct (3)
- ► Sep (1)
- ► Aug (4)

► 2011 (39)
- ► Nov (2)
- ► Oct (5)
- ► Sep (2)
- ► Aug (4)
- ► Jul (2)
- ► Jun (5)
- ► May (4)
- ► Apr (3)
- ► Mar (4)
- ► Feb (5)
- ► Jan (3)

► 2010 (37)
- ► Dec (3)
- ► Nov (3)
- ► Oct (4)
- ► Sep (8)
- ► Aug (3)
- ► Jul (3)
- ► Jun (2)
- ► May (2)
- ► Apr (3)
- ► Mar (3)
- ► Feb (2)
- ► Jan (1)

► 2009 (54)
- ► Dec (3)
- ► Nov (2)
- ► Oct (3)
- ► Sep (5)
- ► Aug (4)
- ► Jul (15)
- ► Jun (8)
- ► May (3)
- ► Apr (2)
- ► Feb (5)
- ► Jan (4)

► 2008 (75)
- ► Dec (6)
- ► Nov (8)
- ► Oct (9)
- ► Sep (8)
- ► Aug (9)
- ► Jul (9)
- ► Jun (6)
- ► May (6)
- ► Apr (4)
- ► Mar (4)
- ► Feb (4)
- ► Jan (2)

► 2007 (41)
- ► Oct (6)
- ► Sep (5)
- ► Aug (3)
- ► Jul (2)
- ► Jun (2)
- ► May (2)
- ► Apr (7)
- ► Mar (5)
- ► Feb (5)
- ► Jan (4)

Feed

Google
Privacy
Terms