AB Testing
Revolution through constant evolution
AB Testing at Expedia
Expedia SF
114 Sansome
www.expedia.com
@expediaeng
Work with us:
letstalk@expedia.com
Paul Lucas
Sr Director, Technology
Want to visit next? Greece
Jeff Madynski
Director, Technology
Want to visit next? Croatia
Anuj Gupta
Sr Software Dev Engineer
Want to visit next? Peru
Technology Evolution
V0 – batch processing from Abacus exposure logs, Omniture, and the booking datamart; Tableau visualization
V1 – Storm, Kestrel, DynamoDB / PostgreSQL reading UIS messages and client log data (Nov 2014 – Dec 2015)
V2 – introduce Kafka and Cassandra (May 2016)
TNL – original solution
• Batch processing
• Tableau visualization
• Merged data from OMS/Omniture
• Problems:
– 1–2 day feedback loop: what if we had mistakes in the test implementation (bucketing not what was anticipated)?
– To fix data import errors, we had to start the import over from scratch
TNL Dashboard v0 (architecture): Omniture click data, booking datamart, and Abacus exposures → Hadoop ETL → Tableau
TNL v0 -> v1
TNL v1 Problems
• Database grew to 420 GB; queries took 3–5 minutes
• Data drops (Kestrel)
• Growing data volume (multi-brand, more customers)
TNL v1->v1.1, v2
• Fighting fires, borrowing more time
• POC next
Fighting fires – borrowing more time
User Interaction Service (UIS) traffic
Scaling messaging system
Kafka
• Publish-subscribe based
messaging system
• Distributed and reliable
• Longer retention and
persistence
• Monitoring dashboard
and alerts
• Buffer for system
downtime
Kestrel limitations
• No message durability
• Approaching scalability limits
• Inactive open-source project
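The key property Kafka added over Kestrel is a durable, retained log: a consumer that was down can replay from its last committed offset instead of losing messages. A minimal in-memory sketch of that idea (a toy stand-in, not the Kafka API; `MiniBroker` and the topic/group names are hypothetical):

```python
from collections import defaultdict

class MiniBroker:
    """Toy publish-subscribe broker illustrating Kafka-style retention.

    Messages are kept in a per-topic append-only log, and each consumer
    group tracks its own read offset -- so a consumer that was down can
    catch up later, the property Kestrel lacked for this pipeline.
    """

    def __init__(self):
        self.logs = defaultdict(list)      # topic -> append-only message log
        self.offsets = defaultdict(int)    # (group, topic) -> next offset to read

    def publish(self, topic, message):
        self.logs[topic].append(message)

    def poll(self, group, topic, max_records=10):
        key = (group, topic)
        start = self.offsets[key]
        batch = self.logs[topic][start:start + max_records]
        self.offsets[key] = start + len(batch)   # commit the new offset
        return batch

# Producer keeps publishing while the consumer is "down" ...
broker = MiniBroker()
for i in range(5):
    broker.publish("uis-events", {"event_id": i})

# ... and the consumer still receives everything once it recovers.
recovered = broker.poll("tnl-dashboard", "uis-events")
print(len(recovered))  # 5
```

In real Kafka, retention is bounded by time/size configuration and offsets are committed per consumer group; the sketch only mirrors the shape of that contract.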
Scaling database performance
• Database views for caching
–Views created every 6 hours
–UI only loads data from views
–Read-only replicas for select queries
• Archive data
–Moved old and completed experiment data to
separate tables
–DB cleanup using VACUUM and re-indexing
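The archiving step can be sketched as plain SQL: move rows for completed experiments out of the hot table, then reclaim space. This example uses SQLite for a self-contained demo (the production system was PostgreSQL, and the table/column names here are hypothetical):

```python
import sqlite3

# Hypothetical schema -- the real tables and Postgres-specific details differ.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE experiment_metrics (
        experiment_id INTEGER,
        metric_value  REAL,
        completed     INTEGER   -- 1 = experiment finished
    );
    CREATE TABLE experiment_metrics_archive AS
        SELECT * FROM experiment_metrics WHERE 0;
""")
db.executemany(
    "INSERT INTO experiment_metrics VALUES (?, ?, ?)",
    [(1, 0.031, 1), (2, 0.027, 0), (3, 0.044, 1)],
)

# Move rows for completed experiments out of the hot table ...
db.execute("""
    INSERT INTO experiment_metrics_archive
    SELECT * FROM experiment_metrics WHERE completed = 1
""")
db.execute("DELETE FROM experiment_metrics WHERE completed = 1")
db.commit()

# ... then reclaim the freed space (Postgres uses VACUUM / REINDEX similarly).
db.execute("VACUUM")
live = db.execute("SELECT COUNT(*) FROM experiment_metrics").fetchone()[0]
print(live)  # 1
```

Keeping the hot table small is what makes the cached views cheap to rebuild every 6 hours.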
TNL Dashboard v2
Product Demo
Streaming
•Column-oriented, time series schema
•Time-to-live(TTL) on data
•Only store most popular aggregates
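The v2 storage idea is: keep only pre-computed aggregates, each with a time-to-live, rather than raw events. Cassandra supports TTL natively (`USING TTL` on writes); this in-memory stand-in just mimics the behavior, and the class and key names are hypothetical:

```python
class AggregateStore:
    """Sketch of a TTL'd aggregate store, imitating Cassandra's row TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.rows = {}   # (experiment, bucket_hour) -> (expires_at, aggregate)

    def upsert(self, key, aggregate, now):
        # Each write refreshes the row's expiry, like Cassandra's USING TTL.
        self.rows[key] = (now + self.ttl, aggregate)

    def get(self, key, now):
        entry = self.rows.get(key)
        if entry is None or entry[0] <= now:
            self.rows.pop(key, None)   # expired rows are dropped on read
            return None
        return entry[1]

store = AggregateStore(ttl_seconds=3600)
store.upsert(("exp-42", "2016-05-01T10"), {"exposures": 1200}, now=0)
print(store.get(("exp-42", "2016-05-01T10"), now=100))   # {'exposures': 1200}
print(store.get(("exp-42", "2016-05-01T10"), now=4000))  # None
```

Storing only the most popular aggregates with a TTL bounds table size by design, instead of relying on periodic cleanup jobs.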
v1 VS v2
•New Architecture
–More scalable
–More responsive
–Less prone to data loss
• Lessons learnt
–System is as fast as the slowest component
–Fault-tolerance and resilience
–Partition data
–Pre-production environment
Questions/discussion
APPENDIX
Apply statistical power to test results
At a 90% confidence level, 1 out of 10 tests will be a false positive or a false negative.
            Heads  Tails
Right hand    51     49
Left hand     49     51
Right hand is superior at
getting heads!
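The coin-flip joke above can be checked with a one-proportion z-test (a standard technique, not something the slides spell out): 51 heads in 100 flips is nowhere near significant.

```python
import math

def z_for_proportion(successes, n, p0=0.5):
    """Normal-approximation z statistic for an observed proportion vs p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
    return (p_hat - p0) / se

# "Right hand" got 51 heads in 100 flips.
z = z_for_proportion(51, 100)
print(round(z, 2))        # 0.2
# One-sided 95% critical value is ~1.645; 0.2 is far below it,
print(abs(z) < 1.645)     # True -- the "superior right hand" is pure noise.
```

This is exactly the false-positive trap the 90%-confidence caveat warns about: small observed differences are usually noise.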
Do’s and Don’ts when concluding tests
Don’t call a test too early; this increases false positives and negatives.
Don’t call a test as soon as you see positive results, because test results frequently swing up and down.
To claim a test as a Winner/Loser, the positive/negative effect has to hold for at least 5 consecutive days and the trend must be stable.
Note: this type of chart is not currently available in the Test and Learn dashboard or the SiteSpect UI. The shape of the confidence-interval lines varies test by test.
Define one success metric and run tests for a pre-determined duration. (For hotel/flight tests in the US, we suggest running until the confidence interval of the conversion change is within +/- 1%.) Tests should run for at least 10 days.
Don’t assume the midpoint (the observed % change during the test period) will hold after the feature is rolled out: a 4.0% +/- 4.0% test may have zero impact and may not be much better than a 1.0% +/- 1.0% test.
Don’t call an inconclusive test “trending positive” or “trending negative”, as test results fluctuate.
Contact the ARM testing team with questions: gaotest@expedia.com
Using a 90% confidence level:
Winner: lower bound of % change >= 0 (or probability of the test being positive >= 95%)
Loser: upper bound of % change <= 0 (or probability of the test being negative >= 95%)
Else: inconclusive or neutral
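The Winner/Loser/Inconclusive rule above is mechanical once you have the confidence interval, so it can be written as a tiny classifier (the function name is ours, not from the slides):

```python
def classify_test(lower_bound, upper_bound):
    """Classify an A/B test from the 90% confidence interval of % change:
    Winner if the whole interval is >= 0, Loser if it is <= 0,
    otherwise Inconclusive/Neutral.
    """
    if lower_bound >= 0:
        return "Winner"
    if upper_bound <= 0:
        return "Loser"
    return "Inconclusive"

# A 4.0% +/- 4.0% result gives the interval [0.0, 8.0]: technically a
# Winner, but as the Don'ts note, don't assume the 4.0% midpoint holds.
print(classify_test(0.0, 8.0))    # Winner
print(classify_test(-3.0, 1.0))   # Inconclusive
print(classify_test(-5.0, -0.5))  # Loser
```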
