AB Testing
Revolution through constant evolution
AB Testing at Expedia
Expedia SF
114 Sansome
www.expedia.com
@expediaeng
Work with us:
letstalk@expedia.com
Paul Lucas
Sr Director, Technology
Want to visit next? Greece
Jeff Madynski
Director, Technology
Want to visit next? Croatia
Anuj Gupta
Sr Software Dev Engineer
Want to visit next? Peru
Technology Evolution
V0 – batch processing from Abacus exposure logs, Omniture, and the booking datamart; Tableau visualization
V1 – Storm, Kestrel, DynamoDB / PostgreSQL reading UIS messages and client log data (Nov 2014 – Dec 2015)
V2 – introduce Kafka and Cassandra (May 2016)
TNL – original solution
• Batch processing
• Tableau visualization
• Merged data from OMS/Omniture
• Problems:
– 1–2 day feedback loop: what if we had mistakes in the test implementation (bucketing not what was anticipated)?
– To fix data import errors, we had to start the import over from scratch
TNL Dashboard v0 (architecture): Omniture click data, booking datamart, and Abacus exposures → Hadoop ETL → Tableau
TNL v0 -> v1
TNL v1 Problems
• Database grew to 420 GB; queries took 3–5 minutes
• Data drops (Kestrel)
• Growing data volume (multi-brand, more customers)
TNL v1->v1.1, v2
• Fighting fires, borrowing more time
• POC next
Fighting fires – borrowing more time
User Interaction Service (UIS) traffic
Scaling messaging system
Kafka
• Publish-subscribe based
messaging system
• Distributed and reliable
• Longer retention and
persistence
• Monitoring dashboard
and alerts
• Buffer for system
downtime
Kestrel limitations
• No message durability
• Approaching scalability limits
• Inactive open-source project
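The key property Kafka added over Kestrel is a durable, retained log: a consumer that was down can replay from its last committed offset instead of losing messages. A minimal in-memory sketch of that idea (a toy stand-in, not the Kafka API; `MiniBroker` and the topic/group names are hypothetical):

```python
from collections import defaultdict

class MiniBroker:
    """Toy publish-subscribe broker illustrating Kafka-style retention.

    Messages are kept in a per-topic append-only log, and each consumer
    group tracks its own read offset -- so a consumer that was down can
    catch up later, the property Kestrel lacked for this pipeline.
    """

    def __init__(self):
        self.logs = defaultdict(list)      # topic -> append-only message log
        self.offsets = defaultdict(int)    # (group, topic) -> next offset to read

    def publish(self, topic, message):
        self.logs[topic].append(message)

    def poll(self, group, topic, max_records=10):
        key = (group, topic)
        start = self.offsets[key]
        batch = self.logs[topic][start:start + max_records]
        self.offsets[key] = start + len(batch)   # commit the new offset
        return batch

# Producer keeps publishing while the consumer is "down" ...
broker = MiniBroker()
for i in range(5):
    broker.publish("uis-events", {"event_id": i})

# ... and the consumer still receives everything once it recovers.
recovered = broker.poll("tnl-dashboard", "uis-events")
print(len(recovered))  # 5
```

In real Kafka, retention is bounded by time/size configuration and offsets are committed per consumer group; the sketch only mirrors the shape of that contract.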
Scaling database performance
• Database views for caching
–Views created every 6 hours
–UI only loads data from views
–Read-only replicas for select queries
• Archive data
–Moved old and completed experiment data to
separate tables
–DB cleanup using VACUUM and re-indexing
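The archiving step can be sketched as plain SQL: move rows for completed experiments out of the hot table, then reclaim space. This example uses SQLite for a self-contained demo (the production system was PostgreSQL, and the table/column names here are hypothetical):

```python
import sqlite3

# Hypothetical schema -- the real tables and Postgres-specific details differ.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE experiment_metrics (
        experiment_id INTEGER,
        metric_value  REAL,
        completed     INTEGER   -- 1 = experiment finished
    );
    CREATE TABLE experiment_metrics_archive AS
        SELECT * FROM experiment_metrics WHERE 0;
""")
db.executemany(
    "INSERT INTO experiment_metrics VALUES (?, ?, ?)",
    [(1, 0.031, 1), (2, 0.027, 0), (3, 0.044, 1)],
)

# Move rows for completed experiments out of the hot table ...
db.execute("""
    INSERT INTO experiment_metrics_archive
    SELECT * FROM experiment_metrics WHERE completed = 1
""")
db.execute("DELETE FROM experiment_metrics WHERE completed = 1")
db.commit()

# ... then reclaim the freed space (Postgres uses VACUUM / REINDEX similarly).
db.execute("VACUUM")
live = db.execute("SELECT COUNT(*) FROM experiment_metrics").fetchone()[0]
print(live)  # 1
```

Keeping the hot table small is what makes the cached views cheap to rebuild every 6 hours.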
TNL Dashboard v2
Product Demo
Streaming
•Column-oriented, time series schema
•Time-to-live(TTL) on data
•Only store most popular aggregates
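The v2 storage idea is: keep only pre-computed aggregates, each with a time-to-live, rather than raw events. Cassandra supports TTL natively (`USING TTL` on writes); this in-memory stand-in just mimics the behavior, and the class and key names are hypothetical:

```python
class AggregateStore:
    """Sketch of a TTL'd aggregate store, imitating Cassandra's row TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.rows = {}   # (experiment, bucket_hour) -> (expires_at, aggregate)

    def upsert(self, key, aggregate, now):
        # Each write refreshes the row's expiry, like Cassandra's USING TTL.
        self.rows[key] = (now + self.ttl, aggregate)

    def get(self, key, now):
        entry = self.rows.get(key)
        if entry is None or entry[0] <= now:
            self.rows.pop(key, None)   # expired rows are dropped on read
            return None
        return entry[1]

store = AggregateStore(ttl_seconds=3600)
store.upsert(("exp-42", "2016-05-01T10"), {"exposures": 1200}, now=0)
print(store.get(("exp-42", "2016-05-01T10"), now=100))   # {'exposures': 1200}
print(store.get(("exp-42", "2016-05-01T10"), now=4000))  # None
```

Storing only the most popular aggregates with a TTL bounds table size by design, instead of relying on periodic cleanup jobs.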
v1 VS v2
•New Architecture
–More scalable
–More responsive
–Less prone to data loss
• Lessons learnt
–System is as fast as the slowest component
–Fault-tolerance and resilience
–Partition data
–Pre-production environment
Questions/discussion
APPENDIX
Apply statistical power to test results
At a 90% confidence level, 1 out of 10 tests will be a false positive or a false negative.
            Heads  Tails
Right hand    51     49
Left hand     49     51
Right hand is superior at
getting heads!
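The coin-flip joke above can be checked with a one-proportion z-test (a standard technique, not something the slides spell out): 51 heads in 100 flips is nowhere near significant.

```python
import math

def z_for_proportion(successes, n, p0=0.5):
    """Normal-approximation z statistic for an observed proportion vs p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
    return (p_hat - p0) / se

# "Right hand" got 51 heads in 100 flips.
z = z_for_proportion(51, 100)
print(round(z, 2))        # 0.2
# One-sided 95% critical value is ~1.645; 0.2 is far below it,
print(abs(z) < 1.645)     # True -- the "superior right hand" is pure noise.
```

This is exactly the false-positive trap the 90%-confidence caveat warns about: small observed differences are usually noise.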
Do’s and Don’ts when concluding tests
Don’t call a test too early; this increases false positives and negatives.
Don’t call a test as soon as you see positive results, because test results frequently swing up and down.
To claim a test as a Winner/Loser, the positive/negative effect has to hold for at least 5 consecutive days and the trend must be stable.
Note: this type of chart is not currently available in the Test and Learn dashboard or the SiteSpect UI. The shape of the confidence-interval lines varies test by test.
Define one success metric and run tests for a pre-determined duration. (For hotel/flight tests in the US, we suggest running until the confidence interval of the conversion change is within +/- 1%.) Tests should run for at least 10 days.
Don’t assume the midpoint (the observed % change during the test period) will hold after the feature is rolled out: a 4.0% +/- 4.0% test may have zero impact and may not be much better than a 1.0% +/- 1.0% test.
Don’t call an inconclusive test “trending positive” or “trending negative”, as test results fluctuate.
Contact the ARM testing team with questions: gaotest@expedia.com
Using a 90% confidence level:
Winner: lower bound of % change >= 0 (or probability of the test being positive >= 95%)
Loser: upper bound of % change <= 0 (or probability of the test being negative >= 95%)
Else: inconclusive or neutral
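The Winner/Loser/Inconclusive rule above is mechanical once you have the confidence interval, so it can be written as a tiny classifier (the function name is ours, not from the slides):

```python
def classify_test(lower_bound, upper_bound):
    """Classify an A/B test from the 90% confidence interval of % change:
    Winner if the whole interval is >= 0, Loser if it is <= 0,
    otherwise Inconclusive/Neutral.
    """
    if lower_bound >= 0:
        return "Winner"
    if upper_bound <= 0:
        return "Loser"
    return "Inconclusive"

# A 4.0% +/- 4.0% result gives the interval [0.0, 8.0]: technically a
# Winner, but as the Don'ts note, don't assume the 4.0% midpoint holds.
print(classify_test(0.0, 8.0))    # Winner
print(classify_test(-3.0, 1.0))   # Inconclusive
print(classify_test(-5.0, -0.5))  # Loser
```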
