DataOps at
TripActions
27 Oct 2020
Agenda
01 - Intro TripActions: TA goals, customers, and team
02 - Data at TripActions: Data team objectives and architecture
03 - Infrastructure: Architecture, platforms, and tooling
04 - Dev/Test Flow: How our team builds and tests changes
05 - Deployments: How code moves into production and is monitored
06 - The Future: Future objectives in tooling and process
TripActions Overview
OUR MISSION
To Move People,
Ideas &
Businesses
Forward
63% feel they have to handle everything on their own when something goes wrong
83% spend over an hour booking a trip
Built for the traveler by the traveler
and many more...
97%
Traveler adoption
34%
Hotel cost savings
1.5M
Hotel rooms
By the numbers
Managing travel for >4,000 companies: Partners range from small businesses to Fortune 100 companies in a variety of industries
Supporting more than a million travellers: TripActions provides booking and support services for all forms of business and personal travel
800 employees around the globe: Headquartered in California, TripActions has offices worldwide, including Amsterdam
Data at TripActions
Who is the Data Team?
BI Palo Alto - 3 CA, 1 IL
BI-PA ● Product BI
● Liquid (credit card product) reporting and analysis
BI Amsterdam - 6 AMS
BI-AMS ● Operational BI (Customer Service, Success, Supply)
● Finance reporting and analysis
Data Science - 7 AMS, 1 Israel
DS
● Insights and analytics
● Predictive modelling
● Production ML services
Data Engineering - 4 AMS, 1 CA
DE
● Data integration
● Data warehousing
● Infrastructure
● Tooling
BI @ TripActions
Business Intelligence
Pillars
Standardized
Reporting
Training and
Development
Ad Hoc
Reporting and
Analytics
● >50% of company uses standard
reporting daily, >1000 daily report
views
● >65% of company has attended BI
training
● ~100 weekly self-service ad hoc
reports
Data Science
Personalizing User Experience | Empowering Decision Making
Architecture and Infrastructure
Overall BI/Data Engineering Architecture
Additional Services
Data Flows
Pipelinewise
What is it?
● Extensible, "any source to any target" wrapper around singer.io
● Provides a Stitch-like experience for job management via YAML definition files
● TripActions maintains a custom fork
that extends logging, metrics, and
functionality
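The YAML definition files mentioned above can look roughly like the following sketch; the connection details, names, and tables are placeholders, not TripActions' actual configuration:

```yaml
# Illustrative Pipelinewise tap definition (placeholder values throughout)
---
id: "app_db"
name: "Application MySQL source"
type: "tap-mysql"
db_conn:
  host: "db.example.internal"
  port: 3306
  user: "replication_user"
  password: "<secret>"
  dbname: "app"
target: "snowflake"              # target defined in a separate yml
schemas:
  - source_schema: "app"
    target_schema: "raw_app"
    tables:
      - table_name: "users"
        replication_method: "LOG_BASED"
      - table_name: "bookings"
        replication_method: "INCREMENTAL"
```

One file per source keeps job management declarative: adding a table is a one-line diff rather than new code.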
dbt: Puts the T in ELT
Code Architecture - dbt
Data Warehouse
Core integration of all data for
concepts around users, activity,
finance, etc
● Basis for all reporting and
data science
● Provides rich, integrated
data
● Updated every 30
minutes
Event Models
“Big data” models to transform
raw events from logs and event
tracking into usable data
● Integrates ~15TB of data
from three event sources
● Enriches and normalizes
to a common data model
Reporting Marts
Denormalized reporting views
for BI reporting and self-service
● Underlies every Tableau
dashboard and >1400
self-service reports
Data Science
Data transformations to feed
into our ML analytics and
services
● Used to power every site
interaction via
personalized experiences
● Drives target setting and
operations planning
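As an illustration of the reporting-mart layer, a dbt model is just a templated SELECT that references upstream models. The model and column names below are invented for the example, not TripActions' actual models:

```sql
-- models/marts/reporting/rpt_bookings.sql (illustrative names only)
{{ config(materialized='view') }}

select
    b.booking_id,
    u.company_name,
    b.booked_at,
    b.total_usd
from {{ ref('fct_bookings') }} b
join {{ ref('dim_users') }} u using (user_id)
```

Because every dependency goes through `ref()`, dbt knows the full DAG and can build, test, and document each layer in order.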
How We Develop and Test
Development approach
Work close to the
truth
Let analysts use real data
and directly test against
prod DWH to measure
impact
Make it easy to
validate, hard to fail
Tooling should make it hard
to make mistakes and easy
to commit with confidence
ALWAYS test and
document
No change should be
deployed without
documentation and tests in
place first
Rapid, high quality code changes
Combining tooling, process, and education allows anyone to continuously,
confidently make changes to core data models
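dbt makes the "ALWAYS test and document" rule cheap to follow: tests and documentation live in YAML next to the models. A minimal sketch, again with invented names:

```yaml
# models/marts/reporting/schema.yml (illustrative names only)
version: 2
models:
  - name: rpt_bookings
    description: "One row per booking, denormalized for reporting"
    columns:
      - name: booking_id
        description: "Primary key of the booking"
        tests:
          - unique
          - not_null
      - name: total_usd
        description: "Booking value in USD"
        tests:
          - not_null
```

A change without an entry here is visible in review, which is what lets the "no change without docs and tests" rule be enforced by tooling rather than willpower.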
Analyst/developer workflow
Begin with a Jira
issue
Most changes begin with
Jira tickets to track the
development and
manage stakeholder
communication
Analyst builds
change in dbt
All analysts and
collaborators are
proficient in dbt and
100% of transformations
are built using it. Tooling
makes it easy
Automated quality
review - local
All analysts use an
automated suite which
verifies transformations
and repeatability, runs
tests, and adds
documentation and new
tests - dbt validator
Automated quality
review - remote
Automated tests on the
PR check for general
code quality, formatting,
dependencies, etc
Guided PR review
and merge
PR processes allow
minimum waiting for
review and minimum
distraction for others
dbt Development
Every user has their own
dev database
Before starting, analysts can either clone tables or create views against production for project dependencies
All raw data can be modelled
and tested based on actual prod
data
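In Snowflake, per-developer databases and the clone-or-view choice described above map onto plain DDL; the database, schema, and table names here are placeholders:

```sql
-- Illustrative: one dev database per analyst, seeded from prod
create database if not exists dev_alice;

-- Option 1: zero-copy clone (cheap, writable, frozen at clone time)
create schema dev_alice.analytics clone prod.analytics;

-- Option 2: a view onto prod (always current, read-only dependency)
create view dev_alice.analytics.dim_users as
    select * from prod.analytics.dim_users;
```

Zero-copy cloning is what makes "model and test on actual prod data" affordable: the clone shares storage with production until rows diverge.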
Local quality review
Quality review is intended to check the
following areas:
1. Code runnability
2. Existing tests
3. Data quality
4. Documentation
5. New tests
Code quality testing and automated tests
● Code quality checks run against:
○ The changed table and all dependent tables in the project
○ Incremental refreshes, where models are incrementally loaded
● Tests run on the changed model and
all dependents
● Other projects are then checked for
potential dependencies
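Running tests on "the changed model and all dependents" is a downstream walk of the model DAG, which dbt exposes directly via the `+` graph operator (e.g. `dbt test -m my_model+`). A minimal sketch of the traversal itself, using a made-up dependency graph:

```python
from collections import deque

def downstream_models(dag, changed):
    """Return the changed model plus everything that depends on it.

    `dag` maps each model to the models that directly depend on it
    (edges point downstream)."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        model = queue.popleft()
        for dependent in dag.get(model, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Hypothetical dependency graph, not TripActions' real one
dag = {
    "stg_users": ["dim_users"],
    "dim_users": ["rpt_bookings", "ds_features"],
    "rpt_bookings": [],
}
print(sorted(downstream_models(dag, "dim_users")))
# ['dim_users', 'ds_features', 'rpt_bookings']
```

dbt performs the equivalent selection internally, so the local suite only needs to hand it the changed model names.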
Data Validation - Manual
Documentation and new tests
Net result: Low work, high confidence in changes
PR Review
● PRs follow a standard structure and
labelling
○ Local testing report card becomes the body
of the PR
● Slack automation coordinates the
review
○ Notifies the reviewers of the new PR
○ Informs dev of change requests
○ Tracks and labels when the PR is approved
and then merged
Deployments and Monitoring
Deploying into Snowflake
● Changes in dbt models are detected
when a PR is merged
● Deploy processes kick off
automatically, running
○ The changed model
○ Dependent models (based on model type
and name)
● Global data dictionaries are updated on the server and in Google Sheets with the new information
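Detecting changed models when a PR is merged can be as simple as mapping the merge's changed files onto dbt model names and a selector string. A hedged sketch, not TripActions' actual deploy code:

```python
import os.path

def changed_model_selector(changed_files):
    """Build a dbt selection string from changed .sql files in a merge.

    Illustrative assumptions: models live under models/, model name ==
    file name, and each changed model plus its descendants ('+' suffix)
    should be rebuilt."""
    models = [
        os.path.splitext(os.path.basename(path))[0]
        for path in changed_files
        if path.startswith("models/") and path.endswith(".sql")
    ]
    return " ".join(f"{m}+" for m in sorted(models))

files = ["models/marts/rpt_bookings.sql", "README.md", "models/staging/stg_users.sql"]
print(changed_model_selector(files))  # rpt_bookings+ stg_users+
# ...which would feed into something like: dbt run -m rpt_bookings+ stg_users+
```

Selecting descendants by graph position rather than by hand is what lets the deploy run automatically on merge without anyone enumerating affected models.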
In depth: deployment evaluation process
What if it goes wrong?
Monitoring via automated testing
● All data is tested every six hours
● Any failing tests are posted to a channel
● SQL is added to a pastebin for easy troubleshooting
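Posting failing tests to a channel typically means turning test results into a webhook payload. The sketch below uses assumed field names; in practice the returned payload would be POSTed to a Slack incoming webhook:

```python
def failing_tests_message(results):
    """Format failing test results as a Slack-style message payload.

    Illustrative sketch: the result-dict field names are assumptions,
    not the schema of any real dbt artifact."""
    failures = [r for r in results if r["status"] == "fail"]
    if not failures:
        return None  # nothing to post
    lines = [f"*{len(failures)} failing data test(s)*"]
    for r in failures:
        lines.append(f"- {r['test']} on {r['model']} ({r['failures']} bad rows)")
    return {"text": "\n".join(lines)}

results = [
    {"test": "unique_booking_id", "model": "rpt_bookings", "status": "fail", "failures": 3},
    {"test": "not_null_user_id", "model": "dim_users", "status": "pass", "failures": 0},
]
print(failing_tests_message(results)["text"])
```

Keeping the formatting as a pure function makes the alerting path itself easy to test.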
Looking to the Future
great_expectations: What is it?
● Standardized data profiling and
testing
● Alerting on changes in data quality or
structure
Planned integration at TripActions
● Directly generate test profiles and
configurations via pipelinewise
● Integration of great_expectations
tests and data directly into tadoc /
dbt docs
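The profile-and-alert idea can be illustrated without great_expectations' actual API: compute simple column profiles, then flag metrics that drift beyond a tolerance relative to a baseline. Everything below is an invented sketch of that idea:

```python
def profile(rows):
    """Profile a single column: row count, null rate, distinct count."""
    n = len(rows)
    nulls = sum(1 for v in rows if v is None)
    return {
        "count": n,
        "null_rate": nulls / n if n else 0.0,
        "distinct": len({v for v in rows if v is not None}),
    }

def drift_alerts(baseline, current, tolerance=0.05):
    """Flag metrics that moved more than `tolerance` (relative) vs baseline."""
    alerts = []
    for metric, old in baseline.items():
        new = current[metric]
        denom = old if old else 1
        if abs(new - old) / denom > tolerance:
            alerts.append(f"{metric}: {old} -> {new}")
    return alerts

baseline = profile(["a", "b", "b", None])
current = profile(["a", None, None, None])
print(drift_alerts(baseline, current))
# ['null_rate: 0.25 -> 0.75', 'distinct: 2 -> 1']
```

great_expectations standardizes exactly this pattern (expectation suites generated from profiles, evaluated on each new load), which is what makes it a natural fit behind pipelinewise.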
Pipelinewise 2.0
● Extend to “anywhere to anywhere”
functionality with standardized JSON
API importer functionality
● Source data discovery and reporting
to show analysts/DS new data objects
dbt Validator 2.0
● Smart, dynamic re-cloning of objects
into dev databases for faster testing
○ Cleanup functionality to prevent testing on
stale objects
○ Fast clone based on dbt DAG to accelerate
development
● Extended test capabilities including
custom tests and data validation ->
automated tests
● Automated reporting of BI
dependencies on marts and tables
Rob Winters | Director, Data | rwinters@tripactions.com
Thank you!
