Four Data Architecture Mega-Patterns for Agility
Copyright 2021 by DataKitchen, Inc. All Rights Reserved.
Agenda
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Conclusion and More Information
Our Focus Is The River Of Work Right In Front Of Us
• The Model,
• The Algorithm,
• The Data Pipeline,
• The Data Visualization,
• The Governance,
• The Data Itself
What is my next task?
Next Task Focus Is Making Us Blind To Failure
• The Model,
• The Algorithm,
• The Data Pipeline,
• The Data Visualization,
• The Governance,
• The Data Itself
Task Focus Not Working
Look Upstream At The Source Of The Problem
• Develop
• Deploy
• Iterate
• Monitor
• Test
• Collaborate
How You Do It
How? Focus On Four Key Upstream Processes
Decrease The Cycle Time: Continuously Deploy Innovation
Lower Error Rates: Increasing Customer Data Trust
Improve Collaboration: Fewer Meetings & Less Bureaucracy
Measure Your Team: And show everyone your success
DataOps Aligns People, Processes, and Technology
Rapid experimentation and innovation enable faster delivery
Low error rates
Collaboration across complex sets of people, technology, and environments
Clear measurement and monitoring of results
Agenda
What Problems Do We Need To Solve With Architecture for AI and Data Analytics?
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Conclusion and More Information
Gartner Data Fabric
“Data fabric focuses on composability, allowing users to build a flexible, agile, scalable architecture that will be able to supply data to humans or machine users.
Data fabric is a design concept, not just a set of technology components.”
Data Fabric Toolchain Elements
Store:
Transform: SQL Code, ETL
Virtualize: layer
Govern: Catalog
Includes Data Virtualization in Reference Fabric Design
Includes Data Streaming in Reference Fabric Design
Data Fabric: Beware Magic of ‘AI Inside’
[Figure: the Store, Transform (SQL Code, ETL), Virtualize (layer), and Govern (Catalog) elements, each marked ‘AI’]
Magic AI: Danger, Will Robinson
Data Fabric: Beware Magic of ‘AI Inside’
Think of ‘AI Inside’ of Data Fabric like autonomous driving:
• Level 1: Simple, keep your hands on the wheel
• Level 5: Cross Boston, in the snow, at night
We are at Level 1 of AI in the Data Fabric
AI + New Tools Agility
People & Tools in a DataOps Architecture
Agility
AI + New Tools
Canonical ‘Factory’ Data Architecture / Fabric
DataOps Functional Architecture
Cloud/On-Prem environments: Production, Test, Dev (each with Orchestrate, Monitor, Test)
Data flow elements: Source Data, Raw Lake, Data Engineering, Refined Data, Data Science, Data Viz., Data Governance, Data Customers
DataOps Platform: Storage & Version Control; History & Metadata; Auth & Permissions; Environment Secrets; DataOps Metrics & Reports; Automated Deployment; Environment Creation and Management
DataOps Team
Second Cloud/On-Prem Data Center
DataOps Physical Architecture
Cloud/On-Prem Data Center environments: Production, Test, Dev (each with an Agent)
Data flow elements: Source Data, Raw Lake, Data Engineering, Refined Data, Data Science, Data Viz., Data Governance, Data Customers
DataOps Platform: Storage, Metadata, Auth, Secrets, Metrics
Second Cloud/On-Prem Data Center (with an Agent)
DataOps Team
DataOps Spans Environments
Cloud/On-Prem #1: Production Environment, Test, Dev (each with an Agent), DataOps Pipeline
Cloud/On-Prem #2: Production Environment, Dev (each with an Agent), DataOps Pipeline
DataOps Platform: Storage & Version Control; History & Metadata; Auth & Permissions; Environment Secrets; DataOps Metrics & Reports
DataOps Team
Data Fabric – A New Fashion Trend?
• It's Hot Stuff:
Gartner View, Forrester View. Top 10 downloaded report 2020, top inquiry
• What is a data fabric?:
• All the stuff you do with centralized data infrastructure:
ETL, DB, governance, store, lake, warehouse, stream/batch transformation.
• Plus, some fancy new stuff
1. AI component - magic pixie dust of self-driving data
2. Data virtualization/semantic layer
• However, it is missing other parts of the data value chain:
models, visualizations, self service. It’s more ‘hub’ than ‘spoke’
• Why? Moniker that covers the latest trends in data management.
• Caveat: The goal of implementing a data fabric is agility - agility is a second-order effect from
better tools. The primary driver is people & process following DataOps.
Agenda
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Conclusion and More Information
Data Mesh 101
Why Data Mesh?
• Centralized Systems Fail
• Skill-based roles are unable to respond to rapid
customer needs
• Data domain knowledge matters
• Universal, one size fits all patterns fail
• General Data Analytic Project Failure
• Inspired by domain driven design (DDD) in software
The main idea is to take a best practice from software development & apply it to data analytics.
(Sound familiar?)
The Human Side of a Data Mesh: Main Idea
• The organization structure builds walls & barriers to change
• When you make a change, you need to update each component & coordinate between several different teams
The organization creates walls & changes need to cross the traditional organizational boundaries
No, Data Engineers Are Not Perfectly Fungible
Data Mesh = Organization Mesh
The use of domain-driven / data mesh
design as the primary means:
1. Assignment of full end-to-end
ownership of a domain to one
cross-functional team that gets the
necessary support to fulfil that
responsibility.
2. Structure data
3. Build composable systems
Data Organization Keys
Letting a small team continually own the data set, rather than move from project to project, is key
‘You own the product’ thinking provides
the right incentives between the producers
& consumers
Source: thoughtworks.com/insights/blog/data-mesh-its-not-about-tech-its-about-ownership-and-communication
The Human Side of a Data Mesh: Main Idea
● Take the ideas of microservices where a team owns the dev, test, deploy & running of the microservice (5-9 people)
● Organize around the domain, not the technology
● The Operational & Data products are created by the same team
● Domain data as a product - domain data teams must consider their data assets & artifacts as their products & others as their customers
● Data Engineers must live with, work with & understand a finite number of data sets to really add value
The organization creates walls & changes need to cross the traditional organizational boundaries
What Data is in a Domain?
Domains Aligned with Sources / Types of Data
• ‘Mastered’ Data:
• Entities of business / subject areas
• Customers, products, etc.
• ‘Sources’ of Data:
• Business reality: facts on the ground
• Weblogs, user interaction history
Domains Aligned with Consumption of Data
• Integrated Data / Ready for Consumption
• Facts / Dimensions / Star Schemas
• Aggregated Views
• Product View
• Never Done, Always Improving
• Customer Usage Focus
What are the Domain’s Components?
1. Data
2. Artifacts created from that data:
models, views, reports, dashboards, etc.
3. Code that acts upon that data:
pipelines, toolchains, etc.
4. Team used to create/update/run that Domain
5. Metadata: catalogs, lineage, test results,
processing history, etc.
Data Domain 1
Domain Must Be Composable & Controllable
Data Domain 1
Data Domain 2
Data Domain 3
Domain Interfaces
Data Domain
The Where:
How to find & access data securely;
e.g., DB connect string
The What:
Description of the data;
e.g., data catalog URL
The When:
Processing Results, Timing,
Test Results, Status, etc.
The How:
Steps, Code/Config, toolchain
& processing pipeline
The With:
Raw Data (or other Data
Domain), hopefully immutable
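The five interface facets above (Where, What, When, How, With) can be sketched as a small descriptor record. This is only an illustration: the class name `DomainInterface`, its field names, and the `example.com` URLs are assumptions made for this sketch, not part of any DataKitchen API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DomainInterface:
    """Hypothetical descriptor for one data domain's published interface."""
    where: str  # how to find & access the data securely (e.g., a DB connect string)
    what: str   # description of the data (e.g., a data catalog URL)
    when: str   # where to read processing results, timing, test results, status
    how: str    # where the steps, code/config & processing pipeline live
    with_: Tuple[str, ...] = ()  # raw data (or other domains) this domain builds on

    def missing_facets(self) -> List[str]:
        """Return the names of any facets that were left empty."""
        facets = {"where": self.where, "what": self.what,
                  "when": self.when, "how": self.how}
        missing = [name for name, value in facets.items() if not value]
        if not self.with_:
            missing.append("with")
        return missing


# Placeholder values only; none of these point at a real system.
physician_domain = DomainInterface(
    where="jdbc:redshift://endpoint:port/database",
    what="https://catalog.example.com/physician",
    when="https://orchestrator.example.com/runs/latest",
    how="https://orchestrator.example.com/pipelines/physician",
    with_=("raw/physician_sales",),
)
print(physician_domain.missing_facets())  # []
```

Publishing a record like this per domain gives consumers one place to check that every facet is actually filled in before they depend on the domain.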
Domain Interfaces as URLs
https://cloud.datakitchen.io/#/recipes/dc/Production/agile-analytic-ops/variations/prod-env-DevSprint-build-now
https://cloud.datakitchen.io/#/orders/dc/Production/runs/60e82aa8-2518-11eb-8653-c2e92ba8ebec
jdbc:redshift://endpoint:port/database
https://dkimplementation.atlassian.net/wiki/spaces/DC/pages/9306114/Dimension+Tables
Data Domain
What Do You Want Out of a Domain?
A series of independent domains of data that are:
1. Trusted
2. Usable by the team’s customers
3. Discoverable / Findable
4. Understandable & well-described
5. Secure & permissioned
6. URL/API driven & able to interoperate with other domains
7. Have a ‘single throat to choke’ for the customer to easily:
• Report problems & get updates on fixes
• Ask for new insights / improvements & get them into production quickly
Data Mesh Change in Focus
1. Domains & the grouping of your work into small teams
& partitions over ‘one platform to rule them all’
2. What services you are providing your customer, rather than what data you are loading
3. Discovering & using over extracting & loading
4. Decentralization & the freedom to innovate over
central control
5. Ecosystem of data products linked together over a
centralized lake / warehouse
An Example of Domains
US Commercial Pharma Domains
• NPP (Non-Personal Promotion): emails, web site visits, even radio ads
• Physician: doctor (& other outlets) sales, claims data, anonymized patient data
• Payer: Payer/Plan, rebates, formulary
Launch: NPP Domain
Growth: Physician Domain
Mature: Payer Domain
Commercial Pharma Analytics
What About the Data?
What about the data in each domain?
• Each domain has separate data sources
• Overlapping entities (e.g., physicians) exist in
each domain
• Each domain has different cycle times of product
(i.e., daily, weekly, hourly, etc.)
• Each data domain has its unique characteristics.
• For instance, subnational physician data from
IQVIA - purchased by pharma companies -
may not 1:1 match claims data, which may
not match payer data. This is due to data
supplier issues & timing projection
algorithms.
Sub-national Weekly data
Sub-national Payer Data
Sub-national Institutional (DDD) Data
National Prescription Audit Data
Sales Force Alignment Data
Longitudinal Patient Data
Sub-national Profit and Loss Data
Sub-national Claims and Co-pay Data
Payer and Plan Formulary Data
Census Data
Stocking Data
Source of Business
AMA Data
Retail OTC Data
Buy and Bill Data
Field Calls and Promotional Activity Data
Rep Expenses and Vacancy Data
Hotline Verification Data
Contract and Payer Rebates Data
Veeva CRM Data
ERP Data
NPP Data
Forecast Data
Primary Research Data
Pharma Sales & Marketing Teams
NPP Domain Marketing & Sales Team
One part of the pharma brand team focused on ads, digital & other non-personal
promotions. This team matters most pre-launch & during the growth phase of a product
Physician Domain Marketing & Sales Team
Another part of the pharma team focused on in-person sales. Those are the good-looking
people you see in doctors’ waiting rooms. Sales calls, samples, doctor visits, messages,
call alignments, etc. This team matters the most during the first years of a pharma launch.
Payer Domain Marketing & Sales Team
A third part is focused on Payer Marketing. This part is - in essence - controlling the price
of a pharmaceutical product due to the rebate given to any payer. They are concerned
about the rebate contract, being on formulary & tier & copays. Payer Marketing matters
more during the 'mature' phase of a pharma product lifecycle.
Domain Layers
1. Mastering & small foundation files are a domain layer
There are 1M physicians in the US, but the company master of physicians is only 40K. This work is done by separate teams working independently.
2. Of course, the main data warehouse is a domain layer
There are facts & dimensions, along with multiple tables used for specific analyses as needed.
3. Self-Service & Data Science are domain layers
They can keep their own cached data sets (e.g., a Tableau extract) or have their own small data sets that they mix with the central data in Alteryx (or other) tools. Data Science teams have their own segmentation models dependent on specific views or extracts of data.
Mastered Data Sets
(IT)
Integrated Data Sets
(Data Engineers)
Self Service Tools
(Analyst)
Domain Layers
Raw, Sourced Data (Various): the same source data sets listed under “What About the Data?”
Layers: Raw, Sourced Data (Various); Mastered Data Sets (IT); Integrated Data Sets (Data Engineers); Self Service Tools (Analyst); Business Customer
Domains: Mastering Domain: Physician MDM; Mastering Domain: Target Lists, Product Market Baskets; Brand Team Reporting Domain; Field Sales Reporting Domain
Domain Layers Processing Relationships
Raw, Sourced Data (Various): the same source data sets listed under “What About the Data?”
Layers: Raw, Sourced Data (Various); Mastered Data Sets (IT); Integrated Data Sets (Data Engineers); Self Service Tools (Analyst); Business Customer
Domains: Mastering Domain: Physician MDM; Mastering Domain: Target Lists, Product Market Baskets; Brand Team Reporting Domain; Field Sales Reporting Domain
Domain Layers Processing Steps
Raw, Sourced Data (Various): the same source data sets listed under “What About the Data?”
Layers: Raw, Sourced Data (Various); Mastered Data Sets (IT); Integrated Data Sets (Data Engineers); Self Service Tools (Analyst); Business Customer
Domains: Mastering Domain: Physician MDM; Brand Team Reporting Domain
Benefits of Approach
• Yes, you can do all four of these Data Architecture Mega-Patterns for Agility!
• Benefits
• Supports over $10 Billion in sales
• Integrated 100s of data sets
• Very, very few errors or missed SLAs
• > 50,000 automated tests
• > 100 schema/data changes per week
• Staff of seven data and DataOps engineers
• Low total yearly costs: hardware, hosting, software, staffing
• DataKitchen software enables those four patterns: Recipes, Tests, Kitchens and especially Ingredients can handle all the needs
Agenda
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Conclusion and More Information
Built With Functional Programming
• Start with immutable (never
changing) data
• Pure functions (you put some
data in & get some data out)
• Idempotency (you can run it over
again & get the same thing)
• No side effects
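A minimal Python sketch of these four properties. The record type `Sale` and the function `total_by_region` are invented for illustration and do not come from the deck.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)  # frozen=True makes each record immutable
class Sale:
    region: str
    amount: float


def total_by_region(sales: Tuple[Sale, ...]) -> Dict[str, float]:
    """Pure function: the output depends only on the input; nothing outside is changed."""
    totals: Dict[str, float] = {}
    for sale in sales:
        totals[sale.region] = totals.get(sale.region, 0.0) + sale.amount
    return totals


raw = (Sale("east", 10.0), Sale("west", 5.0), Sale("east", 2.5))

first = total_by_region(raw)
second = total_by_region(raw)  # run it over again & get the same thing
print(first == second)  # True
print(first)            # {'east': 12.5, 'west': 5.0}
```

Because `raw` is never modified and the function touches no external state, a second run is guaranteed to match the first, which is the property the slide calls idempotency.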
Functional Approach Benefits
Reproducibility
• Foundational to the scientific method
and data science / AI
• Critical from a legal standpoint and
sanity standpoint
Complexity Reduction
Cloud Native
• Storage and compute are cheap
Faster Time To Value
Functional Data Mesh Systems
Production
Data
Analytic
Customers
Production Team
Yeah! All my tests & monitors
are passing!
Happy Customers!
Think of all your data & analytic work as a “Big Function” in a domain
• In that function are your data & AI toolchain
• Everybody works that function (whether they know it or not!)
• Re-running a task for the same date should always produce the same output
• Data can be repaired by rerunning the new code
• A ‘big red/green light’ on the system telling you everything is OK
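One common way to make a rerun for the same date produce the same output is to overwrite the whole date partition instead of appending to it. The sketch below assumes hypothetical in-memory stores (`RAW`, `refined`) and a made-up `build_partition` function.

```python
from datetime import date
from typing import Dict, List

# Hypothetical in-memory stand-in for immutable raw data, keyed by date.
RAW: Dict[date, List[int]] = {
    date(2021, 6, 1): [3, 4, 5],
    date(2021, 6, 2): [10, 20],
}


def build_partition(run_date: date, refined: Dict[date, int]) -> Dict[date, int]:
    """Rebuild one date partition by overwriting it, never by appending to it."""
    updated = dict(refined)                 # leave the caller's copy untouched
    updated[run_date] = sum(RAW[run_date])  # replace the whole partition
    return updated


refined: Dict[date, int] = {}
refined = build_partition(date(2021, 6, 1), refined)
refined = build_partition(date(2021, 6, 1), refined)  # rerun for the same date
print(refined)  # {datetime.date(2021, 6, 1): 12}
```

Since each run replaces the partition, deploying corrected code and rerunning the affected dates also repairs previously written data.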
Data
Domain
Functional Data Systems Are Easier to Test & Deploy
Yeah! All my tests & monitors are
passing!
I did not break any code!
I can safely push to production!
A safe controlled process
Production
Data
Production Team
Data
Domain
Test
Data
Development Team
Data
Domain
Just flip the DNS entry for
the production URL!
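The "flip the DNS entry" step can be thought of as moving a pointer only after every test passes. In this sketch the pointer is a plain dictionary; `promote`, the build names, and the two checks are hypothetical, and a real deployment would update DNS or a similar routing layer instead.

```python
from typing import Callable, Dict, List

Check = Callable[[Dict[str, int]], bool]


def promote(pointer: Dict[str, str], candidate: str,
            data: Dict[str, int], checks: List[Check]) -> Dict[str, str]:
    """Return a pointer aimed at the candidate build only if every check passes."""
    if all(check(data) for check in checks):
        return {**pointer, "production": candidate}
    return pointer  # leave production untouched on any failure


# Hypothetical checks run against the candidate build's output.
checks: List[Check] = [
    lambda d: d["row_count"] > 0,
    lambda d: d["null_keys"] == 0,
]

pointer = {"production": "build-41"}
pointer = promote(pointer, "build-42", {"row_count": 1200, "null_keys": 0}, checks)
print(pointer["production"])  # build-42
```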
Agenda
Four Data Architecture Mega-Patterns for Agility
1. DataOps
2. Data Fabric
3. Data Mesh
4. Functional Data Engineering
An Example that Combines all Four Patterns
Why DataKitchen supports these four patterns easily!
Domain Layers Processing Relationships
How do we update the data?
• Each domain layer has its own update processing
• Each layer has its own toolchain (e.g., SQL, Python, Informatica)
• Each layer has a series of sub-steps (i.e., a ‘DAG’)
• Each layer wants to know if the build is completed, the tests applied & the data correct
What causes the update of each domain?
• Time / Schedule
• Order of operations: a meta-orchestrated coupling of each domain, where one part may need to be done before or after another.
• Event-orchestrated coupling: when new data arrives, kick off a change.
You Need a ‘Master DAG’ to run them all
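A ‘Master DAG’ can be sketched with Python's standard-library `graphlib`. The domain names follow the pharma example, but the dependency edges and the `run_domain` placeholder are assumptions made for illustration; this is not how DataKitchen Recipes are implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a domain; its value is the set of domains that must finish first.
# The dependency edges here are assumed for the example.
master_dag = {
    "physician_mdm": set(),
    "target_lists": {"physician_mdm"},
    "brand_team_reporting": {"physician_mdm", "target_lists"},
    "field_sales_reporting": {"brand_team_reporting"},
}


def run_domain(name: str) -> bool:
    """Placeholder for a domain's own DAG; returns True when its tests pass."""
    return True


completed = []
for domain in TopologicalSorter(master_dag).static_order():
    if not run_domain(domain):
        break  # stop before downstream domains consume bad data
    completed.append(domain)

print(completed)
```

Each domain keeps its own internal DAG and toolchain; the master level only decides order and whether it is safe to continue.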
Inter-Domain Communication Links
Field Sales
Reporting Domain
Inter-Domain Communication: Question / Steps Asked
Domain Query: “When was the last time you were updated?” Success or failure? Warnings?
Domain Query: “Are the data or artifacts in your domain good? Can you prove it with some test results?”
Process Linkage: “Ok, you start. I am done.”
Process Linkage: “Ok, you start. I am done & here are a bunch of parameters you need to keep going.”
Event Linkage: “Here is an event: e.g., processing completed, error, warnings, etc.”
Data Linkage: “We share a common table (e.g., a dimension table) in our domain.”
Development Linkage: “Can I re-create your domain in development?” “Can I see the code you used to create it?” “Can I modify that code in development?” “Is there a path to production?”
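The two Domain Query questions and the parameterized Process Linkage can be sketched as a status record plus a hand-off function. The names `DomainStatus`, `is_good`, and `hand_off`, and the sample test names, are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class DomainStatus:
    """Hypothetical answer to the two 'Domain Query' questions."""
    last_updated: Optional[datetime] = None
    succeeded: bool = False
    test_results: Dict[str, bool] = field(default_factory=dict)

    def is_good(self) -> bool:
        """Good means the last run succeeded and every recorded test passed."""
        return (self.succeeded and bool(self.test_results)
                and all(self.test_results.values()))


def hand_off(upstream: DomainStatus, params: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Process linkage: pass parameters downstream only when upstream is good."""
    return dict(params) if upstream.is_good() else None


mdm = DomainStatus(
    last_updated=datetime(2021, 6, 1, 4, 30),
    succeeded=True,
    test_results={"row_count": True, "unique_physician_id": True},
)
print(hand_off(mdm, {"run_date": "2021-06-01"}))  # {'run_date': '2021-06-01'}
```

Requiring at least one recorded test result means a domain cannot report itself as good without evidence, which matches the "can you prove it" question.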
DataKitchen Supported Inter-Domain Communication Links
Field Sales
Reporting Domain
Inter Domain Communication DataKitchen Support
Domain Query YES
Domain Query YES
Process Linkage YES
Process Linkage YES
Event Linkage YES
Data Linkage NO
Development Linkage YES
Domain Development Process
The development process is essential.
• Code changes or new data sets may affect
downstream parts of the mesh.
• DataKitchen encapsulates the development & production environments
Key Questions
• How does a developer change one part
& not break things?
• How do you allow local change to a
domain & global governance & control?
Production Domains: Mastering Domain: Physician MDM; Brand Team Reporting Domain
Development of Domains: Mastering Domain: Physician MDM; Brand Team Reporting Domain
How do I change this part & not break things?
DataKitchen Software's Role (Recipes)
DataKitchen DataOps Capability
Intelligent, test-informed, system-wide production
orchestration (meta-orchestration)
What workflow tools like Airflow, Control-M, or Azure Data Factory do not have
• Integrated Production Testing & Monitoring
• A set of connectors to the complex chain of data engineering, science, analytics, self-service, governance & database tools.
• DataKitchen Recipes Meta-Orchestration or a ‘DAG of DAGs’
Mastering Domain: Physician
MDM
Brand Team Reporting Domain
DataKitchen Recipe
DataKitchen Domain Interfaces As URLs
Data Domain
The When: DataKitchen OrderRun information
https://cloud.datakitchen.io/#/orders/dc/Production/runs/60e82aa8-2518-11eb-8653-c2e92ba8ebec
The How: DataKitchen Recipe
https://cloud.datakitchen.io/#/recipes/dc/Production/agile-analytic-ops/variations/prod-env-DevSprint-build-now
DataKitchen Ingredients Allow Composition
• DataKitchen Ingredients allow reusable components that
can be incorporated into other processing
• Each domain can change independently, with a centralized
process to make sure the entire system is correct
• While DataKitchen Kitchens lets people work
independently, Ingredients let people work dependently:
• Recipes can reuse the data or artifacts that other Recipe
Variations produce
• Recipes need to incorporate other Recipe Variations when they run
Conclusion
Data Fabric, Data Mesh, and Functional Data Engineering are exciting new paradigms
However, the DataOps part is of paramount importance!
• The lineages & composition between domains are important
• Managing central process control & governance with local domain independence is very important
DataKitchen Features (e.g., Recipes, Tests, Kitchens & Ingredients) can handle all the needs of
the DataOps part of the mesh
Accelerate These Patterns With DataKitchen Software
DataKitchen DataOps Software Platform
that delivers new business insights by
enabling the development and
deployment of innovative, high quality
data analytic pipelines. Rapidly
Learn More!
Sign The DataOps Manifesto:
http://dataopsmanifesto.org
Free DataOps Cookbook:
https://datakitchen.io/the-dataops-cookbook/
Free DataOps Transformation Book:
https://datakitchen.io/recipes-for-dataops-success-guide-to-dataops-transformation/
DataOps Data Fabric
Data Mesh
Functional
Data
Engineering

More Related Content

What's hot (20)

PDF
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
PDF
Data Governance Best Practices
DATAVERSITY
 
PDF
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 
PDF
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Tristan Baker
 
PDF
Time to Talk about Data Mesh
LibbySchulze
 
PDF
Five Things to Consider About Data Mesh and Data Governance
DATAVERSITY
 
PDF
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
DATAVERSITY
 
PDF
Future of Data Engineering
C4Media
 
PDF
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
PPTX
Capability Model_Data Governance
Steve Novak
 
PDF
DAS Slides: Data Governance - Combining Data Management with Organizational ...
DATAVERSITY
 
PDF
Mdm: why, when, how
Jean-Michel Franco
 
PDF
How to identify the correct Master Data subject areas & tooling for your MDM...
Christopher Bradley
 
PPTX
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
PDF
Data Governance Trends - A Look Backwards and Forwards
DATAVERSITY
 
PDF
8 Steps to Creating a Data Strategy
Silicon Valley Data Science
 
PDF
Data Architecture Best Practices for Advanced Analytics
DATAVERSITY
 
PDF
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
PDF
Data Quality Best Practices
DATAVERSITY
 
PPTX
Azure Synapse Analytics Overview (r2)
James Serra
 
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Data Governance Best Practices
DATAVERSITY
 
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Tristan Baker
 
Time to Talk about Data Mesh
LibbySchulze
 
Five Things to Consider About Data Mesh and Data Governance
DATAVERSITY
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
DATAVERSITY
 
Future of Data Engineering
C4Media
 
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Capability Model_Data Governance
Steve Novak
 
DAS Slides: Data Governance - Combining Data Management with Organizational ...
DATAVERSITY
 
Mdm: why, when, how
Jean-Michel Franco
 
How to identify the correct Master Data subject areas & tooling for your MDM...
Christopher Bradley
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Data Governance Trends - A Look Backwards and Forwards
DATAVERSITY
 
8 Steps to Creating a Data Strategy
Silicon Valley Data Science
 
Data Architecture Best Practices for Advanced Analytics
DATAVERSITY
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Data Quality Best Practices
DATAVERSITY
 
Azure Synapse Analytics Overview (r2)
James Serra
 

Similar to DataOps - The Foundation for Your Agile Data Architecture (20)

PDF
Data and Application Modernization in the Age of the Cloud
redmondpulver
 
PDF
Self-Service Analytics with Guard Rails
Denodo
 
PPTX
Washington DC DataOps Meetup -- Nov 2019
DataKitchen
 
PDF
Big Data Companies and Apache Software
Bob Marcus
 
PDF
Modernize your Infrastructure and Mobilize Your Data
Precisely
 
PDF
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Denodo
 
PPTX
Data Mesh using Microsoft Fabric
Nathan Bijnens
 
PPTX
[DSC DACH 24] Ship data faster with dbt - Sean McIntyre
DataScienceConferenc1
 
PPTX
DW Migration Webinar-March 2022.pptx
Databricks
 
PDF
Bridging the Gap: Analyzing Data in and Below the Cloud
Inside Analysis
 
PDF
When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
PDF
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...
Memoori
 
PPTX
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA
 
PDF
Horses for Courses: Database Roundtable
Eric Kavanagh
 
PPTX
Your Data Nerd Friends Need You!
DataKitchen
 
PPTX
Architecting for Big Data: Trends, Tips, and Deployment Options
Caserta
 
PDF
What is the future of data strategy?
Denodo
 
PDF
Why Data Virtualization? An Introduction
Denodo
 
PDF
2022 Trends in Enterprise Analytics
DATAVERSITY
 
PDF
Future of Data Strategy (ASEAN)
Denodo
 

DataOps - The Foundation for Your Agile Data Architecture

  • 1. Four Data Architecture Mega-Patterns for Agility
  • 2. Copyright 2021 by DataKitchen, Inc. All Rights Reserved. Agenda: Four Data Architecture Mega-Patterns for Agility: 1. DataOps, 2. Data Fabric, 3. Data Mesh, 4. Functional Data Engineering; An Example that Combines all Four Patterns; Conclusion and More Information
  • 3. Our Focus Is The River Of Work Right In Front Of Us • The Model, • The Algorithm, • The Data Pipeline, • The Data Visualization, • The Governance, • The Data Itself What is my next task?
  • 4. Next Task Focus Is Making Us Blind To Failure • The Model, • The Algorithm, • The Data Pipeline, • The Data Visualization, • The Governance, • The Data Itself Task Focus Not Working
  • 5. Look Upstream At The Source Of The Problem • Develop • Deploy • Iterate • Monitor • Test • Collaborate How You Do It
  • 6. How? Focus On Four Key Upstream Processes • Decrease The Cycle Time: Continuously Deploy Innovation • Lower Error Rates: Increasing Customer Data Trust • Improve Collaboration: Fewer Meetings & Less Bureaucracy • Measure Your Team: And show everyone your success
  • 7. DataOps Aligns People, Processes, and Technology • Rapid experimentation and innovation enable faster delivery • Low error rates • Collaboration across complex sets of people, technology, and environments • Clear measurement and monitoring of results
  • 8. Agenda: What Problems Do We Need To Solve With Architecture for AI and Data Analytics?; Four Data Architecture Mega-Patterns for Agility: 1. DataOps, 2. Data Fabric, 3. Data Mesh, 4. Functional Data Engineering; An Example that Combines all Four Patterns; Conclusion and More Information
  • 10. Gartner Data Fabric: “Data fabric focuses on composability, allowing users to build a flexible, agile, scalable architecture that will be able to supply data to humans or machine users. Data fabric is a design concept, not just a set of technology components.”
  • 11. Data Fabric Toolchain Elements Store: Transform: SQL Code, ETL Govern: Catalog
  • 12. Data Fabric Toolchain Elements Store: Transform: SQL Code, ETL Virtualize: layer Govern: Catalog Includes Data Virtualization in Reference Fabric Design Includes Data Streaming in Reference Fabric Design
  • 13. Data Fabric: Beware Magic of ‘AI Inside’ Store: Transform: SQL Code, ETL Virtualize: layer Govern: Catalog AI AI AI AI Magic AI: Danger Will Robinson
  • 14. Data Fabric: Beware Magic of ‘AI Inside’ Think of ‘AI Inside’ of Data Fabric like autonomous driving: • Level 1: Simple, keep your hands on wheel • Level 5: Cross Boston, in the snow, at night We are at Level 1 of AI in the Data Fabric
  • 15. AI + New Tools Agility
  • 16. People & Tools in a DataOps Architecture Agility AI + New Tools
  • 17. Canonical ‘Factory’ Data Architecture / Fabric
  • 18. DataOps Functional Architecture (diagram labels): Cloud/On-Prem Production Environment; Test; Dev; Source Data; Data Customers; Raw Lake; Data Engineering; Refined Data; Data Science; Data Viz.; Data Governance; Orchestrate, Monitor, Test. DataOps Platform: Storage & Version Control; History & Metadata; Auth & Permissions; Environment Secrets; DataOps Metrics & Reports; Automated Deployment; Environment Creation and Management. DataOps Team; Second Cloud/On-Prem Data Center
  • 19. DataOps Physical Architecture (diagram labels): Cloud/On-Prem Data Center; Production Environment; Test; Dev; Source Data; Data Customers; an Agent in each environment. DataOps Platform: Storage; Metadata; Auth; Secrets; Metrics. Raw Lake; Data Engineering; Refined Data; Data Science; Data Viz.; Data Governance; Second Cloud/On-Prem Data Center with its own Agent; DataOps Team
  • 20. DataOps Spans Environments (diagram labels): Cloud/On-Prem #1: Production Environment, Test, Dev, Agents, DataOps Team, DataOps Pipeline. Cloud/On-Prem #2: Production Environment, Dev, Agents, DataOps Pipeline. DataOps Platform: Storage & Version Control; History & Metadata; Auth & Permissions; Environment Secrets; DataOps Metrics & Reports
  • 21. Data Fabric – A New Fashion Trend? • It's Hot Stuff: Gartner View, Forrester View. Top 10 downloaded report 2020, top inquiry • What is a data fabric?: • All the stuff you do with centralized data infrastructure: ETL, DB, governance, store, lake, warehouse, stream/batch transformation. • Plus, some fancy new stuff 1. AI component - magic pixie dust of self-driving data 2. Data virtualization/semantic layer • However, it is missing other parts of the data value chain: models, visualizations, self service. It’s more ‘hub’ than ‘spoke’ • Why? Moniker that covers the latest trends in data management. • Caveat: The goal of implementing a data fabric is agility - agility is a second-order effect from better tools. The primary driver is people & process following DataOps.
  • 22. Agenda: Four Data Architecture Mega-Patterns for Agility: 1. DataOps, 2. Data Fabric, 3. Data Mesh, 4. Functional Data Engineering; An Example that Combines all Four Patterns; Conclusion and More Information
  • 23. Data Mesh 101. Why Data Mesh? • Centralized Systems Fail • Skill-based roles are unable to respond to rapid customer needs • Data domain knowledge matters • Universal, one-size-fits-all patterns fail • General Data Analytic Project Failure • Inspired by domain-driven design (DDD) in software. The main idea is to take a best practice from developing software & apply it to data analytics. (Sound familiar?)
  • 24. The Human Side of a Data Mesh: Main Idea • The organization structure builds walls & barriers to change • When you make a change, you need to update each component & coordinate between several different teams. The organization creates walls & changes need to cross the traditional organizational boundaries
  • 25. No, Data Engineers Are Not Perfectly Fungible. Data Mesh = Organization Mesh. The use of domain-driven / data mesh design as the primary means: 1. Assignment of full end-to-end ownership of a domain to one cross-functional team that gets the necessary support to fulfil that responsibility. 2. Structure data 3. Build composable systems. Data Organization Keys: Letting the small team continually own the data set & not move from project to project is key. ‘You own the product’ thinking provides the right incentives between the producers & consumers. Source: thoughtworks.com/insights/blog/data-mesh-its-not-about-tech-its-about-ownership-and-communication
  • 26. The Human Side of a Data Mesh: Main Idea • Take the ideas of microservices, where a team (5-9 people) owns the dev, test, deploy & running of the microservice • Organize around the domain, not the technology • The Operational & Data products are created by the same team • Domain data as a product - domain data teams must consider their data assets & artifacts as their products & others as their customers • Data Engineers must live, work & understand a finite number of data sets to really add value. The organization creates walls & changes need to cross the traditional organizational boundaries
  • 27. What Data is in a Domain? Domains Aligned with Sources / Types of Data: • ‘Mastered’ Data: • Entities of business / subject areas • Customers, products, etc. • ‘Sources’ of Data: • Business reality: facts on the ground • Weblogs, user interaction history. Domains Aligned with Consumption of Data: • Integrated Data / Ready for Consumption • Facts / Dimensions / Star Schemas • Aggregated Views • Product View • Never Done, Always Improving • Customer Usage Focus
  • 28. What are the Domain’s Components? 1. Data 2. Artifacts created from that data: models, views, reports, dashboards, etc. 3. Code that acts upon that data: pipelines, toolchains, etc. 4. Team used to create/update/run that Domain 5. Metadata: catalogs, lineage, test results, processing history, etc. Data Domain 1
  • 29. Domain Must Be Composable & Controllable Data Domain 1 Data Domain 2 Data Domain 3
  • 30. Domain Interfaces (Data Domain): The Where: How to find & access data securely; e.g., DB connect string. The What: Description of the data; e.g., data catalog URL. The When: Processing Results, Timing, Test Results, Status, etc. The How: Steps, Code/Config, toolchain & processing pipeline. The With: Raw Data (or other Data Domain), hopefully immutable
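Slide 30's five interfaces can be pictured as one small record per domain. The sketch below is only an illustration: the class name `DomainInterface`, its field names, the `missing` helper, and the `example.invalid` locations are hypothetical and are not part of any DataKitchen API. Only the JDBC string is taken from slide 31.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DomainInterface:
    """One field per interface named on slide 30 (field names are hypothetical)."""

    where: str  # how to find and access the data securely, e.g. a DB connect string
    what: str   # description of the data, e.g. a data catalog URL
    when: str   # processing results: timing, test results, status
    how: str    # steps, code/config, toolchain and processing pipeline
    with_: str  # upstream raw data or another data domain, ideally immutable

    def missing(self) -> list[str]:
        """Return the names of the interfaces that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical domain with one interface not yet published.
physician = DomainInterface(
    where="jdbc:redshift://endpoint:port/database",
    what="https://catalog.example.invalid/physician",
    when="",
    how="https://orchestrator.example.invalid/pipelines/physician",
    with_="s3://example-raw-bucket/physician/",
)

print(physician.missing())  # → ['when']
```

A check like `missing()` is one way a central team could verify that every domain publishes all five interfaces without dictating how each domain is built.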
  • 31. Domain Interfaces as URLs (Data Domain): https://cloud.datakitchen.io/#/recipes/dc/Production/agile-analytic-ops/variations/prod-env-DevSprint-build-now https://cloud.datakitchen.io/#/orders/dc/Production/runs/60e82aa8-2518-11eb-8653-c2e92ba8ebec jdbc:redshift://endpoint:port/database https://dkimplementation.atlassian.net/wiki/spaces/DC/pages/9306114/Dimension+Tables
  • 32. What Do You Want Out of a Domain? A series of independent domains of data that are: 1. Trusted 2. Usable by the team’s customers 3. Discoverable / Findable 4. Understandable & well-described 5. Secure & permissioned 6. URL/API driven & able to inter-operate with other domains 7. Have a ‘single throat to choke’ so the customer can easily: • Report problems & get updates on fixes • Ask for new insights / improvements & get them into production quickly
  • 33. Data Mesh Change in Focus 1. Domains & the grouping of your work into small teams & partitions over ‘one platform to rule them all’ 2. What services you are providing your customer, rather than what data you are loading 3. Discovering & using over extracting & loading 4. Decentralization & the freedom to innovate over central control 5. Ecosystem of data products linked together over a centralized lake / warehouse
  • 34. An Example of Domains: US Commercial Pharma Domains • NPP (Non-Personal Promotion): emails, web site visits, even radio ads • Physician: doctor (& other outlets) sales, claims data, anonymized patient data • Payer: Payer/Plan, rebates, formulary. Launch: NPP Domain; Growth: Physician Domain; Mature: Payer Domain. Commercial Pharma Analytics
  • 35. What About the Data? What about the data in each domain? • Each domain has separate data sources • Overlapping entities (e.g., physicians) exist in each domain • Each domain has different cycle times of product (e.g., daily, weekly, hourly) • Each data domain has its unique characteristics. • For instance, subnational physician data from IQVIA - purchased by pharma companies - may not 1:1 match claims data, which may not match payer data. This is due to data supplier issues & timing projection algorithms. Data sources shown: Sub-national Weekly Data; Sub-national Payer Data; Sub-national Institutional (DDD) Data; National Prescription Audit Data; Sales Force Alignment Data; Longitudinal Patient Data; Sub-national Profit and Loss Data; Sub-national Claims and Co-pay Data; Payer and Plan Formulary Data; Census Data; Stocking Data; Source of Business; AMA Data; Retail OTC Data; Buy and Bill Data; Field Calls and Promotional Activity Data; Rep Expenses and Vacancy Data; Hotline Verification Data; Contract and Payer Rebates Data; Veeva CRM Data; ERP Data; NPP Data; Forecast Data; Primary Research Data
  • 36. Pharma Sales & Marketing Teams. NPP Domain Marketing & Sales Team: One part of the pharma brand team focused on ads, digital & other non-personal promotions. This team matters most pre-launch & during the growth phase of a product. Physician Domain Marketing & Sales Team: Another part of the pharma team focused on in-person sales. Those are the good-looking people you see in doctors' waiting rooms. Sales calls, samples, doctor visits, messages, call alignments, etc. This team matters the most during the first years of a pharma launch. Payer Domain Marketing & Sales Team: A third part is focused on Payer Marketing. This part is - in essence - controlling the price of a pharmaceutical product through the rebate given to any payer. They are concerned about the rebate contract, being on formulary, tier & copays. Payer Marketing matters more during the 'mature' phase of a pharma product lifecycle.
  • 37. Domain Layers 1. Mastering & small foundation files are a domain layer. There are 1M physicians in the US, but the company master of physicians is only 40K. This work is done by separate teams working independently. 2. Of course, the main data warehouse is a domain layer. There are facts & dimensions, along with multiple tables used for specific analyst needs. 3. Self-Service & Data Science are domain layers. They can keep their own cached data sets (e.g., a Tableau extract) or have their own small data sets that they mix with the central data in Alteryx (or other) tools. Data Science teams have their own segmentation models dependent on specific views or extracts of data. Layers: Mastered Data Sets (IT); Integrated Data Sets (Data Engineers); Self Service Tools (Analyst)
  • 38. Domain Layers (diagram labels). Raw, Sourced Data (Various): Sub-national Weekly Data; Sub-national Payer Data; Sub-national Institutional (DDD) Data; National Prescription Audit Data; Sales Force Alignment Data; Longitudinal Patient Data; Sub-national Profit and Loss Data; Sub-national Claims and Co-pay Data; Payer and Plan Formulary Data; Census Data; Stocking Data; Source of Business; AMA Data; Retail OTC Data; Buy and Bill Data; Field Calls and Promotional Activity Data; Rep Expenses and Vacancy Data; Hotline Verification Data; Contract and Payer Rebates Data; Veeva CRM Data; ERP Data; NPP Data; Forecast Data; Primary Research Data. Domains: Mastering Domain: Physician MDM; Mastering Domain: Target Lists, Product Market Baskets; Brand Team Reporting Domain; Field Sales Reporting Domain. Layers: Raw, Sourced Data (Various); Mastered Data Sets (IT); Integrated Data Sets (Data Engineers); Self Service Tools (Analyst); Business Customer
  • 39. Domain Layers Processing Relationships (same diagram and source data sets as slide 38). Domains: Mastering Domain: Physician MDM; Mastering Domain: Target Lists, Product Market Baskets; Brand Team Reporting Domain; Field Sales Reporting Domain. Layers: Raw, Sourced Data (Various); Mastered Data Sets (IT); Integrated Data Sets (Data Engineers); Self Service Tools (Analyst); Business Customer
  • 40. Domain Layers Processing Steps (same diagram and source data sets as slide 38). Layers: Raw, Sourced Data (Various); Mastered Data Sets (IT); Integrated Data Sets (Data Engineers); Self Service Tools (Analyst); Business Customer. Domains: Mastering Domain: Physician MDM; Brand Team Reporting Domain
  • 41. Benefits of Approach • Yes, you can do all four of these Data Architecture Mega-Patterns for Agility! • Benefits • Support over $10 Billion in sales • Integrated 100s of data sets • Very, very few errors or missed SLAs • > 50,000 automated tests • > 100 schema/data changes per week • Staff of seven data and DataOps engineers • Low total yearly costs for hardware/hosting/software/staffing • DataKitchen software enables those four patterns: Recipes, Tests, Kitchens and especially Ingredients can handle all the needs
  • 42. Agenda: Four Data Architecture Mega-Patterns for Agility: 1. DataOps, 2. Data Fabric, 3. Data Mesh, 4. Functional Data Engineering; An Example that Combines all Four Patterns; Conclusion and More Information
  • 43. Built With Functional Programming • Start with immutable (never changing) data • Pure functions (you put some data in & get some data out) • Idempotency (you can run it over again & get the same thing) • No side effects
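The four properties on slide 43 can be shown in a few lines of Python. This is a minimal sketch with made-up sample rows; the names `RAW_SALES` and `with_share` are hypothetical and stand in for whatever raw data and transformation a real pipeline would use.

```python
from types import MappingProxyType

# Immutable input: a tuple of read-only mappings (hypothetical sample rows).
RAW_SALES = (
    MappingProxyType({"region": "east", "units": 30}),
    MappingProxyType({"region": "west", "units": 10}),
)


def with_share(rows):
    """Pure function: builds new rows from its argument and touches nothing else."""
    total = sum(row["units"] for row in rows)
    return tuple({**row, "share": row["units"] / total} for row in rows)


first = with_share(RAW_SALES)
second = with_share(RAW_SALES)  # running it again gives an equal result

print(first == second)          # → True
print(first[0]["share"])        # → 0.75
print("share" in RAW_SALES[0])  # → False (the raw data was not modified)
```

Because the raw rows are read-only, an accidental in-place update such as `RAW_SALES[0]["units"] = 0` raises `TypeError` instead of silently corrupting the source.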
  • 44. Functional Approach Benefits. Reproducibility • Foundational to the scientific method and data science / AI • Critical from a legal standpoint and sanity standpoint. Complexity Reduction. Cloud Native • Storage and compute are cheap. Faster Time To Value
  • 45. Functional Data Mesh Systems. Think of all your data & analytic work as a “Big Function” in a domain • In that function are your data & AI toolchain • Everybody works that function (whether they know it or not!) • Re-running a task for the same date should always produce the same output • Data can be repaired by rerunning the new code • A ‘big red/green light’ on the system telling you everything is OK. Diagram labels: Production Data; Data Domain; Production Team (“Yeah! All my tests & monitors are passing!”); Analytic Customers (“Happy Customers!”)
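The rule on slide 45 that re-running a task for the same date always produces the same output is usually achieved by overwriting a whole date partition rather than appending to it. The sketch below assumes an in-memory dict as the "warehouse" and invented sample rows; `build_partition` and `run` are hypothetical names.

```python
from datetime import date


def build_partition(raw_rows, run_date):
    """Derive one date's output purely from the raw rows (no hidden state)."""
    totals = {}
    for row in raw_rows:
        if row["day"] == run_date:
            totals[row["region"]] = totals.get(row["region"], 0) + row["units"]
    return totals


def run(warehouse, raw_rows, run_date):
    """Overwrite, never append to, the partition for run_date so reruns are safe."""
    warehouse[run_date] = build_partition(raw_rows, run_date)
    return warehouse


# Hypothetical raw data; in a real system this would be the immutable raw layer.
raw = [
    {"day": date(2021, 6, 1), "region": "east", "units": 5},
    {"day": date(2021, 6, 1), "region": "east", "units": 7},
    {"day": date(2021, 6, 1), "region": "west", "units": 3},
    {"day": date(2021, 6, 2), "region": "east", "units": 1},
]

warehouse = {}
run(warehouse, raw, date(2021, 6, 1))
snapshot = dict(warehouse)
run(warehouse, raw, date(2021, 6, 1))  # rerun for the same date

print(warehouse == snapshot)           # → True
print(warehouse[date(2021, 6, 1)])     # → {'east': 12, 'west': 3}
```

The same mechanism covers the repair case: if `build_partition` had a bug, deploying the fixed code and calling `run` again for the affected dates replaces the bad partitions without double counting.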
  • 46. Functional Data Systems Are Easier to Test & Deploy. A safe controlled process: the Development Team works in a Data Domain on Test Data (“Yeah! All my tests & monitors are passing! I did not break any code! I can safely push to production!”), then the Production Team's Data Domain on Production Data takes over. Just flip the DNS entry for the production URL!
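The "flip the DNS entry" step on slide 46 amounts to moving a single production pointer once every test passes. The following sketch models that pointer as a dict; the `promote` function, the check names, and the `example.invalid` URLs are hypothetical, and a real deployment would change a DNS record or load-balancer target instead.

```python
def promote(alias, candidate_url, checks):
    """Return (new_alias, failures); the alias only moves if every check passes."""
    failures = [name for name, check in checks.items() if not check(candidate_url)]
    if failures:
        return alias, failures
    return {**alias, "production": candidate_url}, []


# Hypothetical current production pointer and a freshly built candidate domain.
alias = {"production": "https://v1.domain.example.invalid"}
passing_checks = {
    "row_count_ok": lambda url: True,
    "schema_unchanged": lambda url: True,
}

alias, failures = promote(alias, "https://v2.domain.example.invalid", passing_checks)
print(alias["production"])  # → https://v2.domain.example.invalid
print(failures)             # → []
```

Since the candidate is rebuilt from immutable raw data by the same code, rolling back is the same operation with the previous URL.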
  • 47. Agenda: Four Data Architecture Mega-Patterns for Agility: 1. DataOps, 2. Data Fabric, 3. Data Mesh, 4. Functional Data Engineering; An Example that Combines all Four Patterns; Why DataKitchen supports these four patterns easily!
  • 48. Domain Layers Processing Relationships. How do we update the data? • Each domain layer has its own update processing • Each layer has its own toolchain (e.g., SQL, Python, Informatica, etc.) • Each layer has a series of sub-steps (i.e., a ‘DAG’) • Each layer wants to know if the build is completed, the tests applied & if the data is correct. What causes the update of each domain? • Time / Schedule • Order of operations: a meta-orchestrated coupling of each Domain; one part may need to be done before or after another. • Event-orchestrated coupling: when new data arrives, kick off a change. You Need a ‘Master DAG’ to run them all
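The ‘Master DAG’ on slide 48 can be sketched with Python's standard `graphlib` module (Python 3.9+). The domain names come from slide 38, but the dependency edges below are an assumption read off the layer diagram, and `run_master_dag` and `run_domain` are hypothetical names; each domain would still run its own internal DAG with its own toolchain.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each key runs only after everything in its set has run.
DEPENDENCIES = {
    "physician_mdm": {"raw_sources"},
    "target_lists": {"raw_sources"},
    "brand_team_reporting": {"physician_mdm", "target_lists"},
    "field_sales_reporting": {"physician_mdm"},
}


def run_master_dag(dependencies, run_domain):
    """Run every domain once, in an order that respects the dependencies."""
    completed = []
    for domain in TopologicalSorter(dependencies).static_order():
        run_domain(domain)  # in practice: trigger that domain's own DAG
        completed.append(domain)
    return completed


order = run_master_dag(DEPENDENCIES, run_domain=lambda name: None)
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` if two domains depend on each other, which is a useful early warning that the mesh has a circular coupling.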
  • 49. Inter-Domain Communication Links (Field Sales Reporting Domain). Inter-Domain Communication: Question / Steps Asked. Domain Query: “When was the last time you were updated? Successful or failure? Warnings?” Domain Query: “Is the data or artifacts in your domain good? Can you prove it with some test results?” Process Linkage: “Ok, you start. I am done.” Process Linkage: “Ok, you start. I am done & here are a bunch of parameters you need to keep going.” Event Linkage: “Here is an event: e.g., processing completed, error, warnings, etc.” Data Linkage: “We share a common table (e.g., a dimension table) in our domain.” Development Linkage: “Can I re-create your domain in development?” “Can I see the code you used to create it?” “Can I modify that code in development?” “Is there a path to production?”
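Two of the links on slide 49, the domain query and the process linkage with parameters, can be sketched as a status record plus a handoff function. Everything here is hypothetical: `DomainStatus`, its fields, `handoff`, and the sample values are illustrations and do not describe a DataKitchen interface.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DomainStatus:
    """Answer to the two domain queries: when were you updated, and are you good?"""

    name: str
    last_updated: datetime
    succeeded: bool
    tests_passed: int
    tests_failed: int
    warnings: list = field(default_factory=list)

    def is_good(self):
        # "Can you prove it with some test results?"
        return self.succeeded and self.tests_failed == 0


def handoff(upstream, parameters):
    """Process linkage: say 'you start' only if the upstream domain is good."""
    if not upstream.is_good():
        return {"start": False, "reason": f"{upstream.name} is not in a good state"}
    return {"start": True, "parameters": parameters, "as_of": upstream.last_updated}


mdm = DomainStatus(
    name="physician_mdm",
    last_updated=datetime(2021, 6, 1, 4, 30),
    succeeded=True,
    tests_passed=120,
    tests_failed=0,
)

message = handoff(mdm, {"run_date": "2021-06-01"})
print(message["start"])       # → True
print(message["parameters"])  # → {'run_date': '2021-06-01'}
```

Carrying the test counts in the status record is what lets a downstream domain refuse to start on data that has not been proven good.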
  • 50. DataKitchen Supported Inter-Domain Communication Links (Field Sales Reporting Domain). Inter-Domain Communication: DataKitchen Support. Domain Query: YES; Domain Query: YES; Process Linkage: YES; Process Linkage: YES; Event Linkage: YES; Data Linkage: NO; Development Linkage: YES
  • 51. Domain Development Process. The development process is essential. • Code changes or new data sets may affect downstream parts of the mesh. • DataKitchen encapsulates the development & production environments. Key Questions • How does a developer change one part & not break things? • How do you allow local change to a domain along with global governance & control? Production Domains: Mastering Domain: Physician MDM; Brand Team Reporting Domain. Development of Domains: Mastering Domain: Physician MDM; Brand Team Reporting Domain. How do I change this part & not break things?
  • 52. DataKitchen Software's Role (Recipes). DataKitchen DataOps Capability: Intelligent, test-informed, system-wide production orchestration (meta-orchestration). What workflow tools like Airflow, Control-M, or Azure Data Factory do not have: • Integrated Production Testing & Monitoring • A set of connectors to the complex chain of data engineering, science, analytics, self-service, governance & database tools. • DataKitchen Recipes: Meta-Orchestration or a ‘DAG of DAGs’. Diagram labels: Mastering Domain: Physician MDM; Brand Team Reporting Domain; DataKitchen Recipe
  • 53. DataKitchen Domain Interfaces As URLs (Data Domain). The How: DataKitchen Recipe: https://cloud.datakitchen.io/#/recipes/dc/Production/agile-analytic-ops/variations/prod-env-DevSprint-build-now The When: DataKitchen OrderRun information: https://cloud.datakitchen.io/#/orders/dc/Production/runs/60e82aa8-2518-11eb-8653-c2e92ba8ebec
  • 54. DataKitchen Ingredients Allow Composition. • DataKitchen Ingredients are reusable components that can be incorporated into other processing. • Each domain can change independently, with a centralized process to make sure the entire system is correct. • While DataKitchen Kitchens let people work independently, Ingredients let people work dependently: • Recipes can reuse the data or artifacts that other Recipe Variations produce. • Recipes need to incorporate other Recipe Variations when they run.
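The composition described on this slide, where one recipe reuses what another recipe variation produces, can be sketched generically in Python. Everything here is hypothetical and for illustration only: the `Ingredient` class, the function names, and the sample records are not DataKitchen APIs or data. The sketch assumes a mastering domain that deduplicates physicians and a reporting recipe that consumes its output.

```python
class Ingredient:
    """Hypothetical wrapper that exposes a recipe variation for reuse."""

    def __init__(self, name, produce):
        self.name = name
        self._produce = produce
        self.runs = 0  # how many times consumers have invoked it

    def run(self, **params):
        self.runs += 1
        return self._produce(**params)


def master_physicians(raw):
    """Mastering step: keep the first record seen for each physician id."""
    seen = {}
    for row in raw:
        seen.setdefault(row["id"], row)
    return list(seen.values())


# The mastering domain publishes its recipe variation as an ingredient.
physician_mdm = Ingredient("physician_mdm/prod", master_physicians)


def brand_team_report(raw_physicians, sales):
    """Reporting recipe: reuses the mastering ingredient, then totals units."""
    physicians = physician_mdm.run(raw=raw_physicians)
    names = {p["id"]: p["name"] for p in physicians}
    totals = {}
    for sale in sales:
        name = names[sale["physician_id"]]
        totals[name] = totals.get(name, 0) + sale["units"]
    return totals
```

Because the reporting recipe only calls `physician_mdm.run`, the mastering logic can change inside its own domain without the reporting code being edited, which is the independence-with-dependency trade-off the slide describes.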
  • 55. Conclusion. Data Fabric, Data Mesh, and Functional Data Engineering are exciting new paradigms. However, the DataOps part is of paramount importance! • The lineages & composition between domains are important. • Managing central process control & governance with local domain independence is very important. DataKitchen Features (e.g., Recipes, Tests, Kitchens & Ingredients) can handle all the needs of the DataOps part of the mesh.
  • 56. Accelerate These Patterns With DataKitchen Software. The DataKitchen DataOps Software Platform rapidly delivers new business insights by enabling the development and deployment of innovative, high-quality data analytic pipelines. (Diagram: DataOps, Data Fabric, Data Mesh, Functional Data Engineering)
  • 57. Learn More! Sign The DataOps Manifesto: http://dataopsmanifesto.org Free DataOps Cookbook: https://datakitchen.io/the-dataops-cookbook/ Free DataOps Transformation Book: https://datakitchen.io/recipes-for-dataops-success-guide-to-dataops-transformation/