Data Quality
Automatic enforcement in real-time with machine learning.
Max Martynov, VP of Technology
Introducing Grid Dynamics technology services
Digital transformation
Big data, real-time analytics, ML & AI
Microservices replatforming
DevOps & cloud enablement
Open Source, Cloud-ready, Scalable, Automated
12 years of experience in digital transformation.
Dynamic Talks: "Implementing data quality automation with open source stack" -Max Martynov
Retailer X, Daily Sales – Executive Summary
[Chart: Revenue, $M and Gross Profit, $M per day, 7/1/19 to 7/14/19; y-axis 0 to 20; weekends marked.]
[Diagram, built up over several slides: data sources (DB, File, MQ, Cloud, App, API) and the EDW, Data Lake, and Cloud platforms they connect to.]
1. Trust is hard to build and easy to lose.
2. Distrust in data slows down decisions.
3. Slow decisions prevent agility.
Data corruption reasons
1. Code
2. Data Sources
3. Infrastructure
Traditional approach to testing
Test environment: the test suite runs the ETL code on the test input, then compares the actual output with the expected output.
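The traditional approach on this slide can be sketched as a fixture-based test. This is a minimal illustration only: the `etl` function, the field names, and the sample records are hypothetical and are not taken from the talk.

```python
def etl(rows):
    """Hypothetical ETL step: drop rows without an id and compute revenue."""
    out = []
    for row in rows:
        if row.get("id") is None:
            continue  # skip records that cannot be keyed
        out.append({"id": row["id"], "revenue": row["price"] * row["qty"]})
    return out


def run_test(test_input, expected):
    """Run the ETL code on fixed input and compare actual vs. expected."""
    actual = etl(test_input)
    return actual == expected, actual


# Fixed input and expected output live in the test suite (made-up values).
TEST_INPUT = [
    {"id": 1, "price": 2.5, "qty": 4},
    {"id": None, "price": 9.0, "qty": 1},
    {"id": 2, "price": 1.0, "qty": 3},
]
EXPECTED = [{"id": 1, "revenue": 10.0}, {"id": 2, "revenue": 3.0}]

passed, actual = run_test(TEST_INPUT, EXPECTED)
print("PASS" if passed else f"FAIL: {actual}")
```

A test like this exercises the code against known input in a test environment; it does not observe what data sources or infrastructure deliver in production.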
Data quality goals
1. Detect data corruption.
2. Prevent it from spreading.
3. Alert support team.
Real-time data quality enforcement
Production data lake
Sources: DB, File, MQ, App, API
A data quality job runs alongside each data processing job.
On a failed check: alert & stop pipeline, or alert & continue.
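The two outcomes named on the slide ("alert & stop pipeline" and "alert & continue") can be sketched as a small gate that runs after a data quality job. All names here (`Action`, `CheckResult`, `enforce`, `PipelineStopped`, and the check names) are assumptions made for illustration, not part of any tool shown in the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STOP = "alert & stop pipeline"
    CONTINUE = "alert & continue"


@dataclass
class CheckResult:
    name: str
    passed: bool
    on_failure: Action  # what to do when this check fails


class PipelineStopped(Exception):
    """Raised when a failed check is configured to stop the pipeline."""


def enforce(results, alert):
    """Alert on every failed check; stop if any failure demands it."""
    failed = [r for r in results if not r.passed]
    for r in failed:
        alert(f"{r.name}: {r.on_failure.value}")
    blocking = [r.name for r in failed if r.on_failure is Action.STOP]
    if blocking:
        raise PipelineStopped(", ".join(blocking))
    return True


alerts = []  # stands in for a real alerting channel

# A non-critical failure: alert the support team but let the pipeline continue.
ok = enforce(
    [CheckResult("row_count", True, Action.STOP),
     CheckResult("null_ratio", False, Action.CONTINUE)],
    alerts.append,
)

# A critical failure: alert and stop so corruption does not spread.
try:
    enforce([CheckResult("schema", False, Action.STOP)], alerts.append)
    stopped = False
except PipelineStopped:
    stopped = True
```

Which checks stop the pipeline and which only alert is a per-check policy decision; the sketch simply makes that choice explicit.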
Main data processing pipeline
Diagram: Data source, data, Data Lake
Checks applied to the data:
1. Compare with SoR (system of record)
2. Validate business rules
3. Data profiling and anomaly detection
Output: data + confidence
1. Control divergence from SoR
Diagram: Data source, imported dataset, Data Lake
Compare data in SoR and data lake
1. Validate correctness of import.
2. Prevent stale data.
3. Prevent corruption accumulation in stream processing use cases.
4. Check data before it gets in the lake.
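One way to compare data in the SoR and the data lake is to fingerprint each record on both sides and diff the results by key. The sketch below is an in-memory illustration under assumed names (`order_id`, `amount`, `compare_with_sor`); a production check would more likely run inside the processing engine and compare counts or aggregates.

```python
import hashlib


def row_fingerprint(row, columns):
    """Stable hash of the selected columns of one record."""
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_with_sor(sor_rows, lake_rows, key, columns):
    """Report keys missing from the lake, extra in the lake, or with different content."""
    sor = {r[key]: row_fingerprint(r, columns) for r in sor_rows}
    lake = {r[key]: row_fingerprint(r, columns) for r in lake_rows}
    return {
        "missing_in_lake": sorted(sor.keys() - lake.keys()),
        "extra_in_lake": sorted(lake.keys() - sor.keys()),
        "mismatched": sorted(k for k in sor.keys() & lake.keys() if sor[k] != lake[k]),
    }


# Made-up records for illustration.
sor_rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 20.0},
    {"order_id": 3, "amount": 30.0},
]
lake_rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 25.0},  # diverged from the SoR
    {"order_id": 4, "amount": 40.0},  # not present in the SoR
]

report = compare_with_sor(
    sor_rows, lake_rows, key="order_id", columns=["order_id", "amount"]
)
print(report)
```

An empty report in all three lists would indicate that the import matches the SoR for the compared columns.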
2. Validate business rules
Diagram: Data Lake, Dataset
Check for nulls and data ranges
1. Enforce schema.
2. Check for nulls.
3. Validate data ranges.
4. Specify and enforce data invariants.
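The four rule types above can be sketched as a single record validator. The schema, the ranges, and the invariant (gross profit never exceeds revenue) are assumptions chosen to echo the daily sales example; the values are illustrative, not data from the talk.

```python
# Hypothetical schema and bounds for a daily sales record (amounts in $M).
SCHEMA = {"date": str, "revenue": float, "gross_profit": float}
RANGES = {"revenue": (0.0, 100.0), "gross_profit": (-100.0, 100.0)}


def validate(row):
    """Return a list of rule violations for one record (empty if clean)."""
    errors = []
    # 1-2. Enforce schema and check for nulls.
    for col, expected_type in SCHEMA.items():
        if col not in row:
            errors.append(f"{col}: missing")
        elif row[col] is None:
            errors.append(f"{col}: null")
        elif not isinstance(row[col], expected_type):
            errors.append(f"{col}: expected {expected_type.__name__}")
    if errors:
        return errors  # range and invariant checks need well-typed values
    # 3. Validate data ranges.
    for col, (low, high) in RANGES.items():
        if not low <= row[col] <= high:
            errors.append(f"{col}: out of range")
    # 4. Enforce a data invariant (assumed for this example).
    if row["gross_profit"] > row["revenue"]:
        errors.append("invariant: gross_profit <= revenue")
    return errors


good = {"date": "2019-07-01", "revenue": 9.8, "gross_profit": 4.1}
bad_null = {"date": "2019-07-02", "revenue": None, "gross_profit": 3.9}
bad_range = {"date": "2019-07-03", "revenue": 250.0, "gross_profit": 4.0}
bad_invariant = {"date": "2019-07-04", "revenue": 9.4, "gross_profit": 12.0}

for record in (good, bad_null, bad_range, bad_invariant):
    print(record["date"], validate(record))
```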
3. Anomaly detection
Diagram: Data Lake, Dataset
Data profiling and anomaly detection
1. Fully automatic data quality enforcement.
2. Collect data profile, metrics and statistics.
3. Train ML models.
4. Find anomalies in data.
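The profile-then-detect idea can be sketched with plain statistics. Note the substitution: a z-score threshold stands in for the trained ML models the slide mentions, and the metric names, the threshold of 3.0, and the sample history are all assumptions for illustration.

```python
from statistics import mean, stdev


def profile(rows, column):
    """Collect a small data profile for one column of a batch."""
    values = [r.get(column) for r in rows]
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_ratio": (len(values) - len(present)) / len(values) if values else 0.0,
        "distinct_ratio": len(set(present)) / len(present) if present else 0.0,
    }


def is_anomaly(history, value, threshold=3.0):
    """Flag value if it is more than `threshold` standard deviations from the history mean."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Profile of one made-up batch.
batch = [{"qty": 1}, {"qty": None}, {"qty": 1}, {"qty": 3}]
batch_profile = profile(batch, "qty")

# Hypothetical history of one profile metric (daily row counts of a dataset).
history = [1000, 1020, 990, 1010, 1005, 995, 1015]
normal_day = is_anomaly(history, 1008)
broken_day = is_anomaly(history, 400)
print(batch_profile, normal_day, broken_day)
```

Each profile metric (counts, null ratio, distinct ratio) yields its own time series, so the same detector can be applied per metric without hand-written rules.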
Demo setup
Diagram: Catalog, Inventory, Orders; Data Lake; Data Quality; Data Profile; Reporting & Alerting
Live demo
Anomaly detection example
Anomaly detection example: zooming in
Capabilities for enterprise data quality and governance
• Enables widespread adoption.
• Enforces enterprise-level controls and data usage policies.
• Increases consistency and confidence in decision making.
• Decreases the risk of regulatory fines.
• Improves data security.
• Facilitates accountability for information quality.
• Minimizes or eliminates duplication of effort.
Data Governance Platform
Metadata Management
Full-text Search
Data Quality Status
Schema / Summary
Data Profiling
Mapping to Glossary
Change Log
Dependency Detection
Consumers Flow Visualization
Glossary Portal
Knowledge Base
Fields Fingerprinting
Data Catalog
Dataset Profile
Lineage Dashboard
Data Glossary
Data Quality
Access and Security
Business Rules
Anomaly Detection
Alerting
Access Rules
Compliance Policies
Policy Engine
www.griddynamics.com
Thank you!
Demo screenshots
Dataproc cluster
Airflow pipeline
Griffin measures
Anomaly detection: normal data
Anomaly detection: anomaly
Anomaly detection: return to normal
Anomaly detection: historical view
Anomaly detection (counts): anomaly & return
Uniqueness: normal data
Uniqueness: anomaly
Uniqueness: return to normal
Uniqueness: historical view
Nulls: normal data
Nulls: anomaly
Nulls: return to normal
Nulls: historical view
Ranges: historical view
Completeness: anomalies
