Data Quality
Automatic enforcement in real-time with machine learning.
Max Martynov, CTO
Introducing Grid Dynamics technology services
Digital transformation
Big data, real-time analytics, ML & AI
Microservices replatforming
DevOps & cloud enablement
Open source, cloud-ready, scalable, automated
12 years of experience in digital transformation.
"Implementing data quality automation with open source stack" - Max Martynov, CTO of Grid Dynamics
[Chart: Retailer X, Daily Sales – Executive Summary. Series: Revenue, $M and Gross Profit, $M. X-axis: days from 7/1/19 to 7/14/19, with two weekends marked. Y-axis: 0 to 20.]
[Diagram sequence, four steps:]
1. DBs; EDW
2. DBs, files; data lake, EDW
3. DBs, files, MQ, cloud services, apps, APIs; data lake, EDW, cloud
4. DBs, files, MQ, cloud services, apps, APIs; data lake, EDW, cloud; more apps
1. Trust is hard to build and easy to lose.
2. Distrust in data slows down decisions.
3. Slow decisions prevent agility.
Data corruption reasons
1. Code
2. Data sources
3. Infrastructure
Traditional approach to testing
[Diagram: in a test environment, a test suite runs the ETL code on a prepared input and compares the actual output with the expected output.]
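The test-environment comparison on this slide can be sketched roughly as follows. This is a minimal illustration, not code from the talk: `etl_transform`, `run_test_suite`, and the sample rows are hypothetical names and data chosen for the example.

```python
def etl_transform(rows):
    """Hypothetical ETL step: total revenue per day."""
    totals = {}
    for row in rows:
        revenue = row["price"] * row["quantity"]
        totals[row["date"]] = totals.get(row["date"], 0.0) + revenue
    return totals


def run_test_suite(etl, cases):
    """Run the ETL code on each prepared input and compare actual with expected."""
    failures = []
    for name, test_input, expected in cases:
        actual = etl(test_input)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures


# Each case: (name, prepared input, expected output).
CASES = [
    (
        "daily revenue",
        [
            {"date": "2019-07-01", "price": 2.0, "quantity": 3},
            {"date": "2019-07-01", "price": 1.5, "quantity": 2},
            {"date": "2019-07-02", "price": 4.0, "quantity": 1},
        ],
        {"2019-07-01": 9.0, "2019-07-02": 4.0},
    ),
]

failures = run_test_suite(etl_transform, CASES)
print("failures:", failures)
```

A test like this only exercises the code against inputs someone thought to prepare. It says nothing about corruption that arrives from data sources or infrastructure in production.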
Production data quality goals
Detect data corruption.
Prevent it from spreading.
Alert support team.
Data quality enforcement in production
[Diagram, built up over three slides: sources (DBs, files, MQ, apps, APIs) feed a data processing job that writes to the production data lake. A data quality job is added next to the data processing job. A failed check has one of two outcomes: "alert & stop pipeline" or "alert & continue".]
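One way to picture the two outcomes, "alert & stop pipeline" and "alert & continue", is a gate that runs after the data processing job. The sketch below is an assumption about how such a gate could look, not the implementation shown in the talk; `Check`, `PipelineStopped`, and `run_quality_job` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


class PipelineStopped(Exception):
    """Raised when a blocking data quality check fails."""


@dataclass
class Check:
    name: str
    passed: Callable[[list], bool]  # returns True when the batch looks healthy
    blocking: bool                  # True: alert & stop; False: alert & continue


def run_quality_job(batch, checks, alert):
    """Run every check, alert on each failure, stop on any blocking failure."""
    failed_blocking = []
    for check in checks:
        if check.passed(batch):
            continue
        alert(f"data quality check failed: {check.name}")
        if check.blocking:
            failed_blocking.append(check.name)
    if failed_blocking:
        # Stop here so the corrupted batch does not spread downstream.
        raise PipelineStopped(", ".join(failed_blocking))
    return batch


alerts = []  # stand-in for a real alerting channel
checks = [
    Check("batch is not empty", lambda b: len(b) > 0, blocking=True),
    Check("no negative amounts",
          lambda b: all(r["amount"] >= 0 for r in b), blocking=False),
]
batch = [{"amount": 10.0}, {"amount": -1.0}]
published = run_quality_job(batch, checks, alerts.append)
```

Here the negative amount triggers an alert but the batch is still published, because that check is marked non-blocking. An empty batch would raise `PipelineStopped` instead.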
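Main data processing pipeline
[Diagram: data flows from the data source into the data lake, passing three data quality checks. The flow is labeled "data" and "confidence".]
1. Compare with SoR (system of record)
2. Validate business rules
3. Data profiling and anomaly detection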
1. Control divergence from SoR
[Diagram: data source, imported dataset, data lake; compare data in SoR and data lake.]
1. Validate correctness of import.
2. Prevent stale data.
3. Prevent corruption accumulation in stream processing use cases.
4. Check data before it gets in the lake.
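A record-level comparison between the system of record and the imported dataset might look like the sketch below. It assumes both sides fit in memory and share a primary key column named `id`; both are assumptions for illustration. At data lake scale the same idea is usually applied to counts, checksums, or samples rather than full rows.

```python
def compare_with_sor(sor_rows, lake_rows, key="id"):
    """Report how the imported dataset diverges from the system of record."""
    sor = {row[key]: row for row in sor_rows}
    lake = {row[key]: row for row in lake_rows}
    shared = sor.keys() & lake.keys()
    return {
        "sor_count": len(sor),
        "lake_count": len(lake),
        # In the SoR but never imported (or imported late: stale data).
        "missing_in_lake": sorted(sor.keys() - lake.keys()),
        # In the lake but absent from the SoR.
        "unexpected_in_lake": sorted(lake.keys() - sor.keys()),
        # Present on both sides with different content.
        "mismatched": sorted(k for k in shared if sor[k] != lake[k]),
    }


sor_rows = [
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "new"},
    {"id": 3, "status": "new"},
]
lake_rows = [
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "paid"},
    {"id": 4, "status": "new"},
]
report = compare_with_sor(sor_rows, lake_rows)
print(report)
```

Running the comparison before a batch is published covers the "check data before it gets in the lake" point. Running it periodically on streaming datasets helps catch accumulated drift.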
2. Validate business rules
[Diagram: dataset in the data lake; check for nulls and data ranges.]
1. Enforce schema.
2. Check for nulls.
3. Validate data ranges.
4. Specify and enforce data invariants.
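The four rule types above can be expressed as a small declarative check. The following is a sketch under assumed names: the order columns, bounds, and the `validate_row` helper are all hypothetical and only illustrate the shape of such rules.

```python
def validate_row(row, schema, not_null, ranges, invariants):
    """Return a list of (rule, detail) violations for one row."""
    # 1. Enforce schema: exact column set, declared type for non-null values.
    if set(row) != set(schema):
        return [("schema", f"unexpected columns: {sorted(row)}")]
    problems = [
        ("schema", f"{col} should be {typ.__name__}")
        for col, typ in schema.items()
        if row[col] is not None and not isinstance(row[col], typ)
    ]
    # 2. Check for nulls in required columns.
    problems += [("null", col) for col in not_null if row[col] is None]
    if problems:
        return problems  # later checks assume well-typed, non-null values
    # 3. Validate data ranges (inclusive bounds).
    problems += [
        ("range", f"{col}={row[col]} outside [{low}, {high}]")
        for col, (low, high) in ranges.items()
        if row[col] is not None and not low <= row[col] <= high
    ]
    # 4. Enforce invariants that relate several columns.
    problems += [("invariant", name)
                 for name, holds in invariants.items() if not holds(row)]
    return problems


SCHEMA = {"order_id": int, "quantity": int, "unit_price": float, "total": float}
NOT_NULL = ["order_id", "quantity", "unit_price", "total"]
RANGES = {"quantity": (1, 1000), "unit_price": (0.0, 10000.0)}
INVARIANTS = {
    "total = quantity * unit_price":
        lambda r: abs(r["total"] - r["quantity"] * r["unit_price"]) < 0.01,
}


def check(row):
    return validate_row(row, SCHEMA, NOT_NULL, RANGES, INVARIANTS)


print(check({"order_id": 1, "quantity": 2, "unit_price": 9.5, "total": 19.0}))
print(check({"order_id": 2, "quantity": None, "unit_price": 9.5, "total": 19.0}))
```

Keeping the rules as data (dictionaries and named predicates) rather than hard-coded conditions makes it easier for dataset owners to specify them without touching the checking code.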
3. Anomaly detection
[Diagram: dataset in the data lake; data profiling and anomaly detection.]
1. Fully automatic data quality enforcement.
2. Collect data profile, metrics and statistics.
3. Train ML models.
4. Find anomalies in data.
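The profile-then-detect loop can be illustrated with a deliberately simple stand-in. Instead of a trained ML model, the sketch below flags a metric when it deviates from its own history by more than a few standard deviations. The metric names, the `profile` and `find_anomalies` helpers, and the threshold of 3 are assumptions for the example, not details from the talk.

```python
from statistics import mean, stdev


def profile(rows, column):
    """Collect a few profile metrics for one column of a dataset."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    n = len(values)
    return {
        "row_count": n,
        "null_ratio": (n - len(non_null)) / n if n else 0.0,
        "distinct_ratio": len(set(non_null)) / len(non_null) if non_null else 0.0,
    }


def find_anomalies(history, today, threshold=3.0):
    """Flag metrics whose value today is far from their historical mean.

    history: list of past profiles (at least two); today: the new profile.
    Returns {metric: z-score} for every metric above the threshold.
    """
    anomalies = {}
    for metric, value in today.items():
        past = [p[metric] for p in history]
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            # A metric that never varied: any change at all is suspicious.
            score = 0.0 if value == mu else float("inf")
        else:
            score = abs(value - mu) / sigma
        if score > threshold:
            anomalies[metric] = score
    return anomalies


history = [
    {"row_count": 1000, "null_ratio": 0.010, "distinct_ratio": 1.0},
    {"row_count": 1010, "null_ratio": 0.012, "distinct_ratio": 1.0},
    {"row_count": 990, "null_ratio": 0.009, "distinct_ratio": 1.0},
    {"row_count": 1005, "null_ratio": 0.011, "distinct_ratio": 1.0},
    {"row_count": 995, "null_ratio": 0.010, "distinct_ratio": 1.0},
]
today = {"row_count": 1002, "null_ratio": 0.25, "distinct_ratio": 1.0}
anomalies = find_anomalies(history, today)
print(sorted(anomalies))
```

In this example only the jump in `null_ratio` is flagged. Because the baseline comes from the data's own history, no one has to write a rule per column, which is what makes this kind of enforcement automatic.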
Demo setup
[Diagram labels: Catalog, Inventory, Orders; Data Lake; Data Quality; Data Profile; Reporting & Alerting.]
Live demo
Anomaly detection example
Anomaly detection example: zooming in
Capabilities for enterprise data quality and governance
- Enables widespread adoption.
- Enforces enterprise-level controls and data usage policies.
- Increases consistency and confidence in decision making.
- Decreases the risk of regulatory fines.
- Improves data security.
- Facilitates accountability for information quality.
- Minimizes or eliminates duplication of effort.
Data Governance Platform
[Diagram labels, in the order they appear on the slide:]
Metadata Management, Full-text Search, Data Quality Status, Schema / Summary, Data Profiling, Mapping to Glossary, Change Log, Dependency Detection, Consumers, Flow Visualization, Glossary Portal, Knowledge Base, Fields Fingerprinting
Data Catalog, Dataset Profile, Lineage Dashboard, Data Glossary
Data Quality, Access and Security
Business Rules, Anomaly Detection, Alerting, Access Rules, Compliance Policies, Policy Engine
www.griddynamics.com
Thank you!
Demo screenshots
Dataproc cluster
Airflow pipeline
Griffin measures
Anomaly detection: normal data
Anomaly detection: anomaly
Anomaly detection: return to normal
Anomaly detection: historical view
Anomaly detection (counts): anomaly & return
Uniqueness: normal data
Uniqueness: anomaly
Uniqueness: return to normal
Uniqueness: historical view
Nulls: normal data
Nulls: anomaly
Nulls: return to normal
Nulls: historical view
Ranges: historical view
Completeness: anomalies
