Data Quality and Data Governance
Tuba Yaman Him
Why is Data Quality Important?
• Wrong Reports = Wrong Decisions
• Bad Reputation
• Wasted Money
According to a recent study in the UK, US and France, 16% to 18% of
departmental budgets are eaten up because of poor data quality. The
research also indicates that 90% of surveyed companies admit that
inaccurate data – such as duplicate accounts, lost contacts and missed
sales opportunities – contributes to budget waste. On top of this, a
2009 Gartner study revealed that the average organization surveyed
loses $8.2 million annually because of poor data quality and that most
of this is due to lost productivity.
Modern Data Environment
• Source systems: ERP systems (SAP, Oracle, etc.), CRM (Salesforce, Dynamics, etc.), manufacturing systems, financial systems, web applications, documents
• Enterprise Data Warehouse
• Data marts: Marketing Data Mart, Sales Data Mart, Financial Data Mart
Dimensions Of Data Quality
• Accuracy
• Integrity
• Completeness
• Currency
• Uniqueness
• Validity
Dimensions Of Data Quality: Accuracy
• Do data objects accurately represent the “real-world” values?
• Is the data correct?
• Example: A wrong sales amount, or wrong contact information for a customer.
Dimensions Of Data Quality: Integrity
• Is any data missing important relationship links?
• Example: A product ownership record without a valid owner/customer record.
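The integrity example above (an ownership without a valid customer) can be checked with a simple lookup. This is a minimal sketch; the record layout and the field names `customer_id` and `product_id` are assumptions made for illustration, not taken from the slides.

```python
# Hypothetical sample data; the field names are assumptions for illustration.
customers = [
    {"customer_id": 1, "name": "Customer A"},
    {"customer_id": 2, "name": "Customer B"},
]
ownerships = [
    {"product_id": "P-100", "customer_id": 1},
    {"product_id": "P-200", "customer_id": 3},  # refers to a customer that does not exist
]


def find_orphan_ownerships(ownerships, customers):
    """Return ownership rows whose customer_id has no matching customer record."""
    known_ids = {c["customer_id"] for c in customers}
    return [o for o in ownerships if o["customer_id"] not in known_ids]


orphans = find_orphan_ownerships(ownerships, customers)
print(orphans)
```

In a database the same rule would normally be enforced by a foreign key constraint or checked during the ETL load.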
Dimensions Of Data Quality: Completeness
• Is any necessary part of the data missing?
• Example: A customer record whose address has no city, although city is mandatory.
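A mandatory-field rule like the one in the completeness example can be sketched as follows. The list of mandatory fields is an assumed rule set; a real system would take it from its own data standards.

```python
# Assumed rule set: which address fields are mandatory.
MANDATORY_ADDRESS_FIELDS = ("street", "city", "country")


def missing_mandatory_fields(address, mandatory=MANDATORY_ADDRESS_FIELDS):
    """Return the mandatory fields that are absent, None, or blank."""
    missing = []
    for field in mandatory:
        value = address.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical address record with an empty city.
address = {"street": "Deniz Apt.", "city": "", "country": "Turkey"}
print(missing_mandatory_fields(address))
```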
Dimensions Of Data Quality: Currency
• Is the data up-to-date?
• Do we provide real-time data to our clients?
• Example: Customers with old address information. A bank that cannot provide its customers' fund balances in real time.
Dimensions Of Data Quality: Uniqueness
• Are there multiple, unnecessary representations of the same data objects within your data?
• Example: 3 different records which indicate the same customer. Misspelling can be the reason.
Dimensions Of Data Quality: Validity
• Do data values comply with the specified formats and rules?
• Example: A customer record whose DOB is dd/mm/1735. A customer record with an invalid UK postal code such as WC3T.
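Both validity examples can be caught with simple rules. The sketch below uses a simplified regular expression for the general shape of a UK postcode (it checks format only, not whether the postcode exists) and an assumed maximum age of 120 years for dates of birth. Both rules are illustrative assumptions, not official definitions.

```python
import re
from datetime import date

# Simplified UK postcode shape: outward code, optional space, inward code.
# Format check only; it does not confirm that the postcode actually exists.
UK_POSTCODE_RE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}")


def is_valid_uk_postcode_format(postcode):
    """Return True if the value has the general shape of a UK postcode."""
    return UK_POSTCODE_RE.fullmatch(postcode.strip().upper()) is not None


def is_plausible_dob(dob, max_age_years=120, today=None):
    """Reject birth dates in the future or older than max_age_years (assumed limit)."""
    today = today or date.today()
    if dob > today:
        return False
    return today.year - dob.year <= max_age_years


print(is_valid_uk_postcode_format("WC3T"))      # incomplete: no inward code
print(is_valid_uk_postcode_format("WC2N 5DU"))  # well-formed
print(is_plausible_dob(date(1735, 1, 1)))       # far older than the assumed limit
```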
Methods and Tools For Data Quality
Objective | How to
Validation | Regular expressions
Data merging for duplicate data | SSIS Fuzzy Lookup and Fuzzy Grouping packages
Integrity | Proper ETL and ELT processes
Completeness | Mandatory field rules, ETL/ELT
Verification for important information | Activation e-mails, verification SMS
Preventing typographical errors | Autocomplete tools
Minimizing human errors | Employee training
SSIS Fuzzy Matching
• Tuba Yaman Him, Tuba.yamanhim@yopmail.com, Deniz Apt., Ataşehir, İstanbul
• Tuba Him, Tuba.yamanhi@yopmail.com, Deniz Apt., Ataşehir, istanbul
• Tuğba Yaman Him, Tuba.yamanhim@yopmail.com, Deniz Apt., Ataşehir, İstanbul
• Tuba Him, Tuba.yamanhim@yopmail.com, Deniz Apt., Ataşehir, istanbul
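The idea behind fuzzy matching on records like these can be illustrated outside SSIS. The sketch below uses Python's standard `difflib.SequenceMatcher`; it is not the algorithm SSIS Fuzzy Lookup or Fuzzy Grouping uses, and the equal field weights and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Fold dotted capital I and dotless i to plain "i" before lowercasing.
# Without this, Python lowercases U+0130 to "i" plus a combining dot,
# so "İstanbul" and "istanbul" would not compare as equal.
TURKISH_I_FOLD = str.maketrans({"\u0130": "i", "\u0131": "i"})


def normalize(text):
    """Fold Turkish i variants, lowercase, and collapse whitespace."""
    return " ".join(text.translate(TURKISH_I_FOLD).lower().split())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def record_similarity(r1, r2, fields=("name", "email", "city")):
    """Average per-field similarity; equal weights are an assumption."""
    return sum(similarity(r1[f], r2[f]) for f in fields) / len(fields)


def find_duplicate_pairs(records, threshold=0.8):
    """Return index pairs whose record similarity reaches the assumed threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if record_similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


records = [
    {"name": "Tuba Yaman Him", "email": "Tuba.yamanhim@yopmail.com", "city": "İstanbul"},
    {"name": "Tuba Him", "email": "Tuba.yamanhi@yopmail.com", "city": "istanbul"},
    {"name": "Tuğba Yaman Him", "email": "Tuba.yamanhim@yopmail.com", "city": "İstanbul"},
]

for i, j in find_duplicate_pairs(records):
    print(records[i]["name"], "<->", records[j]["name"])
```

Pairs above the threshold are candidates for merging; in practice a reviewer or a survivorship rule decides which record to keep.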
Data Governance
Data governance is a set of policies, rules and standards for
increasing and maintaining enterprise data quality.
It is about putting people in charge of fixing and preventing issues
with data so that the enterprise can become more efficient. Data
governance also describes an evolutionary process for a company,
altering the company’s way of thinking and setting up the processes
to handle information so that it may be utilized by the entire
organization. It’s about using technology, in whatever form is
necessary, to aid the process. When companies desire, or are
required, to gain control of their data, they empower their people,
set up processes and get help from technology to do it.
Data Governance: Job Ads
Country | Job ads
USA | 1,885
India | 290
UK | 253
Canada | 113
Germany | 83
Singapore | 25
Switzerland | 24
Turkey | 0
Data Governance Team Missions
Data Quality Scorecard
Objective | Action Plan | KPI | Target | Jul 2016 | Aug 2016 | Sep 2016
Decrease duplicates | A merging flow will be implemented | Number of duplicate records in CDB | 0 | 11,276 | 3,500 | 200
Increase the correctness of email info | A verification process will be implemented | Number of invalid email addresses in Customer DB | <500 | 25,500 | 4,700 | 4,700
Decrease wrong relationships between products and customers | An ETL enhancement is planned | Number of incorrect relations between products and customers in DB | 0 | 2,700 | 2,700 | 2,900
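A scorecard like this can be summarized automatically. The sketch below takes the KPI values from the table and reports, for each KPI, the change since the first month, the trend against the previous month, and whether the target is met. The `target_max` encoding (the highest value that still meets the target, so "<500" becomes 499) and the function name are assumptions for illustration.

```python
# KPI values from the scorecard (Jul, Aug, Sep 2016).
# target_max is the highest value that still meets the target (assumed encoding).
scorecard = [
    {"kpi": "Duplicate records in CDB", "target_max": 0, "monthly": [11276, 3500, 200]},
    {"kpi": "Invalid email addresses in Customer DB", "target_max": 499, "monthly": [25500, 4700, 4700]},
    {"kpi": "Incorrect product-customer relations in DB", "target_max": 0, "monthly": [2700, 2700, 2900]},
]


def summarize(row):
    """Compute change since the first month, latest trend, and target status."""
    first, previous, latest = row["monthly"][0], row["monthly"][-2], row["monthly"][-1]
    if latest < previous:
        trend = "improving"
    elif latest > previous:
        trend = "worsening"
    else:
        trend = "flat"
    return {
        "kpi": row["kpi"],
        "change_pct": round((latest - first) / first * 100, 1),
        "trend": trend,
        "target_met": latest <= row["target_max"],
    }


for row in scorecard:
    print(summarize(row))
```

For the values in the table, none of the three KPIs has reached its target by September, and the product-customer relation KPI is moving in the wrong direction.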