Tackling the enterprise
Data Quality challenge
Cognitivo Consulting
January 2020
COMPETING IN THE DIGITAL AGE
In a connected world, competing effectively in the digital age means making the right decisions at pace
UNLOCKING THE VALUE OF AI
Machine learning algorithms rely on data to learn for themselves
AI could potentially create $3.5 trillion to $5.8 trillion in annual value in the global economy
Source: McKinsey Global Institute, 2018
DATA-DRIVEN ORGANISATIONS
Leaders in the digital age are able to make strategic and operational decisions based on data, at scale
THIS IS THE DATA-DRIVEN ORGANISATION
BUILDING ON STABLE FOUNDATIONS
The quality of your decisions will be proportional to the quality of your data
Data Quality is a foundational element of achieving digital success
DQ MUST BE COORDINATED ACROSS THE ENTERPRISE
Poor DQ is a symptom of poor processes and systems, and addressing it requires coordination across the enterprise
DATA QUALITY MUST SUPPORT PROCESS ASSURANCE AND IMPROVEMENT ACROSS THE ENTERPRISE
Enterprise Architecture aligned end-to-end DQ approach
A successful DQ initiative relies on alignment to existing enterprise and risk management frameworks and assets
Information Architecture
Business / Process Architecture
Integration Architecture
Application & Infrastructure Architecture
1. Risk-based approach to identify key processes / use cases in scope for DQ improvement
2. Definition of customer journeys and process value chains with customer & organisational outcomes defined
3. Definition of a business conceptual data model & business rules based on in-scope processes
4. Agreement of definitions (decomposition of metrics and critical data elements), sources of truth, RACI (e.g. owners)
5. Document data lineage (data flows) between key systems for each in-scope process / use case
6. Catalogue systems, critical data sets and the controls environment within an Information Assets Register or Source Catalogue
Figure panels (illustrative):
• Operational Risk Matrix: impact against likelihood (Low / Medium / High), marking inherent (‘gross’) risk, DQ treatment, process improvement and risk tolerance
• Customer Journey & Associated Business Value Chain
• Business Conceptual Model: <<Party>>, <<Item>>, Service, Arrangement, <<Classification>>, <<Event>> and Location, linked by relationships such as “owns, rents, buys, sells, leases”, “provides, consumes”, “uses, maintains”, “occurs at”, “involved in”, “triggers” and “consists of”
• Business Metric Decomposition & Business Definitions
• Integration Landscape Data Lineage: Channels (Web, Mobile, Broker, Contact Centre, Branch), CRM, Product, Origination, Fulfilment, Servicing, Credit Approval, Settlement, Payments, Finance, Risk / Capital Mgt, Follow up, KYC, Sanctions and Performance, with Integration (Message/Stream + Batch) and a Cloud Data Asset
• Information Asset Register / Source Catalogue
Enterprise Data Decomposition Tree (illustrative)
Note: EcoProfit = NPAT - Cost of Equity ($) + IEL (CCA) + Imputation Credits
= NPAT - Cost of Equity (%) x Eco Cap ($)
ROE = NPAT / Book Equity, Book Equity = EcoCap = Total Reg Cap
Tree nodes: TSR (Total Shareholder Return); Economic Profit ($); ROE (%); NPAT - EL basis ($); Cost of Capital; Cost of Capital Rate (%); Eco Cap ($); Bank Capital; Capital Ratio; Capital Buffer (Stress Test); Franking Credits - Tax; IEL (CCA); Reg EL; Revenue; Other; Pricing (Rates / Fees); Expenses; Allocated Expenses; Controllable Expenses; Credit Risk Capital; Market Risk Capital; Op Risk Capital; Credit RWA; mRWA; oRWA; Loss Data (ELD/ILD); Investment Stakes, Fixed Assets, Deferred Acquisition; Exposure At Default (EAD); Probability of Default; Loss Given Default; Provisions & Delinquencies; Collective Provisions; Individual Provisions; Retail Pooling & Segmentation; Loan Amount / Limit; Product Features; On/Off Balance Sheet; Revocability; Tenor; Customer; Asset Class; Industry / ANZSIC; Salient Financials; CCR; Domicile Country; SI; Held Collateral
*Set by Group Treasury
Principles of Cognitivo’s DQ approach
A pragmatic approach that doesn’t “boil the ocean” is required to focus on priority use cases while leveraging organisational assets and AI to scale
• Risk & Policy Based – Identify key processes that carry material data risk as priority areas for DQ diagnosis and treatment
• Process (use-case) Centric – Identify the data flows that underpin key processes and address data quality across the entire system data flow
• Metadata Driven – Develop or use a conceptual data model as an abstraction layer for agreeing definitions and business rules with business stakeholders; these are subsequently mapped to physical data models
• Analytics & ML Enabled – Use data science techniques (such as ML, text analytics, vision) to build industry- and organisation-specific data matching and data quality diagnosis techniques
• Embedded in Business-As-Usual – Roll out DQ controls and measurement (dashboard) as part of the organisation’s quality assurance processes, rather than constructing a new data KPI and consequence management framework
Example DQ use cases to improve key business outcomes
Cognitivo has extensive experience in executing data quality programmes within the Financial Services, Government and Accounting business domains
Use Case
Compliance
• KYC / AML / CTF (Assurance of data feeds)
• CPS220, AIRB Accreditation
• APRA / ABS Regulatory Reporting (e.g. report on interest-only loans)
• Basel III Liquidity (FI/non-FI review)
• APS 120 Securitisation (Loan doc reconciliation)
• FATCA, GATCA
• OTC Reform, MiFID II (Cleanse LEI / SWIFT Code, Legal Form, Country of incorporation etc.)
• APS910 – SCV assurance
• IFRS9 / IFRS17 Assurance
• Staff Benefits Review (Review of former employees still on staff benefits programmes)
• Advice Compliance (SOA, PDS vs fees and charges review)
• …
Business Management
• Payroll Assurance
• Financial Management reporting (Line of Business)
• Finance cube, business unit, GL structure review
• …
Customer / Sales
• Customer Contact Details (marketing, product service)
• Consent status
• Customer Age Review
• Customer Address Review (e.g. Suburb / postal code combination)
• Customer segmentation review
• CRM – customer structure review (e.g. customer legal structure, customer groupings)
• ..
Data Quality execution lifecycle
Cognitivo’s DQ Execution Lifecycle is linked to broader data demand management and IT planning lifecycles
Data Risk Demand Management
Process / System Improvement
Diagnosis
Conduct qualitative sizing, define requirements and business rules, and conduct root cause analysis
Profiling - Size/quantify magnitude of each DQ Issue.
• Profile key data elements for validity / completeness issues
• Correlate data across systems to identify integrity issues; this can include techniques such as financial reconciliation (checksums)
• Deploy an analytical process to find illogical combinations of data, outliers etc.
• Higher complexity techniques such as text analytics and computer vision to correlate with unstructured data sources
• Machine learning approaches to identify patterns for acceptable values / ranges
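The cross-system correlation step above can be sketched as a control-total reconciliation between two extracts of the same data set. This is an illustration only, not Cognitivo's implementation: the system names, the field names and the `control_totals` and `reconcile` helpers are all hypothetical.

```python
from decimal import Decimal


def control_totals(records, amount_field):
    """Return (row count, sum of amounts) for one system's extract."""
    total = sum((Decimal(str(r[amount_field])) for r in records), Decimal("0"))
    return len(records), total


def reconcile(source, target, key, amount_field, tolerance=Decimal("0.00")):
    """Compare two extracts of the same data set held in different systems."""
    src_count, src_total = control_totals(source, amount_field)
    tgt_count, tgt_total = control_totals(target, amount_field)
    src_keys = {r[key] for r in source}
    tgt_keys = {r[key] for r in target}
    return {
        "count_match": src_count == tgt_count,
        "total_difference": src_total - tgt_total,
        "total_match": abs(src_total - tgt_total) <= tolerance,
        "missing_in_target": sorted(src_keys - tgt_keys),
        "missing_in_source": sorted(tgt_keys - src_keys),
    }


# Hypothetical extracts: loan balances in a core system and in a finance ledger.
core_system = [
    {"loan_id": "L001", "balance": "250000.00"},
    {"loan_id": "L002", "balance": "410500.50"},
    {"loan_id": "L003", "balance": "99000.00"},
]
finance_ledger = [
    {"loan_id": "L001", "balance": "250000.00"},
    {"loan_id": "L002", "balance": "410000.50"},
]

result = reconcile(core_system, finance_ledger, key="loan_id", amount_field="balance")
print(result)
```

Amounts are held as `Decimal` so that the totals are compared exactly rather than with floating-point rounding.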
Holistic view of DQ issues & prioritisation
• Organisation-wide DQ issues register with self-assessment process to periodically assess level of DQ risk
• DQ deep-dives through workshop / interviews for high risk areas
• Prioritise high impact and high occurrence issues to go into ‘fix process’
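One way to picture the prioritisation step is a simple score over the issues register. The sketch below is an assumption rather than the method used in the deck: the three-point scale, the multiplicative score, the threshold and the `DQIssue` fields are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale; the slide only says "high impact and high occurrence".
RATING = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class DQIssue:
    name: str
    business_unit: str
    impact: str       # "Low", "Medium" or "High"
    occurrence: str   # "Low", "Medium" or "High"

    def score(self) -> int:
        """Combine the two ratings into a single ranking score."""
        return RATING[self.impact] * RATING[self.occurrence]


def prioritise(register, threshold=6):
    """Return issues scoring at or above the threshold, highest first."""
    selected = [issue for issue in register if issue.score() >= threshold]
    return sorted(selected, key=lambda issue: issue.score(), reverse=True)


register = [
    DQIssue("Missing date of birth", "Retail", "High", "High"),
    DQIssue("Invalid postcode", "Retail", "Medium", "High"),
    DQIssue("Free-text title field", "Marketing", "Low", "Medium"),
]

fix_queue = prioritise(register)
for issue in fix_queue:
    print(issue.name, issue.score())
```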
Correction process
• Obtain correct values and subsequent cleansing / system update.
• Automated through cross-system and 3rd party data source lookup or derivation
• As a final step, client outreach may be required (e.g. establish call-centre process)
Cleansing process
• Establish process for bulk update, testing and roll back within core systems
• For systems where bulk update is not possible, develop RPA and manual update capabilities.
Systemic Fix
• Make recommendations for systemic fixes through the organisation’s broader change / fail-fix agenda
Monitoring & Reporting
• DQ issues profiled / fixed all trace back to a business unit, hence DQ metrics can form process / compliance KPIs for business owners.
• DQ scorecards can automate existing QA processes / operational risk controls by quantifying instances where data entry is missing / incorrect.
• Trend analysis on DQ results for each responsible business unit
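A scorecard with a per-business-unit trend could be derived from rule results roughly as follows. This is a sketch under stated assumptions: the result record layout (`business_unit`, `period`, `passed`, `checked`) and the helper names are hypothetical, and every group is assumed to have at least one checked record.

```python
from collections import defaultdict


def scorecard(results):
    """Aggregate pass rates per (business unit, period) from rule results."""
    totals = defaultdict(lambda: [0, 0])  # key -> [passed, checked]
    for r in results:
        key = (r["business_unit"], r["period"])
        totals[key][0] += r["passed"]
        totals[key][1] += r["checked"]
    return {key: passed / checked for key, (passed, checked) in totals.items()}


def trend(card, business_unit):
    """Return the period-over-period change in pass rate for one business unit."""
    periods = sorted(p for (bu, p) in card if bu == business_unit)
    return [
        (later, round(card[(business_unit, later)] - card[(business_unit, earlier)], 4))
        for earlier, later in zip(periods, periods[1:])
    ]


# Hypothetical rule results for one business unit over two months.
results = [
    {"business_unit": "Retail", "period": "2020-01", "rule": "dob_not_null", "passed": 900, "checked": 1000},
    {"business_unit": "Retail", "period": "2020-01", "rule": "postcode_valid", "passed": 950, "checked": 1000},
    {"business_unit": "Retail", "period": "2020-02", "rule": "dob_not_null", "passed": 960, "checked": 1000},
    {"business_unit": "Retail", "period": "2020-02", "rule": "postcode_valid", "passed": 970, "checked": 1000},
]

card = scorecard(results)
print(card)
print(trend(card, "Retail"))
```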
A Diagnosis
B Profiling
C Correction & Cleansing
D Monitoring & Reporting
Scalable Data Quality DevOps
Cognitivo’s data quality workflow incorporates analytical tools, business testing and deployment into a DQ DevOps process
DQ Discovery
De-duplicate customers within systems (collapse entities)
Customer Outreach (high-priority/risk)
Cross system & 3rd party lookup
Document / correspondence lookup
Customer self service
Client Applications
Correct (obtain correct values & validate)
Issue and Workflow Management
Manual testing values to update via risk-based sampling
Database Bulk Update / Amend
Front-End Data Entry (inc. use of RPA – robotic process automation)
Cleanse (System Update)
Source Systems & Processes
Profiling
Monitoring & Reporting
Business Unit QA to include DQ measures
Customer Matching & Analytical Environment
Issue prioritization (based on risk i.e. likelihood and consequence)
Results Dashboard
Raise new DQ rules
To be updated
DQ Rules Engine
DQ Development Workspace
Ingestion
DQ Production Deployment
Business / Quality Assurance
Root Cause Analysis
Process & System Fix
Operational Environment
Legend: Automated processes; Manual processes
DQ Workspace & Platform Architecture
Cognitivo has a vendor-agnostic DQ technical reference architecture that can be implemented in any cloud or on-premise environment
To be updated
Data Pipeline
Ingestion
Client Source Systems
Client Data Warehouse(s)
Document Repository
New Extracts for checksums etc.
Batch Pipeline
Real Time Pipeline (APIs / Messaging)
User Interface
DQ Policy & Rules Configuration
Case Management
Conformed / derived values to expedite DQ rule execution and provide a history of values for outlier / drift detection
Raw source system data to derive row counts and perform validity checks
User Interface
Data Lake
Data Science / ML Discovery Environment
Source Data Layer
Linked data (lightly integrated)
Scheduler
Rules Store
Self-service data Ingestion
Data Science Tools / Workbench
Execution of analytical workloads
DQ dashboards for consumption by data stewards and business stakeholders (data owners)
DQ Rule Execution (Python)
DQ Rules based on the derived semantic data model stored in JSON format within the rules store
Scheduler to execute DQ Rules on a periodic basis
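The slide states that rules are stored as JSON and executed in Python, but gives no schema. The sketch below assumes a hypothetical rule layout (`id`, `field`, `check` plus check-specific parameters) and a hypothetical `run_rules` function to show how such rules might be evaluated; a scheduler would call `run_rules` periodically, which is not shown here.

```python
import json
import re

# Hypothetical rule definitions; the real rules-store schema is not given in the source.
RULES_JSON = """
[
  {"id": "R1", "field": "email", "check": "not_null"},
  {"id": "R2", "field": "postcode", "check": "regex", "pattern": "^[0-9]{4}$"},
  {"id": "R3", "field": "state", "check": "allowed", "values": ["NSW", "VIC", "QLD"]}
]
"""

# Each check takes the field value and the rule, and returns True when the value passes.
CHECKS = {
    "not_null": lambda value, rule: value not in (None, ""),
    "regex": lambda value, rule: value is not None
    and re.fullmatch(rule["pattern"], str(value)) is not None,
    "allowed": lambda value, rule: value in rule["values"],
}


def run_rules(rules, records):
    """Apply each rule to each record and count failures per rule id."""
    failures = {rule["id"]: 0 for rule in rules}
    for record in records:
        for rule in rules:
            check = CHECKS[rule["check"]]
            if not check(record.get(rule["field"]), rule):
                failures[rule["id"]] += 1
    return failures


rules = json.loads(RULES_JSON)
records = [
    {"email": "a@example.com", "postcode": "2000", "state": "NSW"},
    {"email": "", "postcode": "20O0", "state": "WA"},
]
failure_counts = run_rules(rules, records)
print(failure_counts)
```

Keeping the checks in a lookup table means a new rule type needs one new entry rather than a change to the execution loop.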
DQ Rules Engine
Text Extraction & OCR
Results Dashboard (PowerBI / Qlik)
Data Workspace Provisioning
DQ Profile Result Store
Semantic / Conformed Data (with history)
Data Science Development & Collaboration Tools e.g. Git, Jupyter
Cross-system table linking to correlate values across matched customers
Store DQ profiling output results. Contains historical values to allow historical trend analysis of DQ
Management of DQ rules, tolerances and business owners of DQ events
Case management tool for logging, investigating and remediating DQ issues
Provision of persistent temporary storage and access to access-controlled data sets (specific to department, user and use cases)
Data import tools for unmanaged datasets (used for discovery purposes)
Text extract and analytics libraries e.g. Tesseract
Batch data ingestion using file (CSV) or ODBC/JDBC
Real-time integration
Key Selected Technologies
DQ profiling techniques to be employed
Cognitivo’s analytical DQ framework deploys a number of analytical tests across structured and unstructured data sources
Test for Completeness
• Record count anomalies
• Financial reconciliation (check-sums)
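A record count anomaly test can be as simple as comparing the latest load against the history of previous loads. The sketch below uses a z-score; the threshold, the sample counts and the `record_count_anomaly` name are assumptions, not details from the deck.

```python
from statistics import mean, stdev


def record_count_anomaly(history, latest, z_threshold=3.0):
    """Flag the latest load if its row count is far from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in history: any different count is treated as anomalous.
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


daily_counts = [10_020, 9_980, 10_050, 10_010, 9_940]  # hypothetical prior loads
normal_load = record_count_anomaly(daily_counts, 10_030)
short_load = record_count_anomaly(daily_counts, 6_500)
print(normal_load, short_load)
```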
Test for Validity
• Data type & format checks (regex pattern match)
• Allowable values / reference data lookup
• Null value check
Test for Accuracy
• Illogical combinations of multiple data fields (e.g. individual with a business name)
• Single field based logic check (e.g. age > 100)
• 3rd party cross reference
• Cross system value cross-reference
• Reasonable value check (record anomaly / outlier, value drift over time)
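Three of the accuracy tests listed above can be sketched in a few lines. The two rule examples come from the slide; the field names (`party_type`, `business_name`, `age`) and the choice of a median-absolute-deviation outlier test are assumptions made for illustration.

```python
from statistics import median


def illogical_combination(record):
    """An individual should not carry a business name (example from the slide)."""
    return record["party_type"] == "INDIVIDUAL" and bool(record.get("business_name"))


def implausible_age(record, max_age=100):
    """Single-field logic check (example from the slide: age > 100)."""
    return record["age"] is not None and record["age"] > max_age


def outliers(values, threshold=3.5):
    """Return values whose modified z-score (based on the MAD) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


customers = [
    {"id": 1, "party_type": "INDIVIDUAL", "business_name": None, "age": 42},
    {"id": 2, "party_type": "INDIVIDUAL", "business_name": "Acme Pty Ltd", "age": 37},
    {"id": 3, "party_type": "INDIVIDUAL", "business_name": "", "age": 119},
]
balances = [1200, 1150, 1300, 1250, 1180, 98000]

flagged_combo = [c["id"] for c in customers if illogical_combination(c)]
flagged_age = [c["id"] for c in customers if implausible_age(c)]
flagged_balances = outliers(balances)
print(flagged_combo, flagged_age, flagged_balances)
```

The median-based test is used here because a single extreme value distorts a mean and standard deviation far more than it distorts a median.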
Test for Timeliness
• Data ingestion (ETL/ELT) synchronisation review
Document Text Extraction & Cross Reference
Computer Vision: image recognition & object classification
Test for Uniqueness
• Duplicates within systems
• Cross-system master data reconciliation
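A basic within-system duplicate test groups records on a normalised match key. The sketch below uses exact matching on assumed fields (`name`, `date_of_birth`, `postcode`); it does not represent the analytical or ML-based customer matching the deck refers to elsewhere.

```python
from collections import defaultdict


def match_key(record):
    """Build a normalised key from name, date of birth and postcode (assumed fields)."""
    name = " ".join(record["name"].lower().split())
    return (name, record["date_of_birth"], record["postcode"].strip())


def find_duplicates(records):
    """Group customer ids that share the same normalised key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["customer_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


customers = [
    {"customer_id": "C1", "name": "John  Smith", "date_of_birth": "1980-02-01", "postcode": "2000"},
    {"customer_id": "C2", "name": "john smith", "date_of_birth": "1980-02-01", "postcode": "2000 "},
    {"customer_id": "C3", "name": "Jane Smith", "date_of_birth": "1975-07-15", "postcode": "3000"},
]

duplicate_groups = find_duplicates(customers)
print(duplicate_groups)
```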
Case Management
DQ Analytical Engine
Cognitivo’s DQ Platform Capabilities
Cognitivo has a DQ application framework that can be deployed onto private clouds via containers or accessed as a SaaS offering
Data Steward Portal (UX)
• Create profiling rules
• Diagnose DQ issues through reports and dashboards
• Workflow to approve data changes and case manage remediation
• APIs to integrate with 3rd party applications and check valid data entry based on data quality rules
Core DQ Engine
• Semantic model of parameters for data stewards to create DQ rules
• DQ rule templates (e.g. regex functions, address validity, ABN format etc.)
• Analytical engine to run complex data accuracy / integrity rules
• API to allow 3rd party and customer automation, extension and access to DQ results
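As an illustration of what an "ABN format" rule template might contain, the sketch below applies the publicly documented ABN check: 11 digits, subtract 1 from the first digit, apply the fixed weights and test that the weighted sum is divisible by 89. The function name and its placement in a rule library are assumptions; the deck does not show its template code.

```python
import re

# Weighting factors from the published ABN validation algorithm.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(value: str) -> bool:
    """Check the 11-digit format and the modulus-89 checksum of an ABN."""
    digits = re.sub(r"\s", "", value)
    if not re.fullmatch(r"\d{11}", digits):
        return False
    numbers = [int(d) for d in digits]
    numbers[0] -= 1  # the algorithm subtracts 1 from the leading digit
    return sum(w * n for w, n in zip(ABN_WEIGHTS, numbers)) % 89 == 0


print(is_valid_abn("51 824 753 556"))
print(is_valid_abn("51 824 753 557"))
```

A checksum rule of this kind catches mistyped identifiers that a plain format check would accept.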
Data Pipeline
• Securely connect on-prem data sources to cloud environments in an encrypted manner (Gateway)
• Database to store multiple time-stamped sampled extracts from source systems
• Efficient data ingestion pipeline with connectors for key council systems (e.g. Dynamics, ..)
Embedding DQ processes
• Build continuous improvement initiatives within directorates based on DQ analysis (e.g. asking additional questions when customers call/visit)
• Set DQ KPIs within process metrics (e.g. accuracy of mandatory data capture)
Investigation (Jupyter)
Reporting Dashboard
Customer Data Sources
DQ Profiling Datastore
Data Stewards (Users)
User Interface
Data Quality Hub
Scheduler
DQ Profiler
Connectors
DQ Managed Parameters (semantic model)
Gateway
Mobile App
API
DQ Rules Library
Web Interface
Cognitivo DQ Platform Screenshots (1/2)
Cognitivo DQ Platform Screenshots (2/2)

Cognitivo - Tackling the enterprise data quality challenge

  • 1. Tackling the enterprise Data Quality challenge Cognitivo Consulting January 2020
  • 2. 2 COMPETING IN THE DIGITAL AGE In a connected world, competing effectively in the digital age means making the right decisions at pace
  • 3. 3 Machine learning algorithms rely on data to learn for themselves AI could potentially create $3.5 trillion to $5.8 trillion in annual value in the global economy Source: McKinsey global institute 2018 UNLOCKING THE VALUE OF AI
  • 4. 4 Leaders in the digital age are able to make strategic and operational decisions based on data, at scale THIS IS THE DATA-DRIVEN ORGANISATION DATA DRIVEN ORGANISATIONS
  • 5. 5 The quality of your decisions will be proportional to the quality of your data Data Quality is a foundational element of achieving digital success BUILDING ON STABLE FOUNDATIONS
  • 6. 6 DQ is a symptom of poor processes and systems, which requires coordination across the enterprise DATA QUALITY MUST SUPPORT PROCESS ASSURANCE AND IMPROVEMENT ACROSS THE ENTERPRISE DQ MUST BE COORDINATED ACROSS THE ENTERPRISE
  • 7. Enterprise Architecture aligned end-to-end DQ approach 7 A successful DQ initiative relies on alignment to existing enterprise and risk management frameworks and assets Information Architecture Business/ Process Architecture Integration Architecture Application & Infrastructure Architecture Risk based approach to identify key processes / use cases in-scope for DQ improvement Definition of customer journey’s and process value chains with customer & organisational outcomes defined 1 2 Definition of a business conceptual data model & business rules based on in-scope processes Agreement of definitions (decomposition of metrics and critical data elements), sources of truth, RACI (e.g. owners) 3 4 Document data lineage (data flows) between key systems for each in-scope process / use case 5 Catalogue systems and critical data sets and controls environment within an Information Assets Register or Source Catalogue 6 21 Impact Likelihood High Med Low Low Medium High Inherent Risk (‘gross’ risk) DQ Treatment Process improvement Risk Tolerance <<Party>> <<Item>> Owns, rents, buys, sells, leases Service Enters Provides, consumes Uses, maintains Creates conditions <<Classification>>Arrangement Type of <<Event>> Type of Location Has Occurs at Occurs at Involved in Triggers Consists of Creates <<Party>> <<Item>> Owns, rents, buys, sells, leases Service Enters Provides, consumes Uses, maintains Creates conditions <<Classification>>Arrangement Type of <<Event>> Type of Location Has Occurs at Occurs at Involved in Triggers Consists of Creates 3 Channels Web Mobile Broker Contract Centre Branch CRM Product Origination Fulfilment Risk / Capital Mgt Follow up Integration (Message/Stream + Batch) Servicing Credit Approval Settlement Payments Finance Cloud Data Asset KYC Sanctions Performance Channels Web Mobile Broker Contract Centre Branch CRM Product Origination Fulfilment Risk / Capital Mgt Follow up Integration (Message/Stream + Batch) Servicing Credit Approval Settlement 
Payments Finance Cloud Data Asset KYC Sanctions Performance 6 5 4 Customer Journey & Associated Business Value Chain Operational Risk Matrix Business Conceptual Model Business Metric Decomposition & Business Definitions Integration Landscape Data Lineage Information Asset Register Source Catalogue EnterpriseDataDecompositionTree Note: EcoProfit = NPAT - Cost of Equity ($) + IEL(CCA) + Imputation Credits = NPAT – Cost of Equity (%) x Eco Cap ($) ROE = NPAT / Book Equity, Book Equity = EcoCap = Total Reg Cap Credit Risk Capital Other Revenue IEL (CCA) X Eco Cap ($) Expenses Cost of Capital Rate (%) - NPAT - EL basis ($) Cost of Capital + Franking Credits – Tax Allocated Expenses Controllable Expenses mRWA Loss Data (ELD/ILD) Economic Profit ($) ROE (%) Tenor Customer Asset Class Credit RWA Capital Ratio X Market Risk Capital Reg EL Op Risk Capital Investment Stakes, Fixed Assets, Deferred Acquisition Exposure At Default (EAD) Provisions & Delinquencies Collective Provisions Retail Pooling & Segmentation Probability of Default Loss Given Default Loan Amount / Limit Product Features Individual Provisions On/Off Balance Sheet Revocability Industry / ANZSIC Salient Financials CCRDomicile Country SIHeld Collateral Pricing (Rates / Fees) Bank Capital *Set by Group Treasury oRWA Illustrative X Capital Buffer (Stress Test) TSR (Total Shareholder Return) EnterpriseDataDecompositionTree Note: EcoProfit = NPAT - Cost of Equity ($) + IEL(CCA) + Imputation Credits = NPAT – Cost of Equity (%) x Eco Cap ($) ROE = NPAT / Book Equity, Book Equity = EcoCap = Total Reg Cap Credit Risk Capital Other Revenue IEL (CCA) X Eco Cap ($) Expenses Cost of Capital Rate (%) - NPAT - EL basis ($) Cost of Capital + Franking Credits – Tax Allocated Expenses Controllable Expenses mRWA Loss Data (ELD/ILD) Economic Profit ($) ROE (%) Tenor Customer Asset Class Credit RWA Capital Ratio X Market Risk Capital Reg EL Op Risk Capital Investment Stakes, Fixed Assets, Deferred Acquisition Exposure At Default 
(EAD) Provisions & Delinquencies Collective Provisions Retail Pooling & Segmentation Probability of Default Loss Given Default Loan Amount / Limit Product Features Individual Provisions On/Off Balance Sheet Revocability Industry / ANZSIC Salient Financials CCRDomicile Country SIHeld Collateral Pricing (Rates / Fees) Bank Capital *Set by Group Treasury oRWA Illustrative X Capital Buffer (Stress Test) TSR (Total Shareholder Return)
  • 8. Principles of Cognitivo’s DQ approach 8 A pragmatic approach that doesn’t “boil-the-ocean” is required to focus on priority user cases while leveraging organisational assets and AI to scale Risk & Policy Based – Identify key processes that possess material data risk as prioritised areas to perform DQ diagnosis and treatment Process (use-case) Centric – Identify data flows that underpin key processes and address data quality across the entire system data flow Metadata Driven – Development or use of a conceptual data model as an abstraction layer to work with business stakeholders to agree definitions and business rules that is subsequently mapped to physical data models Analytics & ML Enabled – use of data science techniques (such as ML, text analytics, vision) to build industry and organisation specific data matching and data quality diagnosis techniques Embedded in Business-As-Usual – Roll out of DQ controls, measurement (dashboard) as part of the organisation’s quality assurance processes, rather than constructing new data KPI and consequence management framework
  • 9. Example DQ use cases to improve key business outcomes 9 Cognitivo has extensive experience in executing data quality programmes within Financial Services, Government and Accounting business domains Use Case • KYC / AML / CTF (Assurance of data feeds) • CPS220, AIRB Accreditation • APRA / ABS Regulatory Reporting (e.g. report on interest-only loans) • Basel III Liquidity (FI/non-FI review) • APS 120 Securitisation (Loan doc reconciliation) • FATCA, GATCA • OTC Reform, MiFID II (Cleanse LEI / SWIFT Code, Legal Form, Country of incorporation etc.) • APS910 – SCV assurance • IFRS9 / IFRS17 Assurance • Staff Benefits Review (Review of former employees still on staff benefits programmes) • Advice Compliance (SOA, PDS vs fees and charges review) • … Compliance Business Management • Payroll Assurance • Financial Management reporting (Line of Business) • Finance cube, business unit, GL structure review • … Customer / Sales • Customer Contact Details (marketing, product service) • Consent status • Customer Age Review • Customer Address Review (e.g. Suburb / postal code combination) • Customer segmentation review • CRM – customer structure review (e.g. customer legal structure, customer groupings) • ..
  • 10. DQ Execution Lifecycle
    Cognitivo’s DQ Execution Lifecycle is linked to broader data demand management and IT planning lifecycles.
    Lifecycle stages: A. Diagnosis; B. Profiling; C. Correction & Cleansing; D. Monitoring & Reporting
    Diagram labels: Data Risk; Demand Management; Process / System Improvement
    Diagnosis: conduct qualitative sizing, define requirements and business rules, and conduct root cause analysis.
    Profiling: size / quantify the magnitude of each DQ issue.
      • Profile key data elements for validity / completeness issues
      • Correlate data across systems to identify integrity issues; this can include techniques such as financial reconciliation (checksums)
      • Deploy an analytical process to find illogical combinations of data, outliers etc.
      • Higher complexity techniques such as text analytics and computer vision to correlate with unstructured data sources
      • Machine learning approaches to identify patterns for acceptable values / ranges
    Holistic view of DQ issues & prioritisation
      • Organisation-wide DQ issues register with a self-assessment process to periodically assess the level of DQ risk
      • DQ deep-dives through workshops / interviews for high risk areas
      • Prioritise high impact and high occurrence issues to go into the ‘fix process’
    Correction process
      • Obtain correct values, followed by cleansing / system update
      • Automated through cross-system and 3rd party data source lookup or derivation
      • As a final step, client outreach may be required (e.g. establish a call-centre process)
    Cleansing process
      • Establish a process for bulk update, testing and roll back within core systems
      • For systems where bulk update is not possible, develop RPA and manual update capabilities
    Systemic Fix
      • Make recommendations for systemic fixes through the organisation’s broader change / fail-fix agenda
    Monitoring & Reporting
      • All DQ issues profiled / fixed trace back to a business unit, so DQ metrics can form process / compliance KPIs for business owners
      • DQ scorecards can automate existing QA processes / operational risk controls by quantifying instances where data entry is missing / incorrect
      • Trend analysis on DQ results for each responsible business unit
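The financial reconciliation (checksum) technique listed under Profiling can be sketched roughly as follows. This is a minimal illustration, not Cognitivo's implementation: the function name `reconcile`, the field names and the sample loan records are all hypothetical, and it assumes both extracts are already loaded as lists of dictionaries.

```python
from decimal import Decimal

def reconcile(source_rows, target_rows, amount_field):
    """Compare record counts and control totals between two extracts.

    Hypothetical helper: amounts are parsed as Decimal to avoid
    floating-point rounding when summing currency values.
    """
    source_total = sum((Decimal(str(r[amount_field])) for r in source_rows), Decimal("0"))
    target_total = sum((Decimal(str(r[amount_field])) for r in target_rows), Decimal("0"))
    count_difference = len(source_rows) - len(target_rows)
    total_difference = source_total - target_total
    return {
        "count_difference": count_difference,
        "total_difference": total_difference,
        "reconciled": count_difference == 0 and total_difference == 0,
    }

# Illustrative data only: the same two loans as seen in two systems.
core_banking = [
    {"loan_id": "L1", "balance": "250000.00"},
    {"loan_id": "L2", "balance": "410500.50"},
]
warehouse = [
    {"loan_id": "L1", "balance": "250000.00"},
    {"loan_id": "L2", "balance": "410500.05"},
]

result = reconcile(core_banking, warehouse, "balance")
print(result)
```

A non-zero `total_difference` with a zero `count_difference`, as here, suggests a value-level discrepancy rather than missing records, which narrows the follow-up profiling.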
  • 11. DQ Discovery: Scalable Data Quality DevOps
    Cognitivo’s data quality workflow incorporates analytical tools, business testing and deployment into a DQ DevOps process.
    Workflow diagram labels: De-duplicate customers within systems (collapse entities); Customer Outreach (high-priority/risk); Cross system & 3rd party lookup; Document / correspondence lookup; Customer self service; Client Applications; Correct (obtain correct values & validate); Issue and Workflow Management; Manual testing values to update via risk-based sampling; Database Bulk Update / Amend; Front-End Data Entry (inc. use of RPA, robotic process automation); Cleanse (System Update); Source Systems & Processes; Profiling; Monitoring & Reporting; Business Unit QA to include DQ measures; Customer Matching & Analytical Environment; Issue prioritisation (based on risk, i.e. likelihood and consequence); Results Dashboard; Raise new DQ rules; To be updated; DQ Rules Engine; DQ Development Workspace; Ingestion; DQ Production Deployment; Business / Quality Assurance; Root Cause Analysis; Process & System Fix; Operational Environment
    Legend: Automated processes; Manual processes
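Issue prioritisation based on likelihood and consequence could be sketched as below. The scoring scheme (a simple product of two three-level ratings) is an assumption made for illustration, as are the names `LEVELS`, `risk_score`, `prioritise` and the sample issues; the slide does not specify how the two dimensions are combined.

```python
# Assumed three-level scale; the deck does not define numeric weights.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

def risk_score(issue):
    """Score an issue as likelihood x consequence (illustrative scheme)."""
    return LEVELS[issue["likelihood"]] * LEVELS[issue["consequence"]]

def prioritise(issues):
    """Return issues ordered from highest to lowest risk score."""
    return sorted(issues, key=risk_score, reverse=True)

# Hypothetical entries from a DQ issues register.
issues = [
    {"id": "DQ-001", "description": "Missing date of birth",
     "likelihood": "High", "consequence": "Medium"},
    {"id": "DQ-002", "description": "Stale postal address",
     "likelihood": "Low", "consequence": "High"},
    {"id": "DQ-003", "description": "Duplicate customer records",
     "likelihood": "High", "consequence": "High"},
]

ranked = prioritise(issues)
for issue in ranked:
    print(issue["id"], risk_score(issue))
```

Because `sorted` is stable, issues with equal scores keep their register order, which keeps the ranking reproducible between runs.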
  • 12. DQ Workspace & Platform Architecture
    Cognitivo has a vendor-agnostic DQ technical reference architecture that can be implemented on any cloud or on-premise environment.
    To be updated
    Sources: Client Source Systems; Client Data Warehouse(s); Document Repository; New Extracts for checksums etc.
    Architecture components:
      • Batch Pipeline: batch data ingestion using file (CSV) or ODBC/JDBC
      • Real Time Pipeline (APIs / Messaging): real-time integration
      • Self-service data Ingestion: data import tools for un-managed datasets (used for discovery purposes)
      • Source Data Layer: raw source system data to derive row counts and perform validity checks
      • Linked data (lightly integrated): cross-system table linking to correlate values across matched customers
      • Semantic / Conformed Data (with history): conformed / derived values to expedite DQ rule execution and provide a history of values for outlier / drift detection
      • DQ Profile Result Store: stores DQ profiling output results; contains historical values to allow historical trend analysis of DQ
      • Rules Store: DQ rules based on the derived semantic data model, stored in JSON format within the rules store
      • DQ Rule Execution (Python)
      • Scheduler: executes DQ rules on a periodic basis
      • DQ Policy & Rules Configuration: management of DQ rules, tolerances and business owners of DQ events
      • Case Management: case management tool for logging, investigating and remediating DQ issues
      • Results Dashboard (PowerBI / Qlik): DQ dashboards for consumption by data stewards and business stakeholders (data owners)
      • Data Workspace Provisioning: provision of persistent temporary storage and access to access-controlled data sets (specific to department, user and use cases)
      • Data Science Tools / Workbench: execution of analytical workloads
      • Data Science Development & Collaboration Tools, e.g. Git, Jupyter
      • Text Extraction & OCR: text extraction and analytics libraries, e.g. Tesseract
    Other diagram labels: Data Pipeline Ingestion; User Interface; Data Lake; Data Science / ML Discovery Environment; DQ Rules Engine; Key Selected Technologies
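The slide describes DQ rules stored in JSON format and executed in Python. A minimal sketch of that arrangement follows; the JSON schema (`rule_id`, `field`, `type`, `pattern`, `values`), the three rule types and the function names are assumptions made for illustration and are not taken from Cognitivo's rules store.

```python
import json
import re

# Hypothetical rule definitions, as they might be held in a JSON rules store.
RULES_JSON = """
[
  {"rule_id": "R1", "field": "email", "type": "not_null"},
  {"rule_id": "R2", "field": "postcode", "type": "regex", "pattern": "^[0-9]{4}$"},
  {"rule_id": "R3", "field": "state", "type": "allowed_values",
   "values": ["NSW", "VIC", "QLD"]}
]
"""

def passes(rule, value):
    """Return True if a single value satisfies a single rule."""
    if rule["type"] == "not_null":
        return value is not None and str(value).strip() != ""
    if rule["type"] == "regex":
        return value is not None and re.fullmatch(rule["pattern"], str(value)) is not None
    if rule["type"] == "allowed_values":
        return value in rule["values"]
    raise ValueError(f"Unknown rule type: {rule['type']}")

def run_rules(rules, records):
    """Return (record index, rule_id) pairs for every failed check."""
    failures = []
    for index, record in enumerate(records):
        for rule in rules:
            if not passes(rule, record.get(rule["field"])):
                failures.append((index, rule["rule_id"]))
    return failures

rules = json.loads(RULES_JSON)
records = [
    {"email": "a@example.com", "postcode": "2000", "state": "NSW"},
    {"email": "", "postcode": "20A0", "state": "NSW"},
    {"email": "b@example.com", "postcode": "3000", "state": "XX"},
]

failures = run_rules(rules, records)
print(failures)
```

Keeping rules as data rather than code means a scheduler can re-run the same engine as rules and tolerances change, without redeploying the execution logic.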
  • 13. DQ profiling techniques to be employed
    Cognitivo’s analytical DQ framework deploys a number of analytical tests across structured and unstructured data sources.
    Test for Completeness
      • Record count anomalies
      • Financial Reconciliation (check-sums)
    Test for Validity
      • Data Type & Format checks (Regex pattern match)
      • Allowable values
      • Reference data lookup
      • Null Value check
    Test for Accuracy
      • Illogical combinations of multiple data fields (e.g. individual with a business name)
      • Single Field based logic check (e.g. age > 100)
      • 3rd party cross reference
      • Cross system value cross-reference
      • Reasonable value check (record anomaly / outlier, value drift over time)
    Test for Timeliness
      • Data Ingestion (ETL/ELT) Synchronisation review
    Document Text Extraction & Cross Reference
    Computer Vision Image recognition & object classification
    Test for Uniqueness
      • Duplicates within systems
      • Cross-system master data reconciliation
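A few of the accuracy, uniqueness and completeness tests above can be sketched in plain Python. The function names, record layout, the 50% tolerance and the use of the median as a baseline are assumptions chosen for illustration; they are not described in the deck.

```python
from collections import Counter
from statistics import median

def illogical_combination(record):
    """Accuracy: flag an individual that also carries a business name."""
    return record.get("customer_type") == "individual" and bool(record.get("business_name"))

def implausible_age(record, max_age=100):
    """Accuracy: single-field logic check (e.g. age > 100)."""
    age = record.get("age")
    return age is not None and age > max_age

def duplicate_keys(records, key_fields):
    """Uniqueness: return key tuples that occur more than once."""
    counts = Counter(tuple(r.get(f) for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]

def count_anomaly(history, latest, tolerance=0.5):
    """Completeness: flag a record count far from the historical median.

    The tolerance is a fraction of the median and is an assumed threshold.
    """
    baseline = median(history)
    return abs(latest - baseline) > tolerance * baseline

# Illustrative records only.
records = [
    {"name": "Jane Citizen", "dob": "1980-01-01", "customer_type": "individual",
     "business_name": None, "age": 40},
    {"name": "Jane Citizen", "dob": "1980-01-01", "customer_type": "individual",
     "business_name": "JC Pty Ltd", "age": 40},
    {"name": "Sam Sample", "dob": "1900-05-05", "customer_type": "individual",
     "business_name": None, "age": 119},
]

illogical = [i for i, r in enumerate(records) if illogical_combination(r)]
too_old = [i for i, r in enumerate(records) if implausible_age(r)]
duplicates = duplicate_keys(records, ["name", "dob"])
print(illogical, too_old, duplicates)
print(count_anomaly([1000, 1020, 980, 1010], 400))
```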
  • 14. Cognitivo’s DQ Platform Capabilities
    Cognitivo has a DQ application framework that can be deployed onto private clouds via containers or accessed as a SaaS offering.
    Data Steward Portal (UX)
      • Create profiling rules
      • Diagnose DQ issues through reports and dashboards
      • Workflow to approve data changes and case manage remediation
      • APIs to integrate with 3rd party applications and check valid data entry based on data quality rules
    Core DQ Engine
      • Semantic model of parameters for data stewards to create DQ rules
      • DQ rule templates (e.g. regex functions, address validity, ABN format etc.)
      • Analytical engine to run complex data accuracy / integrity rules
      • API to allow 3rd party and customer automation, extension and access to DQ results
    Data Pipeline
      • Securely connect on-prem data sources to cloud environments in an encrypted manner (Gateway)
      • Database to store multiple time-stamped sampled extracts from source systems
      • Efficient data ingestion pipeline with connectors for key council systems (e.g. Dynamics, ..)
    Embedding DQ processes
      • Build continuous improvement initiatives within directorates based on DQ analysis (e.g. asking additional questions when customers call/visit)
      • Set DQ KPIs within process metrics (e.g. accuracy of mandatory data capture)
    Diagram labels: Case Management; DQ Analytical Engine; Investigation (Jupyter); Reporting Dashboard; Customer Data Sources; DQ Profiling Datastore; Data Stewards (Users); User Interface; Data Quality Hub; Scheduler; DQ Profiler; Connectors; DQ Managed Parameters (semantic model); Gateway; Mobile App; API; DQ Rules Library; Web Interface
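As an illustration of what an "ABN format" rule template might check, the sketch below validates the 11-digit shape of an Australian Business Number together with its standard modulus-89 check digits. Whether the platform's template includes the checksum is an assumption; the function name `is_valid_abn` is hypothetical.

```python
import re

# Weighting factors for the 11 ABN digits (standard ABN check-digit scheme).
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)

def is_valid_abn(value):
    """Return True if value is 11 digits and passes the modulus-89 check."""
    digits = re.sub(r"\s", "", value)          # ignore spacing such as "51 824 ..."
    if not re.fullmatch(r"[0-9]{11}", digits):
        return False
    numbers = [int(d) for d in digits]
    numbers[0] -= 1                            # subtract 1 from the first digit
    total = sum(w * n for w, n in zip(ABN_WEIGHTS, numbers))
    return total % 89 == 0

print(is_valid_abn("51 824 753 556"))  # satisfies the checksum
print(is_valid_abn("51 824 753 557"))  # last digit altered
print(is_valid_abn("12345"))           # wrong length
```

A template of this kind catches keying errors at data entry, which a pure regex length check would let through.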
  • 15. Cognitivo DQ Platform Screenshots (1/2)
  • 16. Cognitivo DQ Platform Screenshots (2/2)