Fern Halper, Ph.D.
VP and Senior Director, TDWI
Advanced Analytics
March 2, 2022
You Need a Data Catalog. Do You Know Why?
Copyright © 2022 TDWI
FERN HALPER
VP, Senior Research Director for
Advanced Analytics
TDWI
What we will talk about today
• Data management challenges and priorities
• The modern data catalog – what it is and why it is important
• The role of the modern data catalog in your data quality and governance programs
• The kinds of information that should be in your data catalog and why
Organizations are collecting diverse data for use in analytics
[Chart: What kind of data is your organization managing now? Looking to manage in the next year? Horizontal bars from 0% to 100%, with series "Manage now" and "Manage in next year", for: video data; audio data; clickstream data; image still data; machine generated data (e.g., from sensors, devices); real-time event streams; external text data; geospatial data; semi-structured data; internal text data; time series data; log data; demographic data; transactional data; structured data.]
And, they are running into challenges, especially around governance and trust
• “It’s a mess”
• “Finding the right data”
• “Data variety and volumes”
• “Metadata reconciliation, participation by stakeholders in data governance”
• “Lack of built-in metadata within new Data Lakehouse platform relating to business and technical metadata around individual files/tables/elements/columns.”
• “Dispersed data environment and lack of governance”
• “So many disparate systems and sources of truth”
• “Misunderstanding of the data”
• “Understanding it. Do we have the SME's, or data lineage or metadata to really know what's there? So end users can leverage it, so technical professionals can support it, so QA analysts can test it.”
Organizations want to
see improvements in
data environment
Their priorities reflect
the challenges
Data Catalog: a central repository that contains
metadata for describing data sets, how they are
defined, and where to find them.
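As a rough illustration of that definition, the sketch below models a catalog as a repository that holds only metadata: what each data set is called, how it is defined, and where to find it. The class names, field names, and sample locations are assumptions made up for this example, not part of any catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one data set (all field names are illustrative)."""
    name: str            # what the data set is called
    definition: str      # how the data set is defined
    location: str        # where to find it
    tags: list = field(default_factory=list)


class DataCatalog:
    """A toy in-memory catalog: it stores metadata, never the data itself."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def search(self, term):
        """Return entries whose name, definition, or tags mention the term."""
        needle = term.lower()
        return [
            e for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.definition.lower()
            or any(needle in t.lower() for t in e.tags)
        ]


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="sales_orders",
    definition="One row per confirmed customer order",
    location="warehouse.sales.orders",   # hypothetical location
    tags=["transactional", "structured"],
))
catalog.register(CatalogEntry(
    name="web_clickstream",
    definition="Raw page-view events from the public website",
    location="lake/raw/clickstream/",    # hypothetical location
    tags=["clickstream", "semi-structured"],
))

hits = catalog.search("order")
print([e.location for e in hits])
```

The search works across names, definitions, and tags, which is the minimum a search-based interface would need; a real catalog would add ranking, access control, and persistence.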
Modern enterprise data catalogs offer new features
AI infused
• Search-based interface
– For finding data
– Natural-language (NL) query for basic stats questions
• Automation: metadata extraction, tagging, cleansing, classification, access, syntax, lineage, etc.
Collaboration/crowdsourcing to provide rating and certification
Data Governance Principles
• Data awareness: discovery, classification, treatment
• Accountability and responsibility: organizational structure, data owners, data stewards
• Data quality: completeness, accuracy, integrity, consistency
• Data audit and compliance: data usage, data storage, data transfer, data disposal, data compliance
• Data protection: access controls, other controls, encryption, masking
Data governance and the data catalog
• Provides an inventory of data to be controlled
• Acts as a collaboration hub for governance
• Provides information needed for governance (automated)
– Metadata extraction
– Data quality
– Sensitive data
– Data lineage
• Provides quality metrics
• Can integrate with other governance tools
• Helps respond faster to governance regulations and audits
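To make "metadata extraction" and "quality metrics" concrete, here is a minimal sketch of the kind of automated profiling a catalog might run over a sample of records. The function name, the completeness formula (share of rows with a non-empty value), and the sample rows are assumptions for illustration only.

```python
def profile(rows):
    """Extract simple technical metadata and a completeness metric per column.

    rows is a list of dicts; a value counts as missing if it is None or "".
    """
    if not rows:
        raise ValueError("cannot profile an empty sample")
    columns = sorted({key for row in rows for key in row})
    result = {}
    for col in columns:
        present = [row[col] for row in rows if row.get(col) not in (None, "")]
        result[col] = {
            # observed Python types stand in for inferred column types
            "types": sorted({type(v).__name__ for v in present}),
            # completeness: fraction of rows with a usable value
            "completeness": len(present) / len(rows),
        }
    return result


sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": None},
    {"id": 4, "email": "d@example.com"},
]
stats = profile(sample)
print(stats["email"]["completeness"])
```

Publishing numbers like these next to each catalog entry is one way a catalog can supply quality metrics to a governance program without anyone compiling them by hand.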
Areas where data catalog can help in governance
• Data awareness: discovery, classification, treatment
• Accountability and responsibility: organizational structure, data owners, data stewards
• Data quality: completeness, accuracy, integrity, consistency
• Data audit and compliance: data usage, data storage, data transfer, data disposal, data compliance
• Data protection: access controls, other controls, encryption, masking
THANK YOU!
SHAUN CONNOLLY
Strategy Principal
Precisely
CHRISTOPHER REED
Sales Engineering Manager
Precisely
Data catalog drivers
Today’s challenges
• We stood up a data lake and it’s too much. Now what?
• Profusion of valued analytics, driving the need for shareable data
• Regulatory compliance
• You’ve been told you need a data catalog
• My spreadsheet and email repositories are not keeping up
Data catalogs and data governance
[Diagram: catalogs by analogy. Internet: webpages. Marketplace: appliances, cosmetics, groceries. Library: books, eBooks, magazines. Data: business and technical assets, including files (positional fixed-length record, delimited, XML, JSON, unstructured, AVRO, PARQUET, ORC, MS Excel), databases & data lakes, applications, reports, analytics, ETLs and technologies, and cloud.]
A data catalog enables business-ready data
Data that is trusted
Data that is ready to deliver outcomes
Data that is easy to find and understand
• Business objectives framework
• Enterprise governance & ownership
• Metrics & scoring
• Balancing, reconciliation & controls
• Stewardship & workflow
• Case management
• Data catalog & smart glossary
• Data lineage & impact analysis
• Data acquisition & analysis
What are the components of a “modern” data catalog?
• Contains extensive information about your data
• Has business context
• Intuitive user interface
• Turnkey automation, administration, and integration
• Promotes governance and stewardship
What is governance?
Governance is often influenced by perspective
Governance activities
• In a well-managed data environment, data governance is well integrated with the business strategy
• Modern data governance does this proactively and passively
• Best-in-class solutions incorporate AI and ML to facilitate recommendation and automation engines
[Diagram: business strategy, data strategy, data governance, and data management / operations, connected by these labels:]
Strategy drives actions
The data “W”s drive awareness (metadata)
How well the data strategy is working (metric metadata)
Identifies data that is important to strategy
Impact of data strategy on KPIs
Explicit alignment to business goals & objectives
Business alignment & impact metrics
Data governance is the keystone for your data catalog
Governance + Catalog = Outcome
• Governance and catalog elements: business objectives, measures & metrics, processes & stages, people, data
• Outcomes: reporting & compliance, analytics & insights, operational excellence
Governance focuses on critical data
• All available data: 100% of data
• Data we use: 40% of data
• Data we should govern: 10% of data
• Data of high value (CRITICAL DATA): 100-200 data elements
Data and metadata: selection of data at the system and source level (tables and fields)
Information: as required to develop a common language for important data
Business process excellence: to monitor the effectiveness of our process design and execution
Business goals & objectives: focus on critical data elements required to support value drivers and key initiatives
The evolution has been towards greater reliance on rich metadata, as it captures the data’s findability, usability and appropriateness within a particular context. This implies well-cataloged data.
Death by Excel!
Getting off Excel
Data has too many relationships to be handled in spreadsheets
Answers need to be a few clicks away – not a few pivot tables away!
Compliance
Catalogs for compliance…
If it is not cataloged – it is not governed!
Contained in catalog:
1. How and where used
2. Data is labelled to show where it is in the lifecycle
3. Data is linked to a data package
4. Data has security classification
5. Personal information “type” label flags this data as falling under privacy regulations
Do we know enough about the data to know it is being managed correctly?
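The five items above can be read as a checklist that each catalog record either satisfies or does not. The sketch below is one hypothetical way to encode that checklist; the field names, gap messages, and sample data sets are assumptions, and a real catalog would have its own schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComplianceRecord:
    """The five catalog items from the slide; field names are illustrative."""
    used_in: list                              # 1. how and where used
    lifecycle_stage: Optional[str]             # 2. lifecycle label
    data_package: Optional[str]                # 3. linked data package
    security_classification: Optional[str]     # 4. security classification
    personal_info_type: Optional[str] = None   # 5. set only for personal data


def governance_gaps(dataset, catalog):
    """List what is missing before the data set can be called governed."""
    record = catalog.get(dataset)
    if record is None:
        return ["not cataloged"]   # not cataloged means not governed
    gaps = []
    if not record.used_in:
        gaps.append("usage unknown")
    if record.lifecycle_stage is None:
        gaps.append("no lifecycle label")
    if record.data_package is None:
        gaps.append("no data package")
    if record.security_classification is None:
        gaps.append("no security classification")
    return gaps


def under_privacy_regulation(record):
    """A personal information type label flags the data for privacy rules."""
    return record.personal_info_type is not None


compliance_catalog = {
    "customer_master": ComplianceRecord(
        used_in=["billing report"], lifecycle_stage="active",
        data_package="customer-360", security_classification="confidential",
        personal_info_type="contact details"),
    "legacy_extract": ComplianceRecord(
        used_in=[], lifecycle_stage=None, data_package=None,
        security_classification="internal"),
}

for name in ("customer_master", "legacy_extract", "shadow_spreadsheet"):
    print(name, governance_gaps(name, compliance_catalog))
```

An empty gap list is the catalog's answer to the closing question: enough is known about the data to judge whether it is being managed correctly.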
Audit control model has transparency and accountability
Catalog contains the audit “view” of data:
A. Task owner
B. Task detail
C. The data required
D. The standard that guides the execution
E. The control rule that enforces the standard
F. The metric that measures compliance
Accountability is defined for each control point…
… which provides data level accountability
[Diagram: Task owner (A), Tasks (B), Data (C), Standard (D), Control rule (E), Metric (F)]
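One way to picture a control point is as a single record that carries all six elements, A through F, with the metric computed from the control rule. This is a minimal sketch under that assumption; the class, the sample standard, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ControlPoint:
    task_owner: str                        # A. who is accountable
    task_detail: str                       # B. what the task is
    data_required: str                     # C. the data the task needs
    standard: str                          # D. the standard guiding execution
    control_rule: Callable[[dict], bool]   # E. the rule enforcing the standard

    def compliance_rate(self, rows):
        """F. the metric: share of rows that pass the control rule."""
        if not rows:
            raise ValueError("no rows to measure")
        passed = sum(1 for row in rows if self.control_rule(row))
        return passed / len(rows)


control = ControlPoint(
    task_owner="Customer data steward",
    task_detail="Review new customer records weekly",
    data_required="customer_master",
    standard="Every customer record carries a country code",
    control_rule=lambda row: bool(row.get("country")),
)

customers = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "DE"},
    {"id": 3, "country": ""},
    {"id": 4, "country": "FR"},
]
print(control.task_owner, control.compliance_rate(customers))
```

Because the owner, the rule, and the metric live on the same record, an auditor can trace any compliance number back to a named person and a stated standard.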
Optimizing operations
Data exchange with external manufacturing
[Diagram: systems and data flows across the supply chain lifecycle (MDM, Finance, Source, Plan, Make, Quality, Deliver) between the company, the external manufacturer, and the exchange. Systems shown include SAP P01, SAP P02, JDE 7.3, PC4, MBox, PIP, Prisym 360, MDS, DocuSphere, ATP, Sterling, NRP, Excel, LIDO/PAGO, Neptune, SAP GRC, BODS, and SOLMAN. Flows shown include SMI purchase orders, basic data and classification, data load scripts, A/P check details, inventory, open purchase orders, sales orders, advanced shipping notifications, delivery notes, document print requests, label data, material, serial code and batch data, production orders, lot master data, and picking, manufacturing logistics transactions.]
Data exchange with external manufacturing
[Diagram: the systems and data flows across the supply chain lifecycle (MDM, Finance, Source, Plan, Make, Quality, Deliver), overlaid with the interface labels: set up, order request, shipping notice, master data, reference data, transaction data, billing notice, shipping detail.]
Cataloging captures issue PRIOR to production
Data quality rule: the shipping notice cannot contain any master data or reference data that was not contained in the set-up file
Objects contain data
Interface has 3 objects
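The data quality rule above can be sketched as a check that runs before an interface goes live. The example assumes the set-up file has been reduced to a set of allowed values per master or reference field; the field names ("material", "unit") and values are invented for illustration.

```python
def unknown_values(setup, shipping_notice):
    """Find master/reference values in a shipping notice that the set-up
    file never declared.

    setup maps a field name to the set of values from the set-up file.
    shipping_notice is a list of line dicts.
    Returns (line number, field, value) for every violation.
    """
    violations = []
    for line_no, line in enumerate(shipping_notice, start=1):
        for field_name, allowed in setup.items():
            value = line.get(field_name)
            if value is not None and value not in allowed:
                violations.append((line_no, field_name, value))
    return violations


# Hypothetical set-up file contents: declared materials and units of measure
setup_file = {
    "material": {"M-100", "M-200"},
    "unit": {"EA", "BOX"},
}

notice = [
    {"material": "M-100", "unit": "EA", "quantity": "40"},
    {"material": "M-999", "unit": "EA", "quantity": "5"},    # unknown material
    {"material": "M-200", "unit": "PAL", "quantity": "2"},   # unknown unit
]

issues = unknown_values(setup_file, notice)
print(issues)
```

Transaction fields such as quantity are deliberately ignored: the rule constrains only master and reference data, so only fields declared in the set-up file are checked.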
In closing… it is not just about the data!
How do you manage everything that surrounds the data? (metadata)
• Technical metadata: platforms; applications; tables; relationships
• Control metadata: rules; standards; ownership; RACI
• Semantic metadata / meaning & context: hierarchies; classification; allowed values…
Let’s continue the conversation…
Contact us
Set up a 30-minute personalized demo
Precisely.com/contact
www.precisely.com
Demos
White Papers
Case Studies
QUESTIONS?
tdwi.org
CONTACT INFORMATION
If you have further questions or comments:
Fern Halper, TDWI: fhalper@tdwi.org, @fhalper
Christopher Reed: christopher.reed@precisely.com
Shaun Connolly: shaun.connolly@precisely.com
tdwi.org
