© Hortonworks Inc. 2011 – 2016. All Rights Reserved
What the #$* is a Business Catalog and Why You Need It!
June 28, 2016
Apache Atlas
Disclaimer
This document may contain product features and technology directions that are under development,
may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software
Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from
inception to release through Apache; however, technical feasibility, market demand, user feedback and
the overarching Apache Software Foundation community development process can all affect timing
and final delivery.
This document’s description of these features and technology directions does not represent a
contractual commitment, promise or obligation from Hortonworks to deliver these features in any
generally available product.
Product features and technology directions are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not
rely upon it when making purchasing decisions.
The Problem
• Low confidence in data – Fragmentation of metadata
across the enterprise
• Duplicate or MIA – Incorrect or missing classification
• Rigid governance – Traditional MDM tools are not
agile and cannot keep up with the rate of data change
Atlas Solution
• Cross-component lineage: Dynamically capture dataset
lineage
• Single source: Combine and centralize information about
your data
• Dynamic access control: Integration with Ranger
• Taxonomy (Business Catalog!): Common business
language. Hierarchically organized – no dupes!
What is the Atlas Business Catalog?
Organize data assets along business terms:
• Authoritative: Hierarchical taxonomy creation
• Agile modeling: Model conceptual, logical, and
physical assets
• Definition and assignment of tags like PII
(Personally Identifiable Information)
Comprehensive features for compliance:
• Multiple user profiles, including Data Steward
and Business Analyst
• Object auditing to track “Who did it?”
• Metadata versioning to track “What did they
do?”
Key Benefits:
• Organize data assets along business terms
• Impact analysis, compliance, acceptable use
• Faster insight
Taxonomies (catalog) enable:
• Search / Discovery – Business catalog of
conceptual, logical and physical assets
• Security – Dynamic, metadata-based access
control
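Metadata-based access control can be sketched as follows. This is a minimal illustration only, not Ranger's or Atlas's API: the policy table, group names, asset names, and the `can_read` helper are all hypothetical. The point it shows is that the policy is written against a tag rather than against individual assets, so tagging an asset changes who can read it without editing any policy.

```python
# Hypothetical tag-based policies: tag name -> groups allowed to read tagged assets.
POLICIES = {"PII": {"hr_stewards"}}

# Hypothetical asset metadata, as a catalog might hold it: asset -> set of tags.
ASSET_TAGS = {
    "employee_salaries": {"PII"},
    "web_clickstream": set(),
}


def can_read(user_groups, asset):
    """Deny access if any tag on the asset has a policy the user's groups do not satisfy."""
    for tag in ASSET_TAGS.get(asset, set()):
        allowed = POLICIES.get(tag)
        if allowed is not None and not (allowed & set(user_groups)):
            return False
    return True


print(can_read({"analysts"}, "employee_salaries"))     # analysts are not in an allowed group
print(can_read({"hr_stewards"}, "employee_salaries"))  # stewards are allowed by the PII policy
print(can_read({"analysts"}, "web_clickstream"))       # untagged asset, no policy applies

# Tagging the asset later changes the decision without touching POLICIES.
ASSET_TAGS["web_clickstream"].add("PII")
print(can_read({"analysts"}, "web_clickstream"))
```

In this sketch an untagged asset is readable by default; a real deployment would decide that default explicitly.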
Research: Focused on Hadoop
We conduct open-ended user interviews so that we can learn more
about who our users are and what their needs are. This helps us
validate whether or not we’re solving the right problem.
Usability Testing
We test our prototype in InVision, a click-through prototyping tool
that allows users to interact with static mockups.
Principal Roles & Activities
• Data Steward – Curator, responsible
for catalog veracity
• Data Scientist – Analyst, primary
consumer of Business Catalog
• Administrator – Role management only
• Data Engineer – Data ingress and
egress, semantic data quality
Callouts:
• 50% – 80%+ of time spent looking for data
• Profit center
• Primary user of Atlas
• Enables the scientist
Goal: < 25% of time spent on finding data, empowering scientists
to spend their time uncovering insights faster
Key Concepts
Business Taxonomy (Catalog)
The practice and science of classification of things or
concepts, including the principles that underlie such
classification. The business organization model is
hierarchical, making it authoritative with no duplication.
Data Lineage (Provenance)
Data lineage is defined as a data life cycle that includes the
data's origins and where it moves over time. It describes what
happens to data as it goes through diverse processes. It helps
provide visibility into the analytics pipeline and simplifies
tracing errors back to their sources.
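Tracing an error back to its sources amounts to walking a lineage graph upstream. The sketch below is illustrative only: the dataset names and the `LINEAGE` mapping are invented, and it does not use the Atlas API. It assumes lineage is available as a mapping from each dataset to the datasets it was derived from.

```python
from collections import deque

# Hypothetical lineage edges: dataset -> datasets it was derived from.
LINEAGE = {
    "sales_report": ["sales_clean"],
    "sales_clean": ["sales_raw", "customer_dim"],
    "customer_dim": ["crm_export"],
    "sales_raw": [],
    "crm_export": [],
}


def upstream_sources(dataset, lineage):
    """Return every dataset that `dataset` depends on, nearest first (breadth-first)."""
    seen = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


def origins(dataset, lineage):
    """Return the upstream datasets that have no parents of their own."""
    return [d for d in upstream_sources(dataset, lineage) if not lineage.get(d)]


print(upstream_sources("sales_report", LINEAGE))
print(origins("sales_report", LINEAGE))
```

If a figure in `sales_report` looks wrong, the first call lists every dataset worth inspecting and the second narrows that to the original inputs.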
Tags: Traits vs. Labels vs. Business Taxonomy
Atlas has tags that are authoritative and prevent duplication.
A tag can span different parts of the business taxonomy. For example,
the tag PII can be used in HR as well as in Finance or Sales.
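The difference between a taxonomy term and a tag can be sketched in a few lines. This is a toy model under stated assumptions, not Atlas's data model: the `Taxonomy` class, the dotted term paths, and the asset names are hypothetical. Each asset sits under exactly one term path, duplicate terms are rejected, and a tag such as PII cuts across branches.

```python
class Taxonomy:
    """A toy hierarchical catalog in which every term has exactly one dotted path."""

    def __init__(self):
        self.terms = set()   # dotted term paths such as "HR.Payroll"
        self.term_of = {}    # asset name -> term path
        self.tags_of = {}    # asset name -> set of tag names

    def add_term(self, path):
        """Register a term; reject duplicates and terms whose parent is missing."""
        if path in self.terms:
            raise ValueError(f"duplicate term: {path}")
        parent = path.rpartition(".")[0]
        if parent and parent not in self.terms:
            raise ValueError(f"unknown parent term: {parent}")
        self.terms.add(path)

    def classify(self, asset, path):
        """Place an asset under a single business term."""
        if path not in self.terms:
            raise ValueError(f"unknown term: {path}")
        self.term_of[asset] = path

    def tag(self, asset, tag):
        """Attach a tag; the same tag may be used in any branch."""
        self.tags_of.setdefault(asset, set()).add(tag)

    def assets_with_tag(self, tag):
        return sorted(a for a, tags in self.tags_of.items() if tag in tags)


catalog = Taxonomy()
for term in ("HR", "HR.Payroll", "Finance", "Sales"):
    catalog.add_term(term)

catalog.classify("employee_salaries", "HR.Payroll")
catalog.classify("invoices", "Finance")
catalog.classify("leads", "Sales")

# The PII tag spans HR, Finance, and Sales.
for asset in ("employee_salaries", "invoices", "leads"):
    catalog.tag(asset, "PII")

print(catalog.assets_with_tag("PII"))
```

Calling `catalog.add_term("HR")` a second time raises `ValueError`, which is the "no duplication" property in miniature.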
Benefits:
• A view of data assets organized by business language
• Impact analysis, compliance, acceptable use
• Common tags through Hadoop components
Walk Through
• User Setup Atlas via Ranger
• Create & Browse Taxonomy of Business Terms
• Create & Browse Tags
• Search for Assets
• Classify Assets with Business Terms
• Associate Assets with Tags
Summer GA
Atlas Value
• Designed for Hadoop at the platform level, not the application level
• High-confidence data in Hadoop for regulated verticals
• Compliance and business objectives aligned to data
organization
• Faster discovery for analysts – reduce time to value
• Agile and adaptable – native connectors keep
information current
• Dynamic protection with Ranger through simple, audited policies
In Flight:
Feature patches being reviewed & committed
• Object Versioning UX – Current state of object: active or
deleted
• Comment Tab – Users can add comments for
collaboration
• DQ / Profile Notes Tab – Populated by third parties or by
the Steward via the UI
Additional Atlas Sessions
• Top 3 Big Data Governance Issues:
Tuesday 4:10PM @ Room 212
• Extend Governance in Hadoop with the Atlas
Ecosystem: integrations with partners Waterline,
Trifacta and Attivo:
Thursday 4:10PM @ Room 210A
Learn More:
• Hortonworks links: http://hortonworks.com/solutions/security-and-governance/
• Tutorials: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview

Editor's Notes

  • #8: Learn who our users are and what their needs are, to validate whether we are solving the right problem. Open-ended, half-hour discussions about processes, challenges, and current tools. We record the interviews so that we can focus on the conversation and analyze them afterward.
  • #9: Test our prototype in InVision, a click-through prototyping tool. Walk users through scenarios and watch how they respond. Remind our participants that we aren’t testing them, we’re testing the design, and encourage thinking aloud.
  • #10: Was the product well understood? Is the product something they would use? Where is the value?
  • #12: Was the product well understood? Is the product something they would use? Where is the value?