Graph of Enterprise Metadata: Powered by Neo4j & Spark
Vladimir Bacvanski
Deepak Chandramouli
https://neo4j.com/nodes-2020/
AGENDA
• Prelude
• Introduction
• Architecture & Flow
• Scaling for enterprise
• Future
• Questions
About us
Vladimir Bacvanski
vbacvanski@paypal.com
Twitter: @OnSoftware
• Principal Architect, Strategic Architecture at PayPal
• In previous life: CTO of a development and consulting firm
• PhD in Computer Science from RWTH Aachen, Germany
• O’Reilly author: Courses on Big Data, Kafka
Deepak Chandramouli
dmohanakumarchan@paypal.com
LinkedIn: @deepakmc
• MT2 Software Engineer, Data Platform Services at PayPal
• Data Enthusiast
• Tech lead
• Gimel (Big Data Framework for Apache Spark)
• Unified Data Catalog – PayPal’s Enterprise Data Catalog
Prelude
2019 | Recommendation System – Unified Data Catalog
Building Recommendations for a Data Catalog
Abstraction Layer
Yeah, let’s expand on this! Graph of metadata has much more to offer!
Graph of Enterprise Metadata
Introduction
The Enterprise Landscape
• On-Premise
• Adjacencies
• Data Technologies
• Cloud Services
• Multi-DC / Hybrid Cloud
• Org Structures
• Collaboration
• People
• Identity & Access
• Security Zones
• Data Catalog
• Security
• Organization
• Artifacts
• Apps & Processes
• Logs
• Streams
• ETL Apps
Connecting the Dots…
The whole is greater!
• Effective Data Governance
• Privacy
• Regulation
• Compliance
• Efficiency
• ……
Connected Components
• Security
• Organization
• Data
• Business Metadata
• Technical Metadata
• Artifacts
• Code
• Operational Metadata
• Internal Social Metadata
• Data Classification (PII/PCI/PHI)
Connected components = Graph of Enterprise Metadata
So What? Top-Down View!
©2016 PayPal Inc. Confidential and proprietary.
Demo & Sandbox Code
Code Base | Playground
GitHub
https://github.com/Dee-Pac/GEM
Gitter
https://gitter.im/graph_of_enterprise_metadata/GEM-Ask-Us-Anything
Graph Build
Architecture & Data Flow
Metadata Data Sources...
• Data Classification
• PII / PCI / PHI
• Operational Metadata
• Jobs & Apps
• Databases & App – Query Logs
• Org Structure
• People, Geo, Hierarchies
• Technical Data Catalog
• Data Stores
• Databases / Containers
• Tables / Objects
• Columns / Fields
• Business Metadata / Glossary / Annotation
• IAM
• Access controls
• Owners, Users
• Social Metadata (Internal)
• Slack Chats, Threads, Annotations
• Artifacts
• Code base – GitHub/SQL
• Wiki – Documents, Mentions
• JIRA – Issues, Mentions
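The sources listed above become useful once they are joined into one graph. The following is a minimal sketch in plain Python of that idea: each source contributes typed nodes and edges, and a traversal answers "what is connected to this column?". Apart from DATASET_HAS_COLUMN, which appears in the deck's GraphQL schema, every relationship name, node label, and sample value here is a made-up assumption for illustration and not PayPal's actual model.

```python
from collections import defaultdict, deque

# Hypothetical sample edges: (source node, relationship, target node).
# Node ids are "Label:name"; all names are invented for illustration.
EDGES = [
    ("Dataset:payments.txn", "DATASET_HAS_COLUMN", "Column:card_number"),
    ("Column:card_number", "CLASSIFIED_AS", "Classification:PCI"),
    ("User:alice", "OWNS", "Dataset:payments.txn"),
    ("Job:daily_settlement", "READS", "Dataset:payments.txn"),
    ("Wiki:settlement-runbook", "MENTIONS", "Job:daily_settlement"),
    ("Dataset:hr.badges", "DATASET_HAS_COLUMN", "Column:badge_id"),
]


def build_adjacency(edges):
    """Index edges in both directions so traversal ignores edge direction."""
    adjacency = defaultdict(set)
    for source, _relationship, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def connected_component(adjacency, start):
    """Breadth-first search: every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    for node in sorted(connected_component(graph, "Column:card_number")):
        print(node)
```

Starting from a single sensitive column, the traversal reaches its dataset, classification, owner, consuming job, and the wiki page that mentions the job, while the unrelated HR dataset stays out of the result.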
High Level – Data Flow
[Diagram stages: App Data Sources, Data Sourcing, Transform, Merge Graph, Graph Access, API Push, CRUD & Insights, Applications]
• Access Simplification
• Distributed Processing for scale
• (Future) Near Real Time - CDC
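The flow above (sourcing, transform, merge into the graph) can be sketched in a few lines. This is an illustrative in-memory stand-in, assuming hypothetical function names and sample rows; the deck's real pipeline runs on Spark and neo4j. The point it shows is that the merge step should be idempotent, so re-running a load does not duplicate nodes or edges.

```python
def source_rows():
    """Stand-in for the 'Data Sourcing' stage: rows pulled from a catalog."""
    return [
        {"dataset": "payments.txn", "column": "card_number"},
        {"dataset": "payments.txn", "column": "amount"},
        {"dataset": "payments.txn", "column": "amount"},  # duplicate on purpose
    ]


def transform(rows):
    """'Transform' stage: turn flat rows into node keys and edge triples."""
    nodes, edges = set(), set()
    for row in rows:
        dataset = ("Dataset", row["dataset"])
        column = ("Column", f'{row["dataset"]}.{row["column"]}')
        nodes.update([dataset, column])
        edges.add((dataset, "DATASET_HAS_COLUMN", column))
    return nodes, edges


class MetadataGraph:
    """In-memory stand-in for the graph store (hypothetical, not neo4j)."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def merge(self, nodes, edges):
        """'Merge Graph' stage: add only what is missing, so reruns are safe."""
        self.nodes |= nodes
        self.edges |= edges


if __name__ == "__main__":
    graph = MetadataGraph()
    for _ in range(2):  # running the load twice must not create duplicates
        graph.merge(*transform(source_rows()))
    print(len(graph.nodes), len(graph.edges))
```

Keying every node by label plus a stable name is what makes the merge safe to repeat; the same idea carries over to a keyed upsert in a real graph store.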
neo4j + Apache Spark + Gimel + GraphQL
Multiple Source Systems
Unified Access Pattern
• Abstracts Graph Complexity
• Supports multiple front-end clients
• Accelerates UI Integration
• Reference | neo4j & GraphQL
• Spring-Data-neo4j [Scale]
Data Processing
• Distributed Processing - SCALE
• Spark-ML + GraphX - COMPLEXITY
• GIMEL + Spark - Data access simplified
• Kafka Streams – Near Real Time CDC in the future
Connected Components
• Easy Bootstrap
• Cypher & APOCs
• Robust libraries - Spark, GraphQL
• Rich Graph Algorithms
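To make the Cypher part of this stack concrete, here is a small sketch that builds one batched, idempotent upsert statement. The labels, property names, and relationship type follow the GraphQL schema shown later in the deck; the function name and the choice of an UNWIND plus MERGE pattern are assumptions for illustration, not necessarily how the GEM code base writes to the graph.

```python
def build_merge_statement(rows):
    """Return (cypher, parameters) for one batched, idempotent upsert.

    Each row is expected to carry `dataset`, `column`, and
    `storage_system_name` keys.
    """
    cypher = (
        "UNWIND $rows AS row\n"
        "MERGE (d:dataset {dataset: row.dataset, "
        "storage_system_name: row.storage_system_name})\n"
        "MERGE (c:column {column: row.column, dataset: row.dataset, "
        "storage_system_name: row.storage_system_name})\n"
        "MERGE (d)-[:DATASET_HAS_COLUMN]->(c)"
    )
    return cypher, {"rows": rows}


if __name__ == "__main__":
    statement, parameters = build_merge_statement(
        [
            {
                "dataset": "payments.txn",           # hypothetical sample values
                "column": "amount",
                "storage_system_name": "warehouse",
            }
        ]
    )
    print(statement)
    print(parameters)
```

Passing the rows as a single `$rows` parameter keeps the statement text constant across batches. With the official Neo4j Python driver, the pair would typically be handed to `session.run(statement, parameters)`.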
Data Processing @ Scale
© 2020 PayPal Inc. Confidential and proprietary.
Metadata Growth
Technical Metadata
>1000s of Data Stores
>100K Containers
>7 Million Datasets
>50 Million Fields
>10K PII/PCI Class elements
Operational Metadata
>100K Data Apps Per Day
>25 Data Tech Stacks
>API, Streams & Batch Apps
Artifacts & Social Metadata
Artifacts
>10K Scripts & App code
>1000s of Wiki, JIRA Annotations
Data Social
~ 12 Active Analytics Channels in Slack
~ Evolving team conversations about data
Security & Organization
>1000s of IAM roles
>20K Internal Users & Role Owners
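These volumes are why loads have to be batched and distributed. As a rough worked example, the sketch below takes the lower bounds quoted on the slide and computes how many write batches each entity type would need. The batch size of 10,000 rows is purely an assumed tuning value, and the helper names are hypothetical.

```python
import math

# Lower bounds quoted on the slide; the batch size is an assumed tuning knob.
ENTITY_COUNTS = {
    "containers": 100_000,
    "datasets": 7_000_000,
    "fields": 50_000_000,
}
BATCH_SIZE = 10_000  # hypothetical rows per write transaction


def batches_needed(count, batch_size=BATCH_SIZE):
    """Number of batches required to write `count` rows."""
    return math.ceil(count / batch_size)


def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    for entity, count in ENTITY_COUNTS.items():
        print(entity, batches_needed(count))
```

At this assumed batch size the field nodes alone need thousands of transactions, which is the kind of workload that benefits from being spread across Spark partitions.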
Gimel + spark + neo4j
GIMEL Data API
NEO4J Connector
Gimel + spark + neo4j (Continued..)
Unified Data API
Unified Connector Config
Gimel + spark + neo4j (Continued..)
Unified Data API
Nodes & Relationships
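The slides above show screenshots that are not reproduced here. To illustrate the idea of a unified data API with per-dataset connector config, the following is a toy sketch in plain Python. It is not Gimel's actual API: every class, method, and config key below is a hypothetical stand-in, chosen only to show how one read/write call shape can front different stores.

```python
class InMemoryConnector:
    """Toy connector; real ones would wrap Neo4j, Kafka, Hive, and so on."""

    def __init__(self):
        self._tables = {}

    def read(self, options):
        return list(self._tables.get(options["object"], []))

    def write(self, rows, options):
        self._tables.setdefault(options["object"], []).extend(rows)


class UnifiedDataAPI:
    """One read/write surface; the dataset name selects connector and options."""

    def __init__(self, connectors, catalog):
        self._connectors = connectors   # storage type -> connector instance
        self._catalog = catalog         # dataset name -> connector config

    def _resolve(self, dataset_name):
        config = self._catalog[dataset_name]
        return self._connectors[config["type"]], config

    def read(self, dataset_name):
        connector, config = self._resolve(dataset_name)
        return connector.read(config)

    def write(self, dataset_name, rows):
        connector, config = self._resolve(dataset_name)
        connector.write(rows, config)


if __name__ == "__main__":
    api = UnifiedDataAPI(
        connectors={"graph": InMemoryConnector(), "table": InMemoryConnector()},
        catalog={
            "catalog.columns": {"type": "table", "object": "columns"},
            "gem.column_nodes": {"type": "graph", "object": "column"},
        },
    )
    api.write("catalog.columns", [{"dataset": "payments.txn", "column": "amount"}])
    # Same call shape for both stores: copy rows from the table into the graph.
    api.write("gem.column_nodes", api.read("catalog.columns"))
    print(api.read("gem.column_nodes"))
```

The caller never names a storage technology; swapping the store behind a dataset only changes the catalog entry, which is the simplification the "Unified Connector Config" slide points at.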
Exposing Graph to Clients
Define GraphQL Schema
type column {
  column: String!
  storage_system_name: String!
  dataset: String!
}
type dataset {
  dataset: String!
  storage_system_name: String!
  column: [column] @relation(name: "DATASET_HAS_COLUMN", direction: OUT)
}
type Enterprise {
  name: String!
}
GraphQL Query
GraphQL Query - Response
Cypher -> GraphQL
Cypher
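The query and Cypher screenshots are not reproduced in this transcript. As a rough illustration of how a GraphQL selection over the schema above can map to Cypher, the sketch below translates a nested field selection into a single query that uses map projections and a pattern comprehension. The selection is represented as a nested Python dict rather than parsed GraphQL, and the function names and generated variable names are assumptions; this is not the output of any particular GraphQL library.

```python
# Relationship metadata mirroring the @relation directive in the schema:
# (parent type, field) -> (relationship type, child type)
RELATIONS = {
    ("dataset", "column"): ("DATASET_HAS_COLUMN", "column"),
}


def projection(type_name, variable, selection):
    """Build a Cypher map projection for the selected fields of one type."""
    parts = []
    for field, sub_selection in selection.items():
        if sub_selection is None:            # scalar field
            parts.append(f".{field}")
        else:                                # related nodes via OUT relationship
            rel_type, child_type = RELATIONS[(type_name, field)]
            child_var = f"{variable}_{field}"
            child = projection(child_type, child_var, sub_selection)
            parts.append(
                f"{field}: [({variable})-[:{rel_type}]->"
                f"({child_var}:{child_type}) | {child}]"
            )
    return f"{variable} {{ {', '.join(parts)} }}"


def to_cypher(type_name, selection):
    """Translate a root selection on `type_name` into a single Cypher query."""
    variable = type_name
    body = projection(type_name, variable, selection)
    return f"MATCH ({variable}:{type_name}) RETURN {body} AS {type_name}"


if __name__ == "__main__":
    # Equivalent of the GraphQL query: { dataset { dataset column { column } } }
    print(to_cypher("dataset", {"dataset": None, "column": {"column": None}}))
```

Because the whole nested selection becomes one Cypher statement, a front-end client can fetch a dataset together with its columns in a single round trip without knowing any Cypher.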
Conclusion & Next Steps
Key Take-aways
• The whole is greater than the sum of its parts!
• Data awareness: Data can live anywhere across the ecosystem, but we should know its shape and form.
• Graphs are everywhere: Their full potential is realized by connecting the dots across the board.
• Enriching metadata: It is a collective effort; silos rarely achieve effectiveness.
• Graph effectiveness: Expose the data in a consumable way for all tech stacks alike!
What’s next?
• Data Classification
• PII / PCI / PHI
• Org Structure
• People, Geo, Hierarchies
• Technical Data Catalog
• Data Stores
• Databases / Containers
• Tables / Objects
• Columns / Fields
• Business Metadata / Glossary / Annotation
• IAM
• Access controls
• Owners, Users
• Artifacts
• Code base – GitHub/SQL
• Wiki – Documents, Mentions
• JIRA – Issues, Mentions
• Operational Metadata
• Jobs & Apps
• Databases & App – Query Logs
• Lineage
• Social Metadata (Internal)
• Slack Chats, Threads, Annotations
Connecting Further…
Foundational Capabilities…
• Scale – Millions of Nodes × Edges
• Access – Simplification
• Graph – Data Security & Multitenancy
• Insights driven by Graph / ML Algorithms
Questions?
Code base – GITHUB_FOR_GEM
Let’s collaborate - GITTER_FOR_GEM
Thank You!

Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
