Journey to Self-Discovery: Synthesizing 30
Years of Data Pipelines with Knowledge Graphs
Mayank Gupta
Senior VP of Technology, LPL Financial
My History with Knowledge Graphs
• Graph-adjacent work for data distribution at Morgan Stanley
• UBS:
  • Data virtualization layer – using Neo4j
  • Role-based access control – using Neo4j
• LPL:
  • Data management: Account -> Client -> Household – using Neo4j
  • A knowledge graph of financial concepts and help content, built to improve the efficacy of home office and advisor search results – using GraphAware's Hume and Neo4j
  • Using graphs to describe complex business organizations and relationships – driving improved engagement with our clients
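The Account -> Client -> Household hierarchy mentioned above can be sketched as a small property-graph model. The labels and relationship types below (BELONGS_TO, MEMBER_OF) are illustrative assumptions, not LPL's actual schema; the function simply builds a Cypher MERGE statement as a string.

```python
# Illustrative sketch of an Account -> Client -> Household hierarchy
# as it might be modeled in Neo4j. Labels and relationship types
# are hypothetical, not LPL's production schema.

def account_hierarchy_cypher(account_id: str, client_id: str, household_id: str) -> str:
    """Build one Cypher statement linking an account to its client and household."""
    return (
        f"MERGE (a:Account {{id: '{account_id}'}}) "
        f"MERGE (c:Client {{id: '{client_id}'}}) "
        f"MERGE (h:Household {{id: '{household_id}'}}) "
        "MERGE (a)-[:BELONGS_TO]->(c) "
        "MERGE (c)-[:MEMBER_OF]->(h)"
    )

stmt = account_hierarchy_cypher("ACC-001", "CLI-42", "HH-7")
print(stmt)
```

A statement like this would typically be executed through the Neo4j driver; MERGE keeps the load idempotent, so re-running the same feed does not duplicate nodes.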
Problem Statement
We are experiencing a sustained increase in transaction volumes, and our business - practices, advisors, accounts, assets under management - is growing rapidly.
This is driving a need to increase the throughput, resiliency and scale of our data pipelines.
We also want to improve the value, quality and experiences that data enables for our user cohorts, while operating more efficiently.
High Level Anatomy of Data Pipelines
Sources of Signal -> Integration Pipes -> Raw Zone -> Map to Internal Logical Model -> Mastering into Systems of Record -> Readying for Distribution -> Consumers
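One way to make these stages machine-readable is to enumerate them and the hand-offs between them. The stage names below come straight from the slide; the list-of-pairs representation is an assumption about how one might seed them into a graph.

```python
# The seven pipeline stages from the slide, in order. Modeling each
# hand-off as an (upstream, downstream) pair is an illustrative choice.
STAGES = [
    "Sources of Signal",
    "Integration Pipes",
    "Raw Zone",
    "Map to Internal Logical Model",
    "Mastering into Systems of Record",
    "Readying for Distribution",
    "Consumers",
]

# Pair each stage with its downstream neighbour: 7 stages, 6 hand-offs.
HANDOFFS = list(zip(STAGES, STAGES[1:]))

for upstream, downstream in HANDOFFS:
    print(f"{upstream} -> {downstream}")
```

Each pair maps naturally onto a relationship (for example a hypothetical `FEEDS` edge) between stage nodes in the knowledge graph.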
Data Pipelines – Knowledge Graph
Knowledge types, and the source or approach used to get each:
• Signal Sources – Contracts, Integration Objects, Configurations, Job Definitions
• Physical Data Description – File Layouts, Physical DB Schema, Message Models/Schema, Raw Zone Scans
• Physical Plant Description – ITIL CMDBs, Asset Inventories
• Logical Models / Concepts – Enterprise Vocabularies, Metadata Repositories, Public Concept Sources
• Processing Details – Code Scanners, ELT/ETL Configurations, Rules Bases, Manual
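A first pass at loading this catalogue into the graph could be as simple as a mapping from knowledge type to its acquisition sources. The dictionary mirrors the slide; the `ACQUIRED_FROM` relationship and the Cypher-generating loader are hypothetical sketches, not LPL's implementation.

```python
# Knowledge types from the slide, each paired with the sources or
# approaches used to populate it.
KNOWLEDGE_SOURCES = {
    "Signal Sources": ["Contracts", "Integration Objects", "Configurations", "Job Definitions"],
    "Physical Data Description": ["File Layouts", "Physical DB Schema", "Message Models/Schema", "Raw Zone Scans"],
    "Physical Plant Description": ["ITIL CMDBs", "Asset Inventories"],
    "Logical Models / Concepts": ["Enterprise Vocabularies", "Metadata Repositories", "Public Concept Sources"],
    "Processing Details": ["Code Scanners", "ELT/ETL Configurations", "Rules Bases", "Manual"],
}

def to_cypher(catalogue: dict) -> list:
    """Emit one idempotent MERGE statement per (knowledge type, source) pair.
    The ACQUIRED_FROM relationship name is a hypothetical modeling choice."""
    stmts = []
    for ktype, sources in catalogue.items():
        for src in sources:
            stmts.append(
                f"MERGE (k:KnowledgeType {{name: '{ktype}'}}) "
                f"MERGE (s:Source {{name: '{src}'}}) "
                "MERGE (k)-[:ACQUIRED_FROM]->(s)"
            )
    return stmts

statements = to_cypher(KNOWLEDGE_SOURCES)
print(len(statements))  # one statement per (knowledge type, source) pair
```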
Benefits for the Data Function
• Enables decision making for our journey to modernize
• Allows us to discover duplications and inconsistencies, and to optimize
• Allows us to better engage data providers and users – driving to well-aligned outcomes – and making them a part of the data pipelines
• A boon to better operations, enabling problem avoidance and faster resolution
• Enables faster time to market and fosters the spirit of continuous improvement
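The duplication-discovery benefit above lends itself to a concrete check: once pipelines are catalogued, two feeds moving data between the same endpoints are candidates for consolidation. The inventory below is fabricated for illustration; in practice this grouping would be a graph query over the knowledge graph rather than an in-memory scan.

```python
from collections import defaultdict

# Hypothetical pipeline inventory: (pipeline_name, source, target).
# Two pipelines moving data between the same endpoints are
# duplication candidates worth reviewing.
feeds = [
    ("acct_feed_v1", "TradingSystem", "RawZone"),
    ("acct_feed_v2", "TradingSystem", "RawZone"),
    ("client_feed", "CRM", "RawZone"),
]

# Group pipelines by their (source, target) edge.
groups = defaultdict(list)
for name, src, dst in feeds:
    groups[(src, dst)].append(name)

# Any edge served by more than one pipeline is a duplication candidate.
duplicates = {edge: names for edge, names in groups.items() if len(names) > 1}
print(duplicates)
```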
Hypothesized Benefits for the Enterprise
Beyond the basics – better data, better decisions, better business outcomes:
• This knowledge graph – if populated end to end – moves us closer to providing intent-based engagement with data
• The data consumer can move to a more declarative style of engaging with data, versus the imperative approaches that are available today
• A comprehensive view of the information landscape enables better risk preparation and investment planning
• Data moves to information, and ushers in a knowledge-driven enterprise
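The declarative-versus-imperative contrast can be made concrete with a toy example: imperatively, the consumer hand-codes how to roll accounts up to households; declaratively, they state only what they want and a query layer does the traversal. All data, thresholds, and the Cypher query shown are fabricated for illustration.

```python
# Imperative style: the consumer encodes HOW to get the answer,
# walking the account -> household roll-up by hand.
# (All records and the 200,000 threshold are fabricated examples.)
accounts = [
    {"id": "A1", "household": "H1", "aum": 250_000},
    {"id": "A2", "household": "H1", "aum": 150_000},
    {"id": "A3", "household": "H2", "aum": 90_000},
]

totals = {}
for acct in accounts:
    totals[acct["household"]] = totals.get(acct["household"], 0) + acct["aum"]
imperative = [h for h, aum in totals.items() if aum > 200_000]

# Declarative style: the consumer states WHAT they want; a query
# layer over the knowledge graph (hypothetical Cypher) does the rest.
declarative_query = """
MATCH (a:Account)-[:MEMBER_OF]->(h:Household)
WITH h, sum(a.aum) AS aum
WHERE aum > 200000
RETURN h.id
"""

print(imperative)  # ['H1']
```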