The Semantic Knowledge Graph:
A compact, auto-generated model for real-time traversal
and ranking of any relationship within a domain
Trey Grainger
SVP of Engineering
Lucidworks
Khalifeh AlJadda
Lead Data Scientist
CareerBuilder
Mohammed Korayem
Data Scientist
CareerBuilder
Andries Smith
Software Engineer
CareerBuilder
Trey Grainger
SVP of Engineering
• Previously Director of Engineering @ CareerBuilder
• MBA, Management of Technology – Georgia Tech
• BA, Computer Science, Business, & Philosophy – Furman University
• Information Retrieval & Web Search - Stanford University
Fun outside of CB:
• Co-author of Solr in Action, plus a handful of research papers
• Frequent conference speaker
• Founder of Celiaccess.com, the gluten-free search engine
• Lucene/Solr contributor
About Me
Terminology / Background
A Graph
[Figure: example graph. Nodes (vertices): Semantic Knowledge Graph Paper, Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith, DSAA 2016, Montreal, Quebec, Canada. Edges: labeled relationships between nodes, e.g. in_country.]
“Solr is the popular, blazing-fast,
open source enterprise search
platform built on Apache Lucene™.”
Key Solr Features:
● Multilingual Keyword search
● Relevancy Ranking of results
● Faceting & Analytics
● Highlighting
● Spelling Correction
● Autocomplete/Type-ahead Prediction
● Sorting, Grouping, Deduplication
● Distributed, Fault-tolerant, Scalable
● Geospatial search
● Complex Function queries
● Recommendations (More Like This)
● … many more
*source: Solr in Action, chapter 2
The inverted index

What you SEND to Lucene/Solr:

Document  Content Field
doc1      once upon a time, in a land far, far away
doc2      the cow jumped over the moon.
doc3      the quick brown fox jumped over the lazy dog.
doc4      the cat in the hat
doc5      The brown cow said “moo” once.
…         …

How the content is INDEXED into Lucene/Solr (conceptually):

Term   Documents
a      doc1 [2x]
brown  doc3 [1x], doc5 [1x]
cat    doc4 [1x]
cow    doc2 [1x], doc5 [1x]
…      …
once   doc1 [1x], doc5 [1x]
over   doc2 [1x], doc3 [1x]
the    doc2 [2x], doc3 [2x], doc4 [2x], doc5 [1x]
…      …
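The mapping from the five example documents to the term table can be sketched in a few lines of Python. This is only an illustration of the concept, not how Lucene stores its index; the `tokenize` function (lowercase, split on non-letters) is an assumed, simplified analysis chain, and `build_inverted_index` is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict

# The five example documents from the slide.
docs = {
    "doc1": "once upon a time, in a land far, far away",
    "doc2": "the cow jumped over the moon.",
    "doc3": "the quick brown fox jumped over the lazy dog.",
    "doc4": "the cat in the hat",
    "doc5": 'The brown cow said "moo" once.',
}

def tokenize(text):
    # Assumption: lowercase and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())

def build_inverted_index(documents):
    # term -> {doc_id: term frequency in that document}
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return dict(index)

index = build_inverted_index(docs)
```

Looking up `index["the"]` yields the postings for "the" with per-document frequencies, which corresponds to the `[2x]` / `[1x]` annotations in the table.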
Matching queries to documents

/solr/select/?q=apache solr

Term    Documents
…       …
apache  doc1, doc3, doc4, doc5
…       …
hadoop  doc2, doc4, doc6
…       …
solr    doc1, doc3, doc4, doc7, doc8
…       …

[Figure: Venn diagram of the two postings lists. apache only: doc5. solr only: doc7, doc8. apache and solr: doc1, doc3, doc4.]
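Matching a multi-term query then reduces to set operations on the postings lists. The sketch below uses the postings from the slide; `match` is a hypothetical helper, and whether terms are combined with AND or OR is passed in explicitly here rather than taken from any Solr setting.

```python
# Postings lists from the slide: term -> set of matching documents.
postings = {
    "apache": {"doc1", "doc3", "doc4", "doc5"},
    "hadoop": {"doc2", "doc4", "doc6"},
    "solr": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}

def match(terms, operator="OR"):
    """Return the documents matching all (AND) or any (OR) of the terms."""
    sets = [postings.get(term, set()) for term in terms]
    if not sets:
        return set()
    if operator == "AND":
        return set.intersection(*sets)
    return set.union(*sets)

both = match(["apache", "solr"], operator="AND")
either = match(["apache", "solr"], operator="OR")
```

`both` is the overlap of the two sets in the Venn diagram and `either` is their union.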
Related Work
Knowledge
Graph
Related Work
• Primarily related to ontology learning.
• Recently, large-scale knowledge bases that utilize
ontologies (FreeBase [4], DBpedia [5], and YAGO
[6, 7]) have been constructed using structured
sources such as Wikipedia infoboxes.
• Other approaches (DeepDive [8], Nell2RDF [9],
and PROSPERA [10]) crawl the web and use
machine learning and natural language processing
to build web-scale knowledge graphs.
Problem Description
Challenges we are solving
Because current knowledge bases / ontology learning systems typically
require explicitly modeling nodes and edges into a graph ahead of time, this
unfortunately presents several limitations to the use of such a knowledge graph:
• Entities not modeled explicitly as nodes have no known relationships to any other
entities.
• Edges exist between nodes, but not between arbitrary combinations of nodes, and therefore
such a graph is not ideal for representing nuanced meanings of an entity when appearing
within different contexts, as is common within natural language.
• Substantial meaning is encoded in the linguistic representation of the domain that is
lost when the underlying textual representation is not preserved: phrases, interaction of
concepts through actions (i.e. verbs), positional ordering of entities and the phrases containing
those entities, variations in spelling and other representations of entities, the use of adjectives
to modify entities to represent more complex concepts, and aggregate frequencies of
occurrence for different representations of entities relative to other representations.
• It can be an arduous process to create robust ontologies, map a domain into a graph
representing those ontologies, and ensure the generated graph is compact, accurate,
comprehensive, and kept up to date.
Semantic Data Encoded into Free Text Content

Node Type: Character Sequence
  e, en, eng, engi, engineer, engineers, …
Node Type: Term
  engineer, engineers, engineering, software, …
Node Type: Term Sequence
  software engineer, software engineers, electrical engineering, …
Node Type: Document
  id: 1
  text: looking for a software engineer with degree in computer science or electrical engineering
  id: 2
  text: apply to be a software engineer and work with other great software engineers
  id: 3
  text: start a great career in electrical engineering
  …
Model

Documents

id: 1
job_title: Software Engineer
desc: software engineer at a great company
skills: .Net, C#, java

id: 2
job_title: Registered Nurse
desc: a registered nurse at hospital doing hard work
skills: oncology, phlebotomy

id: 3
job_title: Java Developer
desc: a software engineer or a java engineer doing work
skills: java, scala, hibernate

Terms-Docs Inverted Index

field      term            doc  pos
desc       a               1    4
                           2    1
                           3    1, 5
           at              1    3
                           2    4
           company         1    6
           doing           2    6
                           3    8
           engineer        1    2
                           3    3, 7
           great           1    5
           hard            2    7
           hospital        2    5
           java            3    6
           nurse           2    3
           or              3    4
           registered      2    2
           software        1    1
                           3    2
           work            2    8
                           3    9
job_title  java developer  3    1
…          …               …    …

Docs-Terms Uninverted Index

field      doc  term
desc       1    a, at, company, engineer, great, software
           2    a, at, doing, hard, hospital, nurse, registered, work
           3    a, doing, engineer, java, or, software, work
job_title  1    Software Engineer
…          …    …
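Because the inverted index stores term positions, a term sequence such as "software engineer" can be resolved to documents without being indexed as its own node. The following is a minimal sketch under simplifying assumptions: whitespace tokenization, 1-based positions, and hypothetical helper names (`build_positional_index`, `phrase_docs`). It uses the desc field of the three example documents.

```python
from collections import defaultdict

# desc field of the three example documents.
descs = {
    1: "software engineer at a great company",
    2: "a registered nurse at hospital doing hard work",
    3: "a software engineer or a java engineer doing work",
}

def build_positional_index(documents):
    # term -> {doc_id: [1-based positions of the term in that document]}
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for pos, term in enumerate(text.split(), start=1):
            index[term][doc_id].append(pos)
    return index

def phrase_docs(index, phrase):
    """Documents in which the terms of `phrase` occur at consecutive positions."""
    terms = phrase.split()
    hits = set()
    for doc_id, positions in index.get(terms[0], {}).items():
        for start in positions:
            # Every following term must sit exactly i positions after the first.
            if all(start + i in index.get(term, {}).get(doc_id, [])
                   for i, term in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

index = build_positional_index(descs)
software_engineer_docs = phrase_docs(index, "software engineer")
```

The resulting document set is what a phrase node would point at if it were materialized at query time.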
How the Graph Traversal Works

[Figure: one traversal shown as a Set-theory View, a Graph View and a Data Structure View. Skill nodes (skill: Java, skill: Scala, skill: Hibernate, skill: Oncology) are connected through doc 1 to doc 6. The document sets are labeled docs 1, 2, 6 and docs 3, 4 (Java, Scala, Hibernate) and doc 5 (Oncology).]
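A single-level traversal between skills can be read as set intersection: two skill nodes are connected when their document sets overlap. The sketch below is illustrative only. The exact assignment of documents to skills is an assumed reading of the slide (Java in docs 1, 2, 3, 4, 6; Scala and Hibernate in docs 3, 4; Oncology in doc 5), and `related_skills` is a hypothetical helper.

```python
# Assumed postings for the skills on the slide (skill -> set of doc ids).
skill_postings = {
    "java": {1, 2, 3, 4, 6},
    "scala": {3, 4},
    "hibernate": {3, 4},
    "oncology": {5},
}

def related_skills(start):
    """Skills that share at least one document with `start`, with the shared docs."""
    start_docs = skill_postings[start]
    edges = {}
    for skill, docs in skill_postings.items():
        if skill == start:
            continue
        shared = start_docs & docs  # the edge exists only through shared documents
        if shared:
            edges[skill] = sorted(shared)
    return edges

java_edges = related_skills("java")
```

Under these assumed postings, Java connects to Scala and Hibernate through docs 3 and 4, while Oncology stays disconnected because it shares no document with Java.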
Graph Model
Structure:
Single-level Traversal / Scoring:
Multi-level Traversal / Scoring:
Multi-level Traversal

[Figure: Data Structure View and Graph View of a multi-level traversal. Skill nodes (skill: Java, skill: Scala, skill: Hibernate, skill: Oncology) link to doc 1 through doc 6, which link to job_title nodes (Software Engineer, Data Scientist, Java Developer, …). The lookups between levels are labeled Inverted Index Lookup and Doc Values Index Lookup. In the Graph View, the skill nodes Java, Scala and Hibernate and the job title nodes Java Developer, Software Engineer and Data Scientist are joined by has_related_job_title edges.]
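The alternation of lookups can be sketched as follows: an inverted index lookup goes from a term to its documents, and a doc values (uninverted) lookup goes from those documents to the terms of another field. This is a simplified illustration, not the open-sourced implementation. It reuses the three example documents from the Model slide, keeps everything in plain dictionaries, and does not carry the traversal context from one hop into the next; `traverse` and `two_hop` are hypothetical names.

```python
from collections import defaultdict

# The three example documents, reduced to the two fields used here.
docs = {
    1: {"job_title": ["software engineer"], "skills": [".net", "c#", "java"]},
    2: {"job_title": ["registered nurse"], "skills": ["oncology", "phlebotomy"]},
    3: {"job_title": ["java developer"], "skills": ["java", "scala", "hibernate"]},
}

# Inverted index: (field, term) -> set of doc ids.
inverted = defaultdict(set)
for doc_id, fields in docs.items():
    for field, terms in fields.items():
        for term in terms:
            inverted[(field, term)].add(doc_id)

def doc_values(doc_id, field):
    # Uninverted lookup: doc id -> terms of one field.
    return docs[doc_id][field]

def traverse(field_from, term, field_to):
    """One hop: term -> docs (inverted index) -> terms in field_to (doc values)."""
    reached = defaultdict(set)
    for doc_id in inverted.get((field_from, term), set()):
        for target in doc_values(doc_id, field_to):
            reached[target].add(doc_id)
    return dict(reached)

def two_hop(start_skill):
    """skill -> related skills -> job titles of each related skill."""
    return {
        skill: sorted(traverse("skills", skill, "job_title"))
        for skill in traverse("skills", start_skill, "skills")
    }

java_titles = traverse("skills", "java", "job_title")
scala_paths = two_hop("scala")
```

Each hop returns the documents that support the edge, so a scoring function can later be applied per edge.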
Scoring nodes in the Graph
Foreground vs. Background Analysis
Every term is scored against its context. The more
commonly the term appears within its foreground
context versus its background context, the more
relevant it is to the specified foreground context.
countFG(x) - totalDocsFG * probBG(x)
z = --------------------------------------------------------
sqrt(totalDocsFG * probBG(x) * (1 - probBG(x)))
Foreground Query: "Hadoop"
{ "type":"keywords", "values":[
{ "value":"hive", "relatedness": 0.9765, "popularity":369 },
{ "value":"spark", "relatedness": 0.9634, "popularity":15653 },
{ "value":".net", "relatedness": 0.5417, "popularity":17683 },
{ "value":"bogus_word", "relatedness": 0.0, "popularity":0 },
{ "value":"teaching", "relatedness": -0.1510, "popularity":9923 },
{ "value":"CPR", "relatedness": -0.4012, "popularity":27089 } ] }
[Annotation: values are ordered from positive (+) to negative (-) relatedness.]
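The z formula on this slide can be written out directly. In the sketch below, `count_fg` and `total_docs_fg` correspond to countFG(x) and totalDocsFG; computing probBG(x) as `count_bg / total_docs_bg` is an assumption about how the background probability is estimated, and the input numbers are made up for illustration. The step that turns z into the bounded relatedness values shown in the JSON is not given on the slide and is not reproduced here.

```python
import math

def z_score(count_fg, total_docs_fg, count_bg, total_docs_bg):
    """z = (countFG - totalDocsFG * probBG) / sqrt(totalDocsFG * probBG * (1 - probBG))."""
    prob_bg = count_bg / total_docs_bg           # assumed estimate of probBG(x)
    expected = total_docs_fg * prob_bg           # expected foreground count
    variance = total_docs_fg * prob_bg * (1 - prob_bg)
    if variance == 0:
        # Term never (or always) occurs in the background: no signal either way.
        return 0.0
    return (count_fg - expected) / math.sqrt(variance)

# Hypothetical counts: a term in 10% of all docs but 50% of foreground docs.
over_represented = z_score(count_fg=50, total_docs_fg=100, count_bg=1000, total_docs_bg=10000)
under_represented = z_score(count_fg=0, total_docs_fg=100, count_bg=1000, total_docs_bg=10000)
```

A term that is more frequent in the foreground than the background gets a positive z, and a term that is less frequent gets a negative one, which matches the ordering of the example values.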
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Multi-level Graph Traversal with Scores

[Figure: scored multi-level traversal. Node layers, left to right: Starting Node: software engineer* (materialized node); Skill Nodes, reached via has_related_skill; Skill Nodes, reached via has_related_skill; Job Title Nodes, reached via has_related_job_title. Skill nodes shown: Java, C#, .NET, Hibernate, Scala, VB.NET. Job title nodes shown: .NET Developer, Java Developer, Software Engineer, Data Scientist. Edge scores shown range from 0.34 to 0.93.]
Materialization of new nodes through shared documents

[Figure: the term nodes engineer, engineers and Software link to (links_to) doc 1 through doc 6. New nodes are materialized from the documents they share: engineer* (materialized node) and software engineer* (materialized node).]
Implementation
Open Sourced!
Populating the Graph
Experiments
Data Cleansing
Foreground Query: "Hadoop"
{ "type":"keywords", "values":[
{ "value":"hive", "relatedness": 0.9765, "popularity":369 },
{ "value":"spark", "relatedness": 0.9634, "popularity":15653 },
{ "value":".net", "relatedness": 0.5417, "popularity":17683 },
{ "value":"bogus_word", "relatedness": 0.0, "popularity":0 },
{ "value":"teaching", "relatedness": -0.1510, "popularity":9923 },
{ "value":"CPR", "relatedness": -0.4012, "popularity":27089 } ] }
Experiment: Data analyst manually annotated 500 pairs of terms found together in real query logs as “relevant” or “not relevant”.
Results: SKG removed 78% of the terms while maintaining a 95% accuracy at removing the correct noisy pairs from the input data.
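One simple way to use such scores for cleansing is to keep only the candidate terms whose relatedness to the foreground query clears a cutoff. The sketch below uses the values from the "Hadoop" example; the cutoff of 0.25 and the helper name `clean` are assumptions for illustration, not the threshold or procedure used in the experiment.

```python
# Scored candidates for the foreground query "Hadoop" (values from the slide).
candidates = [
    {"value": "hive", "relatedness": 0.9765, "popularity": 369},
    {"value": "spark", "relatedness": 0.9634, "popularity": 15653},
    {"value": ".net", "relatedness": 0.5417, "popularity": 17683},
    {"value": "bogus_word", "relatedness": 0.0, "popularity": 0},
    {"value": "teaching", "relatedness": -0.1510, "popularity": 9923},
    {"value": "CPR", "relatedness": -0.4012, "popularity": 27089},
]

def clean(values, min_relatedness=0.25):
    """Keep only terms whose relatedness meets the (assumed) cutoff."""
    return [v["value"] for v in values if v["relatedness"] >= min_relatedness]

kept = clean(candidates)
removed = [v["value"] for v in candidates if v["value"] not in kept]
```

Raising or lowering `min_relatedness` trades the share of terms removed against the risk of discarding relevant pairs.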
Predictive Analytics
Search Expansion
Experiment: Take an initial query, and expand keyword
phrases to include the most related entities to that query
Example:
The Semantic Search Problem
User’s Query:
machine learning research and development Portland, OR software
engineer AND hadoop, java
Traditional Query Parsing:
(machine AND learning AND research AND development AND portland)
OR (software AND engineer AND hadoop AND java)
Semantic Query Parsing:
"machine learning" AND "research and development" AND "Portland, OR"
AND "software engineer" AND hadoop AND java
Semantically Expanded Query:
("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence")
AND ("research and development"^10 OR "r&d")
AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo})
AND ("software engineer"^10 OR "software developer")
AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
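Assembling the expanded query string is mechanical once the related phrases for each parsed phrase are known. The sketch below only builds the text clauses. The related phrases are passed in by hand, the quoting rule is a simplification, the geofilt clause is left out, and `expand_clause` / `expand_query` are hypothetical helper names rather than part of Solr or the SKG.

```python
def quote(term):
    # Simplified rule: quote anything that is not a single plain word.
    return f'"{term}"' if (" " in term or "&" in term) else term

def expand_clause(phrase, related, boost=10):
    """Original phrase boosted, OR-ed with its related phrases."""
    parts = [f"{quote(phrase)}^{boost}"] + [quote(r) for r in related]
    return "(" + " OR ".join(parts) + ")"

def expand_query(expansions):
    """AND together one expanded clause per parsed phrase."""
    return " AND ".join(expand_clause(p, r) for p, r in expansions.items())

expanded = expand_query({
    "software engineer": ["software developer"],
    "hadoop": ["big data", "hbase", "hive"],
    "java": ["j2ee"],
})
```

In a full pipeline the `related` lists would come from traversing the graph for each phrase, with the relatedness scores available as per-term boosts.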
Query Expansion

Keywords: machine learning

Known keyword phrases (FST):
java developer
machine learning
registered nurse

Semantic Interpretation
Inputs: Search Behavior, Application Behavior, etc.
Job Title Classifier, Skills Extractor, Job Level Classifier, etc.

Related Phrases
machine learning:
{ data mining .9,
matlab .8,
data scientist .75,
artificial intelligence .7,
neural networks .55 }

Common Job Titles
machine learning:
{ software engineer .65,
data manager .3,
data scientist .25,
hadoop engineer .2, }

Related Occupations
machine learning:
{15-1031.00 .58
Computer Software Engineers, Applications
15-1011.00 .55
Computer and Information Scientists, Research
15-1032.00 .52
Computer Software Engineers, Systems Software }

Modified Query:
keywords:((machine learning)^10 OR
{ AT_LEAST_2: ("data mining"^0.9, matlab^0.8,
"data scientist"^0.75, "artificial intelligence"^0.7,
"neural networks"^0.55)) }
{ BOOST_TO_TOP: ( job_title:(
"software engineer" OR "data manager" OR
"data scientist" OR "hadoop engineer")) }
The Semantic Knowledge Graph
Document Summarization
Experiment: Pass in raw text
(extracting phrases as needed), and
rank their similarity to the documents
using the SKG.
Additionally, can traverse the graph
to “related” entities/keyword phrases
NOT found in the original document
Applications: Content-based and
multi-modal recommendations
(no cold-start problem), data cleansing
prior to clustering or other ML methods,
semantic search / similarity scoring
Document Enrichment – Find / Score Relationships
Document Summarization – Rank / Clean Keywords
Future Work
• Semantic Search (more experiments)
• Search Engine Relevancy Algorithms
• Trending Topics
• Recommendation Systems
• Root Cause Analysis
• Abuse Detection
Conclusion
Applications:
The Semantic Knowledge Graph has numerous applications, including
automatically building ontologies, identification of trending topics over time,
predictive analytics on timeseries data, root-cause analysis surfacing concepts
related to failure scenarios from free text, data cleansing, document
summarization, semantic search interpretation and expansion of queries,
recommendation systems, and numerous other forms of anomaly detection.
Main contribution of this paper:
The introduction (and open sourcing) of the Semantic Knowledge Graph, a
novel and compact graph model
that can dynamically materialize and score the relationships between any arbitrary
combination of entities represented within a corpus of documents.
References
Contact Info
Trey Grainger
trey.grainger@lucidworks.com
@treygrainger
http://solrinaction.com
Other presentations:
http://www.treygrainger.com

The Semantic Knowledge Graph

  • 1. The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain Trey Grainger SVP of Engineering Lucidworks Khalifeh AlJadda Lead Data Scientist CareerBuilder Mohammed Korayem Data Scientist CareerBuilder Andries Smith Software Engineer CareerBuilder
  • 2. Trey Grainger SVP of Engineering • Previously Director of Engineering @ CareerBuilder • MBA, Management of Technology – Georgia Tech • BA, Computer Science, Business, & Philosophy – Furman University • Information Retrieval & Web Search - Stanford University Fun outside of CB: • Co-author of Solr in Action, plus a handful of research papers • Frequent conference speaker • Founder of Celiaccess.com, the gluten-free search engine • Lucene/Solr contributor About Me
  • 4. A Graph DSAA 2016 Montreal Quebec Canada Semantic Knowledge Graph Paper Trey Grainger Mohammed Korayem Andries Smith Khalifeh AlJadda in_country Node / Vertex Edge
  • 5. “Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™.”
  • 6. Key Solr Features: ● Multilingual Keyword search ● Relevancy Ranking of results ● Faceting & Analytics ● Highlighting ● Spelling Correction ● Autocomplete/Type-ahead Prediction ● Sorting, Grouping, Deduplication ● Distributed, Fault-tolerant, Scalable ● Geospatial search ● Complex Function queries ● Recommendations (More Like This) ● … many more *source: Solr in Action, chapter 2
  • 7. Term Documents a doc1 [2x] brown doc3 [1x] , doc5 [1x] cat doc4 [1x] cow doc2 [1x] , doc5 [1x] … ... once doc1 [1x], doc5 [1x] over doc2 [1x], doc3 [1x] the doc2 [2x], doc3 [2x], doc4[2x], doc5 [1x] … … Document Content Field doc1 once upon a time, in a land far, far away doc2 the cow jumped over the moon. doc3 the quick brown fox jumped over the lazy dog. doc4 the cat in the hat doc5 The brown cow said “moo” once. … … What you SEND to Lucene/Solr: How the content is INDEXED into Lucene/Solr (conceptually): The inverted index
  • 8. /solr/select/?q=apache solr Term Documents … … apache doc1, doc3, doc4, doc5 … hadoop doc2, doc4, doc6 … … solr doc1, doc3, doc4, doc7, doc8 … … doc5 doc7 doc8 doc1 doc3 doc4 solr apache apache solr Matching queries to documents
  • 10. Knowledge Graph Related Work • Primarily related to ontology Learning. • Recently, large-scale knowledge bases that utilize ontologies (FreeBase [4], DBpedia [5], and YAGO [6, 7]) have been constructed using structured sources such as Wikipedia infoboxes. • Other approaches (DeepDive [8], Nell2RDF [9], and PROSPERA [10]) crawl the web and use machine learning and natural language processing to build web-scale knowledge graphs.
  • 12. Knowledge Graph Challenges we are solving Because current knowledge bases / ontology learning systems typically requires explicitly modeling nodes and edges into a graph ahead of time, this unfortunately presents several limitations to the use of such a knowledge graph: • Entities not modeled explicitly as nodes have no known relationships to any other entities. • Edges exist between nodes, but not between arbitrary combinations of nodes, and therefore such a graph is not ideal for representing nuanced meanings of an entity when appearing within different contexts, as is common within natural language. • Substantial meaning is encoded in the linguistic representation of the domain that is lost when the underlying textual representation is not preserved: phrases, interaction of concepts through actions (i.e. verbs), positional ordering of entities and the phrases containing those entities, variations in spelling and other representations of entities, the use of adjectives to modify entities to represent more complex concepts, and aggregate frequencies of occurrence for different representations of entities relative to other representations. • It can be an arduous process to create robust ontologies, map a domain into a graph representing those ontologies, and ensure the generated graph is compact, accurate, comprehensive, and kept up to date.
  • 13. Knowledge Graph Semantic Data Encoded into Free Text Content e en eng engi engineer engineers engineer engineersNode Type: Term software engineer software engineers electrical engineering engineer engineering software … … … Node Type: Character Sequence Node Type: Term Sequence Node Type: Document id: 1 text: looking for a software engineerwith degree in computer science or electrical engineering id: 2 text: apply to be a software engineer and work with other great software engineers id: 3 text: start a great careerin electrical engineering … …
  • 14. Model
  • 15. id: 1 job_title: Software Engineer desc: software engineer at a great company skills: .Net, C#, java id: 2 job_title: Registered Nurse desc: a registered nurse at hospital doing hard work skills: oncology, phlebotemy id: 3 job_title: Java Developer desc: a software engineer or a java engineer doing work skills: java, scala, hibernate field term postings list doc pos desc a 1 4 2 1 3 1, 5 at 1 3 2 4 company 1 6 doing 2 6 3 8 engineer 1 2 3 3, 7 great 1 5 hard 2 7 hospital 2 5 java 3 6 nurse 2 3 or 3 4 registered 2 2 software 1 1 3 2 work 2 10 3 9 job_title java developer 3 1 … … … … field doc term desc 1 a at company engineer great software 2 a at doing hard hospital nurse registered work 3 a doing engineer java or software work job_title 1 Software Engineer … … … Terms-Docs Inverted IndexDocs-Terms Uninverted IndexDocuments Knowledge Graph
  • 16. Knowledge Graph Set-theory View Graph View How the Graph Traversal Works skill: Java skill: Scala skill: Hibernate skill: Oncology doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 skill: Java skill: Java skill: Scala skill: Hibernate skill: Oncology Data Structure View Java Scala Hibernate docs 1, 2, 6 docs 3, 4 Oncology doc 5
  • 17. Knowledge Graph Graph Model Structure: Single-level Traversal / Scoring: Multi-level Traversal / Scoring:
  • 18. Knowledge Graph Multi-level Traversal Data Structure View Graph View doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 skill: Java skill: Java skill: Scala skill: Hibernate skill: Oncology doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 job_title: Software Engineer job_title: Data Scientist job_title: Java Developer …… Inverted Index Lookup Doc Values Index Lookup Doc Values Index Lookup Inverted Index Lookup Java Java Developer Hibernate Scala Software Engineer Data Scientist has_related_job_title has_related_job_title
  • 19. Knowledge Graph Scoring nodes in the Graph Foreground vs. Background Analysis Every term scored against it’s context. The more commonly the term appears within it’s foreground context versus its background context, the more relevant it is to the specified foreground context. countFG(x) - totalDocsFG * probBG(x) z = -------------------------------------------------------- sqrt(totalDocsFG * probBG(x) * (1 - probBG(x))) { "type":"keywords”, "values":[ { "value":"hive", "relatedness": 0.9765, "popularity":369 }, { "value":”spark", "relatedness": 0.9634, "popularity":15653 }, { "value":".net", "relatedness": 0.5417, "popularity":17683 }, { "value":"bogus_word", "relatedness": 0.0, "popularity":0 }, { "value":"teaching", "relatedness": -0.1510, "popularity":9923 }, { "value":"CPR", "relatedness": -0.4012, "popularity":27089 } ] } + - Foreground Query: "Hadoop"
  • 20. Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith.“The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016. Knowledge Graph Multi-level Graph Traversal with Scores software engineer* (materialized node) Java C# .NET .NET Developer Java Developer Hibernate ScalaVB.NET Software Engineer Data Scientist Skill Nodes has_related_skillStarting Node Skill Nodes has_related_skill Job Title Nodes has_related_job_title 0.90 0.88 0.93 0.93 0.34 0.74 0.91 0.89 0.74 0.89 0.780.72 0.48 0.93 0.76 0.83 0.80 0.64 0.61 0.780.55
  • 21. Knowledge Graph Materialization of new nodes through shared documents engineer engineers software engineer* (materialized node) engineer* (materialized node) Software doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 links_to links_to
  • 28. Knowledge Graph: Data Cleansing. Foreground Query: "Hadoop". { "type":"keywords", "values":[ { "value":"hive", "relatedness": 0.9765, "popularity":369 }, { "value":"spark", "relatedness": 0.9634, "popularity":15653 }, { "value":".net", "relatedness": 0.5417, "popularity":17683 }, { "value":"bogus_word", "relatedness": 0.0, "popularity":0 }, { "value":"teaching", "relatedness": -0.1510, "popularity":9923 }, { "value":"CPR", "relatedness": -0.4012, "popularity":27089 } ] } Experiment: A data analyst manually annotated 500 pairs of terms found together in real query logs as "relevant" or "not relevant". Results: The SKG removed 78% of the terms while maintaining 95% accuracy at removing the correct noisy pairs from the input data.
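A minimal sketch of how scores like these could drive cleansing: keep only the terms whose relatedness to the foreground query clears a cutoff. The cutoff of 0.25 and the function name `keep_related` are assumptions for illustration; the slide does not say which threshold the experiment used.

```python
# Values copied from the slide's example response for the foreground query "Hadoop".
VALUES = [
    {"value": "hive", "relatedness": 0.9765, "popularity": 369},
    {"value": "spark", "relatedness": 0.9634, "popularity": 15653},
    {"value": ".net", "relatedness": 0.5417, "popularity": 17683},
    {"value": "bogus_word", "relatedness": 0.0, "popularity": 0},
    {"value": "teaching", "relatedness": -0.1510, "popularity": 9923},
    {"value": "CPR", "relatedness": -0.4012, "popularity": 27089},
]


def keep_related(values, min_relatedness):
    """Return the terms whose relatedness is at or above the (assumed) cutoff."""
    return [v["value"] for v in values if v["relatedness"] >= min_relatedness]


# 0.25 is a hypothetical cutoff, not a value from the experiment.
kept = keep_related(VALUES, min_relatedness=0.25)
removed = [v["value"] for v in VALUES if v["value"] not in kept]
```

Note that popularity alone would not separate these terms: "CPR" is the most popular value in the list but the least related to the query.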
  • 30. Knowledge Graph: Search Expansion. Experiment: Take an initial query and expand its keyword phrases to include the entities most related to that query. Example: (figure not captured in this transcript)
  • 31. The Semantic Search Problem. User's Query: machine learning research and development Portland, OR software engineer AND hadoop, java. Traditional Query Parsing: (machine AND learning AND research AND development AND portland) OR (software AND engineer AND hadoop AND java). Semantic Query Parsing: "machine learning" AND "research and development" AND "Portland, OR" AND "software engineer" AND hadoop AND java. Semantically Expanded Query: ("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence") AND ("research and development"^10 OR "r&d") AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo}) AND ("software engineer"^10 OR "software developer") AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
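The step from the semantically parsed query to the expanded query can be sketched as simple string assembly. This is a hypothetical helper, not the authors' parser: the function names, the fixed boost of 10 on the original phrase, and the `RELATED` lookup table (standing in for terms returned by the knowledge graph) are all assumptions.

```python
def quote(term):
    """Wrap multi-word phrases in double quotes; leave single tokens bare."""
    return f'"{term}"' if " " in term else term


def expand_clause(original, related, boost=10):
    """Build '(original^boost OR related1 OR related2 ...)' for one parsed phrase."""
    parts = [f"{quote(original)}^{boost}"] + [quote(term) for term in related]
    return "(" + " OR ".join(parts) + ")"


def expand_query(phrases, related_terms):
    """AND together one expanded clause per parsed phrase."""
    return " AND ".join(
        expand_clause(phrase, related_terms.get(phrase, [])) for phrase in phrases
    )


# Hypothetical lookup table; the related terms are the ones shown on the slide.
RELATED = {
    "software engineer": ["software developer"],
    "hadoop": ["big data", "hbase", "hive"],
    "java": ["j2ee"],
}

expanded = expand_query(["software engineer", "hadoop", "java"], RELATED)
```

The geospatial clause for "Portland, OR" on the slide needs entity-type handling rather than a term list, so it is left out of this sketch.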
  • 32. Semantic Interpretation and Query Expansion. Keywords: machine learning. Inputs shown: Search Behavior, Application Behavior, etc.; Job Title Classifier, Skills Extractor, Job Level Classifier, etc. Known keyword phrases: java developer, machine learning, registered nurse. Components shown: FST, Knowledge Graph. Related Phrases: machine learning: { data mining .9, matlab .8, data scientist .75, artificial intelligence .7, neural networks .55 }. Common Job Titles: machine learning: { software engineer .65, data manager .3, data scientist .25, hadoop engineer .2 }. Related Occupations: machine learning: { 15-1031.00 .58 Computer Software Engineers, Applications; 15-1011.00 .55 Computer and Information Scientists, Research; 15-1032.00 .52 Computer Software Engineers, Systems Software }. Modified Query: keywords:((machine learning)^10 OR { AT_LEAST_2: ("data mining"^0.9, matlab^0.8, "data scientist"^0.75, "artificial intelligence"^0.7, "neural networks"^0.55) }) { BOOST_TO_TOP: ( job_title:( "software engineer" OR "data manager" OR "data scientist" OR "hadoop engineer")) }
  • 34. Knowledge Graph: Document Summarization. Experiment: Pass in raw text (extracting phrases as needed) and rank the phrases by their similarity to the documents using the SKG. The graph can also be traversed to find "related" entities/keyword phrases NOT found in the original document. Applications: content-based and multi-modal recommendations (no cold-start problem), data cleansing prior to clustering or other ML methods, semantic search / similarity scoring.
  • 35. Document Enrichment – Find / Score Relationships
  • 36. Document Summarization – Rank / Clean Keywords
  • 37. Knowledge Graph Future Work • Semantic Search (more experiments) • Search Engine Relevancy Algorithms • Trending Topics • Recommendation Systems • Root Cause Analysis • Abuse Detection
  • 38. Knowledge Graph: Conclusion. Applications: The Semantic Knowledge Graph has numerous applications, including automatically building ontologies, identifying trending topics over time, predictive analytics on time-series data, root-cause analysis that surfaces concepts related to failure scenarios from free text, data cleansing, document summarization, semantic search interpretation and expansion of queries, recommendation systems, and numerous other forms of anomaly detection. Main contribution of this paper: The introduction (and open sourcing) of the Semantic Knowledge Graph, a novel, compact graph model that can dynamically materialize and score the relationships between any arbitrary combination of entities represented within a corpus of documents.
  • 40. Contact Info: Trey Grainger [email protected] @treygrainger http://solrinaction.com Other presentations: http://www.treygrainger.com