Applying Noisy Knowledge Graphs to Real Problems
Mayank Kejriwal
USC Information Sciences Institute
May 2019
Acknowledgements
Real Problems
Web has lowered the barrier to entry!
Pump and dump schemes proliferate online
Quechua
Fula
Odia
Maithili
Bhojpuri
Uyghur
Mayan languages
Aboriginal languages
Tasmanian languages
Fang
Umbundu
Setswana
Afro-Asiatic
Khoisan
Fon
Yoruba
Peulh
Adangme
Erzya
Bashkir
Khakas
Udmurt
Ingush
Tagalog
Hiligaynon
Bikol
Waray
Native American dialects
What do these problems have in common (besides being really hard)?
1. Very messy, raw data, with both redundancy and irrelevance
2. Users are also producers, i.e., we cannot just ‘build’ the system and hand it off
3. Domains are largely non-analytic (e.g., we don’t have a model or equations for human trafficking)
Space of design decisions
[Diagram: Raw data, Search+GUI, and Representation + Infrastructure, with the remaining components shown as open questions (?)]
Space of design decisions
Search+GUI
Producer Consumer
Space of design decisions
[Diagram: Raw data, Knowledge Graph, Search+GUI, and Representation + Infrastructure, with the remaining components shown as open questions (?)]
Domain-specific Insight Graphs (DIG)
Space of design decisions: example from human trafficking
Raw data
Search+GUI
Knowledge Graph (KG)
Domain discovery
Define KG schema
Representation + Infrastructure
Flexible inputs
Query reformulation
KG Construction
The Knowledge Graph is noisy… how do we cope?
Answer: Strategize around each triangle
Search+GUI
Consumer Producer
Example from DIG: consumer triangle
Search+GUI
Consumer
Anti-fragile query reformulation to satisfy user intent
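# Prefix declaration added so the query parses; the namespace is a placeholder (assumption).
PREFIX : <http://example.org/dig#>
SELECT ?ad ?ethnicity
WHERE
{
  ?ad a :Ad ;
      :hair_color 'Auburn' ;
      :review_site_id 'cg9469f' ;
      :price_per_hour '500' ;
      :name 'Claire Gold' ;
      :ethnicity ?ethnicity .
}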
Query Reformulation: Keyword expansion • Context broadening • Constraint relaxation
query 1, query 2, query 3, query 4, query n
Precision, Recall
Elasticsearch: 100M entities
Ranked Candidates
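As a rough illustration of the constraint relaxation idea named above, the sketch below turns the constraints from the example query into a series of progressively looser queries. It is not the DIG implementation. It assumes (hypothetically) that each ad is indexed as a document whose field names match the query's predicates, and it emits Elasticsearch-style bool queries as plain dictionaries, where a relaxed constraint moves from "must" (required) to "should" (optional, but still able to influence ranking). The function name relax_constraints is made up for this example.

```python
from itertools import combinations


def relax_constraints(constraints):
    """Return Elasticsearch-style bool queries, ordered from strictest to loosest.

    Each relaxation step demotes some constraints from "must" (required)
    to "should" (optional). Field names are assumed to match the index.
    """
    fields = list(constraints)
    queries = []
    # num_relaxed = how many constraints are demoted in this round
    for num_relaxed in range(len(fields) + 1):
        for relaxed in combinations(fields, num_relaxed):
            must = [{"match": {f: constraints[f]}} for f in fields if f not in relaxed]
            should = [{"match": {f: constraints[f]}} for f in relaxed]
            queries.append({"query": {"bool": {"must": must, "should": should}}})
    return queries


# Constraint values taken from the example query on the slide.
constraints = {
    "hair_color": "Auburn",
    "review_site_id": "cg9469f",
    "price_per_hour": "500",
    "name": "Claire Gold",
}

queries = relax_constraints(constraints)
print(len(queries))                   # 16 variants: every subset of 4 constraints
print(queries[0]["query"]["bool"])    # strictest: all four constraints required
print(queries[-1]["query"]["bool"])   # loosest: all four constraints optional
```

Running the strictest variants first favors precision, and falling back to looser ones recovers recall when noisy extractions cause an exact match to fail. A real system would likely cap the number of variants rather than enumerate every subset.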
Query-centric KG representation
Infrastructure: Leverage existing ecosystems (there are many!)
Prosecutions
User Testimonials
Showcasing THOR
Other domains
Narcotics
Illegal weapons sales
Fraudulent shipments
Securities fraud
Causal exploration
Geopolitical forecasting
Cyberattack prediction
THANK YOU! QUESTIONS...
BACKUP
THOR: Text-enabled Humanitarian Operations in Real-time
Impact and Measurements
Controlled (i.e., academic) measurements
[Chart: DARPA MEMEX Eval (90K pages). x-axis: Average Precision of Retrieved Pages, in bins from 0 - 0.1 up to <= 1.0; y-axis: % Questions (0 to 100); series: Point Fact, Cluster ID, Aggregate, Facet]
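For readers unfamiliar with the metric, here is a minimal sketch of how average precision can be computed for one ranked list of retrieved pages. The slides do not state the exact variant used in the evaluation, so this is the common textbook form, normalized by the number of relevant pages that were actually retrieved. That normalization is an assumption, and the function name is illustrative.

```python
def average_precision(ranked_relevance):
    """Average precision for one ranked result list.

    ranked_relevance: iterable of booleans, best-ranked result first,
    True where the retrieved page is relevant to the question.
    Normalized by the number of relevant results retrieved (an assumption).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # precision at this rank = relevant results so far / results so far
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Relevant pages at ranks 1, 3 and 4 give (1/1 + 2/3 + 3/4) / 3.
score = average_precision([True, False, True, True, False])
print(round(score, 3))
```

A chart like the one above would then bucket one such score per question into the precision bins on the x-axis.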
In-use impact (sex trafficking)
100 million+ escort ads
3 years data coverage
2 billion triples
100 law enforcement offices
3 convictions
NY County District Attorney (HTRU)
MEMEX tools getting rolled out
Memex tools getting rolled out
Academic Output
~15 publications over the course of the program
• 7 more currently under review
• 2 best paper awards
• Upcoming special issue call on knowledge construction and management
• 2 upcoming books, incl. graduate-level textbook on knowledge graphs (MIT Press, 2018)
Multiple tutorials/demonstrations at top-tier academic conferences
• Tutorials on knowledge graph construction and data mining over Web corpora/unusual domains in KDD17, ISWC17, AAAI18, WWW18
• At ISWC17, only full-day tutorial accepted; had near-capacity attendance
• Demos at ISWC17, AAAI18 (nominated for Best Demo)
• Case study at CHI18
Selected papers
• Knowledge Graphs for Social Good: An Entity-centric Search Engine for the Human Trafficking Domain (IEEE Transactions on Big Data, 2017)
• Information Extraction in Illicit Domains (WWW, 2017)
• Unsupervised Entity Resolution on Multi-type Graphs (ISWC 2016)
Broker
Rich club effect
Star cluster
Web formation
Social Science Studies
• Subjective issue
• Architecture-level evaluation
• Ablation analysis
“Ideal” Evaluation
Structured query execution on noisy data
DIG capabilities
Aggregations
Facets
Dossier Generation
Networks
Provenance
Structured Queries
Interface Customization
• Capabilities that generic search engines like Google do not currently support
• Domain-specific
–Allows a user to specify her schema
–No prior constraints
• Insight
–Supports aggregations, network analysis, faceted search, dossiers...
• Graph
–Uses a knowledge graph representation + efficient NoSQL query reformulation
Users want situational awareness, i.e., to be equipped with actionable insights
• Advanced name matching algorithm based on machine learning, phonetic similarity, and illicit webpage-specific word embeddings
How can we tell when two actors are really one and the same?
Abbie / Abby
Candy / Kandy
Kim / Kimmy
Lea / Leah
Nicki / Nikki
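To make the phonetic side of the name matching question concrete, the sketch below normalizes each name with a handful of hand-written spelling rules and then compares the results with a string similarity ratio. This is a deliberately crude stand-in and not the algorithm described in the deck, which also draws on machine learning and word embeddings. The rules, the 0.8 threshold, and the function names phonetic_key and same_actor are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def phonetic_key(name):
    """Reduce a name to a rough phonetic key using a few ad hoc rules."""
    key = re.sub(r"[^a-z]", "", name.lower())   # keep letters only
    key = key.replace("ph", "f")
    key = key.replace("ck", "k")
    key = key.replace("c", "k")                  # crude: treat every c as a hard k
    key = re.sub(r"(.)\1+", r"\1", key)          # collapse doubled letters
    key = re.sub(r"(ie|ey|y)$", "i", key)        # unify common name endings
    key = re.sub(r"(?<=[aeiou])h$", "", key)     # drop a silent trailing h
    return key


def same_actor(name_a, name_b, threshold=0.8):
    """Guess whether two names refer to the same person (threshold is arbitrary)."""
    ratio = SequenceMatcher(None, phonetic_key(name_a), phonetic_key(name_b)).ratio()
    return ratio >= threshold


# Name pairs from the slide.
pairs = [("Abbie", "Abby"), ("Candy", "Kandy"), ("Kim", "Kimmy"),
         ("Lea", "Leah"), ("Nicki", "Nikki")]
for a, b in pairs:
    print(a, b, phonetic_key(a), phonetic_key(b), same_actor(a, b))
```

Four of the five pairs collapse to identical keys, while Kim / Kimmy only passes because of the similarity threshold. That gap hints at why hand-written rules alone tend not to be enough and learned components are brought in.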
• Evaluated on five investigative domains beyond human trafficking, each with its own domain-specific needs
–Narcotics
–Counterfeit Electronics Manufacturing
–Securities Fraud
–Mail Shipment Fraud
–Illegal Weapons Sales
• User engagement was high
–Investigators were able to customize their domain in just one day, with less than an hour of training
–Investigators have expressed interest in continuing to refine and use the search engine internally
Other use-cases
Relevance score
Matching search criteria highlighted
Image extraction + face and pose analytics using deep learning
Original URL
Dossier term
Activity timeline
Co-occurrence statistics
Related ads
