Building Smarter Search Applications Using Built-In Knowledge Graphs and Query Introspection: Presented by Ted Sullivan, Lucidworks
Ted Sullivan
Building Smarter Search Applications
Using Built-In Knowledge Graphs (aka
Solr!) and Query Introspection
lucidworks.com
Senior Solutions Architect
Relevance - Precision - Recall
Do we put the cart before the horse?
Precision/Recall determine what matches and what doesn’t.
Relevance then computes the “best” matches from what is left.
Without focusing more on precision/recall first, we tend to have
Garbage In/Garbage Out
This is especially true in faceted search - relevance tuning can fix
the first few pages but the facets cannot be fixed!!!
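A small worked example may make the distinction concrete. The sketch below computes precision and recall for one query from sets of document ids; the ids and the function name `precision_recall` are made up for illustration. Facet counts are computed over the whole matched set, which is why noise hits that relevance ranking pushes off the first pages still distort the facets.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: what fraction of the matched documents are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents were matched.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents matched, 2 of them relevant, 1 relevant one missed.
p, r = precision_recall(retrieved={1, 2, 3, 4}, relevant={1, 2, 5})
print(f"precision={p:.2f} recall={r:.2f}")
```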
Improving Precision starts …
with better phrase detection
Embarrassing “noise” hits are often due to phrase cross matches.
Synonyms can improve recall tremendously but they need some help in
Solr when they are multi-term
Stop words can be important for disambiguation within phrases:
“The Lady Is A Tramp” vs. “The Lady And The Tramp”
“To Be Or Not To Be”
Better Search: Detecting Noun Phrases
The basic technique is called “autophrasing” – recognizing when more
than one word represents just one thing.
Autophrasing – uses an extra knowledge-base file “autophrases.txt”
Query Autofiltering – uses the phrases that are stored as metadata
values in the index.
A Novel Approach to Natural Language Processing:
Mapping Noun and Verb phrases to metadata fields
“Who’s in The Who?”
Multi-term Synonym Demo
autophrases.txt
new york
new york state
empire state
new york city
new york new york
big apple
ny ny
city of new york
state of new york
ny state
synonyms.txt
new_york => new_york_state,new_york_city,big_apple,new_york_new_york,ny_ny,nyc,empire_state,ny_state,state_of_new_york
new_york_state,empire_state,ny_state,state_of_new_york
new_york_city,big_apple,new_york_new_york,ny_ny,nyc,city_of_new_york
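The autophrasing step itself can be sketched in a few lines. This is not the Solr token filter from the talk, only an illustration of the idea: known phrases from autophrases.txt are collapsed into single underscore-joined tokens, which is the form the synonyms.txt entries use. The greedy longest-match strategy and the function name `autophrase` are assumptions made for this sketch.

```python
import re

AUTOPHRASES = [
    "new york", "new york state", "empire state", "new york city",
    "new york new york", "big apple", "ny ny", "city of new york",
    "state of new york", "ny state",
]


def autophrase(text, phrases=AUTOPHRASES):
    """Tokenize text, collapsing known phrases into single tokens (longest match first)."""
    phrase_set = {tuple(p.split()) for p in phrases}
    max_len = max(len(p) for p in phrase_set)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrase_set:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here: keep the single token.
            out.append(tokens[i])
            i += 1
    return out


print(autophrase("I am a native of the great state of New York."))
```

Because each phrase becomes one token, a single-token synonym rule such as `new_york_state,empire_state` can apply to it without the multi-term problems mentioned earlier.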
Multi-term Synonym Demo
This document is about new york state.
This document is about new york city.
There is a lot going on in NYC.
I heart the big apple.
The empire state is a great state.
New York, New York is a hellova town.
I am a native of the great state of New York.
Demo queries: New York, New York City, New York State
Request handlers compared: /select vs. /autophrase
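To see how the synonym rules could interact with the demo documents, here is a minimal sketch. It assumes the documents have already been autophrased (the token sets are written out by hand) and that synonyms are expanded at query time; neither is confirmed by the slides. Rule handling follows Solr's documented synonym semantics: an explicit `=>` mapping replaces the left side with the right side, and a comma list makes every term equivalent to the others. The result lists are what this sketch returns, not recorded output from the demo.

```python
SYNONYM_RULES = """
new_york => new_york_state,new_york_city,big_apple,new_york_new_york,ny_ny,nyc,empire_state,ny_state,state_of_new_york
new_york_state,empire_state,ny_state,state_of_new_york
new_york_city,big_apple,new_york_new_york,ny_ny,nyc,city_of_new_york
"""


def parse_synonyms(rules):
    """Map each term to the set of terms it expands to."""
    expansions = {}
    for line in rules.strip().splitlines():
        if "=>" in line:
            # Explicit mapping: left-hand terms are replaced by the right-hand terms.
            lhs, rhs = line.split("=>")
            targets = {t.strip() for t in rhs.split(",") if t.strip()}
            for term in lhs.split(","):
                expansions.setdefault(term.strip(), set()).update(targets)
        else:
            # Equivalence list: every term expands to the whole group.
            group = {t.strip() for t in line.split(",") if t.strip()}
            for term in group:
                expansions.setdefault(term, set()).update(group)
    return expansions


SYNONYMS = parse_synonyms(SYNONYM_RULES)

# The seven demo documents after autophrasing (token sets written out by hand).
DOCS = {
    1: {"this", "document", "is", "about", "new_york_state"},
    2: {"this", "document", "is", "about", "new_york_city"},
    3: {"there", "is", "a", "lot", "going", "on", "in", "nyc"},
    4: {"i", "heart", "the", "big_apple"},
    5: {"the", "empire_state", "is", "a", "great", "state"},
    6: {"new_york_new_york", "is", "a", "hellova", "town"},
    7: {"i", "am", "a", "native", "of", "the", "great", "state_of_new_york"},
}


def search(query_token):
    """Return ids of documents containing the query token or any of its expansions."""
    terms = SYNONYMS.get(query_token, {query_token})
    return sorted(doc_id for doc_id, tokens in DOCS.items() if terms & tokens)


for q in ("new_york", "new_york_city", "new_york_state"):
    print(q, "->", search(q))
```

In this sketch the ambiguous query "new york" matches every document, while "new york city" and "new york state" each match only their own sense.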
Query Autofiltering Implementation
Use Lucene FieldCache to build a map of field values to field names
(string fields)
Add synonym mappings from synonyms.txt and stemming to this
value(s) -> field(s) map
Use this map to discover noun phrases in the query that correspond to
field values in the index – longest contiguous phrase wins
Build filter or boost queries based on these discovered mappings
SOLR-7539 created 5/30/2015 - One comment so far “+1” - Bill Bell - Thanks Bill
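The steps above can be sketched outside Solr as follows. The real component reads field values from the Lucene FieldCache; here a hand-written dictionary (`FIELD_VALUES`, hypothetical) stands in for the index, and the synonym and stemming steps are left out, so "shirts" would not match "shirt" in this version. Function names are illustrative only.

```python
FIELD_VALUES = {  # hypothetical stand-in for string-field values in the index
    "color": ["red", "white"],
    "brand": ["Red Lion"],
    "product_type": ["socks", "dress shirt"],
}


def build_value_map(field_values):
    """Map each lower-cased value (as a token tuple) to the (field, value) pairs holding it."""
    value_map = {}
    for field, values in field_values.items():
        for value in values:
            key = tuple(value.lower().split())
            value_map.setdefault(key, []).append((field, value))
    return value_map


def autofilter(query, value_map):
    """Return (filter clauses, unmatched tokens); the longest contiguous phrase wins."""
    tokens = query.lower().split()
    max_len = max(len(key) for key in value_map)
    filters, leftover, i = [], [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            matches = value_map.get(tuple(tokens[i:i + n]))
            if matches:
                clauses = [f'{field}:"{value}"' for field, value in matches]
                # A value found in several fields becomes an OR over those fields.
                filters.append(clauses[0] if len(clauses) == 1
                               else "(" + " OR ".join(clauses) + ")")
                i += n
                break
        else:
            leftover.append(tokens[i])
            i += 1
    return filters, leftover


value_map = build_value_map(FIELD_VALUES)
print(autofilter("red socks", value_map))
print(autofilter("red lion socks", value_map))
```

Trying the longest window first is what keeps "red lion" from being split into a color and a stray token.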
Query Autofiltering – Basic Behavior
q = red socks -> fq=color:red&fq=product_type:socks
or bq=(color:red AND product_type:socks)^20
q = red lion socks -> fq=brand:"Red Lion"&fq=product_type:socks
q = scarlet chaise lounge -> color:red AND product_type:"Lounge Chair"
q = white dress shirts -> color:white AND product_type:"dress shirt"
q = white linen shirts -> ((brand:"White Linen" OR (color:white AND material:linen)) AND product_category:shirts)
q = white and grey dress shirts ->
((product_type:"dress shirt" OR ((product_type:dress OR product_category:dress) AND
(product_type:shirt OR product_category:shirt))) AND (color:(white OR grey) OR colors:(white AND grey)))
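The first example shows two ways to use the discovered clauses: as filter queries (`fq`), which restrict the result set, or as a boost query (`bq`), which only reorders it. A minimal sketch of building either parameter set follows. The helper name `solr_params` is hypothetical, and `defType=edismax` is an assumption here, added because `bq` is a DisMax/eDisMax parameter.

```python
from urllib.parse import urlencode


def solr_params(q, clauses, mode="filter", boost=20):
    """Build Solr request parameters that apply the clauses as filters or as a boost."""
    params = [("q", q)]
    if mode == "filter":
        # One fq per clause: documents must satisfy every clause.
        params += [("fq", clause) for clause in clauses]
    else:
        # A single boosted clause: matching documents rank higher, nothing is excluded.
        combined = " AND ".join(clauses)
        params += [("defType", "edismax"), ("bq", f"({combined})^{boost}")]
    return params


clauses = ["color:red", "product_type:socks"]
print(urlencode(solr_params("red socks", clauses, mode="filter")))
print(urlencode(solr_params("red socks", clauses, mode="boost")))
```

Filtering favors precision; boosting keeps recall and is the safer choice when the field mapping might be wrong.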
Query Autofiltering – Language And Logic
Logical or “Boolean” operators (named after mathematician George Boole) have
precise meaning in Set Theory and Computer Science
Search is about returning a set of records that match a set of terms
AND - Intersection ( && )
OR - Union ( || )
NOT - Exclusion ( ! )
In language - the meaning of “and” and “or” is contextual - sometimes they are
synonyms and sometimes they are antonyms!
- depends on the cardinality (single or multi-value) of a noun property or
attribute!
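The cardinality rule can be written down directly. In the "white and grey dress shirts" example, the single-valued field `color` gets OR (one shirt cannot have two values there), while the multi-valued field `colors` gets AND. The sketch below encodes just that decision; the function name and the `multi_valued` flag are assumptions, and a real implementation would presumably read cardinality from the schema.

```python
def combine_values(field, values, conjunction, multi_valued):
    """Translate a natural-language 'and'/'or' between values into a Boolean clause."""
    # 'and' can only mean intersection when one document may hold several values.
    if conjunction == "and" and multi_valued:
        op = "AND"
    else:
        op = "OR"
    joined = f" {op} ".join(values)
    return f"{field}:({joined})"


# "white and grey" against a single-valued and a multi-valued color field.
single = combine_values("color", ["white", "grey"], "and", multi_valued=False)
multi = combine_values("colors", ["white", "grey"], "and", multi_valued=True)
print(f"({single} OR {multi})")
```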
A Music Ontology
Song
Songwriter
Genre
Performer
Recording
Guitarist
Pianist
Vocalist
Producer
Record Label
Band
Album
Natural Language Processing - Lite
(Front-end NLP)
Precise Free-text searching of Structured Metadata
Query Autofilter can take Natural Language Queries and turn
them into structured Boolean Queries.
Now processes both noun and verb/adjective phrases
Verb phrase mapping enables better selection of field names
Beatles Songs written by George Harrison
Willie Dixon Songs covered by Led Zeppelin
==> Look Mom - No SQL!!
Natural Language Processing - Lite
Noun phrases that map to fieldName/fieldValue pairs
Bob Dylan Songs
composer:"Bob Dylan" OR performer:"Bob Dylan"
Verb phrase patterns that map to field names:
Songs written by Bob Dylan ==> composer:"Bob Dylan"
“Who’s in The Who?” ==> memberOfGroup:"The Who"
Songs Bob Dylan covered vs covers of Bob Dylan Songs
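A rough sketch of verb-phrase mapping, using only the two patterns shown above. The regular expressions and the function name are illustrative assumptions; the actual component is configured differently and checks the captured value against the index, which this sketch does not do.

```python
import re

# Pattern -> field name, taken from the two examples on this slide.
VERB_PATTERNS = [
    (re.compile(r"\bwritten by (?P<value>.+)$", re.IGNORECASE), "composer"),
    (re.compile(r"^who'?s in (?P<value>.+?)\??$", re.IGNORECASE), "memberOfGroup"),
]


def map_verb_phrase(query):
    """Return a field:"value" clause if the query matches a known verb pattern, else None."""
    normalized = query.replace("\u2019", "'").strip()  # treat curly apostrophes as straight
    for pattern, field in VERB_PATTERNS:
        match = pattern.search(normalized)
        if match:
            value = match.group("value").strip()
            return f'{field}:"{value}"'
    return None


print(map_verb_phrase("Songs written by Bob Dylan"))
print(map_verb_phrase("Who's in The Who?"))
print(map_verb_phrase("Bob Dylan Songs"))
```

Without a verb phrase, as in the last call, the query falls back to noun-phrase handling, where "Bob Dylan" may match more than one field.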
Short Demo of Query Autofiltering NLP-Lite
A Suggester for Query Autofiltering
Create multi-field suggestions using “Pivot Facet Patterns” that can be
processed by Query Autofiltering
Use facets - at index time - to extract suggestion meta phrases and
context.
Steps in building the Suggester
Processing - Denormalizing the Graph
Create searchable metadata from object links
Process graph relationships, apply business rules
Creating the Pivot Patterns
${name_s} ${recording_type}s ==> Bob Dylan Songs
${genre} ${recording_type}s ==> Progressive Rock Albums
${genre} ${musician_type}s ==> Rock Drummers, Heavy Metal Bands
Calculating which recordings are covers
Finding Related entities (e.g. John Lennon <=> Paul McCartney)
${members_ss} ${musician_type}s ==> Paul McCartney Bands
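The pivot patterns use `${field}` placeholders, which happen to match Python's `string.Template` syntax, so the expansion step is easy to illustrate. In the sketch below a short list of hand-written records (hypothetical data) stands in for the pivot facet response: a phrase is generated only for field values that occur together in some record.

```python
import re
from string import Template

RECORDS = [  # hypothetical denormalized records; field names follow the slide
    {"name_s": "Bob Dylan", "recording_type": "Song", "genre": "Folk"},
    {"recording_type": "Album", "genre": "Progressive Rock"},
    {"musician_type": "Drummer", "genre": "Rock"},
]


def expand_pattern(pattern, records):
    """Fill a pivot pattern from every record that has all of the pattern's fields."""
    fields = re.findall(r"\$\{(\w+)\}", pattern)
    template = Template(pattern)
    phrases = set()
    for record in records:
        # Only co-occurring values are combined, so no nonsensical permutations appear.
        if all(field in record for field in fields):
            phrases.add(template.substitute(record))
    return sorted(phrases)


print(expand_pattern("${name_s} ${recording_type}s", RECORDS))
print(expand_pattern("${genre} ${recording_type}s", RECORDS))
print(expand_pattern("${genre} ${musician_type}s", RECORDS))
```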
Repurposing Facets for relationship mining
Using pivot facets to generate multi-field phrases
Only get linguistically sensible permutations!
Facets provide Specification and Context
Traditionally used for visualization and navigation.
We can repurpose this to make a smarter suggester!
Security Trimming of suggestions
Only show suggestions that can return results given the current
user’s entitlements
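Security trimming can be pictured as a filter over the suggestion list. The sketch assumes each suggestion carries the set of groups allowed to see at least one of its backing documents; the `acl` field, the group names, and the sample phrases are all hypothetical.

```python
SUGGESTIONS = [  # hypothetical suggestions with the groups that can see matching documents
    {"phrase": "Bob Dylan Songs", "acl": {"public"}},
    {"phrase": "Progressive Rock Albums", "acl": {"public", "staff"}},
    {"phrase": "Heavy Metal Bands", "acl": {"staff"}},
]


def trim_suggestions(suggestions, user_groups):
    """Keep only suggestions that would return results for this user's entitlements."""
    return [s["phrase"] for s in suggestions if s["acl"] & set(user_groups)]


print(trim_suggestions(SUGGESTIONS, {"public"}))
print(trim_suggestions(SUGGESTIONS, {"staff"}))
```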
A Suggester that learns what the
user is looking for
Suggester now brings back metadata with the suggestion
Front end can cache this metadata and use it to boost
subsequent typeahead queries based on what the user
selected.
Searching for Beatles Songs - “Baby’s In Black” and
“Baby You’re A Rich Man” are now boosted over all of
the other song titles that start with “Baby”
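A minimal front-end sketch of this boosting, under stated assumptions: the metadata returned with the selected suggestion is cached as a small dictionary (`selected_context`), and typeahead candidates that share those field values are ranked first. The field name `performer`, the function name, and the non-Beatles title are illustrative.

```python
TITLES = [  # hypothetical typeahead candidates with metadata
    {"title": "Baby Love", "performer": "The Supremes"},
    {"title": "Baby's In Black", "performer": "The Beatles"},
    {"title": "Baby You're A Rich Man", "performer": "The Beatles"},
]


def rank_typeahead(prefix, titles, selected_context):
    """Return titles starting with prefix, those matching the cached context first."""
    matches = [t for t in titles if t["title"].lower().startswith(prefix.lower())]

    def score(entry):
        # Count how many cached field values this candidate shares.
        return sum(1 for field, value in selected_context.items()
                   if entry.get(field) == value)

    ranked = sorted(matches, key=lambda t: (-score(t), t["title"]))
    return [t["title"] for t in ranked]


# The user previously picked the suggestion "Beatles Songs"; its metadata was cached.
context = {"performer": "The Beatles"}
print(rank_typeahead("Baby", TITLES, context))
```

With an empty context the same function falls back to plain alphabetical order, so the boost only takes effect once the user has selected something.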
Thank you!
lucidworks.com
Ted Sullivan
Senior Solutions Architect
