Trey Grainger
Chief Algorithms Officer
Balancing the Dimensions of User Intent
October 28, 2019
Trey Grainger
Chief Algorithms Officer
• Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder
• Georgia Tech – MBA, Management of Technology
• Furman University – BA, Computer Science, Business, & Philosophy
• Stanford University – Information Retrieval & Web Search
Other fun projects:
• Co-author of Solr in Action, plus numerous research publications
• Advisor to Presearch, the decentralized search engine
• Lucene / Solr contributor
About Me
• About Lucidworks
• What is AI-powered Search?
• The Dimensions of User Intent
• Content Understanding:
• Keyword Search
• User Understanding:
• Collaborative Recommendations
• Content Understanding + User Understanding:
• Personalized Search
• Domain Understanding:
• Knowledge Graphs
• Domain Understanding + User Understanding:
• Domain-aware Matching
• Content Understanding + Domain Understanding:
• Semantic Search
• Balancing Approaches:
• Keyword vs. Vector vs. Knowledge Graph Search
• Vector Search
• Knowledge Graph Search
• Combining it all together
Agenda
Who are we?
300+ CUSTOMERS ACROSS THE
FORTUNE 1000
400+ EMPLOYEES
OFFICES IN
San Francisco, CA (HQ)
Raleigh-Durham, NC
Cambridge, UK
Bangalore, India
Hong Kong
The Search & AI Conference
COMPANY BEHIND
DEVELOPMENT, HOSTING, & SUPPORT
Proudly built with open-source
tech at its core: Apache Solr &
Apache Spark
Personalizes search
with applied
machine learning
Proven on the
world’s biggest
information systems
What is AI-Powered Search?
http://aiPoweredSearch.com
... is my new book!
(Haystack discount code: ctwhay19)
AI-powered Search
AI-powered Search
Question / Answer
Systems
Virtual Assistants
• Signals Boosting Models
• Learning to Rank
• Semantic Search
• Collaborative Filtering
• Personalized Search
• Content Clustering
• NLP / Entity Resolution
• Semantic Knowledge Graphs
• Document Classification
• etc.
• Neural Search
• Word Embeddings
• Vector Search
• Image / Voice Search
• etc.
• Question / Answer Systems
• Virtual Assistants
• Chatbots
• Rules-based Relevancy
• etc.
We have a big toolbox - great!
But how do we properly apply
those tools?
Dimensions of User Intent
Content
Understanding
Domain
Understanding
User
Understanding
User Intent
Keyword
Search
Dimensions of User Intent
Content
Understanding
Domain
Understanding
User
Understanding
User Intent
/solr/collection/select/?q=apache solr
Term      Documents
…         …
apache    doc1, doc3, doc4, doc5
…         …
lucene    doc2, doc4, doc6
…         …
solr      doc1, doc3, doc4, doc7, doc8
…         …
Venn diagram of matches:
apache only: doc5
solr only: doc7, doc8
apache solr (both terms): doc1, doc3, doc4
Matching queries to documents
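The lookup on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how Solr implements it: the dictionary reuses the postings from the slide, and the `match` helper and its `operator` parameter are hypothetical names.

```python
# Toy inverted index using the term -> documents postings shown on the slide.
inverted_index = {
    "apache": {"doc1", "doc3", "doc4", "doc5"},
    "lucene": {"doc2", "doc4", "doc6"},
    "solr": {"doc1", "doc3", "doc4", "doc7", "doc8"},
}


def match(query, operator="AND"):
    """Return sorted doc ids matching every (AND) or any (OR) query term."""
    postings = [inverted_index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    combine = set.intersection if operator == "AND" else set.union
    return sorted(combine(*postings))


and_matches = match("apache solr")                 # docs containing both terms
or_matches = match("apache solr", operator="OR")   # docs containing either term
print(and_matches)
print(or_matches)
```

The AND case corresponds to the overlap of the two circles in the diagram; the OR case is their union.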
BM25 (Relevance Scoring between Query and Documents)
\mathrm{Score}(q, d) = \sum_{t \in q} \mathrm{idf}(t) \cdot \frac{\mathrm{tf}(t \text{ in } d) \cdot (k + 1)}{\mathrm{tf}(t \text{ in } d) + k \cdot \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}
Where:
t = term; d = document; q = query; i = index
\mathrm{tf}(t \text{ in } d) = \text{numTermOccurrencesInDocument}^{1/2}
\mathrm{idf}(t) = 1 + \log\left(\frac{\text{numDocs}}{\text{docFreq} + 1}\right)
|d| = \sum_{t \in d} 1
\mathrm{avgdl} = \frac{\sum_{d \in i} |d|}{\sum_{d \in i} 1}
k = Free parameter. Usually ~1.2 to 2.0. Increases term frequency saturation point.
b = Free parameter. Usually ~0.75. Increases impact of document normalization.
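A minimal Python sketch of the scoring function, following the definitions on this slide term by term (including the square-root tf). The tiny tokenized corpus and the `bm25_score` name are assumptions made up for illustration; this is not Lucene's actual implementation.

```python
import math


def bm25_score(query_terms, doc_terms, corpus, k=1.2, b=0.75):
    """Score one document against a query using the slide's definitions."""
    num_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / num_docs   # average document length
    score = 0.0
    for term in query_terms:
        occurrences = doc_terms.count(term)
        if occurrences == 0:
            continue                                  # term contributes nothing
        tf = math.sqrt(occurrences)                   # tf as defined on the slide
        doc_freq = sum(1 for d in corpus if term in d)
        idf = 1 + math.log(num_docs / (doc_freq + 1))
        length_norm = 1 - b + b * len(doc_terms) / avgdl
        score += idf * (tf * (k + 1)) / (tf + k * length_norm)
    return score


# Hypothetical corpus: each document is a list of tokens.
corpus = [
    ["apache", "solr", "search"],
    ["apache", "lucene"],
    ["solr"],
    ["spark", "streaming"],
]
query = ["apache", "solr"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
print(scores)
```

Raising `k` lets repeated terms keep adding to the score for longer; raising `b` penalizes long documents more strongly.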
ipad
Keyword
Search
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
User Intent
Collaborative Filtering (Recommendations)
User
Searches
User
Sees
Results
User
takes an
action
Users’ actions
inform system
improvements
User     Query     Results
Alonzo   ipad      doc10, doc22, doc12, …
Elena    printer   doc84, doc2, doc17, …
Ming     ipad      doc10, doc22, doc12, …
…        …         …
User Action Document
Alonzo click doc22
Elena click doc17
Ming click doc12
Alonzo purchase doc22
Ming click doc22
Ming purchase doc12
Elena click doc2
… … …
User Item Weight
Alonzo doc22 1.0
Alonzo doc12 0.4
… … …
Ming doc12 0.9
Ming doc22 0.6
… … …
ipad ⌕
Matrix Factorization
Recommendations for Alonzo:
• doc22: “iPad Pro”
• doc12: “Kindle Fire”
…
Recommendations (User-Item, Item-Item, Query-Item)
User Item Weight
Alonzo doc22 1.0
Alonzo doc12 0.4
… … …
Ming doc12 0.9
Ming doc22 0.6
… … …
Recommendations for Alonzo:
• doc22: “iPad Pro”
• doc12: “Kindle Fire”
…
Item Item Weight
doc22 doc22 1.0
doc22 doc12 0.85
… … …
doc12 doc12 1.0
doc12 doc22 0.83
… … …
Query Item Weight
ipad doc22 0.98
ipad doc12 0.6
… … …
kindle doc12 0.96
apple doc22 0.90
… … …
Recommendations for Doc22:
• doc22: “iPad Pro”
• doc12: “Kindle Fire”
…
Recommendations for “ipad”:
• doc22: “iPad Pro”
• doc12: “Kindle Fire”
…
Matrix Factorization
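As a rough illustration of how an Item-Item table can be derived from a User-Item table, the sketch below uses plain cosine similarity between item columns. This is a simpler stand-in for the matrix factorization named on the slide, so the numbers it produces will not match the slide's weights; the function and variable names are hypothetical.

```python
import math
from collections import defaultdict

# User-Item weights copied from the slide (only the rows that are shown).
user_item_weights = [
    ("Alonzo", "doc22", 1.0),
    ("Alonzo", "doc12", 0.4),
    ("Ming", "doc12", 0.9),
    ("Ming", "doc22", 0.6),
]

# Represent each item as a sparse vector over users.
item_vectors = defaultdict(dict)
for user, item, weight in user_item_weights:
    item_vectors[item][user] = weight


def item_similarity(a, b):
    """Cosine similarity between two items' user-weight vectors."""
    va, vb = item_vectors.get(a, {}), item_vectors.get(b, {})
    dot = sum(w * vb.get(user, 0.0) for user, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


sim_self = item_similarity("doc22", "doc22")
sim_cross = item_similarity("doc22", "doc12")
print(round(sim_self, 2), round(sim_cross, 2))
```

Matrix factorization would instead learn low-dimensional user and item factors, which generalizes better when most users have interacted with only a few items.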
ipad
ipad
Keyword
Search
Knowledge Graph
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
User Intent
What is a Knowledge Graph?
(vs. Ontology vs. Taxonomy vs. Synonyms, etc.)
Overly Simplistic Definitions
Alternative Labels: Substitute words with identical meanings
[ CTO => Chief Technology Officer; specialise => specialize ]
Synonyms List: Provides substitute words that can be used to represent
the same or very similar things
[ human => homo sapiens, mankind; food => sustenance, meal ]
Taxonomy: Classifies things into Categories
[ john is Human; Human is Mammal; Mammal is Animal ]
Ontology: Defines relationships between types of things
[ animal eats food; human is animal ]
Knowledge Graph: Instantiation of an
Ontology (contains the things that are related)
[ john is human; john eats food ]
A Knowledge Graph subsumes the other types.
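One way to picture the difference between an ontology and a knowledge graph is as sets of (subject, predicate, object) triples. The sketch below is a toy model using the slide's examples; the `types_of` and `infer` helpers are hypothetical, and real knowledge graph stores are far richer.

```python
# Ontology: relationships between types of things (from the slide).
ontology = {("human", "is", "animal"), ("animal", "eats", "food")}

# Knowledge graph: the ontology plus concrete instances.
knowledge_graph = ontology | {("john", "is", "human")}


def types_of(entity, graph):
    """Follow 'is' edges transitively (the taxonomy part of the graph)."""
    found, frontier = set(), [entity]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in graph:
            if subj == current and pred == "is" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def infer(entity, predicate, graph):
    """Objects related to the entity directly or through any of its types."""
    subjects = {entity} | types_of(entity, graph)
    return {obj for subj, pred, obj in graph if subj in subjects and pred == predicate}


john_types = types_of("john", knowledge_graph)
john_eats = infer("john", "eats", knowledge_graph)
print(john_types, john_eats)
```

Here "john eats food" is not stored explicitly; it follows from the instance fact plus the ontology, which is one sense in which the graph subsumes the simpler structures.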
Keyword
Search
Knowledge Graph
User Intent
Personalized
Search
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
Keyword Search
(Completely User-specified)
Traditional
Recommendations
(Completely driven by
user behavior)
Keyword Search
(Completely User-specified)
User-guided
Recommendations
(Mostly driven by user profile,
partially user-specified)
Traditional
Recommendations
(Completely driven by
user behavior)
Keyword Search
(Completely User-specified)
Personalized
Queries
(Mostly user-specified,
partially driven by user profile)
Personalized
Queries
(Mostly user-specified,
partially driven by user profile)
Keyword Search
(Completely User-specified)
User-guided
Recommendations
(Mostly driven by user profile,
partially user-specified)
Traditional
Recommendations
(Completely driven by
user behavior)
Personalized Search
Personalization
Regular Search Results:
Personalized Search Results:
User:
Nice - personalization is awesome!
Let’s roll it out everywhere!
Ugh…
Keyword
Search
Knowledge Graph
User Intent
Personalized
Search
Domain-aware
Matching
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
Knowledge Graph
(Understanding conceptual
and logical relationships
between domain-specific entities)
Collaborative
Recommendations
(Completely driven by
user behavior)
Personas / User Profiles
(User attributes and preferences in
knowledge graph)
Multimodal Recommendations
(Recommendations combining
collaborative filtering plus user-based
profile attribute matching/ranking)
Knowledge Graph
(Understanding conceptual
and logical relationships
between domain-specific entities)
Collaborative
Recommendations
(Completely driven by
user behavior)
Personas / User Profiles
(User attributes and preferences in
knowledge graph)
Multimodal Recommendations
(Recommendations combining
collaborative filtering plus user-based
profile attribute matching/ranking)
Knowledge Graph
(Understanding conceptual
and logical relationships
between domain-specific entities)
Collaborative
Recommendations
(Completely driven by
user behavior)
Domain-aware Matching
http://localhost:8983/solr/jobs/select/?
fl=jobtitle,city,state,salary&
q=(
jobtitle:"nurse educator"^25 OR jobtitle:(nurse educator)^10
)
AND (
(city:"Boston" AND state:"MA")^15
OR state:"MA")
AND _val_:"map(salary, 40000, 60000,10, 0)"
AND similar_users:{!terms}u99,u1,u50,u2311,u253,u70,u99
*Example derived from chapter 16 of Solr in Action
Multimodal Recommendations
Jane is a nurse educator in Boston seeking between $40K and $60K
She has interacted with the same content as the following users:
u99,u1,u50,u2311,u253,u70,u99
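A sketch of how an application might assemble this kind of query from a stored user profile. The profile dictionary, its keys, and the `build_recommendation_query` function are assumptions for illustration; the field names and boosts mirror the query on the slide.

```python
from urllib.parse import urlencode


def build_recommendation_query(profile, similar_users):
    """Combine profile attributes and collaborative signals into Solr params."""
    title = profile["jobtitle"]
    city = profile["city"]
    state = profile["state"]
    low, high = profile["min_salary"], profile["max_salary"]
    users = ",".join(similar_users)
    q = (
        f'(jobtitle:"{title}"^25 OR jobtitle:({title})^10) '
        f'AND ((city:"{city}" AND state:"{state}")^15 OR state:"{state}") '
        f'AND _val_:"map(salary,{low},{high},10,0)" '
        f"AND similar_users:{{!terms}}{users}"
    )
    return {"fl": "jobtitle,city,state,salary", "q": q}


# Hypothetical profile for the user described on the slide.
profile = {
    "jobtitle": "nurse educator",
    "city": "Boston",
    "state": "MA",
    "min_salary": 40000,
    "max_salary": 60000,
}
params = build_recommendation_query(profile, ["u99", "u1", "u50"])
url = "http://localhost:8983/solr/jobs/select?" + urlencode(params)
print(url)
```

The profile-driven clauses come from domain understanding (what a job seeker's attributes mean), while the `similar_users` clause carries the collaborative signal.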
Keyword
Search
Knowledge Graph
User Intent
Personalized
Search
Semantic
Search
Domain-aware
Matching
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
Keyword Search
(Finding and
Ranking Keyword)
Knowledge Graph
(Understanding conceptual and
logical relationships between
domain-specific entities)
Language Understanding
(Understanding syntax
and query structure)
Keyword Search
(Finding and
Ranking Keyword)
Terminology Understanding
(Understanding domain-specific
terms and conceptual meaning)
Knowledge Graph
(Understanding conceptual and
logical relationships between
domain-specific entities)
Language Understanding
(Understanding syntax
and query structure)
Terminology Understanding
(Understanding domain-specific
terms and conceptual meaning)
Keyword Search
(Finding and
Ranking Keyword)
Knowledge Graph
(Understanding conceptual and
logical relationships between
domain-specific entities)
Semantic Search
Sentence Embeddings:
[ 2, 3, 2, 4, 2, 1, 5, 3 ]
[ 5, 3, 2, 3, 4, 0, 3, 4 ]
. . .
Document Embedding:
[ 4, 1, 4, 2, 1, 2, 4, 3 ]
Word Embeddings:
[ 5, 1, 3, 4, 2, 1, 5, 3 ]
[ 4, 1, 3, 0, 1, 1, 4, 2 ]
. . .
Paragraph Embeddings:
[ 5, 1, 4, 1, 0, 2, 4, 0 ]
[ 1, 1, 4, 2, 1, 0, 0, 0 ]
. . .
Thought Vectors
apple caffeine cheese coffee drink donut food juice pizza tea water … term N
cappuccino 0 0 0 0 0 0 0 0 0 0 0 ...
apple 1 0 0 0 0 0 0 0 0 0 0 ...
juice 0 0 0 0 0 0 0 1 0 0 0 ...
cheese 0 0 1 0 0 0 0 0 0 0 0 ...
pizza 0 0 0 0 0 0 0 0 1 0 0 ...
donut 0 0 0 0 0 1 0 0 0 0 0 ...
green 0 0 0 0 0 0 0 0 0 0 0 ...
tea 0 0 0 0 0 0 0 0 0 1 0 ...
bread 0 0 1 0 0 0 0 0 0 0 0 ...
sticks 0 0 0 0 0 0 0 0 0 0 0 ...
query → exact term lookup in inverted index
Single Term Searches (as a Vector)
Combined Vector
query
Multi-term Query Vectors
juice 0 0 0 0 0 0 0 1 0 0 0 ...
apple 1 0 0 0 0 0 0 0 0 0 0 ...
+
apple juice 1 0 0 0 0 0 0 1 0 0 0 ...
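The vector addition above can be reproduced with a short Python sketch. The vocabulary is the column list from the preceding slide (truncated at "water"); the helper names are hypothetical.

```python
# Vocabulary columns from the slide, in order.
vocabulary = ["apple", "caffeine", "cheese", "coffee", "drink", "donut",
              "food", "juice", "pizza", "tea", "water"]


def term_vector(term):
    """One-hot vector: 1 in the column for the term, 0 everywhere else."""
    return [1 if term == word else 0 for word in vocabulary]


def query_vector(query):
    """Sum the one-hot vectors of every term in the query."""
    combined = [0] * len(vocabulary)
    for term in query.lower().split():
        combined = [a + b for a, b in zip(combined, term_vector(term))]
    return combined


apple_juice = query_vector("apple juice")
cappuccino = query_vector("cappuccino")
print(apple_juice)
print(cappuccino)
```

A term outside the vocabulary, such as "cappuccino", yields an all-zero vector, which is why exact term lookup cannot match it to related items like "coffee".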
apple caffeine cheese coffee drink donut food juice pizza tea water … term N
latte 0 0 0 0 0 0 0 0 0 0 0 ...
cappuccino 0 0 0 0 0 0 0 0 0 0 0 ...
apple juice 1 0 0 0 0 0 0 1 0 0 0 ...
cheese pizza 0 0 1 0 0 0 0 0 1 0 0 ...
donut 0 0 0 0 0 1 0 0 0 0 0 ...
soda 0 0 0 0 0 0 0 0 0 0 0 ...
green tea 0 0 0 0 0 0 0 0 0 1 0 ...
water 0 0 0 0 0 0 0 0 0 0 1 ...
cheese bread sticks 0 0 1 0 0 0 0 0 0 0 0 ...
cinnamon sticks 0 0 0 0 0 0 0 0 0 0 0 ...
query → exact term lookup in inverted index
Multi-term Searches
                        food  drink  dairy  bread  caffeine  sweet  calories  healthy
apple juice             0     5      0      0      0         4      4         3
cappuccino              0     5      3      0      4         1      2         3
cheese bread sticks     5     0      4      5      0         1      4         2
cheese pizza            5     0      4      4      0         1      5         2
cinnamon bread sticks   5     0      1      5      0         3      4         2
donut                   5     0      1      5      0         4      5         1
green tea               0     5      0      0      2         1      1         5
latte                   0     5      4      0      4         1      3         3
soda                    0     5      0      0      3         5      5         0
water                   0     5      0      0      0         0      0         5
Dimensionality Reduction
Phrase: Vector:
apple juice: [ 0, 5, 0, 0, 0, 4, 4, 3 ]
cappuccino: [ 0, 5, 3, 0, 4, 1, 2, 3 ]
cheese bread sticks: [ 5, 0, 4, 5, 0, 1, 4, 2 ]
cheese pizza: [ 5, 0, 4, 4, 0, 1, 5, 2 ]
cinnamon bread sticks: [ 5, 0, 1, 5, 0, 3, 4, 2 ]
donut: [ 5, 0, 1, 5, 0, 4, 5, 1 ]
green tea: [ 0, 5, 0, 0, 2, 1, 1, 5 ]
latte: [ 0, 5, 4, 0, 4, 1, 3, 3 ]
soda: [ 0, 5, 0, 0, 3, 5, 5, 0 ]
water: [ 0, 5, 0, 0, 0, 0, 0, 5 ]
Ranked Results: Green Tea
0.94 water
0.85 cappuccino
0.80 latte
0.78 apple juice
0.60 soda
… …
0.19 donut
Vector Similarity Scores:
Vector Similarity (a, b):
\cos(\theta) = \frac{a \cdot b}{|a| \times |b|}
Ranked Results: Cheese Pizza
0.99 cheese bread sticks
0.91 cinnamon bread sticks
0.89 donut
0.47 latte
0.46 apple juice
… …
0.19 water
Vector Similarity Scoring
Vector Similarity Scores:
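The rankings on these slides can be approximated with a direct cosine similarity computation. The sketch below uses the vectors from the dimensionality-reduction table; the `rank` helper is a hypothetical name, and a brute-force scan like this is only practical for tiny collections.

```python
import math

# Phrase vectors from the table: food, drink, dairy, bread, caffeine,
# sweet, calories, healthy.
vectors = {
    "apple juice": [0, 5, 0, 0, 0, 4, 4, 3],
    "cappuccino": [0, 5, 3, 0, 4, 1, 2, 3],
    "cheese bread sticks": [5, 0, 4, 5, 0, 1, 4, 2],
    "cheese pizza": [5, 0, 4, 4, 0, 1, 5, 2],
    "cinnamon bread sticks": [5, 0, 1, 5, 0, 3, 4, 2],
    "donut": [5, 0, 1, 5, 0, 4, 5, 1],
    "green tea": [0, 5, 0, 0, 2, 1, 1, 5],
    "latte": [0, 5, 4, 0, 4, 1, 3, 3],
    "soda": [0, 5, 0, 0, 3, 5, 5, 0],
    "water": [0, 5, 0, 0, 0, 0, 0, 5],
}


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_name, top_n=3):
    """Score every other phrase against the query phrase, best first."""
    query = vectors[query_name]
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in vectors.items() if name != query_name]
    return sorted(scored, reverse=True)[:top_n]


green_tea_top = rank("green tea")
cheese_pizza_top = rank("cheese pizza")
for score, name in green_tea_top + cheese_pizza_top:
    print(f"{score:.2f} {name}")
```

Every document is scored on every query here, which is exactly the cost that makes naive vector scoring slow at scale.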
Performance Considerations
Problem: Vector Scoring is Slow
• Unlike keyword search, which looks up pre-indexed answers to queries, Vector Search must instead calculate
similarities between the query vector and every document’s vectors to determine best matches, which is
slow at scale.
Solution: Quantized Vectors
• “Quantization” is the process of mapping vector features to discrete values.
• Creating “tokens” that map to a similar vector space enables matching on those tokens to perform an ANN
(Approximate Nearest Neighbor) search.
• This converts vector scoring into a search problem (term lookup and scoring), which is fast again,
at the expense of some recall and scoring accuracy.
Recommended Approach: Quantized Vector Search + Vector Similarity Reranking
• Combine the best of both worlds by running an initial ANN search on a quantized vector representation, and
then re-rank the top-N results using full Vector similarity scoring.
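The recommended approach can be sketched with random-hyperplane hashing, which is one common quantization scheme (an assumption here; the slides do not say which scheme is intended). All function names are hypothetical, and a real engine would index the tokens rather than recompute them per query.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def make_hyperplanes(dim, n_planes, seed=42):
    """Random hyperplanes through the origin; each one yields a 1-bit token."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]


def lsh_tokens(vector, hyperplanes):
    """Quantize a vector into tokens like '3_1' (plane index, side of plane)."""
    tokens = set()
    for i, plane in enumerate(hyperplanes):
        side = 1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        tokens.add(f"{i}_{side}")
    return tokens


def search(query_vec, docs, hyperplanes, rerank_docs=3):
    """Stage 1: rank by shared tokens (ANN). Stage 2: exact cosine re-rank."""
    q_tokens = lsh_tokens(query_vec, hyperplanes)
    by_overlap = sorted(
        docs,
        key=lambda name: len(q_tokens & lsh_tokens(docs[name], hyperplanes)),
        reverse=True,
    )
    candidates = by_overlap[:rerank_docs]
    return sorted(((cosine(query_vec, docs[n]), n) for n in candidates), reverse=True)


docs = {
    "cheese pizza": [5, 0, 4, 4, 0, 1, 5, 2],
    "cheese bread sticks": [5, 0, 4, 5, 0, 1, 4, 2],
    "donut": [5, 0, 1, 5, 0, 4, 5, 1],
    "green tea": [0, 5, 0, 0, 2, 1, 1, 5],
    "latte": [0, 5, 4, 0, 4, 1, 3, 3],
    "water": [0, 5, 0, 0, 0, 0, 0, 5],
}
planes = make_hyperplanes(dim=8, n_planes=16)
results = search(docs["cheese pizza"], docs, planes)
print(results)
```

More hyperplanes make the token match more selective, trading recall for speed; the re-rank stage restores exact ordering within the candidate set.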
Solr Implementation Options
Option 1: Streaming Expressions
curl -X POST -H "Content-Type: application/json" \
"http://localhost:8983/solr/food/update?commit=true" \
--data-binary ' [
{"id": "1", "name_s":"donut", "vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
{"id": "2", "name_s":"apple juice",
"vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
{"id": "3", "name_s":"cappuccino",
"vector_fs":[0.0,5.0,3.0,0.0,4.0,1.0,2.0,3.0]},
{"id": "4", "name_s":"cheese pizza",
"vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]},
{"id": "5", "name_s":"green tea",
"vector_fs":[0.0,5.0,0.0,0.0,2.0,1.0,1.0,5.0]},
{"id": "6", "name_s":"latte", "vector_fs":[0.0,5.0,4.0,0.0,4.0,1.0,3.0,3.0]},
{"id": "7", "name_s":"soda", "vector_fs":[0.0,5.0,0.0,0.0,3.0,5.0,5.0,0.0]},
{"id": "8", "name_s":"cheese bread sticks",
"vector_fs":[5.0,0.0,4.0,5.0,0.0,1.0,4.0,2.0]},
{"id": "9", "name_s":"water", "vector_fs":[0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0]},
{"id": "10", "name_s":"cinnamon bread sticks",
"vector_fs":[5.0,0.0,1.0,5.0,0.0,3.0,4.0,2.0]}
] '
Send Documents to Solr:
Streaming Expressions
Option 2:
Streaming Expressions Query Parser
http://localhost:8983/solr/food/select?q=*:*&fl=id,name_s&
fq={!streaming_expression}top(
select(
search(food, q="*:*", fl="id,vector_fs", sort="id asc"),
cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos, id),
n=5, sort="cos desc"
)
{ "responseHeader":{
… },
"response":{"numFound":5,"start":0,"docs":[
{ "name_s":"donut", "id":"1"},
{ "name_s":"apple juice", "id":"2"},
{ "name_s":"cheese pizza", "id":"4"},
{ "name_s":"cheese bread sticks", "id":"8"},
{ "name_s":"cinnamon bread sticks", "id":"10"}]
}}
Request:
Response:
Streaming Expressions Query Parser
Option 3:
Solr Vector Scoring Plugin
Send Documents to Solr:
curl -X POST -H "Content-Type: application/json" \
"http://localhost:8983/solr/{your-collection-name}/update?commit=true" \
--data-binary '
[
{"name":"example 0", "vector":"0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33"},
{"name":"example 1", "vector":"0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25"},
{"name":"example 2", "vector":"0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01"},
{"name":"example 3", "vector":"0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9"},
{"name":"example 4", "vector":"0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1"},
{"name":"example 5", "vector":"0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09"}
]'
Solr Vector Scoring Plugin
Request:
Response:
http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector
vector="0.1,4.75,0.3,1.2,0.7,4.0"
}
{ "responseHeader":{ "status":0, "QTime":1},
"response":{ "numFound":6,"start":0,"maxScore":0.99984086,
"docs":[
{ "name":["example 3"], "vector":["0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 "],
"score":0.99984086},
{ "name":["example 0"], "vector":["0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "], "score":0.7693964},
{ "name":["example 5"], "vector":["0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09 "], "score":0.76322395},
{ "name":["example 4"], "vector":["0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1 "], "score":0.5328145},
{ "name":["example 1"], "vector":["0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25 "], "score":0.48513117},
{ "name":["example 2"], "vector":["0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01 "], "score":0.44909418}]
}}
Solr Vector Scoring Plugin
Option 4:
Solr Vector Scoring + LSH Plugin
Send Documents to Solr:
Solr Vector Scoring + LSH Plugin
curl -X POST -H "Content-Type: application/json" \
"http://localhost:8983/solr/{your-collection-name}/update?update.chain=LSH&commit=true" --data-binary '
[
{"id":"1", "vector":"1.55,3.53,2.3,0.7,3.44,2.33"},
{"id":"2", "vector":"3.54,0.4,4.16,4.88,4.28,4.25"}
]'
http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector
vector="1.55,3.53,2.3,0.7,3.44,2.33" lsh="true"
reRankDocs="5"}&fl=name,score,vector,_vector_,_lsh_hash_
Request:
Response:
Solr Vector Scoring + LSH Plugin
{
"responseHeader":{ "status":0, "QTime":8 }, "response":{"numFound":1,"start":0,"maxScore":36.65736,
"docs":[
{ "id": "1", "vector":"1.55,3.53,2.3,0.7,3.44,2.33",
"_vector_":"/z/GZmZAYeuFQBMzMz8zMzNAXCj2QBUeuA==",
"_lsh_hash_":["0_8", "1_35", "2_7", "3_10", "4_2", "5_35", "6_16", "7_30", "8_27", "9_12", "10_7",
"11_32", "12_48", "13_36", "14_10", "15_7", "16_42", "17_5", "18_3", "19_2", "20_1",
"21_0", "22_24", "23_18", "24_42", "25_31", "26_35", "27_8", "28_1", "29_24", "30_47",
"31_14", "32_22", "33_39", "34_0", "35_34", "36_34", "37_39", "38_27", "39_27",
"40_45", "41_10", "42_21", "43_34", "44_41", "45_9", "46_31", "47_0", "48_4", "49_43"],
"score":36.65736}
] } }
Option 5 (Work in Progress):
First-class Vector Fields in Lucene/Solr
Now In Progress
ANN Benchmarks
(Approximate Nearest Neighbor)
https://github.com/erikbern/ann-benchmarks
Vector Encoders
• Take queries, documents, sentences, paragraphs, etc. and
transform them into vectors.
• Usually leverage deep learning, which can discover rich language
usage rules and map them to combinations of features in the
vector
• Popular Libraries:
• BERT
• ELMo
• Universal Sentence Encoder
• Word2Vec
• Sentence2Vec
• GloVe
• fastText
• many more …
Vector Encoders
Query Type Likely Outcome
Obscure keyword combinations
Q. (software OR hardware) AND enginee*
• Keyword search succeeds
• Vector Search fails
Natural Language Queries
Q. Can my wife drive on my insurance?
• Keyword search might get
lucky, but probably fails
• Vector Search succeeds
Fuzzy Language Queries
Q. famous french tower
• Keyword search mismatch
yields poor results
• Vector Search succeeds
Structured Relationship Queries
Q. popular bbq near Activate
• Keyword search fails
• Vector search fails
• Need a Knowledge Graph!
Keyword Search vs. Vector Search
Giant Graph of Relationships...
Trey Grainger works for Lucidworks.
He spoke at the Activate 2019
conference.
#Activate19
(Activate) was held in Washington, DC
September 9-12, 2019.
Trey got his masters degree from
Georgia Tech.
Trey’s Voicemail
Semantic Knowledge Graph
id: 1
job_title: Software Engineer
desc: software engineer at a
great company
skills: .Net, C#, java
id: 2
job_title: Registered Nurse
desc: a registered nurse at
hospital doing hard work
skills: oncology, phlebotomy
id: 3
job_title: Java Developer
desc: a software engineer or a
java engineer doing work
skills: java, scala, hibernate
field doc term
desc
1
a
at
company
engineer
great
software
2
a
at
doing
hard
hospital
nurse
registered
work
3
a
doing
engineer
java
or
software
work
job_title 1
Software
Engineer
… … …
Terms-Docs Inverted Index
Docs-Terms Forward Index
Documents
Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
Knowledge
Graph
field term postings
list
doc pos
desc
a
1 4
2 1
3 1, 5
at
1 3
2 4
company 1 6
doing
2 6
3 8
engineer
1 2
3 3, 7
great 1 5
hard 2 7
hospital 2 5
java 3 6
nurse 2 3
or 3 4
registered 2 2
software
1 1
3 2
work
2 10
3 9
job_title java developer 3 1
… … … …
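The two index structures shown here can be built from the three example documents with a short Python sketch. It covers only the `desc` field, uses whitespace tokenization, and the `co_occurring_terms` helper is a hypothetical, heavily simplified stand-in for a graph traversal (it does no relatedness scoring).

```python
from collections import defaultdict

# 'desc' field values from the three example documents.
documents = {
    1: "software engineer at a great company",
    2: "a registered nurse at hospital doing hard work",
    3: "a software engineer or a java engineer doing work",
}

forward_index = {}                   # doc id -> sorted unique terms
inverted_index = defaultdict(dict)   # term -> {doc id: [positions]}

for doc_id, text in documents.items():
    terms = text.lower().split()
    forward_index[doc_id] = sorted(set(terms))
    for position, term in enumerate(terms, start=1):
        inverted_index[term].setdefault(doc_id, []).append(position)


def co_occurring_terms(term):
    """Traverse term -> docs (inverted index) -> terms (forward index)."""
    doc_ids = inverted_index.get(term, {})
    return sorted({t for d in doc_ids for t in forward_index[d]} - {term})


print(inverted_index["engineer"])
print(co_occurring_terms("nurse"))
```

Hopping from a term to its documents and back out to the other terms in those documents is the basic edge traversal that the Semantic Knowledge Graph then scores for relatedness.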
Related term vector (for query concept expansion)
http://localhost:8983/solr/stack-exchange-health/skg
Disambiguation by Category Example
Meaning 1: Restaurant => bbq, brisket, ribs, pork, …
Meaning 2: Outdoor Equipment => bbq, grill, charcoal, propane, …
Example Query:
Demo!
Demo Data
Places (also includes geonames database)
Entities (includes search commands)
Text Content
[ Web crawl of restaurant and product reviews sites ]
Solr Knowledge Graph Traversal Query
"bbq",
Why this Semantic Nuance Matters
popular barbeque near Activate
(popular same as "good", "top", "best")
Hotels near Haystack EU
hotels near popular BBQ in Berlin
BBQ near airports near Berlin
hotels near movie theaters in Berlin …
Other Knowledge Graph Search examples:
Keyword
Search
Knowledge Graph
User Intent
Personalized
Search
Semantic
Search
Domain-aware
Matching
Dimensions of User Intent
Content
Understanding
Domain
Understanding
Collaborative
Recommendations User
Understanding
News Search : popularity and freshness drive relevance
Restaurant Search: geographical proximity and price range are critical
Ecommerce: likelihood of a purchase is key
Movie search: More popular titles are generally more relevant
Job search: category of job, salary range, and geographical proximity matter
The right ranking algorithm is domain and context-dependent
Example Combining Content + Domain + User Context
News website:
/select?
fq={!cache=false v=$keywords}&
q= {!func}scale(query($keywords),0,25)
{!func}scale(geodist(),0,25)
{!func}recip(rord(publicationDate),1,25,0)
{!func}scale(popularity,0,25)&
keywords="fall festival"&
sfield=location&
pt=33.748,-84.391
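Each of the four scoring functions (keywords, geo distance, publication date, popularity) contributes up to 25% of the total score.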
*Example from chapter 16 of Solr in Action
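The equal-weight blend can be illustrated outside Solr with a small Python sketch. The documents and their signal values are invented for illustration, every signal is assumed to be "higher is better", and `min_max_scale` only loosely mimics what a 0 to 25 scaling step does.

```python
def min_max_scale(values, low=0.0, high=25.0):
    """Rescale raw values linearly so the minimum maps to low and the maximum to high."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]
    return [low + (v - smallest) * (high - low) / (largest - smallest) for v in values]


def blended_scores(docs, signals):
    """Sum the scaled signals so each contributes at most 25 of 100 points."""
    totals = {doc["id"]: 0.0 for doc in docs}
    for signal in signals:
        scaled = min_max_scale([doc[signal] for doc in docs])
        for doc, value in zip(docs, scaled):
            totals[doc["id"]] += value
    return totals


# Hypothetical news articles; all signal values are made up.
docs = [
    {"id": "a", "keyword_score": 10.0, "proximity": 0.9, "freshness": 0.8, "popularity": 500},
    {"id": "b", "keyword_score": 2.0, "proximity": 0.1, "freshness": 0.2, "popularity": 20},
    {"id": "c", "keyword_score": 6.0, "proximity": 0.5, "freshness": 0.7, "popularity": 300},
]
signals = ["keyword_score", "proximity", "freshness", "popularity"]
totals = blended_scores(docs, signals)
ranking = sorted(totals, key=totals.get, reverse=True)
print(totals, ranking)
```

Changing the `high` value per signal is the manual equivalent of re-balancing the weights, which is the tuning problem the next slides hand over to Learning to Rank.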
But how do we figure out the right
balance of weights?
Learning to Rank
User
Searches
User
Sees
Results
User
takes an
action
Users’ actions
inform system
improvements
User Query Re
Alonzo ipad do
do
do
Elena printer do
do
do
Ming ipad do
do
do
… … …
User Action Document
Alonzo click doc22
Elena click doc17
Ming click doc12
Alonzo purchase doc22
Ming click doc22
Ming purchase doc22
Elena click doc2
… … …
Feature Weight
title_match_all_terms 15.25
exact_phrase_match 10
signal_boost 9.5
content_age 9.2
user_geo_distance 6.5
personalization_cat_1 2.8
doc_popularity 2.75
… …
ipad ⌕
Initial Results:
1) doc1
2) doc2
3) doc3
Build Ranking Classifier
(from Implicit Relevance Judgements)
Final Results:
1) doc3
2) doc1
3) doc2
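Once a ranking model has been learned, applying it amounts to scoring each candidate with the learned weights and re-sorting. The sketch below uses three of the feature weights from the slide; the per-document feature values are invented so that the output mirrors the slide's before and after ordering, and the training step itself (building the classifier from click and purchase signals) is not shown.

```python
# Learned feature weights (a subset of those shown on the slide).
feature_weights = {
    "title_match_all_terms": 15.25,
    "exact_phrase_match": 10.0,
    "signal_boost": 9.5,
}

# Hypothetical feature values for the three initial results.
initial_results = [
    ("doc1", {"title_match_all_terms": 0, "exact_phrase_match": 1, "signal_boost": 0.2}),
    ("doc2", {"title_match_all_terms": 0, "exact_phrase_match": 0, "signal_boost": 0.5}),
    ("doc3", {"title_match_all_terms": 1, "exact_phrase_match": 0, "signal_boost": 0.9}),
]


def model_score(features):
    """Linear model: weighted sum of feature values."""
    return sum(feature_weights[name] * value for name, value in features.items())


def rerank(results):
    """Re-order the initial results by model score, highest first."""
    return [doc for doc, features in
            sorted(results, key=lambda item: model_score(item[1]), reverse=True)]


final_results = rerank(initial_results)
print(final_results)
```

In practice only the top-N initial results are re-ranked, since computing every feature for every matching document would be too expensive.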
Facet,
Topic &
Cluster
Query Rule
Matching
Natural
Language
Machine
Learning
Boosted
Results
Signals
Content
Index
System Generated
Human Generated
Application Generated
Solution
Data
We operationalize AI for the
largest businesses on the planet.
Questions?
Trey Grainger
trey@lucidworks.com
@treygrainger
Other presentations:
http://www.treygrainger.com
40% Discount code: ctwhay19
http://aiPoweredSearch.com
http://solrinaction.com
Books:
Thank You!

Balancing the Dimensions of User Intent

  • 1.
    Trey Grainger Chief AlgorithmsOfficer Balancing the Dimensions of User Intent October 28, 2019
  • 2.
    Trey Grainger Chief AlgorithmsOfficer • Previously: SVP of Engineering @ Lucidworks; Director of Engineering @ CareerBuilder • Georgia Tech – MBA, Management of Technology • Furman University – BA, Computer Science, Business, & Philosophy • Stanford University – Information Retrieval & Web Search Other fun projects: • Co-author of Solr in Action, plus numerous research publications • Advisor to Presearch, the decentralized search engine • Lucene / Solr contributor About Me
  • 3.
    • About Lucidworks •What is AI-powered Search? • The Dimensions of User Intent • Content Understanding: • Keyword Search • User Understanding: • Collaborative Recommendations • Content Understanding + User Understanding: • Personalized Search • Domain Understanding: • Knowledge Graphs • Domain Understanding + User Understanding: • Domain-aware Matching • Content Understanding + Domain Understanding: • Semantic Search • Balancing Approaches: • Keyword vs. Vector vs. Knowledge Graph Search • Vector Search • Knowledge Graph Search • Combining it all together Agenda
  • 4.
    Who are we? 300+CUSTOMERS ACROSS THE FORTUNE 1000 400+EMPLOYEES OFFICES IN San Francisco, CA (HQ) Raleigh-Durham, NC Cambridge, UK Bangalore, India Hong Kong The Search & AI Conference COMPANY BEHIND D E V E L O P M E N T, H O S T I N G , & S U P P O R T
  • 5.
    Proudly built withopen-source tech at its core: Apache Solr & Apache Spark Personalizes search with applied machine learning Proven on the world’s biggest information systems
  • 6.
  • 7.
  • 9.
  • 10.
    AI-powered Search Question /Answer Systems Virtual Assistants • Signals Boosting Models • Learning to Rank • Semantic Search • Collaborative Filtering • Personalized Search • Content Clustering • NLP / Entity Resolution • Semantic Knowledge Graphs • Document Classification • etc. • Neural Search • Word Embeddings • Vector Search • Image / Voice Search • etc. • Question / Answer Systems • Virtual Assistants • Chatbots • Rules-based Relevancy • etc.
  • 11.
    We have abig toolbox - great!
  • 12.
    But how dowe properly apply those tools?
  • 13.
    Dimensions of UserIntent Content Understanding Domain Understanding User Understanding User Intent
  • 14.
    Keyword Search Dimensions of UserIntent Content Understanding Domain Understanding User Understanding User Intent
  • 15.
    /solr/collection/select/?q=apache solr Term Documents …… apache doc1, doc3, doc4, doc5 … lucene doc2, doc4, doc6 … … solr doc1, doc3, doc4, doc7, doc8 … … doc5 doc7 doc8 doc1 doc3 doc4 solr apache apache solr Matching queries to documents
  • 16.
    BM25 (Relevance Scoringbetween Query and Documents) Score(q, d) = ∑ idf(t) · ( tf(t in d) · (k + 1) ) / ( tf(t in d) + k · (1 – b + b · |d| / avgdl ) t in q Where: t = term; d = document; q = query; i = index tf(t in d) = numTermOccurrencesInDocument ½ idf(t) = 1 + log (numDocs / (docFreq + 1)) |d| = ∑ 1 t in d avgdl = = ( ∑ |d| ) / ( ∑ 1 ) ) d in i d in i k = Free parameter. Usually ~1.2 to 2.0. Increases term frequency saturation point. b = Free parameter. Usually ~0.75. Increases impact of document normalization.
  • 17.
  • 18.
    Keyword Search Dimensions of UserIntent Content Understanding Domain Understanding Collaborative Recommendations User Understanding User Intent
  • 20.
    Collaborative Filtering (Recommendations) User Searches User Sees Results User takesan action Users’ actions inform system improvements User Query Results Alonzo ipad doc10, doc22, doc12, … Elena printer doc84, doc2, doc17, … Ming ipad doc10, doc22, doc12, … … … … User Action Document Alonzo click doc22 Elena click doc17 Ming click doc12 Alonzo purchase doc22 Ming click doc22 Ming purchase doc12 Elena click doc2 … … … User Item Weight Alonzo doc22 1.0 Alonzo doc12 0.4 … … … Ming doc12 0.9 Ming doc22 0.6 … … … ipad ⌕ Matrix Factorization Recommendations for Alonzo: • doc22: “iPad Pro” • doc12: “Kindle Fire” …
  • 21.
    Recommendations (User-Item, Item-Item,Query-Item) User Item Weight Alonzo doc22 1.0 Alonzo doc12 0.4 … … … Ming doc12 0.9 Ming doc22 0.6 … … … Recommendations for Alonzo: • doc22: “iPad Pro” • doc12: “Kindle Fire” … Item Item Weight doc22 doc22 1.0 doc22 doc12 0.85 … … … doc12 doc12 1.0 doc12 doc22 0.83 … … … Query Item Weight ipad doc22 0.98 ipad doc12 0.6 … … … kindle doc12 0.96 apple doc22 0.90 … … … Recommendations for Doc22: • doc22: “iPad Pro” • doc12: “Kindle Fire” … Recommendations for “ipad”: • doc22: “iPad Pro” • doc12: “Kindle Fire” … Matrix Factorization
  • 22.
  • 24.
  • 25.
    Keyword Search Knowledge Graph Dimensions ofUser Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding User Intent
  • 26.
    What is aKnowledge Graph? (vs. Ontology vs. Taxonomy vs. Synonyms, etc.)
  • 27.
    Overly Simplistic Definitions AlternativeLabels: Substitute words with identical meanings [ CTO => Chief Technology Officer; specialise => specialize ] Synonyms List: Provides substitute words that can be used to represent the same or very similar things [ human => homo sapien, mankind; food => sustenance, meal ] Taxonomy: Classifies things into Categories [ john is Human; Human is Mammal; Mammal is Animal ] Ontology: Defines relationships between types of things [ animal eats food; human is animal ] Knowledge Graph: Instantiation of an Ontology (contains the things that are related) [ john is human; john eats food ] A Knowledge Graph subsumes the other types.
  • 30.
    Keyword Search Knowledge Graph User Intent Personalized Search Dimensionsof User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding
  • 31.
  • 32.
    Keyword Search (Completely User-specified) User-guided Recommendations (Mostlydriven by user profile, partially user-specified) Traditional Recommendations (Completely driven by user behavior) Keyword Search (Completely User-specified) Personalized Queries (Mostly user-specified, partially driven by user profile)
  • 33.
    Personalized Queries (Mostly user-specified, partially drivenby user profile) Keyword Search (Completely User-specified) User-guided Recommendations (Mostly driven by user profile, partially user-specified) Traditional Recommendations (Completely driven by user behavior) Personalized Search
  • 34.
  • 37.
  • 38.
    Nice - personalizationis awesome! Let’s roll it out everywhere!
  • 39.
  • 40.
    Keyword Search Knowledge Graph User Intent Personalized Search Domain-aware Matching Dimensionsof User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding
  • 41.
    Knowledge Graph (Understanding conceptual andlogical relationships between domain-specific entities) Collaborative Recommendations (Completely driven by user behavior)
  • 42.
    Personas / UserProfiles (User attributes and preferences in knowledge graph) Multimodal Recommendations (Recommendations combining collaborative filtering plus user-based profile attribute matching/ranking) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities) Collaborative Recommendations (Completely driven by user behavior)
  • 43.
    Personas / UserProfiles (User attributes and preferences in knowledge graph) Multimodal Recommendations (Recommendations combining collaborative filtering plus user-based profile attribute matching/ranking) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities) Collaborative Recommendations (Completely driven by user behavior) Domain-aware Matching
  • 48.
    http://localhost:8983/solr/jobs/select/? fl=jobtitle,city,state,salary& q=( jobtitle:"nurse educator"^25 ORjobtitle:(nurse educator)^10 ) AND ( (city:"Boston" AND state:"MA")^15 OR state:"MA") AND _val_:"map(salary, 40000, 60000,10, 0)" AND similar_users:{!terms}u99,u1,u50,u2311,u253,u70,u99 *Example derived from chapter 16 of Solr in Action Multimodal Recommendations Jane is a nurse educator in Boston seeking between $40K and $60K She has interacted with the same content as the following users: u99,u1,u50,u2311,u253,u70,u99
  • 49.
    Keyword Search Knowledge Graph User Intent Personalized Search Semantic Search Domain-aware Matching Dimensionsof User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding
  • 50.
    Keyword Search (Finding and Ranking Keyword) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities)
  • 51.
    Language Understanding (Understanding syntax and query structure) Keyword Search (Finding and Ranking Keyword) Terminology Understanding (Understanding domain-specific terms and conceptual meaning) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities)
  • 52.
    Language Understanding (Understanding syntax and query structure) Terminology Understanding (Understanding domain-specific terms and conceptual meaning) Keyword Search (Finding and Ranking Keyword) Knowledge Graph (Understanding conceptual and logical relationships between domain-specific entities) Semantic Search
  • 55.
    Thought Vectors
    Word Embeddings: [ 5, 1, 3, 4, 2, 1, 5, 3 ] [ 4, 1, 3, 0, 1, 1, 4, 2 ] . . .
    Sentence Embeddings: [ 2, 3, 2, 4, 2, 1, 5, 3 ] [ 5, 3, 2, 3, 4, 0, 3, 4 ] . . .
    Paragraph Embeddings: [ 5, 1, 4, 1, 0, 2, 4, 0 ] [ 1, 1, 4, 2, 1, 0, 0, 0 ] . . .
    Document Embedding: [ 4, 1, 4, 2, 1, 2, 4, 3 ]
  • 56.
    Single Term Searches (as a Vector)
    exact term lookup in inverted index
    query       apple  caffeine  cheese  coffee  drink  donut  food  juice  pizza  tea  water  … term N
    cappuccino  0      0         0       0       0      0      0     0      0      0    0      ...
    apple       1      0         0       0       0      0      0     0      0      0    0      ...
    juice       0      0         0       0       0      0      0     1      0      0    0      ...
    cheese      0      0         1       0       0      0      0     0      0      0    0      ...
    pizza       0      0         0       0       0      0      0     0      1      0    0      ...
    donut       0      0         0       0       0      1      0     0      0      0    0      ...
    green       0      0         0       0       0      0      0     0      0      0    0      ...
    tea         0      0         0       0       0      0      0     0      0      1    0      ...
    bread       0      0         1       0       0      0      0     0      0      0    0      ...
    sticks      0      0         0       0       0      0      0     0      0      0    0      ...
  • 57.
    Multi-term Query Vectors
    query
      juice        0 0 0 0 0 0 0 1 0 0 0 ...
    + apple        1 0 0 0 0 0 0 0 0 0 0 ...
    = apple juice  1 0 0 0 0 0 0 1 0 0 0 ...   (Combined Vector)
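    The combination of single-term vectors into a multi-term query vector can be sketched as follows. The vocabulary order is copied from the slide columns; the helper names are made up for this illustration.

```python
# Column order taken from the slide; a real index would have one column per term.
VOCAB = ["apple", "caffeine", "cheese", "coffee", "drink", "donut",
         "food", "juice", "pizza", "tea", "water"]


def one_hot(term):
    """Vector with a 1 in the column of `term` (all zeros if the term is unknown)."""
    return [1 if term == column else 0 for column in VOCAB]


def query_vector(query):
    """Combine the one-hot vectors of every term in the query."""
    combined = [0] * len(VOCAB)
    for term in query.lower().split():
        # min(1, ...) keeps the vector binary even if a term repeats.
        combined = [min(1, a + b) for a, b in zip(combined, one_hot(term))]
    return combined


apple_juice = query_vector("apple juice")
print(apple_juice)  # [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

    A term outside the vocabulary, such as "cappuccino" here, produces an all-zero vector, which is why exact term lookup cannot match it to related items.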
  • 58.
    Multi-term Searches
    exact term lookup in inverted index
    query                apple  caffeine  cheese  coffee  drink  donut  food  juice  pizza  tea  water  … term N
    latte                0      0         0       0       0      0      0     0      0      0    0      ...
    cappuccino           0      0         0       0       0      0      0     0      0      0    0      ...
    apple juice          1      0         0       0       0      0      0     1      0      0    0      ...
    cheese pizza         0      0         1       0       0      0      0     0      1      0    0      ...
    donut                0      0         0       0       0      1      0     0      0      0    0      ...
    soda                 0      0         0       0       0      0      0     0      0      0    0      ...
    green tea            0      0         0       0       0      0      0     0      0      1    0      ...
    water                0      0         0       0       0      0      0     0      0      0    1      ...
    cheese bread sticks  0      0         1       0       0      0      0     0      0      0    0      ...
    cinnamon sticks      0      0         0       0       0      0      0     0      0      0    0      ...
  • 59.
    Dimensionality Reduction
                           food  drink  dairy  bread  caffeine  sweet  calories  healthy
    apple juice            0     5      0      0      0         4      4         3
    cappuccino             0     5      3      0      4         1      2         3
    cheese bread sticks    5     0      4      5      0         1      4         2
    cheese pizza           5     0      4      4      0         1      5         2
    cinnamon bread sticks  5     0      1      5      0         3      4         2
    donut                  5     0      1      5      0         4      5         1
    green tea              0     5      0      0      2         1      1         5
    latte                  0     5      4      0      4         1      3         3
    soda                   0     5      0      0      3         5      5         0
    water                  0     5      0      0      0         0      0         5
  • 60.
    Vector Similarity Scoring
    Phrase: Vector:
    apple juice: [ 0, 5, 0, 0, 0, 4, 4, 3 ]
    cappuccino: [ 0, 5, 3, 0, 4, 1, 2, 3 ]
    cheese bread sticks: [ 5, 0, 4, 5, 0, 1, 4, 2 ]
    cheese pizza: [ 5, 0, 4, 4, 0, 1, 5, 2 ]
    cinnamon bread sticks: [ 5, 0, 1, 5, 0, 3, 4, 2 ]
    donut: [ 5, 0, 1, 5, 0, 4, 5, 1 ]
    green tea: [ 0, 5, 0, 0, 2, 1, 1, 5 ]
    latte: [ 0, 5, 4, 0, 4, 1, 3, 3 ]
    soda: [ 0, 5, 0, 0, 3, 5, 5, 0 ]
    water: [ 0, 5, 0, 0, 0, 0, 0, 5 ]
    Vector Similarity (a, b): cos(θ) = (a · b) / (|a| × |b|)
    Vector Similarity Scores:
    Ranked Results: Green Tea
    0.94 water
    0.85 cappuccino
    0.80 latte
    0.78 apple juice
    0.60 soda
    … …
    0.19 donut
    Ranked Results: Cheese Pizza
    0.99 cheese bread sticks
    0.91 cinnamon bread sticks
    0.89 donut
    0.47 latte
    0.46 apple juice
    … …
    0.19 water
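    The cosine formula on this slide can be applied directly to the eight-dimensional vectors from the Dimensionality Reduction table. This is a minimal sketch in plain Python; the `rank` helper is an illustrative name, not part of any library.

```python
import math

# Vectors from the Dimensionality Reduction slide:
# food, drink, dairy, bread, caffeine, sweet, calories, healthy
VECTORS = {
    "apple juice":           [0, 5, 0, 0, 0, 4, 4, 3],
    "cappuccino":            [0, 5, 3, 0, 4, 1, 2, 3],
    "cheese bread sticks":   [5, 0, 4, 5, 0, 1, 4, 2],
    "cheese pizza":          [5, 0, 4, 4, 0, 1, 5, 2],
    "cinnamon bread sticks": [5, 0, 1, 5, 0, 3, 4, 2],
    "donut":                 [5, 0, 1, 5, 0, 4, 5, 1],
    "green tea":             [0, 5, 0, 0, 2, 1, 1, 5],
    "latte":                 [0, 5, 4, 0, 4, 1, 3, 3],
    "soda":                  [0, 5, 0, 0, 3, 5, 5, 0],
    "water":                 [0, 5, 0, 0, 0, 0, 0, 5],
}


def cosine(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def rank(query_name):
    """Score every other phrase against the query phrase, best match first."""
    query = VECTORS[query_name]
    scored = [(cosine(query, vec), name)
              for name, vec in VECTORS.items() if name != query_name]
    return sorted(scored, reverse=True)


for score, name in rank("green tea"):
    print(f"{score:.2f} {name}")
```

    Note that every document is scored against the query, which is exactly the cost that becomes a problem at scale.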
  • 61.
    Vector Similarity Scores: Performance Considerations
    Problem: Vector Scoring is Slow
    • Unlike keyword search, which looks up pre-indexed answers to queries, Vector Search must calculate the similarity between the query vector and every document’s vectors to determine the best matches, which is slow at scale.
    Solution: Quantized Vectors
    • “Quantization” is the process of mapping vector features to discrete values.
    • Creating “tokens” that map to a similar vector space enables matching on those tokens to perform an ANN (Approximate Nearest Neighbor) search.
    • This converts vector scoring into a search problem (term lookup and scoring), which is fast again, at the expense of some recall and scoring accuracy.
    Recommended Approach: Quantized Vector Search + Vector Similarity Reranking
    • Combine the best of both worlds by running an initial ANN search on a quantized vector representation, then re-ranking the top-N results using full vector similarity scoring.
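    The two-stage approach above can be sketched in a few lines. The bucket-based quantization used here is a deliberately simple stand-in chosen for this illustration (it is not LSH and not what any particular Solr plugin does); the function names and the bucket size are assumptions.

```python
import math
from collections import defaultdict

DOCS = {
    "apple juice":           [0, 5, 0, 0, 0, 4, 4, 3],
    "cappuccino":            [0, 5, 3, 0, 4, 1, 2, 3],
    "cheese bread sticks":   [5, 0, 4, 5, 0, 1, 4, 2],
    "cheese pizza":          [5, 0, 4, 4, 0, 1, 5, 2],
    "cinnamon bread sticks": [5, 0, 1, 5, 0, 3, 4, 2],
    "donut":                 [5, 0, 1, 5, 0, 4, 5, 1],
    "green tea":             [0, 5, 0, 0, 2, 1, 1, 5],
    "latte":                 [0, 5, 4, 0, 4, 1, 3, 3],
    "soda":                  [0, 5, 0, 0, 3, 5, 5, 0],
    "water":                 [0, 5, 0, 0, 0, 0, 0, 5],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def quantize(vec, bucket_size=2.0):
    """Map each feature to a discrete token such as '3_2' (dimension 3, bucket 2)."""
    return {f"{dim}_{int(value // bucket_size)}" for dim, value in enumerate(vec)}


def build_index(docs):
    """Inverted index from quantized token to the documents containing it."""
    index = defaultdict(set)
    for name, vec in docs.items():
        for token in quantize(vec):
            index[token].add(name)
    return index


def search(query_vec, docs, index, shortlist_size=4, top_n=2):
    # Stage 1 (approximate): count shared tokens via cheap term lookups.
    overlap = defaultdict(int)
    for token in quantize(query_vec):
        for name in index.get(token, ()):
            overlap[name] += 1
    shortlist = sorted(overlap, key=lambda name: (-overlap[name], name))[:shortlist_size]
    # Stage 2 (exact): re-rank only the shortlist with full cosine similarity.
    rescored = [(name, cosine(query_vec, docs[name])) for name in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)[:top_n]


INDEX = build_index(DOCS)
QUERY = [5.1, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0]
results = search(QUERY, DOCS, INDEX)
print(results)
```

    Only the shortlisted documents pay the cost of exact scoring, so the shortlist size controls the trade-off between recall and speed.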
  • 62.
  • 63.
  • 64.
    Streaming Expressions
    Send Documents to Solr:
    curl -X POST -H "Content-Type: application/json" \
      "http://localhost:8983/solr/food/update?commit=true" \
      --data-binary '
    [
      {"id": "1", "name_s":"donut", "vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
      {"id": "2", "name_s":"apple juice", "vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
      {"id": "3", "name_s":"cappuccino", "vector_fs":[0.0,5.0,3.0,0.0,4.0,1.0,2.0,3.0]},
      {"id": "4", "name_s":"cheese pizza", "vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]},
      {"id": "5", "name_s":"green tea", "vector_fs":[0.0,5.0,0.0,0.0,2.0,1.0,1.0,5.0]},
      {"id": "6", "name_s":"latte", "vector_fs":[0.0,5.0,4.0,0.0,4.0,1.0,3.0,3.0]},
      {"id": "7", "name_s":"soda", "vector_fs":[0.0,5.0,0.0,0.0,3.0,5.0,5.0,0.0]},
      {"id": "8", "name_s":"cheese bread sticks", "vector_fs":[5.0,0.0,4.0,5.0,0.0,1.0,4.0,2.0]},
      {"id": "9", "name_s":"water", "vector_fs":[0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0]},
      {"id": "10", "name_s":"cinnamon bread sticks", "vector_fs":[5.0,0.0,1.0,5.0,0.0,3.0,4.0,2.0]}
    ]'
  • 65.
  • 66.
  • 67.
    Streaming Expressions Query Parser
    Request:
    http://localhost:8983/solr/food/select?q=*:*&fl=id,name_s&
      fq={!streaming_expression}top(
        select(
          search(food, q="*:*", fl="id,vector_fs", sort="id asc"),
          cosineSimilarity(vector_fs, array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos,
          id),
        n=5,
        sort="cos desc"
      )
    Response:
    {
      "responseHeader":{ … },
      "response":{"numFound":5,"start":0,"docs":[
        { "name_s":"donut", "id":"1"},
        { "name_s":"apple juice", "id":"2"},
        { "name_s":"cheese pizza", "id":"4"},
        { "name_s":"cheese bread sticks", "id":"8"},
        { "name_s":"cinnamon bread sticks", "id":"10"}]
    }}
  • 68.
    Option 3: Solr Vector Scoring Plugin
  • 69.
    Solr Vector Scoring Plugin
    Send Documents to Solr:
    curl -X POST -H "Content-Type: application/json" \
      "http://localhost:8983/solr/{your-collection-name}/update?commit=true" \
      --data-binary '
    [
      {"name":"example 0", "vector":"0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33"},
      {"name":"example 1", "vector":"0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25"},
      {"name":"example 2", "vector":"0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01"},
      {"name":"example 3", "vector":"0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9"},
      {"name":"example 4", "vector":"0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1"},
      {"name":"example 5", "vector":"0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09"}
    ]'
  • 70.
    Solr Vector Scoring Plugin
    Request:
    http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="0.1,4.75,0.3,1.2,0.7,4.0"}
    Response:
    {
      "responseHeader":{"status":0, "QTime":1},
      "response":{
        "numFound":6,"start":0,"maxScore":0.99984086,
        "docs":[
          { "name":["example 3"], "vector":["0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9 "], "score":0.99984086},
          { "name":["example 0"], "vector":["0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33 "], "score":0.7693964},
          { "name":["example 5"], "vector":["0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09 "], "score":0.76322395},
          { "name":["example 4"], "vector":["0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1 "], "score":0.5328145},
          { "name":["example 1"], "vector":["0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25 "], "score":0.48513117},
          { "name":["example 2"], "vector":["0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01 "], "score":0.44909418}]
    }}
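    To make the "index|value" vector format and the scores in this response easier to read, the sketch below parses that format and applies plain cosine similarity to the query vector from the request. How the plugin computes its score internally is not stated on the slide; plain cosine is an assumption here, although it yields the same ordering for the top results and a top score close to the one shown. The function names are hypothetical.

```python
import math


def parse_payload_vector(text):
    """Parse '0|1.55 1|3.53 ...' (position|value pairs) into a dense list of floats."""
    pairs = [item.split("|") for item in text.split()]
    vec = [0.0] * len(pairs)
    for position, value in pairs:
        vec[int(position)] = float(value)
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


# Documents exactly as sent to Solr on the previous slide.
RAW_DOCS = {
    "example 0": "0|1.55 1|3.53 2|2.3 3|0.7 4|3.44 5|2.33",
    "example 1": "0|3.54 1|0.4 2|4.16 3|4.88 4|4.28 5|4.25",
    "example 2": "0|1.11 1|0.6 2|1.47 3|1.99 4|2.91 5|1.01",
    "example 3": "0|0.06 1|4.73 2|0.29 3|1.27 4|0.69 5|3.9",
    "example 4": "0|4.01 1|3.69 2|2 3|4.36 4|1.09 5|0.1",
    "example 5": "0|0.64 1|3.95 2|1.03 3|1.65 4|0.99 5|0.09",
}

query = [0.1, 4.75, 0.3, 1.2, 0.7, 4.0]  # vector from the request above

ranked = sorted(
    ((name, cosine(query, parse_payload_vector(raw))) for name, raw in RAW_DOCS.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```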
  • 71.
    Option 4: Solr Vector Scoring + LSH Plugin
  • 72.
    Solr Vector Scoring + LSH Plugin
    Send Documents to Solr:
    curl -X POST -H "Content-Type: application/json" \
      "http://localhost:8983/solr/{your-collection-name}/update?update.chain=LSH&commit=true" \
      --data-binary '
    [
      {"id":"1", "vector":"1.55,3.53,2.3,0.7,3.44,2.33"},
      {"id":"2", "vector":"3.54,0.4,4.16,4.88,4.28,4.25"}
    ]'
    Request:
    http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="1.55,3.53,2.3,0.7,3.44,2.33" lsh="true" reRankDocs="5"}&fl=name,score,vector,_vector_,_lsh_hash_
  • 73.
    Solr Vector Scoring + LSH Plugin
    Request:
    http://localhost:8983/solr/{your-collection-name}/query?fl=name,score,vector&q={!vp f=vector vector="1.55,3.53,2.3,0.7,3.44,2.33" lsh="true" reRankDocs="5"}&fl=name,score,vector,_vector_,_lsh_hash_
    Response:
    {
      "responseHeader":{ "status":0, "QTime":8},
      "response":{"numFound":1,"start":0,"maxScore":36.65736,
        "docs":[
          { "id": "1",
            "vector":"1.55,3.53,2.3,0.7,3.44,2.33",
            "_vector_":"/z/GZmZAYeuFQBMzMz8zMzNAXCj2QBUeuA==",
            "_lsh_hash_":["0_8", "1_35", "2_7", "3_10", "4_2", "5_35", "6_16", "7_30", "8_27", "9_12",
              "10_7", "11_32", "12_48", "13_36", "14_10", "15_7", "16_42", "17_5", "18_3", "19_2",
              "20_1", "21_0", "22_24", "23_18", "24_42", "25_31", "26_35", "27_8", "28_1", "29_24",
              "30_47", "31_14", "32_22", "33_39", "34_0", "35_34", "36_34", "37_39", "38_27", "39_27",
              "40_45", "41_10", "42_21", "43_34", "44_41", "45_9", "46_31", "47_0", "48_4", "49_43"],
            "score":36.65736}
        ]
      }
    }
  • 74.
    Option 5 (Work in Progress): First-class Vector Fields in Lucene/Solr
  • 75.
  • 76.
    ANN Benchmarks (Approximate Nearest Neighbor) https://github.com/erikbern/ann-benchmarks
  • 77.
  • 78.
    Vector Encoders
    • Take queries, documents, sentences, paragraphs, etc. and transform them into vectors.
    • Usually leverage deep learning, which can discover rich language usage rules and map them to combinations of features in the vector.
    • Popular Libraries:
      • BERT
      • ELMo
      • Universal Sentence Encoder
      • Word2Vec
      • Sentence2Vec
      • GloVe
      • fastText
      • many more …
  • 80.
    Keyword Search vs. Vector Search
    Query Type / Likely Outcome
    Obscure keyword combinations
      Q. (software OR hardware) AND enginee*
      • Keyword search succeeds
      • Vector Search fails
    Natural Language Queries
      Q. Can my wife drive on my insurance?
      • Keyword search might get lucky, but probably fails
      • Vector Search succeeds
    Fuzzy Language Queries
      Q. famous french tower
      • Keyword search mismatch yields poor results
      • Vector Search succeeds
    Structured Relationship Queries
      Q. popular bbq near Activate
      • Keyword search fails
      • Vector search fails
      • Need a Knowledge Graph!
  • 81.
    Giant Graph of Relationships... Trey Grainger works for Lucidworks. He spoke at the Activate 2019 conference. #Activate19 (Activate) was held in Washington, DC September 9-12, 2019. Trey got his master's degree from Georgia Tech. Trey’s Voicemail
  • 82.
  • 83.
    Knowledge Graph
    Documents:
      id: 1
      job_title: Software Engineer
      desc: software engineer at a great company
      skills: .Net, C#, java
      id: 2
      job_title: Registered Nurse
      desc: a registered nurse at hospital doing hard work
      skills: oncology, phlebotomy
      id: 3
      job_title: Java Developer
      desc: a software engineer or a java engineer doing work
      skills: java, scala, hibernate
    Docs-Terms Forward Index:
      field      doc  term
      desc       1    a at company engineer great software
                 2    a at doing hard hospital nurse registered work
                 3    a doing engineer java or software work
      job_title  1    Software Engineer
      …          …    …
    Terms-Docs Inverted Index:
      field      term            postings list (doc: pos)
      desc       a               1: 4; 2: 1; 3: 1, 5
                 at              1: 3; 2: 4
                 company         1: 6
                 doing           2: 6; 3: 8
                 engineer        1: 2; 3: 3, 7
                 great           1: 5
                 hard            2: 7
                 hospital        2: 5
                 java            3: 6
                 nurse           2: 3
                 or              3: 4
                 registered      2: 2
                 software        1: 1; 3: 2
                 work            2: 10; 3: 9
      job_title  java developer  3: 1
      …          …               …
    Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith. “The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016.
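    The two index structures on this slide can be built from the three example descriptions in a few lines. This is a simplified sketch with whitespace tokenization and 1-based positions; the function names are illustrative and it is not how Lucene stores its indexes internally.

```python
from collections import defaultdict

# "desc" field of the three example documents on the slide.
DOCS = {
    1: "software engineer at a great company",
    2: "a registered nurse at hospital doing hard work",
    3: "a software engineer or a java engineer doing work",
}


def build_inverted_index(docs):
    """Terms-Docs index: term -> {doc_id: [positions]} (positions are 1-based)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split(), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index


def build_forward_index(docs):
    """Docs-Terms index: doc_id -> sorted list of distinct terms."""
    return {doc_id: sorted(set(text.lower().split())) for doc_id, text in docs.items()}


inverted = build_inverted_index(DOCS)
forward = build_forward_index(DOCS)

print(inverted["engineer"])  # {1: [2], 3: [3, 7]}
print(forward[1])            # ['a', 'at', 'company', 'engineer', 'great', 'software']
```

    Traversing from a term to its documents (inverted index) and from those documents back to their other terms (forward index) is what lets the two structures be treated as a graph of relationships.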
  • 84.
    Related term vector (for query concept expansion) http://localhost:8983/solr/stack-exchange-health/skg
  • 85.
    Disambiguation by Category Example
    Meaning 1: Restaurant => bbq, brisket, ribs, pork, …
    Meaning 2: Outdoor Equipment => bbq, grill, charcoal, propane, …
  • 91.
  • 93.
  • 94.
    Demo Data
    Places (also includes geonames database)
    Entities (includes search commands)
    Text Content [ Web crawl of restaurant and product reviews sites ]
  • 95.
    Solr Knowledge Graph Traversal Query "bbq",
  • 97.
    Why this Semantic Nuance Matters
  • 98.
    Other Knowledge Graph Search examples:
    popular barbeque near Activate (popular same as "good", "top", "best")
    Hotels near Haystack EU
    hotels near popular BBQ in Berlin
    BBQ near airports near Berlin
    hotels near movie theaters in Berlin
    …
  • 99.
    Keyword Search Knowledge Graph User Intent Personalized Search Semantic Search Domain-aware Matching Dimensions of User Intent Content Understanding Domain Understanding Collaborative Recommendations User Understanding
  • 100.
    The right ranking algorithm is domain and context-dependent
    News search: popularity and freshness drive relevance
    Restaurant search: geographical proximity and price range are critical
    Ecommerce: likelihood of a purchase is key
    Movie search: more popular titles are generally more relevant
    Job search: category of job, salary range, and geographical proximity matter
  • 101.
    Example Combining Content + Domain + User Context
    News website:
    /select?
      fq={!cache=false v=$keywords}&
      q=
        {!func}scale(query($keywords),0,25)           25%
        {!func}scale(geodist(),0,25)                  25%
        {!func}recip(rord(publicationDate),1,25,0)    25%
        {!func}scale(popularity,0,25)&                25%
      keywords="fall festival"&
      sfield=location&
      pt=33.748,-84.391
    *Example from chapter 16 of Solr in Action
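    The idea of giving each relevance signal an equal share of the final score can be sketched outside Solr as min-max scaling each signal to the range 0 to 25 and summing. The document values below are invented for illustration, and the signals are all expressed so that larger is better, which is an assumption of this sketch rather than a transcription of the query above.

```python
def scale(values, low=0.0, high=25.0):
    """Min-max scale a list of raw values into [low, high] across all documents."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]
    return [low + (v - smallest) * (high - low) / (largest - smallest) for v in values]


# Hypothetical raw signal values for three documents (A, B, C).
signals = {
    "keyword_score": [2.0, 1.0, 0.0],
    "proximity":     [0.0, 10.0, 5.0],
    "freshness":     [1.0, 1.0, 3.0],
    "popularity":    [10.0, 30.0, 20.0],
}

# Each signal contributes at most 25 points, so the total is out of 100.
scaled = {name: scale(values) for name, values in signals.items()}
totals = [sum(column) for column in zip(*scaled.values())]
print(totals)  # [25.0, 62.5, 50.0]
```

    Changing the `high` argument per signal is the simplest way to shift the balance, for example giving freshness 40 points and popularity 10.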
  • 102.
    But how do we figure out the right balance of weights?
  • 103.
    Learning to Rank
    User Searches, User Sees Results, User takes an action, Users’ actions inform system improvements
    User    Query    Re
    Alonzo  ipad     do do do
    Elena   printer  do do do
    Ming    ipad     do do do
    …       …        …
    User    Action    Document
    Alonzo  click     doc22
    Elena   click     doc17
    Ming    click     doc12
    Alonzo  purchase  doc22
    Ming    click     doc22
    Ming    purchase  doc22
    Elena   click     doc2
    …       …         …
    Feature                Weight
    title_match_all_terms  15.25
    exact_phrase_match     10
    signal_boost           9.5
    content_age            9.2
    user_geo_distance      6.5
    personalization_cat_1  2.8
    doc_popularity         2.75
    …                      …
    ipad ⌕
    Initial Results: 1) doc1 2) doc2 3) doc3
    Build Ranking Classifier (from Implicit Relevance Judgements)
    Final Results: 1) doc3 2) doc1 3) doc2
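    One way to read the "Implicit Relevance Judgements" step is as an aggregation of user signals per query. The sketch below joins the two tables from the slide and sums a weight per action; the weights (click = 1, purchase = 5) and the one-query-per-user join are assumptions made for this illustration, and training the ranking classifier itself is not shown.

```python
from collections import defaultdict

# Query issued by each user (first table on the slide).
QUERIES = {"Alonzo": "ipad", "Elena": "printer", "Ming": "ipad"}

# (user, action, document) signals (second table on the slide).
SIGNALS = [
    ("Alonzo", "click", "doc22"),
    ("Elena", "click", "doc17"),
    ("Ming", "click", "doc12"),
    ("Alonzo", "purchase", "doc22"),
    ("Ming", "click", "doc22"),
    ("Ming", "purchase", "doc22"),
    ("Elena", "click", "doc2"),
]

# Hypothetical weights: a purchase is treated as stronger evidence than a click.
ACTION_WEIGHTS = {"click": 1, "purchase": 5}


def implicit_judgments(queries, signals, weights):
    """Return {query: {document: accumulated relevance weight}}."""
    judgments = defaultdict(lambda: defaultdict(int))
    for user, action, document in signals:
        judgments[queries[user]][document] += weights[action]
    return {query: dict(docs) for query, docs in judgments.items()}


judgments = implicit_judgments(QUERIES, SIGNALS, ACTION_WEIGHTS)
print(judgments)
# {'ipad': {'doc22': 12, 'doc12': 1}, 'printer': {'doc17': 1, 'doc2': 1}}
```

    Judgments of this shape (query, document, relevance label) are the usual training input for a learning-to-rank model, which then learns feature weights like those in the table above.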
  • 104.
  • 105.
    We operationalize AI for the largest businesses on the planet.
  • 106.
  • 107.
    Thank You!
    Trey Grainger
    [email protected]
    @treygrainger
    Other presentations: http://www.treygrainger.com
    Books: http://aiPoweredSearch.com http://solrinaction.com
    40% Discount code: ctwhay19