Rapid
Solr Schema
Development
Alexandre Rafalovitch (@arafalov)
Apache Solr Committer
Montreal Solr/ML meetup May 2018
Phone directory - content
Names, often from multiple cultures
Addresses
Phone numbers
Company/Group
Locations
Other fun data
I use https://www.fakenamegenerator.com/ for demos
• Can generate bulk entries in CSV, tab-separated, SQL, etc.
• Many fields, languages, regions
• Warning: comes with an – invisible – byte order mark
Slide 2
Today's exploration
Solr 7.3 (latest)
The smallest learning schema/configuration required
Rapid schema evolution workflow
Free-form and fielded user entry
Dealing with multiple languages
Dealing with alternative name spellings
Searching phone numbers by any-length suffix
Configuring Solr to simplify API interface
(Bonus points) Fit into a 40-minute presentation!
Slide 3
Today's dataset
http://www.fakenamegenerator.com/ - Bulk request (20000 identities) – Free and configurable!
Name sets: American, Arabic, Australian, Chinese, French, Hispanic, Polish, Russian, Russian (Cyrillic), Thai
Countries: Australia, Canada, France, Poland, Spain, United Kingdom, United States
Age range: 19 - 85 years old
Gender: 50% male, 50% female
Fields:
id,Gender,NameSet,Title,GivenName,MiddleInitial,Surname,StreetAddress,City,StateFull,ZipCode,CountryFull,EmailAddress,Username,TelephoneNumber,TelephoneCountryCode,Birthday,Age,TropicalZodiac,Color,Occupation,Company,BloodType,Kilograms,Centimeters,GUID,Latitude,Longitude
Renamed first field (Number) to id to fit Solr's naming convention
Removed BOM (in Vim, :set nobomb)
Slide 4
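The two clean-up steps above (drop the BOM, rename Number to id) can also be scripted. A minimal sketch, assuming the export is UTF-8 and its first header column is literally Number; the function name and the sample bytes are hypothetical.

```python
import csv
import io


def prepare_csv(raw_bytes: bytes) -> str:
    """Strip a UTF-8 byte order mark and rename the first header column to 'id'."""
    # The 'utf-8-sig' codec silently drops a leading BOM if one is present.
    text = raw_bytes.decode("utf-8-sig")
    rows = list(csv.reader(io.StringIO(text)))
    if rows and rows[0] and rows[0][0] == "Number":
        rows[0][0] = "id"  # matches the uniqueKey name used in the schema
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical two-line export that starts with a BOM.
    sample = b"\xef\xbb\xbfNumber,GivenName\n1,Th\xc3\xa9r\xc3\xa8se\n"
    print(prepare_csv(sample))
```

For a real file, read it in binary mode, pass the bytes through prepare_csv, and write the result back as UTF-8.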
First try – Solr's built-in schema
bin/solr start – standalone (non-clustered) server with no initial collections
bin/solr create -c demo1 – uses default configset, with 'schemaless' mode, not for production
Starts with 4 fields (id, _text_, _version_, _root_)
Auto-creates the rest on first occurrence
bin/post -c demo1 ../dataset.csv
auto-detect content type from extension
can bulk upload files
see techproducts shipped example
bin/solr start -e techproducts
For one file, can also do via Admin UI
DEMO
Slide 5
Schemaless schema – lessons learned
Imported 1 record
Failed on the second one, because ZipCode was detected as a number
Can fix that by explicit configuration and rebuilding – see films example
(example/films/README.txt)
Other issues
Dual fields for text and string
Everything multivalued – because "just in case" – No sorting, API is messier, etc
Many large files
managed-schema: 546 lines (without comments)
solrconfig.xml: 1364 lines (with comments)
Plus another 42 configuration files, mostly language stopwords
Homework to get this working – not enough time today
Slide 6
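Why the second record fails: a field type is picked from the first value seen, and later values have to fit it. A toy model of that behaviour, not Solr's actual code; the type labels and the sample ZIP values are hypothetical.

```python
def guess_type(value: str) -> str:
    """Guess a field type from one raw value, as a first-seen heuristic would."""
    for label, parser in (("long", int), ("double", float)):
        try:
            parser(value)
            return label
        except ValueError:
            continue
    return "text"


def accepts(field_type: str, value: str) -> bool:
    """Check whether a later value still fits the type guessed earlier."""
    parsers = {"long": int, "double": float, "text": str}
    try:
        parsers[field_type](value)
        return True
    except ValueError:
        return False


first_zip, second_zip = "90210", "H3A 1B2"  # hypothetical sample values
zip_type = guess_type(first_zip)
print(zip_type, accepts(zip_type, second_zip))
```

Declaring ZipCode as a string (or text) field up front avoids the guess entirely.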
Learning schema
managed-schema: start from nearly nothing – add as needed
solrconfig.xml: start from nearly all defaults – Most definitely NOT production ready
Not SolrCloud ready – add those as you scale
No extra field types – add as you need them
How small can we go?!?
Based on exploration done for my presentation at Lucene/Solr Revolution 2016
https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016 (slides and video)
https://github.com/arafalov/solr-deconstructing-films-example - repo
A bit out of date – schemaless mode has been tuned since
Today's version uses the latest Solr features
https://github.com/arafalov/solr-presentation-2018-may/commits/master (changes commit-by-commit)
Slide 7
Learning schema – managed-schema
<?xml version="1.0" encoding="UTF-8"?>
<schema name="smallest-config" version="1.6">
  <field name="id" type="string" required="true" indexed="true" stored="true" />
  <field name="_text_" type="text_basic" multiValued="true" indexed="true"
         stored="false" docValues="false"/>
  <dynamicField name="*" type="text_basic" indexed="true" stored="true"/>
  <copyField source="*" dest="_text_"/>
  <uniqueKey>id</uniqueKey>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
  <fieldType name="text_basic" class="solr.SortableTextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</schema>
Slide 8
Learning schema – solrconfig.xml
<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>7.3.0</luceneMatchVersion>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="df">_text_</str>
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>
</config>
Slide 9
2 files, 33 lines combined, including blanks – but will it search?
bin/solr create -c tinydir -d ../configs/smallest/ - provide custom config files to the collection
bin/post -c tinydir ../dataset.csv – Remember the BOM and renaming column Number->id
Does it search?
General search?
Case-insensitive search?
Range search: Centimeters:[* TO 99]
Fielded search?
Facet?
Sort?
Are ids preserved?
Are individual fields easy to work with (fl, etc)?
DEMO
Learning schema – create and index
Slide 10
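The checks above are all plain /select requests. A small sketch that only builds the URLs, so the encoding of spaces, colons and brackets is visible; the host and port are assumed to be a local default install, and the chosen facet and sort fields are just illustrative picks from the dataset.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/tinydir/select"  # assumed local install

CHECKS = {
    "general": {"q": "john"},
    "fielded": {"q": "GivenName:John"},
    "range": {"q": "Centimeters:[* TO 99]"},
    "facet": {"q": "*:*", "rows": "0", "facet": "true", "facet.field": "CountryFull"},
    "sort": {"q": "*:*", "sort": "Surname asc", "fl": "id,GivenName,Surname"},
}


def select_url(params: dict) -> str:
    """URL-encode the parameters so special characters survive the trip."""
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    for name, params in CHECKS.items():
        print(name, select_url(params))
```

Pasting the printed URLs into a browser or curl against a running tinydir collection reproduces the demo checks one by one.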
It works! And ready to start being used from other parts of the project
Do NOT expose Solr directly to the Internet. Not until you are a Solr Wizard, the Gray.
managed-schema file has NOT changed – because of dynamicField
Still 21 lines
Would still keep the comments
Would still preserve field/type definitions
Will change on first AdminUI/API modification – gets rewritten
What else? Actual search-engine tuning!
Special cases
Numerics – e.g. for Range search
Spatial search – e.g. for Mapping/distance ranking
Multivalued fields
Dates
Special parsing (e.g. names/surnames)
Useful telephone number search
Relevancy tuning!
Learning schema - conclusion
Slide 11
Several possibilities
Admin UI
Delete schema field
Add schema field with new definition
Reindex
Sometimes causes docValue-related exception, have to rebuild collection from scratch
Schema API (Admin UI uses a subset of it)
See: https://lucene.apache.org/solr/guide/7_3/schema-api.html
Also has Replace a Field
Also has Add/Delete Field Type
Great to use programmatically or with something like Postman (https://www.getpostman.com/)
Edit schema/solrconfig.xml directly and reload the collection
Not recommended for production, but OK with a single server/single developer
Remember to edit the actual schema in use, not the original configset copy
◦ Check "Instance" location in Admin UI, in the collection's Overview screen
Remember that in SolrCloud mode, the config files are NOT on disk (they are in ZooKeeper).
Evolving schema
Slide 12
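For the Schema API route, a change is a JSON command POSTed to the collection's /schema endpoint. A sketch that prepares such a request without sending it; the URL assumes a local install, the helper name is hypothetical, and the ZipCode definition is only an illustration.

```python
import json
from urllib.request import Request

SCHEMA_URL = "http://localhost:8983/solr/tinydir/schema"  # assumed local install


def schema_command(command: str, definition: dict) -> Request:
    """Prepare (but do not send) a Schema API POST such as add-field or replace-field."""
    body = json.dumps({command: definition}).encode("utf-8")
    return Request(
        SCHEMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative only: pin ZipCode to the existing 'string' type.
request = schema_command(
    "add-field",
    {"name": "ZipCode", "type": "string", "indexed": True, "stored": True},
)
# urllib.request.urlopen(request) would submit it to a running Solr.
```

The same helper covers replace-field and add-field-type by changing the command name and the definition; a reindex is still needed afterwards.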
Numeric fields
• Age – int
• Centimeters (height?) – int
• Kilograms – float
Copy missing field types (pint, pfloat) from solr-7.3.0/server/solr/configsets/_default/conf/managed-schema
Map numeric fields explicitly
Delete the existing content, because the storage format changes radically
• bin/post -c tinydir -format solr -d "<delete><query>*:*</query></delete>"
Reload the core in Admin UI's Core Admin (menu is different in SolrCloud mode)
Index again
• bin/post -c tinydir ../dataset.csv
New queries
• facet=true&facet.range=Age&facet.range.start=0&facet.range.end=200&facet.range.gap=10
• Centimeters:[* TO 99] (again)
DEMO
Evolving schema – add numeric fields
Slide 13
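Why the numeric mapping matters for Centimeters:[* TO 99]: on a text field the bounds are compared as strings, on an int field as numbers. A sketch under that assumption, with hypothetical height values, plus a helper showing how the facet.range parameters above slice Age into buckets (assuming the default behaviour where each bucket includes its lower bound and excludes its upper bound).

```python
heights = ["150", "99", "180", "75"]  # hypothetical Centimeters values

# Text field: "150" sorts before "99" because '1' comes before '9'.
as_text = [h for h in heights if h <= "99"]
# Int field: values are compared numerically.
as_int = [h for h in heights if int(h) <= 99]


def range_bucket(value: int, start: int = 0, end: int = 200, gap: int = 10):
    """Return the lower bound of the bucket holding value, or None if out of range."""
    if value < start or value >= end:
        return None
    return start + ((value - start) // gap) * gap


if __name__ == "__main__":
    print(as_text)
    print(as_int)
    print(range_bucket(19), range_bucket(85))
```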
Solr supports extensive spatial search
https://lucene.apache.org/solr/guide/7_3/spatial-search.html
bounding-box with different shapes (circles, polygons, etc)
distance limiting or boosting
different options with different functionalities
LatLonPointSpatialField
SpatialRecursivePrefixTreeFieldType
BBoxField
All require combined Lat Lon coordinates (lat,lon)
We are providing separate Latitude and Longitude fields – need to merge them with a comma
Let's copy a field type and create a field:
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true"
           distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />
<field name="location" type="location_rpt" indexed="true" stored="true" />
Remember to reload – no need to delete, as it is a new field
Next, need to also give merge instructions with an Update Request Processor
Evolving schema – spatial search
Slide 14
Update Request Processors
Deal with the data before it touches the schema
Can do pre-processing magic with many, many processors
See: https://lucene.apache.org/solr/guide/7_3/update-request-processors.html
See: http://www.solr-start.com/info/update-request-processors/ (mine)
Some are more magical than others and have shortcuts, e.g. TemplateUpdateProcessorFactory
All can be configured with chains in solrconfig.xml and apply explicitly or by default
That's how the schemaless mode works (default chain in solrconfig.xml of _default configset)
Also check the way dates are parsed in it, search for parse-date – can be used standalone
IgnoreFieldUpdateProcessorFactory could be useful to drop fields we don't want Solr to process at all
(including in collect-all _text_ field)
Let's reindex everything using the template to populate the new field:
bin/post -c tinydir -params "processor=template&template.field=location:{Latitude},{Longitude}" ../dataset.csv
Query:
q=*:*&rows=1&
fq={!geofilt sfield=location}&
pt=45.493444,-73.558154&d=100&
facet=on&facet.field=City&facet.mincount=1
DEMO
Evolving schema – URPs
Slide 15
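What the template and the geofilt query amount to, as a plain-Python sketch: merge Latitude and Longitude into one "lat,lon" value, then keep documents within d kilometers of pt. This is a conceptual model only; Solr answers the filter from its spatial index rather than by computing distances per document. The function names, the Earth radius constant and the sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def apply_template(spec: str, doc: dict):
    """Expand 'field:template' the way the template.field parameter is written."""
    field, _, template = spec.partition(":")
    return field, template.format_map(doc)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(location: str, pt: tuple, d_km: float) -> bool:
    """Mimic the geofilt condition for a single 'lat,lon' value."""
    lat, lon = (float(part) for part in location.split(","))
    return haversine_km(pt[0], pt[1], lat, lon) <= d_km


if __name__ == "__main__":
    doc = {"Latitude": "45.6066", "Longitude": "-73.7124"}  # hypothetical record
    field, value = apply_template("location:{Latitude},{Longitude}", doc)
    print(field, value, within(value, (45.493444, -73.558154), 100))
```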
Search for John and look at the phone numbers (q=John&fl=TelephoneNumber):
03.99.56.91.63
(08) 9435 3911
79 196 65 43
306-724-3986
Can we search that?
TelephoneNumber:3911 – yes
TelephoneNumber:"65 43" – sort of (need to quote or know these are together)
TelephoneNumber:3986 – sort of: some at the end, some at middle
Use Case: Just search the last digits (suffix) regardless of formatting
We have MANY analyzers, tokenizers, and character and token filters to help us with it
https://lucene.apache.org/solr/guide/7_3/understanding-analyzers-tokenizers-and-filters.html
http://www.solr-start.com/info/analyzers/ (mine)
Evolving schema – phone numbers
Slide 16
Let's define a super-custom field type:
<fieldType name="phone" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^0-9])"
            replacement="" replace="all"/>
    <filter class="solr.ReverseStringFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^0-9])"
            replacement="" replace="all"/>
    <filter class="solr.ReverseStringFilterFactory"/>
  </analyzer>
</fieldType>
Notice
Asymmetric analyzers
Reversing the string turns the ending digits into the starting digits (make sure it happens on both the index and the query side!)
Edge n-grams (3-30 character substrings) - makes the index larger, but the search very fast
Evolving schema – digits-only type
Slide 17
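The phone type above can be traced by hand. A sketch that imitates the two analyzer chains in plain Python (keep digits only, reverse, edge n-grams of 3 to 30 characters on the index side) and checks suffix queries against the sample numbers from the earlier slide. It models the idea, not Solr's implementation; the function names are hypothetical.

```python
import re


def index_tokens(phone: str, min_gram: int = 3, max_gram: int = 30) -> set:
    """Index side: keep digits, reverse, then emit prefixes of the reversed string."""
    digits = re.sub(r"[^0-9]", "", phone)        # PatternReplaceFilter step
    reversed_digits = digits[::-1]                # ReverseStringFilter step
    longest = min(max_gram, len(reversed_digits))
    # EdgeNGramFilter step: prefixes from min_gram up to the full length.
    return {reversed_digits[:n] for n in range(min_gram, longest + 1)}


def query_token(text: str) -> str:
    """Query side: same clean-up and reversal, but no n-grams."""
    return re.sub(r"[^0-9]", "", text)[::-1]


def matches(phone: str, query: str) -> bool:
    """A document matches when the single query token is one of its indexed tokens."""
    return query_token(query) in index_tokens(phone)


if __name__ == "__main__":
    for phone, query in [
        ("(08) 9435 3911", "3911"),
        ("79 196 65 43", "65 43"),
        ("306-724-3986", "3986"),
        ("(08) 9435 3911", "9435"),  # digits from the middle, not a suffix
    ]:
        print(phone, query, matches(phone, query))
```

Only true suffixes of three or more digits match, which is the use case stated earlier; queries shorter than minGramSize find nothing.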
Remap TelephoneNumber to it
<field name="TelephoneNumber" type="phone" indexed="true" stored="true" />
And reindex (don't forget our 'speed hack' for now):
bin/post -c tinydir -params "processor=template&template.field=location:{Latitude},{Longitude}" ../dataset.csv
Check terms in Admin UI Schema screen and do our test searches
TelephoneNumber:3911
TelephoneNumber:"65 43"
TelephoneNumber:6543
TelephoneNumber:3986
DEMO
Evolving schema – digits-only type - cont
Slide 18
Many languages have accents on letters
Frédéric, Thérèse, Jérôme
Many users can't be bothered to type them
Sometimes, they don't even know how to type them
Łódź, Kędzierzyn-Koźle
Can we just ignore the accents when we search?
Several ways, but let's use the simplest, by inserting a filter into the text_basic type definition
<filter class="solr.ASCIIFoldingFilterFactory" />
Before the LowerCaseFilterFactory
Reload the collection and reindex – because the filter is symmetric (affects indexing)
Search without accents, general or fielded
Lodz, Frederic, Therese, GivenName:jerome
DEMO
Evolving schema – collapsing accents
Slide 19
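What accent folding does to the test names, approximated in plain Python. This is not the filter's real mapping table (which is far larger); it strips Unicode combining marks and adds an explicit mapping for Ł, which has no Unicode decomposition. The function name is hypothetical.

```python
import unicodedata

# Letters without a Unicode decomposition need an explicit mapping.
EXTRA = str.maketrans({"Ł": "L", "ł": "l"})


def fold(text: str) -> str:
    """Approximate ASCII folding followed by lowercasing."""
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


if __name__ == "__main__":
    for name in ["Frédéric", "Thérèse", "Jérôme", "Łódź", "Kędzierzyn-Koźle"]:
        print(name, "->", fold(name))
```

Because the same folding runs at index and query time, both "Łódź" and "Lodz" end up as the same term.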
What are similar names to 'Alexandre':
q=GivenName:Alexandre~2&
facet=on&facet.field=GivenName&facet.mincount=1
Alexander, Alexandra, Alexandrin, Leixandre, Alexandre, Alexandrie
We can't ask the user to enter arcane Solr syntax
Let's do a phonetic search instead
Bunch of different ways, each with its own tradeoffs
PhoneticFilterFactory, BeiderMorseFilterFactory, DaitchMokotoffSoundexFilterFactory,
DoubleMetaphoneFilterFactory,....
https://lucene.apache.org/solr/guide/7_3/phonetic-matching.html
Best to have one - or several - separate Field Type definitions with a copy field
Allows experimenting
Allows triggering them at different times (e.g. in advanced search, but not in the general one)
Allows tuning them for relevancy by assigning different weights
Evolving schema – Names and Surnames
Slide 20
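What ~2 means in the fuzzy query above: terms within two edits of the query term. A sketch that computes an edit distance with adjacent transpositions (the optimal string alignment variant) and applies it to the names from the slide. Solr does this with automata over the index rather than pairwise comparison; the function names and the extra non-matching name are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Edits needed: insert, delete, substitute, or swap two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "re" <-> "er".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_matches(names: list, query: str, max_edits: int = 2) -> list:
    """Keep names within max_edits of the query, ignoring case."""
    q = query.lower()
    return [n for n in names if edit_distance(n.lower(), q) <= max_edits]


if __name__ == "__main__":
    names = ["Alexander", "Alexandra", "Alexandrin", "Leixandre",
             "Alexandre", "Alexandrie", "Frederic"]
    print(fuzzy_matches(names, "Alexandre"))
```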
How do we actually search multiple fields at once?
We've been using the default 'lucene' query parser so far on either _text_ or specific field
Solr has MANY parsers
General: "lucene", DisMax, Extended DisMax (edismax)
Specialized: Block Join, Boolean, Boost, Collapsing, Complex Phrase, Field, Filters, Function, Function Range, Graph, Join, Learning to Rank, ...
• https://lucene.apache.org/solr/guide/7_3/other-parsers.html
We already used the spatial geofilt query parser: fq={!geofilt sfield=location}
edismax allows searching against multiple fields, with different weights, boosts, ties, minimum-match specifications, etc.
Choose with defType=edismax or {!edismax param=value param=value}search_string
Let's search for "George Brown" against (qf) "GivenName Surname Company StreetAddress City" and display the same fields only
DEMO
Try using http://splainer.io/ to review the results
Try with qf=GivenName^5 Surname^5 Company StreetAddress City
Side-trip into eDisMax and query parsers
Slide 21
Result: 149 records, but all over the field values
Enter RELEVANCY
Recall – did we find all documents?
Precision – did we find just the documents we needed?
Recall and Precision – fight. Perfect recall is q=*:* ......
Ranking – First hit is very important, ones after that less so (not always)
Side note: Field sorting destroys ranking.
We were optimizing Recall
Dump everything into _text_ and let search sort it out
Optimizing for Precision may seem easy too
Under eDisMax, set mm=100%
DEMO
eDisMax exploration continues
Slide 22
It is a business decision what Precision and Recall mean for your use case
Often "find more just in case" and focus on "ranking better" is the right approach
Try
qf=GivenName^5 Surname^5 Company StreetAddress City (no mm)
qf=GivenName^5 Surname^5 Company StreetAddress City and mm=100%
qf=GivenName^5 Surname^5 _text_ and mm=100%
DEMO in Splainer
Relevancy business case for our names (GivenName, Surname)
UPPER/lower case does not matter
Exact spelling (with accents) matches best – new Field Type needed (actually original text_basic...)
Accent-free spelling matches next – existing text_basic and therefore dynamic field match is fine
Phonetic spelling matches lowest (but higher than fallback _text_ field) – new Field Type needed
eDisMax for ranking
Slide 23
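The three "Try" variants above differ only in qf and mm. A small sketch that assembles the eDisMax parameters from a field-to-boost mapping, so the variants are easy to compare side by side (for example when pasting them into Splainer). The helper name is hypothetical; the fields, boosts and mm values come from the slide.

```python
def edismax_params(query: str, boosts: dict, mm=None) -> dict:
    """Build eDisMax request parameters; a boost of 1 is written as a bare field name."""
    qf = " ".join(
        field if boost == 1 else f"{field}^{boost}" for field, boost in boosts.items()
    )
    params = {"q": query, "defType": "edismax", "qf": qf}
    if mm is not None:
        params["mm"] = mm
    return params


NAME_BOOSTS = {"GivenName": 5, "Surname": 5, "Company": 1, "StreetAddress": 1, "City": 1}

VARIANTS = [
    edismax_params("George Brown", NAME_BOOSTS),
    edismax_params("George Brown", NAME_BOOSTS, mm="100%"),
    edismax_params("George Brown", {"GivenName": 5, "Surname": 5, "_text_": 1}, mm="100%"),
]

if __name__ == "__main__":
    for variant in VARIANTS:
        print(variant)
```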
<fieldType name="text_exact" class="solr.SortableTextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>
<field name="GivenName_exact" type="text_exact" indexed="true" stored="false"/>
<field name="Surname_exact" type="text_exact" indexed="true" stored="false"/>
<field name="GivenName_ph" type="text_phonetic" indexed="true" stored="false"/>
<field name="Surname_ph" type="text_phonetic" indexed="true" stored="false"/>
<copyField source="GivenName" dest="GivenName_exact"/>
<copyField source="GivenName" dest="GivenName_ph"/>
<copyField source="Surname" dest="Surname_exact"/>
<copyField source="Surname" dest="Surname_ph"/>
Multiple fields for same content
Slide 24
Our test cases
Frédéric, Thérèse, Jérôme
Check different analysis in Admin UI's Analysis screen
Can choose fields or field types from drop-down, use types as we have dynamic fields
Can also test analysis vs search and highlight the matches
Test search with Admin UI and Splainer, with eDisMax enabled, for Thérèse against different sets of Query Fields (qf)
Default search (qf=_text_)
GivenName
GivenName _text_
GivenName^10 _text_
GivenName_exact^15 GivenName^10 GivenName_ph^5 _text_
DEMO
Testing multiple representations
Slide 25
Original search URL: http://...:8983/solr/tinydir/select?defType=edismax&fl=.....
The good parameter set:
defType=edismax
qf=GivenName_exact^15 GivenName^10 GivenName_ph^5 _text_
fl=GivenName Surname Company StreetAddress City CountryFull
Lock it in a dedicated request handler in solrconfig.xml
<requestHandler name="/namesearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">_text_</str>
    <str name="echoParams">all</str>
    <str name="defType">edismax</str>
    <str name="qf">GivenName_exact^15 GivenName^10 GivenName_ph^5 _text_</str>
    <str name="fl">GivenName Surname Company StreetAddress City CountryFull</str>
  </lst>
</requestHandler>
Now: http://...:8983/solr/tinydir/namesearch?q=Thérèse
DEMO
Simplify API usage
Slide 26
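How the handler defaults simplify the client: values in the defaults list apply unless the request supplies its own. A sketch of that merge and of the resulting short URL; the base URL assumes a local install and the helper names are hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/tinydir"  # assumed local install

# Mirrors the defaults of the /namesearch handler defined above.
HANDLER_DEFAULTS = {
    "df": "_text_",
    "echoParams": "all",
    "defType": "edismax",
    "qf": "GivenName_exact^15 GivenName^10 GivenName_ph^5 _text_",
    "fl": "GivenName Surname Company StreetAddress City CountryFull",
}


def effective_params(request_params: dict) -> dict:
    """Request parameters win over handler defaults with the same name."""
    return {**HANDLER_DEFAULTS, **request_params}


def namesearch_url(name: str) -> str:
    """The client only needs to send q; everything else comes from the handler."""
    return BASE + "/namesearch?" + urlencode({"q": name})


if __name__ == "__main__":
    print(namesearch_url("Thérèse"))
    print(effective_params({"q": "Thérèse", "fl": "id"}))
```

With echoParams=all in the defaults, the response header shows the merged parameter set, which is a quick way to confirm what the handler actually applied.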
Based on previous work with Thai language: https://github.com/arafalov/solr-thai-test
Needs ICU libraries in solrconfig.xml
<lib path="../../../contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-7.3.0.jar" />
<lib path="../../../contrib/analysis-extras/lib/icu4j-59.1.jar" />
Field, type, and copyField definition in managed-schema:
<fieldType name="text_ru_en" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="ru-en" />
    <filter class="solr.BeiderMorseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.BeiderMorseFilterFactory" />
  </analyzer>
</fieldType>
<field name="GivenName_ruen" type="text_ru_en" indexed="true" stored="false"/>
<copyField source="GivenName" dest="GivenName_ruen"/>
Reload, reindex
Search
• GivenName:Zahar
• GivenName_ruen:Zahar
And BOOM!
Bonus magic
Slide 27


Rapid Solr Schema Development (Phone directory)

  • 1. Rapid Solr Schema Development Alexandre Rafalovitch (@arafalov) Apache Solr Committer Montreal Solr/ML meetup May 2018
  • 2. Phone directory - content Names, often from multiple cultures Addresses Phone numbers Company/Group Locations Other fun data I use https://www.fakenamegenerator.com/ for demos  Can generate bulk entries in csv, tab-separated, sql, etc  Many fields, languages, regions  Warning: comes with an – invisible – byte order mark Slide 2
  • 3. Today's exploration Solr 7.3 (latest) The smallest learning schema/configuration required Rapid schema evolution workflow Free-form and fielded user entry Dealing with multiple languages Dealing with alternative name spellings Searching phone numbers by any-length suffix Configuring Solr to simplify API interface (Bonus points) Fit into 40 minutes presentation! Slide 3
  • 4. Today's dataset http://www.fakenamegenerator.com/ - Bulk request (20000 identities) – Free and configurable! Name sets: American, Arabic, Australian, Chinese, French, Hispanic, Polish, Russian, Russian (Cyrillic), Thai Countries: Australia, Canada, France, Poland, Spain, United Kingdom, United States Age range: 19 - 85 years old Gender: 50% male, 50% female Fields: id,Gender,NameSet,Title,GivenName,MiddleInitial,Surname,StreetAddress,City,StateFull,ZipCod e,CountryFull,EmailAddress,Username,TelephoneNumber,TelephoneCountryCode,Birthday,Age,T ropicalZodiac,Color,Occupation,Company,BloodType,Kilograms,Centimeters,GUID,Latitude,Longi tude Renamed first field (Number) to id to fit Solr's naming convention Removed BOM (in Vim, :set nobomb) Slide 4
  • 5. First try – Solr's built in schema bin/solr start – standalone (non-clustered) server with no initial collections bin/solr create -c demo1 – uses default configset, with 'schemaless' mode, not for production Starts with 4 fields (id, _text_, _version_, _root_) Auto-creates the rest on first occurance bin/post -c demo1 ../dataset.csv auto-detect content type from extension can bulk upload files see techproducts shipped example bin/solr start –e techproducts For one file, can also do via Admin UI DEMO Slide 5
  • 6. Schemaless schema – lessons learned Imported 1 record Failed on the second one, because ZipCode was detected as a number Can fix that by explicit configuration and rebuilding – see films example (example/films/README.txt) Other issues Dual fields for text and string Everything multivalued – because "just in case" – No sorting, API is messier, etc Many large files managed-schema: 546 lines (without comments) solrconfig.xml: 1364 lines (with comments) Plus another 42 configuration files, mostly language stopwords Home work to get this working – not enough time today Slide 6
  • 7. Learning schema managed-schema: start from nearly nothing – add as needed solrconfig.xml: start from nearly all defaults – Most definitely NOT production ready Not SolrCloud ready – add those as you scale No extra field types – add as you need them How small can we go?!? Based on exploration done for my presentation at Lucene/Solr Revolution 2016 https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution- 2016 (slides and video) https://github.com/arafalov/solr-deconstructing-films-example - repo A bit out of date – schemaless mode was tuned since Today's version uses latest Solr feature https://github.com/arafalov/solr-presentation-2018-may/commits/master (changes commit- by-commit) Slide 7
  • 8. Learning schema – managed-schema <?xml version="1.0" encoding="UTF-8"?> <schema name="smallest-config" version="1.6"> <field name="id" type="string" required="true" indexed="true" stored="true" /> <field name="_text_" type="text_basic" multiValued="true" indexed="true" stored="false" docValues="false"/> <dynamicField name="*" type="text_basic" indexed="true" stored="true"/> <copyField source="*" dest="_text_"/> <uniqueKey>id</uniqueKey> <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/> <fieldType name="text_basic" class="solr.SortableTextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> </schema> Slide 8
  • 9. Learning schema – solrconfig.xml
      <?xml version="1.0" encoding="UTF-8" ?>
      <config>
        <luceneMatchVersion>7.3.0</luceneMatchVersion>
        <requestHandler name="/select" class="solr.SearchHandler">
          <lst name="defaults">
            <str name="df">_text_</str>
            <str name="echoParams">all</str>
          </lst>
        </requestHandler>
      </config>
  • 10. Learning schema – create and index
      ◦ 2 files, 33 lines combined, including blanks – but Will It Blend Search?
      ◦ bin/solr create -c tinydir -d ../configs/smallest/ – provide custom config files to the collection
      ◦ bin/post -c tinydir ../dataset.csv – remember the BOM and renaming the column Number->id
      ◦ Does it search?
          ◦ General search?
          ◦ Case-insensitive search?
          ◦ Range search: Centimeters:[* TO 99]
          ◦ Fielded search?
          ◦ Facet?
          ◦ Sort?
          ◦ Are ids preserved?
          ◦ Are individual fields easy to work with (fl, etc)?
      ◦ DEMO
  • 11. Learning schema – conclusion
      ◦ It works!
      ◦ And it is ready to be used from other parts of the project
          ◦ Do NOT expose Solr directly to the Internet. Not until you are a Solr Wizard, the Gray.
      ◦ The managed-schema file has NOT changed – because of dynamicField
          ◦ Still 21 lines
          ◦ Would still keep the comments
          ◦ Would still preserve field/type definitions
          ◦ Will change on the first Admin UI/API modification – gets rewritten
      ◦ What else?
          ◦ Actual search-engine tuning!
          ◦ Special cases
              ◦ Numerics – e.g. for range search
              ◦ Spatial search – e.g. for mapping/distance ranking
              ◦ Multivalued fields
              ◦ Dates
              ◦ Special parsing (e.g. names/surnames)
              ◦ Useful telephone number search
          ◦ Relevancy tuning!
  • 12. Evolving schema
      ◦ Several possibilities
      ◦ Admin UI
          ◦ Delete schema field
          ◦ Add schema field with new definition
          ◦ Reindex
          ◦ Sometimes causes a docValues-related exception; then you have to rebuild the collection from scratch
      ◦ Schema API (Admin UI uses a subset of it)
          ◦ See: https://lucene.apache.org/solr/guide/7_3/schema-api.html
          ◦ Also has Replace a Field
          ◦ Also has Add/Delete Field Type
          ◦ Great to use programmatically or with something like Postman (https://www.getpostman.com/)
      ◦ Edit schema/solrconfig.xml directly and reload the collection
          ◦ Not recommended for production, but OK with a single server/single developer
          ◦ Remember to edit the actual schema, not the original config one
              ◦ Check the "Instance" location in Admin UI, in the collection's Overview screen
          ◦ Remember that in SolrCloud mode, the config files are NOT on disk (they are in ZooKeeper)
  • 13. Evolving schema – add numeric fields
      ◦ Numeric fields
          ◦ Age – int
          ◦ Centimeters (height?) – int
          ◦ Kilograms – float
      ◦ Copy missing field types (pint, pfloat) from solr-7.3.0/server/solr/configsets/_default/conf/managed-schema
      ◦ Map numeric fields explicitly
      ◦ Delete content due to radical storage needs change
          ◦ bin/post -c tinydir -format solr -d "<delete><query>*:*</query></delete>"
      ◦ Reload the core in Admin UI's Core Admin (menu is different in SolrCloud mode)
      ◦ Index again
          ◦ bin/post -c tinydir ../dataset.csv
      ◦ New queries
          ◦ facet=true&facet.range=Age&facet.range.start=0&facet.range.end=200&facet.range.gap=10
          ◦ Centimeters:[* TO 99] (again)
      ◦ DEMO
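The "copy field types" and "map numeric fields" steps on this slide can be sketched as the managed-schema additions below. The two type definitions are written as I recall them from the Solr 7.3 _default configset; the three explicit field definitions and their attributes are an assumption derived from the Age/Centimeters/Kilograms list, not a copy of the presenter's file.

```xml
<!-- Point-based numeric types, as found in the _default configset -->
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>

<!-- Explicit field definitions win over the catch-all dynamicField "*";
     the indexed/stored attribute choices here are an assumption -->
<field name="Age" type="pint" indexed="true" stored="true"/>
<field name="Centimeters" type="pint" indexed="true" stored="true"/>
<field name="Kilograms" type="pfloat" indexed="true" stored="true"/>
```

With numeric types in place, Centimeters:[* TO 99] compares numbers rather than text tokens, which is why the same query is worth repeating after the reindex.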
  • 14. Evolving schema – spatial search
      ◦ Solr supports extensive spatial search
          ◦ https://lucene.apache.org/solr/guide/7_3/spatial-search.html
          ◦ bounding-box with different shapes (circles, polygons, etc)
          ◦ distance limiting or boosting
          ◦ different options with different functionalities
              ◦ LatLonPointSpatialField
              ◦ SpatialRecursivePrefixTreeFieldType
              ◦ BBoxField
      ◦ All require combined Lat Lon coordinates (lat,lon)
      ◦ We are providing separate Latitude and Longitude fields – need to merge them with a comma
      ◦ Let's copy a field type and create a field:
          <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
                     geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />
          <field name="location" type="location_rpt" indexed="true" stored="true" />
      ◦ Remember to reload – no need to delete, as it is a new field
      ◦ Next, need to also give merge instructions with an Update Request Processor
  • 15. Evolving schema – URPs
      ◦ Update Request Processors
          ◦ Deal with the data before it touches the schema
          ◦ Can do pre-processing magic with many, many processors
          ◦ See: https://lucene.apache.org/solr/guide/7_3/update-request-processors.html
          ◦ See: http://www.solr-start.com/info/update-request-processors/ (mine)
      ◦ Some are more magical than others and have shortcuts, e.g. TemplateUpdateProcessorFactory
      ◦ All can be configured with chains in solrconfig.xml and applied explicitly or by default
          ◦ That's how the schemaless mode works (default chain in solrconfig.xml of the _default configset)
          ◦ Also check the way dates are parsed in it, search for parse-date – can be used standalone
      ◦ IgnoreFieldUpdateProcessorFactory could be useful to drop fields we don't want Solr to process at all (including in the collect-all _text_ field)
      ◦ Let's reindex everything using the template to populate the new field:
          bin/post -c tinydir -params "processor=template&template.field=location:{Latitude},{Longitude}" ../dataset.csv
      ◦ Query:
          q=*:*&rows=1&fq={!geofilt sfield=location}&pt=45.493444, -73.558154&d=100&facet=on&facet.field=City&facet.mincount=1
      ◦ DEMO
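The request-parameter shortcut on this slide has a longer-lived equivalent: a named chain in solrconfig.xml. The sketch below is an assumption, not the presenter's configuration: the chain name add-location is hypothetical, and it assumes TemplateUpdateProcessorFactory accepts the same template through a field init argument as it does through the template.field request parameter.

```xml
<!-- Hypothetical chain name; select it per request with update.chain=add-location -->
<updateRequestProcessorChain name="add-location">
  <!-- Assumption: the "field" init argument mirrors the template.field request parameter -->
  <processor class="solr.TemplateUpdateProcessorFactory">
    <str name="field">location:{Latitude},{Longitude}</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last: it performs the actual indexing -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

If this form works in your Solr version, indexing would become bin/post -c tinydir -params "update.chain=add-location" ../dataset.csv, which keeps the template out of every command line.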
  • 16. Evolving schema – phone numbers
      ◦ Search for John and look at the phone numbers (q=John&fl=TelephoneNumber):
          ◦ 03.99.56.91.63
          ◦ (08) 9435 3911
          ◦ 79 196 65 43
          ◦ 306-724-3986
      ◦ Can we search that?
          ◦ TelephoneNumber:3911 – yes
          ◦ TelephoneNumber:"65 43" – sort of (need to quote or know these are together)
          ◦ TelephoneNumber:3986 – sort of: some at the end, some in the middle
      ◦ Use case: just search the last digits (suffix) regardless of formatting
      ◦ We have MANY analyzers, tokenizers, and character and token filters to help us with it
          ◦ https://lucene.apache.org/solr/guide/7_3/understanding-analyzers-tokenizers-and-filters.html
          ◦ http://www.solr-start.com/info/analyzers/ (mine)
  • 17. Evolving schema – digits-only type
      ◦ Let's define a super-custom field type:
          <fieldType name="phone" class="solr.TextField">
            <analyzer type="index">
              <tokenizer class="solr.KeywordTokenizerFactory" />
              <filter class="solr.PatternReplaceFilterFactory" pattern="([^0-9])" replacement="" replace="all"/>
              <filter class="solr.ReverseStringFilterFactory"/>
              <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
            </analyzer>
            <analyzer type="query">
              <tokenizer class="solr.KeywordTokenizerFactory" />
              <filter class="solr.PatternReplaceFilterFactory" pattern="([^0-9])" replacement="" replace="all"/>
              <filter class="solr.ReverseStringFilterFactory"/>
            </analyzer>
          </fieldType>
      ◦ Notice
          ◦ Asymmetric analyzers
          ◦ Reversing the string so that the ending digits become the starting digits (make sure that's symmetric!)
          ◦ Edge n-grams (3-30 character substrings anchored at the start of the reversed string) – makes the index larger, but the search very fast
  • 18. Evolving schema – digits-only type – cont
      ◦ Remap TelephoneNumber to it
          <field name="TelephoneNumber" type="phone" indexed="true" stored="true" />
      ◦ And reindex (don't forget our 'speed hack' for now):
          bin/post -c tinydir -params "processor=template&template.field=location:{Latitude},{Longitude}" ../dataset.csv
      ◦ Check terms in the Admin UI Schema screen and do our test searches
          ◦ TelephoneNumber:3911
          ◦ TelephoneNumber:"65 43"
          ◦ TelephoneNumber:6543
          ◦ TelephoneNumber:3986
      ◦ DEMO
  • 19. Evolving schema – collapsing accents
      ◦ Many languages have accents on letters
          ◦ Frédéric, Thérèse, Jérôme
      ◦ Many users can't be bothered to type them
      ◦ Sometimes, they don't even know how to type them
          ◦ Łódź, Kędzierzyn-Koźle
      ◦ Can we just ignore the accents when we search?
      ◦ Several ways, but let's use the simplest one: insert a filter into the text_basic type definition
          <filter class="solr.ASCIIFoldingFilterFactory" />
          ◦ Before the LowerCaseFilterFactory
      ◦ Reload the collection and reindex – because the filter is symmetric (affects indexing)
      ◦ Search without accents, general or fielded
          ◦ Lodz, Frederic, Therese, GivenName:jerome
      ◦ DEMO
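For reference, this is what the text_basic type from slide 8 looks like once the folding filter is inserted before the lowercase filter, as the slide describes. The comment is added here for illustration; everything else is the slide's own definition.

```xml
<fieldType name="text_basic" class="solr.SortableTextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Folds accented characters to ASCII equivalents, e.g. é -> e and Ł -> L -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because there is a single analyzer element, the same folding is applied at index and query time, so both Thérèse and Therese reduce to the same term.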
  • 20. Evolving schema – Names and Surnames
      ◦ What are similar names to 'Alexandre'?
          ◦ q=GivenName:Alexandre~2&facet=on&facet.field=GivenName&facet.mincount=1
          ◦ Alexander, Alexandra, Alexandrin, Leixandre, Alexandre, Alexandrie
      ◦ We can't ask the user to enter arcane Solr syntax
      ◦ Let's do a phonetic search instead
      ◦ Bunch of different ways, each with its own tradeoffs
          ◦ PhoneticFilterFactory, BeiderMorseFilterFactory, DaitchMokotoffSoundexFilterFactory, DoubleMetaphoneFilterFactory, ...
          ◦ https://lucene.apache.org/solr/guide/7_3/phonetic-matching.html
      ◦ Best to have one – or several – separate Field Type definitions with a copy field
          ◦ Allows experimenting
          ◦ Allows triggering them at different times (e.g. in advanced search, but not the general one)
          ◦ Allows tuning them for relevancy by assigning different weights
  • 21. Side-trip into eDisMax and query parsers
      ◦ How do we actually search multiple fields at once?
      ◦ We've been using the default 'lucene' query parser so far, on either _text_ or a specific field
      ◦ Solr has MANY parsers
          ◦ General: "lucene", DisMax, Extended DisMax (edismax)
          ◦ Specialized: Block Join, Boolean, Boost, Collapsing, Complex Phrase, Field, Filters, Function, Function Range, Graph, Join, Learning to Rank, ...
          ◦ https://lucene.apache.org/solr/guide/7_3/other-parsers.html
          ◦ We already used the spatial geofilt query parser: fq={!geofilt sfield=location}
      ◦ edismax allows searching against multiple fields, with different weights, boosts, ties, minimum-match specifications, etc
      ◦ Choose with defType=edismax or {!edismax param=value param=value}search_string
      ◦ Let's search for "George Brown" against (qf) "GivenName Surname Company StreetAddress City" and display the same fields only
      ◦ DEMO
          ◦ Try using http://splainer.io/ to review the results
          ◦ Try with qf=GivenName^5 Surname^5 Company StreetAddress City
  • 22. eDisMax exploration continues
      ◦ Result: 149 records, but the matches are all over the field values
      ◦ Enter RELEVANCY
          ◦ Recall – did we find all documents?
          ◦ Precision – did we find just the documents we needed?
          ◦ Recall and Precision – fight. Perfect recall is q=*:* ......
          ◦ Ranking – the first hit is very important, the ones after that less so (not always)
          ◦ Side note: field sorting destroys ranking.
      ◦ We were optimizing Recall
          ◦ Dump everything into _text_ and let search sort it out
      ◦ Optimizing for Precision may seem easy too
          ◦ Under eDisMax, set mm=100%
      ◦ DEMO
  • 23. eDisMax for ranking
      ◦ It is a business decision what Precision and Recall mean for your use case
      ◦ Often "find more just in case" and focus on "ranking better" is the right approach
      ◦ Try
          ◦ qf=GivenName^5 Surname^5 Company StreetAddress City (no mm)
          ◦ qf=GivenName^5 Surname^5 Company StreetAddress City and mm=100%
          ◦ qf=GivenName^5 Surname^5 _text_ and mm=100%
      ◦ DEMO in Splainer
      ◦ Relevancy business case for our names (GivenName, Surname)
          ◦ UPPER/lower case does not matter
          ◦ Exact spelling (with accents) matches best – new Field Type needed (actually the original text_basic...)
          ◦ Accent-free spelling matches next – existing text_basic and therefore the dynamic field match is fine
          ◦ Phonetic spelling matches lowest (but higher than the fallback _text_ field) – new Field Type needed
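Instead of passing them on every request, the third experiment from this slide could be written as request handler defaults. This is only a sketch: the handler name /precise is hypothetical, while the qf and mm values are the ones listed on the slide.

```xml
<!-- Hypothetical handler name; the parameter values come from the slide's third experiment -->
<requestHandler name="/precise" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Names are boosted 5x; _text_ acts as the catch-all fallback -->
    <str name="qf">GivenName^5 Surname^5 _text_</str>
    <!-- mm=100%: every query term has to match in at least one qf field -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

Since these are defaults rather than invariants, a request can still override mm or qf while comparing variants in Splainer.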
  • 24. Multiple fields for same content
      <fieldType name="text_exact" class="solr.SortableTextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
      </fieldType>
      <fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
        </analyzer>
      </fieldType>
      <field name="GivenName_exact" type="text_exact" indexed="true" stored="false"/>
      <field name="Surname_exact" type="text_exact" indexed="true" stored="false"/>
      <field name="GivenName_ph" type="text_phonetic" indexed="true" stored="false"/>
      <field name="Surname_ph" type="text_phonetic" indexed="true" stored="false"/>
      <copyField source="GivenName" dest="GivenName_exact"/>
      <copyField source="GivenName" dest="GivenName_ph"/>
      <copyField source="Surname" dest="Surname_exact"/>
      <copyField source="Surname" dest="Surname_ph"/>
  • 25. Testing multiple representations
      ◦ Our test cases
          ◦ Frédéric, Thérèse, Jérôme
      ◦ Check different analysis in Admin UI's Analysis screen
          ◦ Can choose fields or field types from the drop-down; use types, as we have dynamic fields
          ◦ Can also test analysis vs search and highlight the matches
      ◦ Test search with Admin UI and Splainer, with eDisMax enabled and Thérèse against different sets of Query Fields (qf)
          ◦ Default search (qf=_text_)
          ◦ GivenName
          ◦ GivenName _text_
          ◦ GivenName^10 _text_
          ◦ GivenName_exact^15 GivenName^10 GivenName_ph^5 _text_
      ◦ DEMO
  • 26. Simplify API usage
      ◦ Original search URL: http://...:8983/solr/tinydir/select?defType=edismax&fl=.....
      ◦ The good parameter set:
          ◦ defType=edismax
          ◦ qf=GivenName_exact^15 GivenName^10 GivenName_ph^5 _text_
          ◦ fl=GivenName Surname Company StreetAddress City CountryFull
      ◦ Lock it in a dedicated request handler in solrconfig.xml
          <requestHandler name="/namesearch" class="solr.SearchHandler">
            <lst name="defaults">
              <str name="df">_text_</str>
              <str name="echoParams">all</str>
              <str name="defType">edismax</str>
              <str name="qf">GivenName_exact^15 GivenName^10 GivenName_ph^5 _text_</str>
              <str name="fl">GivenName Surname Company StreetAddress City CountryFull</str>
            </lst>
          </requestHandler>
      ◦ Now: http://...:8983/solr/tinydir/namesearch?q=Thérèse
      ◦ DEMO
  • 27. Bonus magic
      ◦ Based on previous work with the Thai language: https://github.com/arafalov/solr-thai-test
      ◦ Needs ICU libraries in solrconfig.xml
          <lib path="../../../contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-7.3.0.jar" />
          <lib path="../../../contrib/analysis-extras/lib/icu4j-59.1.jar" />
      ◦ Field, type, and copyField definition in managed-schema:
          <fieldType name="text_ru_en" class="solr.TextField">
            <analyzer type="index">
              <tokenizer class="solr.ICUTokenizerFactory"/>
              <filter class="solr.ICUTransformFilterFactory" id="ru-en" />
              <filter class="solr.BeiderMorseFilterFactory" />
            </analyzer>
            <analyzer type="query">
              <tokenizer class="solr.StandardTokenizerFactory" />
              <filter class="solr.LowerCaseFilterFactory" />
              <filter class="solr.BeiderMorseFilterFactory" />
            </analyzer>
          </fieldType>
          <field name="GivenName_ruen" type="text_ru_en" indexed="true" stored="false"/>
          <copyField source="GivenName" dest="GivenName_ruen"/>
      ◦ Reload, reindex
      ◦ Search
          ◦ GivenName:Zahar
          ◦ GivenName_ruen:Zahar
      ◦ And BOOM!
  • 28. Rapid Solr Schema Development Alexandre Rafalovitch (@arafalov) Apache Solr Committer Montreal Solr/ML meetup May 2018

Editor's Notes

  • #14: Line 205-206 facet=true&facet.range=Age&facet.range.start=0&facet.range.end=200&facet.range.gap=10
  • #16: http://localhost:8983/solr/tinydir/select?rows=1&d=100&facet.field=City&facet=on&fq={!geofilt%20sfield=location}&pt=45.493444,%20-73.558154&q=*:*&facet.mincount=1
  • #19: TelephoneNumber:3911 – yes TelephoneNumber:"65 43" – sort of (need to quote or know these are together) TelephoneNumber:3986
  • #20: Frédéric, Thérèse, Jérôme Łódź, Kędzierzyn-Koźle
  • #27: Thérèse