Introduction to
Apache Solr
Thessaloniki Java Meetup
2015-10-16
Christos Manios
Contents
1. What is Solr
2. Solr Architecture / Concepts
3. Install / Configure
4. Index, Query, Update, Delete data
5. Solr integration
6. Solr resources
7. SolrCloud
WHAT IS SOLR
(and why we care so much about it!)
WHAT IS
SOLR
▸ A search engine
▸ A REST API
▸ Built on Lucene
▸ Open Source
▸ Blazing-fast
▸ Scalable
▸ Fault tolerant
WHY SOLR
▸ Text search faster than RDBMS
▸ Solr knows about languages
▸ Specific features:
▹ Highlighting
▹ Faceting
▹ Scoring/Boost
and many more!
SOLR
TIMELINE
▸ 1999: Doug Cutting creates Lucene
▸ 2004: Yonik Seeley creates Solr
▸ 2010: Lucene and Solr merge
▸ 2012: Solr 4 and introduction of SolrCloud
▸ 2015: Version 5.0
WHO USES
SOLR
LinkedIn
DuckDuckGo
IBM WebSphere Commerce
AT&T
Apple
eBay
MTV Networks
Magento
O.T.S.
Instagram
NASA
Netflix
Disney
Buy.com
Adobe
SAP Hybris
Bloomberg
and many more!
Does Solr fit in our
application?
“Well... it depends!”
SOLR
ARCHITECTURE
BASIC
CONCEPTS
▸ Standalone application
server (Jetty powered)
▸ Document oriented
▸ Schema (less)
▸ Not ACID (document
atomicity)
ARCHITECTURE
ARCHITECTURE
(2)
[Figure: Solr architecture diagram, ©Lucidworks 2013]
ARCHITECTURE
(3)
[Figure: Solr architecture diagram, ©Lucidworks 2013]
ARCHITECTURE
(4)
[Figure: Solr architecture diagram]
SOLR
CONCEPTS
AND
TERMINOLOGY
▸ Node
▸ Core
▸ Schema
▸ ConfigSet
▸ SolrCloud
▹ Collection
▹ Shard
▹ ZooKeeper
SCHEMA
▸ Every Solr core has a
schema
▸ Defined in schema.xml
▸ Contains:
▹ Fields
▹ Data types
▹ Analysers
MANAGED
SCHEMA
▸ Solr supports schemaless
mode
▸ Not recommended for
production
▸ Performance implications
FIELD TYPES
▸ int, float, long, double
▸ date
▸ string
▸ text (multilingual)
▸ location
COMMON
FIELD
ATTRIBUTES
▸ indexed
▸ stored
▸ type
▸ multivalued
Example:
<field name="id" type="string" indexed="true"
       stored="true" required="true" multiValued="false" />
DYNAMIC
FIELDS
▸ Fields not explicitly defined in
schema
▸ Field names must match a
pattern
▸ Patterns begin or end
with a wildcard (*)
▸ Make schema dynamic
<dynamicField name="*_i" type="tint"
indexed="true" stored="true"/>
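As a rough illustration of how a dynamic field pattern such as `*_i` picks up field names, the sketch below matches names against wildcard patterns with Python's standard `fnmatch` module. The `EXPLICIT_FIELDS` and `DYNAMIC_FIELDS` tables, the `*_s` pattern and the `resolve_type` helper are hypothetical names made up for this example, and Solr's own resolution rules (for instance which pattern wins when several match) may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical schema contents, for illustration only.
EXPLICIT_FIELDS = {"id": "string"}
DYNAMIC_FIELDS = [("*_i", "tint"), ("*_s", "string")]  # "*_s" is an assumed extra pattern


def resolve_type(field_name):
    """Return the field type for a name, or None if nothing matches."""
    # Explicitly declared fields are checked before dynamic patterns.
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    # Otherwise take the first dynamic pattern that matches the name.
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


if __name__ == "__main__":
    for name in ("id", "age_i", "city_s", "unknown"):
        print(name, "->", resolve_type(name))
```

A document can then carry a field like `age_i` without that exact name ever being declared in the schema.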
INDEXING/
SEARCHING /
UPDATING /
DELETING
INDEXING
You can index one or more
documents using:
▸ bin/post command
▸ REST API
▸ SolrJ or other libraries
▸ DataImportHandler
INDEXING (2)
REST API example:
curl http://localhost:8983/solr/my_collection/update \
  -H "Content-Type: text/xml" --data-binary '
<add>
<doc>
<field name="id">012ab1</field>
<field name="authors">Patrick Eagar</field>
<field name="subject">Sports</field>
<field name="dd">796.35</field>
<field name="isbn">0002166313</field>
<field name="yearpub">1982</field>
<field name="publisher">Collins</field>
</doc>
</add>'
INDEXING
MULTIPLE
DOCS (JSON)
REST API example:
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/my_collection/update' --data-binary '
[
{
"id": "1",
"title": "Doc 1"
},
{
"id": "2",
"title": "Doc 2"
}
]'
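The same JSON update can be prepared from a program. Here is a minimal sketch using only the Python standard library; it builds the request object without sending it. The `build_json_update` helper is a hypothetical name, the URL is copied from the curl example, and the optional `commit=true` parameter is assumed to behave as in the delete example later in this deck.

```python
import json
import urllib.request

# Assumed endpoint, copied from the curl example; adjust host and collection.
UPDATE_URL = "http://localhost:8983/solr/my_collection/update"


def build_json_update(docs, commit=False):
    """Prepare (but do not send) a POST that indexes a list of documents."""
    url = UPDATE_URL + ("?commit=true" if commit else "")
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [{"id": "1", "title": "Doc 1"}, {"id": "2", "title": "Doc 2"}]
request = build_json_update(docs, commit=True)
# To send it to a running Solr instance: urllib.request.urlopen(request)
```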
SEARCHING
REST API example:
curl 'http://192.168.1.2:8983/solr/javameetup/select?q=*%3A*&sort=creatorName_txtel_diav+desc&rows=10&fl=id&wt=json&indent=true'
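Percent-encoding the query string by hand is error-prone, so a small helper can do it. The sketch below rebuilds the URL above with `urllib.parse.urlencode`. The `build_select_url` name is hypothetical; the host, core and parameters are the ones from the slide.

```python
from urllib.parse import urlencode

# Host and core name as shown on the slide.
BASE_URL = "http://192.168.1.2:8983/solr/javameetup/select"


def build_select_url(**params):
    """Return a /select URL with the given parameters percent-encoded."""
    # safe="*" keeps the match-all query readable as *%3A* instead of %2A%3A%2A.
    return BASE_URL + "?" + urlencode(params, safe="*")


url = build_select_url(
    q="*:*",                             # match every document
    sort="creatorName_txtel_diav desc",  # the space is encoded as "+"
    rows=10,
    fl="id",
    wt="json",
    indent="true",
)
print(url)
```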
SEARCHING:
QUERY
PARSERS
Solr has the following query
parsers:
▸ Standard (lucene)
▸ Dismax
▸ Edismax
SEARCHING:
RANGE
QUERIES
▸ Allow the selection of documents whose fields
fall within a range
▸ Ranges with [] are inclusive on both sides
▹ price:[0 TO 100]
▹ price:[0 TO *]
▹ price:[* TO 100]
▸ Range queries with {} are exclusive
▹ price:{0 TO 100}
▸ Can mix [ and }
▹ price:[0 TO 100}
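To make the bracket semantics concrete, here is a small sketch that checks a number against a range expression written in the syntax above. It only illustrates inclusive versus exclusive bounds and the `*` wildcard; `in_range` is a hypothetical helper and says nothing about how Solr evaluates ranges internally.

```python
def in_range(value, expr):
    """Check a number against a range written like "[0 TO 100}"."""
    # The first and last characters decide whether each bound is inclusive.
    low_inclusive = expr[0] == "["
    high_inclusive = expr[-1] == "]"
    low, high = expr[1:-1].split(" TO ")
    # "*" means the range is open-ended on that side.
    if low != "*":
        bound = float(low)
        if value < bound or (value == bound and not low_inclusive):
            return False
    if high != "*":
        bound = float(high)
        if value > bound or (value == bound and not high_inclusive):
            return False
    return True


if __name__ == "__main__":
    for expr in ("[0 TO 100]", "{0 TO 100}", "[0 TO 100}", "[0 TO *]"):
        print(expr, "contains 100:", in_range(100, expr))
```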
SEARCHING:
DATE QUERIES
▸ Date format: 2015-10-16T19:19:59Z
▸ Dates are stored in UTC.
▸ Date math
▹ NOW
▹ NOW/YEAR
▹ NOW/HOUR
▹ NOW/MONTH
▹ NOW/SECOND
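In date math, `/UNIT` rounds down to the start of that unit. The sketch below imitates this rounding in plain Python for the units listed above, using the example timestamp from this slide as a stand-in for NOW. The `round_down` and `to_solr` helpers are hypothetical names for illustration.

```python
from datetime import datetime, timezone


def round_down(moment, unit):
    """Truncate a datetime to the start of the given unit, like NOW/UNIT."""
    if unit == "SECOND":
        return moment.replace(microsecond=0)
    if unit == "HOUR":
        return moment.replace(minute=0, second=0, microsecond=0)
    if unit == "MONTH":
        return moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "YEAR":
        return moment.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError("unsupported unit: " + unit)


def to_solr(moment):
    """Format a UTC datetime in the date format shown on the slide."""
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# Stand-in for NOW: the example timestamp from the slide, in UTC.
now = datetime(2015, 10, 16, 19, 19, 59, tzinfo=timezone.utc)

if __name__ == "__main__":
    for unit in ("YEAR", "MONTH", "HOUR", "SECOND"):
        print("NOW/" + unit, "=", to_solr(round_down(now, unit)))
```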
SEARCHING:
OTHER
QUERIES
▸ Boolean queries:
▹ +this -that
▹ this AND that
▸ Field queries:
▹ title:"Bob SquarePants"
▹ company:Nickelodeon
SEARCHING:
OTHER
QUERIES (2)
▸ Phrase/proximity queries:
▹ "Sheldon Couper" matches only Sheldon Couper
▹ "Sheldon Couper"~1 also matches Sheldon Lee Couper
▸ Multi-term queries:
▹ title:Ιωάννης Μακρυγιάννης (only the first term is searched in title; the second goes to the default field)
▹ title:(Ιωάννης Μακρυγιάννης) (both terms are searched in title)
▸ Combine them:
▹ +this -title:that +price:[* TO 100] -name:"Sheldon Couper"
SEARCHING:
FUZZY &
WILDCARD
QUERIES
▸ Sometimes we don't know exactly what we are
looking for:
▹ It starts with pro: pro*
▹ It ends with tion: *tion
▹ Not sure about a letter: j?t
▸ Something like chris:
▹ chris~
▹ chris~0.9
▸ Regular expression: /H.*t/ matches Hornet
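The wildcard and fuzzy ideas above can be tried out in plain Python. In this sketch, `fnmatchcase` stands in for `*` and `?` wildcards, and a textbook Levenshtein edit distance stands in for fuzzy matching. Both are substitutes chosen for illustration, not Solr's implementation, and the word list is made up.

```python
from fnmatch import fnmatchcase


def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


# Made-up vocabulary for the demonstration.
words = ["protein", "nation", "jet", "jot", "chris", "cris", "christos"]

if __name__ == "__main__":
    for pattern in ("pro*", "*tion", "j?t"):
        print(pattern, "->", [w for w in words if fnmatchcase(w, pattern)])
    print("near chris ->", [w for w in words if edit_distance("chris", w) <= 1])
```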
SEARCHING:
RELEVANCY
Relevancy is the quality of results returned from a
query, encompassing both what documents are
found, and their relative ranking (the order in which
they are returned to the user).
SEARCHING:
RELEVANCY
EXAMPLE
▸ Find all people with name “Κώστας” and return
politicians first:
▹ q=+name:"Κώστας" occupation:Politician^100
SEARCHING:
FILTER
QUERIES
▸ Limit the possible responses to the main query
▸ Do not change ordering or scoring
▸ Can be based on any query type
▸ Example:
&fq=category:music
&fq=price:[0 TO 100]
&fq=rating:[3 TO *]
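Several `fq` parameters can be sent with one request. The sketch below builds such a query string with the standard library, passing a list of pairs so that `fq` can repeat. The main query `beatles` and the `build_query_string` helper are assumptions for the example; the filters are the ones from the slide.

```python
from urllib.parse import urlencode


def build_query_string(q, filters):
    """Encode a main query plus any number of filter queries."""
    # A list of (name, value) pairs lets the "fq" parameter appear repeatedly.
    params = [("q", q)] + [("fq", f) for f in filters]
    return urlencode(params)


filters = ["category:music", "price:[0 TO 100]", "rating:[3 TO *]"]
query_string = build_query_string("beatles", filters)  # "beatles" is a made-up query
print(query_string)
```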
SEARCHING:
SORTING
▸ Solr can sort by
▹ Score
▹ A value in a field
▹ A function
▸ In ascending or descending order
▸ Multiple fields:
&sort=name asc,age desc
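With multiple sort fields, later fields only break ties left by earlier ones. The sketch below reproduces `name asc,age desc` on an in-memory list to show the resulting order. The sample documents and the helper name are made up for this example.

```python
# Made-up documents for the demonstration.
people = [
    {"name": "Bob", "age": 30},
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 41},
]


def sort_name_asc_age_desc(docs):
    """Order documents by name ascending, then by age descending."""
    # Python's sort is stable: sort by the secondary key first, then the primary.
    by_age = sorted(docs, key=lambda d: d["age"], reverse=True)
    return sorted(by_age, key=lambda d: d["name"])


if __name__ == "__main__":
    for person in sort_name_asc_age_desc(people):
        print(person["name"], person["age"])
```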
SEARCHING:
FACETS
SEARCHING:
HIGHLIGHTING
UPDATE
DOCUMENT
EXAMPLE
▸ Solr performs atomic (partial) updates.
▹ It marks the old version of the document as deleted.
▹ It adds the new version of the document.
▹ Updates are based on the unique ID.
▹ Not possible to update by query.
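An atomic update is sent to the update handler as a document whose changed fields carry a modifier such as `set`, `add` or `inc`. The sketch below only builds that JSON body. The `build_atomic_update` helper and the `views` field are hypothetical, while `id` and `title` follow the earlier JSON indexing example.

```python
import json


def build_atomic_update(doc_id, changes):
    """Build the JSON body for an atomic update of one document.

    `changes` maps a field name to an (operation, value) pair, where the
    operation is an atomic-update modifier such as "set", "add" or "inc".
    """
    doc = {"id": doc_id}  # the unique ID selects the document to update
    for field, (operation, value) in changes.items():
        doc[field] = {operation: value}
    return json.dumps([doc])


body = build_atomic_update(
    "1",
    {"title": ("set", "Doc 1 (revised)"), "views": ("inc", 1)},  # "views" is a made-up field
)
print(body)
```

Fields that are not mentioned in the body keep their current values, which is what makes the update partial.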
DELETE
DOCUMENTS
▸ Delete documents by query (WARNING! The following deletes all docs!):
http://192.168.1.1:8983/solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>
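Because a match-all delete is so easy to send by accident, a thin wrapper can refuse it. The sketch below builds JSON delete commands for the update handler and rejects `*:*`. The helper names and the guard are assumptions for illustration, not part of Solr.

```python
import json


def delete_by_id(*ids):
    """Build a JSON delete command for one or more document IDs."""
    return json.dumps({"delete": list(ids)})


def delete_by_query(query):
    """Build a JSON delete-by-query command, refusing the match-all query."""
    if query.strip() == "*:*":
        raise ValueError("refusing to delete every document")
    return json.dumps({"delete": {"query": query}})


if __name__ == "__main__":
    print(delete_by_id("1", "2"))
    print(delete_by_query("subject:Sports"))
```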
SEARCH
SPEED?
SEARCH
SPEED
PARAMETERS
It depends on:
1. Document size
2. Field cardinality
3. RAM assigned to JVM
4. Indexing rate (updates / sec)
5. Query rate (queries / sec)
6. Query quality
Be careful or it will become:
INSTALL /
CONFIGURE SOLR
INSTALL
SOLR
▸ Download from a mirror
▸ Unzip
▸ Run
bob@bobos-PC$ ls solr*
solr-5.3.1.zip
bob@bobos-PC$ unzip -q solr-5.3.1.zip
bob@bobos-PC$ cd solr-5.3.1/
RUN SOLR
bob@bobos-PC:/opt/solr-5.3.1$ bin/solr start -p 8983
Waiting up to 30 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=6240). Happy searching!
(On Windows use: bin\solr.cmd)
CREATE A
NEW CORE
$ bin/solr create_core -c javameetup -d basic_configs
Setup new core instance directory:
/opt/solr-5.3.1/server/solr/javameetup
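One way to confirm that the new core exists is to ask the CoreAdmin API for its status. The sketch below only builds the status URL. The `core_status_url` helper is a hypothetical name, and the default host and port assume the local instance started above.

```python
from urllib.parse import urlencode


def core_status_url(core, host="localhost", port=8983):
    """Return the CoreAdmin STATUS URL for a single core."""
    params = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return "http://%s:%d/solr/admin/cores?%s" % (host, port, params)


url = core_status_url("javameetup")
print(url)
# To query a running Solr instance:
#   import urllib.request; print(urllib.request.urlopen(url).read().decode("utf-8"))
```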
SOLR
RESOURCES
RESOURCES
▸ Official Lucene page:
▹ http://lucene.apache.org
▸ Official Solr page:
▹ http://lucene.apache.org/solr
RESOURCES
(2)
Solr official resources page
provides links to:
▸ Tutorials
▸ Release documentation
▸ Reference guide
▸ Mailing lists
SOLR
INTEGRATION
Solr integrates with many
languages via client libraries:
▸ Java (SolrJ, spring-data-solr)
▸ Python
▸ PHP
▸ .NET
▸ Go
for a full list see here.
SOLR
INTEGRATION (2)
Solr can be combined with big
data software such as:
▸ Apache Hadoop
▸ Apache Cassandra
▸ Apache Spark
▸ Apache Mahout
SolrCloud
INDEX SIZE
▸ Be constantly aware of your
index size:
2,100,000,000
maximum number of documents per core or shard
For more, consider a SolrCloud solution!
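As a back-of-the-envelope check, the per-shard document ceiling gives a lower bound on the number of shards a collection needs. The sketch below uses the 2,100,000,000 figure quoted on this slide. `min_shards` is a hypothetical helper, and in practice shards are usually sized well below this limit for performance reasons.

```python
# Per-core/shard document limit, as quoted on the slide.
MAX_DOCS_PER_SHARD = 2_100_000_000


def min_shards(total_docs, per_shard=MAX_DOCS_PER_SHARD):
    """Smallest number of shards that can hold total_docs documents."""
    # Ceiling division without floating point; always at least one shard.
    return max(1, -(-total_docs // per_shard))


if __name__ == "__main__":
    for total in (500_000_000, 2_100_000_001, 5_000_000_000):
        print(total, "documents ->", min_shards(total), "shard(s)")
```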
SOLRCLOUD
CHARACTERISTICS
▸ Distributed search
▸ Sharding
▸ Fault tolerance
▸ High availability
▸ Apache ZooKeeper coordinates:
▹ shard leader election
▹ distribution of updates to shard
leaders
SOLRCLOUD
ADMIN PAGE
Collection with one shard
SOLRCLOUD
ADMIN PAGE
(2)
Collection with one shard
Questions?
About me:
▸ https://manios.org
▸ https://github.com/manios
