Search Engine-Building
with Lucene and Solr
Kai Chan
SoCal Code Camp, June 2014
http://bit.ly/sdcodecamp2014solr
[Figure: three nested sets: all data, matched data, data that a user actually sees]
Lucene
● full-text search library
● creates, updates and reads from the index
● takes queries and produces search results
● your application creates objects and calls
methods in the Lucene API
● provides building blocks for custom features
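To make "the index" concrete, here is a toy inverted index in Python. It is only a sketch of the general idea (map each term to the documents that contain it), not Lucene's API or its on-disk format; the names `build_index` and `search` are made up for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents, keyed by document id.
docs = {1: "please read this message", 2: "read the manual", 3: "galaxy tab review"}
index = build_index(docs)
matches = search(index, "please read")
```

The pre-processing cost is paid once in `build_index`; each search then touches only the entries for the query terms instead of scanning all data.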
Solr
● full-text search platform
● uses Lucene for indexing and search
● REST-like API over HTTP
● different output formats (e.g. XML, JSON)
● provides some features not built into Lucene
[Figure: Lucene: your application, the Lucene code (libraries) and the index share one machine running a Java VM. Solr: a client talks over HTTP to a machine running a Java VM, where a servlet container (e.g. Tomcat, Jetty) hosts Solr, made up of Solr code, Lucene code (libraries) and the index.]
How Data Are Organized
[Figure: a collection contains documents; each document contains fields]
field
● content (e.g. "please read" or 30)
● name (e.g. "title" or "price")
● type
● options
[Figure: a collection of three documents with the fields subject, date, from, text and reply-to; not every document has every field]
[Figure: a collection holding different kinds of documents: an e-mail (subject, date, from, text), a product (title, SKU, price, description) and a contact (last name, first name, phone, address)]
Solr Field Definition
● field
o name (e.g. "subject")
o type (e.g. "text_general")
o options (e.g. indexed="true" stored="true")
● field type
o text: "string", "text_general"
o numeric: "int", "long", "float", "double"
● options
o indexed: content can be searched
o stored: content can be returned at search-time
o multivalued: multiple values per field & document
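In schema.xml, each definition above becomes a `<field>` element with `name`, `type`, `indexed`, `stored` and (optionally) `multiValued` attributes. The Python sketch below generates such an element; the helper name `field_xml` is hypothetical and only illustrates how the three parts of a definition fit together.

```python
import xml.etree.ElementTree as ET

def field_xml(name, field_type, indexed=True, stored=True, multi_valued=False):
    """Render one schema.xml <field> element as a string."""
    attrs = {
        "name": name,
        "type": field_type,
        "indexed": str(indexed).lower(),  # "true" / "false"
        "stored": str(stored).lower(),
    }
    if multi_valued:
        attrs["multiValued"] = "true"
    return ET.tostring(ET.Element("field", attrs), encoding="unicode")

subject_field = field_xml("subject", "text_general")
tags_field = field_xml("tags", "string", multi_valued=True)  # "tags" is a made-up field
print(subject_field)
print(tags_field)
```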
Solr Dynamic Field
● define field by naming convention
● "amount_i": int, indexed, stored
● "tag_ss": string, indexed, stored, multivalued
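The naming convention can be pictured as a pattern lookup. This Python sketch assumes two patterns, `*_i` and `*_ss`, matching the examples above; it is a simplification, and Solr's own rules for choosing between overlapping patterns are not modeled here.

```python
from fnmatch import fnmatchcase

# Assumed dynamic field patterns, mirroring the two examples on the slide.
DYNAMIC_FIELDS = [
    ("*_i", {"type": "int", "indexed": True, "stored": True, "multiValued": False}),
    ("*_ss", {"type": "string", "indexed": True, "stored": True, "multiValued": True}),
]

def resolve_field(name):
    """Return the definition of the first pattern matching the field name, or None."""
    for pattern, definition in DYNAMIC_FIELDS:
        if fnmatchcase(name, pattern):
            return definition
    return None

amount = resolve_field("amount_i")
tags = resolve_field("tag_ss")
unknown = resolve_field("title")  # no pattern matches, so it needs an explicit field
```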
Solr Copy Field
● copy one or more fields into another field
● can be used to define a catch-all field
o source: "title", "author", "content"
o destination: "text"
o searching the "text" field has the effect of searching
all the other three fields
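The effect of a copy field can be simulated in a few lines of Python. This is only a model of the behavior described above (Solr does the copying itself at index time); `COPY_FIELDS` and `apply_copy_fields` are illustrative names.

```python
# Source field -> destination field, as in the catch-all example above.
COPY_FIELDS = {"title": "text", "author": "text", "content": "text"}

def apply_copy_fields(doc, copy_fields=COPY_FIELDS):
    """Return a copy of doc with each source value appended to its destination field."""
    result = dict(doc)
    for source, dest in copy_fields.items():
        if source in doc:
            existing = result.get(dest, [])
            if not isinstance(existing, list):
                existing = [existing]
            result[dest] = existing + [doc[source]]
    return result

doc = {"title": "Solr in Action", "author": "Grainger", "price": 30}
indexed_doc = apply_copy_fields(doc)
```

Because the destination collects several values per document, a catch-all field like "text" has to be multivalued.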
Indexing - UpdateRequestHandler
● upload (POST) content or file to
http://host:port/solr/update
● formats: XML, JSON, CSV
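A JSON upload could be prepared as follows. This Python sketch only builds the POST request and does not send it; the base URL, the `commit=true` parameter and the sample document are assumptions for a local test setup, and Solr's documentation has the exact formats required.

```python
import json
from urllib.request import Request

def build_update_request(docs, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a POST request that uploads docs as JSON."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url=base_url + "/update?commit=true",  # commit right away (assumed choice)
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_update_request([{"id": "1", "title": "tablet", "price": 300}])
# urllib.request.urlopen(request) would send it to a running Solr instance.
```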
Indexing - DataImportHandler
● has its own config file (data-config.xml)
● import data from various sources
o RDBMS (JDBC)
o e-mail (IMAP)
o XML data locally (file) or remotely (HTTP)
● transformers
o extract data (RegEx, XPath)
o manipulate data (strip HTML tags)
Indexing - ExtractingRequestHandler
● allows indexing of different formats
o e.g. PDF, MS Word, XML
● extract text and metadata
● maps extracted text to the “content” field
● maps metadata to different fields
Searching - Basics
● send request to http://host:port/solr/select
● parameters
o q - main query
o fq - filter query
o defType - query parser (e.g. lucene, edismax)
o fl - fields to return
o sort - sort criteria
o wt - response writer (e.g. xml, json)
o indent - set to true for pretty-printing
http://localhost:8983/solr/select?q=title:tablet&fl=title,price,inStock&sort=price+asc&wt=json
● search handler's URL: http://localhost:8983/solr/select
● main query: q=title:tablet
● fields to return: fl=title,price,inStock
● sort criteria: sort=price+asc
● response writer: wt=json
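Building such a URL by hand is error-prone because of escaping, so a client would typically let a library encode the parameters. A minimal Python sketch, assuming a local Solr at port 8983; `select_url` is an illustrative helper, not part of any Solr client.

```python
from urllib.parse import urlencode

def select_url(params, base_url="http://localhost:8983/solr"):
    """Return a search URL with properly escaped query parameters."""
    return base_url + "/select?" + urlencode(params)

url = select_url({
    "q": "title:tablet",          # main query
    "fl": "title,price,inStock",  # fields to return
    "sort": "price asc",          # sort criteria, with an explicit direction
    "wt": "json",                 # response writer
    "indent": "true",             # pretty-printing
})
print(url)
```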
Searching - Query Syntax
name:tablet
name:"galaxy tab"
name:tablet category:tablet
+name:tablet +category:tablet
Searching - Query Syntax (cont.)
+name:tablet +(manu:apple manu:samsung)
+name:tablet -manu:apple
+name:tablet +range:[300 TO 500]
+name:tablet manu:apple^5
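The operators above compose mechanically, which a few small Python helpers can show. The helper names are hypothetical, the `price` field is an assumed example, and escaping of special characters inside values is deliberately left out.

```python
def term(field, value):
    """field:value, quoting the value as a phrase if it contains whitespace."""
    if any(ch.isspace() for ch in value):
        value = '"' + value + '"'
    return f"{field}:{value}"

def must(clause):
    return "+" + clause          # document must match the clause

def must_not(clause):
    return "-" + clause          # document must NOT match the clause

def any_of(*clauses):
    return "(" + " ".join(clauses) + ")"   # group clauses

def boost(clause, factor):
    return f"{clause}^{factor}"  # >1: more emphasis, <1: less emphasis

def value_range(field, low, high):
    return f"{field}:[{low} TO {high}]"    # inclusive range

query = " ".join([
    must(term("name", "tablet")),
    must(any_of(term("manu", "apple"), term("manu", "samsung"))),
])
```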
EDisMax Parser
● suitable for user-generated queries
o does not complain about the syntax
o does not require field name in query
o searches across several fields
● configurable
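With EDisMax the raw user input goes into `q`, and the fields to search (with optional boosts) go into the `qf` parameter, either per request or as defaults in solrconfig.xml. A Python sketch of the per-request form; the field names and weights are assumptions.

```python
from urllib.parse import urlencode

def edismax_params(user_input, query_fields):
    """Build request parameters that search user_input across weighted fields."""
    qf = " ".join(f"{name}^{weight}" for name, weight in query_fields.items())
    return {"defType": "edismax", "q": user_input, "qf": qf}

# Hypothetical setup: matches in "name" count twice as much as in "description".
params = edismax_params("galaxy tab", {"name": 2, "description": 1})
query_string = urlencode(params)
```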
Sorting
● default: sorting by decreasing score
● custom sorting rules: use the sort parameter
o syntax: fieldName (asc|desc)
o e.g. sort by ascending price (i.e. lowest price first): price asc
o e.g. sort by descending date (i.e. newest date first): date desc
Sorting
● multiple fields and orders: separate by
commas
o e.g. sort by descending starRating and ascending price: starRating desc, price asc
Sorting
● cannot use multivalued fields
● overrides the default sorting behavior
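The value of the sort parameter can be assembled from (field, direction) pairs. A small Python sketch with a hypothetical helper `sort_param` that rejects anything other than asc or desc:

```python
def sort_param(criteria):
    """Join (field, direction) pairs into a value for the sort parameter."""
    parts = []
    for field, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"direction must be 'asc' or 'desc', got {direction!r}")
        parts.append(f"{field} {direction}")
    return ",".join(parts)

by_rating_then_price = sort_param([("starRating", "desc"), ("price", "asc")])
newest_first = sort_param([("date", "desc")])
```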
Faceted Search
● facet values: (distinct) values or (generally non-overlapping) ranges of a field
● displaying facets
o show possible values
o let users narrow down their searches easily
[Figure: a facet and its facet values (5 of them)]
Faceted Search
● set facet parameter to true - enables
faceting
● other parameters
o facet.field - use the field's values as facets; returns <value, count> pairs
o facet.query - use the given queries as facets; returns <query, count> pairs
o facet.sort - set the ordering of the facets; can be "count" or "index"
o facet.offset and facet.limit - used for pagination of facets
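A faceting request repeats facet.field and facet.query once per facet, so the parameters are easier to handle as a list of pairs than as a dictionary. The Python sketch below builds such a list and also pairs up counts, on the assumption that the JSON response lists a field's facets as a flat array alternating value and count (Solr's default layout as far as I know). Field names are made up.

```python
from urllib.parse import urlencode

def facet_params(query, fields=(), queries=(), sort="count", offset=0, limit=10):
    """Build request parameters (as a list of pairs) with faceting enabled."""
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.sort", sort),
        ("facet.offset", str(offset)),
        ("facet.limit", str(limit)),
    ]
    params += [("facet.field", name) for name in fields]
    params += [("facet.query", fq) for fq in queries]
    return params

def pair_counts(flat):
    """Turn [value, count, value, count, ...] into [(value, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))

query_string = urlencode(facet_params(
    "tablet", fields=["manu", "category"], queries=["price:[0 TO 300]"]))
manu_counts = pair_counts(["apple", 3, "samsung", 2])
```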
Spatial Search
● data: locations (longitudes, latitudes)
● search: filter and/or sort by location
Filter by Location
● geofilt
o circle centered at a given point
o distance from a given point
o fq={!geofilt sfield=store}&pt=45.15,-93.85&d=5
● bbox
o square (“bounding box”) centered at a given point
o distance from a given point + corners
o fq={!bbox sfield=store}&pt=45.15,-93.85&d=5
Credit: Apache Solr Reference Guide 4.5 <http://lucene.apache.org/>
[Figure: geofilt (circle) and bbox (square), each labeled 5 km and centered at (45.15, -93.85)]
[Figure: the same two shapes with sample points marked x and o]
Credit: Apache Solr Reference Guide 4.5 <http://lucene.apache.org/>
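The difference between the two filters shows up near the corners of the box: such a point is inside the bounding box but farther than d from the center. The Python sketch below illustrates this with a haversine distance and an approximate box; the Earth radius value and the box computation are simplifying assumptions, not Solr's implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def in_circle(center, point, d_km):
    """geofilt-like test: point lies within d_km of the center."""
    return haversine_km(center, point) <= d_km

def in_box(center, point, d_km):
    """bbox-like test: point lies in the box that encloses the circle (approximate)."""
    lat_delta = math.degrees(d_km / EARTH_RADIUS_KM)
    lon_delta = math.degrees(
        d_km / (EARTH_RADIUS_KM * math.cos(math.radians(center[0]))))
    return (abs(point[0] - center[0]) <= lat_delta
            and abs(point[1] - center[1]) <= lon_delta)

center = (45.15, -93.85)
near_corner = (45.186, -93.799)  # roughly 4 km north and 4 km east of the center
```

A box test is cheaper than a distance computation, which is the usual reason to prefer bbox when exact circular precision is not needed.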
Sort by Location
● geodist
o returns the distance between the location given in a
field and a certain coordinate
o e.g. sort by ascending distance from (45.15,-93.85), and return the distances as the score:
q={!func}geodist()&sfield=store&pt=45.15,-93.85&sort=score+asc
Credit: Apache Solr Reference Guide 4.5 <http://lucene.apache.org/>
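The same request can be assembled in Python so that the braces, parentheses and spaces are escaped correctly. `nearest_first_params` is an illustrative helper; the parameter names are the ones used in the example above.

```python
from urllib.parse import urlencode

def nearest_first_params(sfield, lat, lon):
    """Parameters that score documents by distance and sort nearest first."""
    return {
        "q": "{!func}geodist()",   # the distance becomes the score
        "sfield": sfield,          # field holding each document's location
        "pt": f"{lat},{lon}",      # reference point
        "sort": "score asc",       # smallest distance first
    }

query_string = urlencode(nearest_first_params("store", 45.15, -93.85))
```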
Scaling/Redundancy
problem                                       solution
collection too large for a single machine     distribution
too many requests for a single machine        distribution
a machine can go down                         replication
SolrCloud
● Solr instances
o collection (logical index) divided into one or more
partial collections (“shards”)
o for each shard, one or more Solr instances keep copies of the data: one as leader (handles reads and writes), the others as replicas (handle reads)
● ZooKeeper instances
SolrCloud
● Solr instances
● ZooKeeper instances
o management of Solr instances
o leader election
o node discovery
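How a document ends up in exactly one shard can be sketched with a stable hash of its ID. This Python example uses CRC32 modulo the shard count purely as a stand-in; SolrCloud's own router uses a different hash and assigns hash ranges to shards, so treat this as an illustration of the idea only.

```python
import zlib

def shard_for(doc_id, num_shards):
    """Pick a shard index for a document ID with a stable hash (stand-in router)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

# Hypothetical document IDs spread over three shards.
doc_ids = [f"doc-{n}" for n in range(9)]
placement = {doc_id: shard_for(doc_id, 3) for doc_id in doc_ids}
```

Because the hash depends only on the ID, every node computes the same placement without asking the others.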
[Figure: a collection (i.e. logical index) split into three shards, each holding ⅓ of the collection; every shard has one leader and one or more replicas]
[Figure: the same collection after another replica has been added]
[Figure: the leader of one shard goes offline and a replica of that shard takes over as leader]
[Figure: the instance that went offline comes back as a replica; the new leader stays leader]
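The failover sequence in the diagrams can be modeled with a toy class. In SolrCloud the election is coordinated through ZooKeeper; the Python sketch below only mimics the visible outcome (a replica takes over, the old leader rejoins as a replica), and the class and node names are made up.

```python
class Shard:
    """Toy model of one shard: the first online node acts as the leader."""

    def __init__(self, name, nodes):
        self.name = name
        self.online = list(nodes)
        self.offline = []

    @property
    def leader(self):
        return self.online[0] if self.online else None

    @property
    def replicas(self):
        return self.online[1:]

    def node_down(self, node):
        """Mark a node offline; if it was the leader, the next node takes over."""
        self.online.remove(node)
        self.offline.append(node)

    def node_up(self, node):
        """Bring a node back; it rejoins at the end, i.e. as a replica."""
        self.offline.remove(node)
        self.online.append(node)

shard = Shard("shard2", ["node-a", "node-b"])
shard.node_down("node-a")   # the leader goes offline
```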
Resources - Books
● Solr in Action
o just released, up-to-date
o http://www.manning.com/grainger/
● Apache Solr 4 Cookbook
o common problems and useful tips
o http://www.packtpub.com/apache-solr-4-cookbook/book
● Lucene in Action
o written by 3 committers and PMC members
o somewhat outdated (2010; covers Lucene 3.0)
o http://www.manning.com/hatcher3/
Resources - Books
● Introduction to Information Retrieval
o not specific to Lucene/Solr, but about IR concepts
o free e-book
o http://nlp.stanford.edu/IR-book/
● Managing Gigabytes
o indexing, compression and other topics
o accompanied by MG4J - full-text search software
o http://mg4j.di.unimi.it/
Resources - Web
● official website
o http://lucene.apache.org/
o Wiki
o reference guide
o mailing list
● StackOverflow
o http://stackoverflow.com/
o “Lucene” and “Solr” tags
Getting Started
● download Solr
o requires Java 7 or newer to run
● Solr comes bundled/configured with Jetty
o <Solr directory>/example/start.jar
● "exampledocs" directory contains sample
documents
o <Solr directory>/example/exampledocs/post.jar
o java -Durl=http://localhost:8983/solr/update -jar post.jar *.xml
● use the Solr admin interface
o http://localhost:8983/solr/
Thanks for Coming!
● Java Performance Tips @ 10:15, same room
● slides available
o http://bit.ly/sdcodecamp2014solr
● please vote for my conference session
o http://bit.ly/tvnews2014
● questions/feedback
o kai@ssc.ucla.edu
● questions?


Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)


Editor's Notes

  • #3:
    challenges with searching
    * size of all data can be huge (GBs, even TBs)
    ** searching by going through all data can take too long and might not work
    * often, a user is only going to see a tiny fraction of the matched data (limited time and attention)
    ** essential to show the most relevant results first/at the top
    black box
    * benefits: speed, relevance
    * cost: pre-processing (time, space) - indexing
  • #7:
    * collection - all data you have
    * a collection can have many documents
    * a document can have many fields
  • #8: a field can have
    * name
    * content
    * type and options (will talk about them later)
  • #9:
    * each field is optional, i.e. a particular document doesn't have to have every field
  • #10:
    * in fact, a collection can contain different kinds of documents, with different fields among them
    * e-mail
    * product
    * contact
  • #12:
    * these are just examples
    * Solr documentation has the full list
  • #14:
    * Solr's documentation has the exact formats required
  • #19:
    * part before colon is the field name, part after colon is the field value
    * search for phrase: quote the phrase with double-quotes
    * separate two or more clauses by space: a document must match at least one of the clauses, for the document to be in the result set
    * "+" before a clause: a document must match the clause, for the document to be in the result set
  • #20:
    * parentheses: group clauses
    * "-" before a clause: a document must NOT match the clause, for the document to be in the result set
    * to match a range, surround the lower bound and upper bound with square brackets
    * boost a clause by adding "^" and a number (>1: more emphasis, <1: less emphasis)
  • #21: things to configure in solrconfig.xml:
    * what fields to search the words in
    * boosting of these fields
  • #22: special field names:
    * score: document score
    * _docid_: document ID
  • #28: e.g. merchants with store locations
  • #33:
    distribution
    * spread the collection across multiple machines
    distribution
    * spread the requests across multiple machines
    replication
    * copy data and configuration across multiple machines
    * make sure no single point of failure