SOLR Power FTW: short version

Solr Power FTW
Robby Morgan

What Will I Cover?

● Who I am

● Why/how we use SOLR

● Can SOLR handle 20K queries per second?

● Lessons learned: large scale multi data center deployment

● Conclusion

Robby Morgan

● Software Engineering Lead

● 3 yrs experience w/ Solr @ Bazaarvoice

● Jack of all trades, master of some :)

Bazaarvoice

● Bazaarvoice is a software-as-a-service company powering user-generated content (UGC), such as ratings and reviews, on thousands of web sites

● 5 billion page views per month

● 230 billion impressions

● 75 million pieces of UGC

Life Before SOLR

● Indexes for sorting and filtering

● Aggregate tables for stats

● Nightly jobs

● Bugs...

Enter SOLR

● Index content and product catalog
● De-normalization
● Filtering, faceting/stats and sorting
● Index every 15 minutes (20 seconds NRT)
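
The filtering, faceting/stats and sorting bullets above map onto standard Solr request parameters (`fq`, `facet.field`, `stats.field`, `sort`). A minimal sketch of building such a request follows. The host, the core name `reviews` and the field names `product_id`, `rating` and `submission_time` are hypothetical and are not taken from the Bazaarvoice schema.

```python
from urllib.parse import urlencode


def build_review_query(base_url, product_id, rows=10):
    """Build a Solr select URL that filters, facets, computes stats and sorts."""
    params = [
        ("q", "*:*"),                        # match everything, then narrow with fq
        ("fq", f"product_id:{product_id}"),  # filter query (hypothetical field)
        ("facet", "true"),
        ("facet.field", "rating"),           # count of reviews per rating value
        ("stats", "true"),
        ("stats.field", "rating"),           # min/max/mean of the rating field
        ("sort", "submission_time desc"),    # newest first
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = build_review_query("http://localhost:8983/solr/reviews", "sku123")
print(url)
```

One request of this shape can stand in for the sort indexes and aggregate tables listed under "Life Before SOLR", since the counts and statistics are computed at query time.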

SOLR usage @ Bazaarvoice

Metric               Value
Documents            250 MM
Index size           200 GB
QPS, avg             2,350
QPS, max             10,200
Response time, avg   12 ms
Servers              6+20
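
Two derived figures help when reading these numbers: the average index footprint per document and the ratio of peak to average load. The sketch below is plain arithmetic on the values above. It assumes "GB" means decimal gigabytes and "MM" means millions.

```python
documents = 250_000_000      # 250 MM
index_bytes = 200 * 10**9    # 200 GB, assuming decimal gigabytes
avg_qps = 2_350
max_qps = 10_200

bytes_per_doc = index_bytes / documents
peak_to_avg = max_qps / avg_qps

print(f"{bytes_per_doc:.0f} bytes of index per document")
print(f"peak load is {peak_to_avg:.1f}x the average")
```

This works out to about 800 bytes of index per document and a peak of roughly 4.3 times the average query rate, which is the headroom any load simulation has to cover.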

SOLR Cloud @ Bazaarvoice

● Multiple cores (100+ per server)

● Re-balance indexes across cores and servers
○ Automatic
○ Manual

● Deployment map stored in MySQL
○ Host - Core - Partition
○ Statistics

● Partition lifecycle
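
The Host - Core - Partition deployment map and the re-balancing bullets above can be sketched as a small data structure plus one greedy move. This only illustrates the idea. The slides do not describe the actual algorithm or MySQL schema, so the `Placement` fields, the host names and the "move the smallest partition off the fullest host" rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """One row of a hypothetical deployment map: where a partition lives."""
    host: str
    core: str
    partition: str
    size_gb: float


def host_loads(placements, hosts):
    """Total index size per host, including hosts with no partitions yet."""
    loads = {h: 0.0 for h in hosts}
    for p in placements:
        loads[p.host] = loads.get(p.host, 0.0) + p.size_gb
    return loads


def rebalance_step(placements, hosts):
    """Move the smallest partition from the fullest host to the emptiest one.

    Returns the moved Placement, or None when no move would narrow the gap.
    """
    loads = host_loads(placements, hosts)
    fullest = max(loads, key=loads.get)
    emptiest = min(loads, key=loads.get)
    candidates = [p for p in placements if p.host == fullest]
    if not candidates:
        return None
    smallest = min(candidates, key=lambda p: p.size_gb)
    gap = loads[fullest] - loads[emptiest]
    if smallest.size_gb >= gap:
        return None  # the move would not reduce the imbalance
    smallest.host = emptiest
    return smallest


deployment = [
    Placement("solr01", "core1", "p1", 10.0),
    Placement("solr01", "core2", "p2", 4.0),
    Placement("solr02", "core1", "p3", 2.0),
]
moved = rebalance_step(deployment, ["solr01", "solr02"])
print(moved)
```

Running the step repeatedly until it returns None gives a crude automatic re-balance. A manual re-balance would simply edit a placement's host directly.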

Replication - Multiple Data Centers

Lessons: SOLR Performance

● SOLR loves RAM!

● Simulate and measure
○ Same config, same hardware

● Get the most out of one instance

Lessons: Cross-DC Replication

Chatty if using multiple cores

Relay
● Core auto-warming disabled
● Connection wait and read timeouts increased
● Replication poll interval increased (15 min)
● Compression enabled
...
<str name="httpConnTimeout">20000</str>
<str name="httpReadTimeout">65000</str>
<str name="pollInterval">00:15:00</str>
<str name="compression">internal</str>
...
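
The units in this fragment are easy to misread: the two HTTP timeouts are in milliseconds, while `pollInterval` uses an HH:MM:SS format. The sketch below parses the four settings and prints them in seconds. The enclosing `<lst name="slave">` element is an assumption about where the fragment sits in `solrconfig.xml`, since the slide elides the surrounding lines.

```python
import xml.etree.ElementTree as ET

# The four settings from the slide, wrapped in an assumed enclosing element.
SLAVE_FRAGMENT = """
<lst name="slave">
  <str name="httpConnTimeout">20000</str>
  <str name="httpReadTimeout">65000</str>
  <str name="pollInterval">00:15:00</str>
  <str name="compression">internal</str>
</lst>
"""


def poll_interval_seconds(text):
    """Convert an HH:MM:SS poll interval to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarize(fragment):
    """Return the replication settings with human-friendly units."""
    root = ET.fromstring(fragment)
    raw = {elem.get("name"): elem.text for elem in root.findall("str")}
    return {
        "conn_timeout_s": int(raw["httpConnTimeout"]) / 1000,
        "read_timeout_s": int(raw["httpReadTimeout"]) / 1000,
        "poll_interval_s": poll_interval_seconds(raw["pollInterval"]),
        "compression": raw["compression"],
    }


settings = summarize(SLAVE_FRAGMENT)
print(settings)
```

Read this way, the relay waits 20 seconds to connect, 65 seconds for a read, and polls every 900 seconds, which matches the 15 minute interval in the bullet list.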

Performance Tuning

● Heap size
● Cache sizing
● Auto-warming
● Stored fields
● Merge factor
● Commit frequency
● Optimize frequency
Process: Simulate and measure
● Replay logs
● Analyze metrics
● Monitor GC
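
The "replay logs, analyze metrics" process can be sketched as three small steps: extract queries from a request log, replay them while timing each one, and summarize the latencies. The log line format in the regular expression is an assumption about how the request log looks. The `send` callable is a hypothetical hook that would issue the query against a test instance, and it is stubbed out here so the sketch runs without a server.

```python
import re
import statistics
import time

# Assumed log format: "... path=/select params={q=*:*&rows=10} hits=5 ..."
QUERY_RE = re.compile(r"path=/select params=\{(?P<params>[^}]*)\}")


def extract_queries(log_lines):
    """Yield the raw parameter string of every select request in the log."""
    for line in log_lines:
        match = QUERY_RE.search(line)
        if match:
            yield match.group("params")


def replay(queries, send):
    """Send each query through `send` and return per-query latency in ms."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        send(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms):
    """Average, nearest-rank 95th percentile and maximum latency."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return {
        "count": len(ordered),
        "avg_ms": statistics.mean(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }


sample_log = [
    "INFO: [core1] webapp=/solr path=/select params={q=*:*&rows=10} hits=5 status=0 QTime=3",
    "INFO: [core1] webapp=/solr path=/update params={commit=true} status=0 QTime=40",
    "INFO: [core1] webapp=/solr path=/select params={q=rating:5&sort=id+asc} hits=2 status=0 QTime=1",
]
queries = list(extract_queries(sample_log))
report = summarize(replay(queries, send=lambda q: None))  # stub sender
print(queries, report["count"])
```

Replaying the same log against the same config and hardware before and after each tuning change makes the heap, cache and merge settings comparable one at a time.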

Performance Tuning - GC

# Java memory usage settings
# Force the NewSize to be larger than the JVM typically allocates.
# In practice, the JVM has been allocating an extremely small Young generation,
# which causes objects to be prematurely promoted to the Tenured generation.
JAVA_MEM_OPTS="-Xms27g -Xmx27g -XX:NewRatio=8"

# -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps --> Turn on GC Logging
# -XX:+UseConcMarkSweepGC --> Use the concurrent collector
# -XX:+CMSIncrementalMode --> Incremental mode for the concurrent collector
# -XX:+CMSIncrementalPacing --> Let the JVM adjust the amount of incremental collection

JAVA_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=55 -XX:ParallelGCThreads=8 -XX:SurvivorRatio=4"
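
To see what these flags mean in gigabytes, the sketch below applies the usual HotSpot definitions: `NewRatio` is the old-to-young size ratio, and `SurvivorRatio` is the ratio of eden to one survivor space. The results are approximations, since the JVM rounds and aligns generation sizes, and the occupancy fraction is treated here simply as a share of the old generation.

```python
def generation_sizes(heap_gb, new_ratio, survivor_ratio, cms_occupancy_pct):
    """Approximate generation sizes implied by the heap and ratio flags."""
    young = heap_gb / (new_ratio + 1)        # old:young = new_ratio:1
    old = heap_gb - young
    survivor = young / (survivor_ratio + 2)  # eden:survivor = survivor_ratio:1, two survivors
    eden = young - 2 * survivor
    cms_trigger = old * cms_occupancy_pct / 100
    return {
        "young_gb": young,
        "old_gb": old,
        "eden_gb": eden,
        "survivor_gb": survivor,
        "cms_trigger_gb": cms_trigger,
    }


# -Xmx27g -XX:NewRatio=8 -XX:SurvivorRatio=4 -XX:CMSInitiatingOccupancyFraction=55
sizes = generation_sizes(27, 8, 4, 55)
for name, value in sizes.items():
    print(f"{name}: {value:.1f}")
```

With these settings the young generation comes to about 3 GB (2 GB eden plus two 0.5 GB survivor spaces), the old generation to about 24 GB, and a concurrent collection is targeted once roughly 13.2 GB of the old generation is in use.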

Lessons: Schema Changes

● Re-indexing is time consuming for large indexes

● Process:

1. Full re-index off-line prior to the release
2. Incremental indexing after the release

● Bottleneck: reading from MySQL

● Goal: Transparent on-line re-indexing
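
The two-step process above (a full off-line re-index, then incremental indexing) can be sketched with a watermark recorded before the full pass starts. This is only an illustration under assumptions: the `Row` type and its `updated_at` column are hypothetical stand-ins for the MySQL source, and a plain dict stands in for the Solr index.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical source row; updated_at is a monotonically increasing timestamp."""
    id: int
    updated_at: int
    body: str


def full_reindex(rows, index, batch_size=2):
    """Read every row in fixed-size batches and write it to the index."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        for row in rows[start:start + batch_size]:
            index[row.id] = row.body
        batches += 1
    return batches


def incremental_reindex(rows, index, watermark):
    """Re-send only rows changed at or after the watermark."""
    changed = [row for row in rows if row.updated_at >= watermark]
    for row in changed:
        index[row.id] = row.body  # keyed by id, so re-sending is idempotent
    return len(changed)


table = [Row(1, 100, "a"), Row(2, 100, "b"), Row(3, 100, "c")]
index = {}

watermark = 200                      # recorded before the full pass begins
batches = full_reindex(table, index)

# Changes that arrive after the watermark, e.g. around the release.
table[1] = Row(2, 250, "b2")
table.append(Row(4, 260, "d"))
changed = incremental_reindex(table, index, watermark)
print(batches, changed, index)
```

Batching the reads keeps each query against the source bounded, which matters when reading from MySQL is the bottleneck.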

Conclusion - SOLR Strengths

● Not only full-text search

● Lightning fast given enough RAM

● Good scale-out support, including multi-data center

● Great community, wide range of use cases

Conclusion - SOLR's Gaps

● Not fully elastic

● Real time takes work

● Secondary data store = sync overhead, inconsistencies

● Schema changes

Questions

robby.morgan@bazaarvoice.com

developer.bazaarvoice.com
