INTRO TO APACHE SOLR FOR DRUPAL
Presentation by Chris Caple

drupal.org username: reallyordinary
http://drupal.org/user/791914

Presented at the May 30, 2011 Toronto Drupal user group meetup
WHAT IS APACHE SOLR?
• very popular, extremely fast, Java-based open source enterprise
 search platform from the Apache Lucene project

• runs as a standalone full-text search server within a servlet
 container such as Tomcat or Jetty

• not an acronym - doesn’t stand for anything

• powers the search and navigation features on many of the
 world’s largest sites
SITES LIKE...
• the White House
• AOL
• eHarmony
• Ticketmaster
• GameSpot
• The Guardian
• Netflix
• CNET Reviews
• Zappos
• SourceForge
• Buy.com
• the Internet Archive
• Citysearch
• eTrade
• Chowhound
• Homestars.com
And of course... drupal.org
• so the point is - it’s great for large, high-traffic sites

• it’s heavy-duty, internet-scale stuff

• but it’ll also serve you well on smaller-scale but ambitious
  Drupal sites
A BIT OF HISTORY
• initially developed by CNET Networks in 2004 as an in-house
  search platform called “Solar”

• CNET granted the existing codebase to the Apache Software
  Foundation in 2006 - name changed to “Solr”

• in January 2007 Solr became a Lucene subproject

• in March 2010 the Solr and Lucene-java projects merged
WHAT IS APACHE LUCENE?
The Apache Lucene project develops open source search
    software, including:

• Apache Lucene Core (formerly Lucene Java) - provides Java-based
    indexing and search, plus spellchecking, hit highlighting,
    and advanced analysis/tokenization capabilities

• Apache Solr

• Apache PyLucene - a Python extension for accessing Lucene Core
    (it wraps the Java library rather than porting it)

• Apache Open Relevance Project - collects and distributes
    free materials for relevance testing & performance
LIMITATIONS OF DEFAULT DRUPAL SEARCH
• default Drupal search is decent for smaller sites

• doesn’t deal well with large amounts of content (say, 10k+
 nodes) - it doesn’t scale and gets bogged down

• limited query operators

• integrated - indexing and searching run directly against the same
 database that serves the rest of the site

• SQL was not designed as a search language

• “Relational Database Management Systems (RDBMS) are
 physically incapable of handling search well.”

• there are several modules that enhance core search by
 providing things like faceted search and improved stemming

• but there’s no getting around its performance limitations and
 lack of scalability
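To see why a dedicated search engine scales where SQL struggles, here is a toy sketch of the core data structure behind Lucene and Solr: an inverted index. A SQL query like LIKE '%term%' generally has to scan every row, while an inverted index maps each term straight to the documents that contain it. This is a simplified illustration under stated assumptions (a tiny hand-picked stopword list, no stemming, no relevance scoring), not Lucene's actual implementation.

```python
import re
from collections import defaultdict

# Toy illustration of an inverted index; NOT Lucene's implementation
# (no stemming, no scoring, no on-disk format).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}  # assumed, tiny list


def analyze(text):
    """Lowercase, split on non-word characters, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = analyze(query)
    if not terms:
        return set()
    # Each lookup is a dictionary access, not a scan over all documents.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Apache Solr powers search for Drupal",
    2: "Drupal core search uses the database",
    3: "Solr and Lucene merged in 2010",
}
index = build_index(docs)
print(sorted(search_all_terms(index, "drupal search")))  # [1, 2]
```

Real Lucene adds analysis chains (stemming, synonyms), relevance scoring, and compressed on-disk postings, but the lookup-by-term idea is the same.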
BENEFITS OF USING SOLR
1. Index and make searchable a really large amount of content -
  from 10k+ nodes up into the millions

2. Provide faceted search-based navigation so users can find
  content faster & more intuitively, drilling down into content by
  date, author, tags, content type, & other attributes

3. Provide search autocomplete, spelling suggestions, and
  content recommendations
4. Provide a faster search experience than default Drupal
  search can

5. Give site visitors simple, easy-to-use advanced search
  features without confronting them with the “advanced
  search” page

6. Provide users with the ability to do location-based search - to
  filter results by geographic location

7. Expose all attributes of nodes to search
8. Place search functions on a completely separate server

Diagram: the web server (running PHP) talks to the database over SQL,
POSTs content to the Solr server to index it, and sends GET requests
to the Solr server to search.
Diagram adapted from Robert Douglass’ 2008 slide set - see Resources
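The two arrows into the Solr server are plain HTTP requests. Below is a minimal Python sketch of what they look like, assuming a stock Solr 1.4 example server at http://localhost:8983/solr with its standard XML update format and /select handler. The field names (id, title, type) are illustrative assumptions, not the Drupal module's exact schema, and the sketch only builds the requests; it does not contact a server.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumed base URL of the Solr 1.4 example server.
SOLR_BASE = "http://localhost:8983/solr"


def add_doc_xml(doc):
    """Build an XML <add> update message; this body is POSTed to /update."""
    fields = "".join(
        '<field name="%s">%s</field>'
        % (escape(name, {'"': "&quot;"}), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields


def select_url(query, rows=10):
    """Build the GET URL for a search against /select, asking for JSON output."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return "%s/select?%s" % (SOLR_BASE, urlencode(params))


print(add_doc_xml({"id": "node/1", "title": "Hello & welcome", "type": "page"}))
print(select_url("drupal solr"))
# http://localhost:8983/solr/select?q=drupal+solr&rows=10&wt=json
```

In Solr 1.4 an added document is not searchable until a <commit/> message is also POSTed to /update (or autoCommit is configured in solrconfig.xml). In a Drupal setup the Apache Solr module builds and sends these requests for you during cron.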
KEY SOLR FEATURES
• powerful full-text search
• hit highlighting
• faceted search
• dynamic clustering
• relevance highlighting
• autocorrection
• caching
• multi-site search
• content recommendations
• rich document (ex: Word, PDF) handling
• geospatial search
• all attributes of nodes are searchable
• highly scalable
• can be run on a completely physically separate server
WHAT’S FACETED SEARCH?
• faceted search is dynamic clustering of items or search results
 into categories that let users drill down into search results (or
 even skip searching entirely) by any value of a given field

• each facet also shows the number of hits within the search
 that match that category

• faceted search is also called faceted browsing, faceted
 navigation, guided navigation, and sometimes parametric search
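As a rough illustration of the facet counts described above, this sketch computes them client-side over a small, made-up result set and then drills down on one value. In Solr the same work happens server-side through request parameters such as facet=true&facet.field=type for the counts and fq=type:blog for the drill-down filter; the field names type and tags here are hypothetical.

```python
from collections import Counter

# Hypothetical search results; "type" and "tags" stand in for indexed fields.
results = [
    {"title": "Intro to Solr", "type": "page", "tags": ["solr", "search"]},
    {"title": "Drupal cron", "type": "blog", "tags": ["drupal"]},
    {"title": "Faceted search", "type": "blog", "tags": ["solr"]},
]


def facet_counts(docs, field):
    """Count hits per value of `field`, most frequent first (like facet.field)."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        counts.update(v for v in values if v is not None)
    return counts.most_common()


def drill_down(docs, field, value):
    """Keep only docs whose `field` equals or contains `value` (like an fq filter)."""
    kept = []
    for doc in docs:
        current = doc.get(field)
        if current == value or (isinstance(current, list) and value in current):
            kept.append(doc)
    return kept


print(facet_counts(results, "type"))  # [('blog', 2), ('page', 1)]
narrowed = drill_down(results, "type", "blog")
print(facet_counts(narrowed, "tags"))  # [('drupal', 1), ('solr', 1)]
```

Each click on a facet value adds one more filter, and the counts for the remaining facets are recomputed over the narrowed set.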
FACETED SEARCH EXAMPLE
diagram source: Lucid Imagination - http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
QUICK SOLR DEMOS ON LIVE DRUPAL SITES
• Whitehouse.gov
• Drupal.org
• GoChicOrGoHome.com
• New York Public Library
HOW DO YOU SET IT UP?
You’ll need:
•   Java 5 or higher

•   PHP 5.2 for Drupal 6 (PHP 5.1.4 will also work if you have
    the PECL JSON extension or the Zend Framework JSON classes)
1. Go to the Apache Solr Search Integration project page
  http://drupal.org/project/apachesolr

2. Install the module

3. Grab the Solr PHP library via svn, OR get the Acquia Search
  download, which bundles it

4. Enable the module

5. Download Solr 1.4 and unpack it outside your Drupal directory
6. Rename the existing files apache-solr-nightly/example/solr/conf/schema.xml
  and solrconfig.xml to *.bak to get them out of the way

7. Copy the schema.xml and solrconfig.xml files that come with the
  Apache Solr Drupal module into their place

8. Start Solr by opening a shell (PuTTY, Mac Terminal), going to
  the apache-solr-nightly/example folder, and running the
  command java -jar start.jar
9. Test that the Solr server is available at
  http://localhost:8983/solr/admin

10. Make sure both the main Apache Solr Framework and
 Apache Solr Search modules are enabled - if the Solr Search
 module isn’t enabled, no indexing will occur

11. Run cron until your content is indexed

12. Enable blocks for facets
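Step 9 can also be scripted, which is handy when Solr runs on a separate server. The sketch below assumes the example Jetty server started with java -jar start.jar on its default port 8983, and simply checks that the admin page answers with HTTP 200. The helper names admin_url and solr_is_up are hypothetical.

```python
import http.client
import urllib.request

DEFAULT_BASE = "http://localhost:8983/solr"  # default for the bundled Jetty example


def admin_url(base=DEFAULT_BASE):
    """URL of the Solr admin page checked in step 9."""
    return base.rstrip("/") + "/admin/"


def solr_is_up(base=DEFAULT_BASE, timeout=5):
    """Return True if the admin page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(admin_url(base), timeout=timeout) as resp:
            return resp.getcode() == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts are all OSErrors.
        return False


if __name__ == "__main__":
    print("Solr is up" if solr_is_up() else "Solr is not reachable")
```

If this reports that Solr is not reachable, check that the java -jar start.jar process is still running and that nothing else is bound to port 8983.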
PRO TIP 1:
DISABLING THE CORE SEARCH INDEXER
•   the Apache Solr module depends on Drupal’s core Search
    module

•   when you enable the Solr module, the core Search module is
    enabled too

•   as soon as the core Search module is enabled, it starts to
    index all your nodes

•   this takes time to run and fills up the database
    (search_dataset, search_index... tables)
•   if you’re installing Solr Search, you don’t need Drupal’s core
    search form

•   you replace it with the Solr one by going to the Solr module
    settings and clicking “Make Apache Solr Search the default”

•   this disables the core Search module’s form - but not the
    indexing
•   to disable the indexing - and save some CPU cycles and
    database space - go to your site’s search settings at
    admin/settings/search and set the “number of items to index
    per cron run” to 0




Thanks to DrupalCoder.com for this tip - http://www.drupalcoder.com/blog/performance-tip-disable-drupals-core-search-indexer-when-using-apache-solr
PRO TIP 2:
CRON VS. ELYSIA CRON
•   Solr Search indexing is triggered by cron runs

•   Drupal’s default cron run triggers all cron tasks at the same time

•   this can be a serious drag on performance and can cause cron
    runs to fail if one or more tasks don’t finish in the allotted
    cron period

•   to get around this, use...
•   Elysia Cron - http://drupal.org/project/elysia_cron

•   expands cron capabilities - gives you crontab-like scheduling so
    you can run different tasks at different times and frequencies

•   so for example - set Solr Search to index 1000 nodes every
    15 minutes, while other cron tasks are set to run once every
    hour
•   to get fastest indexing on your server, experiment with
    different numbers of items to index per cron run and different
    cron run times until you find the max your server is capable of
    handling

•   ex: try indexing 1000 items per cron run and setting cron to
    run every 5 minutes

•   if the cron runs complete without errors, your server can handle
    that load; if not, lower the numbers and try again
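Before experimenting, a quick calculation helps set expectations. This sketch assumes every cron run finishes within its interval and indexes exactly the configured batch size, which real servers won't always manage. It applies the two schedules from the slides above to a hypothetical site with 1,000,000 nodes.

```python
import math


def hours_to_index(total_items, per_run, interval_minutes):
    """Wall-clock hours to index everything, one batch per cron run."""
    runs = math.ceil(total_items / per_run)
    return runs * interval_minutes / 60


def items_per_hour(per_run, interval_minutes):
    """Indexing throughput implied by a batch size and cron interval."""
    return per_run * 60 / interval_minutes


# The two schedules mentioned in the slides, on a hypothetical 1,000,000-node site.
for per_run, interval in [(1000, 15), (1000, 5)]:
    print(f"{per_run} items every {interval} min: "
          f"{items_per_hour(per_run, interval):,.0f} items/hour, "
          f"{hours_to_index(1_000_000, per_run, interval):.1f} h for 1,000,000 nodes")
```

This prints 4,000 items/hour (250.0 h) for the 15-minute schedule and 12,000 items/hour (83.3 h) for the 5-minute one, which is why it pays to find the fastest schedule your server can sustain.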
DRUSH
•   Solr Search integrates with Drush

•   you can call Solr tasks from the Drush command line

•   commands include...
•   solr-delete-index
    Deletes the contents of the index. Can take content types as
    parameters

•   solr-index
    Sends content marked for (re)indexing to Solr. Same as running
    cron once, but without the overhead of the other cron tasks

•   solr-reindex
    Marks content for reindexing. Can take content types as
    parameters

•   solr-search
    Searches the site for keywords using Apache Solr
ACQUIA SEARCH
•   Acquia has a hosted SaaS version of Solr that they call Acquia
    Search

•   it’s plug and play and available for Drupal 6 and 7

•   gives you all the power of Solr without having to install any
    software (beyond the Solr Drupal modules) or manage any
    servers

•   really easy to set up, really fast and robust, kind of pricey

•   http://acquia.com/products-services/acquia-search
•   you can get a 30-day free trial of Acquia Search at
    http://acquia.com/trial

•   easiest way to test drive Solr
SOLR + VIEWS 3 = THE (VERY NEAR) FUTURE
•   this is where it starts to get even more interesting

•   Views 3 (still in alpha for Drupal 6 but in beta for Drupal 7)
    allows you to make custom searches against the Solr index the
    same way you currently make views against the MySQL
    database

•   ex: build a Solr search that just includes videos and MP3s and
    render the results as a playlist

•   ex: a Solr search that’s limited to the current user’s images,
    displayed as a slideshow
•   upshot: you can bypass the Drupal database and build your
    content straight off the Solr index

•   no database queries

•   no complex Views queries with tons of joins

•   no node_load() calls for displaying the results
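To make the idea concrete, this sketch renders a playlist straight from a Solr select response in JSON (wt=json), with no database query and no node_load(). It does not show Views 3 itself. The response would come from a request along the lines of q=*:*&fq=type:(video OR audio)&fl=title,url,type&wt=json; those are standard Solr parameters, but the field names are hypothetical, and only fields stored in the index can be returned this way.

```python
import json

# Hypothetical Solr JSON response; field names depend on your schema.
SAMPLE_RESPONSE = """{
  "responseHeader": {"status": 0},
  "response": {"numFound": 2, "start": 0, "docs": [
    {"title": "Keynote video", "url": "/node/12", "type": "video"},
    {"title": "Podcast ep. 1", "url": "/node/40", "type": "audio"}
  ]}
}"""


def render_playlist(raw_json):
    """Turn the docs of a Solr select response into numbered playlist lines."""
    docs = json.loads(raw_json)["response"]["docs"]
    return ["%d. %s (%s) %s" % (i, d["title"], d["type"], d["url"])
            for i, d in enumerate(docs, start=1)]


print("\n".join(render_playlist(SAMPLE_RESPONSE)))
# 1. Keynote video (video) /node/12
# 2. Podcast ep. 1 (audio) /node/40
```

Everything the page needs comes back in one HTTP response from Solr, which is the point of the "no database queries" bullets above.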
RESOURCES
•   the best place to start learning is the Solr Search docs page on
    drupal.org - http://drupal.org/node/343467

•   Robert Douglass did a great Solr presentation in 2008 - slides
    are online at
    http://www.slideshare.net/robertDouglass/apachesolr-presentation-from-do-it-with-drupal-presentation

•   the book “Solr 1.4 Enterprise Search Server” is apparently
    good - review here:
    http://www.drupalcoder.com/blog/book-review-from-a-drupal-point-of-view-solr-14-enterprise-search-server

•   great article by Robert Douglass - “Views 3 + Apache Solr +
    Acquia Drupal = The Future of Search” -
    http://acquia.com/blog/views-3-apache-solr-acquia-drupal-future-search

•   article - “Three things we learned from indexing a Drupal site
    with millions of nodes in Apache Solr” -
    http://www.drupalcoder.com/blog/three-things-we-learned-from-indexing-a-drupal-site-with-millions-of-nodes-in-apache-solr

•   article - “Geospatial Apache Solr searching in Drupal 6 by
    upgrading Solr to 3.1” -
    http://thedrupalblog.com/geospatial-apache-solr-searching-drupal-6-upgrading-solr-31

•   how to install Solr on Mac OS X Snow Leopard -
    http://www.drupalcoder.com/blog/installing-apache-solr-in-tomcat-for-drupal-on-snow-leopard

•   setting up Drupal 6 with Apache Solr on Tomcat 6 and
    Ubuntu 9.10 -
    http://www.nickveenhof.be/blog/setup-drupal-6-apache-solr-tomcat-6-and-ubuntu-910-karmic-koala

•   Configuring Apache Solr Multi-core with Drupal and Tomcat
    on Ubuntu 9.10 -
    http://drupalconnect.com/blog/steve/configuring-apache-solr-multi-core-drupal-and-tomcat-ubuntu-910

•   Jetty powered multicore Apache Solr and Drupal in Ubuntu
    10.04 -
    http://vladgh.com/blog/jetty-powered-multicore-apache-solr-and-drupal-ubuntu-1004

•   Solr tutorials on the official Apache Solr site -
    http://lucene.apache.org/solr/tutorial.html

•   the official Apache Solr wiki -
    http://wiki.apache.org/solr/FrontPage

•   DrupalCamp Montreal 2009 video presentation on Solr -
    http://yadadrop.com/drupal-video/drupal-apache-solr-setup-configuration-extensions-hooks
