Rapid Prototyping with Solr
uberconf - July 14, 2011
Presented by Erik Hatcher
erik.hatcher@lucidimagination.com
Lucid Imagination
http://www.lucidimagination.com
About me...

• Co-author, “Lucene in Action”
• Committer, Lucene and Solr
• Lucene PMC and ASF member
• Member of Technical Staff / co-founder,
  Lucid Imagination
About Lucid Imagination...
•   Lucid Imagination provides commercial-grade
    support, training, high-level consulting and value-
    added software for Lucene and Solr.

•   We make Lucene ‘enterprise-ready’ by offering:

    •   Free, certified distributions and downloads.

    •   Support, training, and consulting.

    •   LucidWorks Enterprise, a commercial search
        platform built on top of Solr.
Abstract
Got data? Let's make it searchable! Rapid Prototyping with
Solr will demonstrate getting documents into Solr quickly,
provide some tips on adjusting Solr's schema to better match
your needs, and finally discuss how to showcase your
data in a flexible search user interface. We'll see how to
rapidly leverage faceting, highlighting, spell checking, and
debugging. Even after all that, there will be enough time left
to outline the next steps in developing your search
application and taking it to production.
What is Lucene?
•   An open source, Java-based information retrieval (IR) library with
    best-practice indexing and query capabilities; fast, lightweight search and indexing.

•   100% Java (.NET, Perl and other versions too).

•   Stable, mature API.

•   Continuously improved and tuned over more than 10 years.

•   Cleanly implemented, easy to embed in an application.

•   Compact, portable index representation.

•   Programmable text analyzers, spell checking and highlighting.

•   Not a crawler or a text extraction tool.
Lucene's History
•   Created by Doug Cutting in 1999

    •   built on ideas from search projects Doug created at Xerox PARC
        and Apple.

•   Donated to the Apache Software Foundation (ASF) in 2001.

•   Became an Apache top-level project in 2005.

•   Has grown and morphed through the years and is now both:

    •   A search library.

    •   An ASF Top-Level Project (TLP) encompassing several sub-projects.

•   Lucene and Solr "merged" development in early 2010.
What is Solr?
•   An open source search engine.

•   Indexes content sources, processes query requests, returns
    search results.

•   Uses Lucene as the "engine", but adds full enterprise search
    server features and capabilities.

•   A web-based application that processes HTTP requests and
    returns HTTP responses.

•   Initially started in 2004 and developed by CNET as an in-house
    project to add search capability for the company website.

•   Donated to ASF in 2006.
What Version of Solr?

•   There’s more than one answer!

•   The current, released, stable version is 3.3

•   The development release is referred to as “trunk”.

    •   This is where the new, less tested work goes on

    •   Also referred to as 4.0

•   LucidWorks Enterprise is built on a trunk snapshot +
    additional features.
Why prototype?

• Demonstrate Solr can handle your needs
• Stakeholder/purse-holder buy-in
• It's quick, easy, and fun!
• The User Interface is the app
Workflow

• Ingest data
• Use
• Refine config/interactions, repeat
Got Data?
• Rich text files?
• Databases?
• Feeds (Atom/RSS/XML)?
• 3rd party repositories? (SharePoint,
  Documentum, ...)
• CSV!!!!
User Interface
Getting Started
• Download Solr
 • http://lucene.apache.org/solr
• "Install" it
 • unzip or tar -xvf
• Start it
 • cd example; java -jar start.jar
e.g. Conference Attendees

First Name,Last Name,Company,Title,Work Country

Erik,Hatcher,Lucid Imagination,"Member, Technical Staff", USA

...
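The quoted "Member, Technical Staff" value is why CSV input needs real parsing rather than a plain split on commas. As a minimal sketch (a hypothetical `parseCsvLine` helper, not Solr's own CSV loader, and assuming no line breaks inside quoted values), quote handling works roughly like this:

```javascript
// Minimal sketch: split one CSV line, honoring double-quoted fields that may
// contain commas ("Member, Technical Staff") and doubled quotes ("").
// Assumption: no embedded newlines; this is NOT Solr's own CSV parser.
function parseCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // field boundary outside quotes
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const row = parseCsvLine('Erik,Hatcher,Lucid Imagination,"Member, Technical Staff", USA');
console.log(row);
```

The fifth value keeps its leading space (" USA") because nothing trims it; the CSV update handler offers a `trim` parameter for that case.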
First Try

curl "http://localhost:8983/solr/update/csv?stream.file=attendees.csv"

undefined field First Name
Dynamic Fields


<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>
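Renaming the columns to `first_s`, `title_t` and so on works because each name ends in a suffix matched by one of these patterns. The sketch below, built around a hypothetical `resolveDynamicField` helper, models the matching rule described in the example schema.xml comments: a single `*` at the start or end of the pattern, longer patterns tried first, and ties going to the pattern declared first.

```javascript
// Sketch of dynamicField pattern matching (illustrative, not Solr code).
const dynamicFields = [
  { name: "*_s", type: "string" },
  { name: "*_t", type: "text" },
];

// A pattern has a single "*" at its start or end.
function matches(pattern, fieldName) {
  if (pattern.startsWith("*")) return fieldName.endsWith(pattern.slice(1));
  if (pattern.endsWith("*")) return fieldName.startsWith(pattern.slice(0, -1));
  return pattern === fieldName;
}

// Longer patterns win; ties go to the pattern declared first.
function resolveDynamicField(fieldName, patterns) {
  const ordered = patterns
    .map((p, index) => ({ ...p, index }))
    .sort((x, y) => y.name.length - x.name.length || x.index - y.index);
  const hit = ordered.find((p) => matches(p.name, fieldName));
  return hit ? hit.type : null; // null corresponds to "undefined field"
}

console.log(resolveDynamicField("first_s", dynamicFields));
console.log(resolveDynamicField("First Name", dynamicFields));
```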
Second try

curl "http://localhost:8983/solr/update/csv?stream.file=attendees.csv&fieldnames=first_s,last_s,company_s,title_t,country_s&header=true"
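Rather than hand-assembling that query string, it can be generated. This is a sketch using the standard `URLSearchParams` class; the `csvUpdateUrl` name is made up. With `header=true` plus `fieldnames`, the file's own header line is skipped and the supplied names are used instead.

```javascript
// Hypothetical helper: assemble the CSV update URL from its parts.
function csvUpdateUrl(base, file, fieldnames, extra = {}) {
  const params = new URLSearchParams({
    "stream.file": file,              // file path as seen by the Solr server
    fieldnames: fieldnames.join(","), // names that replace the CSV header
    header: "true",                   // skip the file's own header line
    ...extra,                         // e.g. { commit: "true" }
  });
  return `${base}/update/csv?${params.toString()}`;
}

const url = csvUpdateUrl("http://localhost:8983/solr", "attendees.csv",
  ["first_s", "last_s", "company_s", "title_t", "country_s"]);
console.log(url);
```

`URLSearchParams` encodes the commas as `%2C`, which Solr decodes back before reading `fieldnames`.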

Document [null] missing required field: id
uniqueKey

• Optional, Solr-specific feature
• generally "string" type
• schema.xml: <uniqueKey>id</uniqueKey>
• adding a document whose id already exists
  replaces it (delete + add)
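A toy in-memory model (a hypothetical `addAll` function over made-up documents, not Solr code) of the behaviors above: a document without the key is rejected, and re-adding an existing key replaces the earlier document. It also shows the risk of using last name as `id`, as the next slide does: two attendees sharing a surname collapse into one document.

```javascript
// Toy model of uniqueKey semantics (illustrative only).
function addAll(docs, uniqueKey = "id") {
  const index = new Map();
  for (const doc of docs) {
    if (doc[uniqueKey] === undefined) {
      throw new Error(`Document missing required field: ${uniqueKey}`);
    }
    index.set(doc[uniqueKey], doc); // existing key: old doc replaced (delete + add)
  }
  return index;
}

const index = addAll([
  { id: "Smith", first_s: "Ann" },
  { id: "Jones", first_s: "Bob" },
  { id: "Smith", first_s: "Carol" }, // same id: replaces Ann Smith
]);
console.log(index.size, index.get("Smith").first_s);
```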
id
curl "http://localhost:8983/solr/update/csv?stream.file=attendees.csv&fieldnames=first_s,id,company_s,title_t,country_s&header=true"


<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">40</int>
  </lst>
</response>
Tada!
Schema tinkering
•   Remove all example field definitions

•   Uncomment and adjust the catch-all dynamic field:

    • <dynamicField name="*" type="string"
        multiValued="false"/>

•   Ensure uniqueKey is appropriate

    •   unusual in this example, so it was disabled

•   Make every document/field fully searchable!

    • <copyField source="*" dest="text"/>
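The catch-all `copyField` can be pictured as below: a sketch with a hypothetical `applyCatchAllCopyField` helper that gathers every field value into `text`. Because many values land in one field, the destination generally has to be multiValued in the schema; the sketch simply skips an incoming field that has the destination's name.

```javascript
// Sketch of <copyField source="*" dest="text"/> (illustrative, not Solr code).
function applyCatchAllCopyField(doc, dest = "text") {
  const out = { ...doc }; // original fields are kept as-is
  const copied = [];
  for (const [name, value] of Object.entries(doc)) {
    if (name === dest) continue; // avoid copying the destination into itself
    for (const v of Array.isArray(value) ? value : [value]) {
      copied.push(String(v));
    }
  }
  out[dest] = copied; // multi-valued catch-all field
  return out;
}

const indexed = applyCatchAllCopyField({ first: "Erik", last: "Hatcher", country: "USA" });
console.log(indexed.text);
```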
After adjusting config...


• Restart Solr
• Or... reload the core (when in multicore
  mode)
Clean import
# Delete all documents
curl "http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"

# Index your data
curl "http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true"
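The percent-encoded `stream.body` above is just the XML delete command run through URL encoding. A sketch that generates it with JavaScript's built-in `encodeURIComponent` (the `deleteAllUrl` name is hypothetical):

```javascript
// Hypothetical helper: URL-encode the XML delete command for stream.body.
function deleteAllUrl(base) {
  const body = "<delete><query>*:*</query></delete>";
  return `${base}/update?stream.body=${encodeURIComponent(body)}&commit=true`;
}

console.log(deleteAllUrl("http://localhost:8983/solr"));
```

`encodeURIComponent` also encodes `:` and `/`, which the hand-written version leaves alone; both decode to the same XML.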
Facets
http://localhost:8983/solr/browse?facet.field=country
Value Normalization

• http://localhost:8983/solr/update/csv?commit=true&stream.file=attendees.csv&fieldnames=first,last,company,title,country&header=true&f.country.map=Great+Britain:United+Kingdom
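In the URL, `+` is a form-encoded space, so the parameter value is `Great Britain:United Kingdom`. The sketch below decodes it with `URLSearchParams` and applies it; `parseMaps` and `normalize` are hypothetical helpers that split on the first colon, an approximation that ignores values containing colons.

```javascript
// Sketch of f.<field>.map=from:to value mapping (illustrative, not Solr code).
function parseMaps(mapParams) {
  const table = new Map();
  for (const p of mapParams) {
    const i = p.indexOf(":"); // assumption: the first colon separates from/to
    table.set(p.slice(0, i), p.slice(i + 1));
  }
  return table;
}

function normalize(value, table) {
  return table.has(value) ? table.get(value) : value; // unmapped values pass through
}

// "+" decodes to a space when the query string is parsed.
const mapParams = new URLSearchParams("f.country.map=Great+Britain:United+Kingdom")
  .getAll("f.country.map");
const countryMap = parseMaps(mapParams);
const normalized = ["USA", "Great Britain"].map((v) => normalize(v, countryMap));
console.log(normalized);
```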
Polishing

• Customize request handler mappings
• Edit templates
 • hit display
 • header/footer
 • style
/browse
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str><str name="v.layout">layout</str>
    <str name="v.template">browse</str>

    <str name="rows">10</str><str name="fl">*,score</str>

    <str name="defType">lucene</str><str name="q">*:*</str>
    <str name="debugQuery">true</str>
    <str name="hl">on</str><str name="hl.fl">title</str>
    <str name="hl.fragsize">0</str>
    <str name="hl.alternateField">title</str>

    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.missing">true</str>
  </lst>
  <lst name="appends"><str name="facet.field">country</str></lst>
</requestHandler>
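To see what `facet.mincount=1` and `facet.missing=true` contribute, here is a toy counter over in-memory documents (a hypothetical `facetCounts` helper; sample data made up). Counts are sorted descending with ties in term order, and documents lacking the field are reported as a separate `null` bucket, the bucket the TreeMap template labels "Unspecified". Being a toy, it cannot report zero-count values the way Solr can with `facet.mincount=0`.

```javascript
// Toy facet counter for one field (illustrative, not Solr code).
function facetCounts(docs, field, { mincount = 0, missing = false } = {}) {
  const counts = new Map();
  let missingCount = 0;
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) {
      missingCount++; // document has no value for this field
    } else {
      counts.set(value, (counts.get(value) || 0) + 1);
    }
  }
  const result = [...counts.entries()]
    .filter(([, n]) => n >= mincount)
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  if (missing) result.push([null, missingCount]); // facet.missing bucket
  return result;
}

const attendees = [{ country: "USA" }, { country: "USA" }, { country: "Germany" }, {}];
console.log(facetCounts(attendees, "country", { mincount: 1, missing: true }));
```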
hit.vm

<div class="result-document">
  <p>$doc.getFieldValue('first') $doc.getFieldValue('last')</p>
  <p>$!doc.getFieldValue('title'), $!doc.getFieldValue('company')</p>
  <p>$!doc.getFieldValue('country')</p>
</div>
Voila!
Adding bells and whistles
•   jQuery
    •   <script type="text/javascript" src="/solr/admin/jquery.js"></script>
•   TreeMap
    •   <script type="text/javascript" src="/scripts/treemap.js"></script>
TreeMap code
<script type="text/javascript">
 function onLoad() {
   jQuery("#treemap-country").treemap(640,480, {});
 }
</script>
----------------------------
<body onload="onLoad();">
----------------------------
<table id="treemap-country">
#foreach($facet in $response.getFacetField('country').values)
   <tr>
     <td>#if($facet.name)$esc.html($facet.name)#else&lt;Unspecified&gt;#end</td>
     <td>$facet.count</td>
     <td>#if($facet.name)$esc.html($facet.name)#{else}Unspecified#end</td>
   </tr>
#end
</table>
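The Velocity loop produces one table row per facet value. An equivalent JavaScript sketch (hypothetical `treemapRows` and `escapeHtml` helpers) makes the escaping and the null-bucket handling explicit; `escapeHtml` only covers the basic characters, roughly what `$esc.html` does for them.

```javascript
// Escape the basic HTML-special characters ("&" first so it is not re-escaped).
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// One <tr> per facet value; a null name is the facet.missing bucket.
function treemapRows(facetValues) {
  return facetValues.map(({ name, count }) => {
    const label = name === null ? "&lt;Unspecified&gt;" : escapeHtml(name);
    const group = name === null ? "Unspecified" : escapeHtml(name);
    return `<tr><td>${label}</td><td>${count}</td><td>${group}</td></tr>`;
  }).join("\n");
}

console.log(treemapRows([{ name: "USA", count: 3 }, { name: null, count: 1 }]));
```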
TreeMap
Ajax fun: giveaways

• Add a "static" templated page
• jQuery Ajax request
• snippet templated output
solrconfig.xml
                 "static" page
<requestHandler name="/giveaways"
class="solr.DumpRequestHandler">
 <lst name="defaults">
  <str name="wt">velocity</str>
  <str name="v.template">giveaways</str>
  <str name="v.layout">layout</str>
 </lst>
</requestHandler>

giveaways.vm
<input type="button" value="Pick a Winner"
onClick="$('#winner').load('/solr/generate_winner?sort=random_' + new Date().getTime() + '+asc');">
 <h2>And the winner is...</h2> <center><font
size="20"><div id="winner"></div></font></center>
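Each click sends a new `random_<timestamp>` sort, so each request gets a different shuffle. This assumes schema.xml maps `random_*` to a `solr.RandomSortField` type, as the Solr example schema does. A sketch of the URL construction (hypothetical `winnerUrl`) that encodes the space explicitly instead of relying on `+`:

```javascript
// Build the winner URL with a fresh random sort seed per call.
// Assumes schema.xml has a random_* dynamic field of type solr.RandomSortField.
function winnerUrl(seed = Date.now()) {
  const sort = `random_${seed} asc`; // a new field name means a new random order
  return `/solr/generate_winner?sort=${encodeURIComponent(sort)}`;
}

console.log(winnerUrl(1310650000000));
```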
fragment template
solrconfig.xml
<requestHandler name="/generate_winner" class="solr.SearchHandler">
  <!-- sort=random_... required -->
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">winner</str>
    <str name="rows">1</str>
    <str name="fl">first,last</str>
    <str name="defType">lucene</str>
    <str name="q">*:*
                  -company:"Lucid Imagination"
                  -company:"Stone Circle Productions"</str>
  </lst>
</requestHandler>

winner.vm
#set($winner=$response.results.get(0))
$winner.getFieldValue('first') $winner.getFieldValue('last')
And the winner is...
e.g. data.gov
Data.gov CSV catalog
URL,Title,Agency,Subagency,Category,Date Released,Date Updated,Time
Period,Frequency,Description,Data.gov Data Category Type,Specialized Data Category
Designation,Keywords,Citation,Agency Program Page,Agency Data Series Page,Unit of
Analysis,Granularity,Geographic Coverage,Collection Mode,Data Collection
Instrument,Data Dictionary/Variable List,Applicable Agency Information Quality
Guideline Designation,Data Quality Certification,Privacy and Confidentiality,Technical
Documentation,Additional Metadata,FGDC Compliance (Geospatial Only),Statistical
Methodology,Sampling,Estimation,Weighting,Disclosure Avoidance,Questionnaire
Design,Series Breaks,Non-response Adjustment,Seasonal Adjustment,Statistical
Characteristics,Feeds Access Point,Feeds File Size,XML Access Point,XML File Size,CSV/
TXT Access Point,CSV/TXT File Size,XLS Access Point,XLS File Size,KML/KMZ Access
Point,KML File Size,ESRI Access Point,ESRI File Size,Map Access Point,Data Extraction
Access Point,Widget Access Point
"http://www.data.gov/details/4","Next Generation Radar (NEXRAD) Locations","Department of Commerce","National Oceanic
and Atmospheric Administration","Geography and Environment","1991","Irregular as needed","1991 to present","Between 4
and 10 minutes","This geospatial rendering of weather radar sites gives access to an historical archive of Terminal
Doppler Weather Radar data and is used primarily for research purposes. The archived data includes base data and
derived products of the National Weather Service (NWS) Weather Surveillance Radar 88 Doppler (WSR-88D) next generation
(NEXRAD) weather radar. Weather radar detects the three meteorological base data quantities: reflectivity, mean radial
velocity, and spectrum width. From these quantities, computer processing generates numerous meteorological analysis
products for forecasts, archiving and dissemination. There are 159 operational NEXRAD radar systems deployed
throughout the United States and at selected overseas locations. At the Radar Operations Center (ROC) in Norman OK,
personnel from the NWS, Air Force, Navy, and FAA use this distributed weather radar system to collect the data needed
to warn of impending severe weather and possible flash floods; support air traffic safety and assist in the management
of air traffic flow control; facilitate resource protection at military bases; and optimize the management of water,
agriculture, forest, and snow removal. This data set is jointly owned by the National Oceanic and Atmospheric
Administration, Federal Aviation Administration, and Department of Defense.","Raw Data Catalog",...
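The later `f.privacy_and_confidentiality.map` parameters suggest the column headers were turned into lowercase, underscore-separated field names, though the slides do not show how. One hypothetical normalizer (`toFieldName` is made up):

```javascript
// Hypothetical header normalizer: "Privacy and Confidentiality" -> "privacy_and_confidentiality".
function toFieldName(header) {
  return header
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // each run of non-alphanumerics becomes one "_"
    .replace(/^_+|_+$/g, "");    // no leading or trailing underscores
}

const headers = ["URL", "Title", "Date Released", "Privacy and Confidentiality",
  "Data Quality Certification", "CSV/TXT Access Point", "FGDC Compliance (Geospatial Only)"];
console.log(headers.map(toFieldName).join(","));
```

The result could then be passed as the `fieldnames` parameter together with `header=true`.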
Debugging
http://localhost:8983/solr/data.gov?q=searching&debugQuery=true
Mapping field values
• CSV update handler can map field values
• &f.privacy_and_confidentiality.map=YES:Yes
  &f.data_quality_certification.map=YES:Yes
Splitting keywords
• CSV handler: f.keywords.split=true
 • stored values are split, multivalued
• Or via schema
 • Stored value remains as in original, single valued
<fieldType name="comma_separated" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*"/>
  </analyzer>
</fieldType>
...
<field name="keywords" type="comma_separated" indexed="true" stored="true"/>
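With the backslashes in place, `\s*,\s*` splits on each comma and any whitespace around it. JavaScript's regex syntax behaves the same for this pattern, so here is a quick check of the tokens it would produce (sample keywords made up; `splitKeywords` is hypothetical). The field type has no lowercase filter, so case is preserved.

```javascript
// Same split pattern as the PatternTokenizerFactory configuration.
function splitKeywords(value) {
  return value.split(/\s*,\s*/);
}

console.log(splitKeywords("radar, weather ,NEXRAD,  Doppler"));
```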
Suggest
      • Suggest terms as user types in search box
      • Technique: jQuery autocomplete, Solr’s
         TermsComponent, Velocity template
http://localhost:8983/solr/terms
?terms.fl=suggest
&terms.prefix=sola&terms.sort=count
&wt=velocity&v.template=suggest
        #foreach($t in $response.response.terms.suggest)
        $t.key
        #end
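`terms.prefix` plus `terms.sort=count` amounts to: take indexed terms that start with the prefix and order them by document frequency. A toy model over a made-up term dictionary (hypothetical `suggestTerms`; the default limit of 10 mirrors the TermsComponent's `terms.limit` default):

```javascript
// Toy TermsComponent: prefix filter, sort by count desc, then by term.
function suggestTerms(termCounts, prefix, limit = 10) {
  return Object.entries(termCounts)
    .filter(([term]) => term.startsWith(prefix))
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, limit)
    .map(([term]) => term);
}

const dictionary = { solar: 12, solaris: 2, solr: 40, soldier: 5, solarium: 1 };
console.log(suggestTerms(dictionary, "sola"));
```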
Suggest schema
<fieldType name="suggestable" class="solr.TextField"
positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
             pattern="([^a-z])"
             replacement="" replace="all"/>
    <filter class="solr.StopFilterFactory"
             ignoreCase="true" words="stopwords.txt"
             enablePositionIncrements="true" />
  </analyzer>
</fieldType>

...

<field name="suggest" type="suggestable"
       indexed="true" stored="false" multiValued="true"/>
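A rough approximation of the `suggestable` chain, for intuition only: StandardTokenizer is crudely stood in for by splitting on non-alphanumeric runs, and the stopword set is an assumed subset of stopwords.txt. The sketch drops tokens left empty by the pattern replacement; how the real chain handles those is not modeled here.

```javascript
// Assumed subset of stopwords.txt, for illustration.
const STOPWORDS = new Set(["a", "an", "and", "in", "of", "the", "to"]);

function suggestableTokens(text) {
  return text
    .split(/[^A-Za-z0-9]+/)                // crude stand-in for StandardTokenizer
    .filter((t) => t.length > 0)
    .map((t) => t.toLowerCase())           // LowerCaseFilter
    .map((t) => t.replace(/[^a-z]/g, ""))  // PatternReplaceFilter: ([^a-z]) -> ""
    .filter((t) => t.length > 0)           // sketch-only: drop emptied tokens
    .filter((t) => !STOPWORDS.has(t));     // StopFilter (already lowercased)
}

console.log(suggestableTokens("Next Generation Radar (NEXRAD) Locations in 2011"));
```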
Custom pages


• Document detail page
• Multiple query intersection comparison
  with Venn visualization
Document detail
http://localhost:8983/solr/data.gov/document
?id=http%3A%2F%2Fwww.data.gov%2Fdetails%2F61
Document detail detail
    solrconfig.xml
    <requestHandler name="/data.gov/document" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="wt">velocity</str>
        <str name="v.template">document</str>
        <str name="v.layout">layout</str>
        <str name="title">Data.gov data set</str>
        <str name="q">{!raw f=id v=$id}</str>
      </lst>
    </requestHandler>
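`{!raw f=id v=$id}` builds a term query on `id` from the `id` request parameter taken verbatim, so the colons and slashes in URL-shaped ids need no query-syntax escaping. They do still need URL encoding in the link itself; a sketch (hypothetical `documentUrl`):

```javascript
// Link to the detail handler; percent-encode the URL-shaped id.
function documentUrl(base, id) {
  return `${base}/data.gov/document?id=${encodeURIComponent(id)}`;
}

console.log(documentUrl("http://localhost:8983/solr", "http://www.data.gov/details/61"));
```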
document.vm
#set($doc= $response.results.get(0))
<span><a href="$doc.getFieldValue('id')">$doc.getFieldValue('id')</a></span>

<table>
#foreach($fieldname in $doc.fieldNames)
  <tr>
     <td>$fieldname:</td>
     <td>
      #foreach($value in $doc.getFieldValues($fieldname))
        $esc.html($value)
      #end
      </td>
  </tr>
#end
</table>
Query intersection

• Just showing off: how easy it is to do
  something with a bit of visual impact
• Compare three independent queries,
  intersecting them in a Venn diagram
  visualization
Compare static page
solrconfig.xml
<requestHandler name="/data.gov/compare" class="solr.DumpRequestHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">compare</str>
    <str name="v.layout">layout</str>
    <str name="title">Data.gov Query Comparison</str>
  </lst>
</requestHandler>
compare.vm
<script type="text/javascript">
  function generate_venn() {
    var a=encodeURIComponent($("#a").val());
    var b=encodeURIComponent($("#b").val());
    var c=encodeURIComponent($("#c").val());
    var ab='('+a+')+AND+('+b+')';
    var ac='('+a+')+AND+('+c+')';
    var bc='('+b+')+AND+('+c+')';
    var abc='('+a+')+AND+('+b+')+AND+('+c+')';
    $('#venn').load('/solr/select?q=*:*&wt=velocity&v.template=venn&rows=0&facet=on'
      +'&facet.query={!key=a}'+a+'&facet.query={!key=b}'+b
      +'&facet.query={!key=c}'+c+'&facet.query={!key=intersect_ab}'+ab
      +'&facet.query={!key=intersect_ac}'+ac+'&facet.query={!key=intersect_bc}'+bc
      +'&facet.query={!key=intersect_abc}'+abc
      +'&q_a='+a+'&q_b='+b+'&q_c='+c
      +'&q_ab='+ab+'&q_ac='+ac+'&q_bc='+bc+'&q_abc='+abc);
    return false;
  }
</script>
<form action="#" id="compare_form" onsubmit="return generate_venn()">
  A: <input type="text" name="a" id="a" value="health"/>
  B: <input type="text" name="b" id="b" value="weather"/>
  C: <input type="text" name="c" id="c" value="ozone"/>
  <input type="submit"/>
</form>
<div id="venn"></div>
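The seven facet query counts are overlap sizes (|A|, |A∩B|, |A∩B∩C| and so on), not the exclusive regions of the diagram. Inclusion-exclusion recovers the regions and the size of the union; a sketch with a hypothetical `vennRegions` helper and made-up counts:

```javascript
// Convert overlap counts into exclusive Venn regions via inclusion-exclusion.
function vennRegions({ a, b, c, ab, ac, bc, abc }) {
  return {
    onlyA: a - ab - ac + abc,
    onlyB: b - ab - bc + abc,
    onlyC: c - ac - bc + abc,
    onlyAB: ab - abc,
    onlyAC: ac - abc,
    onlyBC: bc - abc,
    all: abc,
    union: a + b + c - ab - ac - bc + abc, // |A ∪ B ∪ C|
  };
}

const regions = vennRegions({ a: 10, b: 8, c: 6, ab: 4, ac: 3, bc: 2, abc: 1 });
console.log(regions);
```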
Venn chart
venn.vm
#set($values = $response.response.facet_counts.facet_queries)
#set($params = $response.responseHeader.params)

<img src="http://chart.apis.google.com/chart?chs=600x400&cht=v&chd=t:$values.a,$values.b,$values.c,$values.intersect_ab,$values.intersect_ac,$values.intersect_bc,$values.intersect_abc&chdl=$esc.url($params.q_a)|$esc.url($params.q_b)|$esc.url($params.q_c)"/>
<ul>
  <li>A: <a href="/solr/data.gov?q={!lucene}$params.q_a">$params.q_a</a> ($values.a)</li>
  <li>B: <a href="/solr/data.gov?q={!lucene}$params.q_b">$params.q_b</a> ($values.b)</li>
  <li>C: <a href="/solr/data.gov?q={!lucene}$params.q_c">$params.q_c</a> ($values.c)</li>
  <li>A&B: <a href="/solr/data.gov?q={!lucene}$params.q_ab">$params.q_ab</a>
($values.intersect_ab)</li>
  <li>A&C: <a href="/solr/data.gov?q={!lucene}$params.q_ac">$params.q_ac</a>
($values.intersect_ac)</li>
  <li>B&C: <a href="/solr/data.gov?q={!lucene}$params.q_bc">$params.q_bc</a>
($values.intersect_bc)</li>
  <li>A&B&C: <a href="/solr/data.gov?q={!lucene}$params.q_abc">$params.q_abc</a>
($values.intersect_abc)</li>
</ul>
Solritas
•   Pronounced: so-LAIR-uh-toss

•   Celeritas is a Latin word, translated as "swiftness" or
    "speed". It is often given as the origin of the symbol c,
    the universal notation for the speed of light -
    http://en.wikipedia.org/wiki/Celeritas

•   VelocityResponseWriter - simply passes the Solr
    response through the Apache Velocity templating
    engine

•   http://wiki.apache.org/solr/VelocityResponseWriter
Solr Flare
• Ruby on Rails plugin
• facet field detection, autosuggest, saved
  search, inverted facets, pie charts, Simile
  Timeline and Exhibit integration
• Useful for rapid prototyping
• See Flare's big brother, Blacklight, for
  production quality
Tang on Flare
• UVA radiation = blacklight
• libraries are much more than books
• opinionated
  • Ruby on Rails: best choice for an
    extensible user interface development
    framework
Blacklight @ UVa
Blacklight @ Stanford
Blacklight @ AgNIC
Prototyping Tools

• CSV update handler - /update/csv
• Schema Browser
• Solritas, Flare, Blacklight, or...
 • just HTML+JavaScript (wt=json)
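For the plain HTML+JavaScript route, the page only needs to read the `wt=json` response. A sketch over a hard-coded object in the standard Solr JSON layout (`responseHeader`, then `response` with `numFound`, `start` and `docs`); the documents and the `summarize` name are made up, and in a browser the object would come from a request such as `/solr/select?q=*:*&wt=json`.

```javascript
// Sketch: pull the hit count and display names out of a wt=json response.
function summarize(solrJson) {
  const { numFound, docs } = solrJson.response;
  return {
    numFound,
    names: docs.map((d) => `${d.first} ${d.last}`),
  };
}

// Hard-coded stand-in for a wt=json response (made-up documents).
const sample = {
  responseHeader: { status: 0, QTime: 1 },
  response: {
    numFound: 2,
    start: 0,
    docs: [
      { first: "Erik", last: "Hatcher" },
      { first: "Ada", last: "Lovelace" },
    ],
  },
};

console.log(summarize(sample));
```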
Test

• Performance
• Scalability
• Relevance
• Automate all of the above, start baselines
  early, avoid regressions
Then what?
•   Script the indexing process: full & delta
•   Work with real users on actual needs
•   Integrate with production systems
•   Iterate on schema enhancements,
    configuration tweaks such as caching
•   Deploy to staging/production environments
    and work at scale: collection size, real queries
    and performance, hardware and JVM settings
LucidFind
http://www.lucidimagination.com/search/?q=user+interface
For more information...
•   http://www.lucidimagination.com

•   LucidFind

    •   search Lucene ecosystem: mailing lists, wikis, JIRA, etc

    •   http://search.lucidimagination.com

•   Getting started with LucidWorks Enterprise:

    •   http://www.lucidimagination.com/products/lucidworks-search-platform/enterprise

•   http://lucene.apache.org/solr - wiki, e-mail lists
Thank You!

More Related Content

PDF
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
Erik Hatcher
 
PDF
Lucene for Solr Developers
Erik Hatcher
 
PDF
Lucene's Latest (for Libraries)
Erik Hatcher
 
PDF
Solr 4
Erik Hatcher
 
PDF
Solr Black Belt Pre-conference
Erik Hatcher
 
PDF
Rapid Prototyping with Solr
Erik Hatcher
 
PDF
Lucene for Solr Developers
Erik Hatcher
 
PDF
Solr Recipes Workshop
Erik Hatcher
 
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
Erik Hatcher
 
Lucene for Solr Developers
Erik Hatcher
 
Lucene's Latest (for Libraries)
Erik Hatcher
 
Solr 4
Erik Hatcher
 
Solr Black Belt Pre-conference
Erik Hatcher
 
Rapid Prototyping with Solr
Erik Hatcher
 
Lucene for Solr Developers
Erik Hatcher
 
Solr Recipes Workshop
Erik Hatcher
 

What's hot (20)

PDF
Solr Recipes
Erik Hatcher
 
PDF
Integrating the Solr search engine
th0masr
 
PDF
Rapid Prototyping with Solr
Erik Hatcher
 
PDF
Apache Solr Workshop
Saumitra Srivastav
 
PDF
Solr Application Development Tutorial
Erik Hatcher
 
PPTX
Apache Solr
Minh Tran
 
PDF
Solr Query Parsing
Erik Hatcher
 
PDF
Building your own search engine with Apache Solr
Biogeeks
 
PDF
Apache Solr crash course
Tommaso Teofili
 
PPTX
Introduction to Apache Lucene/Solr
Rahul Jain
 
PDF
Get the most out of Solr search with PHP
Paul Borgermans
 
PDF
Introduction to Solr
Erik Hatcher
 
PDF
Lucene for Solr Developers
Erik Hatcher
 
PPTX
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Alexandre Rafalovitch
 
PDF
Solr Powered Lucene
Erik Hatcher
 
PPT
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
Ecommerce Solution Provider SysIQ
 
PPT
Building Intelligent Search Applications with Apache Solr and PHP5
israelekpo
 
PPTX
Introduction to Apache Solr
Andy Jackson
 
PPTX
Tutorial on developing a Solr search component plugin
searchbox-com
 
PPTX
Solr 6 Feature Preview
Yonik Seeley
 
Solr Recipes
Erik Hatcher
 
Integrating the Solr search engine
th0masr
 
Rapid Prototyping with Solr
Erik Hatcher
 
Apache Solr Workshop
Saumitra Srivastav
 
Solr Application Development Tutorial
Erik Hatcher
 
Apache Solr
Minh Tran
 
Solr Query Parsing
Erik Hatcher
 
Building your own search engine with Apache Solr
Biogeeks
 
Apache Solr crash course
Tommaso Teofili
 
Introduction to Apache Lucene/Solr
Rahul Jain
 
Get the most out of Solr search with PHP
Paul Borgermans
 
Introduction to Solr
Erik Hatcher
 
Lucene for Solr Developers
Erik Hatcher
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Alexandre Rafalovitch
 
Solr Powered Lucene
Erik Hatcher
 
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
Ecommerce Solution Provider SysIQ
 
Building Intelligent Search Applications with Apache Solr and PHP5
israelekpo
 
Introduction to Apache Solr
Andy Jackson
 
Tutorial on developing a Solr search component plugin
searchbox-com
 
Solr 6 Feature Preview
Yonik Seeley
 
Ad

Viewers also liked (11)

PDF
Lucene for Solr Developers
Erik Hatcher
 
PDF
Deep Data at Macy's - Searching Hierarchichal Documents for eCommerce Merchan...
Lucidworks
 
PDF
Multi-language Content Discovery Through Entity Driven Search: Presented by A...
Lucidworks
 
PDF
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Lucidworks
 
PDF
Secure Search - Using Apache Sentry to Add Authentication and Authorization S...
Lucidworks
 
PDF
Lucene/Solr Revolution 2015 Opening Keynote with Lucidworks CEO Will Hayes
Lucidworks
 
PDF
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Lucidworks
 
PDF
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Lucidworks
 
PDF
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Lucidworks
 
PDF
Solr4 nosql search_server_2013
Lucidworks (Archived)
 
PDF
Indexing Text and HTML Files with Solr
Lucidworks (Archived)
 
Lucene for Solr Developers
Erik Hatcher
 
Deep Data at Macy's - Searching Hierarchichal Documents for eCommerce Merchan...
Lucidworks
 
Multi-language Content Discovery Through Entity Driven Search: Presented by A...
Lucidworks
 
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Lucidworks
 
Secure Search - Using Apache Sentry to Add Authentication and Authorization S...
Lucidworks
 
Lucene/Solr Revolution 2015 Opening Keynote with Lucidworks CEO Will Hayes
Lucidworks
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Lucidworks
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Lucidworks
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Lucidworks
 
Solr4 nosql search_server_2013
Lucidworks (Archived)
 
Indexing Text and HTML Files with Solr
Lucidworks (Archived)
 
Ad

Similar to Rapid Prototyping with Solr (20)

PDF
Introduction to Solr
Erik Hatcher
 
PPTX
Introduction to Lucene & Solr and Usecases
Rahul Jain
 
PDF
Rapid Prototyping with Solr
Erik Hatcher
 
PDF
Solr search engine with multiple table relation
Jay Bharat
 
PPTX
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Kai Chan
 
KEY
Apache Solr - Enterprise search platform
Tommaso Teofili
 
PDF
Rapid prototyping with solr - By Erik Hatcher
lucenerevolution
 
PDF
Rapid Prototyping with Solr
Lucidworks (Archived)
 
PDF
Search Engine-Building with Lucene and Solr
Kai Chan
 
KEY
Solr 101
Findwise
 
PPTX
Intro to Apache Lucene and Solr
Grant Ingersoll
 
PPTX
Solr introduction
Lap Tran
 
PPTX
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
'Moinuddin Ahmed
 
PDF
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Kai Chan
 
ODP
Solr: Enterprise Search Server
Armen Polischuk
 
PDF
Apache solr liferay
Binesh Gummadi
 
PPTX
Apache Solr - search for everyone!
Jaran Flaath
 
PDF
Basics of Solr and Solr Integration with AEM6
DEEPAK KHETAWAT
 
PPTX
20130310 solr tuorial
Chris Huang
 
PDF
Small wins in a small time with Apache Solr
Sourcesense
 
Introduction to Solr
Erik Hatcher
 
Introduction to Lucene & Solr and Usecases
Rahul Jain
 
Rapid Prototyping with Solr
Erik Hatcher
 
Solr search engine with multiple table relation
Jay Bharat
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Kai Chan
 
Apache Solr - Enterprise search platform
Tommaso Teofili
 
Rapid prototyping with solr - By Erik Hatcher
lucenerevolution
 
Rapid Prototyping with Solr
Lucidworks (Archived)
 
Search Engine-Building with Lucene and Solr
Kai Chan
 
Solr 101
Findwise
 
Intro to Apache Lucene and Solr
Grant Ingersoll
 
Solr introduction
Lap Tran
 
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
'Moinuddin Ahmed
 
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Kai Chan
 
Solr: Enterprise Search Server
Armen Polischuk
 
Apache solr liferay
Binesh Gummadi
 
Apache Solr - search for everyone!
Jaran Flaath
 
Basics of Solr and Solr Integration with AEM6
DEEPAK KHETAWAT
 
20130310 solr tuorial
Chris Huang
 
Small wins in a small time with Apache Solr
Sourcesense
 

More from Erik Hatcher (12)

PDF
Ted Talk
Erik Hatcher
 
PDF
Solr Payloads
Erik Hatcher
 
PDF
it's just search
Erik Hatcher
 
PDF
Solr Indexing and Analysis Tricks
Erik Hatcher
 
PDF
Solr Powered Libraries
Erik Hatcher
 
PDF
"Solr Update" at code4lib '13 - Chicago
Erik Hatcher
 
PDF
Query Parsing - Tips and Tricks
Erik Hatcher
 
PDF
Solr Flair
Erik Hatcher
 
PDF
Introduction to Solr
Erik Hatcher
 
PDF
What's New in Solr 3.x / 4.0
Erik Hatcher
 
PDF
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Erik Hatcher
 
PDF
Solr Flair: Search User Interfaces Powered by Apache Solr
Erik Hatcher
 
Ted Talk
Erik Hatcher
 
Solr Payloads
Erik Hatcher
 
it's just search
Erik Hatcher
 
Solr Indexing and Analysis Tricks
Erik Hatcher
 
Solr Powered Libraries
Erik Hatcher
 
"Solr Update" at code4lib '13 - Chicago
Erik Hatcher
 
Query Parsing - Tips and Tricks
Erik Hatcher
 
Solr Flair
Erik Hatcher
 
Introduction to Solr
Erik Hatcher
 
What's New in Solr 3.x / 4.0
Erik Hatcher
 
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...
Erik Hatcher
 
Solr Flair: Search User Interfaces Powered by Apache Solr
Erik Hatcher
 

Recently uploaded (20)

PDF
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
PDF
How ETL Control Logic Keeps Your Pipelines Safe and Reliable.pdf
Stryv Solutions Pvt. Ltd.
 
PDF
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
PPTX
Simple and concise overview about Quantum computing..pptx
mughal641
 
PPTX
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
PDF
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
PDF
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
PDF
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
PDF
The Future of Mobile Is Context-Aware—Are You Ready?
iProgrammer Solutions Private Limited
 
PDF
Brief History of Internet - Early Days of Internet
sutharharshit158
 
PDF
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
PDF
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PDF
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
PDF
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
PDF
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
PPTX
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
PPTX
The Future of AI & Machine Learning.pptx
pritsen4700
 
PDF
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
PPTX
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
How ETL Control Logic Keeps Your Pipelines Safe and Reliable.pdf
Stryv Solutions Pvt. Ltd.
 
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
Simple and concise overview about Quantum computing..pptx
mughal641
 
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
The Future of Mobile Is Context-Aware—Are You Ready?
iProgrammer Solutions Private Limited
 
Brief History of Internet - Early Days of Internet
sutharharshit158
 
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
The Future of AI & Machine Learning.pptx
pritsen4700
 
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
cloud computing vai.pptx for the project
vaibhavdobariyal79
 

Rapid Prototyping with Solr

  • 1. Rapid Prototyping with Solr uberconf - July 14, 2011 Presented by Erik Hatcher [email protected] Lucid Imagination http://www.lucidimagination.com
  • 2. About me... • Co-author, “Lucene in Action” • Committer, Lucene and Solr • Lucene PMC and ASF member • Member of Technical Staff / co-founder, Lucid Imagination
  • 3. About Lucid Imagination... • Lucid Imagination provides commercial-grade support, training, high-level consulting and value-added software for Lucene and Solr. • We make Lucene ‘enterprise-ready’ by offering: • Free, certified distributions and downloads. • Support, training, and consulting. • LucidWorks Enterprise, a commercial search platform built on top of Solr.
  • 4. Abstract Got data? Let's make it searchable! Rapid Prototyping with Solr will demonstrate getting documents into Solr quickly, provide some tips for adjusting Solr's schema to better match your needs, and finally discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
  • 5. What is Lucene? • An open source, Java-based information retrieval (IR) library with best-practice indexing and query capabilities, offering fast, lightweight search and indexing. • 100% Java (.NET, Perl and other versions too). • Stable, mature API. • Continuously improved and tuned over more than 10 years. • Cleanly implemented, easy to embed in an application. • Compact, portable index representation. • Programmable text analyzers, spell checking and highlighting. • Not a crawler or a text extraction tool.
  • 6. Lucene's History • Created by Doug Cutting in 1999 • built on ideas from search projects Doug created at Xerox PARC and Apple. • Donated to the Apache Software Foundation (ASF) in 2001. • Became an Apache top-level project in 2005. • Has grown and morphed through the years and is now both: • A search library. • An ASF Top-Level Project (TLP) encompassing several sub-projects. • Lucene and Solr "merged" development in early 2010.
  • 7. What is Solr? • An open source search engine. • Indexes content sources, processes query requests, returns search results. • Uses Lucene as the "engine", but adds full enterprise search server features and capabilities. • A web-based application that processes HTTP requests and returns HTTP responses. • Initially started in 2004 and developed by CNET as an in-house project to add search capability for the company website. • Donated to ASF in 2006.
  • 8. What Version of Solr? • There’s more than one answer! • The current, released, stable version is 3.3 • The development release is referred to as “trunk”. • This is where the new, less tested work goes on • Also referred to as 4.0 • LucidWorks Enterprise is built on a trunk snapshot + additional features.
  • 9. Why prototype? • Demonstrate Solr can handle your needs • Stake/purse-holder buy-in • It's quick, easy, and fun! • The User Interface is the app
  • 10. Workflow • Ingest data • Use • Refine config/interactions, repeat
  • 11. Got Data? • Rich text files? • Databases? • Feeds (Atom/RSS/XML)? • 3rd party repositories? (SharePoint, Documentum, ...) • CSV!!!!
  • 13. Getting Started • Download Solr • http://lucene.apache.org/solr • "Install" it • unzip or tar -xvf • Start it • cd example; java -jar start.jar
  • 14. e.g. Conference Attendees First Name,Last Name,Company,Title,Work Country Erik,Hatcher,Lucid Imagination,"Member, Technical Staff", USA . . .
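The sample row quotes "Member, Technical Staff" because the value itself contains a comma. As a rough illustration of what any CSV loader has to do with such a row, here is a minimal JavaScript sketch of quote-aware field splitting for a single line. `parseCsvLine` is a hypothetical helper, and it deliberately ignores newlines inside quoted values, so it is not a stand-in for Solr's own CSV parser.

```javascript
// Minimal sketch: split one CSV line into fields, honoring double quotes.
// Simplification: newlines inside quoted fields are not handled.
function parseCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // doubled quote inside a quoted field is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch; // commas inside quotes are part of the value
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current); // separator outside quotes ends the field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const row = 'Erik,Hatcher,Lucid Imagination,"Member, Technical Staff", USA';
console.log(parseCsvLine(row));
```

This sketch keeps the leading space before USA; Solr's CSV loader has its own whitespace handling (including a `trim` option).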
  • 16. Dynamic Fields <dynamicField name="*_s" type="string" indexed="true" stored="true"/> <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
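These two declarations route any field ending in `_s` or `_t` to a string or text type. The sketch below (plain JavaScript, hypothetical helper names) imitates that name resolution, including the precedence rule described in Solr's example schema.xml: longer patterns are tried first, and among equal-length patterns the one declared first wins. It models only name matching, not the field types themselves.

```javascript
// Dynamic field declarations from the slide, in schema order.
const dynamicFields = [
  { name: "*_s", type: "string" },
  { name: "*_t", type: "text" },
];

// A pattern may carry a single "*" at the start or at the end.
function matchesPattern(pattern, field) {
  if (pattern.startsWith("*")) return field.endsWith(pattern.slice(1));
  if (pattern.endsWith("*")) return field.startsWith(pattern.slice(0, -1));
  return pattern === field;
}

// Longer patterns are tried first; Array.prototype.sort is stable,
// so equal-length patterns keep their declaration order.
function resolveDynamicField(field, patterns) {
  const ordered = [...patterns].sort((a, b) => b.name.length - a.name.length);
  const hit = ordered.find((p) => matchesPattern(p.name, field));
  return hit ? hit.type : null;
}

console.log(resolveDynamicField("company_s", dynamicFields)); // string
console.log(resolveDynamicField("title_t", dynamicFields));   // text
console.log(resolveDynamicField("country", dynamicFields));   // null
```

Adding a catch-all `*` pattern would not steal `title_t` from `*_t`, because the shorter pattern is tried last.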
  • 18. uniqueKey • Optional, Solr-specific feature • generally "string" type • schema.xml: <uniqueKey>id</uniqueKey> • adding a document whose id already exists replaces the existing one (delete + add)
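The "delete + add" behavior means a uniqueKey acts like a map key: re-adding a document with an existing key replaces it instead of creating a duplicate. This toy in-memory model (a hypothetical `ToyIndex` class, nothing like Lucene's real data structures) shows the consequence: if two attendees happened to share a key value, only the one added last would remain.

```javascript
// Toy "index" keyed by a uniqueKey field (assumed to be named "id").
class ToyIndex {
  constructor(uniqueKey = "id") {
    this.uniqueKey = uniqueKey;
    this.docs = new Map();
  }
  add(doc) {
    const key = doc[this.uniqueKey];
    // Existing key: remove the old document first, then add the new one.
    if (this.docs.has(key)) this.docs.delete(key);
    this.docs.set(key, doc);
  }
  get size() {
    return this.docs.size;
  }
}

const index = new ToyIndex();
index.add({ id: "Hatcher", first_s: "Erik" });
index.add({ id: "Hatcher", first_s: "Someone Else" }); // hypothetical second attendee
console.log(index.size); // 1
```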
  • 19. id curl "http://localhost:8983/solr/update/csv?stream.file=attendees.csv&fieldnames=first_s,id,company_s,title_t,country_s&header=true" <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">40</int> </lst> </response>
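`fieldnames` names the CSV columns by position, so in this command the second column (last name) becomes `id`. A small sketch that assembles the same request with the standard `URLSearchParams` class, rather than by hand, avoids the stray spaces and unescaped characters that easily creep into long URLs. `csvUpdateUrl` is a hypothetical helper, and the base URL assumes Solr's example Jetty on port 8983.

```javascript
// Assemble the CSV update request; URLSearchParams handles the escaping
// (commas become %2C, which the server decodes back to commas).
function csvUpdateUrl(base, file, fieldnames, extra = {}) {
  const params = new URLSearchParams({
    "stream.file": file,
    fieldnames: fieldnames.join(","),
    header: "true",
    ...extra,
  });
  return `${base}/update/csv?${params.toString()}`;
}

const url = csvUpdateUrl("http://localhost:8983/solr", "attendees.csv", [
  "first_s", "id", "company_s", "title_t", "country_s",
]);
console.log(url);
```

Passing `{ commit: "true" }` as the last argument would add the commit flag used on later slides.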
  • 20. Tada!
  • 21. Schema tinkering • Removed all example field definitions • Uncomment and adjust catch-all dynamic field: • <dynamicField name="*" type="string" multiValued="false"/> • Ensure uniqueKey is appropriate • unusual in this example, so it was disabled • Make every document/field fully searchable! • <copyField source="*" dest="text"/>
  • 22. After adjusting config... • Restart Solr • Or... reload the core (when in multicore mode)
  • 23. Clean import # Delete all documents curl "http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true" # Index your data curl "http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true"
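The `stream.body` value in the delete command is just `<delete><query>*:*</query></delete>` with its angle brackets percent-encoded by hand. A one-function sketch (hypothetical `deleteAllUrl`) lets `encodeURIComponent` do that work; it also encodes `:` and `/`, which decodes to the same body on the server.

```javascript
// Build the delete-by-query URL with the XML body encoded programmatically.
function deleteAllUrl(base) {
  const body = "<delete><query>*:*</query></delete>";
  return `${base}/update?stream.body=${encodeURIComponent(body)}&commit=true`;
}

console.log(deleteAllUrl("http://localhost:8983/solr"));
```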
  • 25. Value Normalization • http://localhost:8983/solr/update/csv?commit=true&stream.file=attendees.csv&fieldnames=first,last,company,title,country&header=true&f.country.map=Great+Britain:United+Kingdom
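`f.country.map=Great+Britain:United+Kingdom` is a `from:to` pair, and the `+` signs are URL-encoded spaces. The sketch below assumes whole-value, exact-match replacement with unmatched values passing through unchanged (and, as an assumption, the first matching rule winning). `applyValueMaps` is a hypothetical helper.

```javascript
// Decode the parameter as a server would: "+" becomes a space.
const mapSpec = new URLSearchParams(
  "f.country.map=Great+Britain:United+Kingdom"
).getAll("f.country.map");

// Apply "from:to" rules to one value; unmatched values are returned as-is.
function applyValueMaps(value, rules) {
  for (const rule of rules) {
    const sep = rule.indexOf(":");
    if (sep < 0) continue; // skip malformed rules
    const from = rule.slice(0, sep);
    const to = rule.slice(sep + 1);
    if (value === from) return to;
  }
  return value;
}

console.log(applyValueMaps("Great Britain", mapSpec)); // United Kingdom
console.log(applyValueMaps("USA", mapSpec));           // USA
```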
  • 26. Polishing • Customize request handler mappings • Edit templates • hit display • header/footer • style
  • 27. /browse <requestHandler name="/browse" class="solr.SearchHandler"> <lst name="defaults"> <str name="wt">velocity</str><str name="v.layout">layout</str> <str name="v.template">browse</str> <str name="rows">10</str><str name="fl">*,score</str> <str name="defType">lucene</str><str name="q">*:*</str> <str name="debugQuery">true</str> <str name="hl">on</str><str name="hl.fl">title</str> <str name="hl.fragsize">0</str> <str name="hl.alternateField">title</str> <str name="facet">on</str> <str name="facet.mincount">1</str> <str name="facet.missing">true</str> </lst> <lst name="appends"><str name="facet.field">country</str></lst> </requestHandler>
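The `/browse` defaults turn on faceting over `country` with `facet.mincount=1` and `facet.missing=true`. To make those two parameters concrete, here is a toy JavaScript counter over in-memory documents. It imitates their visible effect (count-descending order, values below mincount dropped, a separate tally of documents with no value) and is not how Solr computes facets; the attendee data is hypothetical.

```javascript
// Toy facet counter (illustration only).
// mincount drops values with fewer hits; missing appends [null, n]
// for documents that have no value in the field.
function facetCounts(docs, field, { mincount = 0, missing = false } = {}) {
  const counts = new Map();
  let missingCount = 0;
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) {
      missingCount++;
      continue;
    }
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  // Highest count first; ties broken alphabetically (a choice of this sketch).
  const result = [...counts.entries()]
    .filter(([, count]) => count >= mincount)
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  if (missing) result.push([null, missingCount]);
  return result;
}

// Hypothetical attendee docs; the last one has no country.
const docs = [{ country: "USA" }, { country: "USA" }, { country: "Germany" }, {}];
console.log(facetCounts(docs, "country", { mincount: 1, missing: true }));
```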
  • 28. hit.vm <div class="result-document"> <p>$doc.getFieldValue('first') $doc.getFieldValue('last')</p> <p>$!doc.getFieldValue('title'), $!doc.getFieldValue('company')</p> <p>$!doc.getFieldValue('country')</p> </div>
  • 30. Adding bells and whistles • jQuery • <script type="text/javascript" src="/solr/admin/jquery.js"></script> • TreeMap • <script type="text/javascript" src="/scripts/treemap.js"></script>
  • 31. TreeMap code <script type="text/javascript"> function onLoad() { jQuery("#treemap-country").treemap(640,480, {}); } </script> ---------------------------- <body onload="onLoad();"> ---------------------------- <table id="treemap-country"> #foreach($facet in $response.getFacetField('country').values) <tr> <td>#if($facet.name) $esc.html($facet.name)#else&lt;Unspecified&gt;#end</td> <td>$facet.count</td> <td>#if($facet.name)$esc.html($facet.name)#{else} Unspecified#end</td> </tr> #end </table>
  • 33. Ajax fun: giveaways • Add a "static" templated page • jQuery Ajax request • snippet templated output
  • 34. solrconfig.xml "static" page <requestHandler name="/giveaways" class="solr.DumpRequestHandler"> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">giveaways</str> <str name="v.layout">layout</str> </lst> </requestHandler> giveaways.vm <input type="button" value="Pick a Winner" onClick="javascript:$('#winner').load('/solr/generate_winner?sort=random_' + new Date().getTime() + '+asc');"> <h2>And the winner is...</h2> <center><font size="20"><div id="winner"></div></font></center>
  • 35. fragment template solrconfig.xml <requestHandler name="/generate_winner" class="solr.SearchHandler"> <!-- sort=random_... required --> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">winner</str> <str name="rows">1</str> <str name="fl">first,last</str> <str name="defType">lucene</str> <str name="q">*:* -company:"Lucid Imagination" -company:"Stone Circle Productions"</str> </lst> </requestHandler> winner.vm #set($winner=$response.results.get(0)) $winner.getFieldValue('first') $winner.getFieldValue('last')
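The `sort=random_<timestamp> asc` trick depends on a dynamic field backed by Solr's `RandomSortField` type (Solr's example schema maps `random_*` to one): each distinct field name yields a different ordering that is repeatable for an unchanged index, so a fresh timestamp means a fresh draw. The sketch below imitates that idea with a seeded hash (FNV-1a, chosen here only for illustration; it is not what `RandomSortField` does internally) and applies the same company exclusions as the handler's `q`. The attendee data and `pickWinner` are hypothetical.

```javascript
// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Exclude staff companies (like the handler's q), order by a seed-dependent
// key, and take the first row (rows=1).
function pickWinner(attendees, seed, excludedCompanies) {
  const eligible = attendees.filter((a) => !excludedCompanies.includes(a.company));
  eligible.sort((x, y) => fnv1a(seed + ":" + x.id) - fnv1a(seed + ":" + y.id));
  return eligible.length > 0 ? eligible[0] : null;
}

const attendees = [
  { id: "1", first: "Ann", last: "Example", company: "Acme" },
  { id: "2", first: "Bob", last: "Sample", company: "Lucid Imagination" },
  { id: "3", first: "Cy", last: "Demo", company: "Initech" },
];
const excluded = ["Lucid Imagination", "Stone Circle Productions"];
const winner = pickWinner(attendees, String(Date.now()), excluded);
console.log(winner.first, winner.last);
```

Using the same seed twice returns the same winner, which mirrors why the page appends a new timestamp on every click.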
  • 36. And the winner is...
  • 38. Data.gov CSV catalog URL,Title,Agency,Subagency,Category,Date Released,Date Updated,Time Period,Frequency,Description,Data.gov Data Category Type,Specialized Data Category Designation,Keywords,Citation,Agency Program Page,Agency Data Series Page,Unit of Analysis,Granularity,Geographic Coverage,Collection Mode,Data Collection Instrument,Data Dictionary/Variable List,Applicable Agency Information Quality Guideline Designation,Data Quality Certification,Privacy and Confidentiality,Technical Documentation,Additional Metadata,FGDC Compliance (Geospatial Only),Statistical Methodology,Sampling,Estimation,Weighting,Disclosure Avoidance,Questionnaire Design,Series Breaks,Non-response Adjustment,Seasonal Adjustment,Statistical Characteristics,Feeds Access Point,Feeds File Size,XML Access Point,XML File Size,CSV/TXT Access Point,CSV/TXT File Size,XLS Access Point,XLS File Size,KML/KMZ Access Point,KML File Size,ESRI Access Point,ESRI File Size,Map Access Point,Data Extraction Access Point,Widget Access Point "http://www.data.gov/details/4","Next Generation Radar (NEXRAD) Locations","Department of Commerce","National Oceanic and Atmospheric Administration","Geography and Environment","1991","Irregular as needed","1991 to present","Between 4 and 10 minutes","This geospatial rendering of weather radar sites gives access to an historical archive of Terminal Doppler Weather Radar data and is used primarily for research purposes. The archived data includes base data and derived products of the National Weather Service (NWS) Weather Surveillance Radar 88 Doppler (WSR-88D) next generation (NEXRAD) weather radar. Weather radar detects the three meteorological base data quantities: reflectivity, mean radial velocity, and spectrum width. From these quantities, computer processing generates numerous meteorological analysis products for forecasts, archiving and dissemination. 
There are 159 operational NEXRAD radar systems deployed throughout the United States and at selected overseas locations. At the Radar Operations Center (ROC) in Norman OK, personnel from the NWS, Air Force, Navy, and FAA use this distributed weather radar system to collect the data needed to warn of impending severe weather and possible flash floods; support air traffic safety and assist in the management of air traffic flow control; facilitate resource protection at military bases; and optimize the management of water, agriculture, forest, and snow removal. This data set is jointly owned by the National Oceanic and Atmospheric Administration, Federal Aviation Administration, and Department of Defense.","Raw Data Catalog",...
  • 41. Mapping field values • CSV update handler can map field values • &f.privacy_and_confidentiality.map=YES:Yes &f.data_quality_certification.map=YES:Yes
  • 42. Splitting keywords • CSV handler: f.keywords.split=true • stored values are split, multivalued • Or via schema • Stored value remains as in original, single valued <fieldType name="comma_separated" class="solr.TextField" omitNorms="true"> <analyzer> <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*"/> </analyzer> </fieldType> ... <field name="keywords" type="comma_separated" indexed="true" stored="true"/>
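The tokenizer pattern `\s*,\s*` splits on commas and absorbs any whitespace around them. A JavaScript regular expression with the same pattern shows which keywords a value would produce. Dropping empty pieces (from doubled commas) is an assumption of this sketch, and `splitKeywords` is a hypothetical name.

```javascript
// Split a comma-separated keyword value using the same pattern as the
// PatternTokenizerFactory configuration; empty pieces are dropped here.
function splitKeywords(value) {
  return value.split(/\s*,\s*/).filter((k) => k.length > 0);
}

console.log(splitKeywords("weather ,radar,  NEXRAD"));
```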
  • 43. Suggest • Suggest terms as the user types in the search box • Technique: jQuery autocomplete, Solr’s TermsComponent, Velocity template http://localhost:8983/solr/terms?terms.fl=suggest&terms.prefix=sola&terms.sort=count&wt=velocity&v.template=suggest #foreach($t in $response.response.terms.suggest) $t.key #end
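Conceptually, the terms request returns indexed terms of the `suggest` field that start with `terms.prefix`, highest document frequency first. This in-memory sketch shows the same filter, sort, and cap; the `suggest` function and the term counts are hypothetical, and the limit of 10 and the alphabetical tie-break are assumptions.

```javascript
// Filter terms by prefix, order by frequency (like terms.sort=count),
// and cap the list; ties are broken alphabetically in this sketch.
function suggest(termCounts, prefix, limit = 10) {
  return Object.entries(termCounts)
    .filter(([term]) => term.startsWith(prefix))
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, limit)
    .map(([term]) => term);
}

// Hypothetical term -> document frequency table.
const termCounts = { solar: 12, solaris: 3, soldier: 7, solarwinds: 3 };
console.log(suggest(termCounts, "sola"));
```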
  • 44. Suggest schema <fieldType name="suggestable" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> </analyzer> </fieldType> ... <field name="suggest" type="suggestable" indexed="true" stored="false" multiValued="true"/>
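Read top to bottom, the `suggestable` chain tokenizes, lowercases, strips every character outside `a-z`, and removes stopwords, so each indexed term is a bare lowercase word. The following JavaScript approximation makes that pipeline visible. It swaps in a crude split on non-alphanumeric characters for `StandardTokenizer`, uses a tiny hypothetical stopword set instead of `stopwords.txt`, and drops tokens that become empty, which the Solr chain as configured does not necessarily do.

```javascript
// Tiny hypothetical stand-in for stopwords.txt.
const STOPWORDS = new Set(["a", "an", "and", "the", "of", "to"]);

function analyzeSuggestable(text) {
  return text
    .split(/[^A-Za-z0-9']+/)              // crude stand-in for StandardTokenizer
    .filter((t) => t.length > 0)
    .map((t) => t.toLowerCase())          // LowerCaseFilter
    .map((t) => t.replace(/[^a-z]/g, "")) // PatternReplaceFilter: ([^a-z]) -> ""
    .filter((t) => t.length > 0)          // sketch-only: drop emptied tokens
    .filter((t) => !STOPWORDS.has(t));    // StopFilter (ignoreCase)
}

console.log(analyzeSuggestable("The NEXRAD Radar's 88-D data"));
```

Digits disappear entirely ("88" is emptied), and "Radar's" becomes "radars", which is worth knowing before relying on this field for autosuggest.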
  • 45. Custom pages • Document detail page • Multiple query intersection comparison with Venn visualization
  • 47. Document detail solrconfig.xml <requestHandler name="/data.gov/document" class="solr.SearchHandler"> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">document</str> <str name="v.layout">layout</str> <str name="title">Data.gov data set</str> <str name="q">{!raw f=id v=$id}</str> </lst> </requestHandler> document.vm #set($doc= $response.results.get(0)) <span><a href="$doc.getFieldValue('id')">$doc.getFieldValue('id')</a></span> <table> #foreach($fieldname in $doc.fieldNames) <tr> <td>$fieldname:</td> <td> #foreach($value in $doc.getFieldValues($fieldname)) $esc.html($value) #end </td> </tr> #end </table>
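Because the handler's `q` is `{!raw f=id v=$id}`, the document id travels in its own `id` request parameter and becomes an unanalyzed term query on the `id` field; no query-syntax escaping is needed, only URL encoding. A sketch of building a detail-page link, assuming the data.gov URL column was indexed as `id` (the helper name `documentDetailUrl` is hypothetical):

```javascript
// Link to the document detail handler; the id is URL-encoded and then
// dereferenced server-side via v=$id.
function documentDetailUrl(base, id) {
  return `${base}/data.gov/document?id=${encodeURIComponent(id)}`;
}

const link = documentDetailUrl("/solr", "http://www.data.gov/details/4");
console.log(link);
```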
  • 48. Query intersection • Just showing off how easy it is to build something with a bit of visual impact • Compare three independent queries, intersecting them in a Venn diagram visualization
  • 50. Compare static page solrconfig.xml <requestHandler name="/data.gov/compare" class="solr.DumpRequestHandler"> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">compare</str> <str name="v.layout">layout</str> <str name="title">Data.gov Query Comparison</str> </lst> </requestHandler> compare.vm <script type="text/javascript"> function generate_venn() { var a=encodeURIComponent($("#a").val()); var b=encodeURIComponent($("#b").val()); var c=encodeURIComponent($("#c").val()); var ab='('+a+')+AND+('+b+')'; var ac='('+a+')+AND+('+c+')'; var bc='('+b+')+AND+('+c+')'; var abc='('+a+')+AND+('+b+')+AND+('+c+')'; $('#venn').load('/solr/select?q=*:*&wt=velocity&v.template=venn&rows=0&facet=on&facet.query={!key=a}'+a+'&facet.query={!key=b}'+b +'&facet.query={!key=c}'+c+'&facet.query={!key=intersect_ab}'+ab+'&facet.query={!key=intersect_ac}'+ac +'&facet.query={!key=intersect_bc}'+bc+'&facet.query={!key=intersect_abc}'+abc+'&q_a='+a+'&q_b='+b+'&q_c='+c +'&q_ab='+ab+'&q_ac='+ac+'&q_bc='+bc+'&q_abc='+abc); return false; } </script> <form action="#" id="compare_form" onsubmit="return generate_venn()"> A: <input type="text" name="a" id="a" value="health"/> B: <input type="text" name="b" id="b" value="weather"/> C: <input type="text" name="c" id="c" value="ozone"/> <input type="submit"/> </form> <div id="venn"></div>
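The compare page hand-concatenates seven `facet.query` parameters plus seven echo parameters (`q_a` through `q_abc`) that `venn.vm` reads back through `$params`. As a hedged alternative, not the author's code, the same parameter set can be built with `URLSearchParams` so that each query is encoded exactly once. `vennParams` is a hypothetical helper, and it writes `" AND "` where the slide uses the URL-encoded `+AND+`.

```javascript
// Build the /solr/select parameters for the Venn comparison.
function vennParams(a, b, c) {
  const and = (...qs) => qs.map((q) => `(${q})`).join(" AND ");
  const queries = {
    a: a,
    b: b,
    c: c,
    intersect_ab: and(a, b),
    intersect_ac: and(a, c),
    intersect_bc: and(b, c),
    intersect_abc: and(a, b, c),
  };
  const params = new URLSearchParams({
    q: "*:*",
    wt: "velocity",
    "v.template": "venn",
    rows: "0",
    facet: "on",
  });
  // One facet.query per region, labeled with the {!key=...} local param.
  for (const [key, q] of Object.entries(queries)) {
    params.append("facet.query", `{!key=${key}}${q}`);
  }
  // Echo the raw queries as q_a ... q_abc for the template's links.
  for (const [key, q] of Object.entries(queries)) {
    params.append("q_" + key.replace("intersect_", ""), q);
  }
  return params;
}

const params = vennParams("health", "weather", "ozone");
console.log("/solr/select?" + params.toString());
```

The resulting string could be handed to jQuery's `load` in place of the concatenated URL.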
  • 51. Venn chart venn.vm #set($values = $response.response.facet_counts.facet_queries) #set($params = $response.responseHeader.params) <img src="http://chart.apis.google.com/chart?chs=600x400&cht=v&chd=t:$values.a,$values.b,$values.c,$values.intersect_ab,$values.intersect_ac,$values.intersect_bc,$values.intersect_abc&chdl=$esc.url($params.q_a)|$esc.url($params.q_b)|$esc.url($params.q_c)"/> <ul> <li>A: <a href="/solr/data.gov?q={!lucene}$params.q_a">$params.q_a</a> ($values.a)</li> <li>B: <a href="/solr/data.gov?q={!lucene}$params.q_b">$params.q_b</a> ($values.b)</li> <li>C: <a href="/solr/data.gov?q={!lucene}$params.q_c">$params.q_c</a> ($values.c)</li> <li>A&B: <a href="/solr/data.gov?q={!lucene}$params.q_ab">$params.q_ab</a> ($values.intersect_ab)</li> <li>A&C: <a href="/solr/data.gov?q={!lucene}$params.q_ac">$params.q_ac</a> ($values.intersect_ac)</li> <li>B&C: <a href="/solr/data.gov?q={!lucene}$params.q_bc">$params.q_bc</a> ($values.intersect_bc)</li> <li>A&B&C: <a href="/solr/data.gov?q={!lucene}$params.q_abc">$params.q_abc</a> ($values.intersect_abc)</li> </ul>
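The facet queries return overlapping counts: `a` includes every document also counted in `intersect_ab`, and so on. If the page should also list the size of each disjoint region (only A, A and B but not C, and so on), inclusion-exclusion recovers them from the seven counts. A minimal sketch with made-up numbers; `vennRegions` is a hypothetical helper.

```javascript
// Disjoint Venn regions from the seven facet.query counts.
function vennRegions({ a, b, c, intersect_ab: ab, intersect_ac: ac, intersect_bc: bc, intersect_abc: abc }) {
  return {
    onlyA: a - ab - ac + abc,
    onlyB: b - ab - bc + abc,
    onlyC: c - ac - bc + abc,
    onlyAB: ab - abc, // in A and B, not C
    onlyAC: ac - abc,
    onlyBC: bc - abc,
    all: abc,
  };
}

// Hypothetical counts, shaped like $values in venn.vm.
const counts = { a: 100, b: 80, c: 60, intersect_ab: 30, intersect_ac: 20, intersect_bc: 10, intersect_abc: 5 };
const regions = vennRegions(counts);
console.log(regions.onlyA, regions.onlyB, regions.onlyC); // 55 45 35
```

The seven regions sum to the size of the union, a + b + c - ab - ac - bc + abc, which is a handy sanity check.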
  • 52. Solritas • Pronounced: so-LAIR-uh-toss • Celeritas is a Latin word, translated as "swiftness" or "speed". It is often given as the origin of the symbol c, the universal notation for the speed of light - http://en.wikipedia.org/wiki/Celeritas • VelocityResponseWriter - simply passes the Solr response through the Apache Velocity templating engine • http://wiki.apache.org/solr/VelocityResponseWriter
  • 53. Solr Flare • Ruby on Rails plugin • facet field detection, autosuggest, saved search, inverted facets, pie charts, Simile Timeline and Exhibit integration • Useful for rapid prototyping • See Flare's big brother, Blacklight, for production quality
  • 55. • UVA radiation = blacklight • libraries are much more than books • opinionated • Ruby on Rails: best choice for an extensible user interface development framework
  • 59. Prototyping Tools • CSV update handler - /update/csv • Schema Browser • Solritas, Flare, Blacklight, or... • just HTML+JavaScript (wt=json)
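With `wt=json`, a plain HTML page can fetch results and facets directly. One detail that trips people up: by default Solr's JSON writer renders `facet_fields` as a flat `[value, count, value, count, ...]` array (the `json.nl=flat` setting). This sketch pairs those entries up; the response object is a hypothetical, hand-written sample rather than real output, and `facetPairs` is a hypothetical helper.

```javascript
// Turn a flat [value, count, value, count, ...] list into objects.
function facetPairs(flat) {
  const pairs = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    pairs.push({ value: flat[i], count: flat[i + 1] });
  }
  return pairs;
}

// Hypothetical wt=json response (rows=1, so only one doc is returned).
const sample = {
  response: { numFound: 3, start: 0, docs: [{ first: "Erik", last: "Hatcher" }] },
  facet_counts: { facet_fields: { country: ["USA", 2, "Germany", 1] } },
};
const countries = facetPairs(sample.facet_counts.facet_fields.country);
console.log(sample.response.numFound, countries);
```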
  • 60. Test • Performance • Scalability • Relevance • Automate all of the above, start baselines early, avoid regressions
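Automating relevance checks can start very small. As one hypothetical example, not a prescribed method, this sketch computes precision@5 for a handful of saved queries against hand-judged relevant document ids and reports any query that falls below a recorded baseline. The query names, ids, and baseline numbers are made up; in practice the result ids would come from Solr responses.

```javascript
// Precision@k: fraction of the top k results found in the judged-relevant set.
function precisionAtK(resultIds, relevantIds, k) {
  if (k <= 0) return 0;
  const relevant = new Set(relevantIds);
  const hits = resultIds.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / k;
}

// Report every query whose precision@k dropped below its baseline.
function findRegressions(resultsByQuery, judgments, baselines, k = 5) {
  const regressions = [];
  for (const [query, ids] of Object.entries(resultsByQuery)) {
    const precision = precisionAtK(ids, judgments[query] || [], k);
    const baseline = baselines[query] !== undefined ? baselines[query] : 0;
    if (precision < baseline) regressions.push({ query, precision, baseline });
  }
  return regressions;
}

// Hypothetical result ids, judgments, and baselines.
const results = {
  weather: ["d1", "d2", "d3", "d4", "d5"],
  ozone: ["d9", "d7", "d8", "d6", "d5"],
};
const judgments = { weather: ["d1", "d3", "d5"], ozone: ["d7"] };
const baselines = { weather: 0.6, ozone: 0.4 };
console.log(findRegressions(results, judgments, baselines));
```

Here "weather" holds its baseline (3 of 5 relevant) while "ozone" is flagged (1 of 5 against a baseline of 0.4).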
  • 61. Then what? • Script the indexing process: full & delta • Work with real users on actual needs • Integrate with production systems • Iterate on schema enhancements, configuration tweaks such as caching • Deploy to staging/production environments and work at scale: collection size, real queries and performance, hardware and JVM settings
  • 63. For more information... • http://www.lucidimagination.com • LucidFind • search Lucene ecosystem: mailing lists, wikis, JIRA, etc. • http://search.lucidimagination.com • Getting started with LucidWorks Enterprise: • http://www.lucidimagination.com/products/lucidworks-search-platform/enterprise • http://lucene.apache.org/solr - wiki, e-mail lists