Building an open-source based search solution –
                  first steps

                      Roman Kern

             Institute of Knowledge Management
                Graz University of Technology
                       Know-Center Graz
            rkern@tugraz.at, rkern@know-center.at


          Data Science Meetup / 2012-04-12
Overview




  Motivation


  Background


  Solr Ecosystem


  Solr Features


  Conclusions




Motivation




   Search
       Change in users' expectations
       Missing or sub-optimal search causes frustration

   Science
       Information retrieval
       Success story
       Mostly focused on web search

   Industry
       Enterprise search
       Heterogeneous data sources

Background of the Speaker




http://a1.net




                            http://wissen.de

Apache Lucene Umbrella Project




   Components
       Search engine ⇒ Lucene
       Search server ⇒ Solr
       Web search engine ⇒ Nutch
       Lightweight crawler ⇒ Droids
       File-format parsing ⇒ Tika
       Communicate with CMS ⇒ ManifoldCF
       Distributed coordination ⇒ ZooKeeper
       Natural language processing ⇒ OpenNLP
       Related projects: Hadoop, Mahout, Carrot2, ...

   Common aspects
   Apache license, implemented in Java, community

Lucene




  Search Engine Library
         Java API
             Only for expert users
         Search-Index
             File-system
             In-memory index
         Advanced features
             Incremental indexing
             Update while searching
         Base for many projects
             Solr
             ir-lib
             elasticsearch
         LIA (Lucene in Action)


  http://lucene.apache.org/core/

Nutch




  Web search engine
        Builds upon Solr
        Web crawler
            Link database, crawl database
        Distributed
            Runs on Hadoop
        Mode of operation
            Crawl a single domain
            Crawl the web with seed sites



  http://nutch.apache.org/




Droids




   Crawler component
         Lightweight crawler
         Main features
             Throttling
             Multi-threaded
             Well behaved (robots.txt)



   http://incubator.apache.org/droids/




Tika




   Text extraction
        Text & meta-data
        File-formats
             Office
                  Microsoft Formats (Apache POI)
                  OpenDocument
             Common text formats
                  PDF (PDFBox)
                  HTML (tagsoup)
             Non-text
                  Images
                  Sound



   http://tika.apache.org/


ManifoldCF




  Content Management System Connectors
       Communicate with CMS/DMS
       Connectors
            FileNet P8 (IBM)
            Documentum (EMC)
            LiveLink (OpenText)
            Meridio (Autonomy)
            Windows shares (Microsoft)
            SharePoint (Microsoft)
            More: Alfresco, JDBC, ...
       Data is then stored and indexed
            e.g. Solr



  http://incubator.apache.org/connectors/


ZooKeeper




  Distributed coordination
       Orchestrate servers
       Distributed
            Configuration
            Name lookup
            Synchronization




  http://zookeeper.apache.org/

OpenNLP




  Natural language processing
       Process plain text
       Maximum entropy classification with beam search
       Models
            Sentence splitting
            Token splitting
            Part-of-speech (POS) tagging
            Named entity recognition
            more: chunker, parser, co-reference resolution



  http://opennlp.sourceforge.net/




Hadoop




  Distributed computing
       Scale out framework
       Distributed file-system
            Data is partitioned
            Stored on multiple nodes
       Map/Reduce paradigm
            Map your algorithms to mappers & reducers
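
The Map/Reduce paradigm above can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop code: the function names (mapper, shuffle, reducer, word_count) are made up for illustration, and a real framework would run the map and reduce steps on many nodes.

```python
from collections import defaultdict
from itertools import chain


def mapper(document):
    """Map step: emit a (term, 1) pair for every whitespace-separated token."""
    return [(token.lower(), 1) for token in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: sum the counts collected for one term."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of strings."""
    pairs = chain.from_iterable(mapper(doc) for doc in documents)
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["Solr builds on Lucene", "Nutch builds on Solr"])
```

Because each mapper call only sees one document and each reducer call only sees one key, both steps can be distributed without shared state.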



  Related projects: HBase, Pig, Hive, ...


  http://hadoop.apache.org/



Mahout




  Distributed machine learning
       Scale out framework
       Machine learning
            Recommender systems
            Clustering
            Classification
       Integration
            Standalone
            Hadoop
            Amazon EC2



  http://mahout.apache.org/



Details




Search Server




   What Solr is
       Web-Service
       Full-text indexing & search
       Support to store arbitrary content

   What Solr isn’t
       Solr ≠ grep
       Database
            But somewhat similar to NoSQL databases

   Solr vs. IR-Lib
       Solr: easy to use, easy to integrate, XML configuration
       IR-Lib: expert knowledge to use, Java configuration, fast

Index Structure




   Inverted Index
       Dictionary of words (terms)
       Map from term to document

   Document
       List of fields
       Input fields are then mapped according to the schema

   Field-types
       Defined in the schema
       Type (string, boolean, date, number) - internally mapped to string
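
The inverted index described above can be sketched in a few lines of Python. This is a toy illustration, not how Lucene stores its index: the function name build_inverted_index and the sample documents are assumptions, and tokenization is reduced to lower-casing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(documents, field):
    """Map each term of one field to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc in documents:
        for term in doc[field].lower().split():
            postings[term].add(doc["id"])
    return {term: sorted(ids) for term, ids in postings.items()}


# Each document is a list of fields; here modelled as a dict with an "id" field.
docs = [
    {"id": "1", "title": "Lucene in Action"},
    {"id": "2", "title": "Solr in Action"},
]
index = build_inverted_index(docs, "title")
```

A lookup such as index["solr"] then returns the matching document ids directly, without scanning the documents.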


Index Management




  API
        HTTP Server
        Various formats (XML, binary, JavaScript, ...)

  Document life-cycle
        There is no update
        Delete (done automatically by Solr)
        Insert
        Implications
            A unique id is necessary
            Use batch updates
        Commit, rollback (and optimize)
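
The life-cycle above (no in-place update, replacement keyed by a unique id, changes visible only after a commit) can be modelled with a small in-memory class. ToyIndex and its methods are hypothetical names for illustration only and are not Solr's API.

```python
class ToyIndex:
    """Hypothetical stand-in for a search index; it only models the life-cycle."""

    def __init__(self):
        self.committed = {}  # documents visible to searches, keyed by unique id
        self.pending = {}    # staged documents, not yet visible

    def add(self, doc):
        # There is no update: adding a document with an existing id
        # replaces the old version (delete followed by insert).
        self.pending[doc["id"]] = doc

    def add_batch(self, docs):
        # Batch updates stage many documents before a single commit.
        for doc in docs:
            self.add(doc)

    def commit(self):
        # Make all staged documents visible at once.
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        # Discard everything staged since the last commit.
        self.pending.clear()


index = ToyIndex()
index.add_batch([{"id": "1", "title": "draft"}, {"id": "2", "title": "other"}])
index.commit()

index.add({"id": "1", "title": "final"})
title_before_commit = index.committed["1"]["title"]
index.commit()
title_after_commit = index.committed["1"]["title"]
```

Without a unique id the index could not tell which old version a re-added document is meant to replace, which is why the id is listed as a requirement.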


Input Handling




   Different input formats
       XML
       CSV
       JDBC (database)
            DIH (data import handler)
            Support incremental updates (via timestamps)
       Solr Cell
            Binary content
            Apache Tika
            Text content and metadata




Text Processing




   Scope
       During indexing & query

   Tokenization
       Split text into tokens
       Lower-casing
       Stemming (e.g. ponies, pony ⇒ poni, triplicate ⇒ triplic, ...)
       Synonyms (via Thesaurus)
       Stop-word filtering
       Multi-word splitting (e.g. Wi-Fi ⇒ Wi, Fi)
       n-grams, soundex, umlauts
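
A few of the steps above (tokenization, multi-word splitting, lower-casing, stop-word filtering) can be chained as in the following sketch. It is plain Python rather than a Solr analyzer configuration; the analyze function and the stop-word list are assumptions chosen for illustration, and stemming and synonyms are left out.

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "of"}  # illustrative list only


def analyze(text):
    """Tokenize on whitespace, split multi-word tokens, lower-case, drop stop words."""
    terms = []
    for raw in text.split():
        # Split on non-word characters and underscores, so "Wi-Fi" yields "Wi" and "Fi".
        for part in re.split(r"[\W_]+", raw):
            if not part:
                continue
            term = part.lower()
            if term not in STOP_WORDS:
                terms.append(term)
    return terms


terms = analyze("The Wi-Fi router is fast")
```

Applying the same chain during indexing and at query time is what lets a query for "wi fi" match a document containing "Wi-Fi".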


Query Processing




   Query parsers
        Lucene query parser (rich syntax)
              AND, OR, NOT, range queries, wildcards, fuzzy query, phrase query
              Boosting of individual parts
              Example: ((boltzmann OR schroedinger) NOT einstein)
        Dismax query parser
              No query syntax
              Searches over multiple fields (separate boost for each field)
              Configure the number of terms that must match
              Distance between terms is used for ranking (phrase boosting)


   Dismax is a good starting point, but may become expensive
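
A Dismax request is an ordinary HTTP query string. The sketch below only builds the URL and sends nothing; defType, q, qf, mm and pf are Solr request parameters, while the host, the path and the field names (title, body) are assumptions that depend on the installation and schema.

```python
from urllib.parse import urlencode


def dismax_url(base_url, user_query):
    """Build a search URL that uses the Dismax query parser."""
    params = {
        "defType": "dismax",     # select the Dismax query parser
        "q": user_query,         # plain user input, no query syntax needed
        "qf": "title^2.0 body",  # fields to search, with a separate boost per field
        "mm": "2",               # minimum number of optional terms that must match
        "pf": "title",           # phrase boosting: reward terms that appear together
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local installation; adjust host and path as needed.
url = dismax_url("http://localhost:8983/solr/select", "boltzmann entropy")
```

Every field listed in qf and pf adds work per query, which is one way Dismax can become expensive.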




Search Features




   Query filter
       Additional query
       No impact on ranking
       Results are cached

   Boosting query
       Only in Dismax

   Query elevation
       Fix certain queries

   Request handler
       Pre-define clauses
       Invariants

Search Result




   Ranking
       Relevance
       Sort on field value (only single term per document)

   Available data & features
       Sequence of IDs & score
       Stored fields
       Snippets (plus highlighting)
       Facets
             Count the search hits
             Types: field value, dates, queries
             Sort, prefix, ...
             Could be used for term suggestion (a.k.a. query suggestion)
       Field collapsing (grouping)
       Spell checking (did-you-mean)
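
Field-value facets amount to counting the search hits per value of a field. The following sketch computes such counts in plain Python over a list of hits; the field_facet function, the "format" field and the sample hits are assumptions for illustration, and Solr computes facets on the server rather than in client code.

```python
from collections import Counter


def field_facet(hits, field, prefix=""):
    """Count hits per value of one field, optionally restricted to a value prefix."""
    counts = Counter(
        hit[field]
        for hit in hits
        if field in hit and hit[field].startswith(prefix)
    )
    # most_common() returns (value, count) pairs sorted by descending count.
    return counts.most_common()


hits = [
    {"id": "1", "format": "pdf"},
    {"id": "2", "format": "html"},
    {"id": "3", "format": "pdf"},
]
facet = field_facet(hits, "format")
prefixed = field_facet(hits, "format", prefix="p")
```

The prefix variant hints at the term-suggestion use: the characters typed so far act as the prefix, and the most frequent matching values become the suggestions.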

Additional Solr Features




   Query by Example
       More like this

   Stats
       Per field
       Min, max, sum, missing, ...

   Admin-GUI
       Webapp to troubleshoot queries
       Browse schema

   JMX
       Read properties & statistics
       Can be accessed remotely

Integration




   Deployment
       Within a web application server
       Embedded

   Monitor
       Log output

   Access
       Various language bindings
       Java, Ruby, JavaScript, PHP, ...



Multi-core




   Multiple indices
       Each index has its own configuration

   Operations
       Reload (when configuration has been changed)
       Rename
       Swap
       Merge
       Create, Status




Scale Solr




   Replication
       Master and slave nodes
       Replication
       Slaves poll master

   Dispatch search request
       Load balancer




Sharding Indexes




   Single index
       Index spread over multiple machines
       Search is done in parallel

   Mapping
       Application has to provide a deterministic mapping
       Document ⇒ index
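
One common way to obtain a deterministic document-to-index mapping is to hash the unique id and take the result modulo the number of shards. The sketch below assumes that approach; the shard_for function, the use of MD5 (as a stable hash only, not for security) and the sample ids are illustrative choices, not something the slides prescribe.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Return the shard number for a document id, stable across processes and machines."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer, then reduce to a shard number.
    return int.from_bytes(digest[:8], "big") % num_shards


assignments = {doc_id: shard_for(doc_id, 4) for doc_id in ["doc-1", "doc-2", "doc-3"]}
```

Python's built-in hash() is avoided here because it is randomized per process for strings, so it would not give the same mapping on every machine. Note that changing the number of shards changes the mapping, so existing documents would need to be re-indexed.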




Conclusions




   Ecosystem
          Vibrant community
          Corporate backing

   Solr
          Easy to get started
          Hard to optimize for specific requirements




The End




  Thank you!




DataScience Meeting II - Roman Kern - Building an open source based search solution - first steps