Apache Lucene
Presenter – Anirudh Sharma
What we’ll cover
 What is Lucene?
 Basic Concept
 Indexing
 Documents
 Searching
 Code Examples
What is Lucene?
 Full-text search library in Java.
 Adds content to a full-text index.
 Allows you to run queries against this index and returns the matching results.
 Content can be from various sources, like an SQL/NoSQL database, a
filesystem, or even from websites.
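To make "adds content to a full-text index" and "run queries against this index" concrete, here is a minimal sketch of an inverted index in plain Java. It is a toy model for illustration only, not how Lucene stores or searches data internally; the class name `ToyInvertedIndex` and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of a full-text index (illustrative only, not Lucene's internals).
class ToyInvertedIndex {
    // term -> ids of the documents that contain it, kept in ascending order
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Index a document: split on anything that is not a letter or digit,
    // lowercase each piece, and record that the term occurs in docId.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Query the index: return the ids of all documents containing the term.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Lucene is a search library");
        index.add(2, "A library written in Java");
        System.out.println(index.search("library")); // prints [1, 2]
        System.out.println(index.search("lucene"));  // prints [1]
    }
}
```

The point of the structure is that a query looks up a term directly instead of scanning every document, which is the idea a full-text index is built around.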
Basic Concept
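[Diagram: two flows that meet at the index]
 Indexing: Raw content → Acquire content → Build document → Analyze document → Index document → Index
 Searching: Users → Search UI → Build query → Run query (against the Index) → Render results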
Core Indexing Classes
 IndexWriter
 Central component that allows you to create a new index, open an existing one,
and add, remove, or update documents in an index
 Built on an IndexWriterConfig and a Directory
 Directory
 Abstract class that represents the location of an index
 Analyzer
 Extracts tokens from a text stream
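As a rough picture of what "extracts tokens from a text stream" means, the sketch below splits text, lowercases it, and drops a few stop words. It is plain Java and only an approximation: Lucene's analyzers are more sophisticated, and the class `ToyAnalyzer` and its stop-word list are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analyzer (illustrative only): tokenize, lowercase, remove stop words.
class ToyAnalyzer {
    // Hypothetical stop-word list chosen just for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick-Brown fox is 3 years old."));
        // prints [quick, brown, fox, 3, years, old]
    }
}
```

The same analysis should be applied at index time and at query time, otherwise query terms may not match the indexed terms.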
Creating an IndexWriter
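import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
...
private IndexWriter writer;

public Indexer(String dir) throws IOException {
    // Open the on-disk index location; FSDirectory.open takes a Path in Lucene 5+
    Directory indexDir = FSDirectory.open(Paths.get(dir));
    // The analyzer turns field text into tokens at index time
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig cfg = new IndexWriterConfig(analyzer);
    // CREATE replaces any index that already exists in this directory
    cfg.setOpenMode(OpenMode.CREATE);
    writer = new IndexWriter(indexDir, cfg);
}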
Core indexing classes (contd.)
 Document
 A Document is the unit of search and index
 Represents a collection of named Fields. Text in these Fields is indexed.
 Field
 A Document consists of one or more Fields.
 A Field is simply a name-value pair. For example, a Field commonly found in
applications is title. In the case of a title Field, the field name is title and the
value is the title of that content item.
Creating a Document
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
...
protected Document getDocument(Integer id, String firstName, String lastName,
        String website) throws Exception {
    Document document = new Document();
    // StringField: the value is indexed as one exact token (not analyzed)
    document.add(new StringField("id", id.toString(), Field.Store.YES));
    // TextField: the value is analyzed into tokens for full-text search
    document.add(new TextField("firstName", firstName, Field.Store.YES));
    document.add(new TextField("lastName", lastName, Field.Store.YES));
    document.add(new TextField("website", website, Field.Store.YES));
    return document;
}
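The snippet above uses StringField for the id and TextField for the other values. In Lucene, a StringField value is indexed verbatim as a single term, while a TextField value is passed through the analyzer and split into terms. The plain-Java sketch below models that difference; the class `FieldTermsSketch` is hypothetical, and the lowercase, whitespace-splitting "analyzer" is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model of which terms end up in the index for each field type.
class FieldTermsSketch {
    // Models a StringField: the whole value becomes one term, unchanged.
    static List<String> asStringField(String value) {
        return Collections.singletonList(value);
    }

    // Models a TextField with a very simple analyzer (an assumption):
    // split on whitespace and lowercase each piece.
    static List<String> asTextField(String value) {
        List<String> terms = new ArrayList<>();
        for (String piece : value.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(asStringField("Anirudh Sharma")); // prints [Anirudh Sharma]
        System.out.println(asTextField("Anirudh Sharma"));   // prints [anirudh, sharma]
    }
}
```

This is why identifiers that must match exactly are usually stored in a StringField, and free text that users search by word goes in a TextField.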
Core Searching Classes
 IndexSearcher
 Central class that exposes several search methods on an index
 Accessed via an IndexReader
 Query
 Abstract query class. Concrete subclasses represent specific types of queries,
e.g., matching terms in fields, boolean queries, phrase queries etc.
 QueryParser
 Parses a textual representation of a query into a Query instance
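To illustrate "parses a textual representation of a query", here is a small plain-Java sketch that handles only two cases: a bare word, which is looked up in a default field, and `field:word`. Lucene's classic query syntax supports these forms among many others; the class `ToyQueryParser` and its `Clause` type are hypothetical and far simpler than the real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy query parser (illustrative only): supports "word" and "field:word".
class ToyQueryParser {
    // One parsed clause: the field to search and the term to match.
    static final class Clause {
        final String field;
        final String term;

        Clause(String field, String term) {
            this.field = field;
            this.term = term;
        }

        @Override
        public String toString() {
            return field + ":" + term;
        }
    }

    private final String defaultField;

    ToyQueryParser(String defaultField) {
        this.defaultField = defaultField;
    }

    List<Clause> parse(String queryText) {
        List<Clause> clauses = new ArrayList<>();
        for (String part : queryText.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;
            }
            int colon = part.indexOf(':');
            if (colon > 0 && colon < part.length() - 1) {
                // Explicit field, e.g. lastName:Sharma
                clauses.add(new Clause(part.substring(0, colon),
                        part.substring(colon + 1).toLowerCase(Locale.ROOT)));
            } else {
                // No field given: fall back to the default field
                clauses.add(new Clause(defaultField, part.toLowerCase(Locale.ROOT)));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        ToyQueryParser qp = new ToyQueryParser("firstName");
        System.out.println(qp.parse("Anirudh lastName:Sharma"));
        // prints [firstName:anirudh, lastName:sharma]
    }
}
```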
IndexSearcher
[Diagram: a Query is passed to the IndexSearcher, which reads the index through an IndexReader opened on a Directory and returns TopDocs]
Creating an IndexSearcher
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
...
private static IndexSearcher createSearcher(String indexDir) throws IOException {
    // Point at the directory that holds the index
    Directory directory = FSDirectory.open(Paths.get(indexDir));
    // The reader gives read access to the index contents
    IndexReader reader = DirectoryReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);
    return searcher;
}
Query and Query Parser
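import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
private static TopDocs search(String keyword, IndexSearcher searcher)
        throws Exception {
    // "firstName" is the default field searched when the query text names no field
    QueryParser qp = new QueryParser("firstName", new StandardAnalyzer());
    Query keywordQuery = qp.parse(keyword);
    // Return at most the 10 best-scoring matches
    TopDocs hits = searcher.search(keywordQuery, 10);
    return hits;
}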
Core searching classes (contd.)
 TopDocs
 Contains references to the top documents returned by a search
 ScoreDoc
 Represents a single search result
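A search keeps only the best-scoring matches, and each result pairs a document id with a score (Lucene's ScoreDoc exposes these as the fields `doc` and `score`). The plain-Java sketch below shows the idea of selecting the top n hits by score; the class `ToyTopDocs` and its `Hit` type are hypothetical, and sorting the full list is a simplification rather than Lucene's actual collection strategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of "top documents": keep the n highest-scoring hits.
class ToyTopDocs {
    // One search result: a document id and its relevance score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    static List<Hit> top(List<Hit> allHits, int n) {
        List<Hit> sorted = new ArrayList<>(allHits);
        // Highest score first
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(7, 0.4f), new Hit(2, 1.3f), new Hit(5, 0.9f));
        for (Hit h : top(hits, 2)) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
        // prints doc=2 score=1.3, then doc=5 score=0.9
    }
}
```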
TopDocs and ScoreDoc
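import org.apache.lucene.document.Document;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
...
// searchById is a helper that is not shown on these slides
TopDocs foundDocs = searchById(1, searcher);
System.out.println("Total results: " + foundDocs.totalHits);
for (ScoreDoc doc : foundDocs.scoreDocs) {
    // doc.doc is the internal document id; load that document's stored fields
    Document d = searcher.doc(doc.doc);
    System.out.println(d.get("firstName"));
}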
Resources
 Complete code used in this session - https://github.com/ani03sha/lucene-starter
 Further reading - https://lucene.apache.org/core/quickstart.html
Thank you
