Enrico Risa
The Dynamic Duo
OrientDB & Lucene
Outline
❖ Apache Lucene in a nutshell
❖ OrientDB Indexing
❖ OrientDB-Lucene
  - Full Text Index
  - Spatial Index
❖ Roadmap 2.0
What Is Lucene?
❖ Free-text indexing library
❖ Implements standard IR/search functionality
  - Query models, ranking, indexing
❖ Written in Java
❖ Simple API
❖ Fast, mature and constantly evolving
❖ Many extension points
Who uses Lucene?
❖ Twitter
❖ LinkedIn
❖ Apple
❖ Solr
❖ Elasticsearch
❖ Neo4j
❖ and now OrientDB
Basic Lucene workflow
Documents
❖ Basic unit for indexing and searching
❖ Contains a list of Fields
❖ Schema-less
Fields
❖ Basic component of a Document
❖ Each Field has
  - name
  - value
  - stored (or not)
  - analyzed (or not)

Field Types & Options
❖ Types
  - Field
  - StringField
  - TextField
  - StoredField
  - IntField
  - … more
❖ Options
  - Stored or not
  - Indexed or not
  - Analyzed or not
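To make the three options concrete, here is a minimal plain-Java sketch. It is not the Lucene API: `FieldOptionsSketch`, its nested `Field` class and both helper methods are hypothetical names, and the analysis step is just lower-casing plus splitting on non-word characters. Roughly, an indexed but not analyzed field behaves like a StringField (the whole value is one term), an indexed and analyzed field like a TextField, and a stored-only field like a StoredField.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the stored / indexed / analyzed flags (NOT the Lucene API).
final class FieldOptionsSketch {

    /** Hypothetical stand-in for a field: a name, a value and three flags. */
    static final class Field {
        final String name;
        final String value;
        final boolean stored;
        final boolean indexed;
        final boolean analyzed;

        Field(String name, String value, boolean stored, boolean indexed, boolean analyzed) {
            this.name = name;
            this.value = value;
            this.stored = stored;
            this.indexed = indexed;
            this.analyzed = analyzed;
        }
    }

    /** Terms a search could match for this field. */
    static List<String> searchableTerms(Field f) {
        if (!f.indexed) {
            return Collections.emptyList();            // not indexed: never matched by a query
        }
        if (!f.analyzed) {
            return Collections.singletonList(f.value); // not analyzed: the whole value is one term
        }
        List<String> terms = new ArrayList<>();
        // Assumed toy analysis: lower-case, then split on non-word characters.
        for (String t : f.value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Value that can be returned with a hit, or null when the field is not stored. */
    static String retrievableValue(Field f) {
        return f.stored ? f.value : null;
    }
}
```

In this sketch a field can be searchable without being retrievable (indexed, not stored) and retrievable without being searchable (stored, not indexed), which is the distinction the options list draws.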



Directory
❖ RAMDirectory
  - RAM-based index
❖ FSDirectory
  - File-based index
❖ NIOFSDirectory
  - An FSDirectory implementation that uses the Java NIO API

Indexing Documents
Searching Index
Inverted Index
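An inverted index maps each term to the list of documents that contain it, so a term lookup does not need to scan every document. The following is a minimal plain-Java sketch of that idea, not Lucene's actual data structure: `InvertedIndexSketch` and its methods are hypothetical names, and the analysis step is assumed to be lower-casing plus splitting on non-word characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of document ids (NOT Lucene's implementation).
final class InvertedIndexSketch {

    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Assumed toy analyzer: lower-case and split on non-word characters. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexing: add the document id to the posting list of every term in the text. */
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Searching: return the ids of the documents containing the term, in ascending order. */
    List<Integer> search(String term) {
        SortedSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return Collections.<Integer>emptyList();
        }
        return new ArrayList<Integer>(ids);
    }
}
```

After adding document 1 with "OrientDB meets Lucene" and document 2 with "Lucene in a nutshell", `search("lucene")` returns both ids and `search("orientdb")` returns only the first.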
Luke: a graphical user interface
❖ Open a Lucene index
❖ Browse documents
❖ Run queries
❖ …
OrientDB Indexing
❖ SBTree (Unique, Not unique, Full Text, Dictionary)
❖ HashIndex (Unique, Not unique, Full Text, Dictionary)
❖ MVRB-Tree (deprecated since 1.6)
❖ Lucene (OrientDB-Lucene)
❖ … https://github.com/orientechnologies/orientdb/wiki/Custom-Index-Engine
OrientDB Lucene
❖ Open source at https://github.com/orientechnologies/orientdb-lucene
❖ This project aims to bring the power of Lucene indexes into OrientDB.
❖ Currently supports only spatial and full-text indexes
Installing OrientDB Lucene
❖ Embedded Mode
❖ Server Mode
  - Download a build of the plugin JAR and copy it into $ORIENTDB_HOME/plugins
Spatial Index
❖ No native implementation
❖ Built on top of the Lucene Spatial module
❖ Currently only points are supported
❖ NEAR and WITHIN queries
Lucene Spatial
❖ Spatial4j
  - Handles shapes (Point, Circle, Rectangle, Polygon)
  - Distance and area math utilities
  - Reads WKT format
❖ Provides indexing strategies
  - RecursivePrefixTree
❖ Spatial queries using shapes
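A prefix-tree strategy indexes a location as a grid cell whose identifier extends the identifier of the larger cell that contains it, so nearby points tend to share a prefix. One common cell scheme is the geohash. The sketch below is a plain-Java illustration of geohash encoding only; it is not the Lucene Spatial or Spatial4j code, and `GeohashSketch` and `encode` are hypothetical names.

```java
// Toy geohash encoder: illustrates prefix-sharing grid cells (NOT the Lucene Spatial code).
final class GeohashSketch {

    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair as a geohash of the given length. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90;
        double minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonTurn = true; // bits alternate between longitude and latitude, longitude first
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) {
                    value = (value << 1) | 1;
                    minLon = mid;          // keep the eastern half
                } else {
                    value = value << 1;
                    maxLon = mid;          // keep the western half
                }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) {
                    value = (value << 1) | 1;
                    minLat = mid;          // keep the northern half
                } else {
                    value = value << 1;
                    maxLat = mid;          // keep the southern half
                }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) {             // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

For example, `encode(57.64911, 10.40744, 5)` gives `u4pru`, and the 3-character hash of the same point, `u4p`, is a prefix of it. A prefix lookup on `u4p` therefore covers every point inside that larger cell.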
Creating a Spatial Index
❖ SQL



❖ JAVA
Spatial Operators
❖ NEAR
  - Find all points near a given location (latitude, longitude)
❖ WITHIN
  - Find all points within a given bounding box
Near Operator
❖ Custom operator that relies on the Lucene index
❖ Special syntax for spatial arguments ($spatial)
❖ Context variable $distance
❖ Result set sorted from nearest to farthest
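The behaviour described for NEAR (keep points within a maximum distance of a location, then order them from nearest to farthest) can be sketched in plain Java. This is an illustration only, not how OrientDB-Lucene evaluates the operator: `NearSketch`, `Place`, `distanceKm` and `near` are hypothetical names, the distance is a haversine great-circle distance on a spherical Earth, and kilometres are an assumed unit. The computed distance plays the role that `$distance` plays in a query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy NEAR: filter by maximum distance, then sort nearest first (NOT the OrientDB implementation).
final class NearSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius, spherical approximation

    /** Hypothetical record with a name and a (latitude, longitude) point. */
    static final class Place {
        final String name;
        final double lat;
        final double lon;

        Place(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Haversine great-circle distance in kilometres. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Places within maxDistanceKm of (lat, lon), sorted from nearest to farthest. */
    static List<Place> near(List<Place> places, double lat, double lon, double maxDistanceKm) {
        List<Place> hits = new ArrayList<>();
        for (Place p : places) {
            if (distanceKm(lat, lon, p.lat, p.lon) <= maxDistanceKm) {
                hits.add(p);
            }
        }
        hits.sort(Comparator.comparingDouble((Place p) -> distanceKm(lat, lon, p.lat, p.lon)));
        return hits;
    }
}
```

A brute-force scan like this is linear in the number of points; the purpose of the spatial index is to avoid visiting points that cannot match.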
Within Operator
❖ Bounding box search
❖ Currently matches only points within the box
❖ Result set not sorted
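A bounding-box test reduces to two range checks, one on latitude and one on longitude. The plain-Java sketch below illustrates that check and returns hits in input order, with no distance sorting. It is not the OrientDB-Lucene implementation: `WithinSketch`, `inBox` and `within` are hypothetical names, points are assumed to be `{latitude, longitude}` arrays, and the box is assumed not to cross the antimeridian.

```java
import java.util.ArrayList;
import java.util.List;

// Toy WITHIN: keep the points that fall inside a bounding box (NOT the OrientDB implementation).
final class WithinSketch {

    /** True when the point lies inside the box; assumes the box does not cross the antimeridian. */
    static boolean inBox(double lat, double lon,
                         double minLat, double minLon, double maxLat, double maxLon) {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }

    /** Points are double[]{latitude, longitude}; hits keep their input order (no sorting). */
    static List<double[]> within(List<double[]> points,
                                 double minLat, double minLon, double maxLat, double maxLon) {
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            if (inBox(p[0], p[1], minLat, minLon, maxLat, maxLon)) {
                hits.add(p);
            }
        }
        return hits;
    }
}
```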
Full Text Index
❖ Native full-text implementation
❖ Supports multiple fields
❖ Supports Lucene query syntax
❖ Lucene analyzers
Creating a Full Text Index
❖ SQL



❖ JAVA
Full Text Operators
❖ LUCENE
  [<fields>] LUCENE <exp>
  - Query your index using Query Parser syntax
  - Supports multiple fields
  - Target all fields (MultiFieldQueryParser)
  - Target a specific field (QueryParser)

Lucene Operator
❖ MultiFieldQueryParser
  - Target all fields
❖ QueryParser
  - Target a specific field
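The difference between the two targeting modes can be sketched in plain Java: a bare term is tried against every indexed field, while a `field:term` expression, as in Lucene query syntax, is checked against that field only. This is a simplified illustration rather than the Lucene parsers or the LUCENE operator; `FieldTargetingSketch` and its methods are hypothetical names, only single-term queries are handled, and matching is a case-insensitive whole-word comparison.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy field targeting: bare term = all fields, "field:term" = one field (NOT the Lucene parsers).
final class FieldTargetingSketch {

    /** Returns true when the single-term query matches the document. */
    static boolean matches(Map<String, String> doc, List<String> indexedFields, String query) {
        int colon = query.indexOf(':');
        if (colon > 0) {
            // "field:term" targets one specific field.
            String field = query.substring(0, colon);
            String term = query.substring(colon + 1);
            return indexedFields.contains(field) && containsTerm(doc.get(field), term);
        }
        // A bare term is tried against every indexed field.
        for (String field : indexedFields) {
            if (containsTerm(doc.get(field), query)) {
                return true;
            }
        }
        return false;
    }

    /** Case-insensitive whole-word match after splitting on non-word characters. */
    private static boolean containsTerm(String value, String term) {
        if (value == null) {
            return false;
        }
        String wanted = term.toLowerCase(Locale.ROOT);
        for (String t : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (t.equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

With a document whose `name` is "Rome" and whose `description` is "Capital of Italy", the bare query `italy` matches through `description`, while `name:italy` does not match.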
Indexing Performance
❖ Full Text
  - 9M records in ~300s with StandardAnalyzer and one field (about 30,000 records/s)
❖ Spatial
  - 9M records in ~500s with two fields (Point), about 18,000 records/s
Roadmap 2.0
❖ Production ready
❖ Lucene index monitoring
❖ More configuration
❖ GUI tool integrated in Studio
Roadmap 2.0 (Spatial Index)
❖ Index more shapes
❖ More operators (Intersect, …)
❖ Not only bounding boxes
❖ Support for GeoJSON (http://geojson.org)
Roadmap 2.0 (Full Text)
❖ Document & field boosting
❖ Score in result set
❖ Custom analyzers & filters
❖ Search engine
Thank You
Questions?
❖ Contact Me
  - Enrico Risa e.risa@orientechnologies.com
  - Twitter https://twitter.com/wolf4ood

