Apache Lucene
Full text search
Marcelo
What’s that?
 API dating from around 2000
 Maintained by the Apache Software Foundation
 Indexing
 Searching
 Available for Java, .NET, C++
Why is that so good?
 Enhance user experience
 More intelligent products
 Fast processing
 Relevance
 Efficient
 Suggestions
Indexing
 IndexWriter
1. Directory implementation
2. Analyzer
 Create documents
 Add these documents to the IndexWriter
 Optimize (merge segments)
 Close the writer
Indexing
Searching
 Directory
 IndexSearcher
 QueryParser
 Query(“my search”)
 TopDocs
Searching
How does that work?
 Inverted Index
 Term Normalization
1. Similar words (merge)
2. Stop words (remove)
3. Result: higher relevance, smaller index on disk
Term      Document Ids
And       1, 2, 3
Big       2, 4, 7
Fire      1
Keep      7, 8
keeper    3, 4
the       1, 8
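The term-to-document-ids table above can be sketched in a few lines of plain Java. This is only an illustration of the inverted index idea, not Lucene's actual data structures: the class name `InvertedIndexSketch`, the stop-word list, and the sample documents are all assumptions made for the example, and merging similar words (stemming) is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of an inverted index; not Lucene's implementation.
class InvertedIndexSketch {
    // Assumed stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS = Set.of("the", "and", "to", "a", "of");

    // term -> sorted ids of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Term normalization: lowercase, split on non-alphanumerics, drop stop words.
    static List<String> normalize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Indexing: record the document id under each normalized term.
    void add(int docId, String text) {
        for (String term : normalize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Searching: ids of documents that contain every query term (AND semantics).
    SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : normalize(query)) {
            SortedSet<Integer> ids =
                    postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "The big fire");
        index.add(2, "Big keeper");
        index.add(3, "Keep the fire big");
        System.out.println(index.search("big fire"));   // prints [1, 3]
        System.out.println(index.search("the keeper")); // prints [2]
    }
}
```

A lookup only touches the posting lists of the query terms, which is why search stays fast as the number of documents grows. Dropping stop words keeps very long, low-value lists such as the one for "the" out of the index.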
Analyzers
“@Andy52 went to school yesterday!”
 StandardAnalyzer (with English stop words)
[andy52] [went] [school] [yesterday]
 StopAnalyzer
[andy] [went] [school] [yesterday]
 SimpleAnalyzer
[andy] [went] [to] [school] [yesterday]
 WhitespaceAnalyzer
[@Andy52] [went] [to] [school] [yesterday!]
 KeywordAnalyzer
[@Andy52 went to school yesterday!]
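To make the differences between these analyzers concrete, here is a rough plain-Java approximation of four of the tokenization strategies. It does not use Lucene; the class `AnalyzerSketch`, its method names, and the stop-word list are hypothetical, and the real analyzers handle many more cases (Unicode rules, configurable stop sets, and so on).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical approximations of Lucene analyzers; not the real classes.
class AnalyzerSketch {
    // Assumed stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS = Set.of("to", "the", "a", "and");

    // WhitespaceAnalyzer-like: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // SimpleAnalyzer-like: split at anything that is not a letter, then lowercase.
    static List<String> simple(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // StopAnalyzer-like: the simple strategy plus stop-word removal.
    static List<String> stop(String text) {
        return simple(text).stream()
                .filter(token -> !STOP_WORDS.contains(token))
                .collect(Collectors.toList());
    }

    // KeywordAnalyzer-like: the whole input becomes a single token.
    static List<String> keyword(String text) {
        return List.of(text);
    }

    public static void main(String[] args) {
        String text = "@Andy52 went to school yesterday!";
        System.out.println(whitespace(text)); // [@Andy52, went, to, school, yesterday!]
        System.out.println(simple(text));     // [andy, went, to, school, yesterday]
        System.out.println(stop(text));       // [andy, went, school, yesterday]
        System.out.println(keyword(text));    // [@Andy52 went to school yesterday!]
    }
}
```

The same analyzer should normally be used at index time and at query time. Otherwise a query term such as "Andy" may never match the stored token "andy".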
Which well-known apps use it?
 Twitter
 LinkedIn
 MySpace
That’s all, thanks!
