Samuel Folasayo
Building a Search App with Apache Solr and Python
A guide to building a full-text search app using FastAPI and Solr
Joe Nyirenda
Learning Objectives
● Grasp the basics of full-text search and its applications
● Set up Apache Solr and integrate it with Python using pysolr
● Index, add, and retrieve data effectively in Solr
● Manage index configuration
● Perform basic and advanced search queries with filters and sorting
● Troubleshoot common issues for optimized data retrieval
Introduction to Apache Solr
What is Solr?
A powerful open-source search platform
Built on Apache Lucene
Supports full-text search, faceted search, and analytics
Why Solr?
High performance
Scalable and extensible
Proven technology
eBay: Product search, auto-suggestions, and filtering for millions of listings
Netflix: Content search (movies, TV shows), personalized recommendations,
scalability
Adobe: Document, tutorial, and support content search with faceted filtering
Best Buy: Fast, relevant product search with customizable ranking
LinkedIn: People, jobs, and company search with advanced filters
Top Websites Using Apache Solr
Content Management Systems: Searching for articles or blogs by exact titles or
tags
E-commerce Platforms: Locating products by exact names or categories
Knowledge Bases: Retrieving FAQs or documentation by specific keywords or
phrases
Example Use Cases
The average UK salary for a Solr administrator is £65,000 per year!
Apache Solr Jobs
Scalable and extensible search functionality
Easy-to-integrate Python API for querying Solr
Customizable and user-friendly UI
Benefits of This Implementation
Prerequisites:
Java (JDK 8 or higher)
Apache Solr downloaded and installed
Steps:
Download and extract Solr from the official Solr site.
Start Solr with bin/solr start.
Create a core (or collection) with bin/solr create -c <core_name>.
Setting Up Apache Solr
CPU: At least 2 CPU cores (more cores for high query volume and indexing)
Memory: Minimum 8GB RAM (16GB or more recommended for large datasets and
high traffic)
Disk Space: At least 10GB free disk space (more depending on index size)
Disk Type: SSDs recommended for faster indexing and query performance
Hardware Requirements:
Operating System: Linux, macOS, or Windows (Linux is the most common for
production).
Java: JDK 8 or higher.
Network: Solr instances require access to port 8983 (Solr's default) and may need
additional ports for replication or clustering.
System Requirements:
Considerations:
Data Volume: Estimate data to be indexed (affects memory & disk usage)
Query Load: Plan for expected queries per second (affects CPU & RAM)
Indexing Load: Allocate resources for frequent indexing without slowing down
queries
Redundancy & High Availability: Use SolrCloud for distributed search and
failover (requires more resources)
Best Practices:
Memory Allocation: Allocate ~50% of system RAM to Solr (max 64GB, but leave
room for other apps)
Disk I/O: Use SSD storage for better performance with large datasets
Replication: Set up SolrCloud replication for load distribution and improved
availability
Sizing an Apache Solr Instance:
Definition:
A Solr schema is a configuration file that defines the structure of the data Solr indexes and
searches
It describes the fields, their types, how data is indexed and stored, and how queries are handled
Key Components:
Fields: Specifies the data attributes (e.g., title, content, date)
Field Types: Defines the data type (e.g., string, text, integer, date) and how they are indexed
Analyzers: Determines how text fields are processed (e.g., tokenization, stemming)
Copy Fields: Allows combining multiple fields for efficient searching
What is a Solr Schema?
Prerequisites:
File: managed-schema.xml
Purpose: Defines field types and the structure of the Solr index
Define Fields in managed-schema.xml:
Setting Up Solr Schema
<field name="id" type="string" indexed="true" stored="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="content" type="text_general" indexed="true" stored="true"/>
indexed="true": The field will be searchable
stored="true": The field’s value will be stored and retrievable (useful for
returning field values in search results)
How to Create a Schema in the Solr Admin UI
Access the Schema Tab:
Open the Solr Admin UI (e.g., http://localhost:8983/solr), select the core you want to modify, and
click on the Schema tab.
Add Fields and Field Types:
Under the Fields section, click Add Field to define the field name, type, and attributes like indexing.
Optionally, add new Field Types under the Field Types section.
Apply Changes:
After making changes, go to the Core Admin tab and click Reload to apply your schema updates.
Setting Up Solr Schema
Key Field Types:
text_general:
Tokenized: Breaks text into individual
words/tokens
Supports text analysis: Useful for
full-text search (e.g., search by
keywords in a document)
Impact on Performance:
Pros: Optimized for search relevance and
flexibility
Cons: Requires more CPU and memory
for indexing and querying due to text
analysis and tokenization
Impact on Index Size:
Larger index size due to tokenization and
additional metadata for search analysis
Setting Up Solr Schema
Key Field Types:
string:
Exact Matches: No tokenization,
stores the entire string as a single
value.
Not tokenized: Ideal for fields that
require exact matches (e.g., IDs,
usernames).
Impact on Performance:
Pros: Faster for exact matches (e.g.,
filtering, sorting).
Cons: Less flexible for full-text search
operations.
Impact on Index Size:
Smaller index size compared to
text_general, as no additional
processing is required.
Impact of Field Types on Performance and
Index Size
Performance Impact:
text_general requires more
processing (tokenization, text
analysis), which can impact
indexing speed and query
response time
string offers faster exact
matching but is not useful for
full-text search
Index Size Impact:
text_general creates larger
indexes due to tokenization and
additional metadata
string typically results in
smaller index sizes since it
doesn’t require tokenization
Optimizes Search Performance
Ensures Accurate Data Representation
Customizes Indexing Behavior
Supports Complex Queries
Scalability and Flexibility
Benefits of Proper Solr Schema Configuration
Common Solr Field Types and Their Use Cases
Field Type | Purpose | Use Case
-----------|---------|---------
text_general | Tokenized, analyzed text | Full-text search (e.g., articles)
string | Exact match, no analysis | Identifiers (e.g., IDs, codes)
text_en | Tokenized, analyzed text with English-specific analysis | Full-text search for English-language content (e.g., blog posts, product descriptions)
int | Integer values, no analysis | Storing and querying numerical values like product prices, user ages, or ratings
date | Date and time values, no analysis | Filtering or sorting by date fields, such as creation or modification time
Deciding on Field Types
Application Use Case:
Use text_general for title and content to enable full-text search
Use string for id as an exact identifier
Numerical Fields: Use int for whole numbers and float for decimal values (e.g.,
prices, ratings)
Date Fields: Use date or pdate when you need to filter or range-query dates (e.g.,
transaction timestamps)
Booleans: Use boolean for binary flags (e.g., active users)
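The decisions above can be sketched as a managed-schema fragment. This is a hypothetical example: only title, content, and id come from this deck, and the price, rating, created_at, and is_active fields are illustrative. Note that recent Solr releases ship point-based numeric and date types (pint, pfloat, pdate) in place of the legacy int/float/date names.

```xml
<!-- Full-text fields: tokenized and analyzed for keyword search -->
<field name="title"   type="text_general" indexed="true" stored="true"/>
<field name="content" type="text_general" indexed="true" stored="true"/>

<!-- Exact-match identifier -->
<field name="id" type="string" indexed="true" stored="true"/>

<!-- Illustrative extras: numeric, date, and boolean fields -->
<field name="price"      type="pfloat"  indexed="true" stored="true"/>
<field name="rating"     type="pint"    indexed="true" stored="true"/>
<field name="created_at" type="pdate"   indexed="true" stored="true"/>
<field name="is_active"  type="boolean" indexed="true" stored="true"/>
```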
Install Dependencies: pip install "fastapi[standard]" pysolr
Why FastAPI?
This setup leverages FastAPI for fast, asynchronous web requests and PySolr to
integrate Solr with Python, optimizing performance and ease of use
Search App with FastAPI and PySolr
Connecting to Solr
Initialize Solr Client:
import pysolr
solr = pysolr.Solr('http://localhost:8983/solr/my_core', always_commit=True)
Root Endpoint:
Defines the root endpoint (/) with an HTML response using FastAPI
Returns HTML content when the root URL is accessed, typically for landing pages or
documentation.
Asynchronous function ensures efficient handling of multiple requests concurrently
Creating the API: Root Endpoint
@app.get("/", response_class=HTMLResponse)
async def read_root():
return """
<html>...</html>
"""
Handles user queries with search term (query) and pagination (page).
Uses Solr to fetch and return results.
Dynamically generates search results in HTML.
Essential for building interactive search features.
Search Endpoint
@app.get("/search", response_class=HTMLResponse)
async def search(query: str = Query(...), page: int = Query(1)):
# Solr query logic here
return results_html
Example Query:
Query Parameters in Solr
query_params = {
"q": f"title:{query} content:{query}",
"hl": "true",
"start": start,
"rows": results_per_page,
}
results = solr.search(**query_params)
query_params defines fields, search term, highlighting, and pagination.
solr.search(**query_params) fetches results from Solr.
Proper structure ensures effective search and pagination.
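As a runnable sketch, the pagination arithmetic and the parameter dict can be factored into a small helper. The function name build_query_params is hypothetical; the keys mirror the slide's example, and results_per_page defaults to an assumed 10.

```python
def build_query_params(query: str, page: int, results_per_page: int = 10) -> dict:
    """Build the Solr query parameters for a paginated, highlighted search."""
    start = (page - 1) * results_per_page  # Solr's start offset is zero-based
    return {
        "q": f"title:{query} content:{query}",  # search both fields
        "hl": "true",                           # enable highlighting
        "hl.fl": "title,content",               # fields to highlight
        "start": start,
        "rows": results_per_page,
    }

params = build_query_params("solr", page=2)
# With a live core this would then be passed as: results = solr.search(**params)
```

Keeping the arithmetic in one place makes it easy to test page offsets without a running Solr instance.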
Basic Search Form:
HTML UI for the Search Engine
<form action="/search" method="get">
<input type="text" name="query" placeholder="Search..."
required/>
<input type="submit" value="Search"/>
</form>
Enable Highlighting:
Highlighting Search Results
"hl": "true",
"hl.fl": "title,content",
Example Highlighted Result:
Highlighted text appears as <em>Highlighted Text</em> in the search results
to show where matches occur.
Highlighting improves the user experience by visually identifying search term
matches in results.
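pysolr exposes the highlighting payload on the result object (results.highlighting), a dict keyed by document id that maps field names to snippet lists. A minimal sketch of merging those snippets into a result listing, using a hand-built dict in place of a live response; the exact response shape should be verified against your Solr version:

```python
def render_snippets(highlighting: dict, doc_id: str, field: str = "content") -> str:
    """Join the highlighted snippets for one document, or return an empty string."""
    snippets = highlighting.get(doc_id, {}).get(field, [])
    return " … ".join(snippets)

# Simulated payload shaped like results.highlighting from pysolr
highlighting = {
    "doc1": {"content": ["match on <em>Solr</em> here", "and <em>Solr</em> again"]},
}

print(render_snippets(highlighting, "doc1"))
# match on <em>Solr</em> here … and <em>Solr</em> again
```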
Handle multiple pages of results:
prev_page = page - 1 if page > 1 else None (for previous page)
next_page = page + 1 if len(results.docs) == results_per_page else
None (for next page)
Pagination
prev_page = page - 1 if page > 1 else None
next_page = page + 1 if len(results.docs) == results_per_page else None
<a href="?page={prev_page}">Previous</a>
<a href="?page={next_page}">Next</a>
Navigation Links:
<a href="?page={prev_page}">Previous</a> (link to previous page)
<a href="?page={next_page}">Next</a> (link to next page)
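The page placeholders above have to be interpolated when the HTML is generated. A hedged f-string sketch (render_nav is a hypothetical helper, and the query parameter is carried along so the links stay on the same search):

```python
def render_nav(query: str, prev_page, next_page) -> str:
    """Render Previous/Next links, omitting whichever page does not exist."""
    parts = []
    if prev_page is not None:
        parts.append(f'<a href="?query={query}&page={prev_page}">Previous</a>')
    if next_page is not None:
        parts.append(f'<a href="?query={query}&page={next_page}">Next</a>')
    return " ".join(parts)

print(render_nav("solr", prev_page=1, next_page=3))
# <a href="?query=solr&page=1">Previous</a> <a href="?query=solr&page=3">Next</a>
```

On page 1 prev_page is None, so only the Next link is emitted, matching the conditionals on the previous slide.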
Benefits of Pagination
Improved Performance: Loads smaller, manageable chunks of data, reducing
server load and improving speed.
Better User Experience: Allows users to easily navigate through large datasets
without overwhelming them with too many results at once.
Scalability: Handles large amounts of data efficiently, making it easier to scale
applications without performance issues.
Clear Solr Index:
Data Cleanup Commands
curl "http://localhost:8983/solr/search_core/update?commit=true" -d
'<delete><query>*:*</query></delete>'
Deletes all documents in the specified Solr core (search_core) with the query *:*
commit=true ensures immediate changes
Benefits:
Clears outdated or irrelevant documents
Prevents index bloat, improving search performance
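The same cleanup can be issued from Python with the standard library instead of curl. This is a sketch: it assumes the search_core name and local address from the slide, and it only builds the request rather than sending it to a live server.

```python
import urllib.request

def build_clear_request(base_url: str, core: str) -> urllib.request.Request:
    """Build (but do not send) a delete-all update request for a Solr core."""
    url = f"{base_url}/solr/{core}/update?commit=true"
    body = b"<delete><query>*:*</query></delete>"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml"},  # Solr's XML update format
        method="POST",
    )

req = build_clear_request("http://localhost:8983", "search_core")
# To actually execute it against a running Solr: urllib.request.urlopen(req)
print(req.full_url)
```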
What is a Solr Core?
A Solr core is an independent instance that contains its own index, schema, and
configuration, allowing multiple cores to run on a Solr server, each managing separate
datasets.
Explanation:
Base URL: localhost:8983 (Solr server)
Endpoint: /select (used for fetching search
results)
Parameters:
indent=true: Formats the response for
readability
q.op=OR: Default operator for query
terms is "OR"
q=*%3A*: Retrieves all documents (query
*:* encoded)
useParams: Applies a predefined parameter set (from params.json); left empty here
Solr Query URL
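The URL on this slide can be assembled with the standard library, which also shows where the percent-encoding of *:* comes from. A sketch assuming the default local Solr address and the my_core name used earlier; note that urlencode escapes the asterisks too (%2A), whereas a browser typically escapes only the colon, giving the *%3A* form shown above. Solr accepts both.

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/my_core/select"
params = {"indent": "true", "q.op": "OR", "q": "*:*"}  # *:* matches all documents

url = f"{base}?{urlencode(params)}"
print(url)
# http://localhost:8983/solr/my_core/select?indent=true&q.op=OR&q=%2A%3A%2A
```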
Viewing Search Results
The screenshot shows the Solr search query result as displayed in the browser.
Conclusion
Full-text search is crucial for applications requiring precision and accuracy
Apache Solr offers a reliable platform for implementing robust, full-text search functionality
Integrating Solr with Python and FastAPI ensures flexibility, scalability, and ease of
development


Editor's Notes

  • #8: 1. Prerequisites:

Java (JDK 8 or higher): Apache Solr requires Java to run, so ensure the Java Development Kit (JDK) 8 or a later version is installed on your machine. Check your Java version with `java -version`; if you need to install it, download it from the official Oracle or OpenJDK websites.

Apache Solr downloaded and installed: Download the latest version of Solr from the official Apache Solr website (https://solr.apache.org/downloads.html) and extract the files to the directory on your system where you want to run Solr.

2. Steps for setup:

Download and extract Solr from the official Solr site: After downloading the Solr tarball or zip file, extract it to a folder on your machine. The Solr directory contains all the files and binaries you need to run and configure Solr. Example extraction command (Linux or macOS): tar xvf solr-<version>.tgz

Start Solr with bin/solr start: Solr includes a script that starts the Solr server. Navigate to the directory where Solr is extracted and run bin/solr start. This starts the Solr server on the default port (8983); once it is running, you can open the Admin UI at `http://localhost:8983/solr/` to interact with Solr and monitor your cores.

Create a core (or collection) with bin/solr create -c <core_name>: Solr organizes its data into "cores" or "collections." A core is a single index and can be thought of as a container for your data and schema. To create a new core, run bin/solr create -c <core_name>, replacing <core_name> with the name you want to assign; this creates a new core in the Solr instance, initialized with default configurations. Example: bin/solr create -c mycore

Once the core is created, you can interact with it through Solr's APIs or the Admin UI to upload documents, run searches, and manage your Solr instance. These steps set up a basic Solr instance on your local machine, which is the first step to using Solr for full-text search. The next steps involve configuring and indexing data, integrating it with Python (using `pysolr`), and running queries to test the setup.
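Once a core exists, the deck integrates it with Python via `pysolr`. A minimal sketch of indexing and querying, assuming Solr is running locally and a core named mycore was created as above; the sample documents and the `make_doc` helper are illustrative, not from the slides:

```python
def make_doc(doc_id, title, content):
    """Build a document dict in the shape Solr's JSON update handler expects."""
    return {"id": doc_id, "title": title, "content": content}


def index_and_search(core_url="http://localhost:8983/solr/mycore"):
    """Index two sample documents and run a match-all query.

    Requires a running Solr instance; pysolr (pip install pysolr) is
    imported lazily so make_doc stays usable without it.
    """
    import pysolr

    solr = pysolr.Solr(core_url, always_commit=True, timeout=10)
    solr.add([
        make_doc("1", "Intro to Solr", "Solr is built on Apache Lucene."),
        make_doc("2", "FastAPI basics", "FastAPI is a Python web framework."),
    ])
    results = solr.search("*:*")  # match-all query, as used later in the deck
    return results.hits           # numFound reported by Solr
```

Call `index_and_search()` only with Solr up at localhost:8983; `make_doc` can be reused wherever documents are prepared for indexing.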
  • #14: To create a schema in the Solr Admin UI, follow these steps:

Access the Schema tab: Open the Solr Admin UI (e.g., http://localhost:8983/solr), select the core you want to modify, and click the Schema tab in the top navigation bar.

Add fields: In the Fields section, click the Add Field button. Define the field's name, type, and other attributes (e.g., multi-valued, indexed), then click Add to save.

Add field types (optional): In the Field Types section, click Add Field Type. Define the type name, class, and any necessary settings, then click Add to save.

Reload the core: Go to the Core Admin tab and click Reload to apply the schema changes.

These steps allow you to modify and manage the schema using the Solr Admin UI.
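The same field additions can be scripted against Solr's Schema API instead of clicking through the Admin UI. A hedged sketch, assuming a local core named mycore; the field name and type below are illustrative:

```python
import json
from urllib import request


def add_field_command(name, field_type, stored=True, indexed=True,
                      multi_valued=False):
    """Build the JSON body for a Schema API add-field command."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
            "multiValued": multi_valued,
        }
    }


def post_schema_change(command, core_url="http://localhost:8983/solr/mycore"):
    """POST a schema command to a running Solr instance (requires Solr up)."""
    req = request.Request(
        core_url + "/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Example: `post_schema_change(add_field_command("title", "text_general"))` adds a `title` field, after which the core reflects the change without a manual reload of the managed schema.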
  • #25: "HTML UI for the Search Engine" — basic search form:

<form action="/search" method="get"> sends the search request to the /search endpoint.
<input type="text" name="query" placeholder="Search..." required/> allows users to input their query.
<input type="submit" value="Search"/> submits the form to trigger the search.

Together, these elements provide a simple and effective UI for users to search within the application.
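On the server side, the /search endpoint ultimately turns the form's `query` parameter into a request to Solr's select handler. A dependency-free sketch of that translation, assuming a local core named mycore (the helper name is illustrative):

```python
from urllib.parse import urlencode


def build_select_url(query, core_url="http://localhost:8983/solr/mycore",
                     start=0, rows=10):
    """Turn the form's `query` value into a Solr /select URL.

    urlencode escapes user input, so spaces and special characters in the
    submitted query are safe to pass through.
    """
    params = urlencode({"q": query, "start": start, "rows": rows, "wt": "json"})
    return f"{core_url}/select?{params}"
```

A /search handler (in FastAPI or any framework) would fetch this URL and render the returned documents back into the HTML page.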
  • #28: Why paginate search results?

Improved performance: loading smaller, manageable chunks of data reduces server load and improves speed.
Better user experience: users can easily navigate large datasets without being overwhelmed by too many results at once.
Scalability: large amounts of data are handled efficiently, making it easier to scale applications without performance issues.
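Pagination in Solr maps page numbers onto the start and rows query parameters. A small helper illustrating the arithmetic (the function name is illustrative):

```python
def page_params(page, per_page=10):
    """Translate a 1-based page number into Solr's start/rows parameters.

    start is the zero-based offset of the first result to return;
    rows is the number of results per page.
    """
    if page < 1:
        raise ValueError("page numbers are 1-based")
    return {"start": (page - 1) * per_page, "rows": per_page}
```

For example, page 3 with 20 results per page yields start=40 and rows=20, which can be merged into the select request's query string.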
  • #30: What is a Solr core?

A Solr core is a self-contained instance of Solr that holds its own index, schema, and configuration. Multiple cores can run on the same Solr server, each managing a separate dataset, which allows isolated configurations and indexing for different types of data.

Reading the example query and response:

The query "q": "*:*" retrieves all documents in the index using a wildcard search.
The result returns 21 documents (numFound: 21), with fields like id, title, and content; the documents cover topics such as Python, Django, FastAPI, and Solr.
"responseHeader" shows the status (0 = success), query time (QTime: 59 ms), and other search parameters.
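The response fields described above can be extracted programmatically. A sketch that summarises a Solr select response; the sample payload mirrors the slide's numbers (status 0, QTime 59, numFound 21) but its docs list is abbreviated and invented for illustration:

```python
def summarise_response(resp):
    """Pull status, query time, and hit count out of a Solr select response."""
    header = resp["responseHeader"]
    body = resp["response"]
    return {
        "ok": header["status"] == 0,      # 0 means success
        "qtime_ms": header["QTime"],      # server-side query time
        "num_found": body["numFound"],    # total matches, not just this page
        "titles": [doc.get("title") for doc in body["docs"]],
    }


sample = {
    "responseHeader": {"status": 0, "QTime": 59, "params": {"q": "*:*"}},
    "response": {
        "numFound": 21,
        "start": 0,
        "docs": [{"id": "1", "title": "Python"}, {"id": "2", "title": "Solr"}],
    },
}
```

Note that numFound reports the total match count across the index, while docs contains only the rows returned for the current page.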