BY
N. SUMANJALI
DPT OF LIS
PONDICHERRY UNIVERSITY
INFORMATION RETRIEVAL
• Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources.
• Searches can be based on metadata or on full-text (or other content-based) indexing.
• Goal: find the documents most relevant to a given query.
• It deals with the notions of:
– A collection of documents
– A query (the user's information need)
– Relevance
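A minimal sketch of full-text indexing, assuming a toy in-memory collection and a naive tokenizer that only lowercases and splits on whitespace (real systems also handle punctuation, stemming and stop words). Each term is mapped to the set of IDs of the documents that contain it. The collection, the document IDs and the helper names are illustrative, not taken from the slides.

```python
from collections import defaultdict

# Toy collection: document ID -> full text (illustrative data only).
collection = {
    1: "Boolean retrieval uses set operations",
    2: "Vector space retrieval ranks documents",
    3: "Language models rank documents by query likelihood",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of IDs of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


index = build_inverted_index(collection)
print(sorted(index["retrieval"]))  # [1, 2]
print(sorted(index["documents"]))  # [2, 3]
```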
MODEL
• A model is a construct designed to help us understand a complex system.
• It is a particular way of “looking at things”.
• Models inevitably make simplifying assumptions.
– What are the limitations of the model?
• Different types of models:
– Conceptual models
– Physical analog models
– Mathematical models
Retrieval Models
• A retrieval model specifies the details of:
– Document representation
– Query representation
– Retrieval function
• It determines a notion of relevance.
• The notion of relevance can be binary or continuous (i.e., ranked retrieval).
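A minimal sketch of the three components, assuming that both documents and queries are represented as sets of lowercase terms. `binary_match` gives binary relevance; `overlap_score` (the fraction of query terms a document contains) is a hypothetical stand-in for a continuous retrieval function, chosen only to show ranked output, not a model from the slides.

```python
def represent(text: str) -> frozenset[str]:
    """Representation (assumed): the set of lowercase whitespace-separated terms."""
    return frozenset(text.lower().split())


def binary_match(query: frozenset[str], doc: frozenset[str]) -> bool:
    """Binary relevance: match only if the document contains every query term."""
    return query <= doc


def overlap_score(query: frozenset[str], doc: frozenset[str]) -> float:
    """Continuous relevance (illustrative): fraction of query terms in the document."""
    return len(query & doc) / len(query) if query else 0.0


docs = {"d1": "cheap hotel in Rio", "d2": "hotel reviews", "d3": "Rio carnival"}
reps = {name: represent(text) for name, text in docs.items()}
q = represent("Rio hotel")

# Binary retrieval: an unordered set of matching documents.
matched = {name for name, rep in reps.items() if binary_match(q, rep)}
# Ranked retrieval: every document ordered by score (ties keep collection order).
ranked = sorted(reps, key=lambda name: overlap_score(q, reps[name]), reverse=True)
print(matched)  # {'d1'}
print(ranked)   # ['d1', 'd2', 'd3']
```

The same representations feed both retrieval functions; only the function decides whether relevance is a yes/no answer or a score.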
CLASSES OF RETRIEVAL MODELS
• Boolean models (set-theoretic)
– Extended Boolean
• Vector space models (statistical/algebraic)
– Generalized vector space
– Latent Semantic Indexing
• Probabilistic models
MODELS OF IR
• Boolean model
– Based on the notion of sets
– Documents are retrieved only if they satisfy the Boolean conditions specified in the query
– Does not impose a ranking on retrieved documents
– Exact match
• Vector space model
– Based on geometry: documents and queries are vectors in a high-dimensional space
– Documents are ranked by their similarity to the query (ranked retrieval)
– Best/partial match
• Language models
– Based on probabilities and on processes for generating text
– Documents are ranked by the probability that the document's language model generated the query
– Best/partial match
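The sketch below contrasts the two ranked models on the same toy collection. Assumptions: raw term-frequency vectors without IDF weighting for the vector space model, and a unigram query-likelihood model with Jelinek-Mercer smoothing (weight `lam = 0.5`, an arbitrary choice) for the language model. Unlike a Boolean AND, both give a score to d2 and d3, which contain only one of the two query terms (best/partial match).

```python
import math
from collections import Counter

docs = {
    "d1": "rio hotel rio beach",
    "d2": "hawaii hotel",
    "d3": "rio carnival music",
}
tf = {name: Counter(text.split()) for name, text in docs.items()}
q_tf = Counter("rio hotel".split())


def cosine(a: Counter, b: Counter) -> float:
    """Vector space model: cosine of the angle between raw term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Collection-wide counts, used to smooth each document's language model.
collection_tf = sum(tf.values(), Counter())
collection_len = sum(collection_tf.values())


def query_log_likelihood(doc: Counter, query: Counter, lam: float = 0.5) -> float:
    """Language model: log P(query | doc) for a unigram model with Jelinek-Mercer
    smoothing. Assumes every query term occurs somewhere in the collection,
    otherwise the probability is 0 and log() fails."""
    doc_len = sum(doc.values())
    score = 0.0
    for term, count in query.items():
        p = lam * doc[term] / doc_len + (1 - lam) * collection_tf[term] / collection_len
        score += count * math.log(p)
    return score


for name in docs:
    print(name, round(cosine(q_tf, tf[name]), 3), round(query_log_likelihood(tf[name], q_tf), 3))
```

On this data both scores order the documents d1, d2, d3, but the numbers come from different reasoning: an angle between vectors in one case, a probability of generating the query in the other.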
BOOLEAN MODEL
• Based on the symbolic logic devised by George Boole (1815-1864).
• Boole used three operators (+, ×, -) to combine statements in symbolic form.
• John Venn named these operators of Boolean logic the logical sum (+), the logical product (×), and the logical difference (-).
• IR systems allow users to express their queries with these operators, usually written as OR (logical sum), AND (logical product), and NOT (logical difference).
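If each index term is identified with the set of documents indexed under it, the three operators become ordinary set operations. A sketch with made-up posting sets for two hypothetical terms:

```python
# Hypothetical posting sets: IDs of the documents indexed under each term.
A = {1, 2, 3, 5}  # documents indexed under "library"
B = {2, 3, 4}     # documents indexed under "catalogue"

logical_sum = A | B         # A + B, OR: documents with either term
logical_product = A & B     # A × B, AND: documents with both terms
logical_difference = A - B  # A - B, NOT: documents with A but not B

print(sorted(logical_sum))         # [1, 2, 3, 4, 5]
print(sorted(logical_product))     # [2, 3]
print(sorted(logical_difference))  # [1, 5]
```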
BOOLEAN MODEL
• Each index term is either present in or absent from a document.
• Documents are either relevant or not relevant (no ranking).
• A document is represented as a set of keywords.
• Queries are Boolean expressions of keywords, connected by AND, OR, and NOT, with brackets to indicate scope:
– [[Rio & Brazil] | [Hilo & Hawaii]] & hotel & !Hilton
• Output: a document is either relevant or not; there are no partial matches and no ranking.
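A sketch of how a system might evaluate the example query against documents represented as keyword sets. The nested-tuple query format, the `evaluate` helper and the five documents are assumptions made for illustration; nesting plays the role of the brackets.

```python
from typing import Union

# A query is either a term or a tuple (operator, operand, ...).
Query = Union[str, tuple]


def evaluate(query: Query, doc: set[str]) -> bool:
    """Return True if the keyword set `doc` satisfies the Boolean query exactly."""
    if isinstance(query, str):
        return query in doc
    op, *args = query
    if op == "AND":
        return all(evaluate(arg, doc) for arg in args)
    if op == "OR":
        return any(evaluate(arg, doc) for arg in args)
    if op == "NOT":
        return not evaluate(args[0], doc)
    raise ValueError(f"unknown operator: {op}")


# [[Rio & Brazil] | [Hilo & Hawaii]] & hotel & !Hilton
query = ("AND",
         ("OR", ("AND", "rio", "brazil"), ("AND", "hilo", "hawaii")),
         "hotel",
         ("NOT", "hilton"))

# Hypothetical documents, each represented as a set of keywords.
docs = {
    1: {"rio", "brazil", "hotel", "beach"},
    2: {"rio", "brazil", "hotel", "hilton"},
    3: {"hilo", "hawaii", "hotel"},
    4: {"hilo", "hawaii", "volcano"},
    5: {"rio", "hotel"},
}
matches = sorted(d for d, terms in docs.items() if evaluate(query, terms))
print(matches)  # [1, 3]
```

Document 5 mentions Rio and a hotel but not Brazil, so it is rejected outright: there is no notion of a near miss.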
BOOLEAN RETRIEVAL MODEL
• A popular retrieval model because:
– It is easy to understand for simple queries.
– It has a clean formalism.
– It can be extended to include ranking.
– Reasonably efficient implementations are possible for normal queries.
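The slides do not say how an efficient implementation works. One common approach, assumed here, keeps each term's posting list sorted by document ID and answers AND with a linear merge, processing the shortest list first. The posting lists are hypothetical.

```python
def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """AND of two sorted posting lists by a linear merge: O(len(p1) + len(p2))."""
    i, j, result = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def intersect_all(postings: list[list[int]]) -> list[int]:
    """AND of several terms, shortest list first so intermediate results stay small."""
    if not postings:
        return []
    ordered = sorted(postings, key=len)
    result = ordered[0]
    for plist in ordered[1:]:
        if not result:  # nothing left to match: stop early
            break
        result = intersect(result, plist)
    return result


# Hypothetical posting lists (sorted document IDs).
index = {
    "hotel": [1, 3, 5, 8, 13],
    "rio": [3, 5, 21],
    "brazil": [1, 3, 5, 34],
}
print(intersect_all([index["rio"], index["brazil"], index["hotel"]]))  # [3, 5]
```

Each list is scanned at most once, and documents missing any term are discarded without ever being read, which is why AND-heavy queries are cheap.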
BOOLEAN MODEL
• Weights assigned to terms are either “0” or “1”:
– “0” represents absence: the term is not in the document
– “1” represents presence: the term is in the document
• Queries are built by combining terms with the Boolean operators AND, OR, and NOT.
• The system returns all documents that satisfy the query.
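The 0/1 weights form a term-document incidence matrix. A sketch, assuming a tiny hypothetical collection, that stores each term's row as a Python integer bit vector so that AND, OR and NOT become bitwise `&`, `|` and complement (masked to the number of documents):

```python
docs = [
    "boolean retrieval model",
    "vector space model",
    "boolean logic",
    "probabilistic retrieval",
]
vocabulary = sorted({term for text in docs for term in text.split()})

# Incidence matrix: 1 if the term is present in document i, else 0.
incidence = {t: [1 if t in text.split() else 0 for text in docs] for t in vocabulary}

ALL = (1 << len(docs)) - 1  # one bit set per document


def bits(term: str) -> int:
    """Pack a term's 0/1 row into an integer; bit i stands for document i."""
    row = incidence.get(term, [0] * len(docs))
    return sum(weight << i for i, weight in enumerate(row))


def doc_ids(mask: int) -> list[int]:
    """List the document indices whose bits are set in mask."""
    return [i for i in range(len(docs)) if (mask >> i) & 1]


print(incidence["model"])                                 # [1, 1, 0, 0]
print(doc_ids(bits("retrieval") & bits("model")))         # [0]
print(doc_ids(bits("model") & (ALL & ~bits("boolean"))))  # [1]
print(doc_ids(bits("retrieval") | bits("logic")))         # [0, 2, 3]
```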
[Figure: AND/OR/NOT operations illustrated on sets A, B and C]
Why Boolean Retrieval Works
• Boolean operators approximate natural language.
– Example need: find documents about a good party that is not over
• AND can discover relationships between concepts
– good party
• OR can discover alternate terminology
– excellent party, wild party, etc.
• NOT can exclude alternate meanings
– Democratic party
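A sketch of the slide's example as a Boolean query over a hypothetical five-document collection. The query terms (for instance the synonyms chosen for OR) are one possible reading of the need, not a prescribed formulation.

```python
docs = {
    1: "an excellent party at the beach",
    2: "the wild party is over",
    3: "a good democratic party candidate",
    4: "good music tonight",
    5: "wild party tonight",
}
terms = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
all_ids = set(docs)


def having(word: str) -> set[int]:
    """IDs of documents containing word."""
    return {doc_id for doc_id, words in terms.items() if word in words}


# "A good party that is not over", written as a Boolean query.
result = (
    having("party")                                            # AND: the core concept
    & (having("good") | having("excellent") | having("wild"))  # OR: alternate terminology
    & (all_ids - having("over"))                               # NOT: the party is not over
    & (all_ids - having("democratic"))                         # NOT: exclude the political sense
)
print(sorted(result))  # [1, 5]
```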
The Perfect Query Paradox
• Every information need has a perfect set of documents.
– If not, there would be no point in doing retrieval.
• Every document set has a perfect query:
– AND every word in a document to get a query for that document.
– Repeat for each document in the set.
– OR the document queries together to get the query for the set.
• But can users realistically be expected to formulate this perfect query?
– Boolean query formulation is hard!
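A sketch of the recipe above on a hypothetical three-document collection. One detail is worth checking: a document that contains all of a target document's words plus others also satisfies that document's AND clause. The `exclude_absent` option, an addition beyond the slide, therefore also ANDs NOT for every vocabulary term the target document lacks. Even then, two documents with identical term sets cannot be told apart.

```python
# A conjunct is (terms that must all be present, terms that must all be absent).
Conjunct = tuple[frozenset[str], frozenset[str]]


def perfect_query(target: set[int], docs: dict[int, set[str]], exclude_absent: bool) -> list[Conjunct]:
    """OR of one AND clause per target document, following the slide's recipe.
    If exclude_absent is True, each clause also ANDs NOT for every vocabulary
    term the document lacks (an addition beyond the slide)."""
    vocabulary = set().union(*docs.values())
    query = []
    for doc_id in sorted(target):
        required = frozenset(docs[doc_id])
        excluded = frozenset(vocabulary - docs[doc_id]) if exclude_absent else frozenset()
        query.append((required, excluded))
    return query


def retrieve(query: list[Conjunct], docs: dict[int, set[str]]) -> list[int]:
    """IDs of documents that satisfy at least one clause of the query."""
    return sorted(
        doc_id
        for doc_id, words in docs.items()
        if any(req <= words and not (exc & words) for req, exc in query)
    )


docs = {
    1: {"boolean", "model"},
    2: {"boolean", "model", "ranking"},
    3: {"vector", "model"},
}
target = {1, 3}
print(retrieve(perfect_query(target, docs, exclude_absent=False), docs))  # [1, 2, 3]
print(retrieve(perfect_query(target, docs, exclude_absent=True), docs))   # [1, 3]
```

Even for three short documents the exact query needs every vocabulary term, which is the paradox in miniature.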
Why Boolean Retrieval Fails
• Natural language is far more complex than Boolean logic
• AND “discovers” nonexistent relationships
– Terms in different sentences, paragraphs, …
• Guessing terminology for OR is hard
– good, nice, excellent, outstanding, awesome, …
• Guessing terms to exclude is even harder!
– Democratic party, party to a lawsuit, …
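A sketch of the first and last failure modes on three hypothetical documents. A document-level AND of "good" and "party" matches all three, although only one is about a good party. Requiring both words in one sentence is a hypothetical stricter check outside the pure Boolean model, and the sentence splitter is deliberately naive.

```python
import re

docs = {
    1: "We had a good time. The party was cancelled.",
    2: "It was a good party.",
    3: "The defendant is a party to a lawsuit. Good lawyers help.",
}


def words(text: str) -> set[str]:
    """Lowercase alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def sentences(text: str) -> list[str]:
    """Naive sentence split on . ! ? (an assumption; real splitters are smarter)."""
    return [s for s in re.split(r"[.!?]", text) if s.strip()]


pair = {"good", "party"}
# Document-level AND: both terms anywhere in the document.
doc_and = sorted(d for d, text in docs.items() if pair <= words(text))
# Stricter, hypothetical check: both terms inside one sentence.
same_sentence = sorted(d for d, text in docs.items() if any(pair <= words(s) for s in sentences(text)))
# Excluding "democratic" does not remove the "party to a lawsuit" sense.
not_democratic = [d for d in doc_and if "democratic" not in words(docs[d])]

print(doc_and)         # [1, 2, 3]
print(same_sentence)   # [2]
print(not_democratic)  # [1, 2, 3]
```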
BOOLEAN MODEL
• Strengths
– Precise, if you know the right strategies
– Precise, if you have an idea of what you're looking for
– Efficient for the computer
– Simple
• Weaknesses
– Users must learn Boolean logic
– Boolean logic is insufficient to capture the richness of language
– No control over the size of the result set: either too many documents or none
– When do you stop reading? All documents in the result set are considered “equally good”
– What about partial matches? Documents that “don't quite match” the query may also be useful
– No notion of ranking (exact matching only)
– All index terms have equal weight
PROBLEMS
• Very rigid: AND means all; OR means any.
• Difficult to express complex user requests.
• Difficult to control the number of documents retrieved.
– All matched documents will be returned.
• Difficult to rank output.
– All matched documents logically satisfy the query.
• Difficult to perform relevance feedback.
– If the user marks a document as relevant or irrelevant, how should the query be modified?
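A sketch of the rigidity on a hypothetical five-document collection: with three query terms, AND returns a single document and OR returns everything, with nothing in between. The per-document count of matched terms shows information the pure Boolean model discards; ranking by such a count would be an extension beyond it.

```python
docs = {
    1: {"information", "retrieval", "model"},
    2: {"information", "storage"},
    3: {"retrieval", "evaluation"},
    4: {"boolean", "model"},
    5: {"information", "retrieval", "boolean"},
}
query_terms = ["information", "retrieval", "model"]

and_hits = [d for d, words in docs.items() if all(t in words for t in query_terms)]
or_hits = [d for d, words in docs.items() if any(t in words for t in query_terms)]
# How many query terms each document contains; the pure Boolean model ignores this.
matched_terms = {d: sum(t in words for t in query_terms) for d, words in docs.items()}

print(and_hits)       # [1]
print(or_hits)        # [1, 2, 3, 4, 5]
print(matched_terms)  # {1: 3, 2: 1, 3: 1, 4: 1, 5: 2}
```

Document 5 contains two of the three terms, yet AND rejects it and OR treats it exactly like documents 2 to 4.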
ADVANTAGES & DISADVANTAGES
• Advantages
– Results are predictable and relatively easy to explain
– Many different features can be incorporated
– Efficient processing, since many documents can be eliminated from the search
• Disadvantages
– Effectiveness depends entirely on the user
– Simple queries usually don't work well
– Complex queries are difficult to formulate
LIMITATIONS
• The first limitation relates to the formulation of search statements.
– It has been noted that users are often unable to formulate an exact search statement by combining AND, OR, and NOT operators, especially when several query terms are involved.
– In such cases the search statement becomes either too narrow or too broad.
• The second limitation relates to the number of retrieved items.
– It has been noted that users cannot predict a priori how many items will be retrieved for a given query.
– If the search statement is broad, several hundred items may be retrieved, which makes it quite difficult to find the exact information required.
• The third limitation is that an item is judged relevant solely by whether a given query term is present or absent in a given record in the database.
Model of information retrieval (3)
