Information Retrieval : 1
Introduction to IR
Prof Neeraj Bhargava
Vaibhav Khanna
Department of Computer Science
School of Engineering and Systems Sciences
Maharshi Dayanand Saraswati University Ajmer
Learning objectives of IR Series
• Introduction: Motivation, basic concepts, past, present, and future, the retrieval process.
• Modeling: Introduction, a taxonomy of information retrieval models, retrieval: ad hoc and filtering,
a formal characterization of IR models, classic information retrieval, alternative set theoretic
models, alternative algebraic models, alternative probabilistic models, structured text retrieval
models, models for browsing.
• Retrieval Evaluation: Introduction, retrieval performance evaluation, reference collections.
• Query Languages: Introduction, keyword-based querying, pattern matching, structural queries, query
protocols.
• Query Operations: Introduction, user relevance feedback, automatic local analysis, automatic global
analysis.
• Text and Multimedia Languages and Properties: Introduction, metadata, text, markup languages.
• Indexing and Searching: Introduction; inverted files; other indices for text; Boolean queries;
sequential searching; pattern matching; structural queries; compression.
• Searching the Web: Introduction, challenges, characterizing the web, search engines, browsing,
meta searchers, finding the needle in the haystack, searching using hyperlinks.
Architecture of the IR System
Information Retrieval (IR)
• IR deals with the representation, storage,
organization of, and access to information items
• Types of information items: documents, Web
pages, online catalogs, structured records,
multimedia objects
• Early goals of the IR area: indexing text and
searching for useful documents in a collection
• Nowadays, research in IR includes:
– Modeling, Web search, text classification, systems
architecture, user interfaces, data visualization,
filtering and languages.
Early Developments
• For more than 5,000 years, man has organized
information for later retrieval and searching
• This has been done by compiling, storing, organizing,
and indexing papyrus, hieroglyphics, and books
• For holding the various items, special-purpose buildings
called libraries, or bibliothekes, are used
• The oldest known library was created in Ebla, in the
Fertile Crescent, between 3000 and 2500 BC
• Since the volume of information in libraries is always
growing, it is necessary to build specialized data
structures for fast search: the indexes
Libraries and Digital Libraries
• For centuries indexes have been created manually as sets
of categories, with labels associated with each category
• The advent of modern computers has allowed the
construction of large indexes automatically
• Libraries were among the first institutions to adopt IR
systems for retrieving information
• Initially, such systems consisted of an automation of
existing processes such as card catalog searching
• Increased search functionality was then added
• Ex: subject headings, keywords, query operators
• More recently, the focus has been on improved graphical
interfaces, electronic forms, and hypertext features
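The idea of an automatically built index combined with query operators can be illustrated with a minimal sketch of an inverted index, the structure the series outline lists under "inverted files". Everything here is an assumption made for illustration: the toy collection, the whitespace tokenization, and the function names `build_inverted_index`, `search_and`, and `search_or` do not come from the slides.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, *words):
    """Ids of documents containing every query word (Boolean AND)."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


def search_or(index, *words):
    """Ids of documents containing at least one query word (Boolean OR)."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.union(*postings) if postings else set()


# Hypothetical toy collection, invented for this example.
docs = {
    1: "card catalog of library books",
    2: "digital library search interface",
    3: "catalog search with subject headings",
}
index = build_inverted_index(docs)
print(sorted(search_and(index, "library", "search")))    # [2]
print(sorted(search_or(index, "catalog", "interface")))  # [1, 2, 3]
```

A lookup touches only the sets stored for the query words, so the collection is not rescanned for each query. A realistic system would also normalize punctuation and word forms, which this sketch leaves out.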
IR at the Center of the Stage
• Until recently, IR was an area of interest restricted
mainly to librarians and information experts
• A single fact changed these perceptions: the
introduction of the Web, which has become the largest
repository of knowledge in human history
• Due to its enormous size, finding useful information on
the Web usually requires running a search
• And searching on the Web is all about IR and its
technologies
• Thus, almost overnight, IR has gained a place with
other technologies at the center of the stage
The IR Problem
• The key goal of an IR system is to retrieve all
the items that are relevant to a user query,
while retrieving as few non-relevant items as
possible
• That is, the IR system must rank the
information items according to a degree of
relevance to the user query
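The two goals stated above, ranking by relevance and retrieving relevant items with few non-relevant ones, can be sketched as follows. This is only an illustration under stated assumptions: the scoring function is a crude count of query-word occurrences, not a model from the slides; the documents and relevance judgments are invented; and the measures named `precision` and `recall` are the usual IR evaluation measures, which this slide does not name.

```python
def score(query, text):
    """Count occurrences of the query words in the text (a crude relevance score)."""
    words = text.lower().split()
    return sum(words.count(term) for term in set(query.lower().split()))


def rank(query, docs):
    """Return ids of documents with a nonzero score, highest score first."""
    scored = [(score(query, text), doc_id) for doc_id, text in docs.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ordered if s > 0]


def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection and relevance judgments, invented for this example.
docs = {
    1: "formula 1 car racing results",
    2: "car repair manual",
    3: "racing bicycles and racing gear",
    4: "grand prix motor sport season",
}
relevant = {1, 4}

ranking = rank("car racing", docs)
print(ranking)  # [1, 3, 2]
precision, recall = precision_recall(ranking, relevant)
print(precision, recall)
```

Document 4 is judged relevant but shares no word with the query, so it is never retrieved and recall stays at one half. Documents 2 and 3 match query words without being relevant, which lowers precision to one third.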
The User’s Task
• Consider a user who seeks information on a topic of their
interest
• This user first translates their information need into a
query, which requires specifying the words that compose
the query
• In this case, we say that the user is searching or querying
for information of their interest
• Consider now a user who has an interest that is either
poorly defined or inherently broad
• For instance, the user has an interest in car racing and
wants to browse documents on Formula 1
• In this case, we say that the user is browsing or
navigating the documents of the collection
Information × Data Retrieval
• Data retrieval: the task of determining which
documents of a collection contain the keywords
in the user query
• Example of a data retrieval system: a relational
database
• Deals with data that has a well-defined structure
and semantics
• A single erroneous object among a thousand
retrieved objects means total failure
• Data retrieval does not solve the problem of
retrieving information about a subject or topic
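The contrast can be made concrete with a small sketch of data retrieval against a relational database, using Python's standard `sqlite3` module. The table name `documents`, its columns, the sample rows, and the helper `data_retrieval` are all hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO documents (id, body) VALUES (?, ?)",
    [
        (1, "car racing calendar"),
        (2, "formula 1 grand prix results"),
        (3, "car insurance rates"),
    ],
)


def data_retrieval(keyword):
    """Return ids of rows whose body contains the keyword as a substring."""
    rows = conn.execute(
        "SELECT id FROM documents WHERE body LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


print(data_retrieval("racing"))  # [1]
```

The query returns exactly the rows that contain the keyword, and nothing else. Row 2 is about racing as a topic but does not contain the word, so it is not returned, which is the limitation the last bullet describes. Note that `%` and `_` inside the keyword would act as `LIKE` wildcards in this sketch.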
