Review: “A taxonomy of web search” by Andrei Broder, IBM Research
Bhavesh Singh
2010CS50281
1. Summary
1.1 Motivation
Web search engines have evolved through three stages. The first generation stays very
close to classic IR (Information Retrieval). A central tenet of classical information
retrieval is that the user is driven by an “information need”, defined as “the perceived
need for information that leads to someone using an information retrieval system in the first
place”. But the intent behind a web search is often not informational: it might be navigational (show
me the URL of the site I want to reach) or transactional (show me the sites where I can
perform a certain transaction, e.g. shop, download a file, or find a map). The main aim of
this paper was to point out this difference and to introduce and analyze a classification of web
searches. The development of search engines is closely related to this analysis.

Fig. The classic model of IR, augmented for the web.
1.2 Contribution
The author's contributions are as follows:

Classified web queries according to intent into three classes:
1. Navigational: The immediate intent is to reach a particular site that the user has
in mind, either because they visited it in the past or because they assume that
such a site exists.
2. Informational: The intent is to acquire some information assumed to be present
on one or more web pages in a static form. By static form, they mean that the
target document is not created in response to the user query.
3. Transactional: The intent is to perform some web-mediated activity, i.e. the user
wants to reach a site where further interaction will happen. This interaction
constitutes the transaction defining these queries. The main categories for such
queries are shopping, finding various web-mediated services, downloading
various types of files, etc.
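
The three classes can be written down as a small data type. The sketch below is only an illustration: the cue words and the `guess_intent` function are hypothetical and do not come from the paper, whose labels were obtained from a user survey and manual inspection of queries. The one rule it does share with the review is the fallback: a query that is neither transactional nor navigational is treated as informational.

```python
from enum import Enum


class Intent(Enum):
    NAVIGATIONAL = "navigational"
    INFORMATIONAL = "informational"
    TRANSACTIONAL = "transactional"


# Hypothetical cue words, chosen only for illustration; not taken from the paper.
TRANSACTIONAL_CUES = {"buy", "download", "shop", "order", "map"}
NAVIGATIONAL_CUES = {"homepage", "website", "login"}


def guess_intent(query: str) -> Intent:
    """Toy keyword heuristic for the three-way taxonomy (an assumption, not the paper's method)."""
    words = set(query.lower().split())
    if words & TRANSACTIONAL_CUES:
        return Intent.TRANSACTIONAL
    if (words & NAVIGATIONAL_CUES) or any(w.endswith(".com") for w in words):
        return Intent.NAVIGATIONAL
    # Fallback used in the review: neither transactional nor navigational.
    return Intent.INFORMATIONAL


for q in ["download mp3 player", "ibm homepage", "history of rome"]:
    print(q, "->", guess_intent(q).value)
```

A keyword list like this would misclassify many real queries; it is meant only to make the class boundaries concrete.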
In view of the above taxonomy (classification), the author identified three stages in the evolution
of web search engines:
1. First generation: uses mostly on-page data (text and formatting) and is very
close to classic IR. Supports mostly informational queries.
2. Second generation: uses off-page, web-specific data such as link analysis,
anchor text, and click-through data. This generation supports both informational
and navigational queries and started in 1998-1999. Google was the first to use link
analysis as a primary ranking factor. By the time of the paper, all major engines used
all these types of data.
3. Third generation: this was the emerging generation when the paper was written. It
attempts to blend data from multiple sources in order to answer “the need
behind the query”. For instance, on a query like ‘San Francisco’ the engine might
present direct links to a hotel reservation page for San Francisco, a map server, a
weather server, etc. Thus third-generation engines go beyond the limitation of a
fixed corpus, via semantic analysis, context determination, dynamic database
selection, etc. The aim is to support informational, navigational, and transactional
queries. This is a rapidly changing landscape.

1.3 Methodology
The author used two methods to determine the prevalence of the various types of queries: a survey
of AltaVista users, and an analysis of the AltaVista query log.


User Survey: The survey window was presented to random users and achieved a response
ratio of about 10%. The data discussed in the paper was collected between June 26 and
November 3, 2001 and consisted of 3190 valid returns. The survey questions relevant to
this paper and their results were as follows:
Q2. Which of the following describes best what you are trying to do?
24.53% I want to get to a specific website that I already have in mind
68.41% I want a good site on this topic, but I don’t have a specific site in mind
Q3. Which of the following best describes why you conducted this search?
8.16% I am shopping for something to buy on the Internet
5.46% I am shopping for something to buy elsewhere than on the Internet
22.55% I want to download a file (e.g., music, images, programs, etc.)
57.19% None of these reasons
Q4. Which of the following describes best what you are looking for?
14.83% A site which is a collection of links to other sites regarding this topic
76.62% The best site regarding this topic
Q2 was used to distinguish between navigational and non-navigational queries. The
percentage of queries identified as navigational was 24.5%, non-navigational queries
accounted for 68.4%, and 7.1% of the respondents did not answer Q2. Thus among
respondents to Q2, the percentage of navigational queries was 26.4%.
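
The 26.4% figure follows from renormalizing over those who answered the question. A short check, using only the two Q2 percentages quoted above (the variable names are ours):

```python
# Q2 shares, in percent of all valid returns (figures quoted in the review).
navigational = 24.53
non_navigational = 68.41

answered = navigational + non_navigational   # share of returns that answered Q2
no_answer = 100.0 - answered                 # share that left Q2 blank
navigational_among_answered = 100.0 * navigational / answered

print(f"answered Q2: {answered:.2f}%")
print(f"no answer:   {no_answer:.2f}%")
print(f"navigational among respondents: {navigational_among_answered:.1f}%")
```

The no-answer share comes out at about 7.1% and the renormalized navigational share at about 26.4%, matching the figures in the text.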
The survey had some additional questions; in particular, Q7 was ‘In your own words, please
describe the exact piece of information you are seeking.’ Based on a sample of 200 queries,
the query text, and the explanation provided at Q7, the author estimates that the
share of transactional queries among survey respondents is about 36%.
Queries that are neither transactional nor navigational are assumed to be informational.


Log analysis: The author selected at random a set of 1000 queries from the daily AltaVista log.
From this set, non-English queries and sexually oriented queries were removed (the latter
being about 10% of the English queries). From the remaining set the first 400 queries were
inspected. Queries that were neither transactional nor navigational were assumed to be
informational in intent.
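
The selection procedure described above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: `is_english` and `is_adult` are hypothetical predicates supplied by the caller (the review does not say how these filters were applied), and the fixed seed is only there to make the example repeatable.

```python
import random
from typing import Callable, List


def select_for_inspection(
    log: List[str],
    is_english: Callable[[str], bool],   # hypothetical language filter
    is_adult: Callable[[str], bool],     # hypothetical content filter
    sample_size: int = 1000,
    inspect_count: int = 400,
    seed: int = 0,
) -> List[str]:
    """Sample queries at random, drop filtered ones, keep the first few for manual labeling."""
    rng = random.Random(seed)
    sample = rng.sample(log, min(sample_size, len(log)))
    kept = [q for q in sample if is_english(q) and not is_adult(q)]
    return kept[:inspect_count]


# Usage with a synthetic log and stand-in predicates.
fake_log = [f"query {i}" for i in range(5000)]
to_label = select_for_inspection(
    fake_log,
    is_english=lambda q: True,
    is_adult=lambda q: q.endswith("7"),
)
print(len(to_label))
```

The returned queries would then be labeled by hand, which is the step that actually produces the percentages in the log-analysis column.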
Type of query    User Survey              Query Log Analysis
Navigational     24.5%                    20%
Informational    ?? (estimated 39%)       48%
Transactional    > 22% (estimated 36%)    30%
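
For the user survey, the informational share is not measured directly. A plausible reading, given the rule that queries that are neither navigational nor transactional count as informational, is that the estimate is the remainder after the other two classes are subtracted. The function name below is ours, and the review does not state how the figure was rounded.

```python
def informational_share(navigational: float, transactional: float) -> float:
    """Percentage left over once navigational and transactional queries are set aside."""
    return 100.0 - navigational - transactional


# Survey column: 24.5% navigational, about 36% transactional (estimated).
survey_residual = informational_share(navigational=24.5, transactional=36.0)
print(f"survey residual: {survey_residual:.1f}%")
```

The residual is 39.5%, close to the 39% estimate reported for the user survey.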

1.4 Conclusion
• An understanding of this classification based on ‘the need behind the query’ is essential
to the development of successful web search.
• The search engines of that time (when the paper was written) dealt well only with
informational and navigational queries; transactional queries were satisfied only
indirectly, and hence a third generation of search engines began to emerge.
• According to the author, the main aim of the third generation was to deal efficiently with
transactional queries, mostly via semantic analysis (understanding what the query is about) and
blending of various external databases.


