IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 09 | September-2015, Available @ http://www.ijret.org 136
COMPARATIVE ANALYSIS OF RELATIVE AND EXACT SEARCH
FOR WEB INFORMATION RETRIEVAL
Yagnesh D. Dave1, Bijendra S. Agrawal2
1 Associate Professor, Shri Chimanbhai Patel Post Graduate Institute of Computer Applications, Gujarat, India
2 Director, Kalol Institute of Management, Gujarat, India
Abstract
The volume of data in web repositories is huge, and retrieving specific and precise information from them is a big challenge.
Existing Information Retrieval (IR) techniques proposed by contemporary researchers are very useful in the field. Here, the
authors have implemented and tested two of these techniques, Relative Search and Exact Search, one after the other.
First, relative search was tested on web repository data using a web mining tool and its results were analyzed.
In the same manner, exact search was tested on web repository data and its results were measured.
The researchers observed that both exact search and relative search are significantly important. The focus of this research
paper is the retrieval of relevant information from the web information repository, which can be done with these two search
criteria. With the suggested methods, searchers may retrieve relevant web data in less time.
Key Words: Web data Mining, Exact Search, Relative Search, PR, TM, CD, VSM and TASE.
--------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
The search engine is a common tool for information
retrieval from web repositories. Information of many
types is available on public channels. The difficulty
is to get exact or relative search responses that meet
the searcher's criteria. Web information
retrieval involves three types of search systems: exact,
relative and adaptive search. In this work, relative and exact
searches are analyzed for information retrieval.
Relative search returns web data from the repository
according to how likely it is to match the query. This
method works on a probabilistic model [1], which can
summarize the ranks from either web pages or web data,
and so it can produce the nearest result.
The exact method compares the search word with the words
already stored in a database. The database has two
fields: the web page name and a keyword field that holds
certain kinds of keywords. The keywords belong to a
classification dictionary built from the domain
sitemap [2], and this approach also works on that
classification dictionary. Each entry of the classification
dictionary contains a term-category pair, the contingency
table for that pair and its calculated strength of
association. The dictionary consists of all possible
term-category pairs with at least one
[Searching Content, Category] outcome.
2. RELATED WORK
Thangaraj et al. used an ontology repository and a
thesaurus to provide semantic web search for relative search
in Information Retrieval [3]. Zhao et al. presented a
framework that emphasizes targeted data using crawled
web pages and ranks the retrieved data by location.
They focused on web locations and keywords from the web
information repository and paired each keyword with its
specific web page location [4]. Lin et al. targeted the
retrieval of web data according to time, using
temporal-textual Web queries, and proposed the Time-Aware
Search Engine [5]. Roy et al. proposed a web IR technique
that distinguishes content and intent words, for the topic
of the query and the explicit use of the word
respectively [6]. Francès et al. suggested a technique for
improving document replication in web distribution
on the basis of cost and time effectiveness [7].
3. PROBLEM STATEMENT
The literature review shows that, to obtain optimized
and relevant retrieval, the searcher has to choose the
right technique. The significant part of this research
work is the comparison of two different techniques.
The authors focus on the retrieval results and on
experimenting with the techniques to summarize
those results.
4. PROPOSED WORK
The proposed work aims to compare and analyze
two different information retrieval methods in order to
identify which retrieves more efficiently from the web
information repository.
4.1 Methodology
The relative search methodology undertaken here searches
for a word by opening each file and comparing the search
word with every word in the file. If the search word matches
a word in the file, the file is listed in the results. The
database has two fields: a URL field, which holds a
filename, and a keyword field, which holds certain kinds
of keywords.
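The file-scanning step described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not give its implementation, so the function name `relative_search`, the in-memory `files` mapping (standing in for files opened from the web domain) and the word-splitting rule are all hypothetical.

```python
import re

def relative_search(query, files):
    """Return the names of files whose text contains the query word.

    `files` is a hypothetical mapping from a file name (standing in
    for the URL field) to the text of that file.
    """
    needle = query.lower()
    matches = []
    for name, text in files.items():
        # Split the file into lowercase words and compare each with the query.
        words = re.findall(r"[a-z0-9']+", text.lower())
        if needle in words:
            matches.append(name)
    return matches

# Made-up sample data, not taken from the paper's test set.
sample_files = {
    "page_a.txt": "Web mining retrieves information from the web.",
    "page_b.txt": "Concrete pavements need careful design.",
}
print(relative_search("mining", sample_files))
```

Every word of every file is compared with the query, which is why this style of search tends to cost more time than a lookup against stored keywords.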
With Term Match, the method produces a retrieval rank
that depends on the different web source categories
available, finding similar terms in the web files
of a web site or web domain. It also generates the
number of relevant nodes used for ranked web
retrieval, as follows:
1. To identify related categories:
a. Determine the number of non-duplicate terms
in the query for each category.
b. Determine the total frequency of the non-duplicate
terms in the titles across the entire
available web domain.
c. Calculate the ratio of the available web domain
covered by related terms from the different
categories.
2. Rank the results of steps a, b and c in descending
order to obtain the result of the
relative search.
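One possible reading of the ranking steps above is sketched below. The paper does not define the counts precisely, so this sketch assumes that step (a) counts how many distinct query terms occur in a category, step (b) sums their frequencies over that category's titles, and step (c) divides that frequency by the category's total word count. The function name `rank_categories` and the sample data are hypothetical.

```python
from collections import Counter

def rank_categories(query, categories):
    """Rank categories for a query; `categories` maps a name to a list of titles."""
    terms = set(query.lower().split())  # drop duplicate query terms
    scores = {}
    for name, titles in categories.items():
        counts = Counter(w for title in titles for w in title.lower().split())
        distinct = sum(1 for term in terms if counts[term] > 0)  # step (a)
        frequency = sum(counts[term] for term in terms)          # step (b)
        total = sum(counts.values())
        ratio = frequency / total if total else 0.0              # step (c)
        scores[name] = (distinct, frequency, ratio)
    # Step 2: descending order, comparing (a), then (b), then (c).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up sample data, not taken from the paper's test set.
categories = {
    "documents": ["web search and web mining", "search engines"],
    "sentences": ["exact search"],
    "files": ["concrete pavements"],
}
ranked = rank_categories("web search", categories)
print([name for name, _ in ranked])
```

Sorting on the tuple means the ratio only breaks ties left by the first two counts; a weighted combination would be an equally plausible design.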
Note that the retrieval rank is obtained across different web
categories, and these may contain similar kinds of relative
data that the searcher is looking for. In this research
paper, the categories used for a web file are
documents, paragraphs, sentences and files. This
ranking approach produces much the same result as
the website itself; the only difference is that it uses a
combination of category and content match. The original
retrieval query is expanded with the terms identified
from the maximum retrieval occurrences,
together with the titles and descriptions of the top three
retrieval occurrences from the most-retrieved categories. The
term weight of the expanded query vector is
computed by multiplying term category.
The other methodology, used for the comparative analysis,
is exact search, which searches for a word by
comparing it with the words already stored in a database. If the
search word matches a database word, it is
listed in the results. The implemented database has two
fields: a URL field, which holds a filename, and a
keyword field, which holds certain kinds of
keywords.
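A minimal sketch of such a two-field lookup is given below, using Python's built-in `sqlite3` module. The table name `pages`, the column names and the sample rows are assumptions made for illustration; the paper does not say which database or schema was used.

```python
import sqlite3

def build_index(rows):
    """Create an in-memory table with a URL field and a keyword field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT, keyword TEXT)")
    conn.executemany("INSERT INTO pages VALUES (?, ?)", rows)
    return conn

def exact_search(conn, word):
    """Return the URLs whose stored keyword equals the search word exactly."""
    cursor = conn.execute(
        "SELECT url FROM pages WHERE keyword = ? ORDER BY url", (word,)
    )
    return [row[0] for row in cursor.fetchall()]

# Made-up sample rows, not taken from the paper's test set.
conn = build_index([
    ("page_a.html", "retrieval"),
    ("page_b.html", "mining"),
    ("page_c.html", "retrieval"),
])
print(exact_search(conn, "retrieval"))
```

Because the comparison is an equality test, a partial or differently cased word such as `retriev` or `Retrieval` returns nothing, which matches the observation that exact search does not cope with unclear keywords.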
To use this method for retrieving web data, the classification
dictionary (CD) should be defined from the web domain on
which the web pages reside. This method works on the
association model, which can handle performance issues
better under all criteria. To find the exact retrieval,
the searcher has to move from the web content to the web page
and its categories. The following conditions are
defined:
[Searching Content, Category]
[Searching Content, ¬Category]
[¬Searching Content, Category]
[¬Searching Content, ¬Category]
Each entry of the classification dictionary contains a term-
category pair, the contingency table for that pair, and its
calculated strength of association. The dictionary entries
consist of all possible term-category pairs with at least one
Searching Content, Category outcome.
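A dictionary entry of this kind can be sketched as follows. The four counts correspond to the four conditions listed above. The paper does not name the measure it uses for the strength of association, so the phi coefficient is used here purely as an assumed stand-in; the function names and sample documents are hypothetical.

```python
import math

def contingency(docs, term, category):
    """Count the four outcomes for one term-category pair.

    `docs` is a list of (set_of_words, category_label) pairs.
    Returns (n11, n10, n01, n00): term and category, term only,
    category only, neither.
    """
    n11 = n10 = n01 = n00 = 0
    for words, label in docs:
        has_term = term in words
        in_category = label == category
        if has_term and in_category:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_category:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def phi(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table (assumed association measure)."""
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

def build_dictionary(docs):
    """Keep every term-category pair seen together at least once."""
    terms = set().union(*(words for words, _ in docs))
    labels = {label for _, label in docs}
    entries = {}
    for term in terms:
        for label in labels:
            table = contingency(docs, term, label)
            if table[0] >= 1:  # at least one [Searching Content, Category] outcome
                entries[(term, label)] = (table, phi(*table))
    return entries

# Made-up sample documents, not taken from the paper's test set.
docs = [
    ({"web", "search"}, "IR"),
    ({"web", "mining"}, "IR"),
    ({"concrete"}, "civil"),
    ({"web", "asphalt"}, "civil"),
]
entries = build_dictionary(docs)
print(entries[("web", "IR")])
```

Any other 2x2 association measure, such as chi-square, could be substituted for `phi` without changing the structure of the dictionary.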
5. RESULTS AND ANALYSIS
5.1 Exact Search Result
To retrieve data from the web, the first approach that the
researcher has used is exact search, where the researcher has
identified data by the keyword. All the results shown under
the head of exact search are retrieved based on the keyword
retrieved. Researcher retrieved results from all the phases
i.e. documents, paragraphs, sentences and files.
Fig-1 Agglomeration order after relatively
Fig-1 summarizes the relative frequencies of the keywords
retrieved from the selected data, ordered from the
most to the least frequent relative keyword found in the
resultant web data. As seen in the classification, the retrieval
results are vast, but the authors have removed clusters from
the table shown in Fig-1.
Fig-2 Classifications of Most Frequent Data
Fig-2 shows a classification of relative data from the keywords; the direction and action names carry more
weightage than any others.
Fig-3 Classifications of Occurrences from Relevance Data
Fig-3 shows the classification of occurrences of retrieved data by relevance. It identifies the
likelihood of retrieval based on relevance.
Fig-4 Classifications of Frequencies of Relevance Data
Fig-4 summarizes the classification chart of the frequency obtained by each case. It shows that the activity name, colored red,
accounts for more than 75% in vol_2_module, while the activity colored blue accounts for 4% in class_1_outline.
Fig- 5 Rate of Nonverbal per 1000 words
Fig-5 shows that vol_1_modules retrieved the most nonverbal resultant data, i.e. 4.63%. It also shows that
relative search retrieves nonverbal keywords on the basis of relevance.
Fig-6 Rate of verbal per 1000 words
Fig-6 shows the verbs found in the test data: good writing
guide and good writing guide sociology have more verbs than any other case.
Fig-7 Occurrences By Files
Fig-7 shows the most frequently occurring keywords in each file; each color denotes a keyword
mapped in a different file. It shows that vol_2_module and Act_communication have more occurrences than the
critical_writing test case.
Fig- 8 Exact Searches With Case Occurrences By File
Fig-8 shows the occurrences of retrieved data under
exact search; four of the web
retrievals share a common value, i.e. 1.
Fig- 9 Exact Searches With Word Frequency By File
Fig-9 shows web information retrieved from the sentences
contained in each file; the interpersonal
module has a higher frequency than the rest, reaching
more than 200%.
Fig- 10 Exact Searches By Case Occurrences From
Sentences
Fig-10 shows the exact searches by case
occurrence from the sentences contained in the test data.
The black-colored bars are the active files,
and all the listed web data files
obtain a 100% result for the specified keyword.
Fig- 11 Exact Searches With Rate Per 10000 By File
Fig-11 shows the result as a rate per 10000 words
by file. The rate exceeds 180% in the file
from the interpersonal module, drops immediately
to 45% in nqimch08, and then rises
to 60% in vol_2_module.
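The paper does not state how its per-1000 and per-10000 rates were computed. A common normalisation, assumed here only for illustration, is the number of occurrences divided by the number of words in the file, scaled to the chosen base; the function name `rate_per` and the sample numbers are hypothetical.

```python
def rate_per(occurrences, total_words, per=10000):
    """Occurrences normalised to a rate per `per` words (assumed definition)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return occurrences * per / total_words

# Made-up numbers, not taken from the paper's figures.
print(rate_per(9, 5000))            # rate per 10000 words
print(rate_per(3, 1500, per=1000))  # rate per 1000 words
```

Normalising in this way lets files of different lengths be compared on the same scale.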
Fig- 12 Exact Searches By Case Occurrences From
Sentences
Fig-12 shows the retrieved web data and the number
of case occurrences; the test files
obtain a 100% exact result.
Fig- 13 Exact Searches By Category Percentage From
Sentences
Fig-13 shows retrieval by category
percentage; for the selected test data, the vertical bars
show the test files that obtained a 100% result for
the category.
Fig- 14 Exact Searches By Case Occurrences From
Sentences
Fig-14 shows the case occurrences found in the
sentences. It clearly indicates that these sentences are
classified by predicted classes and that, on that basis, the case
occurrences are identified by paragraph.
6. CONCLUSION
The analysis of the results leads to the conclusion that, with
exact search, the searcher can expect a minimal set of results
in less time, whereas with relative search the searcher gets
a vast set of results and needs somewhat more time than with
exact search. Exact search does not work on uncertain or
unclear keywords. In contrast, relative search gives
comparatively better response options and is expected to
return more of the possible values that the searcher is
looking for in the web information repository. Relative
search also requires more time to filter out the relevant
resultant data. Both approaches are useful in text- and
link-based retrieval. The extensibility of relative
search may give more options for retrieval from a web domain.
Exact search produces better outcomes from different
sources such as web documents, the paragraphs contained in
those documents and the sentences contained in those
paragraphs. Thus, with these options, the searcher retrieves
a better outcome than with relative search. The results
obtained justify and validate that the targeted outcomes
are attained by integrating (fusing) the approaches
rather than by using them individually.
REFERENCES
[1] Robertson, S.E., Maron, M.E. & Cooper, W.S. (1982),
Probability of Relevance: A unification of two competing
models for document retrieval. Information Technology:
Research and Development, 1, 1-21.
[2] McCallum, A., Rosenfeld, R., Mitchell, T. & Ng, A.Y.
Improving text classification by shrinkage in a
hierarchy of classes. Proceedings of the 15th
International Conference on Machine Learning, 359-367.
[3] Thangaraj, M., and G. Sujatha. "An architectural design
for effective information retrieval in semantic web."
Expert Systems with Applications 41.18 (2014): 8225-
8233.
[4] Zhao, Jie, et al. "Exploiting location information for
web search." Computers in Human Behavior 30 (2014):
378-388.
[5] Lin, Sheng, et al. "Exploiting temporal information in
Web search." Expert Systems with Applications 41.2
(2014): 331-341.
[6] Roy, Rishiraj Saha, et al. "Discovering and
understanding word level user intent in Web search
queries." Web Semantics: Science, Services and Agents
on the World Wide Web 30 (2015): 22-38.
[7] Francès, Guillem, et al. "Improving the efficiency of
multi-site web search engines." Proceedings of the 7th
ACM international conference on Web search and data
mining. ACM, 2014.

More Related Content

What's hot (18)

PDF
Syntactic Indexes for Text Retrieval
ITIIIndustries
 
PDF
A Survey on Automatically Mining Facets for Queries from their Search Results
IRJET Journal
 
PDF
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET Journal
 
PDF
A novel method to search information through multi agent search and retrie
IAEME Publication
 
PDF
At33264269
IJERA Editor
 
PDF
Query Recommendation by using Collaborative Filtering Approach
IRJET Journal
 
PDF
A comprehensive study of mining web data
eSAT Publishing House
 
PDF
Context Sensitive Search String Composition Algorithm using User Intention to...
IJECEIAES
 
PDF
Volume 2-issue-6-2016-2020
Editor IJARCET
 
PDF
P036401020107
theijes
 
PDF
Overview of Indexing In Object Oriented Database
Editor IJMTER
 
PDF
Annotation for query result records based on domain specific ontology
ijnlc
 
PDF
Ontology Based Approach for Semantic Information Retrieval System
IJTET Journal
 
PDF
Meta documents and query extension to enhance information retrieval process
eSAT Journals
 
PDF
50120140503003 2
IAEME Publication
 
PDF
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
pharmaindexing
 
PDF
F0362036045
theijes
 
Syntactic Indexes for Text Retrieval
ITIIIndustries
 
A Survey on Automatically Mining Facets for Queries from their Search Results
IRJET Journal
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET Journal
 
A novel method to search information through multi agent search and retrie
IAEME Publication
 
At33264269
IJERA Editor
 
Query Recommendation by using Collaborative Filtering Approach
IRJET Journal
 
A comprehensive study of mining web data
eSAT Publishing House
 
Context Sensitive Search String Composition Algorithm using User Intention to...
IJECEIAES
 
Volume 2-issue-6-2016-2020
Editor IJARCET
 
P036401020107
theijes
 
Overview of Indexing In Object Oriented Database
Editor IJMTER
 
Annotation for query result records based on domain specific ontology
ijnlc
 
Ontology Based Approach for Semantic Information Retrieval System
IJTET Journal
 
Meta documents and query extension to enhance information retrieval process
eSAT Journals
 
50120140503003 2
IAEME Publication
 
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
pharmaindexing
 
F0362036045
theijes
 

Similar to Comparative analysis of relative and exact search for web information retrieval (20)

PDF
Chemread – a chemical informant
eSAT Publishing House
 
PDF
Effective Performance of Information Retrieval on Web by Using Web Crawling  
dannyijwest
 
PDF
Adaptive focused crawling strategy for maximising the relevance
eSAT Journals
 
PDF
Review on an automatic extraction of educational digital objects and metadata...
IRJET Journal
 
PDF
50120140502013
IAEME Publication
 
PDF
50120140502013
IAEME Publication
 
PDF
IRJET-Computational model for the processing of documents and support to the ...
IRJET Journal
 
PDF
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...
IRJET Journal
 
PDF
Optimization of Search Results with Duplicate Page Elimination using Usage Data
IDES Editor
 
PDF
An investigative scheme for keyword search using inverted key tactic
eSAT Publishing House
 
PDF
Recommendation generation by integrating sequential
eSAT Publishing House
 
PDF
Recommendation generation by integrating sequential pattern mining and semantics
eSAT Journals
 
PDF
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
IRJET Journal
 
PDF
IRJET - Re-Ranking of Google Search Results
IRJET Journal
 
PDF
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
IRJET Journal
 
PDF
Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...
IJCSIS Research Publications
 
PDF
IRJET- Foster Hashtag from Image and Text
IRJET Journal
 
PDF
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET Journal
 
PDF
An efficient information retrieval ontology system based indexing for context
eSAT Journals
 
PDF
`A Survey on approaches of Web Mining in Varied Areas
inventionjournals
 
Chemread – a chemical informant
eSAT Publishing House
 
Effective Performance of Information Retrieval on Web by Using Web Crawling  
dannyijwest
 
Adaptive focused crawling strategy for maximising the relevance
eSAT Journals
 
Review on an automatic extraction of educational digital objects and metadata...
IRJET Journal
 
50120140502013
IAEME Publication
 
50120140502013
IAEME Publication
 
IRJET-Computational model for the processing of documents and support to the ...
IRJET Journal
 
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...
IRJET Journal
 
Optimization of Search Results with Duplicate Page Elimination using Usage Data
IDES Editor
 
An investigative scheme for keyword search using inverted key tactic
eSAT Publishing House
 
Recommendation generation by integrating sequential
eSAT Publishing House
 
Recommendation generation by integrating sequential pattern mining and semantics
eSAT Journals
 
IRJET-Deep Web Crawling Efficiently using Dynamic Focused Web Crawler
IRJET Journal
 
IRJET - Re-Ranking of Google Search Results
IRJET Journal
 
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
IRJET Journal
 
Proactive Approach to Estimate the Re-crawl Period for Resource Minimization ...
IJCSIS Research Publications
 
IRJET- Foster Hashtag from Image and Text
IRJET Journal
 
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET Journal
 
An efficient information retrieval ontology system based indexing for context
eSAT Journals
 
`A Survey on approaches of Web Mining in Varied Areas
inventionjournals
 
Ad

More from eSAT Journals (20)

PDF
Mechanical properties of hybrid fiber reinforced concrete for pavements
eSAT Journals
 
PDF
Material management in construction – a case study
eSAT Journals
 
PDF
Managing drought short term strategies in semi arid regions a case study
eSAT Journals
 
PDF
Life cycle cost analysis of overlay for an urban road in bangalore
eSAT Journals
 
PDF
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
eSAT Journals
 
PDF
Laboratory investigation of expansive soil stabilized with natural inorganic ...
eSAT Journals
 
PDF
Influence of reinforcement on the behavior of hollow concrete block masonry p...
eSAT Journals
 
PDF
Influence of compaction energy on soil stabilized with chemical stabilizer
eSAT Journals
 
PDF
Geographical information system (gis) for water resources management
eSAT Journals
 
PDF
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
eSAT Journals
 
PDF
Factors influencing compressive strength of geopolymer concrete
eSAT Journals
 
PDF
Experimental investigation on circular hollow steel columns in filled with li...
eSAT Journals
 
PDF
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
eSAT Journals
 
PDF
Evaluation of punching shear in flat slabs
eSAT Journals
 
PDF
Evaluation of performance of intake tower dam for recent earthquake in india
eSAT Journals
 
PDF
Evaluation of operational efficiency of urban road network using travel time ...
eSAT Journals
 
PDF
Estimation of surface runoff in nallur amanikere watershed using scs cn method
eSAT Journals
 
PDF
Estimation of morphometric parameters and runoff using rs & gis techniques
eSAT Journals
 
PDF
Effect of variation of plastic hinge length on the results of non linear anal...
eSAT Journals
 
PDF
Effect of use of recycled materials on indirect tensile strength of asphalt c...
eSAT Journals
 
Mechanical properties of hybrid fiber reinforced concrete for pavements
eSAT Journals
 
Material management in construction – a case study
eSAT Journals
 
Managing drought short term strategies in semi arid regions a case study
eSAT Journals
 
Life cycle cost analysis of overlay for an urban road in bangalore
eSAT Journals
 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
eSAT Journals
 
Laboratory investigation of expansive soil stabilized with natural inorganic ...
eSAT Journals
 
Influence of reinforcement on the behavior of hollow concrete block masonry p...
eSAT Journals
 
Influence of compaction energy on soil stabilized with chemical stabilizer
eSAT Journals
 
Geographical information system (gis) for water resources management
eSAT Journals
 
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
eSAT Journals
 
Factors influencing compressive strength of geopolymer concrete
eSAT Journals
 
Experimental investigation on circular hollow steel columns in filled with li...
eSAT Journals
 
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
eSAT Journals
 
Evaluation of punching shear in flat slabs
eSAT Journals
 
Evaluation of performance of intake tower dam for recent earthquake in india
eSAT Journals
 
Evaluation of operational efficiency of urban road network using travel time ...
eSAT Journals
 
Estimation of surface runoff in nallur amanikere watershed using scs cn method
eSAT Journals
 
Estimation of morphometric parameters and runoff using rs & gis techniques
eSAT Journals
 
Effect of variation of plastic hinge length on the results of non linear anal...
eSAT Journals
 
Effect of use of recycled materials on indirect tensile strength of asphalt c...
eSAT Journals
 
Ad

Recently uploaded (20)

PDF
Construction of a Thermal Vacuum Chamber for Environment Test of Triple CubeS...
2208441
 
PDF
settlement FOR FOUNDATION ENGINEERS.pdf
Endalkazene
 
PDF
勉強会資料_An Image is Worth More Than 16x16 Patches
NABLAS株式会社
 
PDF
Packaging Tips for Stainless Steel Tubes and Pipes
heavymetalsandtubes
 
PDF
4 Tier Teamcenter Installation part1.pdf
VnyKumar1
 
PPTX
Module2 Data Base Design- ER and NF.pptx
gomathisankariv2
 
PPTX
Chapter_Seven_Construction_Reliability_Elective_III_Msc CM
SubashKumarBhattarai
 
PDF
Natural_Language_processing_Unit_I_notes.pdf
sanguleumeshit
 
PPTX
FUNDAMENTALS OF ELECTRIC VEHICLES UNIT-1
MikkiliSuresh
 
PDF
SG1-ALM-MS-EL-30-0008 (00) MS - Isolators and disconnecting switches.pdf
djiceramil
 
PPTX
Information Retrieval and Extraction - Module 7
premSankar19
 
PDF
2025 Laurence Sigler - Advancing Decision Support. Content Management Ecommer...
Francisco Javier Mora Serrano
 
PPTX
IoT_Smart_Agriculture_Presentations.pptx
poojakumari696707
 
PDF
EVS+PRESENTATIONS EVS+PRESENTATIONS like
saiyedaqib429
 
PDF
2010_Book_EnvironmentalBioengineering (1).pdf
EmilianoRodriguezTll
 
PPTX
22PCOAM21 Session 1 Data Management.pptx
Guru Nanak Technical Institutions
 
PDF
Zero carbon Building Design Guidelines V4
BassemOsman1
 
PPTX
sunil mishra pptmmmmmmmmmmmmmmmmmmmmmmmmm
singhamit111
 
PPTX
ENSA_Module_7.pptx_wide_area_network_concepts
RanaMukherjee24
 
PPTX
filteration _ pre.pptx 11111110001.pptx
awasthivaibhav825
 
Construction of a Thermal Vacuum Chamber for Environment Test of Triple CubeS...
2208441
 
settlement FOR FOUNDATION ENGINEERS.pdf
Endalkazene
 
勉強会資料_An Image is Worth More Than 16x16 Patches
NABLAS株式会社
 
Packaging Tips for Stainless Steel Tubes and Pipes
heavymetalsandtubes
 
4 Tier Teamcenter Installation part1.pdf
VnyKumar1
 
Module2 Data Base Design- ER and NF.pptx
gomathisankariv2
 
Chapter_Seven_Construction_Reliability_Elective_III_Msc CM
SubashKumarBhattarai
 
Natural_Language_processing_Unit_I_notes.pdf
sanguleumeshit
 
FUNDAMENTALS OF ELECTRIC VEHICLES UNIT-1
MikkiliSuresh
 
SG1-ALM-MS-EL-30-0008 (00) MS - Isolators and disconnecting switches.pdf
djiceramil
 
Information Retrieval and Extraction - Module 7
premSankar19
 
2025 Laurence Sigler - Advancing Decision Support. Content Management Ecommer...
Francisco Javier Mora Serrano
 
IoT_Smart_Agriculture_Presentations.pptx
poojakumari696707
 
EVS+PRESENTATIONS EVS+PRESENTATIONS like
saiyedaqib429
 
2010_Book_EnvironmentalBioengineering (1).pdf
EmilianoRodriguezTll
 
22PCOAM21 Session 1 Data Management.pptx
Guru Nanak Technical Institutions
 
Zero carbon Building Design Guidelines V4
BassemOsman1
 
sunil mishra pptmmmmmmmmmmmmmmmmmmmmmmmmm
singhamit111
 
ENSA_Module_7.pptx_wide_area_network_concepts
RanaMukherjee24
 
filteration _ pre.pptx 11111110001.pptx
awasthivaibhav825
 

Comparative analysis of relative and exact search for web information retrieval

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 09 | September-2015, Available @ http://www.ijret.org 136 COMPARATIVE ANALYSIS OF RELATIVE AND EXACT SEARCH FOR WEB INFORMATION RETRIEVAL Yagnesh D. Dave1 , Bijendra S. Agrawal2 1 Associate Professor, Shri Chimanbhai Patel Post Graduate Institute of Computer Applications,Gujarat,India 2 Director, Kalol Institute of Management, Gujarat, India Abstract The volume of data on web repository is huge. To get specific and precise information for the web repository is a big challenge. Existing Information Retrieval (IR) techniques, given by contemporary researchers, are very useful in field of IR. Here, the authors have implemented and tested two of the techniques from the fields of IR. The authors dealt with Relative Search and Exact Search techniques one by one. Initially relative search tested on web repository data using web mining tool and then its results are analyzed. In the same manner, the exact search technique of IR tested on web repository data and the results are measured. The researchers have experienced the significant importance on exact search and relative search. The focused of the research paper is to retrieve relevant information from the web information repository. With the use of two searching criteria these can be done. With the use of the suggested methods the searchers may retrieve a relevant web data in a fewer time. Key Words: Web data Mining, Exact Search, Relative Search, PR, TM, CD, VSM and TASE. --------------------------------------------------------------------***---------------------------------------------------------------------- 1. INTRODUCTION The search engine is a common tool used for information retrieval from web repository. 
Information for multiple types are available on the public channels. The difficulties are to get exact or relative expected search responses depending on the searchers criteria. The web information retrieval involves three types of search systems: exact, relative and adaptive search. In this work relative and exact searches are analyzed for information retrieval. The relative search will produce a resultant web data depending upon the likeliness from web repository. This method works on probabilistic model [1] which can summarize the ranks from either web page or web data. With the use of this, it can produce the nearest result. The exact method compares the word with the already stored word in database. In the database, there works on two fields: web page name and keyword which consists of some certain kinds of keywords. The keyword belongs to classification dictionary which built from the domain sitemap [2]. This approach is also work on classification dictionary. Each entry of the classification dictionary contains a term-category pair, the contingency table for that pair and its calculated strength of association. The dictionary also consists of all possible term-category pairs with at least one searching content, category outcome. 2. RELATED WORK Thangaraj,et. al. have indicated the ontology repository and thesaurus to get semantic web search for relative search in Information Retrieval[3]. Zhao et. al have given a framework emphasized on targeted data with the use of web crawled web pages and also for performing retrieved data depended on the location base rank. They focused on web locations and keywords from web information repository and compared keywords with its specific web page location in a pair[4]. Lin,et.al have targeted for retrieving web data depended on the retrieval time from specific web page by using temporal-textual Web queries. They proposed Time- Aware Search Engine [5]. Roy et. al. 
have proposed a web IR technique that distinguishes content words, which carry the topic of the query, from intent words, which are used explicitly to express what the searcher wants [6]. Francès et al. have suggested a technique for improving document replication in connection with web distribution techniques on the basis of cost and time effectiveness [7].

3. PROBLEM STATEMENT
From the literature review, it is found that to obtain optimized and relevant retrieval, the searcher has to choose the right technique. The significant part of this research work deals with the comparison of two different techniques. The authors focused on the retrieval results and on experimenting with the techniques in order to summarize those results.

4. PROPOSED WORK
The proposed work aims to compare and analyze the two information retrieval methods in order to identify the more efficient retrieval from the web information repository.

4.1 Methodology
The relative search methodology opens a file and compares the search word with every word in the file. If the search word matches a word in the file, the match is listed in the output. The database has two fields
  • 2. namely a URL field, which holds a filename, and a keyword field, which holds certain kinds of keywords. With the use of term match, the method produces a retrieval rank over the different available web source categories by finding similar terms in the web files available on a web site or web domain. It also generates the number of relevant nodes so that the ranked web retrieval can be listed in the following manner:

1. To identify related categories:
a. Determine the number of non-duplicate query terms for each category.
b. Determine the total frequency of the non-duplicate terms in the titles across the entire available web domain.
c. Calculate the ratio of the available web domain covered by related terms from the different categories.
2. Based on the above, assign ranks from the values of steps a, b and c in descending order to obtain the result of the relative search.

Note that the retrieval rank is obtained per web category, and different categories may contain similar types of relative data that the searcher is looking for. In this research paper, the categories available in a web file are documents, paragraphs, sentences and files. A ranking based on this approach produces nearly the same result as the website itself produces. The only difference is that it uses a combination of category and content match. The original retrieval query is expanded with terms identified on the basis of maximum retrieval occurrences, together with the titles and descriptions of the top three retrieval occurrences from the most retrieved categories.
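The ranking steps above leave several details open, so the following is only a minimal Python sketch of one possible reading. It assumes that each category holds a list of (title, body) text pairs, that terms are lowercase whitespace-separated words, and that step c is the fraction of a category's documents containing at least one query term. The names `tokenize` and `rank_categories` and the sample data are hypothetical and do not come from the paper.

```python
from collections import Counter


def tokenize(text):
    """Split text into lowercase whitespace-separated terms (an assumption)."""
    return text.lower().split()


def rank_categories(query, corpus):
    """Rank categories for a query.

    corpus maps a category name to a list of (title, body) pairs.
    Returns (category, a, b, c) tuples sorted in descending order of a, b, c.
    """
    terms = set(tokenize(query))  # non-duplicate query terms
    scores = []
    for category, docs in corpus.items():
        matched = set()   # step a: distinct query terms found in the category
        title_freq = 0    # step b: total frequency of the terms in titles
        docs_hit = 0      # step c: documents with at least one query term
        for title, body in docs:
            title_counts = Counter(tokenize(title))
            words = set(title_counts) | set(tokenize(body))
            hits = terms & words
            matched |= hits
            title_freq += sum(title_counts[t] for t in terms)
            docs_hit += bool(hits)
        ratio = docs_hit / len(docs) if docs else 0.0
        scores.append((len(matched), title_freq, ratio, category))
    # Step 2: descending order on the values from steps a, b and c.
    scores.sort(key=lambda s: s[:3], reverse=True)
    return [(category, a, b, c) for a, b, c, category in scores]


# Hypothetical sample data, for illustration only.
corpus = {
    "documents": [("web search basics", "exact search of web data"),
                  ("cooking", "recipes")],
    "files": [("notes", "search tips")],
}
ranking = rank_categories("web search", corpus)
print(ranking)
```

Sorting on the tuple (a, b, c) makes step a the primary criterion and uses steps b and c only to break ties; a weighted combination would be an equally plausible reading of the text.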
The term weight of the expanded query vector is computed by multiplying the term-category values.

The other methodology is exact search, which, for the purpose of the comparative analysis, compares the search word with the words already stored in the database. If the search word matches a database word, the match is listed in the output. The implemented database has two fields: a URL field, which holds a filename, and a keyword field, which holds certain kinds of keywords. To use this method for retrieving web data, the classification dictionary (CD) should be defined from the web domain on which the web pages reside. This method works on an association model, which can handle performance issues better under all criteria. To find the exact retrieval, the searcher has to move from the web content to the web page and its categories. The following conditions are defined:

[Searching Content, Category]
[Searching Content, ¬Category]
[¬Searching Content, Category]
[¬Searching Content, ¬Category]

Each entry of the classification dictionary contains a term-category pair, the contingency table for that pair, and its calculated strength of association. The dictionary entries consist of all possible term-category pairs with at least one [Searching Content, Category] outcome.

5. RESULTS AND ANALYSIS

5.1 Exact Search Result
To retrieve data from the web, the first approach the researcher used is exact search, in which data is identified by keyword. All the results shown under the head of exact search are retrieved on the basis of the keyword. The researcher retrieved results from all the phases, i.e. documents, paragraphs, sentences and files.

Fig- 1 Agglomeration order after relatively

Fig- 1 summarizes the relative frequencies of the keywords retrieved from the selected data, ordered from the maximum to the minimum relative keyword found in the resultant web data.
As seen in the classification, the retrieval results are vast, but the authors have removed clusters from the table shown in Fig- 1.
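The exact search and the classification dictionary described in Section 4.1 can be sketched in Python as follows. The paper does not state which measure is used for the strength of association, so the phi coefficient of the 2x2 contingency table is used here purely as an assumption. The record layouts, the names `Contingency`, `build_dictionary` and `exact_search`, and the sample data are likewise hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Contingency:
    """2x2 counts for one term-category pair."""
    both: int           # [Searching Content, Category]
    term_only: int      # [Searching Content, not Category]
    category_only: int  # [not Searching Content, Category]
    neither: int        # [not Searching Content, not Category]

    def strength(self) -> float:
        """Phi coefficient: an assumed measure, not stated in the paper."""
        a, b, c, d = self.both, self.term_only, self.category_only, self.neither
        denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
        return 0.0 if denom == 0 else (a * d - b * c) / denom


def build_dictionary(pages):
    """Build the dictionary from (set of keywords, category) records."""
    terms = set()
    for keywords, _ in pages:
        terms |= keywords
    categories = {category for _, category in pages}
    dictionary = {}
    for term in terms:
        for category in categories:
            a = sum(1 for k, c in pages if term in k and c == category)
            if a == 0:
                continue  # keep only pairs seen together at least once
            b = sum(1 for k, c in pages if term in k and c != category)
            c_only = sum(1 for k, c in pages if term not in k and c == category)
            d = len(pages) - a - b - c_only
            dictionary[(term, category)] = Contingency(a, b, c_only, d)
    return dictionary


def exact_search(rows, keyword):
    """Return the URLs of (url, keyword) rows whose keyword matches exactly."""
    return [url for url, stored in rows if stored == keyword]


# Hypothetical sample data, for illustration only.
pages = [({"search", "web"}, "ir"), ({"search"}, "ir"),
         ({"web"}, "mining"), ({"data"}, "mining")]
dictionary = build_dictionary(pages)
rows = [("a.html", "search"), ("b.html", "searching"), ("c.html", "search")]
print(exact_search(rows, "search"))
print(dictionary[("search", "ir")].strength())
```

Because the lookup uses strict equality, a stored keyword such as "searching" does not match the query "search", which reflects the behavior on unclear keywords that the paper attributes to exact search.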
  • 3. Fig- 2 Classifications of Most Frequent Data

Fig- 2 presents a classification of relative data from the keywords and shows that the directions and action name have more weightage than any other.

Fig- 3 Classifications of Occurrences from Relevance Data

Fig- 3 shows the classification of occurrences of the retrieved data depending upon the relevance. It identifies the likelihood of retrieval based on the relevance.

Fig- 4 Classifications of Frequencies of Relevance Data

Fig- 4 summarizes the classification chart of the frequency obtained by each case. It shows that the activity name, colored red, occurred more than 75% in vol_2_module, and the blue-colored activity obtained 4% in class_1_outline.
  • 4. Fig- 5 Rate of Nonverbal per 1000 words

Fig- 5 shows that vol_1_modules retrieved the most nonverbal resultant data, i.e. 4.63%. It indicates that relative search also retrieves nonverbal keywords on the basis of relevance.

Fig- 6 Rate of verbal per 1000 words

Fig- 6 shows the verbs found in the test data and indicates that good writing guide and good writing guide sociology have more verbs than any other case.

Fig- 7 Occurrences By Files

Fig- 7 shows the keyword occurrences in each file, where each color denotes a keyword mapped to the different files. It shows that vol_2_module and Act_communication have more occurrences than the critical_writing test case.
  • 5. Fig- 8 Exact Searches With Case Occurrences By File

Fig- 8 shows the occurrences of retrieved data under exact search and indicates that the four web retrievals have a common value, i.e. 1.

Fig- 9 Exact Searches With Word Frequency By File

Fig- 9 presents the web information retrieved from the sentences contained in a file and shows that the interpersonal module has more frequencies than the rest. It achieves more than 200%.

Fig- 10 Exact Searches By Case Occurrences From Sentences

Fig- 10 presents the exact searches by case occurrence from the sentences contained in the test data. The black-colored entries are the active files, and all the listed web data files obtain a 100% result for the specified keyword.

Fig- 11 Exact Searches With Rate Per 10000 By File

Fig- 11 shows the result as a rate per 10000 words by file. More than 180% is achieved in the file from the interpersonal module; the value then drops to 45% in nqimch08 and rises to 60% in vol_2_module.
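Several of the figures report a rate per 1000 or per 10000 words. As a hedged illustration of how such a rate is commonly computed, the short Python sketch below normalizes a raw count by the length of the file. The function name `rate_per` and the numbers are hypothetical and are not taken from the test data.

```python
def rate_per(count, total_words, per=1000):
    """Return the number of occurrences per `per` words of text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count * per / total_words


# Hypothetical counts, for illustration only.
print(rate_per(5, 2000))             # 5 occurrences in a 2000-word file
print(rate_per(9, 5000, per=10000))  # 9 occurrences in a 5000-word file
```

Normalizing by file length makes files of different sizes comparable, which a raw occurrence count does not.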
  • 6. Fig- 12 Exact Searches By Case Occurrences From Sentences

Fig- 12 presents the resultant web data with the number of case occurrences and shows that the test files obtain a 100% exact result.

Fig- 13 Exact Searches By Category Percentage From Sentences

Fig- 13 shows retrieval by category percentage. For the selected test data, the vertical bars represent the test files that obtained a 100% result for the category.

Fig- 14 Exact Searches By Case Occurrences From Sentences

Fig- 14 shows the case occurrences found by sentence. It clearly indicates that the sentences are classified by predicted classes and that, on this basis, the case occurrences are identified by paragraph.

6. CONCLUSION
The result analysis leads to the conclusion that with exact search the searcher can expect minimal retrieval in less time, whereas with relative search the searcher gets vast retrieval, which also takes somewhat more time than exact search. Exact search does not work on uncertain or unclear keywords. In contrast, relative search gives comparatively better response options and is expected to return more of the possible values that the searcher is looking for in the web information repository. Relative search also requires more time to filter out the relevant resultant data. Both approaches are useful in text-based and link-based retrieval processes. The extensibility of relative search may give more options for retrieval from the web domain.
Exact search produces better outcomes from different sources such as web documents, the paragraphs contained in the web documents, and the sentences contained in the paragraphs. Thus, with these options the searcher retrieves a better outcome compared with relative search. The obtained results justify and validate that the targeted outcomes are attained with the integration/fusion of the approaches rather than with their individual use.

REFERENCES
[1] Robertson, S. E., Maron, M. E., & Cooper, W. S. (1982). Probability of relevance: A unification of two competing models for document retrieval. Information Technology: Research and Development, 1, 1-21.
[2] McCallum, A., Rosenfeld, R., Mitchell, T., & Ng, A. Y. (1998). Improving text classification by shrinkage in a hierarchy of classes. Proceedings of the 15th International Conference on Machine Learning, 359-367.
[3] Thangaraj, M., and G. Sujatha. "An architectural design for effective information retrieval in semantic web." Expert Systems with Applications 41.18 (2014): 8225-8233.
[4] Zhao, Jie, et al. "Exploiting location information for web search." Computers in Human Behavior 30 (2014): 378-388.
[5] Lin, Sheng, et al. "Exploiting temporal information in Web search." Expert Systems with Applications 41.2 (2014): 331-341.
[6] Roy, Rishiraj Saha, et al. "Discovering and understanding word level user intent in Web search queries." Web Semantics: Science, Services and Agents on the World Wide Web 30 (2015): 22-38.
[7] Francès, Guillem, et al. "Improving the efficiency of multi-site web search engines." Proceedings of the 7th ACM International Conference on Web Search and Data Mining. ACM, 2014.