In Search of a Semantic Book
Search Engine on the Web:
Are We There Yet?
By
Irfan Ullah and Shah Khusro
University of Peshawar, Pakistan
5th Computer Science On-line Conference 2016
ComputerScienceOnline
Conference2016
1
In this Presentation
• Abstract
• Introduction
• Survey of the Literature
• Extracting Structure & Indexing Books
• Searching and Ranking Books
• Book Recommendations
• Fine-grained Access to Information in Books
• Discussion and Analysis
• Conclusions
• References
Abstract
• Books – Valuable source of knowledge and learning
• Position
• Web Information Retrieval (IR) techniques for book retrieval
• Existing searching solutions treat books as plaintext collections
• Inaccurate and imprecise book search results
• Solution
• Books are different from web pages
• Structural semantics and logical connections in their content for
searching, ranking and recommendations
• Fine-grained access to information in books e.g. tables, figures
Introduction
• Web Information Retrieval
• Rich text collections with explicit hypertextual structure
• Used in searching and ranking web pages
• Books lack this graph-like structure – Problem
• Books are well-organized and logically connected
• Presenting a graph-like structure – can be used in searching,
ranking, and recommending books
• But visible to human readers only
• Problem – Need to be machine understandable and processable
Introduction
• Solution – Semantic Book Search Engine
• What is Required?
• A more in-depth and comprehensive book structure ontology
• Domain level ontologies to understand book contents in different
domains
• Connecting books in a graph-like manner
• Why?
• Better searching, ranking, and recommendations
• Increase user satisfaction
• Promoting objectives of other stakeholders
Survey of the Literature
• Extracting Structure & Indexing Books
• Many Research Initiatives and Conferences
• INEX, ICDAR, and BooksOnline
• Indexing books’ valuable parts [2].
• Book layout analysis for extracting TOC [3] and other parts [8]
• Resurgence software for detecting different parts [4-6]
• Rule-based and SVM-based methods extracting TOC [7]
• Detecting and parsing TOC pages [9], index pages [9] through
classical methods [10, 11] and using trailing page whitespace
methods [9]
• Required
• Connecting book title with other parts
• Better book indexing, ranking and recommendations
Survey of the Literature
• Searching and Ranking Books
• Ranking authors by expert finding to rank books [12]
• “Authors capture an important aspect of relevance [12]”
• Read books written by popular experts in the field
• No bag-of-words models
• Ranking by what is actually inside books [13]
• Thesaurus, reference works and ontologies
• Helping readers gain useful insights into the text and decide on
the relevance of the book
Survey of the Literature
• Searching and Ranking Books
• Digitized Books
• By combining and comparing scores for book headings, TOC and
book titles [2].
• Digitization Projects – Limited/No Ranking
• Project Gutenberg – sorting results
• Google Books – 100 (unknown) ranking signals [1]
• Google Patents [15, 16] – Not yet implemented
• Books could be connected through references [14] – Limited
• Need
• Using Semantic Web and Ontologies
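The idea of combining scores for book titles, TOC, and headings [2] can be sketched as a weighted sum of per-field match scores. This is only an illustration: the field names, the weights in `FIELD_WEIGHTS`, and the simple term-overlap scoring are assumptions, not the method of the cited work.

```python
# Hypothetical field weights; the slides do not state actual values.
FIELD_WEIGHTS = {"title": 3.0, "toc": 2.0, "headings": 1.5}


def field_score(query, text):
    """Fraction of query terms that occur in the field text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in terms if term in words) / len(terms)


def book_score(query, book):
    """Weighted sum of the per-field scores for one book (a dict of field -> text)."""
    return sum(
        weight * field_score(query, book.get(field, ""))
        for field, weight in FIELD_WEIGHTS.items()
    )


def rank_books(query, books):
    """Return the books sorted by descending combined score."""
    return sorted(books, key=lambda book: book_score(query, book), reverse=True)
```

A match in the title counts more than the same match in the TOC, so a book titled after the query terms is ranked above one that only mentions them in a chapter listing.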
Survey of the Literature
• Book Recommendations
• Available Recommenders
• BReK12 – readability levels of K-12 readers + book contents [21]
• BReT – K-12 teachers in finding relevant books for K-12 students [22]
• K3Rec – K-3 readers, their parents, and teachers [23]
• Using near and partial duplicates, citation analysis, and metadata
similarities [24].
• User modeling – information from Social Web [17].
• Book reviews [18, 19].
• Semantic Web and ontologies [25-27]
• Limited – Use only book descriptions, not the actual content
• Required
• True content-based semantic book recommender
Survey of the Literature
• Fine-grained access to information in books
• Retrieving similar and related tables, figures, images, algorithms,
equations, quotations, and passages
• Augmenting tables with different data sources to restore the
lost semantics [28].
• Same is the case with figures and images
• CiteSeer – document, author, and table search
• Need
• Exploitation of book structural semantics and logical connections
Discussion & Analysis
• Indexing books
• Multi-field inverted index should be used [29].
• Book search engine should be able to understand
• The nature of books, their contents, and user intentions
• E.g., for fiction and novels, readers may be interested in different strata,
including the plot, the idea, and the composition of the work [30].
• Required
• Semantic indexing by exploiting book structural semantics
• Indexing fiction/novels, and
• Indexing books using metadata
• Book reviews
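A multi-field inverted index keeps a separate posting list per field, so a query can target the title, the TOC, or the reviews independently. The sketch below is a minimal in-memory version; the class name `MultiFieldIndex`, the field names, and the whitespace tokenization are assumptions made for illustration.

```python
from collections import defaultdict


class MultiFieldIndex:
    """Toy multi-field inverted index: field -> term -> set of book ids."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, book_id, fields):
        """Index one book; `fields` maps a field name to its text."""
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[field][term].add(book_id)

    def search(self, term, field=None):
        """Return ids of books containing `term`, optionally within one field."""
        term = term.lower()
        fields = [field] if field else list(self.postings)
        hits = set()
        for name in fields:
            hits |= self.postings[name].get(term, set())
        return hits


index = MultiFieldIndex()
index.add("b1", {"title": "Semantic Search", "toc": "Ontologies Ranking"})
index.add("b2", {"title": "Ranking Methods", "reviews": "great semantic coverage"})
```

Here `index.search("semantic")` looks in every field, while `index.search("semantic", field="title")` restricts the lookup to titles. A production engine would add tokenization, stemming, and term weights on top of this structure.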
Discussion & Analysis
• Searching books
• Search Engine Results Page (SERP)
• Too many relevant and irrelevant results – Information Overload [31]
• Required – User Interface
• Provide more relevant results
• Robust, non-ambiguous, understandable and relevant to information
need
• Present results in a manner that augments user understanding
Discussion & Analysis
• Ranking and recommending books
• Using ontologies and the actual book contents
• Exploiting structural semantics and logical connections in book
contents
• Problem
• Existing ontologies (JeromeDL and DocBook) are limited in fully
describing books
• Required
• Comprehensive book structure and several domain-level
ontologies
• Ontology Engineering and Ontology Learning [32] along with
involving domain experts
Discussion & Analysis
• Finding related tables and figures
• Table extraction and searching
• Summarize, elaborate and compare tables
• Interpret tables accurately
• Structure and semantic characteristics of book tables of all possible layout
variations
• Using online knowledge sources in annotating tables [28]
• Using ontologies in indexing, searching, and ranking tables
• Figure extraction and searching
• Relating figures using visual similarities and contextual clues
• To retrieve books that present images and figures on a certain
concept or topic
Conclusions
• Book Search and Retrieval
• Has been the focus of research initiatives and academic research
• Several retrieval methods have been proposed
• Several book ontologies have been developed for indexing,
ranking, and recommending books
• Still, we are miles away from the ideal system
• Need
• Further research initiatives for discovering book structural
semantics and its use in searching, ranking, and recommending
books
Conclusions
• Need – Semantic book search engine
• Treat books differently from other web documents
• Use their structural semantics and logical connections in
searching, ranking, and recommendations
• Comprehensive book structure ontology
• Domain-level ontologies
• To process book contents in different domains
• To create a graph-like structure of books to be used by
PageRank-type algorithms
• To allow fine-grained access to information in books like tables,
figures, algorithms, equations, similar passages etc.
• To fulfill the information needs of readers and other stakeholders
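Once books are connected in a graph (for example, through references or shared concepts), a PageRank-type algorithm can score them. The following is a minimal power-iteration sketch, assuming a hypothetical graph given as a dict from a book id to the list of book ids it links to; the damping factor and iteration count are conventional defaults, not values from the slides.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: node -> list of linked nodes."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: books A and B both point to book C, and C points to A.
book_links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(book_links)
```

In this toy graph, book C receives links from both A and B and ends up with the highest score, while B, which nothing links to, ends up with the lowest.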


Editor's Notes

  • #7: Indexing books’ valuable parts, e.g., chapter, section, and subsection headings, table of contents (TOC), index pages, and book titles that are obtained from book metadata [2].
    • Title: first line in the document, excluding the page number
    • TOC and index pages: looking for key terms, e.g., “table of contents”, “contents”, “page”, “index”, and long runs of lines ending with digits
    • Failure: first 3000 characters and last 10 pages of the book [2]
    • What is required:
      • Further research for greater precision and accuracy in book structure detection and extraction
      • The book title can be connected with the TOC, chapters, sections, subsections, tables, images, figures, algorithms, procedures, mathematical equations, and different related concepts, resulting in a connected graph
      • Better search, ranking, and recommendations using contextual clues rather than simple bag-of-words models and ordinary ranking methods
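The TOC heuristic described in the note (key terms plus a long run of lines ending with digits) can be sketched as follows. This is an illustrative approximation, not the implementation from [2]: the function names, the choice to look for key terms only in the first three lines, and the `min_digit_lines` threshold are all assumptions.

```python
import re

# Key terms taken from the note; "page" is left out here because it is
# too common in running text (an assumption of this sketch).
TOC_KEY_TERMS = ("table of contents", "contents", "index")
ENDS_WITH_DIGITS = re.compile(r"\d+\s*$")


def longest_digit_run(lines):
    """Length of the longest run of consecutive lines that end with digits."""
    best = current = 0
    for line in lines:
        if ENDS_WITH_DIGITS.search(line):
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def looks_like_toc_page(page_text, min_digit_lines=5):
    """Guess whether a page is a TOC or index page."""
    lines = [line for line in page_text.splitlines() if line.strip()]
    head = " ".join(lines[:3]).lower()
    has_key_term = any(term in head for term in TOC_KEY_TERMS)
    return has_key_term and longest_digit_run(lines) >= min_digit_lines
```

Requiring both signals reduces false positives: a prose page that merely mentions "contents" has no run of page-numbered lines, and a table of numbers has no TOC heading.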