iBioSearch: The Integrated Biological Database Search
Ritu Khare and Yuan An
www.ischool.drexel.edu

PROBLEM
The very large number of biological Web databases, each with its own
search interface, makes it difficult for biologists to search for any
biological entity (see Fig. 1). Currently, the only option biologists
have is to search each of these numerous interfaces individually.

Fig. 1: Problem - a biologist searching for an entity and asking:
Which interface to search? Which database to access? What search
criteria do I have? How many sources to consider?

OUR SOLUTION
We aim to provide a unified search interface capable of searching
multiple (1000+) biological databases. This interface would be a
representation of the biological search interface ontology. To find
this global search ontology, we take a novel approach: we reverse
engineer each individual search interface into a conceptual model, and
then find an integrated model that is consistent with all the
interfaces up to a level of significance.

HYPOTHESIS & ASSUMPTIONS
WI Metamodel: We observe that all input Web Interfaces (WIs) share an
underlying global model. We created this global model manually and
termed it the "WI Metamodel" (see Fig. 2).
WI: Every Web Interface (WI) can be represented as an instance of the
metamodel.

Fig. 2: WI Metamodel

CURRENT AND PREDICTED RESULTS
The GBWS (Global Biological WI Schema), or ontology, could be presented
as a meta-search interface in which biologists can search for most
biological entities using the search criteria available across
different databases.
Eventually, we aim to answer other research questions, such as:
1. Differences between commercial and biological databases.
2. Automatic identification of biological search interfaces.
3. Reverse engineering of a WI into an ER diagram.
4. Integration of multiple ER diagrams.
5. Extracting relationships between biological search entities.

FUTURE WORK
In the future, we intend to dynamically update the repository of
biological databases, maintain semantic mappings when base
databases evolve, translate user queries, and consolidate,
reconcile, and rank the query results using data cleansing and
relevance computing algorithms. In addition, we plan to perform
usability testing of the iBioSearch system with the help of biologists.

METHODOLOGY
Fig. 3: Methodology (panel labels: OLDB, information retrieval,
information extraction, reverse engineering, mapping WI with metamodel,
clustering search entities and labels, generation of global biological
WI schema, meta-search interface)
1. Web Interface (WI) Collection: Collect WIs to biological databases.
2. Information Extraction: For each WI, extract the attributes corresponding to
the WI metamodel. Broadly, a WI can be represented as a collection of
search entities and their respective labels (search criteria).
3. Mapping WI to metamodel: Map each WI to the WI metamodel to generate
instances of the metamodel. This yields a list of search entities and
their respective criteria (labels). For a given search entity S_i, there is a
label set (l_i1, l_i2, l_i3, …, l_im).
4. Clustering: Find non-overlapping classes of search entities representing
synonyms, and for each class, find a list of non-redundant labels.
5. Generation of GBWS: Finally, we generate another conceptual model
that we call the "Global Biological WI Schema" (GBWS). It would represent
all possible input WIs in a non-redundant manner, and capture the matchings
between individual instances of the WI metamodel.
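
As a rough illustration of steps 3 to 5, the Python sketch below stores each WI as a set of search entities with their labels, groups synonymous entities into classes, and merges the labels of each class without duplicates. It is a minimal sketch under stated assumptions, not the authors' implementation. The names SearchEntity, WebInterface, normalize, and build_gbws, the example URLs, and the sample entities are all hypothetical. The poster does not name a clustering technique, so a hand-written synonym table stands in for the clustering step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchEntity:
    """One search entity S_i with its label set (l_i1, ..., l_im)."""
    name: str
    labels: list[str] = field(default_factory=list)


@dataclass
class WebInterface:
    """A WI held as an instance of a (much simplified) WI metamodel."""
    url: str
    entities: list[SearchEntity] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def build_gbws(interfaces, synonyms):
    """Group synonymous entities (step 4) and merge their labels (step 5).

    `synonyms` is an assumed, hand-written table that maps a normalized
    entity name to the name of its class. Names missing from the table
    form a class of their own.
    """
    schema: dict[str, list[str]] = {}   # class name -> non-redundant labels
    seen: dict[str, set[str]] = {}      # class name -> labels already added
    for wi in interfaces:
        for entity in wi.entities:
            key = normalize(entity.name)
            cls = synonyms.get(key, key)
            labels = schema.setdefault(cls, [])
            known = seen.setdefault(cls, set())
            for label in entity.labels:
                norm = normalize(label)
                if norm not in known:   # skip labels already in this class
                    known.add(norm)
                    labels.append(norm)
    return schema


if __name__ == "__main__":
    # Two hypothetical interfaces that describe the same entity differently.
    wis = [
        WebInterface("https://db-a.example/search",
                     [SearchEntity("Gene", ["Symbol", "Organism"])]),
        WebInterface("https://db-b.example/query",
                     [SearchEntity("Genes", ["organism", "Chromosome"]),
                      SearchEntity("Protein", ["Accession"])]),
    ]
    print(build_gbws(wis, {"genes": "gene"}))
    # {'gene': ['symbol', 'organism', 'chromosome'], 'protein': ['accession']}
```

In this sketch the resulting dictionary plays the role of a very small GBWS: one entry per entity class, each with the union of search criteria offered by the contributing interfaces. A real system would need to learn the synonym classes rather than read them from a table.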

