Biological database

BIOLOGICAL DATABASE
Dr. Nusaifa Beevi.P
Associate Professor & HOD,
PG Department of Botany,
Iqbal College, Peringammala

 BIOLOGICAL DATABASES
 Collection of files containing records of
biological data in machine readable form
 Can be accessed, added, retrieved,
manipulated and modified
 Store, manage, connect and distribute
data
 Data are arranged according to sets of rules
programmed into the software that manages
the data, called a Database Management
System (DBMS).
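
A minimal sketch of what "added, retrieved and modified" through a DBMS can look like, using SQLite from the Python standard library. The table layout, the accession values and the sequences are made up for illustration; real biological databases use far richer schemas.

```python
import sqlite3

# In-memory database; table name, columns and records are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence_record ("
    " accession TEXT PRIMARY KEY,"
    " organism TEXT NOT NULL,"
    " sequence TEXT NOT NULL)"
)

# Add records.
conn.executemany(
    "INSERT INTO sequence_record VALUES (?, ?, ?)",
    [
        ("DEMO0001", "Oryza sativa", "ATGGCCATTG"),
        ("DEMO0002", "Zea mays", "ATGCCGTTAA"),
    ],
)

# Modify a record.
conn.execute(
    "UPDATE sequence_record SET sequence = ? WHERE accession = ?",
    ("ATGGCCATTGTAA", "DEMO0001"),
)

# Retrieve records, sorted by accession.
rows = conn.execute(
    "SELECT accession, organism, length(sequence) "
    "FROM sequence_record ORDER BY accession"
).fetchall()
print(rows)
```

The rules mentioned above show up here as the schema: the DBMS rejects a record without an organism, or a second record with the same accession.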

 Primary Databases: Contain original data in the
form of primary sequence data or structural data
as submitted by the scientific community.
 Secondary Databases: Contain information that
has been processed and derived from the raw
data available in primary databases, e.g. PROSITE,
PRINTS, BLOCKS.
 Composite Databases: Collect and present data
after comparing and filtering them from different
primary databases and exhibit only the non-
redundant sequences
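
The idea of a composite database can be sketched as merging records from several primary sources and keeping one entry per distinct sequence. The source names, identifiers and sequences below are hypothetical, and real composite databases apply more elaborate comparison and filtering rules than exact matching.

```python
def build_composite(*sources):
    """Merge records from several sources, keeping one entry per distinct sequence."""
    composite = {}
    for source_name, records in sources:
        for identifier, sequence in records:
            key = sequence.upper()  # treat upper/lower case as the same sequence
            # Keep the first record seen for each sequence; note every source it came from.
            entry = composite.setdefault(
                key, {"id": identifier, "sequence": key, "found_in": []}
            )
            if source_name not in entry["found_in"]:
                entry["found_in"].append(source_name)
    return list(composite.values())


# Two made-up primary sources that share one sequence.
source_a = ("SourceA", [("A1", "MKTAYIAK"), ("A2", "MSLLTEVE")])
source_b = ("SourceB", [("B7", "mktayiak"), ("B8", "MGSSHHHH")])

merged = build_composite(source_a, source_b)
for entry in merged:
    print(entry["id"], entry["sequence"], entry["found_in"])
```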

 Nucleic acid databases: GenBank, EMBL, DDBJ
 Protein sequence databases: PIR, Swiss-Prot,
UniProt
 Protein structure database: PDB
 Metabolic databases: KEGG

 Nucleic acid databases are composed of a group
of nucleotide sequence entries.
 They are data repositories that accept nucleic acid
sequence data and make it freely available to
the public.
 GenBank, EMBL and DDBJ are the principal nucleotide
databases.
 All three are members of the International
Nucleotide Sequence Database Collaboration
(INSDC) and exchange data with one another.

 GenBank is hosted by the National Center for Biotechnology
Information (NCBI), located on the campus of the US National
Institutes of Health, USA.
 GenBank offers all publicly available nucleotide
sequences, their protein translations, and their
annotations.
 It also facilitates direct submission of sequence data through a
user-friendly process.
 Researchers from anywhere can submit their data to
GenBank.
 Each submitted sequence is given an accession number
and is released to the public database after a quality
assurance check.
 This information can be retrieved using the Entrez
retrieval system.
 The data can be accessed over the internet through
the NCBI site, http://www.ncbi.nlm.nih.gov/genbank
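
Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below only builds an `efetch` request URL for a nucleotide record and does not contact the server. The accession `U49845` is used purely as an example, and `efetch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base address of the E-utilities efetch endpoint.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="gb"):
    """Build an efetch URL for one nucleotide record (hypothetical helper)."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # accession number of the record
        "rettype": rettype,   # "gb" for GenBank flat file, "fasta" for FASTA
        "retmode": "text",
    }
    return EUTILS_EFETCH + "?" + urlencode(params)


url = efetch_url("U49845")
print(url)
```

Passing the resulting URL to `urllib.request.urlopen` would download the record, which needs network access and should respect NCBI's usage guidelines.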

 DDBJ (DNA Data Bank of Japan) started in 1986 and is now
hosted at the National Institute of Genetics, Japan.
 Gathers data mainly from scientists in Japan, and also from
researchers all over the world.
 It shares nucleotide sequence data with GenBank
and EMBL.
 About 99% of the nucleotide data that Japanese researchers
contribute to the INSDC is submitted through DDBJ, which
also helps maintain the quality of INSDC data.
 Each entry includes details of the sequence, the submitter's details,
biological significance, and the scientific name and
taxonomy of the organism. In addition, features that
identify coding regions, transcription units, mutation sites
etc. are displayed in a feature table.
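
As a rough illustration of how a feature table can be read by a program, the sketch below parses a few lines laid out in the flat-file style, with feature keys starting in column 6 and qualifiers starting in column 22. The sample features and the gene name `demoA` are invented, and the parser ignores multi-line locations and qualifiers that real entries may contain.

```python
KEY_INDENT = " " * 5     # feature keys start in column 6
QUAL_INDENT = " " * 21   # locations and qualifiers start in column 22


def parse_features(text):
    """Parse a simplified feature table into a list of dictionaries."""
    features = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(QUAL_INDENT):
            qualifier = line.strip()
            if qualifier.startswith("/") and features:
                name, _, value = qualifier[1:].partition("=")
                features[-1]["qualifiers"][name] = value.strip('"')
        elif line.startswith(KEY_INDENT):
            key, location = line.split()
            features.append({"key": key, "location": location, "qualifiers": {}})
    return features


# Invented sample; the layout is built explicitly so the columns are visible.
sample = "\n".join([
    KEY_INDENT + "source".ljust(16) + "1..120",
    QUAL_INDENT + '/organism="Example organism"',
    KEY_INDENT + "gene".ljust(16) + "10..90",
    QUAL_INDENT + '/gene="demoA"',
    KEY_INDENT + "CDS".ljust(16) + "10..90",
    QUAL_INDENT + '/gene="demoA"',
    QUAL_INDENT + '/product="hypothetical protein"',
])

features = parse_features(sample)
for feature in features:
    print(feature["key"], feature["location"], feature["qualifiers"])
```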

 Major activities of DDBJ include providing
internationally recognized accession numbers
for sequences, managing bioinformatics
databases, developing tools for the analysis
and visualization of biological data, and
conducting courses for beginners to reduce
the complexity of biological data analysis.
 DDBJ can be accessed through its homepage,
http://www.ddbj.nig.ac.jp/.

 The EMBL Nucleotide Sequence Database is maintained by the European
Molecular Biology Laboratory (EMBL). The laboratory was founded in 1974;
the database itself was established in 1980.
 Hosted in the UK by the EMBL European Bioinformatics Institute (EMBL-EBI).
 EMBL is a non-profit molecular biology research institution supported by
more than 20 member states, mostly European, with Australia as an
associate member.
 EMBL collects nucleotide sequence data from individual
researchers, genome sequencing projects and patent
applications.
 Sequences are stored in this database as they would exist in
the biological state.
 The stored data correspond to wild-type sequences without
mutation or genetic manipulation.
 Accessed through the URL http://www.ebi.ac.uk/embl

 A protein sequence database is an array of amino
acid sequence entries arranged according to
their identification number.
 Well-known protein sequence databases
available on the web are
◦ Swiss-Prot
◦ PIR
◦ UniProt

 Swiss-Prot was developed by the Swiss Institute of Bioinformatics (SIB)
and the European Bioinformatics Institute (EBI).
 A high-quality, manually annotated protein sequence
database created in 1986.
 It provides high-level annotation, including protein function
and post-translational modifications.
 It aims to provide all known relevant information about a particular
protein.
 It now forms one of the two sections of UniProtKB:
UniProtKB/Swiss-Prot, which is manually annotated and reviewed,
and UniProtKB/TrEMBL, which is automatically annotated and not
reviewed.
 Available at http://www.expasy.ch/sprot

 PIR: the Protein Information Resource database
 Established in 1984 by the National Biomedical Research
Foundation (NBRF).
 It is an integrated public bioinformatics resource that
supports genomic and proteomic research and scientific
studies.
 It assists researchers in the identification and
interpretation of protein sequence information.
 PIR can be searched by entry or by sequence
similarity.
 Can be accessed at
http://www.pir.georgetown.edu/.
 PIR offers a variety of resources mainly oriented to
assist the propagation and standardization of protein
annotation.

 UniProt provides a comprehensive, high-quality and
freely accessible resource of protein
sequences.
 Most entries are derived from genome sequencing
projects.
 The UniProt consortium comprises the
European Bioinformatics Institute (EBI), the
Swiss Institute of Bioinformatics (SIB), and the
Protein Information Resource (PIR).
 UniProt is composed of four components,
each optimized for different uses.

 1. UniProt Knowledgebase (UniProtKB): for
extensive curated protein information, with
two sections: UniProtKB/Swiss-Prot, which
is manually annotated and reviewed,
and UniProtKB/TrEMBL, which is
automatically annotated and not reviewed.
 2. UniProt Reference Clusters (UniRef)
 3. UniProt Archive (UniParc)
 4. UniProt Metagenomic and Environmental
Sequences (UniMES)
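
The reviewed/unreviewed split is visible in UniProtKB FASTA headers, where the first field is `sp` for Swiss-Prot and `tr` for TrEMBL. The sketch below reads that field; `review_status` is a hypothetical helper, and the TrEMBL header is an invented example rather than a real entry.

```python
def review_status(fasta_header):
    """Return (accession, status) from a UniProtKB-style FASTA header."""
    fields = fasta_header.lstrip(">").split("|")
    if len(fields) < 3:
        raise ValueError("not a UniProtKB-style FASTA header")
    database, accession = fields[0], fields[1]
    labels = {"sp": "reviewed (Swiss-Prot)", "tr": "unreviewed (TrEMBL)"}
    return accession, labels.get(database, "unknown")


# Human hemoglobin subunit alpha, a Swiss-Prot entry.
swissprot_header = (
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
    "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
)
# Invented TrEMBL-style header, for illustration only.
trembl_header = ">tr|A0A000DEMO0|A0A000DEMO0_9ZZZZ Uncharacterized protein"

print(review_status(swissprot_header))
print(review_status(trembl_header))
```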

 Many proteins that share a common
evolutionary origin show structural
similarities.
 Dissimilar proteins exhibit differences in
primary, secondary, tertiary and
quaternary structure.
 Structural similarity or dissimilarity
between proteins can be assessed using
structure databases.
 These databases store a collection of
three-dimensional structures of proteins.

 Understanding the shape of a molecule helps to
understand how it works.
 PDB (Protein Data Bank) is the main primary database of
experimentally determined 3D structures of proteins and
nucleic acids.
 The single worldwide archive of structural data.
 Maintained by the Research Collaboratory for Structural
Bioinformatics (RCSB).
 Data obtained by X-ray crystallography and
NMR spectroscopy are submitted to the PDB.
 These structures are then annotated as per the
depositors' specifications.
 Freely available and accessed through the URL
http://www.pdb.org/
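
Structures in the PDB have traditionally been distributed as fixed-column text files, in which each `ATOM` line gives one atom and its x, y, z coordinates. The sketch below parses a single such line by column position. The sample atom is illustrative rather than copied from a particular entry, and only a subset of the fields is read.

```python
def parse_atom_record(line):
    """Extract a few fixed-column fields from one PDB ATOM line."""
    return {
        "serial": int(line[6:11]),           # columns 7-11
        "atom": line[12:16].strip(),         # columns 13-16
        "residue": line[17:20].strip(),      # columns 18-20
        "chain": line[21],                   # column 22
        "residue_number": int(line[22:26]),  # columns 23-26
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
    }


# Illustrative ATOM line, assembled field by field so the column layout is explicit.
sample_line = (
    "ATOM  "        # record name, columns 1-6
    + f"{1:>5}"     # atom serial number
    + " "
    + " N  "        # atom name
    + " "           # alternate location indicator (blank)
    + "MET"         # residue name
    + " "
    + "A"           # chain identifier
    + f"{1:>4}"     # residue sequence number
    + "    "        # insertion code plus three blank columns
    + f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}"
)

atom = parse_atom_record(sample_line)
print(atom)
```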

 Model organism databases (MODs) are also called organism-specific
databases. They describe the genome and other information about
well-studied experimental organisms in the life sciences.
 They store large volumes of data and allow users to
analyse results and interpret datasets, including data they
generated themselves for an organism of their own interest.
 Examples:
 FlyBase- database of Drosophila melanogaster
 SGD- Saccharomyces Genome Database
 AGR- Arabidopsis Genome Resource
 HGP- Human Genome Project
 RGD- Rat Genome Database, etc.

 Biodiversity databases provide information on the biodiversity
of a particular area or group of living organisms.
 They may store genus-level information, species-level
information, information on nomenclature, or any
combination of the three.
 Species 2000
◦ Established in September 1994 by the International Union of
Biological Sciences (IUBS), in co-operation with the Committee on
Data for Science and Technology (CODATA) and the International
Union of Microbiological Societies (IUMS).
◦ It is a federation of database organizations working closely with
users, taxonomists and sponsoring agencies.
◦ It plans to create an array of participating global species databases
covering each of the major groups of organisms (plants, animals,
fungi and microbes).
◦ The goal of Species 2000 is to provide a uniform, validated,
quality index of the names of all known species for use as a practical
tool.
