Data retrieval tools
Tools dedicated to giving molecular biologists access to information.
The most widely used are:
1. Entrez
2. DBGET
3. SRS
Each of these allows:
- Text-based searching of a number of linked databases (DBs).
- Sequence searching.
They differ in:
- The DBs they cover
- How the retrieved information is accessed and presented.
Entrez
- WWW-based data retrieval system.
- Developed by NCBI (National Center for Biotechnology Information).
- Integrates information held in different DBs.
Databases covered by Entrez include:
- Nucleic acid sequences: GenBank, RefSeq, PDB
- Protein sequences: SWISS-PROT, PIR
- 3D structures: MMDB
- Genomes: many sources
- PopSet: from GenBank
- OMIM: OMIM
- Taxonomy: NCBI taxonomy database
- Books: Bookshelf
- ProbeSet: GEO (Gene Expression Omnibus)
- Literature: PubMed
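As an illustration of text-based searching against one Entrez database, the sketch below builds a query URL for NCBI's E-utilities ESearch service. E-utilities is not mentioned in these notes; it is assumed here as the programmatic route into Entrez. The helper name build_esearch_url and the example query are hypothetical. The code only constructs the URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# ESearch endpoint of NCBI E-utilities (an assumption: the notes above
# describe only the web interface, not this programmatic route).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for a text query against one Entrez database.

    db     -- Entrez database name, e.g. "protein", "nucleotide", "pubmed"
    term   -- the text query, optionally with field tags such as [Organism]
    retmax -- maximum number of record IDs to return
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes the brackets and turns spaces into '+'
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical example query: human insulin entries in the protein database
url = build_esearch_url("protein", "insulin AND human[Organism]", retmax=5)
print(url)
```

Changing the db argument (for instance to "nucleotide" or "pubmed") sends the same text query to a different Entrez database.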
SRS
SRS stands for Sequence Retrieval System.
- Data retrieval tool developed at EBI (European Bioinformatics Institute)
- Integrates 80 molecular biology DBs
- Open-source software (can be installed locally)
SRS has an associated scripting language called Icarus.
Central resource for molecular biology data
- More than 250 databanks have been indexed.
- More than 35 SRS servers are available worldwide over the WWW.
Data analysis applications server
- 11 protein applications
- 6 nucleic acid applications
- Uniform query interface on the web
History of SRS
1990
– Development started at EMBL, Heidelberg. Main author: Dr. Thure Etzold.
1997
– Moved to EBI in Cambridge. Development work was supported by various grants, among others from EMBnet.
1998
– Etzold and his group joined LION Bioscience.
Information retrieval
– Easy way to retrieve information from sequence and sequence-related databases
– Searches can combine multiple words or other criteria
Linkage between different databases
– E.g. find all primary structures (sequences) that have a known three-dimensional structure.
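The linkage example can be sketched as a join between two databanks. The following is a minimal illustration in Python, not SRS or Icarus syntax. The two dictionaries, their identifiers, and the function name sequences_with_structure are all made up for the purpose of the example.

```python
# Toy stand-ins for two linked databanks (all entries are invented).
sequences = {
    "P00001": {"name": "example protein A", "structure_ids": ["1ABC"]},
    "P00002": {"name": "example protein B", "structure_ids": []},
    "P00003": {"name": "example protein C", "structure_ids": ["2XYZ", "9ZZZ"]},
}
structures = {
    "1ABC": {"method": "X-ray"},
    "2XYZ": {"method": "NMR"},
}


def sequences_with_structure(sequences, structures):
    """Return the sorted IDs of sequence entries that link to at least
    one entry present in the structure databank."""
    return sorted(
        seq_id
        for seq_id, entry in sequences.items()
        if any(sid in structures for sid in entry["structure_ids"])
    )


linked = sequences_with_structure(sequences, structures)
print(linked)
```

A cross-reference that points to a missing structure entry ("9ZZZ" here) does not count on its own; the sequence is kept only if at least one of its links resolves.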
Different types of databases in SRS
Sequence & structure
– DNA, protein, three-dimensional structures
Sequence-related
Gene-related
– Genome, mapping, mutations, transcription factors
– SNP
Bibliographic
– Medline, enzyme
User-defined
SRS main toolbar tabs:
Top Page: displays databases in different database groups
Query: displays either the standard or extended query form
Results or “the query manager”: maintains a history of all the results obtained during a session
Projects or “the project manager”: maintains a history of all queries and views used during a session
Views: allows a user to define a user-specific view for one or more databases
Databanks: lists the databases available in the system, with some facts about each
Search terms in SRS
SRS indexed fields can be searched using any of the following:
– Single word search
– Multiple word phrases
– Numbers and dates
– Regular expressions
– Wildcards
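To make the difference between these search term types concrete, here is a small Python sketch that runs a phrase, a wildcard, and two regular-expression searches over a toy list of indexed terms. It imitates the idea only: the pattern syntax is that of Python's fnmatch and re modules, which is an assumption and not necessarily the syntax SRS accepts. The index contents and function names are hypothetical.

```python
import fnmatch
import re

# Toy index of terms, standing in for one indexed field.
index = ["kinase", "kinesin", "kinetochore", "protein kinase c", "1998", "lysozyme"]


def phrase_search(terms, phrase):
    """Exact match on a single word or a multiple-word phrase."""
    return [t for t in terms if t == phrase]


def wildcard_search(terms, pattern):
    """Shell-style wildcards: '*' matches any run of characters, '?' one character."""
    return [t for t in terms if fnmatch.fnmatchcase(t, pattern)]


def regex_search(terms, pattern):
    """Regular expression that must match the whole term."""
    rx = re.compile(pattern)
    return [t for t in terms if rx.fullmatch(t)]


phrase_hits = phrase_search(index, "protein kinase c")
wildcard_hits = wildcard_search(index, "kin*")
regex_hits = regex_search(index, r"kin(ase|esin)")
year_hits = regex_search(index, r"\d{4}")
print(phrase_hits, wildcard_hits, regex_hits, year_hits)
```

The wildcard "kin*" accepts every term beginning with "kin", while the regular expression narrows the match to two specific endings.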