LECTURE TOPIC: PROTEIN DATABASE
T. ASHOK KUMAR
HEAD, DEPARTMENT OF BIOINFORMATICS
NOORUL ISLAM COLLEGE OF ARTS AND SCIENCE
KUMARACOIL, THUCKALAY - 629180
TOPICS COVERED
• Protein Terms & Definitions – Computational biology aspect of protein
• ExPASy – SIB Bioinformatics Resource Portal (http://www.expasy.org)
• UniProt/Swiss-Prot – A comprehensive, non-redundant protein sequence database, manually
annotated by experts (http://www.uniprot.org/)
• NBRF/PIR – A comprehensive, non-redundant, expert-annotated, fully classified and
extensively cross-referenced protein sequence database (http://pir.georgetown.edu/)
• PDB – A single worldwide repository of information about the 3D structures of large biological
molecules, including proteins and nucleic acids (http://rcsb.org/pdb)
• SCOP – Knowledge-based expert analysis and classification of proteins that are structurally
characterized and deposited in the Protein Data Bank (http://scop2.mrc-lmb.cam.ac.uk/)
• CATH – A hierarchical domain classification of protein structures in the Protein Data Bank
(http://www.cathdb.info/)
• MOTIF – Finds sequence motifs in a query sequence and provides functional and genomic
information on the motifs found, using DBGET and LinkDB for hyperlinked annotations
(http://www.genome.jp/tools/motif/)
• Pfam – Database of protein HMM profiles that define domain families (http://pfam.xfam.org/)
• PROSITE – Database of protein motifs expressed as patterns or profiles
PROTEIN TERMS & DEFINITIONS
• Protein Sequence – A string over the 20 standard amino acid one-letter codes [A, C, D, E, F, G,
H, I, K, L, M, N, P, Q, R, S, T, V, W, Y]
• Protein Structure – 3D atomic coordinates [x, y, z]
• Types of Biological Databases – [Raw (flat-file) Database = Plain text, Relational Database =
Tables of records linked by keys, Object-oriented Database = Data stored as objects]
• 3D Atom Model – [Sphere = Atom, Cylinder = Bond, Dotted Line = Bond Interaction]
• Sequence Alignment – [Match = Identical Character, Mismatch = Different Character, Gap = No
Corresponding Character, Word = Sub-string, Sequence = Super-string, Score = Rating of the
alignment, Identity = Fraction of aligned positions with identical characters]
• Motif – Short, conserved sequence associated with a distinct function.
• Domain – Evolutionarily conserved sequence region that corresponds to a structurally
independent 3D unit associated with a particular functional role. It is usually much larger than a
motif.
• Pattern – Sequence motif written in a symbolic notation. Example: N-{P}-[ST]-{P}
• Regular Expression – Representation format for a sequence motif, which includes positional
information for conserved and partly conserved residues. Similar to a Pattern, but derived from a
multiple sequence alignment (MSA).
• Profile – Scoring matrix that represents a multiple sequence alignment. It contains probability or
frequency values for each residue at each aligned position.
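The alignment terms listed on this slide (match, mismatch, gap, identity) can be made concrete with a small sketch. It assumes two sequences that are already aligned and of equal length, uses "-" as the gap character, and reports identity as the percentage of identical columns over the full alignment length. Other denominators are also in use, so treat this choice, the function name, and the two toy sequences as illustrative assumptions.

```python
def alignment_stats(aligned_a: str, aligned_b: str, gap: str = "-") -> dict:
    """Count matches, mismatches and gap columns in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = mismatches = gaps = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            gaps += 1          # column with no corresponding character
        elif a == b:
            matches += 1       # identical characters
        else:
            mismatches += 1    # different characters

    # Assumption: identity is measured over the whole alignment length.
    length = len(aligned_a)
    identity = 100.0 * matches / length if length else 0.0
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gaps": gaps,
        "identity_percent": identity,
    }


# Toy alignment, made up for illustration only.
stats = alignment_stats("MEHW-GQP", "MEKWAGQP")
print(stats)
```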
EXPASY
• ExPASy (Expert Protein Analysis System) is a bioinformatics resource portal operated by the
Swiss Institute of Bioinformatics (SIB).
• ExPASy was the first website of the life sciences.
• Extensible and integrative portal for accessing many scientific resources, databases and
software tools.
• Wide range of resources in many different domains, such as proteomics, genomics,
phylogeny/evolution, systems biology, population genetics, transcriptomics, etc.
• Proteomics server to analyze protein sequences and structures and 2D-PAGE (two-dimensional
polyacrylamide gel electrophoresis) data.
• Databases and online and offline software tools are hosted by different groups of the SIB and
partner institutions (e.g., CFSSP).
• ExPASy references the protein sequence knowledgebase, UniProtKB/Swiss-Prot, and its
computer-annotated supplement, UniProtKB/TrEMBL.
ARCHITECTURE OF UNIPROT/SWISS-PROT
• Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and
annotation data
• The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference
Clusters (UniRef), and the UniProt Archive (UniParc)
• UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository
specifically developed for metagenomic and environmental data
BACKGROUND OF UNIPROT/SWISS-PROT
• UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the Swiss
Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR)
• EMBL-EBI and SIB together used to produce Swiss-Prot and TrEMBL, while PIR produced the
Protein Sequence Database (PIR-PSD)
• Translated EMBL Nucleotide Sequence Data Library (TrEMBL) was originally created because
sequence data was being generated at a pace that exceeded Swiss-Prot's ability to keep up
• PIR maintained the PIR-PSD and related databases, including iProClass, a database of protein
sequences and curated families
http://www.uniprot.org/
UNIPROT/SWISS-PROT FILE FORMAT
Line code  Content                         Occurrence in an entry
ID         Identification                  Once; starts the entry
AC         Accession number(s)             Once or more
DT         Date                            Three times
DE         Description                     Once or more
GN         Gene name(s)                    Optional
OS         Organism species                Once or more
OG         Organelle                       Optional
OC         Organism classification         Once or more
OX         Taxonomy cross-reference        Once
OH         Organism host                   Optional
RN         Reference number                Once or more
RP         Reference position              Once or more
RC         Reference comment(s)            Optional
RX         Reference cross-reference(s)    Optional
RG         Reference group                 Once or more (Optional if RA line)
RA         Reference authors               Once or more (Optional if RG line)
RT         Reference title                 Optional
RL         Reference location              Once or more
CC         Comments or notes               Optional
DR         Database cross-references       Optional
PE         Protein existence               Once
KW         Keywords                        Optional
FT         Feature table data              Once or more in Swiss-Prot, optional in TrEMBL
SQ         Sequence header                 Once
(blanks)   Sequence data                   Once or more
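As a rough illustration of how the two-letter line codes structure an entry, the sketch below groups the lines of a flat-file record by code and joins the blank-coded lines into the sequence. The sample entry is deliberately abbreviated and hypothetical (the entry name, accession `X00000` and description are invented), and the parser assumes the content starts at column 6 and that `//` ends the entry. It is not a replacement for a full parser such as the one in Biopython.

```python
from collections import defaultdict

# Hypothetical, abbreviated entry for illustration; not a real UniProt record.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;          13 AA.
AC   X00000;
DE   RecName: Full=Hypothetical example protein;
OS   Homo sapiens (Human).
OX   NCBI_TaxID=9606;
SQ   SEQUENCE   13 AA;
     SRRWHPQSVE QAE
//
"""


def parse_flat_file(text):
    """Return ({line code: [content, ...]}, sequence) for a single entry."""
    fields = defaultdict(list)
    sequence_parts = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip empty lines
        if line.startswith("//"):
            break                         # assumed end-of-entry marker
        code, content = line[:2], line[5:].strip()
        if code.strip() == "":
            # Blank line code: sequence data, written in space-separated blocks.
            sequence_parts.append(content.replace(" ", ""))
        else:
            fields[code].append(content)
    return dict(fields), "".join(sequence_parts)


fields, sequence = parse_flat_file(SAMPLE_ENTRY)
print(fields["ID"][0])
print(fields["AC"], sequence)
```

Keeping a list per line code reflects the "once or more" occurrences in the table: codes such as AC, DE or DR may repeat within one entry.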
NBRF/PIR
• The Protein Information Resource (PIR) was established in 1984 by the National Biomedical
Research Foundation (NBRF) as a resource to assist researchers in the identification and
interpretation of protein sequence information.
• In 2002, PIR, along with its international partners, EBI and SIB, was awarded a grant from
NIH to create UniProt by unifying the PIR-PSD, Swiss-Prot, and TrEMBL databases.
• As of 2010, PIR offers a wide variety of resources mainly oriented to assist the propagation
and standardization of protein annotation: PRO, iProClass, iProLINK.
http://pir.georgetown.edu/
SEQUENCE RETRIEVED FROM NBRF/PIR IN FASTA FILE FORMAT
>F7VJQ1 APRIO_HUMAN Alternative prion protein [Homo sapiens]
MEHWGQPIPGAGQPWRQPLPTSGRWWLGAASWWWLGAASWWWLGAAPWWWLGTASWWWLG
SRRWHPQSVEQAE
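A FASTA record like the one above has a header line starting with ">" followed by one or more sequence lines. The sketch below is a minimal reader under that assumption. The function name is illustrative, and splitting the header on whitespace to get the accession is a convenience for this particular header layout, not a general rule.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []   # drop the leading ">"
        elif header is not None:
            chunks.append(line)             # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


FASTA_TEXT = """\
>F7VJQ1 APRIO_HUMAN Alternative prion protein [Homo sapiens]
MEHWGQPIPGAGQPWRQPLPTSGRWWLGAASWWWLGAASWWWLGAAPWWWLGTASWWWLG
SRRWHPQSVEQAE
"""

records = parse_fasta(FASTA_TEXT)
header, sequence = records[0]
accession = header.split()[0]
print(accession, len(sequence))
```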
PDB
• The Protein Data Bank (PDB) archive is the single worldwide repository of information about
the 3D structures of large biological molecules, including proteins and nucleic acids.
• The PDB was established in 1971 at Brookhaven National Laboratory (BNL) under the
leadership of Walter Hamilton and originally contained 7 structures.
• In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became
responsible for the management of the PDB.
• In 2003, the wwPDB was formed to maintain a single PDB archive of macromolecular
structural data that is freely and publicly available to the global community.
• The RCSB PDB supports a website where visitors can perform simple and complex queries
on the data, analyze, and visualize the results.
• Members of wwPDB are: RCSB PDB (USA), PDBe (Europe), PDBj (Japan), and the
Biological Magnetic Resonance Data Bank, BMRB (USA).
http://rcsb.org/pdb/
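The 3D structures held in the PDB come down to atomic x, y, z coordinates. As a hedged sketch, the code below reads those coordinates from one ATOM record using the fixed-column layout of the legacy PDB text format (x in columns 31-38, y in 39-46, z in 47-54). The helper names are invented for this example, and the atom and coordinate values are made up; the line is built programmatically so that the columns line up exactly.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal fixed-width ATOM line covering columns 1-54 only."""
    return (
        f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} {chain}"
        f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom_line(line):
    """Extract identity and coordinates from an ATOM line by column slices."""
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
    }


# Illustrative values only; not taken from a real PDB entry.
line = format_atom_line(1, " CA", "MET", "A", 1, 27.340, 24.430, 2.614)
atom = parse_atom_line(line)
print(line)
print(atom)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in this format can touch without any space between them.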
SCOP
• The Structural Classification of Proteins (SCOP) database is a largely manual classification of
protein structural domains based on similarities of their structures and amino acid sequences.
• A motivation for this classification is to determine the evolutionary relationship between
proteins.
• Proteins with the same shapes but having little sequence or functional similarity are placed in
different "superfamilies", and are assumed to have only a very distant common ancestor.
• Proteins having the same shape and some similarity of sequence and/or function are placed in
"families", and are assumed to have a closer common ancestor.
• Development of the original SCOP has been discontinued; its last official version is 1.75.
SCOP2 is its successor.
• SCOP2 offers two different ways for accessing data: SCOP2-browser, and SCOP2-graph.
• SCOP2-browser allows navigation in a traditional way by browsing pages displaying the node
information.
• SCOP2-graph is a graph-based web tool for display and navigation.
• The source of protein structures is the Protein Data Bank.
HIERARCHICAL STRUCTURE OF SCOP
• The unit of classification of structure in SCOP is the protein domain.
• The levels of SCOP are as follows.
1. Class: Types of folds, e.g., all α, all β, α/β, α+β, α&β, etc.
2. Fold: The different shapes of domains within a class, e.g., 2 helices; antiparallel hairpin, left-handed
twist, etc.
3. Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant
common ancestor.
4. Family: The domains in a superfamily are grouped into families, which have a more recent common
ancestor.
5. Protein domain: The domains in families are grouped into protein domains, which are essentially the
same protein.
6. Species: The domains in "protein domains" are grouped according to species.
7. Domain: It is part of a protein. For simple proteins, it can be the entire protein.
http://scop2.mrc-lmb.cam.ac.uk/
OUTPUT OF SCOP
CATH
• CATH (Class, Architecture, Topology, and Homologous superfamily) is a semi-automatic,
hierarchical classification of protein domains.
• CATH shares many broad features with its principal rival, SCOP.
• The four main levels of the CATH hierarchy are as follows:
• Class: the overall secondary-structure content of the domain, e.g., mainly α, mainly β, mixed α-β,
few secondary structures.
• Architecture: a large-scale grouping of topologies which share particular structural features.
• Topology: high structural similarity but no evidence of homology. Equivalent to a fold
in SCOP.
• Homologous superfamily: indicative of a demonstrable evolutionary relationship. Equivalent to
the superfamily level of SCOP.
http://www.cathdb.info/
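The four CATH levels nest inside one another, which is easy to see in code. The sketch below assumes the dotted numeric identifier convention used by CATH for superfamilies (for example `3.40.50.300`), which the slides do not show, and splits such an identifier into the node it implies at each level. The function and constant names are illustrative.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def split_cath_id(cath_id):
    """Map each CATH level to the identifier prefix that names its node."""
    parts = cath_id.split(".")
    if len(parts) != len(CATH_LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {cath_id!r}")
    # Each level is identified by the numbers of all levels above it plus its own.
    return {
        level: ".".join(parts[: depth + 1])
        for depth, level in enumerate(CATH_LEVELS)
    }


levels = split_cath_id("3.40.50.300")
for level, node in levels.items():
    print(f"{level}: {node}")
```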
MOTIF
• Motif is a search service provided by GenomeNet for searching a protein query
sequence against motif libraries.
• Supports several motif databases such as PROSITE, BLOCKS, ProDom, Pfam, and
PRINTS.
• Allows you to search protein sequence libraries with your own patterns.
• Each residue must be separated with - (minus sign).
• x represents any amino acid.
• [DE] means either D or E.
• {FWY} means any amino acid except F, W and Y.
• A(2,3) means that A appears 2 to 3 times consecutively.
• The pattern string must be terminated with . (period).
For example, C-x-{C}-[DN]-x(2)-C-x(5)-C-C.
• Generates a PROSITE-style profile or an HMM from a set of multiple aligned sequences using
PFMake or HMMBuild, respectively.
http://www.genome.jp/tools/motif/
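The pattern rules listed above map almost one-to-one onto regular-expression syntax. The sketch below translates a pattern into a Python `re` expression and searches a sequence with it. It covers only the rules on this slide (separator, `x`, `[...]`, `{...}`, repeat counts, final period). The function name and the test sequence are made up for illustration.

```python
import re

# One pattern element: x, a residue, [allowed] or {excluded}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regex string."""
    pattern = pattern.strip()
    if pattern.endswith("."):
        pattern = pattern[:-1]                   # drop the terminating period
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."                          # any amino acid
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"      # any amino acid except these
        else:
            regex = core                         # single residue or [set]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


regex = prosite_to_regex("C-x-{C}-[DN]-x(2)-C-x(5)-C-C.")
hit = re.search(regex, "MCAADAACAAAAACCK")       # made-up test sequence
print(regex)
print(hit.span() if hit else None)
```

The same function turns the N-glycosylation style pattern `N-{P}-[ST]-{P}.` into `N[^P][ST][^P]`.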
MATCHING MOTIF HITS
PATTERN OF MATCHING MOTIF HITS
PFAM
• The Pfam database is a large collection of protein families, each represented by
multiple sequence alignments and hidden Markov models (HMMs).
• Pfam version 27.0 was produced at the European Bioinformatics Institute using a
sequence database called Pfamseq, which is based on UniProt.
• The descriptions of Pfam families are managed by the general public using
Wikipedia.
• The Pfam database contains information about protein domains and families.
• Pfam-A is the manually curated portion of the database that contains over 10,000
entries.
• Pfam-B contains a large number of small families generated automatically from clusters
produced by an algorithm called ADDA.
• Pfam-B families are of lower quality but can be useful when no Pfam-A families are
found.
http://pfam.xfam.org/
PROSITE
• PROSITE is a protein domain database for functional characterization and annotation.
• PROSITE consists of entries describing the protein families, domains and functional
sites as well as amino acid patterns and profiles in them.
• PROSITE is manually curated by a team of the Swiss Institute of Bioinformatics and
tightly integrated into Swiss-Prot protein annotation.
• PROSITE is complemented by ProRule, a collection of rules based on profiles and
patterns.
• The rules contain information about biologically meaningful residues, like active
sites, substrate- or co-factor-binding sites, posttranslational modification sites or
disulfide bonds, to help function determination.
http://prosite.expasy.org/
PROSITE
PROSITE
Protein Database
Protein Database

More Related Content

PPTX
Bioinformatics introduction
Hafiz Muhammad Zeeshan Raza
 
PPTX
Kegg
msfbi1521
 
PPTX
Cath
Ramya S
 
PPTX
Protein data bank
Yogesh Joshi
 
PPTX
Chou fasman algorithm for protein structure prediction
Roshan Karunarathna
 
PPTX
Pymol
BioCode Ltd
 
Bioinformatics introduction
Hafiz Muhammad Zeeshan Raza
 
Kegg
msfbi1521
 
Cath
Ramya S
 
Protein data bank
Yogesh Joshi
 
Chou fasman algorithm for protein structure prediction
Roshan Karunarathna
 

What's hot (20)

PPT
Clustal
Benittabenny
 
PPTX
TrEMBL
Ankit Alankar
 
PPTX
Protein database
Rajpal Choudhary
 
PPTX
Structure alignment methods
Samvartika Majumdar
 
PPTX
Biological database
Iqbal college Peringammala TVM
 
PPTX
blast bioinformatics
Sardar Harpreet Kalsi
 
PPTX
Biological databases
Tamanna Syeda
 
PPTX
Clustal W - Multiple Sequence alignment
The Oxford College Engineering
 
PPTX
Protein data bank
Alichy Sowmya
 
PDF
Sequence alignment
Vidya Kalaivani Rajkumar
 
PDF
Molecular modeling database
Jayati Shrivastava
 
PPTX
Introduction to databases.pptx
sworna kumari chithiraivelu
 
PPTX
Proteins databases
Hafiz Muhammad Zeeshan Raza
 
PPTX
Sequence comparison techniques
ruchibioinfo
 
PPTX
Sequence alig Sequence Alignment Pairwise alignment:-
naveed ul mushtaq
 
PDF
Protein structure classification/domain prediction: SCOP and CATH (Bioinforma...
SPHStudy
 
PPT
Protein database
KAUSHAL SAHU
 
Clustal
Benittabenny
 
Protein database
Rajpal Choudhary
 
Structure alignment methods
Samvartika Majumdar
 
Biological database
Iqbal college Peringammala TVM
 
blast bioinformatics
Sardar Harpreet Kalsi
 
Biological databases
Tamanna Syeda
 
Clustal W - Multiple Sequence alignment
The Oxford College Engineering
 
Protein data bank
Alichy Sowmya
 
Sequence alignment
Vidya Kalaivani Rajkumar
 
Molecular modeling database
Jayati Shrivastava
 
Introduction to databases.pptx
sworna kumari chithiraivelu
 
Proteins databases
Hafiz Muhammad Zeeshan Raza
 
Sequence comparison techniques
ruchibioinfo
 
Sequence alig Sequence Alignment Pairwise alignment:-
naveed ul mushtaq
 
Protein structure classification/domain prediction: SCOP and CATH (Bioinforma...
SPHStudy
 
Protein database
KAUSHAL SAHU
 
Ad

Viewers also liked (20)

PPTX
Protein databases
sarumalay
 
PPTX
PROTEIN DATABASE
naveed ul mushtaq
 
PPT
Protein Structure, Databases and Structural Alignment
Saramita De Chakravarti
 
PPTX
databases in bioinformatics
nadeem akhter
 
DOC
Protein databases
bansalaman80
 
PPTX
Protein database ..... of NCBI
Alagppa University
 
PDF
BITS: Basics of sequence analysis
BITS
 
PPT
Biological databases
Prasanthperceptron
 
PPT
Biological databases
Malla Reddy College of Pharmacy
 
PPTX
protein data bank
Mahrosh Un Nisah
 
PPT
PROTEIN STRUCTURE DATABANK
Malvika Bansal
 
PPT
NCBI
Kavisa Ghosh
 
PPTX
Protein Data Bank
Mahrosh Un Nisah
 
PPTX
Interview with NCBI Staff Scientist Carol Scott
wookyluvr
 
PDF
BT631-8-Folds_proteins
Rajesh G
 
PPTX
Protein structure
Pooja Pawar
 
PPTX
Smart Print & Hybrid Database
akipower
 
PDF
UniProtKB/Swiss-Prot:Why sparql?
Jerven Bolleman
 
PPTX
Pistoia Alliance webinar on Antibody structures in the PDB
Pistoia Alliance
 
PPTX
Bioinformatics t2-databases v2014
Prof. Wim Van Criekinge
 
Protein databases
sarumalay
 
PROTEIN DATABASE
naveed ul mushtaq
 
Protein Structure, Databases and Structural Alignment
Saramita De Chakravarti
 
databases in bioinformatics
nadeem akhter
 
Protein databases
bansalaman80
 
Protein database ..... of NCBI
Alagppa University
 
BITS: Basics of sequence analysis
BITS
 
Biological databases
Prasanthperceptron
 
Biological databases
Malla Reddy College of Pharmacy
 
protein data bank
Mahrosh Un Nisah
 
PROTEIN STRUCTURE DATABANK
Malvika Bansal
 
Protein Data Bank
Mahrosh Un Nisah
 
Interview with NCBI Staff Scientist Carol Scott
wookyluvr
 
BT631-8-Folds_proteins
Rajesh G
 
Protein structure
Pooja Pawar
 
Smart Print & Hybrid Database
akipower
 
UniProtKB/Swiss-Prot:Why sparql?
Jerven Bolleman
 
Pistoia Alliance webinar on Antibody structures in the PDB
Pistoia Alliance
 
Bioinformatics t2-databases v2014
Prof. Wim Van Criekinge
 
Ad

Similar to Protein Database (20)

PPTX
Sequence and Structural Databases of DNA and Protein, and its significance in...
BibiQuinah
 
PPTX
Sequence and Structural Databases of DNA and Protein, and its significance in...
SBituila
 
PPTX
Protein sequence data bases in animals.pptx
MUzairKhan7
 
PPTX
Important protein databases and proteomics softwares
PUNJAB AGRICULTURAL UNIVERSITY, LUDHIANA, 141004, PUNJAB (INDIA)
 
PPTX
Protein databases in Bioinformatics.pptx
SARWATSALEEM1
 
DOCX
Protein sequence databases
Vidya Kalaivani Rajkumar
 
PPT
Bioinformatic_Databases and Sequence Analysis
MohamedHasan816582
 
PDF
Bioinformatics: History of Bioinformatics, Components of Bioinformatics, Geno...
A Biodiction : A Unit of Dr. Divya Sharma
 
PPT
The uni prot knowledgebase
Kew Sama
 
PPT
bioinfomatics
nguyenpg
 
PPT
Bioinformatic_Databases_2.ppt
NaglaaFathy42
 
PPT
Bioinformatic databases 2
Razzaqe
 
PPT
Bioinformatic databases 2
Razzaqe
 
PPT
Bioinformatic_Databases_2xcxzczxcxzxcxzc
AdiM27
 
PPTX
Biological databases
SEKHARREDDYAMBATI
 
PPTX
Presentation on Biological database By Elufer Akram @ University Of Science ...
Elufer Akram
 
PPTX
BIOINFORMATICS BIOLOGICAL DATABASES DATA BASES.pptx
Jaleelkabdul Jaleel
 
PDF
protein sequence database bioinformatics.pdf
nikhilkaliao8
 
PPT
Bioinformatics and Databases in Biological Science
MohamedHasan816582
 
PDF
PDF文档.pdf
SanaKhan250785
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
BibiQuinah
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
SBituila
 
Protein sequence data bases in animals.pptx
MUzairKhan7
 
Important protein databases and proteomics softwares
PUNJAB AGRICULTURAL UNIVERSITY, LUDHIANA, 141004, PUNJAB (INDIA)
 
Protein databases in Bioinformatics.pptx
SARWATSALEEM1
 
Protein sequence databases
Vidya Kalaivani Rajkumar
 
Bioinformatic_Databases and Sequence Analysis
MohamedHasan816582
 
Bioinformatics: History of Bioinformatics, Components of Bioinformatics, Geno...
A Biodiction : A Unit of Dr. Divya Sharma
 
The uni prot knowledgebase
Kew Sama
 
bioinfomatics
nguyenpg
 
Bioinformatic_Databases_2.ppt
NaglaaFathy42
 
Bioinformatic databases 2
Razzaqe
 
Bioinformatic databases 2
Razzaqe
 
Bioinformatic_Databases_2xcxzczxcxzxcxzc
AdiM27
 
Biological databases
SEKHARREDDYAMBATI
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Elufer Akram
 
BIOINFORMATICS BIOLOGICAL DATABASES DATA BASES.pptx
Jaleelkabdul Jaleel
 
protein sequence database bioinformatics.pdf
nikhilkaliao8
 
Bioinformatics and Databases in Biological Science
MohamedHasan816582
 
PDF文档.pdf
SanaKhan250785
 

Recently uploaded (20)

PDF
Health-The-Ultimate-Treasure (1).pdf/8th class science curiosity /samyans edu...
Sandeep Swamy
 
PPTX
Artificial Intelligence in Gastroentrology: Advancements and Future Presprec...
AyanHossain
 
PPTX
Cleaning Validation Ppt Pharmaceutical validation
Ms. Ashatai Patil
 
PPTX
Continental Accounting in Odoo 18 - Odoo Slides
Celine George
 
PPTX
HISTORY COLLECTION FOR PSYCHIATRIC PATIENTS.pptx
PoojaSen20
 
PPTX
Sonnet 130_ My Mistress’ Eyes Are Nothing Like the Sun By William Shakespear...
DhatriParmar
 
PPTX
How to Close Subscription in Odoo 18 - Odoo Slides
Celine George
 
PPTX
HEALTH CARE DELIVERY SYSTEM - UNIT 2 - GNM 3RD YEAR.pptx
Priyanshu Anand
 
PDF
Virat Kohli- the Pride of Indian cricket
kushpar147
 
PPTX
A Smarter Way to Think About Choosing a College
Cyndy McDonald
 
PDF
Biological Classification Class 11th NCERT CBSE NEET.pdf
NehaRohtagi1
 
PPTX
An introduction to Prepositions for beginners.pptx
drsiddhantnagine
 
PPTX
CARE OF UNCONSCIOUS PATIENTS .pptx
AneetaSharma15
 
PPTX
Information Texts_Infographic on Forgetting Curve.pptx
Tata Sevilla
 
PDF
BÀI TẬP TEST BỔ TRỢ THEO TỪNG CHỦ ĐỀ CỦA TỪNG UNIT KÈM BÀI TẬP NGHE - TIẾNG A...
Nguyen Thanh Tu Collection
 
PPTX
CDH. pptx
AneetaSharma15
 
PDF
The Minister of Tourism, Culture and Creative Arts, Abla Dzifa Gomashie has e...
nservice241
 
PDF
The-Invisible-Living-World-Beyond-Our-Naked-Eye chapter 2.pdf/8th science cur...
Sandeep Swamy
 
PPTX
Five Point Someone – Chetan Bhagat | Book Summary & Analysis by Bhupesh Kushwaha
Bhupesh Kushwaha
 
PPTX
Kanban Cards _ Mass Action in Odoo 18.2 - Odoo Slides
Celine George
 
Health-The-Ultimate-Treasure (1).pdf/8th class science curiosity /samyans edu...
Sandeep Swamy
 
Artificial Intelligence in Gastroentrology: Advancements and Future Presprec...
AyanHossain
 
Cleaning Validation Ppt Pharmaceutical validation
Ms. Ashatai Patil
 
Continental Accounting in Odoo 18 - Odoo Slides
Celine George
 
HISTORY COLLECTION FOR PSYCHIATRIC PATIENTS.pptx
PoojaSen20
 
Sonnet 130_ My Mistress’ Eyes Are Nothing Like the Sun By William Shakespear...
DhatriParmar
 
How to Close Subscription in Odoo 18 - Odoo Slides
Celine George
 
HEALTH CARE DELIVERY SYSTEM - UNIT 2 - GNM 3RD YEAR.pptx
Priyanshu Anand
 
Virat Kohli- the Pride of Indian cricket
kushpar147
 
A Smarter Way to Think About Choosing a College
Cyndy McDonald
 
Biological Classification Class 11th NCERT CBSE NEET.pdf
NehaRohtagi1
 
An introduction to Prepositions for beginners.pptx
drsiddhantnagine
 
CARE OF UNCONSCIOUS PATIENTS .pptx
AneetaSharma15
 
Information Texts_Infographic on Forgetting Curve.pptx
Tata Sevilla
 
BÀI TẬP TEST BỔ TRỢ THEO TỪNG CHỦ ĐỀ CỦA TỪNG UNIT KÈM BÀI TẬP NGHE - TIẾNG A...
Nguyen Thanh Tu Collection
 
CDH. pptx
AneetaSharma15
 
The Minister of Tourism, Culture and Creative Arts, Abla Dzifa Gomashie has e...
nservice241
 
The-Invisible-Living-World-Beyond-Our-Naked-Eye chapter 2.pdf/8th science cur...
Sandeep Swamy
 
Five Point Someone – Chetan Bhagat | Book Summary & Analysis by Bhupesh Kushwaha
Bhupesh Kushwaha
 
Kanban Cards _ Mass Action in Odoo 18.2 - Odoo Slides
Celine George
 

Protein Database

  • 1. LECTURE TOPIC: PROTEIN DATABASE T. ASHOK KUMART. ASHOK KUMAR HEAD, DEPARTMENT OF BIOINFORMATICSHEAD, DEPARTMENT OF BIOINFORMATICS NOORUL ISLAM COLLEGE OF ARTS ANDNOORUL ISLAM COLLEGE OF ARTS AND SCIENCESCIENCE KUMARACOIL, THUCKALAY - 629180KUMARACOIL, THUCKALAY - 629180
  • 2. TOPICS COVERED • Protein Terms & Definitions – Computational biology aspect of protein • ExPASy – SIB Bioinformatics Resource Portal (http://www.expasy.org) • UniProt/Swiss-Prot – A comprehensive, non-redundant, expert manually annotated protein sequence database (http://www.uniprot.org/) • NBRF/PIR– A comprehensive, non-redundant, expertly manually annotated, fully classified and extensively cross-referenced protein sequence database (http://pir.georgetown.edu/) • PDB– A single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids (http://rcsb.org/pdb) • SCOP– Knowledge-based expert analysis and classification of proteins that are structurally characterized and deposited in the Protein Data Bank (http://scop2.mrc-lmb.cam.ac.uk/) • CATH– A hierarchical domain classification of protein structures in the Protein Data Bank (http://www.cathdb.info/) • MOTIF – Finds sequence motifs in a query sequence, also provides functional and genomic information of the found motifs using DBGET and LinkDB as the hyperlinked annotations (http://www.genome.jp/tools/motif/) • Pfam – Database of protein HMM profiles that define domain families (http://pfam.xfam.org/) • PROSITE – Database of protein motifs expressed as patterns or profiles
  • 3. PROTEIN TERMS & DEFINITIONS • Protein Sequence – 20 a.a. characters [A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y] in sequence • Protein Structure – 3D of atomic co-ordinates [x-axis, y-axis, z-axis] • Types of Biological Databases – [Raw Database = Plain text, Object-oriented Database = Table (Records), Relational Database = Table of tables] • 3D Atom Model – [Sphere = Atom, Cylinder = Bond, Dotted Line = Bond Interaction] • Sequence Alignment – [Match = Similar Character, Mismatch = Dissimilar Character, Gap = No Substitute Character, Word = Sub-string, Sequence = Super-string, Score = Rating, Identity = Similar in function] • Motif – Short, conserved sequence associated with a distinct function. • Domain – Evolutionarily conserved sequence region that corresponds to a structurally independent 3D unit associated with a particular functional role. It is usually much larger than a motif. • Pattern – Sequence with symbol representation for a expression. Example: N{P}[ST]{P} • Regular Expression – Representation format for a sequence motif, which includes positional information for conserved and partly conserved residues. Similar to Pattern, but applies to MSA. • Profile – Scoring matrix that represents a multiple sequence alignment. It contains probability or
  • 4. EXPASY • ExPASy (Expert Protein Analysis System) is a bioinformatics resource portal operated by the Swiss Institute of Bioinformatics (SIB). • ExPASy was the first website of the life sciences. • Extensible and integrative portal for accessing many scientific resources, databases and software tools. • Wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. • Proteomics server to analyze protein sequences and structures and 2D Page gel electrophoresis. • Databases, online and offline software tools are hosted by different groups of the SIB and partner institutions. --- CFSSP • ExPASy references the protein sequence knowledgebase, UniProtKB/Swiss-Prot, and its computer annotated supplement, UniProtKB/Trembl.
  • 5. ARCHITECTURE OF UNIPROT/SWISS-PROT • Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data • The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc) • UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data
  • 6. BACKGROUND OF UNIPROT/SWISS-PROT • UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR) • EMBL-EBI and SIB together used to produce Swiss-Prot and TrEMBL, while PIR produced the Protein Sequence Database (PIR-PSD) • Translated EMBL Nucleotide Sequence Data Library (TrEMBL) was originally created because sequence data was being generated at a pace that exceeded Swiss-Prot's ability to keep up • PIR maintained the PIR-PSD and related databases, including iProClass, a database of protein sequences and curated families http://www.uniprot.org/
  • 10. UNIPROT/SWISS-PROT FILE FORMAT Line code Content Occurrence in an entry ID Identification Once; starts the entry AC Accession number(s) Once or more DT Date Three times DE Description Once or more GN Gene name(s) Optional OS Organism species Once or more OG Organelle Optional OC Organism classification Once or more OX Taxonomy cross-reference Once OH Organism host Optional RN Reference number Once or more RP Reference position Once or more RC Reference comment(s) Optional RX Reference cross-reference(s) Optional RG Reference group Once or more (Optional if RA line) RA Reference authors Once or more (Optional if RG line) RT Reference title Optional RL Reference location Once or more CC Comments or notes Optional DR Database cross-references Optional PE Protein existence Once KW Keywords Optional FT Feature table data Once or more in Swiss-Prot, optional in TrEMBL SQ Sequence header Once (blanks) Sequence data Once or more
  • 11. NBRF/PIR • The Protein Information Resource (PIR) was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers in the identification and interpretation of protein sequence information. • In 2002 PIR, along with its international partners, EBI and SIB, were awarded a grant from NIH to create UniProt, by unifying the PIR-PSD, Swiss-Prot, and TrEMBL databases. • As of 2010, PIR offers a wide variety of resources mainly oriented to assist the propagation and standardization of protein annotation: PRO, iProClass, iProLINK. http://pir.georgetown.edu/
  • 15. SEQUENCE RETRIEVED FROM NBRF/PIR IN FASTA FILE FORMAT >F7VJQ1 APRIO_HUMAN Alternative prion protein [Homo sapiens] MEHWGQPIPGAGQPWRQPLPTSGRWWLGAASWWWLGAASWWWLGAAPWWWLGTASWWWL G SRRWHPQSVEQAE
  • 16. PDB • The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids. • The PDB was established in 1971 at Brookhaven National Laboratory (BNL) under the leadership of Walter Hamilton and originally contained 7 structures. • In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB. • In 2003, the wwPDBwas formed to maintain a single PDB archive of macromolecular structural data that is freely and publicly available to the global community. • The RCSB PDB supports a website where visitors can perform simple and complex queries on the data, analyze, and visualize the results. • Members of wwPDB are: RCSBPDB(USA), PDBe (Europe) and PDBj (Japan), and Biological Magnetic Resonance Data Bank BMRB(USA). http://rcsb.org/pdb/
  • 17. PDB
  • 18. PDB
  • 19. PDB
  • 20. SCOP • The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. • A motivation for this classification is to determine the evolutionary relationship between proteins. • Proteins with the same shapes but having little sequence or functional similarity are placed in different "superfamilies", and are assumed to have only a very distant common ancestor. • Proteins having the same shape and some similarity of sequence and/or function are placed in "families", and are assumed to have a closer common ancestor. • SCOP has been discontinued and the last official version of SCOP is 1.75. SCOP1.75 is also known as SCOP2. • SCOP2 offers two different ways for accessing data: SCOP2-browser, and SCOP2-graph. • SCOP2-browser allows navigation in a traditional way by browsing pages displaying the node information. • SCOP2-graph is a graph-based web tool for display and navigation. • The source of protein structures is the Protein Data Bank.
  • 21. HIERARCHICAL STRUCTURE OF SCOP • The unit of classification of structure in SCOP is the protein domain. • The levels of SCOP are as follows. 1. Class: Types of folds, e.g., all α, all β, α/β, α+β, α&β, etc. 2. Fold: The different shapes of domains within a class, e.g., 2 helices; antiparallel hairpin, left-handed twist, etc. 3. Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant common ancestor. 4. Family: The domains in a superfamily are grouped into families, which have a more recent common ancestor. 5. Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein. 6. Species: The domains in "protein domains" are grouped according to species. 7. Domain: A part of a protein. For simple proteins, it can be the entire protein. http://scop2.mrc-lmb.cam.ac.uk/
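The seven levels listed on this slide can be read as a path from the most general grouping (class) down to an individual domain. The following is a minimal sketch, not part of SCOP itself: the level names follow the slide, while the function name `deepest_shared_level` and the labels in the two example paths are hypothetical placeholders chosen for illustration.

```python
# Level names taken from the slide, ordered from most general to most specific.
SCOP_LEVELS = ["class", "fold", "superfamily", "family",
               "protein", "species", "domain"]


def deepest_shared_level(path_a, path_b):
    """Return the deepest level at which two classification paths agree.

    Each path is a sequence of labels, one per level in SCOP_LEVELS.
    Returns None if the two domains differ already at the class level.
    """
    shared = None
    for level, label_a, label_b in zip(SCOP_LEVELS, path_a, path_b):
        if label_a != label_b:
            break
        shared = level
    return shared


# Hypothetical labels, for illustration only (not real SCOP entries).
dom1 = ("all-alpha", "fold-1", "sf-1", "fam-1", "prot-1", "human", "d1")
dom2 = ("all-alpha", "fold-1", "sf-1", "fam-2", "prot-7", "mouse", "d2")

print(deepest_shared_level(dom1, dom2))
```

Two domains that agree down to the superfamily level but not the family level would, in the terms of the slide, be assumed to share at least a distant common ancestor.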
  • 25. CATH • CATH (Class, Architecture, Topology, and Homologous superfamily) is a semi-automatic, hierarchical classification of protein domains. • CATH shares many broad features with its principal rival, SCOP. • The four main levels of the CATH hierarchy are as follows: • Class: the overall secondary-structure content of the domain, e.g., all α, all β, α/β, α+β, α&β, etc. • Architecture: the overall arrangement of secondary structures in three-dimensional space, ignoring their connectivity; high structural similarity but no evidence of homology. • Topology: domains that share both the overall shape and the connectivity of their secondary structures. Roughly equivalent to a fold in SCOP. • Homologous superfamily: indicative of a demonstrable evolutionary relationship. Equivalent to the superfamily level of SCOP. http://www.cathdb.info/
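CATH identifies each homologous superfamily by a four-part numeric code, one number per level (C.A.T.H). The sketch below shows one way to split such a code into the four levels named on the slide and to truncate it at a chosen level. The function names are hypothetical, and the code `3.40.50.300` is used only as an example of the format.

```python
# The four main CATH levels, in order, as listed on the slide.
CATH_LEVELS = ["class", "architecture", "topology", "homologous_superfamily"]


def parse_cath_code(code):
    """Split a dotted four-part CATH code into a level -> number mapping."""
    parts = code.strip().split(".")
    if len(parts) != len(CATH_LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return dict(zip(CATH_LEVELS, map(int, parts)))


def code_at_level(code, level):
    """Truncate a CATH code at the given level, e.g. at 'architecture'."""
    depth = CATH_LEVELS.index(level) + 1
    parse_cath_code(code)  # validate the full code first
    return ".".join(code.strip().split(".")[:depth])


example = parse_cath_code("3.40.50.300")
print(example)
print(code_at_level("3.40.50.300", "topology"))
```

Two domains whose codes agree after truncation at the topology level share a fold; agreement on all four numbers places them in the same homologous superfamily.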
  • 26. CATH
  • 27. CATH
  • 28. MOTIF • MOTIF is a search service provided by GenomeNet for searching a protein query sequence against motif libraries. • Supports several motif databases such as PROSITE, BLOCKS, ProDom, Pfam, and PRINTS. • Allows you to search protein sequence libraries with your own patterns. • Each residue must be separated by - (minus sign). • x represents any amino acid. • [DE] means either D or E. • {FWY} means any amino acid except F, W and Y. • A(2,3) means that A appears 2 to 3 times consecutively. • The pattern string must be terminated with . (period). For example, C-x-{C}-[DN]-x(2)-C-x(5)-C-C. • Generates a profile or an HMM from a set of multiply aligned sequences using PFMake or HMMBuild, respectively. http://www.genome.jp/tools/motif/
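The pattern rules on this slide map closely onto regular expressions. The following is a minimal sketch of that mapping, covering only the elements listed above (single residues, x, [..], {..}, repeat counts, and the terminating period). It is not the code used by the MOTIF service; the function name `pattern_to_regex` and the test sequence are made up for illustration.

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count such as (2) or (2,3).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def pattern_to_regex(pattern):
    """Translate a pattern such as 'C-x-{C}-[DN]-x(2)-C.' into a regex string."""
    pattern = pattern.strip()
    if not pattern.endswith("."):
        raise ValueError("pattern must be terminated with a period")
    pieces = []
    for element in pattern[:-1].split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = match.groups()
        if core == "x":
            core = "."                          # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any amino acid except these
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        pieces.append(core)
    return "".join(pieces)


regex = pattern_to_regex("C-x-{C}-[DN]-x(2)-C-x(5)-C-C.")
print(regex)

# A made-up sequence containing one occurrence of the pattern.
hit = re.search(regex, "MKCAGDAACAAAAACCK")
print(hit.start(), hit.group())
```

The sketch assumes the sequence is a plain string of one-letter amino acid codes, so the regex wildcard `.` stands in for "any amino acid".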
  • 29. MOTIF
  • 31. PATTERN OF MATCHING MOTIF HITS
  • 32. PFAM • The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). • Pfam version 27.0 was produced at the European Bioinformatics Institute using a sequence database called Pfamseq, which is based on UniProt. • The descriptions of Pfam families are managed by the general public using Wikipedia. • The Pfam database contains information about protein domains and families. • Pfam-A is the manually curated portion of the database and contains over 10,000 entries. • Pfam-B contains a large number of small families generated automatically from clusters produced by an algorithm called ADDA (Automatic Domain Decomposition Algorithm). • Pfam-B families are of lower quality but can be useful when no Pfam-A family is found. http://pfam.xfam.org/
  • 33. PFAM
  • 34. PFAM
  • 35. PROSITE • PROSITE is a protein domain database for functional characterization and annotation. • PROSITE consists of entries describing protein families, domains and functional sites, together with the amino acid patterns and profiles used to identify them. • PROSITE is manually curated by a team at the Swiss Institute of Bioinformatics and is tightly integrated into Swiss-Prot protein annotation. • PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns. • The rules contain information about biologically meaningful residues, such as active sites, substrate- or co-factor-binding sites, post-translational modification sites or disulfide bonds, to help determine function. http://prosite.expasy.org/