GenBank Databases
Hafiz.M.Zeeshan.Raza
Research Associate_HEC_NRPU
hafizraza26@gmail.com
COMSATS UNIVERSITY SAHIWAL
Overview
• Introduction
• Sections of Database
• Importance of GenBank
Historical background
• The first major bioinformatics project was undertaken by Margaret Dayhoff in 1965,
who developed the first protein sequence database, called the Atlas of Protein Sequence
and Structure.
• Subsequently, in the early 1970s, the Brookhaven National Laboratory established
the Protein Data Bank for archiving three-dimensional protein structures.
• The first sequence alignment algorithm was developed by Needleman and Wunsch
in 1970. This was a fundamental step in the development of the field of
bioinformatics, which paved the way for the routine sequence comparisons and
database searching practiced by modern biologists.
• The 1980s saw the establishment of GenBank and the development of fast database
searching algorithms such as FASTA by William Pearson and BLAST by Stephen
Altschul and coworkers.
Introduction
• GenBank is the most complete collection of annotated nucleic acid
sequence data for almost every organism.
• The content includes genomic DNA, mRNA, cDNA, ESTs, high throughput
raw sequence data, and sequence polymorphisms.
• There is also a GenPept database for protein sequences, the majority of
which are conceptual translations from DNA sequences, although a small
number of the amino acid sequences are derived using peptide
sequencing techniques.
How to search GenBank
• There are two ways to search for
sequences in GenBank.
• One is using text-based keywords
similar to a PubMed search.
• The other is using molecular sequences
to search by sequence similarity using
BLAST.
GenBank Sequence Format
• Searching GenBank effectively with the text-based method requires an
understanding of the GenBank sequence format.
• GenBank is a relational database. However, the search output for
sequence files is produced as flat files for easy reading.
• The resulting flat files contain three sections: Header, Features, and
Sequence entry.
• There are many fields in the Header and Features sections. Each field has
a unique identifier for easy indexing by computer software.
• Understanding the structure of the GenBank files helps in designing
effective search strategies.
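The three-section layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the toy record is invented for the example (the accession "XX000000" and every field value are hypothetical), and the helper name split_flat_file is an assumption. It relies only on the "FEATURES" and "ORIGIN" labels and the "//" terminator that mark the section boundaries.

```python
# Toy record for illustration only: the accession "XX000000" and all field
# values are invented, and real records carry many more fields.
RECORD = """\
DEFINITION  Hypothetical example sequence, partial.
ACCESSION   XX000000
VERSION     XX000000.1
FEATURES             Location/Qualifiers
     source          1..12
                     /organism="synthetic construct"
ORIGIN
        1 gaattctaac gg
//
"""


def split_flat_file(text):
    """Split one flat-file record into header, features, and sequence lines."""
    sections = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            current = "features"
        elif line.startswith("ORIGIN"):
            current = "sequence"
        elif line.startswith("//"):
            break  # end-of-record marker
        sections[current].append(line)
    return sections


parts = split_flat_file(RECORD)
for name, lines in parts.items():
    print(name, len(lines))
```

Keeping the lines of each section together makes it easy to hand the Header, Features, and Sequence parts to separate, simpler routines.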
1st section…Header Part
• The line, “DEFINITION,” provides the summary information for the
sequence record including the name of the sequence, the name and
taxonomy of the source organism if known, and whether the sequence is
complete or partial.
• This is followed by an accession number for the sequence, which is a
unique identifier assigned to a sequence when it is first submitted to
GenBank and is permanently associated with that sequence.
• This is the number that should be cited in publications. It has two classic
formats: one letter followed by five digits, or two letters followed by six digits.
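A quick shape check for accession numbers can be written as a regular expression. The sketch below is deliberately loose: it assumes one or two uppercase letters followed by five or six digits, and it does not cover newer or longer accession prefixes. The name looks_like_accession is hypothetical. "E01306" is the accession used in the FASTA example in these slides, and "AF123456" is a made-up string that only illustrates the shape.

```python
import re

# Loose pattern: one or two uppercase letters, then five or six digits.
# This is an assumption for illustration; it is not an official validator.
ACCESSION_RE = re.compile(r"[A-Z]{1,2}\d{5,6}")


def looks_like_accession(text):
    """Return True if text has the classic letters-then-digits shape."""
    return ACCESSION_RE.fullmatch(text) is not None


for candidate in ["E01306", "AF123456", "E0130", "af123456"]:
    print(candidate, looks_like_accession(candidate))
```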
Continued…
• For a nucleotide sequence that has been translated into a protein sequence, the protein is given its own
“accession number” in the form of a string of alphanumeric characters.
• In addition to the accession number, there is also a version number and a GenInfo
Identifier (gi) number. The purpose of these numbers is to identify the current version
of the sequence.
• If the sequence itself is revised at a later date, the accession number
remains the same, but the version number is incremented and a new gi number is assigned.
• A translated protein sequence also has a different gi number from the DNA
sequence it is derived from.
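The relationship between accession and version numbers can be made concrete with a small helper. This sketch assumes the "ACCESSION.VERSION" spelling seen in the identifier E01306.1 from the FASTA example in these slides. The function names are hypothetical, and "E01306.2" is an invented later version used only to show the comparison.

```python
def split_versioned_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version as int)."""
    accession, sep, version = identifier.partition(".")
    if not sep or not version.isdigit():
        raise ValueError(f"no version suffix in {identifier!r}")
    return accession, int(version)


def same_record(a, b):
    """True if two versioned identifiers share an accession number."""
    return split_versioned_accession(a)[0] == split_versioned_accession(b)[0]


print(split_versioned_accession("E01306.1"))
# "E01306.2" is a hypothetical revised version of the same record.
print(same_record("E01306.1", "E01306.2"))
```

The accession part stays stable across revisions, so it is the part to use when checking whether two identifiers refer to the same record.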
Continued…
• The next line in the Header section is the “ORGANISM” field,
which identifies the source organism by the scientific
name of the species and sometimes the tissue type.
• The scientific name is accompanied by the taxonomic
classification of the organism.
• Different levels of the classification are hyperlinked to the
NCBI taxonomy database with more detailed descriptions.
Continued…
• This is followed by the “REFERENCE” field, which provides the publication citation
related to the sequence entry.
• The REFERENCE part includes author and title information of the published work
(or tentative title for unpublished work).
• The “JOURNAL” field includes the citation information as well as the date of
sequence submission.
• The citation is often hyperlinked to the PubMed record for access to the original
literature information.
• The last part of the Header is the contact information of the sequence submitter.
2nd section…Features
• The “Features” section includes annotation information about the gene and gene product, as
well as regions of biological significance reported in the sequence, with identifiers and
qualifiers.
• The “Source” field provides the length of the sequence, the scientific name of the organism,
and the taxonomy identification number. Some optional information includes the clone
source, the tissue type and the cell line.
• The “gene” field gives information about the nucleotide coding sequence and its name. For
DNA entries, there is a “CDS” field, which gives the boundaries of the
sequence that can be translated into amino acids.
• For eukaryotic DNA, this field also gives the locations of the exons, and the
translated protein sequence is included.
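The exon boundaries in a CDS field are written as a location string. As a rough sketch, and assuming the common "start..end" and "join(start..end,...)" spellings, the exon spans can be pulled out with a regular expression. The location string below is hypothetical, the function names are illustrative, and this does not handle complement() locations or partial-end markers.

```python
import re


def parse_location(location):
    """Return 1-based inclusive (start, end) pairs from a simple location."""
    spans = re.findall(r"(\d+)\.\.(\d+)", location)
    return [(int(start), int(end)) for start, end in spans]


def coding_length(location):
    """Total number of bases covered by all spans in the location."""
    return sum(end - start + 1 for start, end in parse_location(location))


# Hypothetical two-exon CDS location, not taken from a real record.
exons = parse_location("join(1..50,100..200)")
print(exons)
print(coding_length("join(1..50,100..200)"))
```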
3rd section…Sequence
• The third section of the flat file is the sequence itself starting with the
label “ORIGIN”.
• The format of the sequence display can be changed by choosing an option from
the Display pull-down menu at the upper left corner.
• For DNA entries, there is a BASE COUNT report that includes the numbers
of A, G, C, and T in the sequence.
• This section, for both DNA and protein sequences, ends with two forward
slashes (the “//” symbol).
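The sequence section can be read back into a plain string, and a base count computed from it, with a short sketch like the one below. The ORIGIN block uses the first 30 bases of the E01306.1 sequence from the FASTA slide, laid out by hand in numbered blocks of ten as an assumption about the flat-file layout. The name read_origin is hypothetical.

```python
from collections import Counter

# First 30 bases of the E01306.1 sequence from the FASTA slide, arranged by
# hand in ORIGIN style (position number, then blocks of ten) for illustration.
ORIGIN_BLOCK = """\
ORIGIN
        1 gaattctaac ggtcccgaaa ctctgtgcgg
//
"""


def read_origin(text):
    """Collect the sequence between the ORIGIN label and the // terminator."""
    bases = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("//"):
            break  # end of the record
        elif in_sequence:
            # Keep letters only: drops the position number and the spaces.
            bases.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(bases).upper()


sequence = read_origin(ORIGIN_BLOCK)
counts = Counter(sequence)
print(sequence)
print({base: counts[base] for base in "ACGT"})
```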
Importance
• In retrieving DNA or protein sequences from GenBank, the search can be limited to
different fields of annotation such as “organism,” “accession number,” “authors,”
and “publication date.”
• One can use a combination of the “Limits” and “Preview/Index” options.
Alternatively, a number of search qualifiers can be used, each defining
one of the fields in a GenBank file.
• The qualifiers are similar to but not the same as the field tags in PubMed. For
example, in GenBank, [GENE] represents the field for gene name, [AUTH] for author
name, and [ORGN] for organism name.
• Frequently used GenBank qualifiers have to be in uppercase and in brackets.
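A field-limited query can be assembled from value and qualifier pairs. The sketch below uses only the [GENE] and [ORGN] qualifiers named above. Joining terms with AND and quoting multi-word values are assumptions about the search syntax, the search values are hypothetical, and build_query is an illustrative name rather than part of any library.

```python
def build_query(terms):
    """Join (value, qualifier) pairs into one search string with AND."""
    parts = []
    for value, qualifier in terms:
        # Quote multi-word values so they stay together as a phrase.
        text = f'"{value}"' if " " in value else value
        # Qualifiers are written in uppercase and enclosed in brackets.
        parts.append(f"{text}[{qualifier.upper()}]")
    return " AND ".join(parts)


# Hypothetical search: a gene name limited to one organism.
query = build_query([("IGF1", "GENE"), ("Homo sapiens", "ORGN")])
print(query)
```

Building the string programmatically keeps the bracket and uppercase conventions consistent across many searches.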
Alternative sequence Formats
• In bioinformatics, FASTA format is a text-based format for representing
either nucleotide sequences or peptide sequences, in which nucleotides or amino
acids are represented using single-letter codes.
• FASTA is one of the simplest and the most popular sequence formats because it
contains plain sequence information that is readable by many bioinformatics
analysis programs.
• It has a single definition line that begins with a greater-than symbol (>) followed by a
sequence name.
• Sometimes, extra information such as gi number or comments can be given, which
are separated from the sequence name by a “|” symbol.
FASTA Format Sequence
>E01306.1 DNA encoding human insulin-like growth factor I(IGFI)
GAATTCTAACGGTCCCGAAACTCTGTGCGGTGTGAATGGTTGACGCTCTGCAG
TTGTTTGCGGTGACCGTGGTTTTTATTTTAACAAACCCACTGGTTATGGTTCTT
TTCTCGTCGTGCTCCCCAGACTGGTATTGTTGAGAATGCTGCTTTCGTTCTTG
GACCTGCGTCGTCTGGAAATGTATTGCGCTCCCCTGAAACCCGC
• The extra information is considered optional and is ignored by sequence analysis
programs.
• The plain sequence in standard one-letter symbols starts in the second line.
• Each line of sequence data is limited to sixty to eighty characters in width.
• The drawback of this format is that much annotation information is lost.
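The FASTA layout described above is simple enough to parse by hand. The following is a minimal sketch, assuming one or more records that each start with a ">" definition line. The name parse_fasta is hypothetical, and the sample text reuses the definition line and the first 30 bases of the E01306.1 example from the slide.

```python
def parse_fasta(text):
    """Return a list of (definition line, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated; stray spaces are dropped.
            chunks.append(line.replace(" ", ""))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


FASTA_TEXT = """\
>E01306.1 DNA encoding human insulin-like growth factor I(IGFI)
GAATTCTAACGGTCCCGAAA
CTCTGTGCGG
"""

records = parse_fasta(FASTA_TEXT)
definition, sequence = records[0]
name = definition.split()[0]  # sequence name is the first token
print(name, len(sequence))
```

Only the sequence name and the residues survive the round trip, which illustrates why most annotation is lost when a record is reduced to FASTA.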