 
FBW 01-03-2012 Wim Van Criekinge RELOADED 2
Contents
09:00-11:00 (2s) (A: Theory), Coup. Links., CL.A1057
16:00-19:00 (2s) (B: Practical), Coup. Links., CL.PC-C
Course notes: €40
Thu 16 February: no class
Thu 23 February: no class
Thu 1 March: Recap Bioinformatics I, RDBMS, (Bio)SQL
Thu 8 March: Web Application Development (PHP)
Thu 15 March: MyGenBank
Thu 22 March: Genome Browsers
Thu 29 March: Galaxy
Thu 5 April: no class
Thu 12 April: no class
Thu 19 April: Datamining (Tim De Meyer)
Thu 26 April: Textmining (Maté Ongenaert)
Thu 3 May: Systems Biology (Bart Deplancke)
Thu 10 May: Lesson 10 (project presentations)
Lesson 1: Bioinformatics I Revisited in 5 slides
Why bother making databases?
Databases:
Flat files (FF, *.txt), plus indexed versions
Relational (RDBMS): Access, MySQL, PostgreSQL, Oracle
Object-oriented (OODBMS): AceDB, ObjectStore
Hierarchical: XML
Frame-based systems, e.g. DAML+OIL
Hybrid systems
A brief history of time (figure): timeline from 4 to 0 billion years ago (BYA) marking the origin of life, the origin of eukaryotes, insects, the fungi/animal and plant/animal splits, and the earliest fossils.
Rat versus  mouse RBP Rat versus  bacterial lipocalin
 
Sander-Schneider: HSSP, homology-derived secondary structure of proteins
 
About the Syllabus / CD: Unlike Bioinformatics I, this course does not primarily aim to give an overview of the various (sub)domains of bioinformatics. Bioinformatics II (Reloaded) sketches as accurate a picture as possible of the current state of bioinformatics research. To that end it goes deeper into the use of relational database management systems and their practical implementations, hence the term "Reloaded". This methodology makes it possible to manage large heterogeneous datasets, which are typical of biological experiments that are increasingly carried out at (meta)genome scale, and to store them in a structured way for further (statistical) analysis. The second part of the course focuses on the various methods from machine learning and artificial intelligence, and on how they can be used, via databases, to turn data into new verifiable, well-founded hypotheses that will hopefully lead to new knowledge.
 
Usage of the databases Annotation searches  -  Search for keywords, authors, features What is the protein sequence for human insulin? What does the 3D structure of calmodulin look like? What is the genetic location of the cystic fibrosis gene? List all intron sequences in rat.
Usage of the databases Annotation searches  - Search for keywords, authors, features Homology (similarity) searches  - Search for similar sequences Is there any known protein sequence that is similar to x? Is this gene known in any other species? Has someone already cloned this sequence?
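Homology searches rank database entries by an alignment score. The Smith-Waterman recurrence below is a minimal sketch of that idea: it computes the best local alignment score between two sequences. The scoring values (match +2, mismatch -1, linear gap -2) and the function name are illustrative assumptions. Production tools such as BLAST use substitution matrices, affine gap penalties and heuristics to search whole databases quickly.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (illustrative scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap in b
            left = curr[j - 1] + gap   # gap in a
            # Local alignment: a cell never drops below zero, so an
            # alignment can start fresh anywhere in either sequence
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because scores are clipped at zero, unrelated flanking residues (for example the `X` and `Y` around `ACGT` in `XACGTY`) do not lower the score of a well-matching core region.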
Usage of the databases Annotation searches  - Search for keywords, authors, features Homology (similarity) searches  - Search for similar sequences Pattern searches  - Search for occurrences of patterns Does my protein sequence contain any known motif (that can give me a clue about the function)? Which known sequences contain this motif? Is any part of my nucleotide sequence recognized by a transcription factor? List all known start, splice and stop signals in my genomic sequence.
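Pattern searches often use PROSITE-style motifs. The sketch below translates a pattern into a Python regular expression and reports every match position, including overlapping ones. It assumes only the basic PROSITE syntax: `-` separators, `x`, `[..]`, `{..}`, `(n)` or `(n,m)` repeats, and `<`/`>` anchors, with no `>` inside brackets. The function names are illustrative. The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site (PS00001); the test sequence is made up.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex = pattern.rstrip(".")                          # patterns may end with '.'
    regex = regex.replace("-", "")                       # '-' only separates elements
    regex = regex.replace("{", "[^").replace("}", "]")   # {P}: any residue except P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4): repeat 2 to 4 times
    regex = regex.replace("x", ".")                      # x: any residue
    regex = regex.replace("<", "^").replace(">", "$")    # N- / C-terminal anchors
    return regex

def find_motif(sequence: str, pattern: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A zero-width lookahead lets consecutive matches overlap
    finder = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in finder.finditer(sequence)]

print(find_motif("MNGSAKNPTLNSSQ", "N-{P}-[ST]-{P}"))  # [2, 11]
```

The `N` at position 7 is skipped because it is followed by `P`, which the `{P}` element forbids.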
Usage of the databases Annotation searches  - Search for keywords, authors, features Homology (similarity) searches  - Search for similar sequences Pattern searches  - Search for occurrences of patterns Predictions  - Using the databases as knowledge databases   What may the structure of my protein be?  Secondary structure prediction. Modelling by homology. What is the gene structure of my genomic sequence? Which parts of my protein have a high antigenicity?
Usage of the databases Annotation searches  - Search for keywords, authors, features Homology (similarity) searches  - Search for similar sequences Pattern searches  - Search for occurrences of patterns Predictions  - Using the databases as knowledge databases   Comparisons Gene families Phylogenetic trees
GenBank Format
LOCUS       LISOD                    756 bp    DNA             BCT       30-JUN-1993
DEFINITION  L.ivanovii sod gene for superoxide dismutase.
ACCESSION   X64011.1  GI:37619753
NID         g44010
KEYWORDS    sod gene; superoxide dismutase.
SOURCE      Listeria ivanovii.
  ORGANISM  Listeria ivanovii
            Eubacteria; Firmicutes; Low G+C gram-positive bacteria;
            Bacillaceae; Listeria.
REFERENCE   1  (bases 1 to 756)
  AUTHORS   Haas,A. and Goebel,W.
  TITLE     Cloning of a superoxide dismutase gene from Listeria ivanovii
            by functional complementation in Escherichia coli and
            characterization of the gene product
  JOURNAL   Mol. Gen. Genet. 231 (2), 313-322 (1992)
  MEDLINE   92140371
REFERENCE   2  (bases 1 to 756)
  AUTHORS   Kreft,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie,
            Universitaet Wuerzburg, Biozentrum Am Hubland, 8700
            Wuerzburg, FRG
FEATURES             Location/Qualifiers
     source          1..756
                     /organism="Listeria ivanovii"
                     /strain="ATCC 19119"
                     /db_xref="taxon:1638"
     RBS             95..100
                     /gene="sod"
     gene            95..746
                     /gene="sod"
     CDS             109..717
                     /gene="sod"
                     /EC_number="1.15.1.1"
                     /codon_start=1
                     /product="superoxide dismutase"
                     /db_xref="PID:g44011"
                     /db_xref="SWISS-PROT:P28763"
                     /transl_table=11
                     /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKL
                     NEAVSGHAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPN
                     GGGAPTGNLKAAIESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVS
                     TANQDSPLSEGKTPVLGLDVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRF
                     DAAK"
     terminator      723..746
                     /gene="sod"
Example of location descriptors
Location | Description
476 | Points to a single base in the presented sequence
340..565 | Points to a continuous range of bases bounded by and including the starting and ending bases
<345..500 | The exact lower boundary point of a feature is unknown
(102.110) | Indicates that the exact location is unknown but that it is one of the bases between bases 102 and 110
(23.45)..600 | Specifies that the starting point is one of the bases between bases 23 and 45, inclusive, and the end base is 600
123^124 | Points to a site between bases 123 and 124
145^177 | Points to a site anywhere between bases 145 and 177
J00193:hladr | Points to a feature whose location is described in another entry: the feature labeled 'hladr' in the entry (in this database) with primary accession 'J00193'
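The location descriptors above are regular enough to parse mechanically. The sketch below handles only the simple forms in the table. It does not handle `join()`, `complement()` or references to other entries such as `J00193:hladr`. The dictionary layout and the `kind` labels are assumptions made for illustration. Complete parsers exist in libraries such as Biopython.

```python
def parse_location(text: str) -> dict:
    """Parse a simple GenBank/EMBL feature location descriptor.

    Returns outermost 1-based bounds (inclusive) plus flags for the
    '<' / '>' partial-boundary markers. No join(), complement() or remote entries.
    """
    result = {"partial_start": "<" in text, "partial_end": ">" in text}
    text = text.replace("<", "").replace(">", "")
    if "^" in text:                      # 123^124: a site between two bases
        kind = "site"
        left, right = text.split("^")
    elif ".." in text:                   # 340..565 or (23.45)..600: a range
        kind = "range"
        left, right = text.split("..")
    elif text.startswith("("):           # (102.110): one base somewhere in range
        kind = "one_of"
        left, right = text.strip("()").split(".")
    else:                                # 476: a single base
        kind = "base"
        left = right = text
    # An endpoint written as (23.45) is uncertain; keep its outer bound
    start = int(left.strip("()").split(".")[0])
    end = int(right.strip("()").split(".")[-1])
    result.update(kind=kind, start=start, end=end)
    return result

print(parse_location("(23.45)..600"))
```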
BASE COUNT      247 a    136 c    151 g    222 t
ORIGIN
        1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
       61 gtaatttctt ttcacataaa taataaacaa tccgaggagg aatttttaat gacttacgaa
      121 ttaccaaaat taccttatac ttatgatgct ttggagccga attttgataa agaaacaatg
      181 gaaattcact atacaaagca ccacaatatt tatgtaacaa aactaaatga agcagtctca
      241 ggacacgcag aacttgcaag taaacctggg gaagaattag ttgctaatct agatagcgtt
      301 cctgaagaaa ttcgtggcgc agtacgtaac cacggtggtg gacatgctaa ccatacttta
      361 ttctggtcta gtcttagccc aaatggtggt ggtgctccaa ctggtaactt aaaagcagca
      421 atcgaaagcg aattcggcac atttgatgaa ttcaaagaaa aattcaatgc ggcagctgcg
      481 gctcgttttg gttcaggatg ggcatggcta gtagtgaaca atggtaaact agaaattgtt
      541 tccactgcta accaagattc tccacttagc gaaggtaaaa ctccagttct tggcttagat
      601 gtttgggaac atgcttatta tcttaaattc caaaaccgtc gtcctgaata cattgacaca
      661 ttttggaatg taattaactg ggatgaacga aataaacgct ttgacgcagc aaaataatta
      721 tcgaaaggct cacttaggtg ggtcttttta tttcta
//
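The BASE COUNT line summarises the ORIGIN block. The sketch below recomputes it. It assumes the ORIGIN lines have already been cut out of the record. The function names and the two-block test snippet are illustrative.

```python
from collections import Counter

def origin_to_sequence(origin_block: str) -> str:
    """Drop position numbers and spaces from GenBank ORIGIN lines."""
    return "".join(ch for ch in origin_block.lower() if ch.isalpha())

def base_count(sequence: str) -> dict[str, int]:
    """Count a, c, g and t, as reported on the BASE COUNT line."""
    counts = Counter(sequence.lower())
    return {base: counts[base] for base in "acgt"}

snippet = "        1 cgttatttaa ggtgttacat"
print(base_count(origin_to_sequence(snippet)))  # {'a': 5, 'c': 2, 'g': 4, 't': 9}
```

Run on the full 756-bp ORIGIN block, the four counts should add up to the sequence length given on the LOCUS line.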
EMBL format
ID   LISOD      standard; DNA; PRO; 756 BP.       <- IDentification
XX
AC   X64011; S78972;                                <- Accession (Axxxxx, Afxxxxxx), GUID
XX
NI   g44010                                         <- Nucleotide Identifier --> x.x
XX
DT   28-APR-1992 (Rel. 31, Created)                 <- DaTe
DT   30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
DE   L.ivanovii sod gene for superoxide dismutase   <- DEscription
XX
KW   sod gene; superoxide dismutase.                <- KeyWord
XX
OS   Listeria ivanovii                              <- Organism Species
OC   Eubacteria; Firmicutes; Low G+C gram-positive bacteria; Bacillaceae;
OC   Listeria.                                      <- Organism Classification
XX
RN   [1]                                            <- Reference
RA   Haas A., Goebel W.;
RT   "Cloning of a superoxide dismutase gene from Listeria ivanovii by
RT   functional complementation in Escherichia coli and
RT   characterization of the gene product.";
RL   Mol. Gen. Genet. 231:313-322(1992).
XX
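EMBL records, like the Swiss-Prot entry that follows, are organised by two-letter line codes. The sketch below groups a record's lines by code. It assumes the standard layout, where the code occupies columns 1-2 and the content starts in column 6. The function name and the abridged test record are illustrative.

```python
from collections import defaultdict

def parse_line_codes(record: str) -> dict[str, list[str]]:
    """Group the lines of an EMBL or Swiss-Prot flat-file record by line code."""
    fields = defaultdict(list)
    for line in record.splitlines():
        code, content = line[:2], line[5:].strip()
        if code in ("XX", "//") or not code.strip():
            continue  # XX = spacer line, // = end of record
        fields[code].append(content)
    return dict(fields)

record = """ID   LISOD      standard; DNA; PRO; 756 BP.
XX
AC   X64011; S78972;
XX
OS   Listeria ivanovii
OC   Eubacteria; Firmicutes; Low G+C gram-positive bacteria; Bacillaceae;
OC   Listeria.
//"""
fields = parse_line_codes(record)
print(fields["AC"][0].rstrip(";").split("; "))  # ['X64011', 'S78972']
```

Continuation lines such as the two `OC` lines are kept in order, so joining them with a space restores the full classification.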
Example of a SwissProt entry
ID   TNFA_HUMAN     STANDARD;      PRT;   233 AA.      <- IDentification
AC   P01375;                                            <- ACcession
DT   21-JUL-1986 (REL. 01, CREATED)                     <- DaTe
DT   21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT   15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE   TUMOR NECROSIS FACTOR PRECURSOR (TNF-ALPHA) (CACHECTIN).
GN   TNFA.                                              <- Gene name
OS   HOMO SAPIENS (HUMAN).                              <- Organism Species
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.                                <- Organism Classification
RN   [1]                                                <- Reference
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 87217060.
RA   NEDOSPASOV S.A., SHAKHOV A.N., TURETSKAYA R.L., METT V.A.,
RA   AZIZOV M.M., GEORGIEV G.P., KOROBKO V.G., DOBRYNIN V.N.,
RA   FILIPPOV S.A., BYSTROV N.S., BOLDYREVA E.F., CHUVPILO S.A.,
RA   CHUMAKOV A.M., SHINGAROVA L.N., OVCHINNIKOV Y.A.;
RL   COLD SPRING HARB. SYMP. QUANT. BIOL. 51:611-624(1986).
RN   [2]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 85086244.
RA   PENNICA D., NEDWIN G.E., HAYFLICK J.S., SEEBURG P.H., DERYNCK R.,
RA   PALLADINO M.A., KOHR W.J., AGGARWAL B.B., GOEDDEL D.V.;
RL   NATURE 312:724-729(1984).
...
CC   -!- FUNCTION: CYTOKINE WITH A WIDE VARIETY OF FUNCTIONS: IT CAN
CC       CAUSE CYTOLYSIS OF CERTAIN TUMOR CELL LINES, IT IS IMPLICATED
CC       IN THE INDUCTION OF CACHEXIA, IT IS A POTENT PYROGEN CAUSING
CC       FEVER BY DIRECT ACTION OR BY STIMULATION OF IL-1 SECRETION, IT
CC       CAN STIMULATE CELL PROLIFERATION & INDUCE CELL DIFFERENTIATION
CC       UNDER CERTAIN CONDITIONS.                       <- Comments
CC   -!- SUBUNIT: HOMOTRIMER.
CC   -!- SUBCELLULAR LOCATION: TYPE II MEMBRANE PROTEIN. ALSO EXISTS AS
CC       AN EXTRACELLULAR SOLUBLE FORM.
CC   -!- PTM: THE SOLUBLE FORM DERIVES FROM THE MEMBRANE FORM BY
CC       PROTEOLYTIC PROCESSING.
CC   -!- DISEASE: CACHEXIA ACCOMPANIES A VARIETY OF DISEASES, INCLUDING
CC       CANCER AND INFECTION, AND IS CHARACTERIZED BY GENERAL ILL
CC       HEALTH AND MALNUTRITION.
CC   -!- SIMILARITY: BELONGS TO THE TUMOR NECROSIS FACTOR FAMILY.
DR   EMBL; X02910; G37210; -.                           <- Database Cross-references
DR   EMBL; M16441; G339741; -.
DR   EMBL; X01394; G37220; -.
DR   EMBL; M10988; G339738; -.
DR   EMBL; M26331; G339764; -.
DR   EMBL; Z15026; G37212; -.
DR   PIR; B23784; QWHUN.
DR   PIR; A44189; A44189.
DR   PDB; 1TNF; 15-JAN-91.
DR   PDB; 2TUN; 31-JAN-94.
KW   CYTOKINE; CYTOTOXIN; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL-ANCHOR;
KW   MYRISTYLATION; 3D-STRUCTURE.                       <- KeyWord
FT   PROPEP        1     76                             <- Feature Table
FT   CHAIN        77    233       TUMOR NECROSIS FACTOR.
FT   TRANSMEM     36     56       SIGNAL-ANCHOR (TYPE-II PROTEIN).
FT   LIPID        19     19       MYRISTATE.
FT   LIPID        20     20       MYRISTATE.
FT   DISULFID    145    177
FT   MUTAGEN     105    105       L->S: LOW ACTIVITY.
FT   MUTAGEN     108    108       R->W: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     112    112       L->F: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     162    162       S->F: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     167    167       V->A,D: BIOLOGICALLY INACTIVE.
FT   MUTAGEN     222    222       E->K: BIOLOGICALLY INACTIVE.
FT   CONFLICT     63     63       F -> S (IN REF. 5).
FT   STRAND       89     93
FT   TURN         99    100
FT   TURN        109    110
FT   STRAND      112    113
FT   TURN        115    116
FT   STRAND      118    119
FT   STRAND      124    125
FT   STRAND      130    143
FT   STRAND      152    159
FT   STRAND      166    170
FT   STRAND      173    174
FT   TURN        183    184
FT   STRAND      189    202
FT   TURN        204    205
FT   STRAND      207    212
FT   HELIX       215    217
FT   STRAND      218    218
FT   STRAND      227    232
SQ   SEQUENCE   233 AA;  25644 MW;  666D7069 CRC32;
     MSTESMIRDV ELAEEALPKK TGGPQGSRRC LFLSLFSFLI VAGATTLFCL LHFGVIGPQR
     EEFPRDLSLI SPLAQAVRSS SRTPSDKPVA HVVANPQAEG QLQWLNRRAN ALLANGVELR
     DNQLVVPSEG LYLIYSQVLF KGQGCPSTHV LLTHTISRIA VSYQTKVNLL SAIKSPCQRE
     TPEGAEAKPW YEPIYLGGVF QLEKGDRLSA EINRPDYLDF AESGQVYFGI IAL
//
Structure databases: Protein Data Bank (PDB)
Protein Data Bank: http://www.rcsb.org/pdb
Diffraction: 7373 structures determined by X-ray diffraction
NMR: 388 structures determined by NMR spectroscopy
Theoretical model: 201 structures proposed by modelling
PDB
Visualizing Structures: Cn3D version 4.0 (NCBI)
Problems with flat files:
Wasted storage space
Wasted processing time
Data control problems
Problems caused by changes to data structures
Access to data is difficult
Data out of date
Constraints are system-based
Limited querying, e.g. all single-exon GPCRs (<1000 bp)
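Once the data sit in a relational table, the "limited querying" example on this slide becomes a single declarative query. The sketch below uses Python's built-in `sqlite3` module. The table layout, the column names and the four toy rows are hypothetical and serve only to make the query runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: one row per gene
conn.execute("""
    CREATE TABLE gene (
        gene_id   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        family    TEXT,
        exons     INTEGER,
        length_bp INTEGER
    )
""")
conn.executemany(
    "INSERT INTO gene (name, family, exons, length_bp) VALUES (?, ?, ?, ?)",
    [
        ("geneA", "GPCR", 1, 950),     # made-up toy data
        ("geneB", "GPCR", 3, 900),
        ("geneC", "GPCR", 1, 1200),
        ("geneD", "kinase", 1, 800),
    ],
)
# All single-exon GPCRs shorter than 1000 bp
rows = conn.execute(
    "SELECT name FROM gene "
    "WHERE family = 'GPCR' AND exons = 1 AND length_bp < 1000"
).fetchall()
print(rows)  # [('geneA',)]
```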
What is a relational database?
Sets of tables and links (the data)
A language to query the database (Structured Query Language, SQL)
A program to manage the data (RDBMS)
Flat files are not relational:
The data type (attribute) is part of the data
Record order matters
Multi-line records
Massive duplication, e.g. Organism: Homo sapiens, Eukaryota, …
Some records are hierarchical
Xrefs
Records contain multiple "sub-records"
Implicit "key"
Figure: a flat file as a linear file of homogeneous records; every record has the same fields (name, surname, phone, address).
Terms and concepts: tuple, domain, attribute, key, integrity rules
Introduction to Database Systems: Historic Background
Hierarchical databases (IMS), IBM 1968: hierarchical structures between file records
Network databases, CODASYL Group 1969: network structures of record types; linked chains between 'Owner' and 'Member' records; included in COBOL, a procedural language; manual navigation
Relational data model, E. F. Codd 1970: mathematical foundation of databases; new non-procedural language SQL; automatic navigation
Object-relational databases
Object-oriented databases
Relational: The relational model is not only very mature; a strong body of knowledge has also developed on how to make a relational back end fast and reliable, and how to exploit different technologies such as massive SMP, optical jukeboxes, clustering, etc. Object databases are nowhere near this, and I do not expect them to get there in the short or medium term. Relational databases have a very well-known and proven underlying mathematical theory, a simple one (set theory), that makes possible automatic cost-based query optimization, schema generation from high-level models and many other features that are now vital for mission-critical information systems development and operations.
The Benefits of Databases
Redundancy can be reduced
Inconsistency can be avoided
Conflicting requirements can be balanced
Standards can be enforced
Data can be shared
Data independence
Integrity can be maintained
Security restrictions can be applied
Relational Terminology
CUSTOMER table (relation); each row is a tuple, each column an attribute:
ID | NAME | PHONE | EMP_ID
201 | Unisports | 55-2066101 | 12
202 | Simms Atheletics | 81-20101 | 14
203 | Delhi Sports | 91-10351 | 14
204 | Womansport | 1-206-104-0103 | 11
Relational Database Terminology
Each row of data in a table is uniquely identified by a primary key (PK).
Information in multiple tables can be logically related by foreign keys (FK).
Table name: EMP (primary key: ID)
ID | LAST_NAME | FIRST_NAME
10 | Havel | Marta
11 | Magee | Colin
12 | Giljum | Henry
14 | Nguyen | Mai
Table name: CUSTOMER (primary key: ID; foreign key: EMP_ID, referring to EMP.ID)
ID | NAME | PHONE | EMP_ID
201 | Unisports | 55-2066101 | 12
202 | Simms Atheletics | 81-20101 | 14
203 | Delhi Sports | 91-10351 | 14
204 | Womansport | 1-206-104-0103 | 11
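The EMP and CUSTOMER tables above can be recreated in SQLite to show the primary key / foreign key relationship in action. The sketch below uses Python's built-in `sqlite3`. The lower-case column names and the rejected customer row (ID 205) are illustrative additions. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE emp (
        id         INTEGER PRIMARY KEY,      -- primary key
        last_name  TEXT,
        first_name TEXT
    );
    CREATE TABLE customer (
        id     INTEGER PRIMARY KEY,          -- primary key
        name   TEXT,
        phone  TEXT,
        emp_id INTEGER REFERENCES emp(id)    -- foreign key to emp
    );
""")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (10, "Havel", "Marta"), (11, "Magee", "Colin"),
    (12, "Giljum", "Henry"), (14, "Nguyen", "Mai"),
])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    (201, "Unisports", "55-2066101", 12),
    (202, "Simms Atheletics", "81-20101", 14),
    (203, "Delhi Sports", "91-10351", 14),
    (204, "Womansport", "1-206-104-0103", 11),
])

# Follow the foreign key: which employee looks after which customer?
rows = conn.execute("""
    SELECT c.name, e.last_name
    FROM customer AS c JOIN emp AS e ON c.emp_id = e.id
    ORDER BY c.id
""").fetchall()

# A customer pointing at a non-existent employee violates the FK
try:
    conn.execute("INSERT INTO customer VALUES (205, 'New customer', '0', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # EMP has no row with ID 99
print(rows, rejected)
```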
Relational Database Terminology
Relational operators:
select: rel WHERE boolean-expr
project: rel [attr-specs]
join: rel JOIN rel
divide by: rel DIVIDEBY rel
Set-based operators:
rel UNION rel
rel INTERSECT rel
rel MINUS rel
rel TIMES rel
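The operators above can be mimicked in a few lines of Python, which makes their set semantics concrete. This is a toy model under an assumed representation: a relation is a list of dicts with identical keys, one dict per tuple. The function names simply mirror the slide's operators; this is not how an RDBMS implements them.

```python
from itertools import product

# Toy model: a relation is a list of dicts with identical keys (one per tuple).

def select(rel, predicate):                 # rel WHERE boolean-expr
    return [t for t in rel if predicate(t)]

def project(rel, attrs):                    # rel [attr-specs]; duplicates removed
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def join(r, s):                             # natural join on shared attribute names
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a, b in product(r, s)
            if all(a[k] == b[k] for k in common)]

def times(r, s):                            # rel TIMES rel (attribute names assumed distinct)
    return [{**a, **b} for a, b in product(r, s)]

def union(r, s):                            # rel UNION rel
    return r + [t for t in s if t not in r]

def intersect(r, s):                        # rel INTERSECT rel
    return [t for t in r if t in s]

def minus(r, s):                            # rel MINUS rel
    return [t for t in r if t not in s]

def divide(r, s):                           # rel DIVIDEBY rel
    """Tuples of r (without s's attributes) that occur with *every* tuple of s."""
    keep = [a for a in r[0] if a not in s[0]]
    return [c for c in project(r, keep)
            if all({**c, **d} in r for d in s)]

skills = [{"emp": 1, "skill": "sql"}, {"emp": 1, "skill": "php"},
          {"emp": 2, "skill": "sql"}]
required = [{"skill": "sql"}, {"skill": "php"}]
print(divide(skills, required))  # [{'emp': 1}]
```

Division answers "for all" questions, such as which employees have every required skill. That is why it is listed separately from the more common select, project and join.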
Disadvantages: size, complexity, cost, additional hardware costs, higher impact of failure, more difficult recovery
RDBMS products
Free:
MySQL: very fast, widely used, easy to jump into, but limited, non-standard SQL
PostgreSQL: full SQL, limited OO, steeper learning curve than MySQL
Commercial:
MS Access: great query builder, GUI interfaces
MS SQL Server: full SQL, NT only
Oracle: everything, including the kitchen sink
IBM DB2, Sybase
Example of a 3-tier model in a biological database: http://www.bioinformatics.be. An example of different interfaces to the same back-end database (MySQL).
What is the Internet? A network of networks Based on TCP/IP (Transmission Control Protocol/Internet Protocol) Global A variety of services and tools
What is the World Wide Web? The Web presents information as a series of "documents," often referred to as web pages, that are prepared using the Hypertext Markup Language (HTML). Using HTML, the document's author can specially code sections of the document to "point" to other information resources. These specially coded sections are referred to as hypertext links. Users viewing the web page can select a hypertext link and retrieve or connect to the information resource that the link points to.
What is HTTP? In summary: HTTP is an acronym for Hypertext Transfer Protocol. HTTP is the set of rules, or protocol, that enables hypertext data to be transferred from one computer to another, and is based on the client/server principle. Hypertext is text that is coded using the Hypertext Markup Language. These codes and HTTP work together to link resources to each other. HTTP enables users to retrieve a wide variety of resources such as text, graphics, sound, animation and other hypertext documents, and allows hypertext access to other Internet protocols.
What is HTML? Standardized codes Web pages SGML Descriptive markup Tags
What is HTML? HTML stands for Hypertext Markup Language. HTML consists of standardized codes, or "tags", that are used to define the structure of information on a web page. HTML is used to prepare documents for the World Wide Web. A web page is a single unit of information, often called a document, that is available on the World Wide Web. HTML defines several aspects of a web page, including heading levels, bold, italics, images, paragraph breaks and hypertext links to other resources.
What is HTML? HTML is a sub-language of SGML, or Standard Generalized Markup Language. SGML is a system that defines and standardizes the structure of documents.  Both SGML and HTML utilize descriptive markup to define the structure of an area of text. In general terms, descriptive markup does not specify a particular font or point size for an area of text. Instead, it describes an area of text as a heading or a caption, for example.  Therefore, in HTML, text is marked as a heading, subheading, numbered list, bold, italic, etc.
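Because descriptive markup labels the role of each piece of text (heading, link) rather than its appearance, a program can recover a page's structure without knowing anything about fonts or layout. The sketch below uses Python's standard `html.parser` module. The class name and the sample page are illustrative assumptions.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")

class OutlineParser(HTMLParser):
    """Collect heading text and hypertext link targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.headings = []       # text inside <h1>..<h6>
        self.links = []          # href values of <a> tags
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in HEADINGS:
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self.headings[-1] += data

page = ('<h1>Structure databases</h1>'
        '<p>See the <a href="http://www.rcsb.org/pdb">Protein Data Bank</a>.</p>')
parser = OutlineParser()
parser.feed(page)
parser.close()
print(parser.headings, parser.links)
```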
What is a URL? URLs consist of letters, numbers, and punctuation. The basic structure of a URL is hierarchical, and the hierarchy moves from left to right:
protocol://server-name.domain-name.top-level-domain:port/directory/filename
Examples:
http://www.healthyway.com:8080/exercise/mtbike.html
gopher://gopher.state.edu/
ftp://ftp.company.com/
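The left-to-right structure of a URL maps directly onto the fields returned by Python's standard `urllib.parse.urlsplit`. The sketch below uses the example URLs above. The `describe_url` helper and its output keys are illustrative. The fallback ports are the well-known defaults for these protocols (HTTP 80, FTP 21, Gopher 70).

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not give one
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "gopher": 70}

def describe_url(url: str) -> dict:
    """Split a URL into protocol, server name, port and path."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # server-name.domain-name.top-level-domain
        "host_labels": parts.hostname.split(".") if parts.hostname else [],
        "port": port,
        "path": parts.path,
    }

print(describe_url("http://www.healthyway.com:8080/exercise/mtbike.html"))
```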
What is an IP Address? A way to identify machines on the Internet A number Unique Global Standardized
What is an IP Address? If you want to connect to another computer, transfer files to or from another computer, or send an e-mail message, you first need to know where the other computer is: you need the computer's "address." An IP (Internet Protocol) address is an identifier for a particular machine on a particular network; it is part of a scheme to identify computers on the Internet. IP addresses are also referred to as IP numbers and Internet addresses. An IP address consists of four sections separated by periods. Each section contains a number ranging from 0 to 255. Example: 198.41.0.52
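What is an IP Address? The diagram below compares Class A, Class B and Class C IP addresses. The blue numbers represent the network and the red numbers represent hosts on the network. In a Class A address only the first number identifies the network and the remaining three identify the host; in Class B the first two numbers identify the network, and in Class C the first three. Therefore, a Class A network can support many more hosts than a Class C network.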
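The class of an address can be read from its first number, and the size of the host part fixes how many hosts a network can hold. The sketch below uses Python's standard `ipaddress` module only to validate the address. The function names are illustrative. Classful addressing is historical: it was replaced by CIDR in 1993, but it still explains the diagram.

```python
import ipaddress

# Number of leading bytes that identify the network in each class
NETWORK_BYTES = {"A": 1, "B": 2, "C": 3}

def address_class(address: str) -> str:
    """Return the historical class (A, B, C or D/E) of an IPv4 address."""
    # IPv4Address raises ValueError for malformed input such as 256.1.1.1
    first = int(ipaddress.IPv4Address(address)) >> 24
    if first < 128:
        return "A"      # leading bit 0
    if first < 192:
        return "B"      # leading bits 10
    if first < 224:
        return "C"      # leading bits 110
    return "D/E"        # multicast and reserved ranges

def hosts_per_network(cls: str) -> int:
    """Usable hosts: every host-bit pattern except all-zeros and all-ones."""
    host_bits = 8 * (4 - NETWORK_BYTES[cls])
    return 2 ** host_bits - 2

print(address_class("198.41.0.52"))  # C
```

With 24 host bits a Class A network can hold 16,777,214 hosts. A Class C network, with 8 host bits, holds only 254.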
What is Internet Addressing? Most computers on the Internet have a unique domain name.  Special computers, called domain name servers, look up the domain name and match it to the corresponding IP address so that data can be properly routed to its destination on the Internet.  An example domain name is: healthyway.com Domain names are easier for most people to relate to than a numeric IP address.
What is Internet Addressing? URL stands for Uniform Resource Locator. URLs are used to identify specific sites and files available on the World Wide Web. The structure of a URL is: protocol://server.subdomain.top-level-domain/directory/filename. Not all URLs will have the directory and filename. Two examples: http://www.healthyway.com/exercise/mtbike.html and gopher://gopher.state.edu/
What is TCP/IP? A suite of protocols Rules for sending and receiving data across  Networks Addressing Management and verification
What is TCP/IP? TCP/IP stands for Transmission Control Protocol/Internet Protocol. TCP/IP is actually a collection of protocols, or rules, that govern the way data travels from one machine to another across networks. The Internet is based on TCP/IP.
What is TCP/IP? TCP/IP has two major components: TCP and IP. IP: envelopes and addresses the data; enables the network to read the envelope and forward the data to its destination; defines how much data can fit in a single "envelope" (a packet).
What is TCP/IP? The relationship between data, IP, and networks is often compared to the relationship between a letter, its addressed envelope, and the postal system.
What is TCP/IP? TCP: breaks data up into packets that the network can handle efficiently; verifies that all the packets arrive at their destination; "reassembles" the data. TCP/IP can be compared to moving across the country.
What is a packet? Unit of information Data Header Information Routers TCP/IP
What is a packet? A packet is a single unit, or "package", of data that is sent across a network. Data is broken into packets before it is sent across the Internet. Types of data that are sent across the Internet using packets include: e-mail messages; files, via File Transfer Protocol (FTP); web pages, via the World Wide Web (WWW).
What is a packet? In addition to the actual data, packets also contain header information.  The header of a packet contains both the originating and destination IP (Internet Protocol) address. The header also contains coding to handle transmission errors and keep packets flowing.  Header information can be compared to addressing an envelope. Like the header of a packet, an envelope contains the addresses of both the sender and the recipient, in order to keep track of who the envelope is from and who it is going to.
What is a packet? Header information is used by routers to send packets across a network. Routers are computers that are dedicated to "reading" header information and determining which router to send the packet to next. Packets move from router to router until they reach their final destination, in much the same way that an envelope travels between postal substations before reaching the recipient. The packets that make up data, such as an e-mail message or a web page, will not necessarily all follow the same route to the final destination. The route that a packet travels depends on many variables, including network traffic at that particular moment and the size of the packet being sent.
What is a packet? Transmission Control Protocol/Internet Protocol (TCP/IP) is a set of rules that govern how data is transmitted across networks and the Internet.  TCP/IP utilizes packets to send information across the Internet. TCP and IP have different functions related to packets.
What is a packet? TCP completes the following: Numbers the packets so that they can be put back in the correct order at the destination, even if they arrive out of order. Ensures the integrity of packets: if a packet is damaged or lost, TCP has it sent again.
What is a packet? IP completes the following: Packages the data into packets. Places header information into each packet, enabling it to be forwarded from router to router until it reaches the final destination. Determines how much data can fit into a single packet.
What is a packet? The following diagram illustrates an e-mail message being sent across a network.
1. The data that makes up the e-mail message is split into packets, and the IP portion of TCP/IP adds header information to each packet.
2. Using the header information in the packets, routers determine the best path for each packet to take to its final destination.
3. The TCP portion of TCP/IP reassembles the packets in the correct order and ensures that all packets have arrived undamaged.
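The three steps can be simulated in a few lines: number the packets, let them arrive in any order, then reorder and check them at the receiving end. This is a toy sketch with hypothetical function names and a made-up message. Real TCP numbers bytes rather than packets, uses a 16-bit checksum rather than the CRC-32 used here, and relies on acknowledgements to trigger retransmission.

```python
import random
import zlib

def packetize(data: bytes, size: int) -> list[dict]:
    """Split data into numbered packets; each carries a CRC-32 of its payload."""
    packets = []
    for seq, offset in enumerate(range(0, len(data), size)):
        payload = data[offset:offset + size]
        packets.append({"seq": seq, "payload": payload, "crc": zlib.crc32(payload)})
    return packets

def reassemble(packets: list[dict]) -> bytes:
    """Put packets back in order and check them, as the receiving side would."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    for expected, packet in enumerate(ordered):
        if packet["seq"] != expected:            # gap in the sequence numbers
            raise ValueError(f"packet {expected} is missing")
        if zlib.crc32(packet["payload"]) != packet["crc"]:
            raise ValueError(f"packet {expected} is damaged and must be resent")
    return b"".join(p["payload"] for p in ordered)

message = b"Data that makes up an e-mail message"
packets = packetize(message, size=8)
random.shuffle(packets)                          # packets may take different routes
assert reassemble(packets) == message
```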
What is Telnet? Telnet is a protocol, or set of rules, that enables one computer to connect to another computer.  This process is also referred to as remote login. The user's computer, which initiates the connection,  is referred to as the local computer, and the machine being connected to, which accepts the connection, is referred to as the remote, or host, computer.  The remote computer can be physically located in the next room, the next town, or in another country.
What is Telnet? Once connected, the user's computer acts as a terminal of the remote computer. When the user types in commands, they are executed on the remote computer. The user's monitor displays what is taking place on the remote computer during the telnet session. The procedure for connecting to a remote computer will depend on how your Internet access is set up.
What is Telnet? Once a connection to a remote computer is made, instructions or menus may appear.  Some remote machines may require a user to have an account on the machine, and may prompt users for a username and password. Many resources, such as library catalogs, are available via telnet without an account and password. Here is an example taken from a telnet session to Washington University in St. Louis, MO:
SSH Features Command line terminal connection tool Replacement for rsh, rcp, telnet, and others All traffic encrypted Both ends authenticate themselves to the other end Ability to carry and encrypt non-terminal traffic
Brief History
SSH.com's SSH1: originally completely free with source code; the license changed with version 1.2.13.
SSH.com's SSH2: originally only commercial, but now free for some uses.
OpenSSH: the OpenSSH team took the last free SSH1 release, fixed bugs, added features, and added support for the SSH2 protocol.
SSH key background Old way: password stored on server, user supplied password compared to stored version New way: private key kept on client, public key stored on server.
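The "new way" works because something signed with the private key can be checked with the public key alone, so no secret ever has to be stored on the server. The toy sketch below uses textbook RSA with tiny primes. It is completely insecure and is not the SSH protocol. Real SSH uses large keys and padded signature schemes, and it signs session-specific data rather than a bare number. All names here are hypothetical.

```python
import hashlib

# Textbook RSA with tiny primes: for illustration only, completely insecure.
P, Q = 61, 53
N = P * Q                  # 3233, shared by both keys
PHI = (P - 1) * (Q - 1)    # 3120
E = 17                     # public key (E, N): stored on the server
D = pow(E, -1, PHI)        # private key (D, N): kept on the client (Python 3.8+)

def challenge_from(nonce: bytes) -> int:
    """Server side: derive a number smaller than N from some random data."""
    return int.from_bytes(hashlib.sha256(nonce).digest(), "big") % N

def sign(challenge: int) -> int:
    """Client side: only the holder of the private exponent D can compute this."""
    return pow(challenge, D, N)

def verify(challenge: int, signature: int) -> bool:
    """Server side: check the signature using only the public key (E, N)."""
    return pow(signature, E, N) == challenge

c = challenge_from(b"session nonce")
print(verify(c, sign(c)))  # True
```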
What is FTP? Protocol File transfer Client/Server based  Anonymous FTP File types Compression
What is FTP? FTP stands for File Transfer Protocol, and is part of the TCP/IP protocol suite. It is the protocol, or set of rules, which enables files to be transferred between computers. FTP is a powerful tool which allows files to be transferred from “computer A” to “computer B”, or vice versa.
What is FTP? FTP works on the client/server principle. A client program enables the user to interact with a server in order to access information and services on the server computer.  Files that can be transferred are stored on special computers called FTP servers. To access these files, an FTP client program is used. This is an interface that allows the user to locate the file(s) to be transferred and initiate the transfer process.
What is FTP? The basic steps to use FTP are: Connect to the FTP server Navigate the file structure to find the file you want Transfer the file The specifics of each step will vary, depending on the client program being used and the type of Internet connection.
What is FTP? Anonymous FTP Anonymous FTP allows a user to access a wealth of publicly available information. No special account or password is needed. However, an anonymous FTP site will sometimes ask that users login with the name “anonymous” and use their e-mail address as the password.
What is FTP? There is a wide variety of files that are publicly available through anonymous FTP: Shareware - software that you can use free for a trial period but then pay a fee for Freeware - completely free software, for example fonts, clipart and games Upgrades & Patches - upgrades to current software and “fixes” for software problems Documents - examples include research papers, articles and Internet documentation
What is FTP? Files on FTP servers are often compressed. Compression decreases file size. This enables more files to be stored on the server and makes file transfer times shorter. In order to use a compressed file it needs to be decompressed using appropriate software.  It is a good idea to have current virus checking software on the computer before files are downloaded to it.
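Compressed files on FTP servers are often gzip archives (`*.gz`). Many bioinformatics data sets are distributed this way, including sequence databases. The sketch below uses Python's standard `gzip` module to show that compression shrinks repetitive data and that decompression restores exactly the original bytes. The example data are made up.

```python
import gzip

original = b"ACGT" * 1000              # highly repetitive data compresses well
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

print(len(original), len(compressed))  # original size vs. compressed size
assert restored == original            # gzip compression is lossless
```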
 
 
 
 
BioSQL
Conclusions
A database is a central component of any contemporary information system.
The operations on the database and the maintenance of database consistency are handled by a DBMS.
There exist stand-alone query languages and embedded languages, but both deal with definition (DDL) and manipulation (DML) aspects.
The structural properties, constraints and operations permitted within a DBMS are defined by a data model: hierarchical, network, relational.
Recovery and concurrency control are essential.
Linking of heterogeneous data sources is a central theme in modern bioinformatics.
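The DDL/DML distinction and the role of transactions in recovery can be seen in a few lines of Python's built-in `sqlite3` module. In this minimal sketch, the `sequence` table is a hypothetical example. It is not part of BioSQL or any real schema. The accession and organism are borrowed from the GenBank entry shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (schema) and its constraints
conn.execute("""
    CREATE TABLE sequence (
        accession TEXT PRIMARY KEY,
        organism  TEXT NOT NULL,
        length_bp INTEGER CHECK (length_bp > 0)
    )
""")

# DML: manipulate the data inside a transaction
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO sequence VALUES ('X64011', 'Listeria ivanovii', 756)")
        conn.execute("INSERT INTO sequence VALUES ('X64011', 'duplicate', 756)")
except sqlite3.IntegrityError:
    pass  # the duplicate primary key aborted the whole transaction

count = conn.execute("SELECT COUNT(*) FROM sequence").fetchone()[0]
print(count)  # 0: the first INSERT was rolled back as well
```

Because both inserts belong to one transaction, the database never exposes a half-finished change. This all-or-nothing behaviour is what recovery and concurrency control rely on.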

2012 03 01_bioinformatics_ii_les1

  • 1.  
  • 2. FBW 01-03-2012 Wim Van Criekinge RELOADED 2
  • 3. Inhoud 09:00-11:00 (2s) (A: Theorie) Coup. Links., CL.A1057 16:00-19:00 (2s) (B: Practicum) Coup. Links., CL.PC-C Cursus: 40 € do 16 februari: Geen Les do 23 februari: Geen Les do 1 maart: Recap Bioinformatics I, RDBMS, (Bio)SQL do 8 maart: Web Application Developent (PHP) do 15 maart: MyGenBank do 22 maart: Genome Browsers do 29 maart: Galaxy do 5 april: Geen Les do 12 april: Geen Les do 19 april: Datamining (Tim De Meyer) do 26 april: Textmining (Maté Ongenaert) do 3 mei: Systems Biology (Bart Deplancke) do 10 mei: Les 10 (projectvoorstelling)
  • 4. Les 1 Bioinformatics I Revisited in 5 slides Why bother making databases ? DataBases FF *.txt Indexed version Relational (RDBMS) Access, MySQL, PostGRES, Oracle OO (OODBMS) AceDB, ObjectStore Hierarchical XML Frame based system Eg. DAML+OIL Hybrid systems
  • 5. 4 3 2 1 0 A brief history of time (BYA) Origin of life Origin of eukaryotes insects Fungi/animal Plant/animal Earliest fossils BYA
  • 6. Rat versus mouse RBP Rat versus bacterial lipocalin
  • 7.  
  • 8. Sander-Schneider HSSP: homology derived secondary structure
  • 9.  
  • 10. About the Syllabus / CD In tegenstelling tot Bioinformatica I is het minder de bedoeling om een overzicht te geven van de verschillende (sub)domeinen in de bioinformatica. Bioinformatica II (Reloaded) schetst een zo accuraat mogelijk beeld van de huidige stand van zaken in het bioinformatica onderzoek. Hiervoor wordt er dieper ingegaan in het gebruik van relationele databeheersystemen en hun praktische implementaties, vandaar de term “ Reloaded ”. Deze methodologie laat toe om grote heterogene datasets, typische voor biologische experimenten die meer en meer op (meta)genoomschaal worden uitgevoerd, te beheersen en gestructureerd op te slaan voor verdere (statistische) analyse. Het tweede van de cursus focust op de verschillende methodieken uit machine learning en kunstmatige intelligentie en hoe deze kunnnen ingeschakelt worden om data, via het gebruik van databanken,  om te zetten tot nieuwe verifieerbare gefundeerde hypothesis die zo hopelijk leiden tot nieuwe kennis.
  • 11.  
  • 12. Usage of the databases Annotation searches - Search for keywords, authors, features
  • 13. Usage of the databases Annotation searches - Search for keywords, authors, features What is the protein sequence for human insulin? How does the 3D structure of calmodulin look like? What is the genetic location of the cystic fibrosis gene? List all intron sequences in rat.
  • 14. Usage of the databases Annotation searches - Search for keywords, authors, features
  • 15. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences
  • 16. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences Is there any known protein sequence that is similar to x? Is this gene known in any other species? Has someone already cloned this sequence?
  • 17. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences
  • 18. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences Pattern searches - Search for occurrences of patterns
  • 19. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences Pattern searches - Search for occurrences of patterns Do my protein sequence contain any known motif (that can give me a clue about the function)? Which known sequences contain this motif? Is any part of my nucleotide sequence recognized by a transcriptional factor? List all known start, splice and stop signals in my genomic sequence.
  • 20. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences Pattern searches - Search for occurrences of patterns
  • 21. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences Pattern searches - Search for occurrences of patterns Predictions - Using the databases as knowledge databases
  • 22. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences Pattern searches - Search for occurrences of patterns Predictions - Using the databases as knowledge databases What may the structure of my protein be? Secondary structure prediction. Modelling by homology. What is the gene structure of my genomic sequence? Which parts of my protein have a high antigenicity?
  • 23. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences Pattern searches - Search for occurrences of patterns Predictions - Using the databases as knowledge databases
  • 24. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences Pattern searches - Search for occurrences of patterns Predictions - Using the databases as knowledge databases Comparisons
  • 25. Usage of the databases Annotation searches - Search for keywords, authors, features Homology (similarity) searches - Search for similar sequences Pattern searches - Search for occurrences of patterns Predictions - Using the databases as knowledge databases Comparisons Gene families Phylogenetic trees
  • 26. Les 1 Bioinformatics I Revisited in 5 slides Why bother making databases ? DataBases FF *.txt Indexed version Relational (RDBMS) Access, MySQL, PostGRES, Oracle OO (OODBMS) AceDB, ObjectStore Hierarchical XML Frame based system Eg. DAML+OIL Hybrid systems
  • 27. GenBank Format LOCUS LISOD 756 bp DNA BCT 30-JUN-1993 DEFINITION L.ivanovii sod gene for superoxide dismutase. ACCESSION X64011.1 GI:37619753 NID g44010 KEYWORDS sod gene; superoxide dismutase. SOURCE Listeria ivanovii. ORGANISM Listeria ivanovii Eubacteria; Firmicutes; Low G+C gram-positive bacteria; Bacillaceae; Listeria. REFERENCE 1 (bases 1 to 756) AUTHORS Haas,A. and Goebel,W. TITLE Cloning of a superoxide dismutase gene from Listeria ivanovii by functional complementation in Escherichia coli and characterization of the gene product JOURNAL Mol. Gen. Genet. 231 (2), 313-322 (1992) MEDLINE 92140371 REFERENCE 2 (bases 1 to 756) AUTHORS Kreft,J. TITLE Direct Submission JOURNAL Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am Hubland, 8700 Wuerzburg, FRG
  • 28. FEATURES Location/Qualifiers source 1..756 /organism=&quot;Listeria ivanovii&quot; /strain=&quot;ATCC 19119&quot; /db_xref=&quot;taxon:1638&quot; RBS 95..100 /gene=&quot;sod&quot; gene 95..746 /gene=&quot;sod&quot; CDS 109..717 /gene=&quot;sod&quot; /EC_number=&quot;1.15.1.1&quot; /codon_start=1 /product=&quot;superoxide dismutase&quot; /db_xref=&quot;PID:g44011&quot; /db_xref=&quot;SWISS-PROT:P28763&quot; /transl_table=11 /translation=&quot;MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKL NEAVSGHAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPN GGGAPTGNLKAAIESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVS TANQDSPLSEGKTPVLGLDVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRF DAAK&quot; terminator 723..746 /gene=&quot;sod&quot;
  • 29. Example of location descriptors Location Description 476 Points to a single base in the presented sequence 340..565 Points to a continuous range of bases bounded by and including the starting and ending bases <345..500 The exact lower boundary point of a feature is unknown. (102.110) Indicates that the exact location is unknown but that it is one of the bases between bases 102 and 110. (23.45)..600 Specifies that the starting point is one of the bases between bases 23 and 45, inclusive, and the end base 600 123^124 Points to a site between bases 123 and 124 145^177 Points to a site anywhere between bases 145 and 177 J00193:hladr Points to a feature whose location is described in another entry: the feature labeled 'hladr' in the entry (in this database) with primary accession 'J00193'
  • 30. BASE COUNT 247 a 136 c 151 g 222 t ORIGIN 1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat 61 gtaatttctt ttcacataaa taataaacaa tccgaggagg aatttttaat gacttacgaa 121 ttaccaaaat taccttatac ttatgatgct ttggagccga attttgataa agaaacaatg 181 gaaattcact atacaaagca ccacaatatt tatgtaacaa aactaaatga agcagtctca 241 ggacacgcag aacttgcaag taaacctggg gaagaattag ttgctaatct agatagcgtt 301 cctgaagaaa ttcgtggcgc agtacgtaac cacggtggtg gacatgctaa ccatacttta 361 ttctggtcta gtcttagccc aaatggtggt ggtgctccaa ctggtaactt aaaagcagca 421 atcgaaagcg aattcggcac atttgatgaa ttcaaagaaa aattcaatgc ggcagctgcg 481 gctcgttttg gttcaggatg ggcatggcta gtagtgaaca atggtaaact agaaattgtt 541 tccactgcta accaagattc tccacttagc gaaggtaaaa ctccagttct tggcttagat 601 gtttgggaac atgcttatta tcttaaattc caaaaccgtc gtcctgaata cattgacaca 661 ttttggaatg taattaactg ggatgaacga aataaacgct ttgacgcagc aaaataatta 721 tcgaaaggct cacttaggtg ggtcttttta tttcta //
  • 31. EMBL format ID LISOD standard; DNA; PRO; 756 BP. IDentification XX AC X64011; S78972; Accession (Axxxxx, Afxxxxxx), GUID XX NI g44010 Nucleotide Identifier --> x.x XX DT 28-APR-1992 (Rel. 31, Created) DaTe DT 30-JUN-1993 (Rel. 36, Last updated, Version 6) XX DE L.ivanovii sod gene for superoxide dismutase DEscription XX. KW sod gene; superoxide dismutase. KeyWord XX OS Listeria ivanovii Organism Species OC Eubacteria; Firmicutes; Low G+C gram-positive bacteria; Bacillaceae; OC Listeria. Organism Classification XX RN [1] RA Haas A., Goebel W.; Reference RT &quot;Cloning of a superoxide dismutase gene from Listeria ivanovii by RT functional complementation in Escherichia coli and RT characterization of the gene product.&quot;; RL Mol. Gen. Genet. 231:313-322(1992). XX
  • 32. Example of a SwissProt entry ID TNFA_HUMAN STANDARD; PRT; 233 AA. IDentification AC P01375; ACcession DT 21-JUL-1986 (REL. 01, CREATED) DaTe DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE) DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) DE TUMOR NECROSIS FACTOR PRECURSOR (TNF-ALPHA) (CACHECTIN). GN TNFA. Gene name OS HOMO SAPIENS (HUMAN). Organism Species OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA; OC EUTHERIA; PRIMATES. Organism Classification RN [1] Reference RP SEQUENCE FROM N.A. RX MEDLINE; 87217060. RA NEDOSPASOV S.A., SHAKHOV A.N., TURETSKAYA R.L., METT V.A., RA AZIZOV M.M., GEORGIEV G.P., KOROBKO V.G., DOBRYNIN V.N., RA FILIPPOV S.A., BYSTROV N.S., BOLDYREVA E.F., CHUVPILO S.A., RA CHUMAKOV A.M., SHINGAROVA L.N., OVCHINNIKOV Y.A.; RL COLD SPRING HARB. SYMP. QUANT. BIOL. 51:611-624(1986). RN [2] RP SEQUENCE FROM N.A. RX MEDLINE; 85086244. RA PENNICA D., NEDWIN G.E., HAYFLICK J.S., SEEBURG P.H., DERYNCK R., RA PALLADINO M.A., KOHR W.J., AGGARWAL B.B., GOEDDEL D.V.; RL NATURE 312:724-729(1984). ...
  • 33. CC -!- FUNCTION: CYTOKINE WITH A WIDE VARIETY OF FUNCTIONS: IT CAN CC CAUSE CYTOLYSIS OF CERTAIN TUMOR CELL LINES, IT IS IMPLICATED CC IN THE INDUCTION OF CACHEXIA, IT IS A POTENT PYROGEN CAUSING CC FEVER BY DIRECT ACTION OR BY STIMULATION OF IL-1 SECRETION, IT CC CAN STIMULATE CELL PROLIFERATION & INDUCE CELL DIFFERENTIATION CC UNDER CERTAIN CONDITIONS. Comments CC -!- SUBUNIT: HOMOTRIMER. CC -!- SUBCELLULAR LOCATION: TYPE II MEMBRANE PROTEIN. ALSO EXISTS AS CC AN EXTRACELLULAR SOLUBLE FORM. CC -!- PTM: THE SOLUBLE FORM DERIVES FROM THE MEMBRANE FORM BY CC PROTEOLYTIC PROCESSING. CC -!- DISEASE: CACHEXIA ACCOMPANIES A VARIETY OF DISEASES, INCLUDING CC CANCER AND INFECTION, AND IS CHARACTERIZED BY GENERAL ILL CC HEALTH AND MALNUTRITION. CC -!- SIMILARITY: BELONGS TO THE TUMOR NECROSIS FACTOR FAMILY. DR EMBL; X02910; G37210; -. Database Cross-references DR EMBL; M16441; G339741; -. DR EMBL; X01394; G37220; -. DR EMBL; M10988; G339738; -. DR EMBL; M26331; G339764; -. DR EMBL; Z15026; G37212; -. DR PIR; B23784; QWHUN. DR PIR; A44189; A44189. DR PDB; 1TNF; 15-JAN-91. DR PDB; 2TUN; 31-JAN-94.
  • 34. KW CYTOKINE; CYTOTOXIN; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL-ANCHOR; KW MYRISTYLATION; 3D-STRUCTURE. KeyWord FT PROPEP 1 76 Feature Table FT CHAIN 77 233 TUMOR NECROSIS FACTOR. FT TRANSMEM 36 56 SIGNAL-ANCHOR (TYPE-II PROTEIN). FT LIPID 19 19 MYRISTATE. FT LIPID 20 20 MYRISTATE. FT DISULFID 145 177 FT MUTAGEN 105 105 L->S: LOW ACTIVITY. FT MUTAGEN 108 108 R->W: BIOLOGICALLY INACTIVE. FT MUTAGEN 112 112 L->F: BIOLOGICALLY INACTIVE. FT MUTAGEN 162 162 S->F: BIOLOGICALLY INACTIVE. FT MUTAGEN 167 167 V->A,D: BIOLOGICALLY INACTIVE. FT MUTAGEN 222 222 E->K: BIOLOGICALLY INACTIVE. FT CONFLICT 63 63 F -> S (IN REF. 5). FT STRAND 89 93 FT TURN 99 100 FT TURN 109 110 FT STRAND 112 113 FT TURN 115 116 FT STRAND 118 119 FT STRAND 124 125
  • 35. FT STRAND 130 143 FT STRAND 152 159 FT STRAND 166 170 FT STRAND 173 174 FT TURN 183 184 FT STRAND 189 202 FT TURN 204 205 FT STRAND 207 212 FT HELIX 215 217 FT STRAND 218 218 FT STRAND 227 232 SQ SEQUENCE 233 AA; 25644 MW; 666D7069 CRC32; MSTESMIRDV ELAEEALPKK TGGPQGSRRC LFLSLFSFLI VAGATTLFCL LHFGVIGPQR EEFPRDLSLI SPLAQAVRSS SRTPSDKPVA HVVANPQAEG QLQWLNRRAN ALLANGVELR DNQLVVPSEG LYLIYSQVLF KGQGCPSTHV LLTHTISRIA VSYQTKVNLL SAIKSPCQRE TPEGAEAKPW YEPIYLGGVF QLEKGDRLSA EINRPDYLDF AESGQVYFGI IAL //
  • 36. Structure databases Protein Data Bank (PDB) Protein Data Bank - http://www.rcsb.org/pdb Diffraction 7373 structures determined by X-ray diffraction NMR 388 structures determined by NMR spectroscopy Theoretical Model 201 structures proposed by modeling
  • 37. PDB
  • 38. PDB
  • 39. PDB
  • 40. PDB
  • 41. Visualizing Structures Cn3D versie 4.0 (NCBI)
  • 42. Les 1 Bioinformatics I Revisited in 5 slides Why bother making databases ? DataBases FF *.txt Indexed version Relational (RDBMS) Access, MySQL, PostGRES, Oracle OO (OODBMS) AceDB, ObjectStore Hierarchical XML Frame based system Eg. DAML+OIL Hybrid systems
  • 43. Problems with Flat files … Wasted storage space Wasted processing time Data control problems Problems caused by changes to data structures Access to data difficult Data out of date Constraints are system based Limited querying eg. all single exon GPCRs (<1000 bp)
  • 44. What is a relational database ? Sets of tables and links (the data) A language to query the datanase (Structured Query Language) A program to manage the data (RDBMS) Flat files are not relational Data type (attribute) is part of the data Record order mateters Multiline records Massive duplication Bv Organism: Homo sapeinsm Eukaryota, … Some records are hierarchical Xrefs Records contain multiple “sub-records” Implecit “Key”
  • 45. records fields linear file of homogeneous records name......................... surname.................... phone........................ address...................... name......................... surname.................... phone........................ address...................... name......................... surname.................... phone........................ address...................... name......................... surname.................... phone........................ address...................... name......................... surname.................... phone........................ address...................... name......................... surname.................... phone........................ address...................... name......................... surname.................... phone........................ address...................... name......................... surname.................... phone........................ address......................
  • 46. Terms and concepts: tuple domain attribute key integrity rules
  • 47. Introduction to Database Systems Historic Background Hierarchical databases (IMS) - IBM 1968 Hierarchical structures between file records Network databases - CODASYL Group 1969 Network structures of record types Linked chains between 'Owner' and 'Member' records Included in Cobol, procedural language - Manual navigation Relational Data Model - E. F. Codd 1970 Mathematical foundation of databases New non-procedural language SQL - Automatic navigation Object-relational databases Object-oriented databases
  • 48. Relational The Relational model is not only very mature , but it has developed a strong knowledge on how to make a relational back-end fast and reliable, and how to exploit different technologies such as massive SMP, Optical jukeboxes, clustering and etc. Object databases are nowhere near to this, and I do not expect then to get there in the short or medium term. Relational Databases have a very well-known and proven underlying mathematical theory, a simple one (the set theory) that makes possible automatic cost-based query optimization, schema generation from high-level models and many other features that are now vital for mission-critical Information Systems development and operations.
  • 49. The Benefits of Databases Redundancy can be reduced Inconsistency can be avoid ed Conflicting requirements can be balanced Standards can be enforced Data can be shared Data independence Integrity can be maintained Security restrictions can be applied
  • 50. Relational Terminology ID NAME PHONE EMP_ID 201 Unisports 55-2066101 12 202 Simms Atheletics 81-20101 14 203 Delhi Sports 91-10351 14 204 Womansport 1-206-104-0103 11 Row (Tuple) Column (Attribute) CUSTOMER Table (Relation)
  • 51. Relational Database Terminology Each row of data in a table is uniquely identified by a primary key (PK) Information in multiple tables can be logically related by foreign keys (FK) ID LAST_NAME FIRST_NAME 10 Havel Marta 11 Magee Colin 12 Giljum Henry 14 Nguyen Mai ID NAME PHONE EMP_ID 201 Unisports 55-2066101 12 202 Simms Atheletics 81-20101 14 203 Delhi Sports 91-10351 14 204 Womansport 1-206-104-0103 11 Table Name: CUSTOMER Table Name: EMP Primary Key Foreign Key Primary Key
  • 52. Relational Database Terminology Relational operators Relational select rel WHERE boolean-xpr project rel [ attr-specs ] join rel JOIN rel divide by rel DIVIDEBY rel Set-based  rel UNION rel  rel INTERSECT rel \ rel MINUS rel  rel TIMES rel
  • 53. Disadvantages size complexity cost Additional hardware costs Higher impact of failure Recovery more difficult
  • 54. RDBM products Free MySQL, very fast, widely usedm easy to jump into but limited non standard SQL PostrgreSQL – full SQLm limited OO, higher learning curve than MySQL Commercial MS Access – Great query builder, GUI interfaces MS SQL Server – full SQL, NT only Oracle, everything, including the kitchen sink IBM DB2, Sybase
  • 55. Example 3-tier model in biological database http://www.bioinformatics.be Example of different interface to the same back-end database (MySQL)
  • 56. What is the Internet? A network of networks Based on TCP/IP (Transmission Control Protocol/Internet Protocol) Global A variety of services and tools
  • 57. What is the World Wide Web? The Web presents information as a series of &quot;documents,&quot; often referred to as web pages, that are prepared using the Hypertext Markup Language (HTML). Using HTML, the document's author can specially code sections of the document to &quot;point&quot; to other information resources. These specially coded sections are referred to as hypertext links. Users viewing the webpage can select the hypertext link and retrieve or connect to the information resource that the link points to.
  • 58. What is HTTP? In Summary : HTTP is an acronym for Hypertext Transfer Protocol. HTTP is the set of rules, or protocol, that enables hypertext data to be transferred from one computer to another, and is based on the client/server principle. Hypertext is text that is coded using the Hypertext Markup Language. These codes and HTTP work together to link resources to each other. HTTP enables users to retrieve a wide variety of resources such as text, graphics, sound, animation and other hypertext documents, and allows hypertext access to other Internet protocols.
  • 59. What is HTML? Standardized codes Web pages SGML Descriptive markup Tags
  • 60. What is HTML? HTML stands for Hypertext Markup Language. HTML consists of standardized codes, or &quot;tags&quot;, that are used to define the structure of information on a web page. HTML is used to prepare documents for the World Wide Web. A web page is single a unit of information, often called a document, that is available on the World Wide Web. HTML defines several aspects of a web page including heading levels, bold, italics, images, paragraph breaks and hypertext links to other resources.
  • 61. What is HTML? HTML is a sub-language of SGML, or Standard Generalized Markup Language. SGML is a system that defines and standardizes the structure of documents. Both SGML and HTML utilize descriptive markup to define the structure of an area of text. In general terms, descriptive markup does not specify a particular font or point size for an area of text. Instead, it describes an area of text as a heading or a caption, for example. Therefore, in HTML, text is marked as a heading, subheading, numbered list, bold, italic, etc.
  • 62. What is a URL? URLs consist of letters, numbers, and punctuation. The basic structure of a URL is hierarchical, and the hierarchy moves from left to right: Examples: http://www.healthyway.com:8080/exercise/mtbike.html gopher://gopher.state.edu/ ftp://ftp.company.com/ protocol://server-name.domain-name.top-level domain:port/directory/filename
  • 63. What is an IP Address? A way to identify machines on the Internet A number Unique Global Standardized
  • 64. What is an IP Address? If you want to connect to another computer, transfer files to or from another computer, or send an e-mail message, you first need to know where the other computer is: you need the computer's "address." An IP (Internet Protocol) address is an identifier for a particular machine on a particular network; it is part of a scheme to identify computers on the Internet. IP addresses are also referred to as IP numbers and Internet addresses. An IP address in the version 4 scheme (IPv4) described here consists of four sections separated by periods, and each section contains a number ranging from 0 to 255. Example: 198.41.0.52
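Because each of the four sections holds a number from 0 to 255, each fits in exactly 8 bits, and an IPv4 address is really a single 32-bit number. A minimal Python sketch of that conversion with the range checks described above (the function name is hypothetical; the standard `ipaddress` module offers the same via `int(ipaddress.IPv4Address(...))`):

```python
def ipv4_to_int(address: str) -> int:
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    sections = address.split(".")
    if len(sections) != 4:
        raise ValueError(f"expected four sections, got {len(sections)}: {address!r}")
    value = 0
    for section in sections:
        if not section.isdigit():
            raise ValueError(f"non-numeric section {section!r} in {address!r}")
        number = int(section)
        if not 0 <= number <= 255:
            raise ValueError(f"section {number} is outside 0-255 in {address!r}")
        value = (value << 8) | number  # each section contributes the next 8 bits
    return value


print(ipv4_to_int("198.41.0.52"))  # 3324575796
```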
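  • 65. What is an IP Address? The diagram below compares Class A, Class B and Class C IP addresses. The blue numbers represent the network and the red numbers represent hosts on the network. In a Class A address only the first section identifies the network and the remaining three identify hosts; a Class B address splits the sections two and two; a Class C address uses the first three sections for the network and only the last one for hosts. Therefore, a Class A network can support many more hosts than a Class C network.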
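The difference in network size between the classes follows from the number of bits left for hosts. The Python sketch below assumes the historical classful rules (Class A first section 1 to 126, Class B 128 to 191, Class C 192 to 223, with 127 reserved for loopback) and the convention that the all-zeros and all-ones host addresses are reserved. Classful addressing has since been replaced by CIDR, so treat this as an illustration of the slide rather than of how modern networks are allocated.

```python
def address_class(address: str) -> tuple:
    """Return (class letter, usable hosts per network) for a classful IPv4 address."""
    first = int(address.split(".")[0])
    if 1 <= first <= 126:
        letter, network_bits = "A", 8    # first section = network, three sections = hosts
    elif 128 <= first <= 191:
        letter, network_bits = "B", 16   # two sections each
    elif 192 <= first <= 223:
        letter, network_bits = "C", 24   # three sections = network, last section = hosts
    else:
        raise ValueError(f"{address} is not a Class A, B or C address")
    host_bits = 32 - network_bits
    usable_hosts = 2 ** host_bits - 2    # all-zeros (network) and all-ones (broadcast) excluded
    return letter, usable_hosts


for example in ("10.0.0.1", "172.16.0.1", "198.41.0.52"):
    print(example, address_class(example))
# 10.0.0.1 ('A', 16777214)
# 172.16.0.1 ('B', 65534)
# 198.41.0.52 ('C', 254)
```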
  • 66. What is Internet Addressing? Most computers on the Internet have a unique domain name. Special computers, called domain name servers, look up the domain name and match it to the corresponding IP address so that data can be properly routed to its destination on the Internet. An example domain name is: healthyway.com Domain names are easier for most people to relate to than a numeric IP address.
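In a program, the lookup from domain name to IP address is normally delegated to the operating system's resolver, which consults local configuration (such as the hosts file) and then domain name servers. A minimal Python sketch using the standard `socket.getaddrinfo` call; the `resolve` name is hypothetical, and resolving a real name such as healthyway.com needs network access, whereas `localhost` is normally answered locally.

```python
import socket


def resolve(domain_name: str) -> list:
    """Return the sorted IPv4 addresses the system resolver reports for domain_name."""
    results = socket.getaddrinfo(domain_name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr); for IPv4, sockaddr = (ip, port).
    return sorted({sockaddr[0] for *_, sockaddr in results})


print(resolve("localhost"))
```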
  • 67. What is Internet Addressing? URL stands for Uniform Resource Locator. URLs are used to identify specific sites and files available on the World Wide Web. The structure of a URL is: protocol://server.subdomain.top-level-domain/directory/filename. Not all URLs will have the directory and filename. Two examples: http://www.healthyway.com/exercise/mtbike.html and gopher://gopher.state.edu/
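The left-to-right hierarchy of a URL can be taken apart with the standard-library `urllib.parse.urlsplit`. In the Python sketch below, the dictionary keys mirror the slide's vocabulary and are illustrative, not part of the library; the optional directory, filename and port simply come back as `None` when a URL omits them.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the left-to-right parts described on the slides."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        "top_level_domain": parts.hostname.rsplit(".", 1)[-1] if parts.hostname else None,
        "port": parts.port,              # None: the protocol's default port is used
        "directory": directory or None,  # optional part
        "filename": filename or None,    # optional part
    }


print(describe_url("http://www.healthyway.com:8080/exercise/mtbike.html"))
print(describe_url("gopher://gopher.state.edu/"))
```

For the gopher example only the protocol, server and top-level domain are filled in, matching the remark that not all URLs have a directory and filename.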
  • 68. What is TCP/IP? A suite of protocols Rules for sending and receiving data across Networks Addressing Management and verification
  • 69. What is TCP/IP? TCP/IP stands for Transmission Control Protocol/Internet Protocol. TCP/IP is actually a collection of protocols, or rules, that govern the way data travels from one machine to another across networks. The Internet is based on TCP/IP.
  • 70. What is TCP/IP? TCP/IP has two major components: TCP and IP. IP: envelopes and addresses the data; enables the network to read the envelope and forward the data to its destination; defines how much data can fit in a single "envelope" (a packet).
  • 71. What is TCP/IP? The relationship between data, IP, and networks is often compared to the relationship between a letter, its addressed envelope, and the postal system.
  • 72. What is TCP/IP? TCP: breaks data up into packets that the network can handle efficiently; verifies that all the packets arrive at their destination; "reassembles" the data. TCP/IP can be compared to moving across country.
  • 73. What is a packet? Unit of information Data Header Information Routers TCP/IP
  • 74. What is a packet? A packet is a single unit, or "package", of data that is sent across a network. Data is broken into packets before it is sent across the Internet. Types of data that are sent across the Internet using packets include: e-mail messages; files, via File Transfer Protocol (FTP); web pages, via the World Wide Web (WWW).
  • 75. What is a packet? In addition to the actual data, packets also contain header information. The header of a packet contains both the originating and destination IP (Internet Protocol) address. The header also contains coding to handle transmission errors and keep packets flowing. Header information can be compared to addressing an envelope. Like the header of a packet, an envelope contains the addresses of both the sender and the recipient, in order to keep track of who the envelope is from and who it is going to.
  • 76. What is a packet? Header information is used by routers to send packets across a network. Routers are computers that are dedicated to "reading" header information and determining which router to send the packet to next. Packets move from router to router until they reach their final destination, in much the same way that an envelope travels between postal substations before reaching the recipient. The packets that make up data, such as an e-mail message or a web page, will not necessarily all follow the same route to the final destination. The route that a packet travels depends on many variables, including network traffic at that particular moment and the size of the packet being sent.
  • 77. What is a packet? Transmission Control Protocol/Internet Protocol (TCP/IP) is a set of rules that govern how data is transmitted across networks and the Internet. TCP/IP utilizes packets to send information across the Internet. TCP and IP have different functions related to packets.
  • 78. What is a packet? TCP completes the following: Numbers the packets in sequence, so they can be put back in the correct order at their destination even if they arrive out of order. Ensures the integrity of packets: if a packet arrives damaged, TCP discards it, and because its receipt is never acknowledged, the sender retransmits it.
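TCP detects damaged packets with a checksum carried in the header. Below is a Python sketch of the 16-bit one's-complement "Internet checksum" described in RFC 1071, the kind TCP uses; real TCP also covers a pseudo-header containing the IP addresses, which this sketch leaves out. The receiver recomputes the checksum over the data plus the transmitted checksum and treats any non-zero result as damage. The sample bytes are the worked example from RFC 1071.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add each 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back into the low 16 bits
    return ~total & 0xFFFF                        # one's complement of the sum


payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
checksum = internet_checksum(payload)
print(hex(checksum))                      # 0x220d
packet = payload + checksum.to_bytes(2, "big")
print(internet_checksum(packet) == 0)     # True: arrived intact
damaged = bytes([packet[0] ^ 0x01]) + packet[1:]
print(internet_checksum(damaged) == 0)    # False: damage detected
```

A simple sum like this catches most random corruption but not every error (for example, two swapped 16-bit words give the same sum), which is one reason other layers add their own checks.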
  • 79. What is a packet? IP completes the following: Breaks the data into packets. Places header information into the packet, enabling the packet to be forwarded from router to router until it reaches the final destination. Determines how much data can fit into a single packet.
  • 80. What is a packet? The following diagram illustrates an e-mail message being sent across a network. 1. Data that makes up an e-mail message is split into packets by the IP portion of TCP/IP. IP also adds header information to each packet. 2. Using header information in the packets, routers determine the best path for each packet to take to its final destination. 3. The TCP portion of TCP/IP reassembles the packets in the correct order and ensures that all packets have arrived undamaged.
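The three steps can be imitated in a few lines of Python. This is a toy model under stated assumptions, not real TCP/IP: the hypothetical `Packet` fields (a sequence number and a total count standing in for TCP's bookkeeping, plus source and destination addresses standing in for the IP header) and the random shuffle standing in for packets taking different routes are all illustrative. 192.0.2.10 is a documentation-only address.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    source: str       # header: originating IP address
    destination: str  # header: destination IP address
    sequence: int     # header: position of this piece in the original data
    total: int        # header: how many pieces the data was split into
    payload: bytes    # the actual data


def split_into_packets(data: bytes, source: str, destination: str, size: int) -> list:
    """Step 1: cut the data into pieces and label each with header information."""
    offsets = range(0, len(data), size)
    return [Packet(source, destination, seq, len(offsets), data[start:start + size])
            for seq, start in enumerate(offsets)]


def route(packets: list, seed: int = 0) -> list:
    """Step 2 stand-in: packets may take different paths and arrive in any order."""
    arrived = list(packets)
    random.Random(seed).shuffle(arrived)
    return arrived


def reassemble(packets: list) -> bytes:
    """Step 3: put packets back in sequence order, refusing to proceed if any are missing."""
    if not packets:
        raise ValueError("no packets arrived")
    missing = sorted(set(range(packets[0].total)) - {p.sequence for p in packets})
    if missing:
        raise ValueError(f"missing packets {missing}")
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.sequence))


message = b"Hello from the Bioinformatics II practicum!"
packets = split_into_packets(message, "198.41.0.52", "192.0.2.10", size=8)
arrived = route(packets, seed=42)
print([p.sequence for p in arrived])   # arrival order may differ from sending order
print(reassemble(arrived) == message)  # True
```

Carrying the total count in every header is what lets the receiver notice that the last packet is missing, not just a gap in the middle.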
  • 81. What is Telnet? Telnet is a protocol, or set of rules, that enables one computer to connect to another computer. This process is also referred to as remote login. The user's computer, which initiates the connection, is referred to as the local computer, and the machine being connected to, which accepts the connection, is referred to as the remote, or host, computer. The remote computer can be physically located in the next room, the next town, or in another country.
  • 82. What is Telnet? Once connected, the user's computer acts as a terminal of the remote computer: when the user types in commands, they are executed on the remote computer, and the user's monitor displays what is taking place on the remote computer during the telnet session. The procedure for connecting to a remote computer will depend on how your Internet access is set up.
  • 83. What is Telnet? Once a connection to a remote computer is made, instructions or menus may appear. Some remote machines may require a user to have an account on the machine, and may prompt users for a username and password. Many resources, such as library catalogs, are available via telnet without an account and password. The slide shows an example taken from a telnet session to Washington University in St. Louis, MO.
  • 84. SSH Features Command line terminal connection tool Replacement for rsh, rcp, telnet, and others All traffic encrypted Both ends authenticate themselves to the other end Ability to carry and encrypt non-terminal traffic
  • 85. Brief History SSH.com’s SSH1 was originally completely free with source code; its license changed with version 1.2.13. SSH.com’s SSH2 was originally only commercial, but is now free for some uses. The OpenSSH team took the last free SSH1 release, fixed bugs, added features, and added support for the SSH2 protocol.
  • 86. SSH key background Old way: password stored on server; user-supplied password compared to stored version. New way: private key kept on client, public key stored on server.
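The key-pair idea can be illustrated with a deliberately tiny textbook RSA example in Python: the server stores only the public key (E, N), sends a random challenge, and accepts the client if the response could only have been produced with the matching private exponent D. This is a sketch of the principle only, with hypothetical function names. SSH's real public-key authentication signs session-specific data using padded signature schemes (RSA, ECDSA or Ed25519) with far larger keys, and numbers this small offer no security.

```python
import secrets

# Toy key pair from the textbook primes p = 61, q = 53 (illustration only, not secure).
P, Q = 61, 53
N = P * Q                 # 3233, the modulus shared by both keys
PHI = (P - 1) * (Q - 1)   # 3120
E = 17                    # public exponent: (E, N) is the public key stored on the server
D = pow(E, -1, PHI)       # 2753: private exponent, kept on the client only (Python 3.8+)


def server_issue_challenge() -> int:
    """Server picks a fresh random number in [2, N - 1]."""
    return secrets.randbelow(N - 2) + 2


def client_respond(challenge: int) -> int:
    """Client transforms the challenge with its private key."""
    return pow(challenge, D, N)


def server_verify(challenge: int, response: int) -> bool:
    """Server checks the response using only the public key."""
    return pow(response, E, N) == challenge


challenge = server_issue_challenge()
print(server_verify(challenge, client_respond(challenge)))            # True
print(server_verify(challenge, (client_respond(challenge) + 1) % N))  # False: wrong response
```

Nothing stored on the server (E and N) lets anyone compute a valid response; doing so requires D, which here could only be derived by factoring N, trivial for 3233 but infeasible for real key sizes.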
  • 87. What is FTP? Protocol File transfer Client/Server based Anonymous FTP File types Compression
  • 88. What is FTP? FTP stands for File Transfer Protocol, and is part of the TCP/IP protocol suite. It is the protocol, or set of rules, which enables files to be transferred between computers. FTP is a powerful tool which allows files to be transferred from “computer A” to “computer B”, or vice versa.
  • 89. What is FTP? FTP works on the client/server principle. A client program enables the user to interact with a server in order to access information and services on the server computer. Files that can be transferred are stored on special computers called FTP servers. To access these files, an FTP client program is used. This is an interface that allows the user to locate the file(s) to be transferred and initiate the transfer process.
  • 90. What is FTP? The basic steps to use FTP are: (1) connect to the FTP server; (2) navigate the file structure to find the file you want; (3) transfer the file. The specifics of each step will vary, depending on the client program being used and the type of Internet connection.
  • 91. What is FTP? Anonymous FTP allows a user to access a wealth of publicly available information. No special account or password is needed. However, an anonymous FTP site will sometimes ask users to log in with the name “anonymous” and to use their e-mail address as the password.
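The three steps (connect, navigate, transfer) map directly onto Python's standard `ftplib` module. A minimal sketch of an anonymous download: `fetch_file` and the server and path in the usage comment are hypothetical, and calling `login()` with no arguments makes `ftplib` log in as "anonymous" with a default password.

```python
import ftplib
import io


def fetch_file(host: str, directory: str, filename: str) -> bytes:
    """Download one file from an anonymous FTP server and return its bytes."""
    buffer = io.BytesIO()
    with ftplib.FTP(host, timeout=30) as ftp:              # 1. connect to the FTP server
        ftp.login()                                        #    no arguments: anonymous login
        ftp.cwd(directory)                                 # 2. navigate the file structure
        ftp.retrbinary(f"RETR {filename}", buffer.write)   # 3. transfer the file (binary mode)
    return buffer.getvalue()


# Hypothetical usage (needs network access and a reachable server):
# data = fetch_file("ftp.company.com", "/pub", "readme.txt")
```

Binary mode (`retrbinary`) is the safe default for compressed archives, which would be corrupted by a text-mode transfer that rewrites line endings.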
  • 92. What is FTP? There is a wide variety of files that are publicly available through anonymous FTP: shareware (software that you can use free for a trial period but then pay a fee for); freeware (completely free software, for example fonts, clipart and games); upgrades and patches (upgrades to current software and “fixes” for software problems); documents (for example research papers, articles and Internet documentation).
  • 93. What is FTP? Files on FTP servers are often compressed. Compression decreases file size. This enables more files to be stored on the server and makes file transfer times shorter. In order to use a compressed file it needs to be decompressed using appropriate software. It is a good idea to have current virus checking software on the computer before files are downloaded to it.
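Compression works best on repetitive data. A Python sketch with the standard `gzip` module, the format behind the common `.gz` extension; the sample sequence and the file name in the comment are made up for illustration.

```python
import gzip

sequence = b"ACGT" * 2500               # 10,000 bytes of artificial, highly repetitive data
compressed = gzip.compress(sequence)    # what a server might store as e.g. sample.fa.gz
restored = gzip.decompress(compressed)  # the downloader must decompress before use

print(len(sequence), len(compressed))   # the compressed copy is far smaller
print(restored == sequence)             # True: lossless round trip
```

Real sequence data is less repetitive than this example, so actual savings are smaller, but the principle that fewer bytes mean faster transfers is the same.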
  • 99. Conclusions: A database is a central component of any contemporary information system. The operations on the database and the maintenance of database consistency are handled by a DBMS. There exist stand-alone query languages and embedded languages, but both deal with definition (DDL) and manipulation (DML) aspects. The structural properties, constraints and operations permitted within a DBMS are defined by a data model: hierarchical, network or relational. Recovery and concurrency control are essential. Linking heterogeneous data sources is a central theme in modern bioinformatics.
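As a small illustration of DDL and DML embedded in a host language, the Python sketch below uses the standard `sqlite3` module (SQLite is a relational DBMS). The table, accession numbers and lengths are hypothetical. The constraints show the DBMS itself rejecting data that would break consistency, and each `with conn:` block is a transaction that is committed on success and rolled back on error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define structure and constraints.
conn.execute("""
    CREATE TABLE sequences (
        accession TEXT PRIMARY KEY,
        organism  TEXT NOT NULL,
        length    INTEGER NOT NULL CHECK (length > 0)
    )
""")


def add_sequence(accession: str, organism: str, length: int) -> None:
    """DML insert inside a transaction: committed on success, rolled back on error."""
    with conn:
        conn.execute("INSERT INTO sequences VALUES (?, ?, ?)", (accession, organism, length))


add_sequence("ACC001", "Rattus norvegicus", 120)  # hypothetical records
add_sequence("ACC002", "Mus musculus", 180)

try:
    add_sequence("ACC001", "Homo sapiens", 150)   # duplicate key violates the constraint
except sqlite3.IntegrityError as error:
    print("rejected:", error)

# DML: query the data.
rows = conn.execute("SELECT accession, organism FROM sequences WHERE length > 150").fetchall()
print(rows)  # [('ACC002', 'Mus musculus')]
```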

Editor's Notes

  • #9: The new curve saturated at around 20% for alignments longer than 250 residues, and for alignments shorter than 11 residues the new equation yielded values above 100%. However, this was acceptable, as 100% identity for fragments of 10-11 residues does not imply structural similarity.
  • #85: SSH’s first use was as a replacement for rsh, the Unix remote shell application. This tool allowed one to connect to a shell on a remote machine. The tool suffered from two major shortcomings. First, like telnet, it sent all traffic in cleartext, meaning that a sniffer tool at any point between the two machines could read all commands sent and replies received. Second, the /etc/hosts.equiv and ~/.rhosts files listed trusted machines and users; these could make rsh connections without any further authentication. If an attacker compromised any of these trusted hosts, they would immediately get access to the rsh server with no further effort. Also, if the attacker was able to spoof the IP address of a trusted host, they’d get the same access. SSH encrypts all traffic, including the password or key authentication. It also uses host keys to definitively identify both hosts involved in the communication, which defends against man-in-the-middle attacks and IP spoofing.
  • #86: The licensing issue is rather complex. Depending on which release of the ssh1 and ssh2 applications you choose, source code may or may not be available, use may be free or for cost for educational institutions, and use may be free or for cost for companies. The O’Reilly SSH book covers this in good detail. The SSH1 protocol has some shortcomings that aren’t easily fixed except by using the newer but incompatible SSH2 protocol. If possible, you should use SSH clients and servers that support SSH2 and prefer it over SSH1 protocol connections.
  • #87: The serious problem with the password approach, whether used with telnet or with ssh, is that the password you enter at the client end is also stored on the server. Even though it is stored in a hashed form in /etc/passwd or /etc/shadow, this password can be cracked by brute force once someone has access to that file. With the public/private key split, by contrast, an attacker who obtains the public key stored on the server cannot use it to get back into the server! Only the private key, which never leaves the client, can be used to log in to a server that holds the matching public key.
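Why a stolen password file is dangerous can be shown with a small Python dictionary attack. This is a sketch under assumptions: it uses the standard library's PBKDF2 (`hashlib.pbkdf2_hmac`) as a stand-in for the crypt-style hashes actually kept in /etc/shadow, and the salt, password and word list are made up. Hashing slows the attacker down, but any password on the attacker's list is eventually found; there is no comparable shortcut from a public key back to its private key.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def hash_password(password: str, salt: bytes, iterations: int = 10_000) -> bytes:
    """Salted, iterated hash standing in for a stored /etc/shadow entry."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def dictionary_attack(stored_hash: bytes, salt: bytes, guesses: Iterable[str]) -> Optional[str]:
    """Try each guess; return the one whose hash matches the stolen entry, if any."""
    for guess in guesses:
        if hmac.compare_digest(hash_password(guess, salt), stored_hash):
            return guess
    return None


salt = b"demo-salt"                               # hypothetical; real systems use random salts
stolen = hash_password("letmein", salt)           # what an attacker copied from the server
wordlist = ["123456", "password", "qwerty", "letmein"]
print(dictionary_attack(stolen, salt, wordlist))  # letmein
```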