Microbial Metagenomics Drives a New Cyberinfrastructure. Invited Talk, School of Biological Sciences, University of California, Irvine, March 3, 2006. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technologies; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD
Abstract: Calit2, in partnership with the J. Craig Venter Institute in Rockville, MD, and UCSD's Center for Earth Observations and Applications at Scripps Institution of Oceanography, will build a state-of-the-art computational resource and develop software tools to decipher the genetic code of communities of microbial life in the world's oceans. The Gordon and Betty Moore Foundation has awarded $24.5 million over seven years to create the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA). Scientists will use CAMERA for metagenomics research: analyzing microbial genomic sequence data in the context of other microbial species, as well as in comparison to a variety of other "metadata" such as the chemical and physical conditions in which microbes are sampled. The CAMERA project will contain the results of the Venter Institute's Sorcerer II Expedition, which carried out the first large-scale genomic survey of microbial life in the world's oceans to produce the largest gene catalogue ever assembled. Sorcerer II is expected to more than double the number of protein sequences currently available in the National Institutes of Health's GenBank. In addition to Sorcerer II's ecological genomic data, the CAMERA database will be augmented by the full genomes of more than 150 critical marine microbes, enabling new comparative genomics studies.
Calit2 Brings Computer Scientists and Engineers Together with Biomedical Researchers. Some Areas of Concentration: Metagenomics; Genomic Analysis of Organisms; Evolution of Genomes; Cancer Genomics; Human Genomic Variation and Disease; Mitochondrial Evolution; Proteomics; Computational Biology; Information Theory and Biological Systems. UC San Diego and UC Irvine: 1200 Researchers in Two Buildings
Evolution is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World Source: Carl Woese, et al You Are Here Much of Genome Work Has Occurred in Animals
Calit2 Researcher Eskin Collaborates with Perlegen Sciences on Map of Human Genetic Variation Across Populations. David A. Hinds, Laura L. Stuve, Geoffrey B. Nilsen, Eran Halperin, Eleazar Eskin, Dennis G. Ballinger, Kelly A. Frazer, David R. Cox. “Whole-Genome Patterns of Common DNA Variation in Three Human Populations.” Science 18 February 2005: 307(5712):1072-1079. “We have characterized whole-genome patterns of common human DNA variation by genotyping 1,586,383 single-nucleotide polymorphisms (SNPs) in 71 Americans of European, African, and Asian ancestry.” “Although knowledge of a single genetic risk factor can seldom be used to predict the treatment outcome of a common disease, knowledge of a large fraction of all the major genetic risk factors contributing to a treatment response or common disease could have immediate utility, allowing existing treatment options to be matched to individual patients without requiring additional knowledge of the mechanisms by which the genetic differences lead to different outcomes.” “More detailed haplotype analysis results are available at http://research.calit2.net/hap/wgha/”
For Mitochondrial Diseases It Has Been More Productive to Classify Patients by Genetic Defect Rather than by Clinical Manifestation. “Over the past 10 years, mitochondrial defects have been implicated in a wide variety of degenerative diseases, aging, and cancer… The same mtDNA mutation can produce quite different phenotypes, and different mutations can produce similar phenotypes. … The essential role of mitochondrial oxidative phosphorylation in cellular energy production, the generation of reactive oxygen species, and the initiation of apoptosis has suggested a number of novel mechanisms for mitochondrial pathology.” -- Douglas Wallace, Science, Vol. 283, 1482-1488, 5 March 1999
Comparative Genomics Can Reveal Biological Facts That Are Not Visible Within a Species. “After sequencing these three genomes, it is clear that substantial rearrangements in the human genome happen only once in a million years, while the rate of rearrangements in the rat and mouse is much faster.” --Glenn Tesler, UCSD Dept. of Mathematics. www.calit2.net/culture/features/2004/4-1_pevzner.html. Co-Authors Pavel Pevzner and Glenn Tesler, UCSD. April 1, 2004; December 05, 2002; December 9, 2004
Advanced Algorithmic Techniques Reveal Unexpected Results. “Many of the chicken–human aligned, non-coding sequences occur far from genes, frequently in clusters that seem to be under selection for functions that are not yet understood.” Nature 432, 695 - 716 (09 December 2004)
Microbial Metagenomics is a Rapidly Emerging Field of Research. “Despite their ubiquity, relatively little is known about the majority of environmental microorganisms, largely because of their resistance to culture under standard laboratory conditions.” “The application of high-throughput shotgun sequencing environmental samples has recently provided global views of those communities not obtainable from 16S rRNA or BAC clone–sequencing surveys.” Comparative Metagenomics of Microbial Communities. Susannah Green Tringe, Christian von Mering, Arthur Kobayashi, Asaf A. Salamov, Kevin Chen, Hwai W. Chang, Mircea Podar, Jay M. Short, Eric J. Mathur, John C. Detter, Peer Bork, Philip Hugenholtz, Edward M. Rubin. Science 22 April 2005
Looking Back Nearly 4 Billion Years In the Evolution of Microbe Genomics Science Falkowski and Vargas 304 (5667): 58
The Sargasso Sea Experiment: The Power of Environmental Metagenomics. Yielded a Total of Over 1 Billion Base Pairs of Non-Redundant Sequence. Displayed the Gene Content, Diversity, & Relative Abundance of the Organisms. Sequences from at Least 1800 Genomic Species, Including 148 Previously Unknown. Identified over 1.2 Million Unknown Genes. Image: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003. J. Craig Venter, et al. Science 2 April 2004: Vol. 304. pp. 66 - 74
PI Larry Smarr
Marine Genome Sequencing Project Measuring the Genetic Diversity of Ocean Microbes CAMERA will include  All Sorcerer II Metagenomic Data
Moore Foundation Funded the Venter Institute to Provide the Full Genome Sequence of 150 Marine Microbes www.moore.org/microgenome/trees_main.asp CAMERA will include  All Moore Marine Microbial Genomes
Moore Microbial Genome Sequencing Project: Cyanobacteria Being Sequenced by Venter Institute
Moore Microbial Genome Sequencing Project Selected Microbes Throughout the World’s Oceans www.moore.org/microgenome/worldmap.asp
Calit2 is Discussing Including Other Metagenomic Data Sets. “A majority of the bacterial sequences corresponded to uncultivated species and novel microorganisms. We discovered significant intersubject variability. Characterization of this immensely diverse ecosystem is the first step in elucidating its role in health and disease.” “Diversity of the Human Intestinal Microbial Flora,” Paul B. Eckburg, et al., Science (10 June 2005). 395 Phylotypes
Genomic Data Is Growing Rapidly, But Metagenomics Will Vastly Increase The Scale… GenBank (www.ncbi.nlm.nih.gov/Genbank): 100 Billion Bases! Protein Data Bank (www.rcsb.org/pdb/holdings.html): 35,000 Structures. Total Data < 1TB
Metagenomics Will Couple to Earth Observations  Which Add Several TBs/Day Source: Glenn Iona, EOSDIS Element Evolution  Technical Working Group January 6-7, 2005
Challenge: Average Throughput of NASA Data Products to End User is < 50 Mbps. Tested October 2005: http://ensight.eos.nasa.gov/Missions/icesat/index.shtml. Internet2 Backbone is 10,000 Mbps! Throughput is < 0.5% to End User
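The percentage quoted on the slide above follows from simple arithmetic, sketched here in Python. Only the 50 Mbps and 10,000 Mbps rates come from the slide; the 1 TB payload and the function names are illustrative assumptions, chosen to show what the gap means for moving a terabyte-scale data set.

```python
def utilization_percent(achieved_mbps: float, backbone_mbps: float) -> float:
    """Share of backbone capacity that actually reaches the end user."""
    return 100.0 * achieved_mbps / backbone_mbps


def transfer_hours(size_terabytes: float, rate_mbps: float) -> float:
    """Hours needed to move a payload at a sustained rate (1 TB = 8e6 megabits)."""
    megabits = size_terabytes * 8_000_000
    return megabits / rate_mbps / 3600


END_USER_MBPS = 50        # upper bound on measured throughput, from the slide
BACKBONE_MBPS = 10_000    # Internet2 backbone rate, from the slide
PAYLOAD_TB = 1.0          # hypothetical payload, for illustration only

print(f"utilization: {utilization_percent(END_USER_MBPS, BACKBONE_MBPS):.1f}%")
print(f"1 TB at end-user rate: {transfer_hours(PAYLOAD_TB, END_USER_MBPS):.1f} h")
print(f"1 TB at backbone rate: {transfer_hours(PAYLOAD_TB, BACKBONE_MBPS):.2f} h")
```

Under these assumptions a single terabyte takes well over a day at the end-user rate but only minutes at the backbone rate, which is the motivation for the dedicated lambda connections discussed in the following slides.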
National Lambda Rail (NLR) and TeraGrid Provide Cyberinfrastructure Backbone for U.S. Researchers. NLR: 4 x 10Gb Lambdas Initially, Capable of 40 x 10Gb Wavelengths at Buildout. NSF’s TeraGrid Has 4 x 10Gb Lambda Backbone. Links Two Dozen State and Regional Optical Networks. DOE, NSF, & NASA Using NLR. Map labels: San Francisco, Pittsburgh, Cleveland, San Diego, Los Angeles, Portland, Seattle, Pensacola, Baton Rouge, Houston, San Antonio, Las Cruces / El Paso, Phoenix, New York City, Washington, DC, Raleigh, Jacksonville, Dallas, Tulsa, Atlanta, Kansas City, Denver, Ogden/Salt Lake City, Boise, Albuquerque, UC-TeraGrid, UIC/NW-Starlight, Chicago, International Collaborators
The OptIPuter Project: Creating a LambdaGrid “Web” for Gigabyte Data Objects. NSF Large Information Technology Research Proposal. Calit2 (UCSD, UCI) and UIC Lead Campuses, Larry Smarr PI. Partnering Campuses: USC, SDSU, NW, TA&M, UvA, SARA, NASA. Industrial Partners: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent. $13.5 Million Over Five Years. Linking Global Scale Science Projects to User’s Linux Clusters: NIH Biomedical Informatics Research Network; NSF EarthScope and ORION
Using the OptIPuter to Couple Data Assimilation Models to Remote Data Sources Including Biology. Regional Ocean Modeling System (ROMS): http://ourocean.jpl.nasa.gov/. NASA MODIS Mean Primary Productivity for April 2001 in California Current System
Calit2 Intends to Jump Beyond Traditional Web-Accessible Databases. Diagram: a Request goes to a Web Portal (pre-filtered, queries metadata) in front of a Data Backend (DB, Files), which returns a Response. Examples: BIRN, PDB, NCBI GenBank + many others. Source: Phil Papadopoulos, SDSC, Calit2
Calit2’s Direct Access Core Architecture Will Create Next Generation Metagenomics Server. Data sources: Sargasso Sea Data; Sorcerer II Expedition (GOS); JGI Community Sequencing Project; Moore Marine Microbial Project; NASA Goddard Satellite Data; Community Microbial Metagenomics Data. Core: Flat File Server Farm, Database Farm, and Dedicated Compute Farm (100s of CPUs) on a 10 GigE Fabric. Access paths: Traditional User (Request/Response) through the Web Portal + Web Services; Local Cluster in the Local Environment through Direct Access Lambda Connections; TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs); Web (other service). Source: Phil Papadopoulos, SDSC, Calit2
First Implementation of  the CAMERA Complex Compute Database & Storage
Analysis Data Sets, Data Services, Tools, and Workflows. Assemblies of Metagenomic Data (e.g., GOS, JGI CSP). Annotations: Genomic and Metagenomic Data. “All-against-all” Alignments of ORFs: Updated Periodically. Gene Clusters and Associated Data: Profiles, Multiple-Sequence Alignments, HMMs, Phylogenies, Peptide Sequences. Data Services: ‘Raw’ and Specialized Analysis Data; Rich Query Facilities. Tools and Workflows: Navigate and Sift Raw and Analysis Data; Publish Workflows and Develop New Ones; Prioritize Features via Dialogue with Community. Source: Saul Kravitz, Director of Software Engineering, J. Craig Venter Institute
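The “all-against-all” alignment of ORFs listed above grows quadratically with the number of sequences, which helps explain why the architecture slide routes all-by-all comparison to large scheduled compute. The following is a minimal sketch only: the toy identity score stands in for a real aligner such as BLAST, and the ORF names and peptide fragments are made up for illustration.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def toy_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence (not a real alignment)."""
    length = min(len(a), len(b))
    if length == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / length


def all_against_all(orfs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of ORFs exactly once."""
    return {
        (name_a, name_b): toy_identity(orfs[name_a], orfs[name_b])
        for name_a, name_b in combinations(sorted(orfs), 2)
    }


# Hypothetical peptide fragments, for illustration only.
orfs = {"orf1": "MKTAYIAK", "orf2": "MKTAYLAK", "orf3": "MSTNPKPQ"}
scores = all_against_all(orfs)
for pair, score in scores.items():
    print(pair, score)

# Pair count at the scale of the 1.2 million genes cited for the Sargasso Sea data.
print(pair_count(1_200_000))
```

Three sequences need only three comparisons, while a gene set in the millions needs hundreds of billions, so the comparison is better treated as a batch job that is updated periodically than as an interactive query.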
CAMERA Timeline. Release 1 (Mid-2006): Majority of GOS + Moore Microbe Genome Data (6 Gbp Has Been Assembled); Initial Versions of Core Tools (BLAST, Reference Alignment Viewer). Release 2 (Early 2007): Additional Data; Additional/Improved Tools; Improved Usability. Subsequent: Move Towards Semantic DB, Direct Access; Additional Tools & Data Based on Community Feedback
Announcing Tuesday January 17, 2006
The Bioinformatics Core of the Joint Center for Structural Genomics Will Be Housed in the Calit2@UCSD Building. Determining the Protein Structures of the Thermotoga maritima Genome: Extremely Thermostable, Useful for Many Industrial Processes (e.g. Chemical and Food). 173 Structures (122 from JCSG). 122 T.M. Structures Solved by JCSG (75 Unique in the PDB). Direct Structural Coverage of 25% of the Expressed Soluble Proteins. Probably Represents the Highest Structural Coverage of Any Organism. Source: John Wooley, UCSD
UCI’s IGB Develops a Suite of Programs and Servers  for Protein Structure and Structural Feature Prediction www.igb.uci.edu/tools.htm Source: Pierre Baldi, UCI Sixty Affiliated  IGB Labs at UCI e.g.:
CAMERA Builds on Cyberinfrastructure Grid, Workflow, and Portal Projects in a Service Oriented Architecture Cyberinfrastructure: Raw Resources, Middleware & Execution Environment NBCR Rocks Clusters Virtual Organizations Web Services KEPLER Workflow Management Vision Telescience Portal Located in Calit2@UCSD Building National Biomedical Computation  Resource  an NIH supported resource center
Calit2 is Collaborating with Douglas Wallace, Planning to Bring MITOMAP into Calit2 Domain. The Human mtDNA Map, Showing the Location of Selected Pathogenic Mutations Within the 16,569-Base Pair Genome. MITOMAP: A Human Mitochondrial Genome Database. www.mitomap.org, 2005. 5 March 1999
Displaying Images from Electron Microscope Zeiss Scanning Electron Microscope in Calit2@ UCI
Zooming In
Metagenomics “Extreme Assembly” Requires Large Amount of Pixel Real Estate. Image labels: Prochlorococcus, Microbacterium, Burkholderia, Rhodobacter, SAR-86, unknown, unknown. Source: Karin Remington, J. Craig Venter Institute
Metagenomics Requires a Global View of Data and the Ability to Zoom Into Detail Interactively. Overlay of Metagenomics Data onto Sequenced Reference Genomes (This Image: Prochlorococcus marinus MED4). Source: Karin Remington, J. Craig Venter Institute
OptIPuter Scalable Adaptive Graphics Environment (SAGE) Allows Integration of HD Streams Source: David Lee,  NCMIR, UCSD
Calit2 and the Venter Institute Will Combine Telepresence with Remote Interactive Analysis. Live Demonstration of 21st Century National-Scale Team Science. Image labels: OptIPuter Visualized Data; HDTV Over Lambda; 25 Miles; Venter Institute
OptIPuter@UCI is Up and Working. Network diagram, Created 09-27-2005 by Garrett Hildebrand, Modified 11-03-2005 by Jessica Yu. Diagram labels: 10 GE SPDS Catalyst 3750 in CSI ONS 15540 WDM at UCI campus MPOE (CPL) 10 GE DWDM Network Line Engineering Gateway Building, Catalyst 3750 in 3rd floor IDF MDF Catalyst 6500 w/ firewall, 1st floor closet Wave-2: layer-2 GE. UCSD address space 137.110.247.210-222/28 Floor 2 Catalyst 6500 Floor 3 Catalyst 6500 Floor 4 Catalyst 6500 Wave-1: UCSD address space 137.110.247.242-246 NACS-reserved for testing ESMF Catalyst 3750 in NACS Machine Room (Optiputer) Viz Lab Wave 1 1GE Wave 2 1GE Calit2 Building UCInet HIPerWall Los Angeles 1 GE DWDM Network Line Tustin CENIC Calren POP UCSD Optiputer Network
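The Wave-2 label on the diagram above pairs an address range with a /28 prefix. A small sketch using Python's standard ipaddress module shows which /28 block that range falls in. The helper name is hypothetical, and reading the label as "hosts .210 to .222 inside a single /28" is an assumption rather than something the slide states.

```python
import ipaddress


def enclosing_network(first: str, last: str, prefix: int) -> ipaddress.IPv4Network:
    """Return the /prefix network holding `first`, checking that `last` is in it too."""
    # strict=False masks off the host bits instead of raising an error.
    network = ipaddress.ip_network(f"{first}/{prefix}", strict=False)
    if ipaddress.ip_address(last) not in network:
        raise ValueError(f"{last} is outside {network}")
    return network


# Address range taken from the Wave-2 label on the slide.
wave2 = enclosing_network("137.110.247.210", "137.110.247.222", 28)
print(wave2)                  # network block containing the Wave-2 range
print(wave2.num_addresses)    # total addresses in a /28
```

The same helper raises a ValueError if the two ends of a range straddle a block boundary, which is a quick sanity check when transcribing address plans from a diagram.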
Calit2/SDSC Proposal to Create a UC Cyberinfrastructure of “On-Ramps” to National LambdaRail Resources. OptIPuter + CalREN-XD + TeraGrid = “OptiGrid”. Creating a Critical Mass of End Users on a Secure LambdaGrid. Campuses: UC San Francisco, UC San Diego, UC Riverside, UC Irvine, UC Davis, UC Berkeley, UC Santa Cruz, UC Santa Barbara, UC Los Angeles, UC Merced. Source: Fran Berman, SDSC; Larry Smarr, Calit2

Microbial Metagenomics Drives a New Cyberinfrastructure

  • 1. Microbial Metagenomics Drives a New Cyberinfrastructure Invited Talk School of Biological Sciences University of California, Irvine March 3, 2006 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technologies Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD
  • 2. Abstract Calit2, in partnership with J. Craig Venter Institute in Rockville, MD, and UCSD's Center for Earth Observations and Applications at Scripps Institution of Oceanography, will build a state-of-the-art computational resource and develop software tools to decipher the genetic code of communities of microbial life in the world's oceans. The Gordon and Betty Moore Foundation has awarded $24.5 million over seven years to create the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA). Scientists will use CAMERA for metagenomics research -- analyzing microbial genomic sequence data in the context of other microbial species, as well as in comparison to a variety of other &quot;metadata&quot; such as the chemical and physical conditions in which microbes are sampled. The CAMERA project will contain the results of the Venter Institute's Sorcerer II Expedition, which carried out the first large-scale genomic survey of microbial life in the world's oceans to produce the largest gene catalogue ever assembled. Sorcerer II is expected to more than double the number of protein sequences currently available in the National Institutes of Health's GenBank. In addition to Sorcerer II's ecological genomic data, the CAMERA database will be augmented by the full genomes of more than 150 critical marine microbes enabling new comparative genomics studies.
  • 3. Calit2 Brings Computer Scientists and Engineers Together with Biomedical Researchers Some Areas of Concentration: Metagenomics Genomic Analysis of Organisms Evolution of Genomes Cancer Genomics Human Genomic Variation and Disease Mitochondrial Evolution Proteomics Computational Biology Information Theory and Biological Systems UC San Diego UC Irvine 1200 Researchers in Two Buildings
  • 4. Evolution is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World Source: Carl Woese, et al You Are Here Much of Genome Work Has Occurred in Animals
  • 5. Calit2 Researcher Eskin Collaborates with Perlegen Sciences on Map of Human Genetic Variation Across Populations David A. Hinds, Laura L. Stuve, Geoffrey B. Nilsen, Eran Halperin, Eleazar Eskin , Dennis G. Ballinger, Kelly A. Frazer, David R. Cox. “ Whole-Genome Patterns of Common DNA Variation in Three Human Populations” Science 18 February, 2005: 307(5712):1072-1079. “ We have characterized whole-genome patterns of common human DNA variation by genotyping 1,586,383 single-nucleotide polymorphisms (SNPs) in 71 Americans of European, African, and Asian ancestry.” “ Although knowledge of a single genetic risk factor can seldom be used to predict the treatment outcome of a common disease, knowledge of a large fraction of all the major genetic risk factors contributing to a treatment response or common disease could have immediate utility, allowing existing treatment options to be matched to individual patients without requiring additional knowledge of the mechanisms by which the genetic differences lead to different outcomes .” “ More detailed haplotype analysis results are available at http://research.calit2.net/hap/wgha/ “
  • 6. For Mitochondrial Diseases It Has Been More Productive to Classify Patients by Genetic Defect Rather than by Clinical Manifestation Over the past 10 years, mitochondrial defects have been implicated in a wide variety of degenerative diseases, aging, and cancer… The same mtDNA mutation can produce quite different phenotypes, and different mutations can produce similar phenotypes. … The essential role of mitochondrial oxidative phosphorylation in cellular energy production, the generation of reactive oxygen species, and the initiation of apoptosis has suggested a number of novel mechanisms for mitochondrial pathology. -- Douglas Wallace, Science, Vol. 283, 1482-1488, 5 March 1999
  • 7. Comparative Genomics Can Reveal Biological Facts That Are Not Visible Within a Species “After sequencing these three genomes, it is clear that substantial rearrangements in the human genome happen only once in a million years, while the rate of rearrangements in the rat and mouse is much faster.” --Glenn Tesler, UCSD Dept. of Mathematics www.calit2.net/culture/features/2004/4-1_pevzner.html Co-Authors Pavel Pevzner and Glenn Tesler, UCSD April 1, 2004 December 05, 2002 December 9, 2004
  • 8. Advanced Algorithmic Techniques Reveal Unexpected Results “Many of the chicken–human aligned, non-coding sequences occur far from genes, frequently in clusters that seem to be under selection for functions that are not yet understood.” Nature 432, 695 - 716 (09 December 2004)
  • 9. Microbial Metagenomics is a Rapidly Emerging Field of Research “Despite their ubiquity, relatively little is known about the majority of environmental microorganisms, largely because of their resistance to culture under standard laboratory conditions.” “The application of high-throughput shotgun sequencing environmental samples has recently provided global views of those communities not obtainable from 16S rRNA or BAC clone–sequencing surveys.” Comparative Metagenomics of Microbial Communities Susannah Green Tringe, Christian von Mering, Arthur Kobayashi, Asaf A. Salamov, Kevin Chen, Hwai W. Chang, Mircea Podar, Jay M. Short, Eric J. Mathur, John C. Detter, Peer Bork, Philip Hugenholtz, Edward M. Rubin Science 22 April 2005
  • 10. Looking Back Nearly 4 Billion Years In the Evolution of Microbe Genomics Science Falkowski and Vargas 304 (5667): 58
  • 11. The Sargasso Sea Experiment The Power of Environmental Metagenomics Yielded a Total of Over 1 billion Base Pairs of Non-Redundant Sequence Displayed the Gene Content, Diversity, & Relative Abundance of the Organisms Sequences from at Least 1800 Genomic Species, including 148 Previously Unknown Identified over 1.2 Million Unknown Genes MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003 J. Craig Venter, et al. Science 2 April 2004: Vol. 304. pp. 66 - 74
  • 13. Marine Genome Sequencing Project Measuring the Genetic Diversity of Ocean Microbes CAMERA will include All Sorcerer II Metagenomic Data
  • 14. Moore Foundation Funded the Venter Institute to Provide the Full Genome Sequence of 150 Marine Microbes www.moore.org/microgenome/trees_main.asp CAMERA will include All Moore Marine Microbial Genomes
  • 15. Moore Microbial Genome Sequencing Project: Cyanobacteria Being Sequenced by Venter Institute
  • 16. Moore Microbial Genome Sequencing Project Selected Microbes Throughout the World’s Oceans www.moore.org/microgenome/worldmap.asp
  • 17. Calit2 is Discussing Including Other Metagenomic Data Sets A majority of the bacterial sequences corresponded to uncultivated species and novel microorganisms. We discovered significant intersubject variability. Characterization of this immensely diverse ecosystem is the first step in elucidating its role in health and disease. “ Diversity of the Human Intestinal Microbial Flora” Paul B. Eckburg, et al Science (10 June 2005) 395 Phylotypes
  • 18. Genomic Data Is Growing Rapidly, But Metagenomics Will Vastly Increase The Scale… GenBank Protein Data Bank www.rcsb.org/pdb/holdings.html www.ncbi.nlm.nih.gov/Genbank 100 Billion Bases! Total Data < 1TB 35,000 Structures
  • 19. Metagenomics Will Couple to Earth Observations Which Add Several TBs/Day Source: Glenn Iona, EOSDIS Element Evolution Technical Working Group January 6-7, 2005
  • 20. Challenge: Average Throughput of NASA Data Products to End User is < 50 Mbps Tested October 2005 http://ensight.eos.nasa.gov/Missions/icesat/index.shtml Internet2 Backbone is 10,000 Mbps! Throughput is < 0.5% to End User
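The gap quoted on slide 20 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 1 TB payload is an assumed example size (decimal terabytes), and the rates are the slide's round figures rather than measured values.

```python
def link_utilization(end_user_mbps: float, backbone_mbps: float) -> float:
    """Fraction of the backbone rate that actually reaches the end user."""
    return end_user_mbps / backbone_mbps


def transfer_hours(terabytes: float, mbps: float) -> float:
    """Hours needed to move `terabytes` (decimal TB, an assumption) at `mbps`."""
    bits = terabytes * 1e12 * 8          # TB -> bytes -> bits
    seconds = bits / (mbps * 1e6)        # Mbps -> bits per second
    return seconds / 3600


if __name__ == "__main__":
    # Slide figures: < 50 Mbps to the end user, 10,000 Mbps backbone.
    print(f"utilization: {link_utilization(50, 10_000):.1%}")
    # Assumed 1 TB data set at each rate.
    print(f"1 TB at 50 Mbps:     {transfer_hours(1, 50):.1f} h")
    print(f"1 TB at 10,000 Mbps: {transfer_hours(1, 10_000) * 60:.1f} min")
```

Under these assumptions a 1 TB data set takes roughly two days at the end-user rate and under a quarter of an hour at the backbone rate, which is the motivation for the dedicated lambda connections discussed on the following slides.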
  • 21. National Lambda Rail (NLR) and TeraGrid Provide Cyberinfrastructure Backbone for U.S. Researchers San Francisco Pittsburgh Cleveland San Diego Los Angeles Portland Seattle Pensacola Baton Rouge Houston San Antonio Las Cruces / El Paso Phoenix New York City Washington, DC Raleigh Jacksonville Dallas Tulsa Atlanta Kansas City Denver Ogden/ Salt Lake City Boise Albuquerque UC-TeraGrid UIC/NW-Starlight Chicago International Collaborators NLR 4 x 10Gb Lambdas Initially Capable of 40 x 10Gb wavelengths at Buildout NSF’s TeraGrid Has 4 x 10Gb Lambda Backbone Links Two Dozen State and Regional Optical Networks DOE, NSF, & NASA Using NLR
  • 22. The OptIPuter Project – Creating a LambdaGrid “Web” for Gigabyte Data Objects NSF Large Information Technology Research Proposal Calit2 (UCSD, UCI) and UIC Lead Campuses—Larry Smarr PI Partnering Campuses: USC, SDSU, NW, TA&M, UvA, SARA, NASA Industrial Partners IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent $13.5 Million Over Five Years Linking Global Scale Science Projects to User’s Linux Clusters NIH Biomedical Informatics NSF EarthScope and ORION Research Network
  • 23. Using the OptIPuter to Couple Data Assimilation Models to Remote Data Sources Including Biology Regional Ocean Modeling System (ROMS) http://ourocean.jpl.nasa.gov/ NASA MODIS Mean Primary Productivity for April 2001 in California Current System
  • 24. Calit2 Intends to Jump Beyond Traditional Web-Accessible Databases Data Backend (DB, Files) W E B PORTAL (pre-filtered, queries metadata) Response Request + many others Source: Phil Papadopoulos, SDSC, Calit2 BIRN PDB NCBI Genbank
  • 25. Calit2’s Direct Access Core Architecture Will Create Next Generation Metagenomics Server Traditional User Response Request Source: Phil Papadopoulos, SDSC, Calit2 + Web Services Sargasso Sea Data Sorcerer II Expedition (GOS) JGI Community Sequencing Project Moore Marine Microbial Project NASA Goddard Satellite Data Community Microbial Metagenomics Data Flat File Server Farm W E B PORTAL Dedicated Compute Farm (100s of CPUs) TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs) Web (other service) Local Cluster Local Environment Direct Access Lambda Cnxns Data- Base Farm 10 GigE Fabric
  • 26. First Implementation of the CAMERA Complex Compute Database & Storage
  • 27. Analysis Data Sets, Data Services, Tools, and Workflows Assemblies of Metagenomic Data e.g., GOS, JGI CSP Annotations Genomic and Metagenomic Data “All-against-all” Alignments of ORFs Updated Periodically Gene Clusters and Associated Data Profiles, Multiple-Sequence Alignments, HMMs, Phylogenies, Peptide Sequences Data Services ‘Raw’ and Specialized Analysis Data Rich Query Facilities Tools and Workflows Navigate and Sift Raw and Analysis Data Publish Workflows and Develop New Ones Prioritize Features via Dialogue with Community Source: Saul Kravitz Director of Software Engineering J. Craig Venter Institute
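To give a feel for why "all-against-all" comparison of ORFs is treated as a scheduled, large-scale job, the sketch below enumerates every unordered pair in a toy set and counts how many pairs a larger set would need. It is not the CAMERA pipeline: the sequences, identifiers, and the shared 3-mer score are hypothetical stand-ins for a real aligner such as BLAST, and the 1.2 million figure is borrowed from the Sargasso Sea slide purely as an example size.

```python
from itertools import combinations
from typing import Dict, Iterator, Set, Tuple


def pair_count(n: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a: str, b: str, k: int = 3) -> int:
    """Toy similarity score (NOT an alignment): count of shared k-mers."""
    return len(kmers(a, k) & kmers(b, k))


def all_against_all(seqs: Dict[str, str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (id_a, id_b, score) for every unordered pair of sequences."""
    for (id_a, a), (id_b, b) in combinations(seqs.items(), 2):
        yield id_a, id_b, shared_kmers(a, b)


if __name__ == "__main__":
    # Hypothetical peptide fragments, for illustration only.
    toy = {"orf1": "MKTAYIAK", "orf2": "MKTAYLAK", "orf3": "GGSGGS"}
    for row in all_against_all(toy):
        print(row)
    # Quadratic growth: pairs needed for 1.2 million sequences.
    print(pair_count(1_200_000))
```

Because the pair count grows quadratically, doubling the number of sequences roughly quadruples the work, which is consistent with the slides routing this kind of comparison to a large shared compute resource rather than a local cluster.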
  • 28. CAMERA Timeline Release 1: Mid-2006 Majority of GOS + Moore Microbe Genome Data 6 Gbp Has Been Assembled Initial Versions of Core Tools BLAST, Reference Alignment Viewer Release 2: Early-2007 Additional Data Additional/Improved Tools Improved Usability Subsequent Move Towards Semantic DB, Direct Access Additional Tools & Data Based on Community Feedback
  • 30. The Bioinformatics Core of the Joint Center for Structural Genomics will be Housed in the Calit2@UCSD Building Extremely Thermostable -- Useful for Many Industrial Processes (e.g. Chemical and Food) 173 Structures (122 from JCSG) Determining the Protein Structures of the Thermotoga Maritima Genome 122 T.M. Structures Solved by JCSG (75 Unique In The PDB) Direct Structural Coverage of 25% of the Expressed Soluble Proteins Probably Represents the Highest Structural Coverage of Any Organism Source: John Wooley, UCSD
  • 31. UCI’s IGB Develops a Suite of Programs and Servers for Protein Structure and Structural Feature Prediction www.igb.uci.edu/tools.htm Source: Pierre Baldi, UCI Sixty Affiliated IGB Labs at UCI e.g.:
  • 32. CAMERA Builds on Cyberinfrastructure Grid, Workflow, and Portal Projects in a Service Oriented Architecture Cyberinfrastructure: Raw Resources, Middleware & Execution Environment NBCR Rocks Clusters Virtual Organizations Web Services KEPLER Workflow Management Vision Telescience Portal Located in Calit2@UCSD Building National Biomedical Computation Resource an NIH supported resource center
  • 33. Calit2 is Collaborating with Douglas Wallace-- Planning to Bring MITOMAP into Calit2 Domain The Human mtDNA Map, Showing the Location of Selected Pathogenic Mutations Within the 16,569-Base Pair Genome MITOMAP: A Human Mitochondrial Genome Database. www.mitomap.org , 2005 5 March 1999
  • 34. Displaying Images from Electron Microscope Zeiss Scanning Electron Microscope in Calit2@ UCI
  • 36. Metagenomics “Extreme Assembly” Requires Large Amount of Pixel Real Estate Source: Karin Remington J. Craig Venter Institute Prochlorococcus Microbacterium Burkholderia Rhodobacter SAR-86 unknown unknown
  • 37. Metagenomics Requires a Global View of Data and the Ability to Zoom Into Detail Interactively Overlay of Metagenomics Data onto Sequenced Reference Genomes (This Image: Prochlorococcus marinus MED4) Source: Karin Remington J. Craig Venter Institute
  • 38. OptIPuter Scalable Adaptive Graphics Environment (SAGE) Allows Integration of HD Streams Source: David Lee, NCMIR, UCSD
  • 39. Calit2 and the Venter Institute Will Combine Telepresence with Remote Interactive Analysis Live Demonstration of 21st Century National-Scale Team Science OptIPuter Visualized Data HDTV Over Lambda 25 Miles Venter Institute
  • 40. OptIPuter@UCI is Up and Working Created 09-27-2005 by Garrett Hildebrand Modified 11-03-2005 by Jessica Yu 10 GE SPDS Catalyst 3750 in CSI ONS 15540 WDM at UCI campus MPOE (CPL) 10 GE DWDM Network Line Engineering Gateway Building, Catalyst 3750 in 3 rd floor IDF MDF Catalyst 6500 w/ firewall, 1 st floor closet Wave-2 : layer-2 GE. UCSD address space 137.110.247.210-222/28 Floor 2 Catalyst 6500 Floor 3 Catalyst 6500 Floor 4 Catalyst 6500 Wave-1 : UCSD address space 137.110.247.242-246 NACS-reserved for testing ESMF Catalyst 3750 in NACS Machine Room (Optiputer) Viz Lab Wave 1 1GE Wave 2 1GE Calit2 Building UCInet HIPerWall Los Angeles 1 GE DWDM Network Line Tustin CENIC Calren POP UCSD Optiputer Network
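The Wave-2 address range on slide 40 (137.110.247.210-222/28) can be sanity-checked with Python's standard `ipaddress` module. This is a minimal sketch: the helper names are hypothetical, and the reading of the slide's notation as "addresses .210 through .222 inside a /28" is an assumption, not something the network diagram states explicitly.

```python
import ipaddress


def enclosing_network(address: str, prefix: int) -> ipaddress.IPv4Network:
    """Return the network of the given prefix length that contains `address`."""
    return ipaddress.ip_interface(f"{address}/{prefix}").network


def range_fits(first: str, last: str, prefix: int) -> bool:
    """True if both ends of the range are usable hosts in the same network."""
    net = enclosing_network(first, prefix)
    hosts = set(net.hosts())  # excludes network and broadcast addresses
    return (ipaddress.ip_address(first) in hosts
            and ipaddress.ip_address(last) in hosts)


if __name__ == "__main__":
    net = enclosing_network("137.110.247.210", 28)
    print(net)                        # the /28 block containing .210
    print(net.num_addresses)          # total addresses in the block
    print(net.broadcast_address)      # last address, not usable by hosts
    print(range_fits("137.110.247.210", "137.110.247.222", 28))
```

Under that reading, the slide's range sits entirely within one /28 block and ends just before the broadcast address.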
  • 41. Calit2/SDSC Proposal to Create a UC Cyberinfrastructure of “On-Ramps” to National LambdaRail Resources OptIPuter + CalREN-XD + TeraGrid = “OptiGrid” Source: Fran Berman, SDSC , Larry Smarr, Calit2 Creating a Critical Mass of End Users on a Secure LambdaGrid UC San Francisco UC San Diego UC Riverside UC Irvine UC Davis UC Berkeley UC Santa Cruz UC Santa Barbara UC Los Angeles UC Merced