BioCloud

Random large-scale tools that you can use
Disclaimer

I work in computer security research... there is no biology
background anywhere in my field, not even in computer viruses ;)

While working, I stumbled across Hadoop for scalable web
spidering purposes.

I'm not a bioinformatician (yet)... but I saw a powerful tool that
could be useful in your research field(s):

                       "biodatacrunching" ?
Glossary

• Cluster (Beowulf)
• Grid
• Cloud
Biology and computer science

• Increasingly resource-hungry applications
   o Nowadays, they can be approached by "brute force"
   o More data means more "iron" to crunch it
• Neither the local IT team nor the budget can keep up with this pace
   o €€€ spent on new hardware
   o €€€ spent on IT personnel
   o Isn't it wiser to scale one machine at a time ?
• Developers get angry or frustrated with
   o Delays in software installation and configuration
   o Unscheduled downtime
   o Delays caused by insufficient computing power
What is cloud computing ?

In plain English:
http://www.youtube.com/watch?v=XdBd14rjcs0
Infrastructure layer
Cloud niche
Infrastructure

• Amazon
  o EC2
  o S3
  o AMI
      Recently added bioinformatics appliances
      Public data sets
• Eucalyptus
  o EC2 + AMI server-side open source implementation
  o We run it for our internal projects
• Enomalism
• Rightscale & Service Cloud
  o Tools/Consultants for the upcoming cloud issues
Application layer
        • Technologies for parallelizing
          applications
Application layer

• Hadoop
  o Open source MapReduce implementation
  o Java based, but any language can be used
• Cloudburst-bio
  o Fine-tuned MapReduce implementation for Bio (XXX)
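
A minimal sketch of the "any language can be used" point, assuming the Hadoop Streaming convention: mapper and reducer exchange lines of tab-separated key/value text, and the reducer receives its input sorted by key. The function names (mapper, reducer, run_locally) and the base-counting task are illustrative assumptions, not something taken from the slides.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'base<TAB>1' record per nucleotide found in each input line."""
    for line in lines:
        for base in line.strip().upper():
            if base in "ACGT":
                yield f"{base}\t1"


def reducer(sorted_records):
    """Sum the counts per key; the input must already be sorted by key."""
    parsed = (record.split("\t") for record in sorted_records)
    for base, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{base}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Stand in for the framework's sort step between map and reduce."""
    return list(reducer(sorted(mapper(lines))))


result = run_locally(["ATGTTAG", "atg"])
```

On a real cluster each function would live in its own script that reads standard input and writes standard output; run_locally only imitates the sort between the two phases so the logic can be tried on a laptop first.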
Easy MapReduce
What is Hadoop ?

Quotation from official web page:

 "Hadoop is a software platform that lets one easily write and
    run applications that process vast amounts of data."

"vast amounts of data (ATGTTAG...)" + "easily" = sounds good

                                  
                isn't it ? or is it vaporware ?
What is it used for ?

• Tackle problems that involve several GB, TB or even PB of data
• The programmer does not have to worry about job management
   o The focus is on data transformation, piping (useful work)

• Not intended for realtime processing
• Suitable for offloading long batch jobs from databases
What is MapReduce

Joel on Software explanation
Useful to crunch *tons* of data, parallelized by design
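
A small single-process sketch of the MapReduce idea, assuming the usual three phases: map each record to key/value pairs, group the values by key, then reduce each group. The helper names (map_reduce, kmers, total) and the 3-mer counting task are made up for illustration; a real framework would spread the same phases over many machines.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map, shuffle (group by key) and reduce inside one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):   # map phase
            groups[key].append(value)       # shuffle: collect values per key
    # reduce phase: one call per distinct key
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def kmers(sequence, k=3):
    """Map function: emit (k-mer, 1) for every window of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], 1


def total(key, values):
    """Reduce function: add up all counts seen for one key."""
    return sum(values)


counts = map_reduce(["ATGTTAG", "TTAGATG"], kmers, total)
```

Because each map call looks at one record only and each reduce call looks at one key only, both phases can be split across machines without the functions changing; that is the sense in which the model is parallel by design.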
HDFS: Hadoop Distributed FileSystem
What about job control ?
Who is using it ?

• Google
  o Lots of internal projects (proprietary MapReduce)
      GMail spam machine learning
      Google maps
      ...

• Yahoo
  o Internal web graph (powers search engine)
  o Pig (SQL-like abstraction)
  o Sort 1 terabyte of data in 209 seconds

• Facebook
   o Large user graph, used for data mining (Hive)
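
To put the Yahoo sort figure above in perspective, here is a quick back-of-the-envelope calculation of the aggregate throughput it implies. It assumes a decimal terabyte (10^12 bytes), which the slide does not specify, and says nothing about per-machine speed since the cluster size is not given here.

```python
TERABYTE = 10**12  # decimal terabyte; an assumption, the slide does not say TB vs TiB


def throughput_gb_per_s(total_bytes, seconds):
    """Aggregate throughput in decimal gigabytes per second."""
    return total_bytes / seconds / 10**9


rate = throughput_gb_per_s(TERABYTE, 209)
print(f"{rate:.2f} GB/s aggregate")
```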
Hadoop has (lots of) new friends

•   Nutch
•   Mahout
•   HBase
•   Hama
•   Pig
•   ZooKeeper
•   SmartFrog
•   ...
Next steps ?

Identify resource-hungry applications (batch vs interactive)
Migrate apps to cloud
1) Allocate a certain fixed amount of money
2) Give Amazon EC2 a try
3) Optional: Build a (local) Rocks cluster with a Eucalyptus cloud

Test, deploy, automate, automate and automate ... Puppet ?
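
Step 1 above (a fixed amount of money) turns into a simple calculation of how long a rented cluster can run. This is only a sketch: the function name and every number in it are hypothetical placeholders, not real provider prices.

```python
def affordable_hours(budget, nodes, price_per_node_hour):
    """Hours a cluster of `nodes` machines can run before the budget is spent."""
    if nodes <= 0 or price_per_node_hour <= 0:
        raise ValueError("nodes and price_per_node_hour must be positive")
    return budget / (nodes * price_per_node_hour)


# Hypothetical figures only: 100 currency units, 10 nodes, 0.10 per node-hour.
hours = affordable_hours(100.0, 10, 0.10)
```

Running the same calculation for a few node counts before starting gives a rough ceiling on how large a test job can be.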
(a few) References


http://www.cloudera.com/hadoop-training-thinking-at-scale
http://www.slideshare.net/tag/hadoop
http://sourceforge.net/projects/cloudburst-bio/
http://hadoop.apache.org/core/
http://people.apache.org/~rdonkin/hadoop-talk/hadoop.html

Cloud computing and Hadoop introduction