Outline
1.  Working with CloudMan
2.  Deploying bioinformatics tools with CloudBioLinux
3.  Deploying a genome assembly and annotation pipeline with Fabric
Working with CloudMan
Enis Afgan
@ GCC 2014, Baltimore
[Figure: researchers, 100s of GB of data, 100+]
Why the cloud?
Infrastructure Customization
How to use the cloud?
1. Get an account on the supported cloud
2. Start a master instance via the cloud web console or CloudLaunch
3. Use CloudMan's web interface on the master instance to manage the platform
4. Use or customize Galaxy
YOUR TURN
Launch an instance
1.  Visit biocloudcentral.org
2.  Enter the access key and secret key provided by Dave Clements on 6/25
3.  Provide your email address
4.  Use your initials as the cluster name
5.  Set any password (and remember it)
6.  Keep Large instance type
7.  Start your instance
Wait for the instance to start (~2-3 minutes)
For more details, see wiki.galaxyproject.org/CloudMan
Launch an instance
1.  Visit usegalaxy.org/cloudlaunch
2.  Enter your access key and secret key (credits provided by Dave Clements on 6/25)
3.  Choose New cluster
4.  Set any name as the cluster name
5.  Set any password
6.  Keep cloudman_keypair
7.  Keep Large instance type
8.  Launch your instance
Wait for the instance to start (~2-3 minutes)
For more details, see wiki.galaxyproject.org/CloudMan
Manual scaling
•  Explicitly add 1 worker node to your cluster
•  Node type corresponds to node processing capacity
•  Research use of Spot instances
Auto-scaling
Using an S3 bucket as a data source
Accessing an instance over ssh
Use the terminal (or install the Secure Shell extension for Chrome)
SSH as user ubuntu with the password you chose when launching the instance:
[local machine]$ ssh ubuntu@<instance IP address>
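If you launched via CloudLaunch and kept cloudman_keypair, key-based login should also work; the key file name and location below are assumptions, and scp is just one way to copy data onto the instance:
$ chmod 600 ~/Downloads/cloudman_keypair.pem                      # hypothetical download path
$ ssh -i ~/Downloads/cloudman_keypair.pem ubuntu@<instance IP address>
$ scp local_data.fasta ubuntu@<instance IP address>:/mnt/galaxy/export/   # copy a file to the instance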
Once logged in
•  You have full system access to your instance, including sudo; use it as any other system
•  CloudMan-managed file systems are located under /mnt
•  Jobs can be submitted to the SGE job manager via the standard qsub command (see the sketch below)
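A quick way to confirm the points above once logged in (a minimal sketch; the exact file system names under /mnt vary by deployment):
$ sudo whoami        # full system access: should print 'root'
$ ls /mnt            # CloudMan-managed file systems (names vary, e.g. galaxy)
$ df -h /mnt/*       # their sizes and mount points
$ qstat -f           # SGE is available; lists the cluster queues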
Customize your instance - install a new tool
$ cd /mnt/galaxy/export
$ wget http://heanet.dl.sourceforge.net/project/dnaclust/parallel_release_3/dnaclust_linux_release3.zip
$ unzip dnaclust_linux_release3.zip
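A quick sanity check after unpacking; the directory layout is taken from the job script on the next slide, and whether the binary prints usage when run without arguments is an assumption:
$ ls -l /mnt/galaxy/export/dnaclust_linux_release3/
$ /mnt/galaxy/export/dnaclust_linux_release3/dnaclust    # run with no arguments; expected to print usage (assumption)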
Use the new tool in cluster mode
1.  Create a new shell script to run the tool; call it job_script.sh, with the following content (full sketch below):
#$ -cwd
/mnt/galaxy/export/dnaclust_linux_release3/dnaclust -l -s 0.9 /mnt/workshop-data/mtDNA.fasta
2.  Submit a single job to the SGE queue: qsub job_script.sh
3.  Check the queue: qstat -f
4.  Job output will be in the local directory in file job_script.sh.o#
5.  Submit the same job a number of times (*10): qsub job_script.sh, then watch qstat -f to see all jobs lined up
6.  [optional] See auto-scaling in action (if enabled) [1.5-2 mins]
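A minimal sketch of the whole exercise, using the paths from the slide above; the for-loop is just one convenient way to submit the job ten times:
# job_script.sh (exactly the content from step 1)
#$ -cwd
/mnt/galaxy/export/dnaclust_linux_release3/dnaclust -l -s 0.9 /mnt/workshop-data/mtDNA.fasta

# submit, inspect, and repeat
$ qsub job_script.sh                                # single submission
$ qstat -f                                          # check the queue
$ ls job_script.sh.o*                               # per-job output files
$ for i in $(seq 10); do qsub job_script.sh; done   # submit 10 copies
$ watch qstat -f                                    # watch the jobs line up (and auto-scaling, if enabled)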
Sharing an instance
•  Share the entire CloudMan platform
•  Includes all of the user data and even the customizations
•  Publish a self-contained analysis
•  Make a note of the share-string and send it to your neighbor
UNDERPINNING CONCEPTS
Technical details
CloudMan core
- cloud resource management
- multi-cloud interface
- service management and monitoring
- user interaction
- system state
- well-defined interface / API
- web interface
Tool as a service
- self-contained service definition
- implements the default service interface
- implements optional service hooks
- service dependencies
- automatic dependency management
- dynamic loading
[Architecture diagram, steps 1°-11°: Launcher / Management Console; CloudMan machine images; contextualize image; CloudMan master instance (start CloudMan, set up services); CM-w worker instances; managed file systems (FS 1, FS ...) on block or instance storage; persistent data repository in S3/Swift (snapshots, FS archives); application(s), e.g. Galaxy.]
What do you get?
•  Cluster-in-the-cloud: SGE
•  Galaxy on the Cloud + control
•  Customizable: tools, configs, data
•  Sharable
•  Extensible
Supported cloud middleware
1.  Amazon Web Services
2.  OpenStack
3.  Eucalyptus
4.  OpenNebula
Building
•  Leverage the CloudBioLinux build framework (build sketch below)
•  Via the GVL flavor: github.com/afgane/gvl_flavor
•  Base CloudMan machine image
•  Tools and data
•  There are also more specific CBL flavors available (e.g., cloudman)
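A hedged sketch of what a build run could look like against a freshly launched stock Ubuntu instance; the CloudBioLinux repository location, the install_biolinux task name, and the flavor argument are assumptions based on CloudBioLinux conventions, and the key/host values are placeholders:
$ git clone https://github.com/chapmanb/cloudbiolinux.git     # assumed CBL repo location
$ git clone https://github.com/afgane/gvl_flavor.git
$ cd cloudbiolinux
$ fab -f fabfile.py -i <your_key.pem> -u ubuntu -H <target instance IP> \
      install_biolinux:flavor=../gvl_flavor               # task/flavor syntax is an assumption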
Deploying
•  Integrated with the BioCloudCentral.org app
•  Use the public one, deploy your own, or run it locally (see the sketch below)
•  BCC supports multiple clouds
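If you want to run the launcher locally, BioCloudCentral is a Django app; the repository URL and the setup commands below are assumptions following common Django conventions, so defer to the project's README:
$ git clone https://github.com/galaxyproject/biocloudcentral.git   # assumed repo location
$ cd biocloudcentral
$ pip install -r requirements.txt
$ python manage.py runserver            # then open http://127.0.0.1:8000 in a browser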
Troubleshooting
Log files on the master instance:
•  /usr/bin/ec2autorun.log
•  /tmp/cm/cm_boot.log
•  /mnt/cm/paster.log
Related locations from the diagram: /mnt/galaxy [Indices], cm-<hash>
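When an instance misbehaves, the quickest check is to tail those logs over ssh (a minimal sketch using the paths above; the per-log descriptions in the comments are assumptions about what each log covers):
$ ssh ubuntu@<instance IP address>
$ tail -n 50 /usr/bin/ec2autorun.log     # instance boot / user-data processing
$ tail -n 50 /tmp/cm/cm_boot.log         # CloudMan bootstrap
$ tail -f /mnt/cm/paster.log             # running CloudMan application log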
