The genesis of clusterlib
An open source library to tame your
favourite supercomputer
Arnaud Joly
May 2015, PhD hour discussions
Use case for the birth of clusterlib
Solving supervised learning tasks
The goal of supervised learning is to learn a function from
input-output pairs in order to predict the output for any new
input.
[Figure: A supervised learning task]
[Figure: A learnt function]
Time is running out
A huge set of tasks and a practical upper bound
O(#datasets×#algorithms×#parameters) ≤ Time before deadline
Embarrassingly parallel tasks.
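To make the bound concrete, here is a minimal sketch of how such a task grid can be enumerated. The dataset, algorithm and parameter names are placeholders chosen for illustration; they do not come from the talk. Each tuple can be processed independently of the others, which is what makes the workload embarrassingly parallel.

```python
from itertools import product

# Placeholder names, chosen only for illustration.
datasets = ["dataset_a", "dataset_b"]
algorithms = ["algo_x", "algo_y", "algo_z"]
parameters = [1, 10, 100, 1000]

# One independent task per (dataset, algorithm, parameter) combination.
tasks = list(product(datasets, algorithms, parameters))

# The number of tasks is the product of the three sizes.
n_tasks = len(datasets) * len(algorithms) * len(parameters)
print(len(tasks), n_tasks)
```

Adding one more value to any of the three lists multiplies the total, which is why the grid quickly outgrows a single machine.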
Supercomputer = scheduler + workers
Supercomputer = cluster of computers
CECI is in!
Thanks to the CECI [1] at the University of Liège, we have access to
7 supercomputers (or clusters of computers);
≈ 20 000 cores;
> 60 000 GB of RAM;
12 high-performance scientific GPUs.
[1] The CECI (http://www.ceci-hpc.be/) is a supercomputer consortium funded by the FNRS.
A glimpse of the SLURM scheduler user interface
How to submit a job?
First, we need to write a bash file job.sh which specifies
resource requirements and calls the program.
#!/bin/bash
#SBATCH --job-name=job-name
#SBATCH --time=10:00
#SBATCH --mem=1000
srun hostname
Then you can submit the job to the queue:
$ sbatch job.sh
How to easily launch many jobs?
With clusterlib, how to easily launch many jobs?
Let’s generate job submission commands on the fly!
>>> import os
>>> from clusterlib.scheduler import submit
>>> script = submit(job_command="srun hostname",
...                 job_name="test",
...                 time="10:00",
...                 memory=1000,
...                 backend="slurm")
>>> print(script)
echo '#!/bin/bash
srun hostname' | sbatch --job-name=test --time=10:00 --mem=1000
>>> # Let's launch the job
>>> os.system(script)
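The idea of generating the submission command as a string can be sketched with the standard library alone. The helper below, `build_sbatch_command`, is hypothetical and is not clusterlib's implementation; it only assumes the `sbatch` options already shown on the slides (`--job-name`, `--time`, `--mem`).

```python
import shlex


def build_sbatch_command(job_command, job_name="job", time="24:00:00",
                         memory=4000):
    """Return a shell command that pipes a small bash script into sbatch.

    Hypothetical helper for illustration; not clusterlib's implementation.
    """
    script = "#!/bin/bash\n" + job_command
    options = [
        "--job-name=" + shlex.quote(job_name),
        "--time=" + shlex.quote(time),
        "--mem=" + str(int(memory)),
    ]
    # shlex.quote protects the script text from the calling shell.
    return "echo " + shlex.quote(script) + " | sbatch " + " ".join(options)


command = build_sbatch_command("srun hostname", job_name="test",
                               time="10:00", memory=1000)
print(command)
```

Building a string rather than submitting directly keeps the step easy to inspect: the command can be printed and checked before anything reaches the queue.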
A glimpse of the SLURM scheduler user interface
How to check if a job is running?
To check if a job is running, you can use the squeue command.
$ squeue -u `whoami`
  JOBID PARTITION      NAME    USER ST     TIME NODES NODELIST(REASON)
1225128      defq  job-9999 someone PD     0:00     1 (Priority)
1225129      defq  job-9998 someone PD     0:00     1 (Priority)
...
1224607      defq  job-0003 someone  R  7:39:16     1 node025
1224605      defq  job-0002 someone  R  7:43:25     1 node040
1224593      defq  job-0001 someone  R  8:06:33     1 node035
How to avoid launching running, queued or completed jobs?
With clusterlib, how to avoid re-launching ...?
... running and queued jobs?
Let’s give each job a unique name, then retrieve the names of running / queued jobs.
>>> from clusterlib.scheduler import queued_or_running_jobs
>>> queued_or_running_jobs()
["job-0001", "job-0002", ..., "job-9999"]
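One way to obtain such a list is to read the NAME column of the squeue output shown earlier. The sketch below uses a hypothetical helper, `job_names_from_squeue`, and assumes the default column layout and job names without whitespace; it is not how clusterlib itself is implemented.

```python
def job_names_from_squeue(output):
    """Extract the NAME column from default squeue output.

    Hypothetical helper; assumes job names contain no whitespace.
    """
    lines = [line.split() for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    name_index = header.index("NAME")
    return [row[name_index] for row in rows if len(row) > name_index]


# Sample text copied from the squeue slide.
sample = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1225128 defq job-9999 someone PD 0:00 1 (Priority)
1224607 defq job-0003 someone R 7:39:16 1 node025
"""
names = job_names_from_squeue(sample)
print(names)
```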
... completed jobs?
A job must indicate through the file system that it has completed, e.g. by creating a file or by registering its completion in a database.
clusterlib provides a small NoSQL database based on sqlite3.
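A key-value store on top of sqlite3 needs very little code. The sketch below is an illustration using only the standard library; the names `kv_dumps` and `kv_loads`, the table name and the JSON encoding are assumptions, not clusterlib's actual storage format.

```python
import json
import os
import sqlite3
import tempfile


def kv_dumps(mapping, path):
    """Store each key/value pair of mapping, JSON-encoding the values."""
    connection = sqlite3.connect(path)
    try:
        with connection:  # commit on success, roll back on error
            connection.execute(
                "CREATE TABLE IF NOT EXISTS kv "
                "(key TEXT PRIMARY KEY, value TEXT)")
            connection.executemany(
                "INSERT OR REPLACE INTO kv VALUES (?, ?)",
                [(key, json.dumps(value)) for key, value in mapping.items()])
    finally:
        connection.close()


def kv_loads(path):
    """Return the stored pairs as a dict; empty if the file is missing."""
    if not os.path.exists(path):
        return {}
    connection = sqlite3.connect(path)
    try:
        rows = connection.execute("SELECT key, value FROM kv").fetchall()
    finally:
        connection.close()
    return {key: json.loads(value) for key, value in rows}


# Usage: a job records its completion, a launcher reads it back.
path = os.path.join(tempfile.mkdtemp(), "jobs.sqlite3")
kv_dumps({"job-0001": "JOB DONE"}, path)
done = kv_loads(path)
print(done)
```

A single file on a shared file system is enough for this pattern, so no database server has to run on the cluster.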
Simplicity beats complexity
With only 4 functions
from clusterlib.scheduler import queued_or_running_jobs
from clusterlib.scheduler import submit
from clusterlib.storage import sqlite3_dumps
from clusterlib.storage import sqlite3_loads
We easily launch thousands of jobs.
We are task DRY (Don’t Repeat Yourself).
We work smoothly with the SLURM and SGE schedulers.
We depend only on Python.
Are we done?
Let’s go open source!
Why make an open source library?
Give back to the open source community.
Open source initiative affiliate communities
Why make an open source library?
Bug reports are great!
Why make an open source library?
Welcome new contributors!
From left to right: Olivier Grisel, Antonio Sutera, Loic Esteve (no photo) and Konstantin Petrov (no photo).
Why go open source in science? Reproducibility!
Be proud of your code
Open source way: Host your code publicly
www.myproject.com
Another awesome host platform
Tip: Sit on the shoulders of a giant!
Bonus: Use a version control system such as git (clusterlib’s choice), mercurial, ...
Open source way: Choose a license
No license = closed source
In short
You can’t use or even read my code.
GPL-like license = copyleft
In short
You can read / use / share / modify my code, but derivatives must retain those rights.
BSD / MIT-style license = permissive
In short
Do whatever you want with the code, but keep my name with it.
For the sake of open source, pick a popular open source license.
For the sake of wisdom or money, choose carefully.
For the sake of science, go with BSD / MIT-style license.
Clusterlib is BSD Licensed.
Open source way: Start an issue tracker
An issue tracker allows managing and maintaining a list of
issues.
Tip: Sit on the shoulders of giants! (again)
Open source way: Let users discuss with core contributors
1. Issue tracker (only viable for small projects)
2. Mailing list: SourceForge, Google Groups, ...
3. Stack Overflow tag for big projects
Tip: Sit on the shoulders of giants! (again * 2)
Are we done?
The grand seduction!
Know who you are!
Vision
The goal of clusterlib is to ease the creation, launch and management of embarrassingly parallel jobs on supercomputers with schedulers such as SLURM and SGE.
Core values
Pure Python, simple, user-friendly!
Attractive documentation
Readme
Attractive documentation
API documentation is nice.
Tip: Follow a standard such as "PEP 0257 – Docstring Conventions".
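As an illustration of the convention, here is a minimal sketch of a docstring in the PEP 257 style. The function `job_name` is hypothetical and not part of clusterlib.

```python
def job_name(prefix, param):
    """Return a unique job name built from a prefix and a parameter.

    The summary above fits on one line and ends with a period. It is
    followed by a blank line and then by a more detailed description.
    """
    return "%s-param=%s" % (prefix, param)


# Documentation tools read the docstring from the __doc__ attribute.
print(job_name.__doc__.splitlines()[0])
```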
Attractive documentation
Beautiful doc is better.
clusterlib uses sphinx for building its doc.
Attractive documentation
And even better with examples.
Attractive documentation
A narrative documentation is awesome.
Attractive documentation
What’s new?
Thanks to all clusterlib contributors!
Appealing test suite
How good is your test suite (code coverage)?
All code lines are hit by the tests (100% line coverage), but not all code paths (branches) are tested.
$ make test
nosetests clusterlib doc
Name                   Stmts   Miss  Branch  BrMiss  Cover   Missing
------------------------------------------------------------------
clusterlib                 1      0       0       0   100%
clusterlib.scheduler      82      0      40      16    87%
clusterlib.storage        35      0      14       0   100%
------------------------------------------------------------------
TOTAL                    118      0      54      16    91%
------------------------------------------------------------------
Ran 12 tests in 0.189s
OK (SKIP=3)
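The gap between line coverage and branch coverage can be seen on a very small example. The function `clip_memory` below is hypothetical and only serves the illustration.

```python
def clip_memory(memory, limit=1000):
    """Cap a memory request at limit (hypothetical function)."""
    if memory > limit:
        memory = limit
    return memory


# This single call executes every line of clip_memory, so line coverage
# is 100%, but only the path where the condition is true is exercised.
assert clip_memory(2000) == 1000

# A second test is needed to cover the path where the `if` is skipped.
assert clip_memory(500) == 500
```

With only the first assertion, a branch-aware tool (for instance coverage.py run with its `--branch` option) would report one missed branch even though no line is missed.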
Publicity
Are we done?
Not yet! Let’s make our lives easier!
Automation, automation, ...
Continuous testing
clusterlib uses Travis CI (works for many languages).
Tip: Sit on the shoulders of giants! (again * 3)
Automation, automation, ...
Continuous integration pays off in the long term!
It ensures that the test suite is run often. Awesome during development.
Tip: Continuous integration can be enhanced with test coverage and code quality checks.
Automation, automation, ...
Continuous doc building
Tip: Sit on the shoulders of giants! (again * 4)
Create and join open source projects!
Join a community! Learn the best practices and technologies!
Make your code outlive a single project! Give your code to the world!
scikit-learn sprint 2014
A full example of clusterlib usage
# main.py
import os
import sys

from clusterlib.storage import sqlite3_dumps

NOSQL_PATH = os.path.join(os.environ["HOME"], "job.sqlite3")


def main(argv=None):
    # For ease here, function parameters are sys.argv
    if argv is None:
        argv = sys.argv

    # Do heavy computation

    # Save script evaluation on the hard disk


if __name__ == "__main__":
    main()

    # Great, the job is done!
    # The key is the full command line, so that it matches the job_command
    # string built by the launcher (assumes the same interpreter path).
    sqlite3_dumps({" ".join([sys.executable] + sys.argv): "JOB DONE"},
                  NOSQL_PATH)
A full example of clusterlib usage
# launcher.py
import os
import sys

from clusterlib.scheduler import queued_or_running_jobs
from clusterlib.scheduler import submit
from clusterlib.storage import sqlite3_loads

from main import NOSQL_PATH

if __name__ == "__main__":
    scheduled_jobs = set(queued_or_running_jobs())
    done_jobs = sqlite3_loads(NOSQL_PATH)

    for param in range(100):
        job_name = "job-param=%s" % param
        job_command = ("%s main.py --param %s"
                       % (sys.executable, param))

        # Skip jobs that are queued, running or already completed
        if (job_name not in scheduled_jobs and
                job_command not in done_jobs):
            script = submit(job_command, job_name=job_name)
            # Launch the job, as in the earlier submit example
            os.system(script)
