The 10 Best Practices
                 for Workflow Design
                               BioVeL M6 Workshop
                             Göteborg, May 10-11, 2012
         Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble (myGrid)
Thanks: BioSemantics Group (LUMC), myGrid team (UoM), Yassene Mohamed, Harish Dharuri (LUMC)
Our specialty: Knowledge Discovery
                                                                       http://biosemantics.org




                   Disambiguation*
                     Text Mining




                             Substrates for
                              Knowledge
                              Discovery

                                                         Methods for
                                                      Knowledge Discovery


                       Applications
                       •Predict protein-protein, protein-disease associations, gene prioritization
                       •Genotype-phenotype studies, e.g. Huntington’s Disease, Metabolic Syndrome
                       •Yours?


* Global disambiguation initiative: http://snipurl.com/conceptweballiance                            2
Introduction
                               Why build good workflows?


Good workflow design = good science!




                                                         3
Introduction
                      Best practices for workflow design




 Best Practices for workflow design
                  =
Best Practices experimental science
                  +
Best Practices software engineering



                                                        4
1
Make a sketch workflow




                         5
Best practice 1
                                     Sketch an Abstract Workflow




PowerPoint courtesy of Eleni Mina

                                                                 6
2
Use modules




              7
http://www.myexperiment.org/workflows/74.html


                                                8
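The idea behind "Use modules" can be sketched outside Taverna as well. The Python below is only an analogy for nested workflows, not Taverna code: every function name and the toy vocabulary are assumptions made for illustration. Each module is a small function that can be tested on its own, and the top-level workflow only wires the modules together.

```python
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Module 1 (hypothetical): split raw text into lowercase word tokens."""
    tokens = []
    for word in text.split():
        cleaned = word.strip(".,;:!?").lower()
        if cleaned:
            tokens.append(cleaned)
    return tokens


def find_terms(tokens: List[str], vocabulary: Set[str]) -> List[str]:
    """Module 2 (hypothetical): keep only tokens found in a known vocabulary."""
    return [token for token in tokens if token in vocabulary]


def count_terms(terms: List[str]) -> Dict[str, int]:
    """Module 3 (hypothetical): count how often each term was found."""
    counts: Dict[str, int] = {}
    for term in terms:
        counts[term] = counts.get(term, 0) + 1
    return counts


def text_mining_workflow(text: str, vocabulary: Set[str]) -> Dict[str, int]:
    """Top-level workflow: contains no logic of its own, only the wiring."""
    return count_terms(find_terms(tokenize(text), vocabulary))
```

Because each module has its own inputs and outputs, it can be validated, reused, and repaired independently of the workflow that contains it.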
3
     Think about the output
(and the data in your workflow in general)




                                             9
Best practice 3
Think about the output




    ?
            http://...




                         10
4
Provide example inputs and outputs




                                     11
Recipe

Taverna 2.3
  1. Select input/output
  2. Select tab ‘Details’
  3. Click ‘Annotation’
  4. Add Example

Taverna 2.4
  1. Right-click input/output
  2. Select ‘Annotation’
  3. Add Example



                           12
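When example data is kept in files next to the workflow rather than inside Taverna, it helps to store each example input together with the output it produced and the time it was recorded. The following is a minimal sketch under that assumption; the `ExamplePair` class, its field names, and the JSON layout are hypothetical and not part of Taverna or myExperiment.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ExamplePair:
    """One coupled example: an input and the output it produced (hypothetical format)."""
    workflow_id: str
    example_input: dict
    example_output: dict
    recorded_at: str  # ISO 8601 timestamp; outputs may change when databases change


def record_example(workflow_id: str, example_input: dict, example_output: dict) -> ExamplePair:
    """Couple an input with its output and stamp the pair with the current UTC time."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return ExamplePair(workflow_id, example_input, example_output, timestamp)


def to_json(pair: ExamplePair) -> str:
    """Serialize the pair so it can be stored alongside the workflow file."""
    return json.dumps(asdict(pair), indent=2, sort_keys=True)


def from_json(text: str) -> ExamplePair:
    """Load a previously stored pair."""
    return ExamplePair(**json.loads(text))
```

Keeping input, output, and timestamp in one record means an example can never be separated from the result it belongs to.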
5
Annotate




           13
Best practice 5
                       Annotate
Each component in
 Taverna can be
    annotated




                                14
Best practice 5
Annotate and help your users




                            15
6
Make workflow executable from
 outside the local environment



                                 16
Best practice 6
                                         Make workflow executable by others

How to check that others can execute your workflow?


» Try it!                                               Proof of executability
   › Ask a colleague
   › Use an external t2web runner

» Tips
   › Use Web Services
   › If you use local command line tools
      • Install tools on a publicly accessible server (this applies to e.g. Rserve)
      • Use a system that your users can set up (e.g. BioLinux)



                                                                                 17
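Before asking a colleague to try the workflow, a quick automated pass can flag resources that are obviously tied to the local environment. The sketch below is a heuristic only and assumes you can list the service addresses and file paths your workflow uses; the function names are hypothetical. It does not contact any service, so a resource that passes may still be unreachable for others.

```python
import ipaddress
from typing import List
from urllib.parse import urlparse


def is_locally_bound(resource: str) -> bool:
    """Return True if a resource looks usable only from the local environment."""
    parsed = urlparse(resource)
    # Plain file paths and non-web schemes (e.g. file://) are treated as local.
    if parsed.scheme not in ("http", "https"):
        return True
    host = parsed.hostname
    if host is None or host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A host name rather than an IP literal: assumed to be public here.
        return False
    return address.is_private or address.is_loopback


def local_resources(resources: List[str]) -> List[str]:
    """List every resource that others probably cannot reach."""
    return [resource for resource in resources if is_locally_bound(resource)]
```

An empty result is not proof of executability; it only removes the most common obstacles before a real test run by someone else.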
7
Choose services carefully




                            18
Best practice 7
Choose services carefully




                         19
Best practice 7
Choose services carefully




                         20
8
Reuse existing workflows




                           21
Best practice 8
                                                             The reuse workflow

Not a best practice, but a tip: know-how is important for reuse

[Flow chart]
Check workflows on myExperiment (Pos. / Neg.: Contact authors, Retry)
Check services on BioCatalogue (Pos. / Neg.: Contact authors, Retry)
Neg.: Use scripts from colleagues
Search the internet
Invent a new wheel
Pos.: Reuse, Attribute, Respect licences


                                                                                    22
9
Advertise




            23
Advertise




 Unique reference to use
 in your papers and for
     others to cite




                                     24
10
Maintain




           25
Best Practice 10
                                                                     Maintain

Best practices to support maintenance

» Regularly check your workflow
   › Ask colleagues
» Enable support for maintenance
   › Register your workflow on myExperiment
   › Register Web Services on
» Enable peers to repair: annotate!

» Note about versioning
   › No need to register all edits on myExperiment: use subversion
   › Register important updates on myExperiment


                                                                             26
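"Regularly check your workflow" can be partly automated by re-running the workflow on a stored example input and comparing the result with the stored example output. This is a minimal sketch, assuming the workflow can be invoked as a Python callable that maps input ports to output ports; `check_workflow` and the port names are hypothetical. A changed output is not necessarily a defect, since underlying databases change over time, so the report is meant to prompt a manual look.

```python
from typing import Any, Callable, Dict, List


def check_workflow(
    run: Callable[[Dict[str, Any]], Dict[str, Any]],
    example_input: Dict[str, Any],
    expected_output: Dict[str, Any],
) -> List[str]:
    """Re-run a workflow on its example input and report what no longer matches."""
    try:
        actual = run(example_input)
    except Exception as error:  # a decayed service typically surfaces here
        return [f"workflow failed to run: {error}"]

    problems = []
    for port, expected in expected_output.items():
        if port not in actual:
            problems.append(f"missing output port: {port}")
        elif actual[port] != expected:
            problems.append(f"output changed on port: {port}")
    return problems
```

Running such a check on a schedule gives early warning of workflow decay, and the resulting report tells a colleague where to start repairing.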
Bonus tip
Use common sense as a scientist




                                27
Workflow Forever
                Preservation of good workflows for
                        future applications
 Workflow 74
 “Protein Discovery”
 2005




Workflow 2876
“Match gene lists
by literature” 2012




  Workflow 2805
  “Get Pathway genes”
   2012



                                                     28
Wf4Ever
  Outcomes for BioVeL




myExperiment 2.0
BioCatalogue
Taverna



Research Objects
Linked Data

Methods
Protocols for
   Preservation
   and
   Conservation


                  29
The 10 Best Practices of Workflow Design
                                                                                Thank you

Thank you for your attention
More information:
http://snipurl.com/workflowbestpractices

1.    Make a sketch workflow
2.    Use modules
3.    Think about the output
4.    Provide example inputs and outputs
5.    Annotate
6.    Make it executable from outside the local environment
7.    Choose services carefully
8.    Reuse existing workflows
9.    Advertise
10.   Maintain


                                                                                          30
Wf4Ever tooling
Sneak preview




             31
Supporting information
                                                             Workflow jargon



› Scientific workflow
  Paradigm to describe, manage, and share complex scientific analyses
› Workflow system
  Software to design, execute, and monitor scientific workflows
› Module
  = nested workflow = workflow in a workflow = workflow component
› Beanshell script
  A Java-based scripting language.
  Typically used for data type conversions in Taverna.
› Provenance
  History or trace of a workflow run.
  Allows you to look at intermediate data, which workflows and services
  were run, with what data.


                                                                              32
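To make the term provenance concrete, here is a small illustration of recording a trace of a run: which steps ran, with what data, and what they produced. It is a sketch only and says nothing about how Taverna's provenance engine works internally; the `traced` decorator and the record fields are assumptions.

```python
import functools
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


def traced(trace: List[Dict[str, Any]]) -> Callable:
    """Build a decorator that appends one record to `trace` per step execution."""
    def decorator(step: Callable) -> Callable:
        @functools.wraps(step)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = step(*args, **kwargs)
            trace.append({
                "step": step.__name__,            # which step was run
                "args": args,                     # with what data
                "kwargs": kwargs,
                "output": result,                 # the intermediate result
                "finished_at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator
```

After a run, the list passed to `traced` holds the intermediate data of every decorated step in execution order, which is the kind of history the definition above describes.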



Editor's Notes

  • #4: Designing a good workflow is part of doing good research!
  • #5: This means that if you know about one or both of them, you should apply their principles to workflow design as well. (At the end we can say that using common sense about doing good science is a general best practice for creating workflows too.)
    Workflow design is a variant of software design:
    › Define hypothesis and approach
    › Sketch a workflow of the approach
    › Implement workflow
    › Trial and error (iterate)
    Comment: where are the workflow design patterns?
  • #7: A sketch consists of boxes without content. It can be made in Taverna (e.g. with empty script boxes), as a PowerPoint flow chart, or on a napkin; if it is digital (e.g. Taverna), we can store it digitally.
    < Comment: add concept mining workflow and a sketch. Cite Eleni: 'helps me to share workflow while developing it, that makes it better' >
    How?
    › In Taverna, using empty Beanshell scripts
    › In PowerPoint
    › In a sketch book
    Why?
    › Provides a reference point for the main task(s) of the workflow throughout the implementation process
    › Promotes sharing between computer and workflow systems due to its non-explicit nature
    › Helps design the experiment
    › Helps communication (supervisors, colleagues)
  • #9: The workflow on the left explains the basic steps of a text mining process. The expanded workflow is much harder to understand. We can use each nested workflow as a workflow on its own.
    How?
    › Describe and implement each of the executable processes in a workflow individually and independently
    › In Taverna this can be done through nested workflows
    Why?
    › Facilitates independent testing and validation of the execution of each of the individual modules
    › Encourages re-use
    Note: Make sure that you publish the separate modules as well as the final nested workflow (unfortunately, myExperiment does not support this very well), or at least annotate the components when you publish the whole.
  • #11: How?
    › Consider whether you want to populate data models/databases or create outputs as disconnected collections of files
    › Consider who the results are for (an overview for users, or the next workflow component)
    › General advice: have at least a report as an output (provenance will have the separate parts anyway)
    › Use Taverna for provenance collection (intermediate results are captured by the provenance engine)
    Why?
    › It is easier to think about this at the design stage than to adjust a finished workflow
    › It gives structure to potentially large output data
  • #13: How?
    › Example inputs and outputs can be recorded in Taverna
    › Alternatively: add input or output files to a pack containing the workflow
    › Use real example data
    Why?
    › To help understand the workflow
    › For validation
    › For maintenance
    Note: Make sure that the input and the output examples are coupled. Keep in mind that the output has a timestamp. It may change due to changes in underlying databases.
  • #15: How?
    › Choose meaningful names for the workflow title, inputs, outputs, and for the processes that constitute the workflow
    › Focus on how a component is used in this workflow and why it is in there
    › If it exists, refer to information about what the component does in general (e.g. by referencing a service on BioCatalogue)
    › Assume that a referenced resource may disappear or change at some time in the future
    › Use Taverna description fields and example fields*. Taverna keeps them with the workflow and myExperiment uses this information
    › Keep any notes that are related to the workflow, but not part of it, linked to it*
    › Examples of useful "extra" information: execution time, keywords, contact information, attribution. myExperiment offers some of this, but it is best to put it in the workflow descriptions
    Why?
    › Doing good science
    › Record what is needed for a publication later on
    › Increase re-usability
    Cite Kostas: ‘many workflows are badly documented computer programs'
    * The wf4ever project will provide additional support (and incentives) for describing (the purpose of) workflow components, related objects and references (e.g. data sets), and support for storing the elements of an experiment with their metadata in a structured way.
  • #16: Facilitate understanding and reuse
  • #18: How?
    › Use Web Services or any Taverna widget except the external tool; use the external tool only when it runs over ssh on a publicly accessible server
    › Use Taverna with local tools, but installed on a publicly accessible server with the Taverna server
    › Use local tools from an easy to set up environment such as BioLinux (only for a certain niche of users)
    › TRY IT!!
    Why?
    › Others will be able to run the workflow
    › Proof of reproducibility
  • #20: How?
    Choose a service that is reliable, based on:
    › BioCatalogue reliability statistics (in practice: check on BioCatalogue whether it has a green light; at the moment there is not much more you can do)
    › How often it is used in other workflows
    › Contact with service providers. Communicate!
    › The reputation of the institution providing the service: check the trustworthiness of the service provider (this can also be a person, of whom you can check whether they will remain at an institution to maintain the service)
    Why?
    › Prevent workflow decay, prolong the life of the workflow
    Note to service developers: Many workarounds and ugly workflow practices come from having to deal with badly behaved services!
  • #21: Web Services are digital, their creators not. Communication saves web services and workflows from decay.
  • #23: A common misconception is that because they are workflows, they are automatically stable. It takes effort and often communication to reuse work, especially when using ‘state-of-the-art’ products made by scientists.
    How?
    › Make your own workflows modular, since this promotes reuse
    › Search myExperiment and filter on most downloaded or most viewed
    › Check if it has been used in a publication
    › Use your contacts: maybe someone has tried to solve something similar before using a workflow?
    › Try and try harder, contact authors!
    Why?
    › Another user who is familiar with one of your workflows is more likely to understand another workflow that you designed
    › Beneficial when repairing workflows: repairing a given workflow may entail repairing the workflows in which it is used as a subworkflow
    › Fights redundancy
    Note: attribute others and respect licenses
  • #25: http://myExperiment.org/workflows/74?version=12
    http://myExperiment.org/packs/258
    How?
    › Share your workflow (don’t forget contact info!) on myExperiment, on other social media, or by e-mailing it around to colleagues
    › Cite your workflow when publishing, using a stable identifier such as the one myExperiment provides
    › Make use of the pack functionality in myExperiment to bundle your workflow with other important documents such as a publication
    Why?
    › Good science: share your results
    › Get cited: fame!
    › Progress: let others build on your work without reinventing it
  • #27: How?
    › Act on information about services that are deprecated, either by changing services or by providing a note that that specific process in the workflow is not executable anymore
    › Put your services on BioCatalogue (you don't have to be the owner) and your workflows on myExperiment (notification is planned)
    › Regularly test the workflow (like 'unit tests')
    Why?
    › Good practice: this is already demanded for some types of publications, like an application note in Bioinformatics
    › Fight workflow decay, prolong the life of the workflow
  • #30: A scientific workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving, i.e. the implementation of a scientific method. Workflows need to be preserved (and conserved). More on this later.
  • #31: Could we skip this slide to save time?