www.wf4ever-project.org




Scientific Data Management -
  From the Lab to the Web
      José Manuel Gómez Pérez, iSOCO

         Semantic Data Management
             Dagstuhl Seminar
             22-27 April 2012
The data deluge
                                                          Some facts

                             »    In 2010 the size of the digital
                                  universe exceeded 1 zettabyte
                                  (1 ZB = 1 trillion GB)
                             »    1.8 ZB in 2011
                             »    35 ZB expected in 2020

                             »    90% unstructured data
                             »    70% user-generated
                             »    75% resulting from data copying,
                                  merging, and transforming

                             »    Metadata is the fastest growing
                                  data category
                             »    Much of such data is dynamic,
                                  real-time, volatile

Source: IDC, "The 2011 Digital Universe Study: Extracting Value from Chaos"
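The figures above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (SI) prefixes and treating the 2011 and 2020 values as endpoints of steady compound growth; the helper name `compound_annual_growth` is illustrative and not taken from the IDC study.

```python
# Decimal (SI) prefixes are assumed here: 1 ZB = 10**21 bytes, 1 GB = 10**9 bytes.
ZETTABYTE = 10**21
GIGABYTE = 10**9

# Number of gigabytes in one zettabyte (one trillion on the short scale).
gb_per_zb = ZETTABYTE // GIGABYTE


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


# 1.8 ZB in 2011 and 35 ZB expected in 2020, as quoted on the slide.
growth = compound_annual_growth(1.8, 35.0, 2020 - 2011)

print(f"{gb_per_zb:,} GB per ZB")
print(f"Implied annual growth 2011-2020: {growth:.1%}")
```

Under these assumptions the slide's numbers imply growth of roughly 39% per year, which means the digital universe would double about every two years.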

Dealing with dynamicity
                                         Two main challenges


» Challenge 1: Identifying and
  structuring the relevant portions of
  the data for the task at hand
   › First-class data citizens
» Challenge 2: Managing the lifecycle
  of data entities
   › Preservation
   › Evolution and versioning
   › Decay

Both technical and social aspects involved

The Research Lifecycle
                                                Workflows in the Scientific Method


Figure: Background, Hypothesis, Assumptions, Input data, Method → Experiment → Results (data) → Scientific Interpretation → Publication


   Example: Genome-Wide Association Studies




Workflow-based Science
              What is a Scientific Workflow?


»    A mechanism for coordinating the
     execution of services and linking together
     resources.

»    The combination of data and processes
     into a configurable, structured set of steps
     that implement semi-automated
     computational solutions in scientific
     problem-solving


    Scientific workflows are at the core of
    scientific data management
        › Enable automation
        › Encourage best practices




Challenge 1

 Identifying and structuring
the relevant portions of the
  data for the task at hand

    First-class data citizens
Questions for Scientific Data and Workflows                        Issues
Who are you ?                                               Identity and Description
Where and when were you born ?                                     Authenticity
Who were your parents (creators) ?                                 Uniqueness
For which purpose were you conceived and have been used ?      Reuse, Repurpose

What do you have inside ?                                         Inspection
                                                                  Visualization
                                                                  Annotations
How is your content linked ?                                Graphical Representation
May I access all your parts ?                                    Access Rights
Which parts can I replace ?                                       Adaptability
What have they done to you ?                                      Provenance
Who and When ?                                                     Versioning
Why did they do that ?


Why have you been recommended to me ?                         Information Quality
Can I believe what you are saying or trust your results ?

Do you still produce the same results ?                         Reproducibility
Are you still working ?                                          Completeness
How could I repair you ?                                           Stability

How could I thank you ?                                              Credit
How could I talk about you ?
Challenge 1: Identifying and structuring the relevant data
                                         Research Objects as Technical Objects

Carriers of Research Context
» Referentiable
» Aggregation, Dispersed
    › Heterogeneous
    › Local and External
» Annotated metadata
    › Provenance
    › Structured: Manifests,
      Recipes, Permissions,
      Discourse
» Lifecycle
    › Publishing, Evolution
    › Versioning
» Mixed Stewardship
    › Graceful Degradation
» Sharing
    › Security & Privacy
» Stereotypical User Profiles
» Services

Figure labels: Distributed, Third Party Tenancy, Alien Store; Technical Objects, Social Objects; OAI-ORE
Research Objects as Social Objects




                    Package,
                    Explore, Inspect,
                    Review,
                    Exchange,
                    Share, Reuse,
                    Publish, Credit




Research Object model core (simplified)

    Namespace: http://purl.org/wf4ever/ro#
    RO specification: http://wf4ever.github.com/ro

Figure (simplified view of the RO core).
    Nodes: ro:ResearchObject, ro:Resource, ro:Manifest, ro:AggregatedAnnotation, wfdesc:Workflow
    Edges: ore:aggregates, ore:isDescribedBy, ro:annotatesAggregatedResource

›    ro (aggregation and annotation)
›    wfdesc (workflow description)
›    Minim* (minimum info model)
›    wfprov (workflow provenance)
›    roprov (RO provenance)
›    roevo (evolution model)

*Minim is based on M. Gamble's MIM
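The core terms named on this slide (ro:ResearchObject, ore:aggregates, ro:Resource, ore:isDescribedBy, ro:Manifest, ro:AggregatedAnnotation, ro:annotatesAggregatedResource, wfdesc:Workflow) can be illustrated by emitting a handful of triples. This is a minimal sketch using only the Python standard library. The base IRI, the file names, the manifest path, and the helper functions are assumptions made for illustration, and the output is a simplified rendering rather than a complete or validated manifest.

```python
# Namespaces for the vocabularies named on the slide.
RO = "http://purl.org/wf4ever/ro#"
ORE = "http://www.openarchives.org/ore/terms/"
WFDESC = "http://purl.org/wf4ever/wfdesc#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base IRI of an example Research Object.
BASE = "http://example.org/ro/demo/"


def describe_ro(base, resources, annotations, workflows=()):
    """Return (subject, predicate, object) IRI triples for a minimal RO.

    resources   -- relative paths of aggregated resources
    annotations -- (annotation path, annotated resource path) pairs
    workflows   -- relative paths of resources that are workflows
    """
    manifest = base + ".ro/manifest.rdf"  # assumed manifest location
    triples = [
        (base, RDF_TYPE, RO + "ResearchObject"),
        (base, ORE + "isDescribedBy", manifest),
        (manifest, RDF_TYPE, RO + "Manifest"),
    ]
    for path in resources:
        triples.append((base, ORE + "aggregates", base + path))
        triples.append((base + path, RDF_TYPE, RO + "Resource"))
    for path in workflows:
        triples.append((base + path, RDF_TYPE, WFDESC + "Workflow"))
    for annotation, target in annotations:
        triples.append((base, ORE + "aggregates", base + annotation))
        triples.append((base + annotation, RDF_TYPE, RO + "AggregatedAnnotation"))
        triples.append(
            (base + annotation, RO + "annotatesAggregatedResource", base + target)
        )
    return triples


def to_ntriples(triples):
    """Serialise IRI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = describe_ro(
    BASE,
    resources=["workflow.t2flow", "input.csv"],
    annotations=[(".ro/annotations/1", "workflow.t2flow")],
    workflows=["workflow.t2flow"],
)
print(to_ntriples(triples))
```

Plain tuples are used here to keep the sketch dependency-free; an RDF library would be the natural choice in practice.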
Challenge 2

Managing the lifecycle of
     data entities

   Evolution and Decay
Challenge 2: Managing the lifecycle of data entities
                 RO Evolution & Versioning




Challenge 2: Managing the lifecycle of data entities
                                                                       RO Decay



Workflow Decay
•   Component level
•   flux/decay/unavailability
•   Data level
•   Infrastructure level

Experiment Decay
•   Methodological changes
•   New technologies
•   New resources/components
•   New data




Preservation, Conservation, Recreating


Preserving
Archived Record
Fixed Snapshots
Review
Rerun & Replay

Conserving
Active Instrument
Live
Rerun & Reuse
Repair & Restore

Recreating
Archived Record
Active Instrument
Live
Rebuild Recycle Repurpose

Challenge 2: Managing the lifecycle of data entities
    Possible types of decay (an example)




Decay Analysis
                    A Taxonomy of RO decay



1. Service tool is missing
2. Service file descriptor disappeared
3. Service up but not contactable
4. Service up but functionality changed
5. Local software dependencies
6. Data unavailability
7. Changes in data formats
8. Chained dependency
9. Credentials deprecated
10. Input data superseded by other data
11. RO metadata outdated (upon versioning)
12. Old-fashioned RO
13. External references lose credit
14. Execution framework no longer available
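A few entries of the taxonomy above can be turned into a simple classification rule. The sketch below is illustrative only: the `Probe` structure, its fields, and the `classify` function are hypothetical and cover just five of the fourteen decay types. How a probe would actually be obtained (for example by contacting a service or comparing outputs against recorded provenance) is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decay(Enum):
    """Subset of the decay types; values follow the numbering of the taxonomy."""
    SERVICE_NOT_CONTACTABLE = 3
    SERVICE_FUNCTIONALITY_CHANGED = 4
    DATA_UNAVAILABLE = 6
    DATA_FORMAT_CHANGED = 7
    CREDENTIALS_DEPRECATED = 9


@dataclass
class Probe:
    """Result of checking one workflow dependency (hypothetical structure)."""
    kind: str  # "service" or "data"
    reachable: bool
    authorized: bool = True
    matches_recorded_output: bool = True


def classify(probe: Probe) -> Optional[Decay]:
    """Map a probe result to a decay type, or None if no decay is detected."""
    is_service = probe.kind == "service"
    if not probe.reachable:
        return Decay.SERVICE_NOT_CONTACTABLE if is_service else Decay.DATA_UNAVAILABLE
    if not probe.authorized:
        return Decay.CREDENTIALS_DEPRECATED
    if not probe.matches_recorded_output:
        return (
            Decay.SERVICE_FUNCTIONALITY_CHANGED
            if is_service
            else Decay.DATA_FORMAT_CHANGED
        )
    return None


print(classify(Probe(kind="service", reachable=False)))
print(classify(Probe(kind="data", reachable=True, matches_recorded_output=False)))
```

The order of the checks is a design choice of this sketch: reachability is tested first because the other properties cannot be observed for an unreachable dependency.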

A taxonomy of workflow decay
      Sample decay type




Decay Analysis
                                    1.0 Certificate – Evaluation of Stability and Completeness

                                               1.0 Certificate of quality

Stability

      Is the RO free from any form of decay
      preventing workflow execution?

      »    Focus on reproducibility
      »    Assisted detection of RO decay
      »    Active monitoring on decay forms
      »    RO and workflow provenance
      »    Notification
      »    Explanation

Completeness

      Is the minimal aggregation of resources
      encapsulated by the RO consistent?

      »    RO checklists
      »    Produced by scientists
      »    Automatically checked against
           minimal model (minim)
      »    RO evolution

1.0 Certificate notion originally proposed by Yde de Jong
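The two questions behind the certificate can be sketched as a small check. This is a hedged illustration, not the Minim model or any Wf4Ever tool: the role names, the `Certificate` structure, and the `evaluate` function are hypothetical. Completeness is reduced here to "every required role is filled by some aggregated resource", and stability to "no aggregated resource is flagged as decayed".

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Certificate:
    """Outcome of the two checks (hypothetical structure)."""
    complete: bool
    stable: bool
    missing_roles: List[str]
    decayed_resources: List[str]


def evaluate(
    aggregated: Dict[str, str],
    required_roles: Set[str],
    decayed: Set[str],
) -> Certificate:
    """Check an RO against a checklist of roles and a set of decayed resources.

    aggregated     -- maps each aggregated resource to the role it plays
    required_roles -- roles the checklist requires (the minimal model)
    decayed        -- resources currently flagged as decayed
    """
    missing = sorted(required_roles - set(aggregated.values()))
    broken = sorted(name for name in aggregated if name in decayed)
    return Certificate(
        complete=not missing,
        stable=not broken,
        missing_roles=missing,
        decayed_resources=broken,
    )


checklist = {"workflow", "input data", "hypothesis"}
research_object = {"analysis.t2flow": "workflow", "expression.csv": "input data"}

certificate = evaluate(research_object, checklist, decayed={"expression.csv"})
print(certificate)
```

Returning the missing roles and decayed resources alongside the two booleans supports the "Notification" and "Explanation" points: a failed check says what to repair.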
Recap
                                      Lessons learnt


              » Data with a Purpose

              » Encapsulate & Conquer
                 › Goal-driven (purpose)
                 › Aggregation
                 › Community-managed

              » Nothing is immutable, especially data.
                 › Foster evolution
                 › Monitor decay

Side labels: Scalability, Provenance
Thanks for your Attention!
                                               Questions




 Any Questions?

http://www.wf4ever-project.org/





Editor's Notes

  • #10: In this scenario, student Dennis has made a conceptual workflow that takes the result of a gene expression experiment (activity values of all genes under two conditions: with and without a chemical compound). The wet laboratory experiment was done by people other than Dennis. He makes a note of the origin (including a paper reference). The initial hypothesis is that the chemical compound disturbs gene expression. It is not yet known which genes and which biological processes are affected. The conceptual workflow first performs one of the standard data preprocessing steps for the type of data Dennis has (Affymetrix gene expression array), then uses a statistical test to filter the genes that are significantly differentially expressed between the two conditions, and finally performs an enrichment test to find the pathways that are most prominent among the filtered genes. The last step requires an annotation process, in which each gene is coupled to the pathways it was implicated in by other experiments (there is a database for that: KEGG).

    Dennis is new to workflows, so he wishes to start with an existing workflow. For each component he will search myExperiment for keywords. He then wishes to understand the workflows: look into them, perform test runs with test data and his own data, and see other people's logs. When he finds workflows he does not understand, Dennis is inclined to create his own workflow with his own scripts. He will receive scripts from colleagues and perform tests that his colleagues are familiar with. In this way he can learn what his workflow is doing, which will help him interpret his results.

    Ultimately, the workflow may suggest, for instance, that the set of differentially expressed genes has the Wnt pathway as its most common denominator. This pathway is well known for its role in embryogenesis and cancer, information he finds on the internet. He makes a note of that. It leads to the hypothesis that the chemical compound may have effects on embryogenesis and/or cancer. This is now his interpretation of the experiment, which he wishes to link to the experiment and the processed data. Dennis notes that in a next cycle he will want to run another workflow that specifically tests this hypothesis, rather than perform an enrichment test. He will then look for a workflow that performs a 'global test' and replace this part of his workflow with the global test workflow. He records this fact in his log. In this case he will link the result of this test (most likely a new hypothesis) to the previous experiment, and in particular to the initial hypothesis. At some point, he wishes to be able to retrieve this past information and the interrelationships among his hypotheses.

    Assuming his finding and new hypothesis are valuable and new, he will publish his results. The publication contains cleaned information, sufficient for evaluating his hypothesis and rerunning the one workflow and the one dataset that led to this result.

    Dennis's Working Research Object will contain:
      › A reference to the source of the data and the people to acknowledge for it
      › The initial hypothesis
      › The conceptual workflow or a summary of the experiment plan
      › References to workflows that were tested, with comments on their application to Dennis's case
      › A reference to the workflow(s) that Dennis eventually uses, including acknowledgement information (with a note on how these people want to be acknowledged)
      › Dennis's workflow, possibly with a backlog of previous versions that he wishes to keep for reference (with notes and comments)
      › Dennis's workflow run, results, and the recorded steps that led to the results, in some cases with comments for later reference (e.g. 'here I used parameter A, next time I may try B')
      › The final hypothesis, with comments
      › A reference to the results of the workflow
      › A Design log that records Dennis's considerations while making the workflow
      › A Run log that records Dennis's considerations while running and interpreting the workflow

    His Publication Research Object will contain:
      › The workflow
      › A caption for his workflow (filtered from his design and run logs: all information a reviewer needs to run the experiment)
      › A workflow run (results, and a caption filtered from the run log)
      › His initial hypothesis
      › His final hypothesis
      › The data source
      › Acknowledgements

    In time, Dennis's workflow can be found on the basis of the metadata of his Published and Working ROs. This will create a rich and wide range of search capabilities for Dennis's successors. The Working RO is kept at Dennis's local group and is the most valuable resource for reusing the work. The Published RO is available for download and reuse. It is anticipated that interested parties will contact Dennis or his group for 'reuse in collaboration' (i.e. for the group's expertise).
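The relationship between the Working RO and the Publication RO described in this note can be sketched as a filtering step. This is an illustrative toy under stated assumptions: the part names, the `derive_publication_ro` function, and the example values are hypothetical, and the caption is passed in by hand because the note says it is filtered from the design and run logs by the author.

```python
from typing import Dict

# Parts of the Working RO that also appear in the Publication RO,
# following the two lists in the note (names are illustrative).
PUBLISHED_PARTS = (
    "workflow",
    "workflow_run",
    "initial_hypothesis",
    "final_hypothesis",
    "data_source",
    "acknowledgements",
)


def derive_publication_ro(working_ro: Dict[str, str], caption: str) -> Dict[str, str]:
    """Copy the publishable parts of a Working RO and attach a workflow caption."""
    missing = [part for part in PUBLISHED_PARTS if part not in working_ro]
    if missing:
        raise KeyError(f"Working RO lacks parts needed for publication: {missing}")
    published = {part: working_ro[part] for part in PUBLISHED_PARTS}
    published["workflow_caption"] = caption
    return published


working_ro = {
    "workflow": "enrichment-analysis workflow",
    "workflow_run": "results and recorded steps",
    "initial_hypothesis": "the compound disturbs gene expression",
    "final_hypothesis": "the compound may affect embryogenesis and/or cancer",
    "data_source": "reference to the wet-lab experiment and paper",
    "acknowledgements": "people to acknowledge for data and workflows",
    "design_log": "considerations while making the workflow",
    "run_log": "considerations while running the workflow",
    "tested_workflows": "references to workflows that were tried",
}

publication_ro = derive_publication_ro(
    working_ro, caption="what a reviewer needs to rerun the experiment"
)
print(sorted(publication_ro))
```

The logs and the references to tested workflows stay in the Working RO, which matches the note's point that the Working RO remains the richer resource for reuse.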
  • #11: Emphasise the use of Linked Data. Note: the figures here are not intended to be readable. They simply emphasise the existence of the models. Example user requirements being addressed by RO:
      › UR1.3: aggregate existing resources to conveniently access related resources from a single place
      › UR1.6: describe the relationships between aggregated resources so that other researchers can see how the resources fit together
      › UR1.16: annotate experimental results using semantic models so that I can find/show links to other, relevant research objects