Data accessibility and the role of informatics in predicting the biosphere
Alex Hardisty 
Director of Informatics Projects, 
School of Computer Science & Informatics 
Coordinator, FP7 BioVeL project www.biovel.eu 
email: hardistyar@cardiff.ac.uk 
/alexhardisty (occasionally!) 
1
Structuring the biodiversity informatics community at the European level and beyond 
Biodiversity Informatics Horizons 2013 
180 experts conclude that there is “a growing need for predictive biosphere modelling”:
• Integration: Make better use of what we have 
• Cooperation: Data from the whole world is needed 
• Promotion: Europe is well placed to offer leadership 
2
What if …? 
Imagine if we could … 
… Predict community-level dynamics of ecosystems (i.e., behaviours) at scales from local to global, based on the ecology and biology of all individual organisms …
e.g., Purves et al., “Ecosystems: Time to model all life on Earth”, Nature 493 (2013)
Image: StuartMiles / FreeDigitalPhotos.net
3
Imagine if we could … 
… Measure and calculate “Essential Biodiversity Variables” (EBVs) …
… for any geographic area (continental, regional, local), by any person anywhere, using data for that area that may be held by any (research) infrastructure. Beyond that, learn how to forecast EBVs.
4
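To make “any area, by any person” concrete, here is a minimal Python sketch that computes one simple quantity: the number of distinct species recorded inside an arbitrary latitude/longitude box. Species richness stands in for an EBV purely for illustration; real EBVs are defined by the community and are usually more involved. The `Occurrence` type, the function name and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Occurrence:
    """One occurrence record (hypothetical minimal schema, WGS84 degrees)."""
    species: str
    lat: float
    lon: float


def species_richness(records: Iterable[Occurrence],
                     min_lat: float, max_lat: float,
                     min_lon: float, max_lon: float) -> int:
    """Number of distinct species recorded inside a lat/lon bounding box.

    Assumes the box does not cross the antimeridian (min_lon <= max_lon).
    """
    inside = {
        r.species
        for r in records
        if min_lat <= r.lat <= max_lat and min_lon <= r.lon <= max_lon
    }
    return len(inside)


if __name__ == "__main__":
    records = [
        Occurrence("Lithobates catesbeianus", 51.0, 4.4),
        Occurrence("Lithobates catesbeianus", 51.2, 4.5),
        Occurrence("Bufo bufo", 51.5, 3.9),
        Occurrence("Rana temporaria", 60.1, 24.9),
    ]
    # Same function, any area: here a rough box around Belgium.
    print(species_richness(records, 49.5, 51.6, 2.5, 6.4))  # 2
```

Because the area is just a parameter, the same code serves a continental, regional or local query; the hard part in practice is getting the records for that area out of whichever infrastructure holds them.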
Both depend on collaboration to deliver the evidence, i.e., on synthesis and modelling of
• Increasingly large amounts of data from multiple sources (environmental, taxonomic, genomic and ecological)
• Gathered by manual observation and automated sensors, digitisation, next-generation sequencing and remote sensing
This is beyond the abilities of any one individual or any single research community to collect, observe or generate.
Variety, Velocity and Volume of “Big Data”
5
Photo: Smokestacks against skyline and sunset, Estonia. © Curt Carnemark / World Bank Photo Collection
From an informatics perspective, how close are we to that?
9 research infrastructures from around the world exhibit “a satisfactory level of potential interoperability”
[Figure: three radar charts, each scaled 0% to 100%.
Data: topical coverage; data sharing and QC; data types; data source tracking; data citation tracking; data integration; user applications & interfaces; funding; access policy; technology; GIS; standards.
Software architecture: programming languages; authentication; authorization; middleware; computing infrastructure; standards; technology; service logic.
General: geographical coverage; infrastructure topology; native interoperability and enablers; merging of science & policy needs; merging of science & industry needs; engagement of citizens; licensing and business model.]
6
A computational challenge: Greater than that of weather forecasting; greater than that of climate prediction?
Image from climateprediction.net
Harfoot MBJ, Newbold T, Tittensor DP, Emmott S, et al. (2014) Emergent Global Patterns of Ecosystem Structure and Function from a Mechanistic General Ecosystem Model. PLoS Biol 12(4): e1001841. doi:10.1371/journal.pbio.1001841
For 1 km resolution, “… 3 to 6 orders of magnitude larger, … an exascale problem”
Jack K. Horner, Independent consultant & Adviser to KU Biodiversity Institute
7
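One way to see where a factor of this size can come from is to count grid cells. The arithmetic below assumes, for illustration only, a baseline global grid at 1° resolution, and takes Earth's surface area $A_\oplus$ as roughly 5.1 × 10⁸ km². It covers only the spatial refinement, not finer time steps or more detailed biology per cell.

```latex
\begin{aligned}
N_{1^\circ} &= 360 \times 180 = 64\,800\ \text{cells} \\
N_{1\,\mathrm{km}} &\approx \frac{A_\oplus}{1\ \mathrm{km}^2} \approx \frac{5.1\times10^{8}\ \mathrm{km}^2}{1\ \mathrm{km}^2} = 5.1\times10^{8}\ \text{cells} \\
\frac{N_{1\,\mathrm{km}}}{N_{1^\circ}} &\approx \frac{5.1\times10^{8}}{6.48\times10^{4}} \approx 7.9\times10^{3} \approx 10^{3.9}
\end{aligned}
```

Spatial refinement alone therefore adds close to four orders of magnitude; anything else that scales up with resolution pushes the total further, which makes a figure in the quoted range plausible.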
The situation today can be likened to meteorology in the 1950s, 60s and 70s (and later climatology), when the emergence of numerical weather prediction drove demand for:
• New observations
• A global infrastructure for acquiring, mobilising and normalising data, and
• Better models of global atmospheric behaviour
8
Accessible data is useful data, not just for research
[Diagram: data and information flow to national, regional and global policies/reports, via direct provision of data/information and indirect provision through reports; other labels include assessment processes and green accounting etc.]
9
Diagram courtesy of EC FP7 EU BON project
To be able to predict the biosphere, we need to mobilise data and make it accessible
10
It’s a journey towards
• Global data, covering the whole planet. There are significant gaps everywhere today.
• Making all our small-scale, local data (which often characterises present-day practice in field ecology) global.
That is to say, we have to mobilise, clean, normalise and quality-assure many small sets of data that together can give us the global data we need to calibrate models.
We are achieving that for certain classes of data, but not without difficulty.
11
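The mobilise, clean, normalise and quality-assure sequence can be sketched in Python as below. This is a minimal illustration under stated assumptions: the two source datasets and their column names are hypothetical, the shared schema borrows Darwin Core term names (`scientificName`, `decimalLatitude`, `decimalLongitude`), and the quality checks are deliberately basic.

```python
from typing import Dict, Iterable, List

# Hypothetical column names in two local datasets, mapped onto
# Darwin Core term names used as the shared target schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "survey_a": {"taxon": "scientificName", "lat": "decimalLatitude", "lon": "decimalLongitude"},
    "survey_b": {"Species": "scientificName", "Y": "decimalLatitude", "X": "decimalLongitude"},
}


def normalise(source: str, row: dict) -> dict:
    """Rename a source row's columns to the shared schema and tidy the name."""
    rec = {target: row.get(col) for col, target in FIELD_MAPS[source].items()}
    # Collapse stray whitespace so "Bufo  bufo" and "Bufo bufo" compare equal.
    rec["scientificName"] = " ".join(str(rec["scientificName"] or "").split())
    rec["source"] = source  # remember where each record came from
    return rec


def passes_qc(rec: dict) -> bool:
    """Minimal quality checks: a name plus plausible, non-placeholder coordinates."""
    try:
        lat = float(rec["decimalLatitude"])
        lon = float(rec["decimalLongitude"])
    except (TypeError, ValueError):
        return False
    if not rec["scientificName"]:
        return False
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False
    return not (lat == 0 and lon == 0)  # (0, 0) is a common placeholder error


def mobilise(datasets: Dict[str, Iterable[dict]]) -> List[dict]:
    """Normalise, quality-check and de-duplicate several small datasets."""
    seen, merged = set(), []
    for source, rows in datasets.items():
        for row in rows:
            rec = normalise(source, row)
            if not passes_qc(rec):
                continue
            rec["decimalLatitude"] = float(rec["decimalLatitude"])
            rec["decimalLongitude"] = float(rec["decimalLongitude"])
            key = (rec["scientificName"], rec["decimalLatitude"], rec["decimalLongitude"])
            if key not in seen:  # same taxon at the same point: keep the first copy
                seen.add(key)
                merged.append(rec)
    return merged
```

For example, two `survey_a` rows for “Bufo  bufo” and “Bufo bufo” at the same point collapse into one record, and a `survey_b` row at (0, 0) is rejected. Real pipelines would also check names against a checklist and coordinates against country or habitat boundaries.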
Issues arise in each of the 4 stages of mobilising data for synthesis
• Data acquisition
– Standardised measurement protocols
• Data curation
– Assigning the right metadata and persistent identifiers
– Finding a home for the data, and putting it there
• Data discovery and access
– Finding relevant data
– Machine-readable access to data, i.e., a web service (WS) front-end
• Data processing / analysis, including re-use
– Owners want attribution
– Tracking provenance and following licensing conditions
– Problems at every step, on every workflow run
http://envri.eu/rm
12
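Machine-readable access means a script, not a person, can page through a provider's data. The sketch below uses GBIF's public occurrence search web service as the example front-end. The parameter names (`scientificName`, `hasCoordinate`, `limit`, `offset`) and response fields (`results`, `endOfRecords`) reflect our understanding of that API, and the page-size default of 300 assumes the service's current cap, so check the GBIF API documentation before relying on them. The `fetch` hook is a hypothetical addition that lets the paging logic be exercised without a network.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def search_url(scientific_name: str, limit: int = 300, offset: int = 0) -> str:
    """Build one page request for georeferenced occurrences of a name."""
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",
        "limit": limit,
        "offset": offset,
    }
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Default fetcher: plain HTTP GET returning the decoded JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_occurrences(scientific_name: str, limit: int = 300, max_records: int = 1000,
                     fetch: Optional[Callable[[str], dict]] = None) -> Iterator[dict]:
    """Page through search results until the last page or max_records."""
    fetch = fetch or fetch_json
    offset, yielded = 0, 0
    while yielded < max_records:
        page = fetch(search_url(scientific_name, limit, offset))
        results = page.get("results", [])
        for rec in results:
            yield rec
            yielded += 1
            if yielded >= max_records:
                return
        # Stop on an empty page as well, so a misbehaving service cannot loop us forever.
        if not results or page.get("endOfRecords", True):
            return
        offset += limit
```

Calling `iter_occurrences("Lithobates catesbeianus", max_records=50)` would stream up to 50 georeferenced records as Python dictionaries.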
See also: “Showing you this map of aggregated bullfrog occurrences would be illegal”
http://peterdesmet.com/posts/illegal-bullfrogs.html
“Our analysis of the licenses of all 11.000+ GBIF registered datasets shows a bleak picture. Very few GBIF registered datasets can be easily and legally used, let alone without restrictions. This is mainly due to data being published with no or a non-standard license.”
Peter Desmet and Bart Aelterman, 22nd Nov 2013, peterdesmet.com
13
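The kind of audit Desmet and Aelterman describe can be sketched as a simple classification of each dataset's declared license. This is not their method: the choice of SPDX identifiers and which ones count as “open” are assumptions for illustration, and a real audit would also have to interpret free-text licenses by hand.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical grouping of a few SPDX license identifiers, for illustration only.
OPEN = {"CC0-1.0", "CC-BY-4.0"}
STANDARD_RESTRICTED = {"CC-BY-NC-4.0"}


def classify(license_id: Optional[str]) -> str:
    """Put one dataset's declared license into a coarse usability class."""
    if license_id is None or not license_id.strip():
        return "no license"
    lid = license_id.strip().upper()  # SPDX identifiers are matched case-insensitively
    if lid in OPEN:
        return "open"
    if lid in STANDARD_RESTRICTED:
        return "standard, restricted"
    return "non-standard"


def summarise(licenses: Iterable[Optional[str]]) -> Counter:
    """Tally classes across a collection of dataset license declarations."""
    return Counter(classify(lic) for lic in licenses)


if __name__ == "__main__":
    declared = ["CC0-1.0", None, "Free for academic use", "cc-by-nc-4.0", "CC-BY-4.0", ""]
    print(summarise(declared))
```

Anything that is neither empty nor a recognised identifier ends up as “non-standard”, which is exactly the category that makes re-use legally uncertain.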
Data re-use: Owners want attribution
Example 1) Taxonomic data refinement workflow (components include BioSTIF and CoL)
Tool license(s)
CoL (Catalogue of Life): 3 levels of attribution
• the complete work
• the contributing database of the record
• the expert who provides taxonomic scrutiny of the individual record
GBIF data use agreement
• Respect restrictions of access to sensitive data.
• The identifier of ownership of the data must be retained with every data record (through the workflow).
• Publicly acknowledge the Data Publishers whose biodiversity data have been used.
• Comply with any additional terms and conditions of use set by the Data Publisher.
15
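The GBIF requirement that an ownership identifier stays with every record through the workflow can be expressed as a data structure in which that field is carried along by every step. The sketch below is hypothetical throughout: the `Record` type, the one-entry synonym table standing in for a checklist service such as CoL, and the step-label format are illustrative, not BioVeL's implementation.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    scientific_name: str
    publisher_id: str                 # ownership identifier: must survive every step
    provenance: Tuple[str, ...] = ()  # workflow steps applied so far, in order


# Hypothetical one-entry synonym table standing in for a checklist lookup.
SYNONYMS: Dict[str, str] = {"Rana catesbeiana": "Lithobates catesbeianus"}


def refine_names(records: Iterable[Record], checklist: str) -> List[Record]:
    """Swap synonyms for accepted names and log which checklist was used."""
    refined = []
    for r in records:
        accepted = SYNONYMS.get(r.scientific_name, r.scientific_name)
        # dataclasses.replace copies all other fields, so publisher_id is kept.
        refined.append(replace(r, scientific_name=accepted,
                               provenance=r.provenance + (f"name-check:{checklist}",)))
    return refined


def acknowledgements(records: Iterable[Record]) -> List[str]:
    """Distinct data publishers to acknowledge, in a stable (sorted) order."""
    return sorted({r.publisher_id for r in records})


if __name__ == "__main__":
    recs = [Record("Rana catesbeiana", "publisher-A"), Record("Bufo bufo", "publisher-B")]
    out = refine_names(recs, "CoL")
    print(out[0])
    print(acknowledgements(out))  # ['publisher-A', 'publisher-B']
```

Because each step copies the record rather than rebuilding it, the publisher identifier cannot be dropped by accident, and the acknowledgement list can be produced at the end of any run.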
More problems at every step, on every run
Example 2) Niche modelling workflow
Create model
• High-quality occurrence data set
• Select layers with environmental factors that are likely to influence the distribution of the species
• Select algorithm
• Select parameter values for the chosen algorithm
• Assemble the model on the openModeller service
Model test
• Test the performance of the parameters in the model
• Test the performance of the distribution prediction of the model
Model projection
• Project the model with the original layers
• Select prediction layers
• Project the model with the prediction layers
• Visualize and publish results
Iterate by changing the algorithm, parameter values, and set of layers
Problems annotated along the workflow:
• License on algorithm; license on software
• Licenses on environmental data layers
• Permissions to use; AuthN/AuthZ
• Moving data from one service to another
• 3rd party software
• All issues associated with publication
16
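The create, test and project phases can be illustrated with a simplified BIOCLIM-style envelope model: a location is predicted suitable if every environmental value lies within the range seen at presence points. This is a deliberately minimal sketch, not openModeller's implementation; openModeller offers many algorithms, and the layers and values below are hypothetical.

```python
from typing import List, Sequence, Tuple

Env = Sequence[float]                 # values of each environmental layer at one location
Envelope = List[Tuple[float, float]]  # (min, max) per layer


def create_model(presences: List[Env]) -> Envelope:
    """Create model: (min, max) of every layer across the presence points."""
    return [(min(col), max(col)) for col in zip(*presences)]


def predict(model: Envelope, env: Env) -> bool:
    """A location is suitable if every layer value lies inside the envelope."""
    return all(lo <= v <= hi for (lo, hi), v in zip(model, env))


def omission_rate(model: Envelope, test_presences: List[Env]) -> float:
    """Model test: share of held-out presence points the model misses."""
    missed = sum(not predict(model, e) for e in test_presences)
    return missed / len(test_presences)


def project(model: Envelope, cells: List[Env]) -> List[bool]:
    """Model projection: apply the model to every cell of a set of layers."""
    return [predict(model, e) for e in cells]


if __name__ == "__main__":
    # Hypothetical layers: mean annual temperature (°C), annual precipitation (mm).
    train = [(10.0, 800.0), (12.0, 900.0), (11.0, 700.0)]
    model = create_model(train)                                  # [(10.0, 12.0), (700.0, 900.0)]
    print(omission_rate(model, [(11.5, 850.0), (14.0, 800.0)]))  # 0.5
    print(project(model, [(10.5, 750.0), (13.0, 750.0)]))        # [True, False]
```

Re-running `create_model` with a different set of layers, or swapping in another algorithm, corresponds to the iteration loop in the workflow; in practice each such change also re-raises the licensing and permission questions listed for this workflow.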
In a recent EU BON study:
Only 35% of surveyed datasets (a wider scope than just GBIF) are accessible under an open license or waiver, without restriction on use.
For 29 scientific questions relating to the needs of European environmental policy, the availability of datasets to answer the questions ranges from ‘satisfactory’ (3) to ‘poor’ (2).
17
Multiple initiatives aim to make data more accessible; some are general purpose:
https://rd-alliance.org/
… builds the social and technical bridges that enable open sharing of data … researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.
http://www.datafairport.org/
… successful community supported conventions, policies and practices for data identifiers, formats, checklists and vocabularies that enable data interoperability, citation and stewardship.
ORCID and DataCite: initiatives to uniquely identify scientists and datasets, respectively.
18
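Persistent identifiers are most useful when software can check them. ORCID iDs, for instance, end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can usually be caught locally. The sketch below implements that check and builds a resolver link for a DOI, such as those DataCite issues for datasets; the function names are our own.

```python
import re

ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD has the four-block layout and its final character checks out."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:-1]) == digits[-1]


def doi_url(doi: str) -> str:
    """Resolvable link for a DOI; DataCite and Crossref DOIs both resolve via doi.org."""
    return "https://doi.org/" + doi.strip()


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False
    print(doi_url("10.1371/journal.pbio.1001841"))
```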
Some are more domain specific:
Promoting free and open access to biodiversity information
A framework to focus effort and investment to deliver biodiversity knowledge more effectively
www.biodiversityinformatics.org/
www.bouchout-declaration.org
19
A shared and maintained multi-purpose network of computationally-based processing services in an open data domain
Image: CoolDesign / FreeDigitalPhotos.net
With 78 contributors, we published the white paper in April 2013; it has since been viewed more than 34,000 times.
20
Building a heterogeneous Service Network
Users’ workflows and applications
Sustained Service and Data Providers: GBIF, CoL, OBIS, WoRMS, EMBL-EBI, BGBM, CRIA, EoL, BHL, ALA, LTER, etc.
www.biodiversitycatalogue.org
Recognised and stable Infrastructure Providers: National, EGI.eu, PRACE, commercial, EUDAT, etc.
21
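The layered service network can be sketched as a catalogue in which providers register services by capability and a user's workflow is assembled from whatever is registered. Everything here is hypothetical: the capability names, the provider and infrastructure labels, and the “first registered wins” policy; it does not model the interface of www.biodiversitycatalogue.org.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Service:
    name: str
    provider: str        # service/data provider layer, e.g. "GBIF"
    infrastructure: str  # infrastructure layer it runs on, e.g. "EGI.eu"
    capability: str      # what it does, e.g. "occurrence-search"


class ServiceCatalogue:
    """Hypothetical in-memory catalogue: services registered and found by capability."""

    def __init__(self) -> None:
        self._by_capability: Dict[str, List[Service]] = {}

    def register(self, service: Service) -> None:
        self._by_capability.setdefault(service.capability, []).append(service)

    def find(self, capability: str) -> List[Service]:
        return list(self._by_capability.get(capability, []))


def plan_workflow(catalogue: ServiceCatalogue, steps: List[str]) -> List[Service]:
    """Pick one service per workflow step; fail early if a step has no provider."""
    chosen = []
    for capability in steps:
        candidates = catalogue.find(capability)
        if not candidates:
            raise LookupError(f"no registered service offers {capability!r}")
        chosen.append(candidates[0])  # simplest policy: first registered wins
    return chosen


if __name__ == "__main__":
    cat = ServiceCatalogue()
    cat.register(Service("occurrence search", "GBIF", "national", "occurrence-search"))
    cat.register(Service("name resolution", "CoL", "EGI.eu", "name-resolution"))
    plan = plan_workflow(cat, ["occurrence-search", "name-resolution"])
    print([s.provider for s in plan])  # ['GBIF', 'CoL']
```

Binding workflows to capabilities rather than to specific endpoints is one way to let providers and infrastructures change underneath without breaking users' applications.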
Preparing the next, coordinated steps
Diagram from LinkD Concept Note, September 2014
22
LinkD
Develop the highly responsive digital framework required to enable high-throughput research and support science of scale, towards the long-term vision of modelling Life on Earth
LinkD: Science of Scale for Life on Earth
What do we want to do in LinkD?
ELODINS, ENVRI+
From slides by Vince Smith, LinkD proposal coordinator, Natural History Museum, London
Take-home message: “It’s a journey”
• Accessible data is the enabler of “in-silico” science that leads towards predicting the biosphere
• A shared, multi-purpose network of processing services sitting on top of open data is the route to interoperability
• Working together as a community is essential
24 
Photo: A lone farmer walks among rice paddies. © DFATD-MAECD/Tick Collins

Editor's Notes

  • #24: Inspired by roadmap publications such as GBIO and the white paper. Mandated by European and global societal challenges. Supported by the maturity of the available foundational e-infrastructures. Science of Scale: maximising the efficiency of the available data, services and tools; this is what the European Commission calls “Science 2.0”. In short, it means using economies of scale in data collection and associated infrastructure to do big things.