FAIR play?
Investigating the state of FAIR practice and
what is needed to turn FAIR data into reality
Sarah Jones
Digital Curation Centre
Rapporteur of EC Expert Group on FAIR data
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC
Presentation reflects the views of the author and group only
Turning FAIR into Reality: Report and Action Plan
https://doi.org/10.2777/1524
Report and Action Plan: takes a holistic approach, laying out what needs to be done to make FAIR a reality, in general and for EOSC
Addresses the following key areas:
1. Concepts for FAIR
2. Creating a FAIR culture
3. Creating a technical ecosystem for FAIR
4. Skills and capacity building
5. Incentives and metrics
6. Investment and sustainability
Recommendations and Actions: 27 clear recommendations, structured
by these topics, are supported by precise actions for stakeholders.
Report is out!
FAIR Expert Group
Consultations: https://github.com/FAIR-Data-EG
EG members
Global landscape of FAIR
Practices across disciplines and geographic boundaries
DOBES case study
The DOBES initiative (http://dobes.mpi.nl) was established in 2000 to document critically endangered languages.
Like FAIR, the DOBES principles address core
requirements necessary to support identification,
discovery and reuse of digital objects. In addition,
they stress the importance of digital preservation, an
aspect that could usefully be added to FAIR.
A number of principles were agreed in the first 2 years:
• Persistent identifiers should be assigned
• All objects should be accompanied by metadata
• Metadata standards should be used
• A structured catalogue should be provided
• All metadata should be public and available for harvesting via the OAI-PMH protocol (see the harvesting sketch after this list)
• Data should be open by default, but available under
restrictions where necessary
• A limited set of archival data formats should be used
• Multiple copies should be maintained, ideally via
Trusted Digital Repositories
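To make the OAI-PMH requirement concrete, the sketch below harvests record identifiers and titles from an OAI-PMH endpoint. It is a minimal illustration only: the base URL is a placeholder, and a real harvester would also follow resumption tokens for paging.

```python
# Minimal OAI-PMH harvesting sketch. The endpoint URL is a placeholder;
# a real archive publishes its own OAI-PMH base URL.
import requests
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records(base_url, metadata_prefix="oai_dc"):
    """Yield (identifier, title) pairs from the first page of ListRecords."""
    resp = requests.get(base_url,
                        params={"verb": "ListRecords",
                                "metadataPrefix": metadata_prefix},
                        timeout=30)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    for record in root.findall(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        yield identifier, title

# Usage (placeholder endpoint):
# for pid, title in list_records("https://archive.example.org/oai"):
#     print(pid, title)
```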
IVOA case study
Astronomy has been a pioneer of open data sharing, and remains at the forefront. Jointly using data from
different instruments or gathered at different times is at the core of the discipline’s science process.
The discipline established the International Virtual
Observatory Alliance (IVOA http://www.ivoa.net)
in 2002 to develop its interoperability framework
at the international level. It is fully operational and
continuously updated to deal with evolving
requirements.
It progressively developed the standards necessary to Find, Access and Interoperate data, which have been taken up by the archives of space and ground-based telescopes and by major disciplinary data centres.
The first step was the definition of a standard for
observational data called Flexible Image Transport
System (FITS) in 1979. This includes data and
metadata, allowing data Reuse.
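To illustrate how FITS bundles data and metadata in a single object, the sketch below opens a FITS file with the astropy library. The file name is a placeholder, and the header keywords shown are commonly used ones whose presence depends on the instrument.

```python
# Reading data and metadata from a FITS file with astropy.
# "observation.fits" is a placeholder file name.
from astropy.io import fits

with fits.open("observation.fits") as hdul:
    hdul.info()                    # list the header/data units (HDUs)
    header = hdul[0].header        # metadata: keyword/value cards
    data = hdul[0].data            # the image or table data itself
    # TELESCOP and DATE-OBS are commonly used keywords; presence varies.
    print(header.get("TELESCOP"), header.get("DATE-OBS"))
```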
The VO is an interoperability layer to be
implemented by data providers on top of their
data holdings. It is a global, open and inclusive
framework: anyone can “publish” a data resource
in the VO, and anyone can develop and share a
VO-enabled tool to access and process data found
in the VO.
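As a sketch of what querying the VO interoperability layer looks like in practice, the example below runs an ADQL query against a TAP service using the pyvo library. The service URL is a placeholder; any TAP service registered in the VO could be substituted, and the ObsCore table may not be exposed by every service.

```python
# Querying a Virtual Observatory TAP service with ADQL via pyvo.
# The service URL is a placeholder; ivoa.obscore is the standard ObsCore
# table offered by many archives, though availability varies by service.
from pyvo.dal import TAPService

tap = TAPService("https://tap.example-observatory.org/tap")
result = tap.search("SELECT TOP 5 * FROM ivoa.obscore")
print(result.to_table())
```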
Dutch use cases
• Six detailed use cases from
engineering, social science, climate
science, physics, health care
• Highlights differences in culture and practice, and the need for domain-specific guidelines
• Tension between domain approaches and cross-domain interoperability
https://doi.org/10.5281/zenodo.1246815
Physics
• A 70-year-long tradition of international research with inherent data exchange.
• FAIR works well at a high level, but the principles contain assumptions about the research methods used.
• Data analysis by machines alone
is not feasible for complex
physics datasets.
Climate science
• FAIR is applied implicitly: data sharing is ingrained because climate research is international.
• Established global standards and an exchange platform of data centres.
• Transition from GRIB to netCDF
to exchange more with other
communities.
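The move from GRIB to netCDF matters for exchange because netCDF files are self-describing: the metadata travels with the data. A minimal sketch, assuming a hypothetical CF-style file and variable name:

```python
# Inspecting a self-describing netCDF file with xarray.
# "temperature.nc" and the variable name "t2m" are placeholders.
import xarray as xr

ds = xr.open_dataset("temperature.nc")
print(ds.attrs)                    # global metadata (title, institution, conventions, ...)
print(ds["t2m"].attrs)             # variable metadata (units, long_name, ...)
print(ds["t2m"].mean(dim="time"))  # analysis works directly on labelled dimensions
```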
FAIR repositories
• Repository compliance is not high
– 38% not compliant in terms of Findability
– 52% not compliant in terms of Interoperability
– 46% not compliant in terms of Reusability
• 49% of repositories did not assign a DOI, Handle or URN (see the DOI resolution sketch below)
• None of the repositories had visible
ontologies or controlled vocabularies
• Social science and climate science
repositories fared worse
* Based on a study of 37 Dutch repositories
https://doi.org/10.2218/ijdc.v12i2.567
• Almost all provide open access to metadata, but the majority (70%) do not provide open access to data
• 60% of Nordic repositories did not
assign a persistent identifier
• 56% do not employ metadata
standards
• 80% of the repositories are not
certified
* Based on a study of 61 Nordic repositories in the NeIC report The State of Open Science in the Nordic Countries
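Persistent identifiers are what make repository records machine-actionable. As an illustration, a DOI can be resolved and its metadata fetched via content negotiation on doi.org (supported for DOIs registered with agencies such as DataCite and Crossref); the example uses the Zenodo DOI of the Dutch use-cases report cited above.

```python
# Resolving a DOI and retrieving its metadata via content negotiation on doi.org.
import requests

doi = "10.5281/zenodo.1246815"   # Dutch use-cases report cited above
resp = requests.get(f"https://doi.org/{doi}",
                    headers={"Accept": "application/vnd.citationstyles.csl+json"},
                    timeout=30)
resp.raise_for_status()
meta = resp.json()
print(meta.get("title"), "|", meta.get("publisher"))
```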
AGU FAIR project
• Project convened by the American Geophysical Union
• Develops standards to connect researchers, publishers and repositories
• Builds on the Coalition on Publishing Data in the Earth and Space Sciences
(COPDESS) Statement
• 50+ organisational signatories so far
http://www.copdess.org/enabling-fair-data-project
And more….
FAIR work in Australia
• ARDC programme on
FAIR in disciplines
• Projects funded to make
data more FAIR
https://www.ands.org.au/working-with-data/fairdata
Many international funders covering FAIR in policy or considering alignment
• European Commission
• Health Research Board
Ireland
• Tri-agency in Canada
• ….
Concepts for FAIR
CC-BY-NC-ND by Cyril Attias
https://www.flickr.com/photos/newyork/4990002971
Define FAIR and apply broadly
• FAIR should be applied broadly to all objects (including
metadata, identifiers, software and DMPs) that are essential
to the practice of research, and should inform metrics relating
directly to these objects.
• Research communities must define how the FAIR principles
and related concepts apply in their context.
Additional concepts to make explicit
Making FAIR a reality depends on additional concepts that are
implied by the principles, including:
• The timeliness of sharing
• Data appraisal and selection
• Long-term preservation and stewardship
• Assessability – to assess quality, accuracy, reliability
• Legal interoperability – licences, ideally automated (machine-actionable)
FAIR Digital Objects
• Digital objects can include data,
software, and other research
resources
• Universal use of PIDs
• Use of common formats
• Data accompanied by code
• Rich metadata
• Clear licensing
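Purely as an illustration, a descriptive record for a FAIR digital object covering those points might look like the sketch below; the field names loosely echo schema.org/DataCite usage and all values are placeholders, not a prescribed profile.

```python
# Illustrative description of a FAIR digital object; all values are placeholders.
fair_digital_object = {
    "identifier": "https://doi.org/10.1234/example",           # persistent identifier
    "title": "Example observational dataset",
    "description": "Rich, machine-readable metadata describing the object.",
    "format": "application/x-netcdf",                          # common format
    "software": "https://github.com/example/analysis-code",     # data accompanied by code
    "license": "https://creativecommons.org/licenses/by/4.0/",  # clear licensing
}
```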
FAIR ecosystem
• Essential components of the
FAIR ecosystem
• Record all components in
registries
• Ideally automated workflows
between them
• Ecosystem should work for
humans and machines
FAIR and Open
• Concepts of FAIR and Open
should not be conflated.
Data can be FAIR or Open,
both or neither
• The greatest potential
reuse comes when data are
both FAIR and Open
• Align and harmonise FAIR
and Open data policy
Research culture
CC-BY-NC-ND by Nathan Reading
https://www.flickr.com/photos/nathanreading/6799589666
Major change in research practice
● Some communities already share and use FAIR data, some are making
progress, some are reluctant
● FAIR data availability does change the way science is done
● Disciplines know their data best and have to lead FAIR implementation
● Interdisciplinary work should be enabled to tackle Grand Challenges
● Incentives and rewards are fundamental to enable the change
Interoperability frameworks
● Support research communities to develop and maintain their
interoperability frameworks for FAIR sharing
● Engage in international collaboration fora to do this
● Exchange of good practices, define case studies and success stories
● Common standards to support disciplinary frameworks and promote
interoperability and reuse across disciplines
Data management via DMPs
A core element of research projects
● DMPs should cover all research outputs
● DMPs should be living documents
● DMPs should be tailored to disciplinary needs
● DMPs should be machine-actionable – use the information in them! (see the sketch below)
● Harmonisation of DMP requirements across funders and organisations
The DMP acts as a hub of information on FAIR digital objects, connecting to the wider elements of the ecosystem.
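What "machine-actionable" could mean in practice: the DMP content is exposed as structured data that funder and repository systems can read. The fragment below is loosely modelled on the JSON shape of the RDA DMP Common Standard; the field names are indicative only and should be checked against the standard itself.

```python
# Illustrative machine-actionable DMP fragment (field names are indicative only,
# loosely following the RDA DMP Common Standard JSON serialisation).
import json

madmp = {
    "dmp": {
        "title": "Data management plan for project X",
        "dataset": [{
            "title": "Survey responses",
            "dataset_id": {"identifier": "https://doi.org/10.1234/example",
                           "type": "doi"},
            "distribution": [{
                "format": "text/csv",
                "license": [{"license_ref":
                             "https://creativecommons.org/licenses/by/4.0/"}],
            }],
        }],
    }
}
print(json.dumps(madmp, indent=2))   # what funder/repository systems could harvest
```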
Recognition and rewards
● Recognise the diversity of research contributions and encourage a
culture change to include these in CVs, applications and activity reports
● Give credit to all roles related to data management and sharing
● Evidence of past FAIR practice should be included in assessments of
research contribution
● Contribution to the development and operation of certified and trusted infrastructures that support FAIR data should be recognised, rewarded and incentivised
Technology
CC-BY-NC by Omar Barcena
https://www.flickr.com/photos/omaromar/8428266717
FAIR ecosystem
• Need to clearly define infrastructure
components essential in specific
contexts and fields
• Ecosystem and its components should
work for humans and machines
• Testbeds need to be used to evaluate, evolve and innovate the ecosystem
Semantic technologies and automated processing
• Semantic technologies are essential for interoperability and need to be developed, expanded and applied both within and across disciplines (see the sketch below).
• Automated processing should be supported and facilitated by FAIR
components. This means that machines should be able to interact with
each other through the system, as well as with other components of the
system, at multiple levels and across disciplines.
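A small sketch of what semantic technologies buy in practice: describing a dataset as RDF triples using shared vocabularies (DCAT and Dublin Core here) so that systems in other disciplines can interpret it without custom parsing. Built with the rdflib library; the dataset URI is a placeholder.

```python
# Describing a dataset with shared vocabularies (DCAT, Dublin Core) as RDF.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
dataset = URIRef("https://doi.org/10.1234/example")  # placeholder PID
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example climate dataset")))
g.add((dataset, DCTERMS.license,
       URIRef("https://creativecommons.org/licenses/by/4.0/")))

print(g.serialize(format="turtle"))   # machine-readable, cross-domain description
```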
Trusted Digital Repositories
• Data services must be encouraged and supported
to obtain certification, as frameworks to assess
FAIR services emerge.
• Existing community-endorsed methods to assess
data services, in particular CoreTrustSeal (CTS) for
trusted digital repositories, should be used as a
starting point.
Culture and technology
While there is much existing infrastructure to
build on, the further development and extension
of FAIR components is required.
These tools and services should fulfil the needs of
data producers and users, and be easy to adopt.
Image CC-BY by Nicolas Raymond
https://www.flickr.com/photos/80497449@N04/8691983876
Implementation
CC-BY by zzpza
www.flickr.com/photos/zzpza/3269784239
Key drivers
Incentives, metrics, skills and investment: the cultural and social aspects that drive the ecosystem and enact change
Skills
• Two cohorts of professionals to support FAIR data:
- data scientists embedded in research projects
- data stewards who will ensure the curation of FAIR data
• Coordinate, systematise and accelerate the pedagogy
• Support formal and informal learning
• Ensure researchers have
foundational data skills
(Diagram: Create / Analyse – Preserve / Share)
FAIR metrics
• A set of metrics for FAIR Digital Objects should be developed and implemented, starting from the basic common core of descriptive metadata, PIDs and access (a toy check is sketched below).
• Build on existing work in this space – new RDA Working Group
• Certification schemes are needed to assess all components of the
ecosystem as FAIR services.
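To indicate what an automated check of that basic common core might look like, here is a toy sketch; it is not one of the emerging community metrics, just an illustration of the approach.

```python
# Toy FAIR check over a metadata record: tests only a basic common core
# (PID, descriptive metadata, licence). Illustrative, not a standard metric.
def basic_fair_score(record: dict) -> dict:
    checks = {
        "has_pid": str(record.get("identifier", "")).startswith(
            ("https://doi.org/", "https://hdl.handle.net/")),
        "has_description": bool(record.get("title") and record.get("description")),
        "has_licence": bool(record.get("license")),
    }
    return {**checks, "score": sum(checks.values()) / len(checks)}

# Example, using the illustrative record from the FAIR Digital Objects slide:
# print(basic_fair_score(fair_digital_object))
```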
FAIR services
Many aspects of FAIR apply to services (findability, accessibility,
use of standards…) but you also want to check:
• Appropriate policy is in place
• Robustness of business processes
• Expertise of current staff
• Value proposition / business model
• Succession plans
• Trustworthiness
From metrics to incentives
• Use metrics to measure practice but beware misuse
• Generate genuine incentives – career progression for data sharing &
curation, recognise all outputs of research, include in recruitment
and project evaluation processes…
• Implement ‘next-generation’ metrics
• Automate reporting as far as possible
Investment
• Provide strategic and coordinated funding to maintain the
components of the FAIR ecosystem
• Ensure funding is sustainable – no unfunded mandates
• Open EOSC to all providers, but ensure services are FAIR
FAIR Action Plan
A short tweetable recommendation
– Underpinned by several practical and specific action points
– Action points to be linked to stakeholders and timeframes
The FAIR Action Plan is directed at the EC, Member States and the international level, but will also apply in the context of EOSC to inform its roadmap
(Image slides: the Recommendations and the FAIR Action Plan)
Context-specific FAIR Action Plans
• The Expert Group has developed an
overarching FAIR Action Plan
• The hope is that this will inspire the definition of more detailed FAIR Action Plans at research community and Member State level
• What are the priority actions in your
area and for which stakeholders?
Thanks! Questions?
http://tinyurl.com/FAIR-EG https://doi.org/10.2777/1524
