FAIR digital research assets: beyond the acronym
Susanna-Assunta Sansone, PhD
@SusannaASansone
ORCiD 0000-0001-5306-5690
Consultant, Founding Academic Editor, Associate Director, Principal Investigator
Neuroinformatics, Kuala Lumpur, 20-21 August 2017
To do better science, more efficiently, we need data that are…
• Available in a public repository
• Findable through some sort of search facility
• Retrievable in a standard format
• Self-described so that third parties can make sense of them
• Intended to outlive the experiment for which they were collected
A set of principles for those wishing to enhance the value of their data holdings
Wider adoption of the FAIR principles by research infrastructure programmes, e.g.
Defining FAIRness
Defining a framework for evaluating FAIRness, by the fairmetrics.org Working Group
NOTE:
The Principles are high-level; they do not prescribe any specific technology, standard, or implementation solution
The Principles emphasize enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals
Interoperability standards – the pillars of FAIR
The invisible machinery
• Identifiers and metadata to be implemented by technical
experts in tools, registries, catalogues, databases, services
• It is essential to make standards ‘invisible’ to lay users, who
often have little or no familiarity with them
http://nometadata.org/logo
Metadata standards – fundamentals
• Descriptors for a digital object that help to understand what it is, where to find it, how to access it, etc.
• The type of metadata depends also on the type of digital object (e.g. software, dataset)
• The depth and breadth of metadata vary according to their purpose
§ e.g. reproducibility requires richer metadata than citation (see the sketch below)
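To make the purpose-dependent depth concrete, here is a minimal sketch in Python contrasting a citation-level record with a richer reproducibility-level record for the same (hypothetical) dataset; all field names and values are illustrative, not taken from any particular standard.

```python
# Illustrative only: field names and values are hypothetical, not from any specific standard.

# Citation-level metadata: enough to cite and locate the dataset.
citation_metadata = {
    "title": "Cortical neuron transcriptome profiles",
    "creators": ["Doe, J.", "Roe, R."],
    "identifier": "doi:10.1234/example.5678",   # made-up DOI
    "publication_year": 2017,
    "repository": "ExampleRepo",                # hypothetical repository name
}

# Reproducibility-level metadata: adds the domain descriptors needed to
# interpret, verify and re-run the work (the what, who, when, how and why).
reproducibility_metadata = {
    **citation_metadata,
    "study_design": "case-control",
    "organism": "Mus musculus",
    "sample_type": "primary cortical neurons",
    "assay": "RNA-seq",
    "instrument": "Illumina HiSeq 2500",
    "protocol_refs": ["protocol-extraction-v2", "protocol-library-prep-v1"],
    "processing_software": {"aligner": "HISAT2", "version": "2.0.5"},
    "license": "CC-BY-4.0",
}
```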
Metadata standards – datasets
• Domain-level descriptors that are essential for interpretation, verification and reproducibility of datasets
• The depth and breadth of descriptors vary according to the domain, broadly covering the what, who, when, how and why, allowing:
§ experimental components (e.g. design, conditions, parameters),
§ fundamental biological entities (e.g. samples, genes, cells),
§ complex concepts (such as bioprocesses, tissues and diseases),
§ the analytical processes and mathematical models, and
§ their instantiation in computational simulations (from the molecular level through to whole populations of individuals)
to be harmonized with respect to structure, format and annotation (see the sketch below)
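As a rough illustration of how such descriptors can be captured in a harmonizable structure, here is a minimal, hypothetical sketch loosely inspired by ISA-style nesting (investigation/study/assay); the keys and values are assumptions for illustration, not the ISA-Tab/ISA-JSON specification.

```python
# Hypothetical, ISA-inspired nesting (not the ISA-Tab/ISA-JSON specification):
# experimental components, biological entities and analytical steps are
# captured as structured, harmonizable descriptors rather than free text.
investigation = {
    "identifier": "INV-0001",                        # made-up identifier
    "studies": [
        {
            "title": "Proteomics of neurodegeneration",
            "design": "case-control",                # experimental component
            "samples": [                             # fundamental biological entities
                {"name": "patient-01-CSF",
                 "organism": "Homo sapiens",
                 "disease": "Alzheimer's disease"},  # complex concept, ideally an ontology term
            ],
            "assays": [
                {"measurement": "protein expression profiling",
                 "technology": "mass spectrometry",
                 "data_files": ["patient-01.mzML"]}, # data in a standard format
            ],
            "analyses": [
                {"step": "peptide identification",
                 "software": "example-tool",         # hypothetical tool name
                 "parameters": {"fdr_threshold": 0.01}},
            ],
        }
    ],
}
```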
Metadata for discovery: model and related formats
Metadata for discovery, but not only
Domain-specific metadata standards for datasets
• Guidelines, e.g.: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
• Formats, e.g.: SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SED-ML, GELML, ISA, CML, MITAB, …
• Terminologies, e.g.: AAO, ChEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
• Produced by de jure standard organizations and de facto grass-roots groups
• Formats, Terminologies, Guidelines: 220+, 115+, 548+ (~1000 in total)
https://doi.org/10.6084/m9.figshare.3795816.v2
https://doi.org/10.6084/m9.figshare.4055496.v1
A complex landscape
• Perspective and focus vary, ranging:
§ from standards with a specific biological or clinical domain of study (e.g. neuroscience) or significance (e.g. model processes)
§ to the technology used (e.g. imaging modality)
• Motivation is different, spanning:
§ creation of new standards (to fill a gap)
§ mapping and harmonization of complementary or contrasting efforts
§ extensions and repurposing of existing standards
• Stakeholders are diverse, including those:
§ involved in managing, serving, curating, preserving, publishing or regulating data and/or other digital objects
§ in academia, industry, governmental sectors, and funding agencies
§ who are producers but also consumers of the standards, as domain (and not just technical) expertise is a must
Standards’ life cycle
• Formulation
§ use cases, scope, prioritization and expertise
• Development
§ iterations, tests, feedback and evaluation
§ harmonization of different perspectives and available options
• Maintenance
§ (exemplar) implementations, technical documentation, education
material, metrics
§ sustainability, evolution (versions) and conversion modules
Fragmentation, duplications and gaps
• Technologically-delineated views of the world: transcriptomics (arrays & scanning, …), proteomics (columns, gels, MS/MS, …), metabolomics (FTIR, NMR, columns, …)
• Biologically-delineated views of the world: plant biology, epidemiology, neuroscience, …
• Generic features (‘common core’): description of source biomaterial, experimental design components
Modularization to combine and validate
• Technology modules (arrays & scanning; columns, gels, MS/MS; FTIR, NMR, …) and domain modules (transcriptomics, proteomics, metabolomics; plant biology, epidemiology, neuroscience)
• Combined, for example, in proteomics-based investigations of neurodegenerative diseases, or in proteomics- and metabolomics-based investigations of neurodegenerative diseases
Working in/across multiple domains is challenging
• Requires
§ Mapping between/among heterogeneous representations
§ A conceptual modelling framework to encompass the domain-specific metadata standards
§ Tools to handle customizable annotation, multiple conversions and validation (see the sketch below)
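A minimal sketch of what such mapping and validation tooling might look like: two hypothetical domain-specific schemas are renamed to a shared ‘common core’ vocabulary and then checked for completeness. The schemas, field names and required core are all assumptions for illustration.

```python
# Hypothetical field mappings between two domain-specific metadata schemas
# and a shared 'common core' vocabulary.
TRANSCRIPTOMICS_TO_CORE = {
    "organism_name": "organism",
    "array_design": "assay_technology",
    "extract_protocol": "protocol",
}

PROTEOMICS_TO_CORE = {
    "species": "organism",
    "ms_instrument": "assay_technology",
    "sample_prep": "protocol",
}

# Hypothetical 'common core' every harmonized record must provide.
REQUIRED_CORE_FIELDS = {"organism", "assay_technology", "protocol"}


def to_common_core(record: dict, mapping: dict) -> dict:
    """Rename domain-specific keys to the shared common-core vocabulary."""
    return {mapping.get(key, key): value for key, value in record.items()}


def validate_core(record: dict) -> list:
    """Return the missing common-core descriptors (empty list if valid)."""
    return sorted(REQUIRED_CORE_FIELDS - record.keys())


if __name__ == "__main__":
    raw = {"species": "Homo sapiens",
           "ms_instrument": "Orbitrap",
           "sample_prep": "in-gel digestion"}
    harmonized = to_common_core(raw, PROTEOMICS_TO_CORE)
    print(harmonized)                 # keys now use the common-core vocabulary
    print(validate_core(harmonized))  # [] -> nothing missing
```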
Technical and social engineering required
• Pain points include
§ Fragmentation
§ Coordination, harmonization, extensions
§ Credit, incentives for contributors
§ Governance, ownership
§ Indicators and evaluation methods
§ Outreach and engagement with all stakeholders
§ Synergies between basic and clinical/medical areas
§ Implementations: infrastructures, tools, services
§ Education, documentation and training
§ Funding streams
§ Business models for sustainability
Too many cooks in the standards’ kitchen?
Standards fusion… anyone?
Doing my fair share
OBO Portal and Foundry
doi: 10.1126/science.1180598
doi: 10.1038/nbt1346
doi: 10.1038/nbt.1411
Improving discoverability of standards
• Consumers: how do I find the standards appropriate for my case?
• Producers: how do I make my standards visible to others?
Monitors the development and evolution of standards, their use in databases and the adoption of both in data policies, to inform and educate the user community (see the query sketch below)
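As an illustration of how a consumer might query a standards registry programmatically, here is a hedged sketch; the endpoint URL, query parameters and response shape are assumptions for illustration only, not the actual FAIRsharing API, which has its own documented interface and authentication.

```python
import requests

# Hypothetical endpoint and parameters, for illustration only; consult the
# FAIRsharing documentation for the real API, authentication and fields.
REGISTRY_SEARCH_URL = "https://example.org/api/standards/search"


def find_standards(domain: str, record_type: str) -> list:
    """Return registry records matching a domain (e.g. 'neuroscience') and a
    record type (e.g. 'reporting guideline', 'model/format', 'terminology artifact')."""
    response = requests.get(
        REGISTRY_SEARCH_URL,
        params={"q": domain, "type": record_type},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("results", [])


# Example: which reporting guidelines exist for neuroscience datasets?
# for record in find_standards("neuroscience", "reporting guideline"):
#     print(record.get("name"), record.get("status"))
```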
Working with and for producers and consumers
• Standard developing groups
• Journals and publishers
• Cross-links and data exchange
• Societies and organisations
• Institutional RDM services
• Projects and programmes
Interlink standards among themselves and with repositories
• Metadata standards: formats, terminologies, guidelines
• Databases/data repositories
• Data policies by funders, journals and other organizations
…and to indicate ‘adoption’
• Metadata standards: formats, terminologies, guidelines
• Databases/data repositories
• Data policies by funders, journals and other organizations
Assign ‘indicators’ to describe their status… (see the sketch below)
• Ready for use, implementation, or recommendation
• In development
• Status uncertain
• Deprecated as subsumed or superseded
All records are manually curated in-house and verified by the community behind each resource
[Figure: per-status counts: 270, 48, 23, 2, 97, 87, 4, 204, 9, 6, 8]
Paper in preparation, preliminary information as of July 2017
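The four indicators map naturally onto a small enumeration; this is an illustrative sketch of how a curation tool could encode them, not FAIRsharing’s internal model.

```python
from enum import Enum


class RecordStatus(Enum):
    """The four curation indicators described above."""
    READY = "ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "in development"
    UNCERTAIN = "status uncertain"
    DEPRECATED = "deprecated as subsumed or superseded"


# Example: a manually curated record carrying its status indicator.
record = {"name": "MIAME", "type": "reporting guideline", "status": RecordStatus.READY}
print(record["status"].value)
```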
Help us map the neuroscience standards landscape
Activating the decision-making chain
Journal recommendations: number of standards recommended by 68 journal/publisher policies (and the top one)
• Models/Formats: 6 out of 223 (ISA-Tab)
• Reporting Guidelines: 26 out of 118 (MIAME)
• Terminology Artifacts: 8 out of 343 (NCBI Tax)
Database implementations: number of standards implemented by 544 databases/repositories (and the top one)
• Models/Formats: 146 out of 223 (FASTA)
• Reporting Guidelines: 59 out of 116 (MIAME)
• Terminology Artifacts: 121 out of 343 (GO)
Paper in preparation, preliminary information as of July 2017
Philippe Rocca-Serra, PhD, Senior Research Lecturer
Alejandra Gonzalez-Beltran, PhD, Research Lecturer
Milo Thurston, DPhil, Research Software Engineer
Massimiliano Izzo, PhD, Research Software Engineer
Peter McQuilton, PhD, Knowledge Engineer
Allyson Lister, PhD, Knowledge Engineer
Eamonn Maguire, DPhil, Contractor
David Johnson, PhD, Research Software Engineer
Melanie Adekale, PhD, Biocurator Contractor
Delphine Dauga, PhD, Biocurator Contractor
Susanna-Assunta Sansone, PhD, Principal Investigator, Associate Director
The (long) road to FAIR
Interoperability standards
are digital objects in their own right,
with their associated research, development and educational activities