Data Consultant,
Honorary Academic Editor
Susanna-Assunta Sansone, PhD
Associate Director,
Principal Investigator
NIH BD2K Workshop: Frameworks for Community-Based Standards Efforts, Sept 25-26, 2013
Mapping the Landscape of Community Standards
Challenges and Opportunities
www.slideshare.net/SusannaSansone
§  Researchers and bioinformaticians in both academic and commercial arenas, along with funding agencies and publishers, embrace the concept that community-developed standards are pivotal to structuring, enriching the description of, and sharing:
•  entities of interest, e.g., genes, metabolites, phenotypes, models
•  experimental steps, e.g., provenance of study materials, technology and measurement types
Growing movement for reproducible research
A community mobilization to develop standards, e.g.:
§  Structural and operational differences
•  organization types (open, closed to members, society, WG etc.)
•  standards development (how to formulate, conduct and maintain)
•  adoption, uptake, outreach (link to journals, funders and commercial sector)
•  funds (sponsors, memberships, grants, volunteering)
•  de jure: standards organizations
•  de facto: grass-roots groups
Nanotechnology Working Group
Types of reporting standards
•  Minimum information reporting requirements, or checklists, to report the same core, essential information
•  Controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’
•  Conceptual models or conceptual schemas, from which an exchange format is derived, to allow data to flow from one system to another
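As a minimal sketch of how the three types of reporting standards above complement each other, the Python below checks a record against a checklist, validates one field against a controlled vocabulary, and serializes the record to an exchange format. The field names, the vocabulary terms and the use of JSON as the exchange format are all assumptions made for illustration; none of them is taken from an actual standard.

```python
import json

# Hypothetical checklist: the core fields every report must include.
REQUIRED_FIELDS = {"organism", "sample_source", "technology"}

# Hypothetical controlled vocabulary for the "technology" field.
TECHNOLOGY_TERMS = {"microarray", "mass spectrometry", "NMR"}


def validate(record):
    """Return a list of problems found in a record (empty if compliant)."""
    problems = []
    # Checklist: report every required field that is absent.
    for name in sorted(REQUIRED_FIELDS - set(record)):
        problems.append(f"missing required field: {name}")
    # Terminology: the technology must be a term from the vocabulary.
    technology = record.get("technology")
    if technology is not None and technology not in TECHNOLOGY_TERMS:
        problems.append(f"unknown technology term: {technology}")
    return problems


def export(record):
    """Serialize a compliant record to JSON, standing in for an exchange format."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(record, sort_keys=True)


record = {"organism": "Homo sapiens", "sample_source": "liver", "technology": "NMR"}
print(validate(record))
print(export(record))
```

The checklist says what must be reported, the vocabulary says which words may be used, and the format says how the result travels between systems; a record that fails either check is never exported.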
Technologically-delineated views of the world
•  transcriptomics, proteomics, metabolomics
•  technologies: arrays and scanning, columns, gels, MS, FTIR, NMR
Biologically-delineated views of the world
•  plant biology, epidemiology, microbiology
Generic features (common core)
•  description of source biomaterial
•  experimental design components
Fragmentation, duplications and gaps
To compare and integrate data we need interoperable standards
Growing number of reporting standards
+ 130, + 150, + 303 (Sources: BioSharing, BioPortal)
Databases, annotation and curation tools implementing standards
•  MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
•  MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB
•  AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
But how much do we know about these standards?
•  A coherent, curated and searchable registry of standards for describing
and reporting experiments in life science, environmental, biomedical and
biotechnological domains
•  Progressively associate standards to data policies and databases
•  Develop assessment criteria for usability and popularity of standards
•  Help stakeholders to make informed decisions on e.g. what standards or
databases to use or recommend; identify efforts they have funded
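The registry goals above can be sketched as a small data structure: each record describes one standard and links it to the databases that implement it and the data policies that recommend it. This is only an illustration under stated assumptions. The class, its field names, the placeholder database and policy entries, the domain assignments and the toy scoring function are all hypothetical and do not describe how any actual registry is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class StandardRecord:
    """One registry entry; all field names here are hypothetical."""
    name: str
    kind: str  # e.g. "checklist", "format" or "terminology"
    domains: set = field(default_factory=set)
    databases: set = field(default_factory=set)  # databases implementing it
    policies: set = field(default_factory=set)   # data policies recommending it


def search(registry, kind=None, domain=None):
    """Return names of records matching the given kind and/or domain."""
    return [
        r.name
        for r in registry
        if (kind is None or r.kind == kind)
        and (domain is None or domain in r.domains)
    ]


def adoption_score(record):
    """Toy popularity measure: number of linked databases and policies."""
    return len(record.databases) + len(record.policies)


# Placeholder links and domains, for illustration only.
registry = [
    StandardRecord("MIAME", "checklist", {"transcriptomics"},
                   {"example-database"}, {"example-journal-policy"}),
    StandardRecord("ISA-Tab", "format", {"transcriptomics", "metabolomics"}),
]

print(search(registry, domain="transcriptomics"))
print(adoption_score(registry[0]))
```

Keeping the links on the record itself is one simple way to let a stakeholder ask both "which standards cover my domain?" and "how widely is this one taken up?" from the same entry.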
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone
www.ebi.ac.uk/net-project
Users can claim records and maintain them
Criteria to be used in evaluating standards for adoption
Help prospective users select and use the appropriate one
Classify and link standards, and visualize relations
Example
The relationship among popular standard formats for pathway information
BioPAX and PSI-MI are designed for data exchange to and from databases and for pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems, and SBGN represents pathway diagrams.
CREDIT: Demir et al., The BioPAX community standard for pathway data sharing, 2010.
Drug Discovery Today, Volume 16, Numbers 21/22, November 2011
The information landscape in the industrial sector
Yesterday, today, tomorrow: …evolving…
Big Life Science Company and the parties around it:
•  proprietary content provider
•  public content provider
•  academic group
•  software vendor
•  CRO
•  service provider
•  regulatory authorities
Credit: Pistoia Alliance; Michael Braxenthaler, Roche
Not just technological but also social challenges
§  Ownership of open standards can be problematic in broad, grass-roots collaborations
•  the legal framework is still embryonic
•  improved models are required to encourage maintenance of, and contributions to, these efforts and to support their evolution
§  Extensive community liaison needs to be managed and funded
•  rewards and incentives need to be identified for all contributors
Acknowledgements
•  Jessica Tenenbaum
•  Michael Braxenthaler
•  Lee Harland
•  Bryn Williams-Jones
•  Ian Dix
•  Trish Whetzel
•  Mark Musen
•  Collaborators in
•  OBO Foundry
•  COSMOS
•  ISA Commons (especially ISA-Tab-Nano team)
•  GSC
•  Metabolomics Society
•  Data Dryad
•  Pistoia Alliance
•  Elixir UK
•  and many more….

"Standards landscape" NIF Big Data 2 Knowledge (BD2K) Initiative, Sep, 2013
