Honorary Academic Editor
Susanna-Assunta Sansone, PhD
(University of Oxford, UK)

Visit
nature.com/scientificdata

Managing Editor
Andrew L Hufton, PhD

Email
scientificdata@nature.com

Advisory Panel and Editorial
Board including senior researchers,
funders, librarians and curators

Tweet
@ScientificData
Now open for submissions!
Launching May 2014
Advisory Panel
Susanna-Assunta Sansone
Honorary Academic Editor
Andrew L Hufton
Managing Editor
Ruth Wilson
Publisher

Supported by

Michael Huerta ● National Institutes of Health, USA ● Mark Thorley ● Natural Environment
Research Council, UK ● Patricia Cruse ● University of California, USA ● Susan Gregurick ● Office
of Biological and Environmental Research, Department of Energy, USA ● Ioannis Xenarios ● Swiss
Institute of Bioinformatics, Switzerland ● Chris Bowler ● IBENS, France ● Mark Forster ● Syngenta,
UK ● Anthony Rowe ● Johnson & Johnson, USA ● Stephen Chanock ● National Cancer Institute,
USA ● Weida Tong ● National Center for Toxicological Research, FDA, USA ● Albert J. R. Heck ●
Utrecht University, The Netherlands ● Johanna McEntyre ● EMBL-EBI, European Bioinformatics
Institute, UK ● Simon Hodson ● CODATA, France ● Joseph R. Ecker ● Howard Hughes Medical
Institute & Salk Institute, USA ● Stephen Friend ● Sage Bionetworks, USA ● Jessica Tenenbaum ●
Duke Translational Medicine Institute, USA ● Anne-Claude Gavin ● EMBL, Germany ● David Carr ●
Wellcome Trust, UK ● Wolfram Horstmann ● University of Oxford, UK ● Piero Carninci ● RIKEN
Omics Science Center, Japan ● Pascale Gaudet ● Swiss Institute of Bioinformatics, Switzerland ●
Judith A. Blake ● The Jackson Laboratory, USA ● Richard H. Scheuermann ● J. Craig Venter
Institute, USA ● Caroline Shamu ● Harvard Medical School, USA

Introducing a new content type:
Data Descriptor
Data Descriptor vs. Traditional Article
● The data descriptor is concerned only with the facts behind the
methodology of data generation/collection and processing
● A data descriptor can be:
– submitted before the journal article
– submitted at the same time as the journal article
– submitted after the journal article

[Figure: Data Descriptor vs. journal article. Data Descriptor (facts): What is the sample? What did I do to generate the data? How was the data processed? Where is the data? Who did what when? Journal article: analysis, synthesis, interpretation, conclusions, summary of DD.]
Prior Publication Policy
“Nature-titled journals will not consider prior Data Descriptor publications to
compromise the novelty of new manuscript submissions as long as those
manuscripts go substantially beyond a descriptive analysis of the data, and
report important new scientific findings appropriate for the journal. This policy
does not necessarily extend to subsequent journal articles whose primary
purpose is to describe a new dataset or resource.”
See the full text in our Editorial Policies online
Barriers to data sharing and reuse
● Datasets are not released
● Datasets are not reusable or discoverable
● Lack of credit for sharing data and making it reusable
Two sample Data Descriptors now online

A Data Descriptor has two components

• Article, or narrative component (PDF and HTML)


• Experimental metadata, or structured component (in-house curated, machine-readable formats)
Data Descriptor – article
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References

In traditional publications, this information is
not provided in sufficient detail.
However, it is essential for understanding,
reusing and reproducing datasets.
Data Descriptor – experimental metadata
Submit ISA-Tab* files directly

OR

Use submission tools and simple templates that
help authors provide the information
without specialized software

An in-house curator
standardizes the
structured content
*Sansone et al., Nature Genetics, 2012
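ISA-Tab stores experimental metadata as tab-delimited tables, in which sample attributes appear in columns headed `Characteristics[...]`. The Python sketch below shows how such a table might be read into per-sample records. The study table is a hypothetical, heavily simplified example; real ISA-Tab submissions consist of investigation, study and assay files with many more columns.

```python
import csv
import io

# Hypothetical, heavily simplified ISA-Tab study table (s_*.txt).
# Real files carry many more columns (Term Source REF, Protocol REF, etc.).
STUDY_TSV = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[organism part]\tSample Name\n"
    "source1\tHomo sapiens\tliver\tsample1\n"
    "source2\tHomo sapiens\tkidney\tsample2\n"
)

PREFIX = "Characteristics["


def characteristics(tsv_text):
    """Return one dict per row mapping each Characteristics[...] category to its value."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # Map column index -> category name for every Characteristics[...] column.
    # Using indices (not a dict keyed by header) tolerates repeated column names.
    cols = {
        i: name[len(PREFIX):-1]
        for i, name in enumerate(header)
        if name.startswith(PREFIX) and name.endswith("]")
    }
    return [{cat: row[i] for i, cat in cols.items()} for row in reader]


rows = characteristics(STUDY_TSV)
print(rows)
```

Keying on the `Characteristics[...]` convention rather than on fixed column positions lets the same reader handle tables that describe different sample attributes.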
Discover similar datasets
Structured content allows users to link, with one click, to other datasets
studying the same tissue, disease, organism, or using the same experimental
platform
[Figure: the structured content of one SciData DD links it to other SciData DDs with the same tissue, the same organism or the same assay.]
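The slide does not say how the one-click links are computed. One plausible approach, sketched here in Python as an assumption rather than the journal's actual implementation, is an inverted index from each (facet, value) pair in the structured content to the Data Descriptors that share it. The descriptor identifiers and facet values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical Data Descriptor records: identifier -> structured metadata facets.
DESCRIPTORS = {
    "dd1": {"tissue": "liver", "organism": "Homo sapiens", "assay": "RNA-seq"},
    "dd2": {"tissue": "liver", "organism": "Mus musculus", "assay": "proteomics"},
    "dd3": {"tissue": "kidney", "organism": "Homo sapiens", "assay": "RNA-seq"},
}


def build_index(descriptors):
    """Invert the records: (facet, value) -> set of descriptor identifiers."""
    index = defaultdict(set)
    for dd_id, facets in descriptors.items():
        for facet, value in facets.items():
            index[(facet, value)].add(dd_id)
    return index


def related(dd_id, descriptors, index):
    """For each facet of dd_id, list the other descriptors sharing the same value."""
    return {
        facet: sorted(index[(facet, value)] - {dd_id})
        for facet, value in descriptors[dd_id].items()
    }


index = build_index(DESCRIPTORS)
links = related("dd1", DESCRIPTORS, index)
print(links)
```

Building the index once makes each lookup proportional to the number of facets on a descriptor rather than to the size of the whole collection.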

Get Credit for Sharing Your Data
Publications will be listed in the major indexes and will be citable

Open-access
Authors select from three Creative Commons licences for the main
Data Descriptor. Each publication is supported by curated CC0 metadata

Focused on Data Reuse
All the information others need to reuse the data; no interpretative
analysis or hypothesis testing

Peer-reviewed
Rigorous peer review, managed by our Editorial Board of academic
researchers, ensures data quality and standards

Promoting Community Data Repositories
Data are stored in community data repositories
Complementary to both journal articles
and data repositories
Export to various formats
(ISA-Tab, RDF, etc.)
Scientific Data and GBIF: Roadmap
Partnership between GBIF and NPG Scientific Data
Vishwas Chavan
[Figure: roadmap timeline running from Q4 2013 to Q4 2014]

PHASE 1
• Mapping the DD article and the GBIF Metadata Profile
• Enhancement to the GBIF IPT to export the DD article
• Call for manuscript submissions
• 1st set of Data Descriptors published

PHASE 2
• Mapping the DD experimental metadata and the GBIF Metadata Profile
• Further enhancements to the GBIF IPT

The two components of the Data Descriptor (DD):
• DD article or narrative component
• DD experimental metadata or structured component (ISA-Tab format, with other formats such as RDF to follow)

NPG Scientific Data Overview for GBIF - TDWG meeting Oct 2013
