Privacy and Publication: challenges and opportunities for clinical data
Varsha Khodiyar, PhD
Data Curation Editor, Scientific Data
Nature Publishing Group
varsha.khodiyar@nature.com
@varsha_khodiyar
@scientificdata
Big Data Opportunities Using the NDA, 17th October 2015
Reporting bias impacts human health
Oseltamivir: “only...effective...for the prevention
and treatment of symptoms of influenza”
Cochrane Database Syst Rev. 2012 DOI: 10.1002/14651858.CD008965.pub3
Reboxetine: “overall an ineffective and
potentially harmful antidepressant”
BMJ 2010;341:c4737
Statins: “beneficial effect…on atrial
fibrillation...is not supported by a
comprehensive review of published
and unpublished evidence”
BMJ 2011;342:d1250
3
Withholding data impacts human health
Increasing support for data transparency
• Funder/institution policy and mandates1
• Regulatory agencies (EMA)
• Legislation (FDAAA)
• Non-governmental/academic (IOM, YODA)
• Industry (CSDR)
• Journals and ICMJE2
4
1. Hahnel, Mark (2015): Global funders who require data archiving as a condition of grants.
figshare. http://dx.doi.org/10.6084/m9.figshare.1281141
2. http://www.icmje.org/news-and-editorials/principles_data_sharing_jan2014.html
Publishers/journals and data access
• More reliable evidence – and papers
• Journal mission/goals
• Help community derive maximum benefit from research
• Content innovation (facilitate more use and reuse)
• Reliability (peer review)
• Discoverability and visibility (bibliographic databases)
• Linking and licensing content (open access)
• Permanence (content and links)
• Credit/incentives (article types and citations)
• Encouraging and implementing good practice and policies
5
Journal data policies
• Willingness to share stated (Annals Internal Medicine)
• Data sharing implied by submission (BioMed Central*)
• Data sharing implied as a condition of publication (Nature*)
• Mandated data sharing with statement in paper (PLOS; BMJ for clinical trials)
• Mandated data sharing with statement and link to data (non-medical journals, e.g. ecology, animal genomics)
• Mandated open data as a condition of submission (Scientific Data, GigaScience, F1000Research)
*Minimum requirement – some disciplines/journals may mandate
6
STRONGER
1. Vines, T. H. et al. Mandated data archiving greatly improves access to research data. FASEB J. (2013). doi:10.1096/fj.12-218164
Data sharing via supplementary files
7
Sandercock et al: The International Stroke Trial database. Trials 2011, 12:101
doi:10.1186/1745-6215-12-101
Data sharing via repository links
8–10
Role of data journals/articles
• Data peer review
• Outlet for ‘unpublishable’ data
• Data discoverability
• Data reusability
• Permanence of datasets
• Robust links with repositories
• Credit/reward data generators
• “Intelligently open data”
11
Scientific Data
Scientific Data peer review
Peer review focuses on:
• Completeness (can others reproduce?)
• Consistency (were community standards
followed?)
• Integrity (are data in the best repository?)
• Experimental rigour and technical quality
(were the methods sound?)
Does not focus on:
• Perceived impact/importance
• Size/complexity of data
An example Data Descriptor
14
Human-readable representation of the study, i.e. the article (HTML & PDF)
Machine-readable representation of the study, i.e. metadata
Scientific Data structured metadata
In-house curation team:
• assists users to submit the
structured content via
simple templates and an
internal authoring tool
• performs value-added
semantic annotation of the
experimental metadata
Diagram labels: analysis, method, script; data file or record in a database
Data on (reasonable) request - issues
16
• Meta-analysis fails to launch when <40% IPD
available – unanswered requests and refusal
to share
Systematic Reviews 2014, 3:97 doi:10.1186/2046-4053-3-97
• Poor availability of psychological research
data (only 64/249 datasets available)
American Psychologist 2006, 61(7) doi:10.1037/0003-066X.61.7.726
• Data received from 1/10 authors publishing in
PLOS Medicine and PLOS Clinical Trials
PLoS ONE 2009, 4(9): doi:10.1371/journal.pone.0007078
• 38% of 394 requested datasets received from
APA journal authors
Collabra 2015, 1(1): doi:10.1525/collabra.13
Clinical researchers support sharing
17
Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS:
Sharing of clinical trial data among trialists: a cross sectional survey. BMJ
2012;345:e7570
• Sharing de-identified data via repositories
should be required (236 respondents, 74%)
• Investigators should share de-identified data
on request (229 respondents, 72%)
What are researchers’ concerns?
18
Reproduced from: Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S,
Krumholz HM, Strait KM, Ross JS: Sharing of clinical trial data among
trialists: a cross sectional survey. BMJ 2012;345:e7570
Better ways to share on request
19
Yale Open Data Access (YODA) & Clinical Study
Data Request (CSDR) projects:
• Data Use Agreements (DUAs)
• Controlled access environment
• Scientific validity of reanalysis checked
• Independent governance
• Data anonymisation checks
http://yoda.yale.edu/
https://www.clinicalstudydatarequest.com/
Better way to publish data on request
20
• Sensitive data repositories (e.g. UKDA)
Permanence, curation, persistent identifiers,
versioning
• Data-on-request services (e.g. YODA)
Independent governance, scientific review and
transparency of access requests, DUAs
• Journals/publishers
Peer review, visibility, credit/citations, robust links
Sensitive data repositories + data-on-request services + journals/publishers =
A robust data-on-request workflow?
21
Hrynaszkiewicz, I., Khodiyar, V., Hufton, A. & Sansone, S. A. Publishing descriptions of
non-public clinical datasets: guidance for researchers, repositories, editors and
funding organisations. BioRxiv http://dx.doi.org/10.1101/021667 (2015).
Open access Data Descriptor
22–23
http://www.nature.com/articles/sdata201531
Linked to restricted access data
http://dx.doi.org/10.7910/DVN/25833
All approved repositories:
http://www.nature.com/sdata/data-policies/repositories
Key recommendations
25
• Clinical researchers: Prepare to share on request,
with short embargoes
• Repositories: Develop mechanisms to host clinical
data non-publicly and manage access requests;
collaborate with journals
• Editors and publishers: Check policy compliance
for every submission and facilitate peer reviewer
access to data; collaborate with repositories
• Sponsors and funders: Partner with trusted
repositories and ensure that data access requests are
proportionately reviewed without introducing
unnecessary barriers
Repositories for non-public data should
26
• Provide stable identifiers for metadata records
• Allow access to data with the minimum of
restrictions, codified in DUAs
• Ideally be independent of the study sponsors
• Have a transparent and persistent system for
requesting access to data and reviewing requests
to access data
• Allow access to data in a timely manner
• Ensure long-term preservation of data in their
non-public form
Visit nature.com/sdata
Email scientificdata@nature.com
Tweet @ScientificData
Honorary Academic Editor
Susanna-Assunta Sansone
Managing Editor
Andrew L. Hufton
Data Curation Editor
Varsha K. Khodiyar
Advisory Panel and Editorial
Board including senior researchers,
funders, librarians and curators

Editor's Notes

  • #17 Data on request policies do not result in data sharing