Use of data in safe havens: ethics
and reproducibility issues
Louise Corti
Service Director Data Publishing and Access
UK Data Service
UKRIO Research Integrity Webinar: Data Sharing and Ethics
Online
11 November 2020
The UK Data Service
• Social science data service, funded by the UKRI ESRC
• Curates and provides access to data for research and
teaching
• Trusted Digital Repository (TDR), accredited to the ISO 27001
Information Security Management standard; Digital Economy Act
Processor application in progress
• Works closely with research funders and key data
producers/institutions: research centres, UK NSIs, government
departments, British Library, etc.
• Around 8000 data collections, from open to secure
Types of data held
• Social survey: one-off large-scale survey, e.g. Health Survey for England
• Longitudinal/cohort: detailed survey following people over time
• Qualitative: interviews, focus groups, diaries
• Aggregate data: UK census counts/tables, country-level statistics
• Historical data: digitised databases
Spectrum of Access: UK Data Service
Open
• No personal identification risk
• Open licence; few restrictions on reuse
Safeguarded
• Zero to low risk of personal identification
• Authentication and authorisation
Controlled
• Risk of personal identification
• Authentication and authorisation
• Added safeguards
Safeguards for data access
• Data access involves reduction of risk in a manner acceptable to the
data owner
• Risk mitigated by legal gateways and appropriate safeguards
• 5 Safes Framework offers a portfolio of safeguards
Legal gateways to access
• Government Departments, Local Authorities, agencies,
public bodies face different legal restrictions on the
nature of the access they can provide for research
• In some cases, specific legal gateways have been
drafted to facilitate data sharing
• This can have an impact on the feasibility of some
research projects and data linking between various data
sources e.g. health and social data is often hard to link
Legislation and legal gateways
• Digital Economy Act (DEA)
• Data Protection Act / General Data Protection Regulation (GDPR)
• Statistics and Registration Service Act (SRSA)
Gateways:
  • Person-specific (‘fit and proper’)
  • Project-specific
  • Time-specific
  • Dataset-specific
• Disclosing an individual's identity or identifying information to
someone who should not have access is a breach of the law. Breaches
carry civil as well as criminal sanctions
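A minimal sketch of how the four gateway dimensions on this slide (person, project, time, dataset) could be checked together, assuming a gateway is recorded as one approval naming one person, one project, one dataset and a validity period. The `Gateway` class and `access_permitted` function are hypothetical illustrations and are not part of any DEA, GDPR or SRSA tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Gateway:
    """Hypothetical record of one approved legal gateway."""
    person: str
    project: str
    dataset: str
    valid_from: date
    valid_to: date


def access_permitted(gateway: Gateway, person: str, project: str,
                     dataset: str, on: date) -> bool:
    """Return True only if all four gateway conditions hold at once."""
    return (
        person == gateway.person                         # person-specific
        and project == gateway.project                   # project-specific
        and dataset == gateway.dataset                   # dataset-specific
        and gateway.valid_from <= on <= gateway.valid_to  # time-specific
    )
```

Under this reading, an approval for one project does not carry over to another project by the same researcher, and it lapses once the approved period ends.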
Digital Economy Act, 2017
• The DEA covers: access to digital services, digital infrastructure, online
pornography, intellectual property and digital government
• Has a useful Research strand that broadly enables de-identified
information held by a public authority to be disclosed for the purposes of
research in the public good
• The research framework is underpinned by the Research Code of
Practice and Accreditation Criteria, approved by the UK Parliament in
July 2018
• The UK Statistics Authority is the statutory accrediting body
• The Research Accreditation Panel oversees the independent accreditation
of processors, researchers and research projects
Digital Economy Act data example
What makes this controlled data?
• Low level of geography
• Date of birth, including day
• Education and training: detailed training and qualifications
• Household and family characteristics: detail on family and
extended family units
• Employment: industry code of main job, employment details
• Unemployment and job hunting: details
• Temporary leave from work
• Accidents at work and work-related health problems
• Nationality, national identity and country of birth
• Occurrence of learning difficulty or disability
• Benefits, including detail on type of benefits claimed
Controlled data
Data access/availability
• 5 Safes framework (ONS)
• Access only via an accredited safe haven
• High access bar
• Process for reproduction not set up by journals
5 Safes framework for Safe Data Access
SAFE PROJECTS: Is this an appropriate use of the data?
+ SAFE PEOPLE: How trustworthy are the researchers?
+ SAFE SETTING: Does the environment prevent misuse?
+ SAFE DATA: Is the data detail appropriate?
+ SAFE OUTPUTS: Is there any confidentiality risk from publication?
= SAFE USE
Balancing the Five Safes: Example
[Figure: scale running from Safe to Unsafe]
Adapted by the Australian Bureau of Statistics from ‘Five Safes: designing data access for research’, Desai, T, Ritchie, F and Welpton, R, 2016
DEA Processors
Accreditation process = Information Security + Capability
https://uksa.statisticsauthority.gov.uk/digitaleconomyact-research-statistics/better-access-to-data-for-research-information-for-processors/
DEA Accredited Researchers
Accreditation process = Safe Researcher Training + Test + Application
https://uksa.statisticsauthority.gov.uk/digitaleconomyact-research-statistics/better-useofdata-for-research-information-for-researchers/list-of-accredited-researchers-and-research-projects-under-the-research-strand-of-the-digital-economy-act/
Understanding of disclosure risk + producing Safe Outputs
Good attitudes and behaviour
= Trust
https://uksa.statisticsauthority.gov.uk/wp-content/uploads/2020/09/Example_RAP_Project_Application.pdf
Ethics component
• University REC submissions for projects undertaking secondary
analysis of data are often not complete enough to pass the ethics
test; they are often focussed on data collection
• Clear guidance designed by the National Statistician's Data Ethics
Advisory Committee (NSDEC) supports researchers and statisticians
in completing an ethical self-assessment form
Ethics Self-Assessment
• UK Statistics Authority Ethics Self-Assessment Tool
• Easy-to-use framework to review the ethics of a project
• Helps identify & mitigate any ethical issues
• 6 main principles:
  • Public good, potential harm
  • Identification, data security
  • Training, technologies
  • Legal gateways, frameworks
  • Public view, engagement
  • Access, use & sharing of data
Weightings for risk
‘Ethical risk’ defined as the ‘negative consequences of
unethical actions’.
Differential complexities of various ethical decisions
assessed using weighted measures
• Data linkage projects
• Sensitive personal data and processing
• Patient level health data
• Research including children and vulnerable adults
• Data sources
The Ethics Self-Assessment form
https://uksa.statisticsauthority.gov.uk/about-the-authority/committees/national-statisticians-data-ethics-advisory-committee/ethics-self-assessment-tool/
• For each item – provide a score
• Enter the score in the spreadsheet
• The spreadsheet calculates an overall score of risk level
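As an illustration of the idea of weighted item scores combined into one overall risk level, here is a minimal sketch. It is not the UK Statistics Authority spreadsheet's actual method: the 1 to 3 score scale, the weights, the cut-offs and the names `Item`, `overall_risk` and `risk_level` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One self-assessment item (hypothetical structure)."""
    name: str
    score: int     # assumed scale: 1 = low risk, 3 = high risk
    weight: float  # assumed weighting; heavier items count for more


def overall_risk(items: list[Item]) -> float:
    """Weighted mean of the item scores."""
    total_weight = sum(item.weight for item in items)
    if total_weight == 0:
        raise ValueError("at least one item must have a non-zero weight")
    return sum(item.score * item.weight for item in items) / total_weight


def risk_level(value: float, low_cutoff: float = 1.5,
               high_cutoff: float = 2.5) -> str:
    """Map the weighted mean to a label using assumed cut-offs."""
    if value < low_cutoff:
        return "low"
    if value < high_cutoff:
        return "medium"
    return "high"
```

In this sketch a data linkage item given a higher weight pulls the overall score up more than an equally scored item with a lower weight, which is the general effect a weighting scheme is meant to have.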
Reproducibility factors in safe havens
✔ Impact of the research outputs is a key requirement
✔ Code can be taken out, but is reviewed for risk
✘ Hard to reproduce as data and code behind a big gate
✘ Reproducer needs Accredited Researcher status
✘ Requires mechanism for reproducer to be added to an accredited project
✘ Difficult provenance chain for ‘research-ready’ admin data (creation,
cleaning and versioning)
✘ Journals not ready to enable this review process
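One way the provenance problem could be eased is by recording checksums of the exact data extract and code files used for an analysis, so that a later reproducer inside the safe haven can confirm they are working from the same versions. The sketch below is only an illustration under that assumption; the function names and the manifest layout are hypothetical and are not a UK Data Service or safe haven procedure.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: list[str], data_version: str) -> dict:
    """Map each file name to its checksum, alongside a version label."""
    return {
        "data_version": data_version,  # label chosen by the researcher
        "files": {Path(p).name: sha256_of(p) for p in sorted(paths)},
    }


def manifest_json(manifest: dict) -> str:
    """Serialise the manifest in a stable, human-readable form."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

A manifest like this contains only file names and hashes, so it may be easier to clear through output checking than the data or code itself, although that would still be for the safe haven to decide.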
Using the 5 Safes network
• Strong 5 Safes framework for access to unsafe data
• Very simple ethics assessment – easy to mark
• Utilise Accredited Researchers to undertake reproducibility work
• Better training on how to be reproducible
• Advocate making cleaning, value-added and final code available in
the safe haven to enable reproduction and reuse
Questions
Louise Corti
UK Data Service
corti@essex.ac.uk
