It's great to see this collaborative work from the Scottish Longitudinal Study (SLS), Research Data Scotland (Sophie McCall) and National Records of Scotland published in the International Journal of Population Data Science (IJPDS) today. 🎉 The paper sets out a framework for producing low-fidelity synthetic data, built around four key checks. Each check covers important criteria that data controllers should meet when developing synthetic data. Together they address both the minimum quality standard researchers need in order to use the data for code development, and the disclosure risk involved in making the data available. The checks have been embedded into the synthetic data creation process at RDS, giving us a framework to meet what we consider the key requirements for low-fidelity synthetic data. We hope the four checks recommended in this paper will become the standard for synthetic data generation in the future.
𝗢𝗽𝗲𝗻𝗶𝗻𝗴 𝘂𝗽 𝗮𝗰𝗰𝗲𝘀𝘀 𝘁𝗼 𝗼𝗳𝗳𝗶𝗰𝗶𝗮𝗹 𝗱𝗮𝘁𝗮 𝘄𝗶𝘁𝗵 𝗟𝗼𝘄 𝗙𝗶𝗱𝗲𝗹𝗶𝘁𝘆 𝗦𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝗗𝗮𝘁𝗮 (𝗟𝗙𝗦𝗗)

The latest study on Low Fidelity Synthetic Data, published in IJPDS, presents a new framework of four essential checks that balance the use of LFSD with the protection of patient confidentiality.

Professor Gillian Raab of the Scottish Longitudinal Study (SLS), University of Edinburgh, and co-authors Sophie McCall of RDS Metadata Catalogue and Liam Cavin from the National Records of Scotland (NRS) have developed the following four checks:

- labelling – to make sure whoever sees the data knows that it is not the original
- structure – so that it resembles the original
- disclosure – no real or apparent disclosure of confidential information
- documentation – to understand the original records and how the LFSD was created

These four checks, already embedded in the LFSD creation process at Research Data Scotland, provide a framework to meet key requirements. The study, ‘Four checks for low-fidelity synthetic data: recommendations for disclosure control and quality evaluation’, provides assurance that LFSD that satisfies these checks can be made more widely available for research.

𝗠𝗮𝗸𝗶𝗻𝗴 𝗴𝗼𝘃𝗲𝗿𝗻𝗺𝗲𝗻𝘁 𝗱𝗮𝘁𝗮 𝗮𝘃𝗮𝗶𝗹𝗮𝗯𝗹𝗲 𝘀𝗮𝗳𝗲𝗹𝘆

Large numbers of confidential records about citizens are held by National Statistics Organizations and other bodies, such as the NHS. They are used to inform policy and enable vital health and other research to take place. Researchers can apply for access to these records, usually in the restricted settings known as Trusted Research Environments (TREs).

In her recent report on UK health data, Prof Cathie Sudlow argues that the difficulties researchers meet in accessing these records are letting patients and their families down. However, there are also legitimate concerns about data breaches that might undermine individual privacy and lead to a loss of reputation for the data custodian.

𝗜𝘀 𝗛𝗶𝗴𝗵 𝗙𝗶𝗱𝗲𝗹𝗶𝘁𝘆 𝗦𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝗗𝗮𝘁𝗮 (𝗛𝗙𝗦𝗗) 𝘁𝗵𝗲 𝗮𝗻𝘀𝘄𝗲𝗿 𝘁𝗼 𝘁𝗵𝗲𝘀𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺𝘀?
Synthetic datasets are created from original confidential data but contain no records that correspond to real individuals. HFSD is intended to reproduce the results that a researcher would get if the original data had been used. Examples are...

𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝗲 𝗥𝗲𝗮𝗱𝗶𝗻𝗴... https://lnkd.in/eC2-zc4Q

𝗖𝗹𝗶𝗰𝗸 𝗵𝗲𝗿𝗲 𝘁𝗼 𝗿𝗲𝗮𝗱 𝘁𝗵𝗲 𝗳𝘂𝗹𝗹 𝗮𝗿𝘁𝗶𝗰𝗹𝗲 https://lnkd.in/ebuBmSwA

Gillian Raab, Professor Emerita, Edinburgh Napier University, and part-time Research Fellow, Scottish Centre for Administrative Data Research (SCADR), University of Edinburgh

Raab, G., McCall, S. and Cavin, L. (2025) “Four checks for low-fidelity synthetic data: recommendations for disclosure control and quality evaluation”, International Journal of Population Data Science, 10(2). doi: 10.23889/ijpds.v10i2.2972.