Abstract
The diffusion model, a cutting-edge deep generative technique, is gaining traction in biomedical informatics, showcasing promising applications across various domains. This review presents an overview of the working principles, categories, and numerous applications of diffusion models in biomedical research. In medical imaging, these models, through frameworks like Denoising Diffusion Probabilistic Models (DDPMs) and Stochastic Differential Equations (SDEs), offer advanced solutions for image generation, reconstruction, segmentation, and denoising. Notably, they have been employed to synthesize 2D/3D medical images, to reconstruct MRI and PET images, and to support segmentation tasks such as labeled MRI generation. In the realm of structured Electronic Health Record (EHR) data, diffusion models excel in data synthesis, offering innovative approaches in the face of challenges like data privacy and data gaps. Furthermore, these models are proving pivotal in physiological signal domains, such as EEG and ECG, for signal generation and restoration amidst data loss and noise disruptions. Another significant application lies in the design and prediction of small molecules and protein structures. These models unveil profound insights into the vast molecular space, guiding endeavors in drug design, molecular docking, and antibody construction. Despite their potential, there are inherent limitations, emphasizing the need for further research, validation, interdisciplinary collaboration, and robust benchmarking to ensure practical reliability and efficiency. This review seeks to shed light on the profound capabilities and challenges of diffusion models in the rapidly evolving landscape of biomedical research.
Introduction
In the ever-evolving landscape of deep learning, generative models [1,2,3] have emerged as a cornerstone for various applications, paving the way for innovative solutions to complex problems. Among these models, the denoising diffusion model (DDM) [3,4,5,6] stands out, having garnered significant attention in recent years due to its powerful image generation capabilities. Characterized by its forward and reverse diffusion stages, this model offers the ability to produce high-quality images, albeit with a known computational cost, while addressing limitations such as mode collapse in Generative Adversarial Networks (GANs) and poor sample quality in Variational Autoencoders (VAEs).
The realm of medical imaging, a critical subset of computer vision, has not remained untouched by the allure of diffusion models. As medical imaging continually seeks to enhance the clarity, accuracy, and utility of its outputs, the potential of diffusion models to revolutionize this field becomes increasingly evident [7]. From foundational theories to practical applications in medical image generation, reconstruction, segmentation, denoising, and more, diffusion models are reshaping how we perceive and utilize medical images.
Moreover, the application of diffusion models extends beyond imaging, with emerging attempts and explorations in medical informatics. Electronic health records (EHRs) [8], which store comprehensive health information about patients during their hospitalization, encompassing both structured and unstructured data, have garnered significant attention. Deep learning, in particular, has amplified the importance of EHRs in disease diagnosis, treatment, and monitoring [9]; however, challenges related to data privacy, security, and accessibility persist [10]. In this context, diffusion models, with their generative capabilities, offer a promising approach for generating synthetic static data, dynamic data, or mixed-type data. This provides an alternative to the often-challenging GANs.
Furthermore, the bioinformatics world of small molecules and proteins, essential for myriad biological functions, has also witnessed the transformative potential of diffusion models. As state-of-the-art generative models, their applications in the purposeful design and generation of these biomolecules, such as small-molecule conformation prediction, drug design, protein design, and protein structure generation, have expanded rapidly, especially in the last two years.
This survey aims to provide an in-depth exploration of diffusion models and their burgeoning applications in biomedical informatics, offering readers a holistic understanding of their theoretical underpinnings and practical implications.
A brief introduction to diffusion models
The basic principles of diffusion models
Diffusion models, inspired by non-equilibrium thermodynamics, have emerged as a powerful tool in the realm of deep learning. They define a Markov chain of diffusion steps that gradually add random noise to data, and then learn to reverse this diffusion process, constructing the desired data samples from noise. Unlike VAEs or flow models, diffusion models are trained with a fixed forward procedure, and their latent variables are high-dimensional (of the same dimensionality as the original data).
Forward process
The forward process is a foundational concept in diffusion models. It simulates the gradual corruption of original data by progressively adding Gaussian noise over a series of time steps (Fig. 1 (a)). This process can be thought of as the diffusion of an original data point into a noisier space, governed by a pre-defined variance schedule.
Fig. 1: (a) illustrates the forward diffusion process, and (b) represents the reverse process [3]
Given a data point \(x_0\) sampled from the true data distribution, the forward process incrementally perturbs this sample by adding a small amount of Gaussian noise at each time step. The magnitude of the noise added at each step is determined by a variance schedule, denoted as \(\beta_t\). Formally, the forward process is described by Eq. 1.

\(x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_t\)   (1)
where \({x_t}\) is the data at time step \(t\), \({\epsilon _t}\) is noise sampled from a standard normal distribution, \({\beta _t}\) is the variance schedule at time step \(t\).
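To make this concrete, composing the Gaussian steps yields a closed form that samples \(x_t\) directly from \(x_0\), with \(\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)\). Below is a minimal sketch in PyTorch; the linear schedule, the number of steps, and the tensor shapes are illustrative choices rather than values prescribed here.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear variance schedule beta_t
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # alpha-bar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    if noise is None:
        noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over the batch
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

x0 = torch.randn(8, 1, 64, 64)               # e.g. a batch of single-channel images
t = torch.randint(0, T, (8,))                # a random time step per sample
xt = q_sample(x0, t)                         # progressively noisier as t grows
```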
Reverse process
While the forward process describes the corruption of data, the reverse process aims to restore or denoise the corrupted data (Fig. 1 (b)). The objective is to reverse the effects of the forward diffusion, enabling the generation of authentic data samples starting from a Gaussian noise input. This reverse process is crucial for sampling from the model, as it provides a mechanism to traverse from the noisy space back to the original data distribution.
The reverse process can be mathematically formulated as Eq. 2.

\(p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t)\right)\)   (2)
In the domain of diffusion models, the complex interplay between the forward and reverse processes is of primary importance. The forward process involves the incremental degradation of data, whereas the reverse process, typically driven by advanced neural networks, denoted as \(\epsilon_\theta(x_t, t)\), focuses on predicting the noise \(\epsilon_t\) and other crucial parameters. This ensures that the denoised samples are faithful representations of the true data distribution. The denoising diffusion model, in essence, is a probabilistic architecture designed to learn a data distribution \(p(x)\) by progressively estimating a variable \(x\) from its normal distribution. This can be visualized as learning the reverse dynamics of a fixed Markov chain spanning a length \(T\). Such models can be conceptualized as a series of equally weighted autoencoders, \(\epsilon_\theta(x_t, t)\) for \(t = 1 \ldots T\), each trained to forecast a cleaner version of their respective inputs, \(x_t\). Together, these processes provide a robust framework for tasks related to data generation and denoising, underscoring the versatility and efficacy of diffusion probabilistic models. This objective can be simplified as Eq. 3, with \(t\) uniformly sampled from \(\{1, \ldots, T\}\).

\(L_{simple} = \mathbb{E}_{x_0,\, \epsilon \sim \mathcal{N}(0, I),\, t}\left[\left\| \epsilon - \epsilon_\theta(x_t, t) \right\|_2^2\right]\)   (3)
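Eq. 3 translates almost directly into a training step. The sketch below reuses the closed-form forward process from the earlier snippet and assumes an arbitrary noise-prediction network `eps_model(x_t, t)` (typically a U-Net); the name and interface are illustrative.

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0, alpha_bars, T):
    """One stochastic estimate of Eq. 3:
    E_{x0, eps, t} || eps - eps_theta(x_t, t) ||^2."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)    # t ~ Uniform{1, ..., T}
    eps = torch.randn_like(x0)                         # eps ~ N(0, I)
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps      # forward process in closed form
    return F.mse_loss(eps_model(xt, t), eps)           # regress the injected noise
```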
Comparison with other generative models
Generative models currently comprise three mainstream frameworks, including GANs [1], VAEs [2], and Diffusion Models. GANs are among the most popular generative models, excelling in generating high-quality, visually realistic samples. However, GANs are frequently plagued by issues of mode collapse and training instability. Mode collapse refers to GANs’ tendency to generate certain specific types of samples while neglecting other potential samples. Furthermore, the training process of GANs requires meticulous tuning.
VAEs provide principled probabilistic modeling, enabling the generation of diverse samples and offering a relatively stable training process. However, VAEs may face challenges in generating high-quality samples, especially when dealing with complex data distributions.
Diffusion Models combine the strengths of both GANs and VAEs. Unlike the adversarial training of GANs, which requires maintaining a dynamic balance between generators and discriminators, Diffusion Models learn the transformation process from random noise to clear data and directly capture the data distribution. This process inherently promotes diversity, serving as the primary mechanism to avoid mode collapse. By progressively approaching the true data distribution, Diffusion Models effectively cover the data’s latent space, better reflecting the diversity of the dataset and generating highly detailed samples in a fine-grained manner. Unlike GANs, they do not disproportionately generate a few specific samples. Additionally, Diffusion Models avoid the issues of blurriness or distortion commonly observed in samples generated by VAEs. However, their training process demands substantial data and computational resources, with the iterative sampling procedure leading to slow generation speeds. To address this, advancements such as Denoising Diffusion Implicit Models (DDIMs) [11] and Consistency Models [12] have enabled faster generation, reducing the process to as few as twenty steps or even a single step, while maintaining comparable quality. Furthermore, diffusion in latent space with compressed dimensions significantly lowers computational demands, accelerating both training and generation. Based on different approaches to model optimization and data generation, diffusion models can be broadly categorized into two main perspectives: the variational perspective and the score-based perspective.
The variational perspective encompasses models that utilize variational inference techniques to approximate the target distribution. This is typically achieved by minimizing the Kullback-Leibler (KL) divergence between the approximate distribution and the target distribution. An example of such models is DDPMs, which employs variational inference methods to estimate the parameters of the diffusion process. On the other hand, models within the score-based perspective rely on maximum likelihood estimation methods, using the score function of the log-likelihood of the data to estimate the parameters of the diffusion process. This category includes models like Noise Conditional Score Networks (NCSN) [5], which focuses on estimating the derivatives of the logarithmic density function of perturbed data distributions under different levels of noise, and Stochastic Differential Equations (SDE) [6], an extension that incorporates characteristics from both DDPM and NCSN.
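In the score-based perspective, sampling is commonly performed with annealed Langevin dynamics, descending through a sequence of noise levels. A minimal sketch follows, assuming a trained network `score_model(x, sigma)` that approximates the score \(\nabla_x \log p_\sigma(x)\); the step-size rule follows the NCSN recipe, and all names are illustrative.

```python
import torch

@torch.no_grad()
def annealed_langevin(score_model, shape, sigmas, n_steps=100, eps=2e-5):
    """NCSN-style sampling: run Langevin dynamics at each noise level,
    moving from the largest sigma down to the smallest."""
    x = torch.randn(shape) * sigmas[0]
    for sigma in sigmas:                          # sigmas sorted high -> low
        step = eps * (sigma / sigmas[-1]) ** 2    # step size scaled per noise level
        for _ in range(n_steps):
            z = torch.randn_like(x)
            x = x + 0.5 * step * score_model(x, sigma) + step ** 0.5 * z
    return x
```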
Latent diffusion model
The Latent Diffusion Model (LDM) [13] represents a cutting-edge approach in the realm of generative modeling, capitalizing on the strengths of diffusion processes within a latent space. Unlike traditional models that operate directly in the data space, LDM focuses on a more compact and semantically rich latent space, which abstracts away many of the intricacies of the original data. This abstraction not only enhances the model's efficiency but also emphasizes the most salient features of the data. Central to the LDM is its objective function, given by Eq. 4.

\(L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, \epsilon \sim \mathcal{N}(0, I),\, t}\left[\left\| \epsilon - \epsilon_\theta(z_t, t) \right\|_2^2\right]\)   (4)
This equation encapsulates the model’s aim to minimize the discrepancy between the noise term \(\epsilon \) and the model’s prediction \({\epsilon _\theta }\) over the latent representations. In essence, the LDM offers a harmonious blend of diffusion principles and latent space modeling, paving the way for more efficient and semantically meaningful generative tasks.
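Operationally, Eq. 4 is the same noise-regression loss as Eq. 3, computed on latent codes \(z = \mathcal{E}(x)\) instead of raw data. A minimal sketch, reusing `ddpm_loss` from the earlier snippet and assuming a pretrained, frozen `encoder` (the names are illustrative):

```python
import torch

def ldm_loss(eps_model, encoder, x0, alpha_bars, T):
    """Eq. 4: the DDPM objective of Eq. 3 applied in the
    autoencoder's latent space rather than the data space."""
    with torch.no_grad():
        z0 = encoder(x0)          # compress to the (frozen) latent space
    return ddpm_loss(eps_model, z0, alpha_bars, T)
```

At sampling time the procedure runs in reverse: a latent is denoised from pure noise, then mapped back to data space with the decoder.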
Conditioning mechanisms
Similar to other generative paradigms [14], diffusion models are capable of modeling conditional distributions, exemplified by \(p(z \mid y)\). This is achieved through a conditional denoising autoencoder, denoted as \(\epsilon_\theta(z_t, t, y)\), which facilitates steering of the synthesis process via various inputs \(y\), ranging from textual descriptions to semantic maps to image-to-image translations. A growing interest is developing in harnessing the generative power of DMs for image synthesis and extending their conditioning beyond class labels or basic variants. A notable advance in this direction is the integration of cross-attention mechanisms into the foundational U-Nets. The mechanism is adept at handling diverse input modalities, particularly in attention-centered models. A domain-specific encoder \(\tau_\theta\) transforms \(y\) into an intermediate representation, which may span different modalities like language prompts. The representation is then seamlessly integrated into the U-Net layers through a cross-attention layer. The objective of the conditional LDM is captured succinctly by Eq. 5.

\(L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0, I),\, t}\left[\left\| \epsilon - \epsilon_\theta(z_t, t, \tau_\theta(y)) \right\|_2^2\right]\)   (5)
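A hedged sketch of such a cross-attention layer is shown below: queries come from the flattened U-Net feature map, while keys and values come from the conditioning representation \(\tau_\theta(y)\). The dimensions and the use of `nn.MultiheadAttention` are illustrative simplifications of the per-block attention in [13].

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V, with Q from U-Net
    features and K, V from the conditioning encoder tau_theta(y)."""
    def __init__(self, d_model, d_cond, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.to_kv = nn.Linear(d_cond, d_model)   # project conditioning into d_model

    def forward(self, z, cond):
        # z:    (batch, h*w, d_model)  flattened U-Net feature map
        # cond: (batch, seq, d_cond)   e.g. text-prompt embeddings from tau_theta
        kv = self.to_kv(cond)
        out, _ = self.attn(query=z, key=kv, value=kv)
        return z + out                            # residual connection
```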
Applications of diffusion models in medical imaging
As computer vision advances, the medical imaging field has also shown increasing interest in diffusion models. To aid researchers in staying abreast of the latest developments in utilizing diffusion models in the realm of medical imaging, this section provides a comprehensive overview of diffusion models in the discipline of medical imaging. We delve into two widely used diffusion modeling frameworks, namely denoising diffusion probabilistic models (DDPMs) [3] and stochastic differential equations (SDEs) [6]. We highlight the broad applications of diffusion models in medical imaging, including image generation [15], image super-resolution [16], image inpainting [17], classification [18], image segmentation [19], anomaly detection [20], and other medically relevant tasks. We present a summarized list of models tailored for medical image processing (refer to Table 1).
Medical image generation
Image generation is a primary objective of diffusion models, which have been widely employed in various domains, including the synthesis of 2D/3D medical images [33, 34], reconstruction of 3D cells from 2D cell images [35], etc. This section provides an overview of diffusion-based approaches for medical image generation. One method used in medicine to track anatomical changes over time and detect abnormalities and disease progression is 4D imaging. Although primarily obtained through MRI, this process is relatively time-consuming. Recently, Kim and Ye [36] proposed a diffusion deformable model (DDM) that obtains source and target images and generates intermediate frames along a continuous trajectory. The method consists of two main modules: the denoising diffusion probability model module and the deformation module. In the DDPM module, a latent code is constructed by learning from the source and target images, while the deformation module utilizes the obtained latent code and the source image to render deformed images.
During the training phase, a diffusion model based on Ho et al. [3] acquires source, target, and perturbed target images, outputting latent codes. The latent codes learned from the source image are inputted into the deformation module, which adopts the approach of Balakrishnan et al. [37] to create deformation fields. Subsequently, using the spatial transformer layer (STL) [38] and trilinear interpolation, the source volume is distorted with the deformation field to construct the deformed source image. Inference then commences with the diffusion module providing latent codes that contain spatial information from the source to the target. The deformation module is then used to generate deformed intermediate frames by scaling the latent codes with factors in the range of [0,1].
Packhäuser et al. [22] utilized a LDM to generate high-quality class-conditioned X-ray images of the chest and proposed a sampling strategy to preserve sensitive biomarker information privacy during the generation process. To evaluate the potential utility of the generated dataset, the images were assessed in a chest abnormality classification task, and the results demonstrated the superiority of the proposed method over GAN-based approaches. The generated results of the comparison between PGGAN and LDM are shown in Fig. 2.
Histopathology involves studying tissues and cells at the microscopic level to diagnose diseases and cancers [39]. However, histological images are rare for certain cancer subtypes, thereby increasing the importance of generative models to fill this gap. To address this, Moghadam et al. [34] conducted the first study on generating histopathology images using DDPM. Specifically, they employed a genotype-guided DDPM to synthesize images containing various morphological and genomic information. To address the issue of data inconsistency and encourage the model to focus more on morphological patterns, they first fed the images into a color normalization module to standardize the domains of all images. Additionally, they applied a morphology-level priority module [40], which assigned higher weight values to early-level losses to emphasize perceptual information and lower weight values to later-level losses, resulting in higher fidelity samples. Experimental results on The Cancer Genome Atlas (TCGA) dataset [41] showed that the proposed method outperformed GAN-based methods [42].
In the case of diffusion-based MRI synthesis models, single-modal approaches are typically employed. However, these models often have high memory requirements due to their reliance on the original image domain and are less practical for multi-modal synthesis purposes. To alleviate this issue, Jiang et al. [43] proposed the first diffusion-based multi-modal MRI synthesis model, named Conditional Latent Diffusion (CoLa-Diff). Specifically, they introduced an architecture aimed at reducing memory consumption by operating in the latent space. To address the potential issues of compression and noise in the latent space, they utilized an approach inspired by collaborative filtering. Furthermore, to ensure the preservation of anatomical structures, they considered brain region masks as priors for density distribution to guide the diffusion process. Additionally, they implemented an automatic weight adaptation technique to effectively utilize multi-modal information. Their experiments demonstrated that the proposed method outperformed other state-of-the-art MRI synthesis methods, highlighting CoLa-Diff as an effective tool.
Medical image reconstruction
Medical image reconstruction plays a crucial role in medical imaging. Its primary objective is to produce high-quality medical images for clinical use while minimizing costs and patient risks [44, 45]. CT and MRI are the most popular imaging modalities in medicine. However, their physical characteristics limit their efficacy, directly impacting their performance and diminishing the desired outcomes. Achieving high-resolution and comprehensive results from subjects requires higher radiation doses and relatively longer rest times inside the scanner, which are only partially applicable due to health precautions and patient cooperation. Hence, faster acquisition speed is crucial in CT, positron emission tomography (PET), and MRI, for reducing examination time, improving access to imaging services, and decreasing waiting times, but more importantly, for generating accurate images, especially in dynamic studies that require rapid imaging sequences. As a result, their radiation exposure is reduced from standard doses or the imaging process is conducted with undersampling or sparse views [46,47,48,49]. To overcome these drawbacks, such as low signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), medical image reconstruction must tackle the aforementioned challenges and address this ill-posed inverse problem. This section provides an overview of diffusion-based paradigms for medical image reconstruction and enhancement.
MRI is a popular non-invasive imaging tool in medical diagnosis and treatment. However, due to its inherent physical characteristics, it is a time-consuming imaging process, and patient motion can lead to various artifacts in the images. Thus, to reduce scan time and accelerate the solution to the inverse problem from the spatial domain (or k-space) to the image level, various solutions have been proposed within the framework of supervised learning. However, these methods lack robustness to distributional shifts or drifts in their training/testing sets. Jalal et al. [50] presented the first study on compressed sensing in MRI reconstruction using a generative model, termed Compressed Sensing with Generative Modeling (CSGM). For this purpose, CSGM trained a score-based generative model [51] on MRI images and used it as prior information for inverting undersampled MRI reconstructions through a posterior sampling scheme based on Langevin dynamics [52]. Compared to end-to-end supervised learning paradigms, CSGM demonstrated superior performance on fastMRI [53] and Stanford MRI [54] datasets with metrics such as SSIM and PSNR.
Chung and Ye [25] proposed a score-based diffusion framework that addresses the inverse problem of accelerated MRI scan image reconstruction. In the first step, a continuous-time score function for denoising is trained using only magnitude images. Then, during the reverse SDE process, Variance Exploding (VE)-SDE [6] is utilized to sample from the pre-trained score model distribution conditioned on measurements. At each step, the image is divided into real and imaginary components. Each part is input to a predictor and undergoes data consistency mapping for image reconstruction. The obtained images are segmented again, and a corrector and data consistency mapping are applied to compensate for errors during diffusion and enhance the reconstructed images. The results demonstrate that the proposed model outperforms previous state-of-the-art (SOTA) methods and can even reconstruct, with high fidelity, data that goes well beyond the training distribution, such as anatomical structures not seen during the training process. Furthermore, the proposed framework is highly effective for reconstructing images in the presence of multiple coils. To address the multi-coil setting, they propose two approaches: (1) parallel reconstruction of each coil image, and (2) considering the correlations between coil images by injecting dependencies among them during the reverse SDE. The final image is obtained by taking the sum of squares (SSoS) of each reconstructed coil image. Although both techniques have shown good results in terms of quality and practicality, they are time-consuming. Quantitative metrics in Table 2 also confirm this superiority.
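The data-consistency mapping used at each step has a simple form in k-space: acquired frequencies are replaced with the measured values while the network's predictions are kept elsewhere. A minimal single-coil, Cartesian-mask sketch is shown below; the variable names and this hard-projection variant are illustrative, not the exact implementation in [25].

```python
import torch

def data_consistency(x, y, mask):
    """Project the current image estimate onto the set consistent with the
    measurements: keep measured k-space entries, keep predictions elsewhere.
    x: complex image estimate; y: undersampled k-space; mask: 1 where sampled."""
    k = torch.fft.fft2(x)
    k = mask * y + (1 - mask) * k    # enforce agreement with acquired data
    return torch.fft.ifft2(k)
```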
Figure 3 shows the visualization results of the comparison between the score-based diffusion framework and TV, U-Net, and DuDoRNet. From the visualization, it can be seen that the score-based diffusion framework achieved the best results in structure and contrast.
Reconstruction results of various anatomic structure/contrast. (a) TV, (b) U-Net, (c) DuDoRNet, (d) proposed method, (e) ground truth. 1st row: cardiac MRI, 2nd row: foot axial scan, 3rd row: lower extremity axial scan, 4th row: LT bone. Figure 3 adapted from [25] (Chung and Ye, arXiv:2209.14566, CC BY-NC-ND 4.0)
Medical image segmentation
Image segmentation is a vital task in the field of computer vision, investigating the simplification of image complexity by decomposing it into meaningful image segments [55]. Specifically, it facilitates medical analysis by providing valuable information about anatomically relevant regions. However, deep learning models typically require a substantial amount of training data with diverse pixel-level annotations to yield generalizable results [56]. Due to limitations in time, cost, and expertise, the availability of images and labels for medical image segmentation is restricted [57]. As such, diffusion models have emerged as a promising approach in image segmentation research, synthesizing labeled data and eliminating the necessity of pixel annotations. brainSPADE [26] proposes a generative model for synthesizing labeled brain MRI images, which can be used to train segmentation models. brainSPADE consists of label generator and image generator sub-modules. The former is responsible for creating synthetic segmentation maps, while the latter generates images based on the generated labels. In the label generator, the input segmentation map is encoded using a spatial VAE encoder during training, constructing a latent space. Subsequently, the compressed latent codes undergo diffusion and denoising through an LDM [13], producing an effective latent space where imperceptible details are disregarded and semantic information is emphasized. The spatial VAE decoder then constructs artificial segmentation maps from the latent space. In the image generator, Fernandez et al. [26] utilize SPADE, a VAE-GAN model, to construct a style latent space from arbitrary input styles and use it alongside the artificial segmentation maps for decoding output images. The performance is evaluated using nnU-Net [58]. The results demonstrate that when trained on synthetic data, the model achieves comparable results to training on real data, and their combination significantly improves the model's outcomes. Kim et al. [27] propose a novel Diffusion Adversarial Representation Learning (DARL) model for self-supervised vessel segmentation, aimed at diagnosing vascular diseases. The proposed DARL model is shown in Fig. 4.
The proposed diffusion adversarial representation model for self-supervised vessel segmentation. In path (A), given a real noisy angiography image, the model estimates vessel segmentation masks. In path (B), given a real noisy background image and a vessel-like fractal mask, the model generates a synthetic angiography image. Figure 4 adapted from [27] (Kim et al., arXiv:2110.05243, CC BY-NC-ND 4.0)
The proposed DARL model comprises two primary modules: a diffusion module that learns the background image distribution and a generative module that employs a switchable SPADE algorithm to generate vessel segmentation masks or synthesized angiographic images. The entire model includes two paths. In path (A), real noisy angiography images are fed into the model to produce segmentation masks, with the SPADE switch turned off. In path (B), real noisy background images are input into the model; SPADE becomes active, receives vessel-like fractal masks, and generates synthesized angiography images. Then, by feeding the generated synthesized angiography images back into path (A), a loop is formed, aiding in learning vessel information. Furthermore, during inference, path (A) is executed in one step, where the model generates masks solely by inputting noisy angiography images into the model. The results validate the proposed method's generality, robustness, and superiority compared to state-of-the-art unsupervised/self-supervised learning methods. Bieder et al. [28] introduce PatchDDM, a memory-efficient patch-based diffusion model applicable to large 3D volumes, making it suitable for medical tasks. The authors evaluate PatchDDM on the tumor segmentation task in the BraTS2020 dataset, demonstrating its capability to generate meaningful 3D segmentations while requiring fewer computational resources than conventional diffusion models.
Medical image denoising
The primary challenge in medical imaging lies in acquiring images without compromising important information. During the acquisition and subsequent processing stages, images may suffer from noise or artifacts that can undermine their integrity [59, 60]. Noise interferes with image quality, particularly when imaging small objects with low contrast. Given the nature of generative models, diffusion models are well-suited for addressing various denoising challenges [61]. Hu et al. [29] employed unsupervised denoising of volumetric retinal data obtained from optical coherence tomography (OCT) using DDPM, referred to as DenoOCT DDPM. The practicality of OCT imaging is hindered by a limited spatial frequency bandwidth, resulting in speckle noise in the generated images. Speckle noise impedes diagnosis by ophthalmologists and significantly affects tissue visibility. Classical methods, such as averaging multiple b-scans at the same location, suffer from extreme drawbacks including prolonged acquisition time and registration artifacts. Due to the multiplicative nature of speckle noise, these methods enrich rather than reduce noise. Deep learning-based models have shown promising performance. However, their effectiveness relies on the availability of noise-free images, which is a rare and expensive process.
To address this, DenoOCT DDPM leverages the feasibility of modeling the noise pattern rather than the true data distribution using DDPM. They employ self-fusion [62] as a preprocessing step to provide a clear reference image for DDPM and train parameterized Markov chains. Their study demonstrates state-of-the-art results, surpassing the pseudo-modal fusion network (PMFN), which utilizes information from a single-frame noisy b-scan and a pseudo-modality created with the help of self-fusion, in terms of the signal-to-noise ratio (SNR) metric.
Positron emission tomography (PET) is a non-invasive imaging tool that plays a crucial role in cancer screening and diagnosis. However, similar to OCT devices, PET suffers from low signal-to-noise ratio and resolution due to low photon counts from patients. Deep learning methods for PET image denoising have made progress, but excessive smoothing remains a notable drawback of CNN-based approaches. Conditional generative adversarial networks (CGANs) [63] alleviate the aforementioned limitation, yet still rely on the distribution of training and test sets. Gong et al. [30] proposed a DDPM-based PET denoising framework, PET-DDPM, which incorporates auxiliary-modality embeddings as prior information in the DDPM formulation. They employed PET and MR imaging datasets acquired with 18F-FDG and 18F-MK-6240, respectively, with the MR images serving as cross-modality priors for the noise distribution learned from PET images. This intuition follows the guided generative paradigm outlined in [15], whereby a guiding classifier helps converge the learned distribution to the desired distribution. PET-DDPM achieves state-of-the-art results in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) when compared to denoising networks based on U-Net [64]. Based on 20 test datasets, the ablation study showed that the variant conditioned on both PET and MR inputs (DDPM-PETMR) had the best performance and was more robust to the noise levels during testing, followed by the MR-conditioned variant (DDPM-MR-PETCon).
Applications of diffusion models in medical informatics
The accumulation of Electronic Health Record data has propelled the integration of deep learning into healthcare, encompassing areas such as disease diagnosis, prognosis, treatment recommendations, and vital sign monitoring. However, privacy concerns, security issues, and data gaps impose constraints on accessing high-quality EHR data, hindering the progress of intelligent healthcare. Synthetic data generation, often rooted in Generative Adversarial Networks (GANs), has offered a solution. However, GANs are notorious for being difficult to train and prone to mode collapse. In contrast, diffusion models have demonstrated greater stability in data generation, a fact well-established in the field of medical image generation [15].
Currently, the primary focus of diffusion models in the EHR domain lies in tasks such as structured data synthesis and the imputation of missing values. Structured data in EHRs typically includes static data (e.g., age, gender), dynamic data (e.g., biochemical indicators, vital signs, electroencephalogram [EEG], and electrocardiogram [ECG]), and mixed-type data. The common diffusion model, which operates in continuous spaces where the forward and reverse processes are characterized by Gaussian distributions, works well for continuous values. However, for categorical variables, using Gaussian noise may not be appropriate. After encoding data into a one-hot representation, a multinomial forward diffusion process, defined as a categorical distribution that corrupts the data with uniform noise over classes, might be more suitable [65]. Unlike image data, which is generally regular and consistent, EHR data presents unique challenges due to its combination of discrete and continuous variables, as well as irregular time series, significantly increasing the complexity of applying diffusion models.
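For a single categorical variable with \(K\) classes, the uniform-noise forward step described above can be sketched in a few lines. This is a minimal illustration of the multinomial diffusion idea in [65]; names are illustrative.

```python
import torch
import torch.nn.functional as F

def multinomial_q_sample(x_onehot, beta_t):
    """q(x_t | x_{t-1}): keep the current class with probability 1 - beta_t,
    otherwise resample uniformly over the K classes."""
    K = x_onehot.shape[-1]
    probs = (1.0 - beta_t) * x_onehot + beta_t / K       # categorical mixture
    idx = torch.multinomial(probs, num_samples=1).squeeze(-1)
    return F.one_hot(idx, K).float()

x0 = F.one_hot(torch.tensor([2, 0, 1]), num_classes=4).float()  # 3 samples, K = 4
x1 = multinomial_q_sample(x0, beta_t=0.05)                      # one noising step
```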
For static data, patient information is typically processed into fixed-size vectors that serve as input for diffusion models, and dynamic signals such as EEG and ECG can likewise be synthesized directly using diffusion models. For mixed-type data, however, the Latent Diffusion Model (LDM) approach is often employed. In this method, the encoder of an autoencoder is first used to map the data into a latent space. A diffusion model is then applied within this latent space, and finally, the decoder restores the data to its original form (Fig. 5). This section delves into the adoption of diffusion models in these areas. We present a summarized list of models tailored for EHR data generation (refer to Table 3).
Static data generation
The synthesis of EHRs has become a focal point in medical data research, offering a pathway to enhance healthcare analytics while safeguarding patient privacy. In the synthesis of static data, the data is generally concatenated into a fixed-length vector, with categorical data processed as one-hot encodings. He et al. [66] introduced MedDiff for EHR synthesis. This approach leverages class-conditional sampling to maintain label information, enabling the creation of synthetic EHRs that align closely with the distribution of the respective labels. To expedite the synthesis process, Anderson acceleration was integrated. The study demonstrated that MedDiff could produce superior synthetic EHRs, outpacing current state-of-the-art methods (MedGAN and CorGAN). A limitation to note is that the model can only generate continuous values. In a separate endeavor, Yuan et al. [67] crafted EHRDIFF, a novel EHR synthesis approach that models both the forward and reverse processes of diffusion models using stochastic differential equations (SDEs). When juxtaposed with GANs (MedGAN, CorGAN, and EMR-WGAN), EHRDIFF showcases a pronounced edge in the fidelity of synthetic EHR data, adeptly mirroring the distribution of authentic EHR data. Nonetheless, there is potential to enhance the model's performance in areas like attribute inference risk and membership inference risk.
Kuo et al. [68] claimed to have broken new ground as the inaugural study to deploy diffusion models for synthesizing longitudinal EHRs accommodating mixed-type variables, both numeric and categorical; in practice, however, they simply merged these variable types during processing. Their findings highlighted that the generated datasets surpassed those derived from GANs in terms of realism. However, modeling numeric variables with extensive tails introduced potential biases. Furthermore, when training reinforcement learning (RL) agents on this synthetic data, agents guided by data simulated using diffusion probabilistic models (DPMs) made recommendations more closely aligned with those informed by real data compared to agents trained on GAN-simulated data. Ceritli et al. [70] employed TabDDPM [85] to craft realistic mixed-type tabular EHRs. This model, a variant of the DDPM, is versatile enough for a broad spectrum of tabular tasks. TabDDPM ingeniously marries Gaussian and multinomial diffusion processes to generate continuous and categorical data, respectively. Empirical results revealed TabDDPM's superiority over competing models (VAE, MedGAN, and CorGAN) across most evaluation metrics, with privacy considerations being a notable exception. Based on data from the Birth Cohort Linked Birth–Infant Death dataset in the USA between 1989 and 1991, we applied DDPM and CTABGAN+ to generate static data (Fig. 6).
Dynamic data generation
Diffusion models have made a series of advancements in handling and enhancing the quality of dynamic data, such as ECGs. They have addressed challenges such as incomplete data, noise interference, and the need for accurate imputation. Jiwoon & Cheolsoo [71] set out to rectify incomplete time-series medical data, which can arise from various issues, including communication errors and sensor device malfunctions. Their diffusion model-centric signal restoration successfully replenished electrocardiogram (ECG) signals, even with a 50% missing ratio. However, it is noteworthy that the performance of DDPM was on par with linear interpolation, not surpassing it.
Li, Ditzler et al. [72] advanced a groundbreaking ECG noise removal and baseline wander correction technique, named DeScoD-ECG, which learns the conditional distribution using conditional score-based diffusion models. By integrating a multi-shot averaging strategy and a self-ensemble approach, they enhanced signal reconstruction quality and bolstered algorithmic stability, especially under severe noise conditions. This method, when benchmarked, displayed a superior performance profile, achieving closer approximations to genuine data distributions and demonstrating resilience under intense noise disruptions.
Alcaraz & Strodthoff [73] introduced SSSD, an innovative imputation model that blends conditional diffusion models with structured state space models, tailor-made for navigating the complexities of long-term dependencies in time series data. Within the SSSD framework, noise specific to the diffusion process is solely introduced to regions requiring imputation. Impressively, SSSD either matches or surpasses leading probabilistic imputation and forecasting methods across an array of datasets and various missingness contexts. Building on the foundation of SSSD, Alcaraz & Strodthoff [61] later developed SSSD-ECG, a technique to synthetically generate short 12-lead ECGs based on over 70 ECG specifications. Although this method outshone its GAN-based counterparts, it is essential to note that SSSD-ECG samples could not fully emulate real data during classifier training.
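The core trick, diffusing only the entries that must be imputed while keeping observed values clean as conditioning, can be sketched in a few lines. The snippet reuses the closed-form forward step from earlier; the mask convention is illustrative.

```python
import torch

def masked_noising(x0, t, obs_mask, alpha_bars):
    """obs_mask is 1 where values are observed and 0 where they are missing.
    Observed entries stay clean; diffusion noise is added only to the gaps."""
    noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
    return obs_mask * x0 + (1 - obs_mask) * xt   # diffuse only the missing regions
```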
Adib et al. [74] homed in on generating normal sinus beat ECG signals using an unconditional model. Their proposed pipeline utilizes a 2D diffusion probabilistic model to craft synthetic 1D ECG time series. After undergoing transformations using Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF), they fashioned a 3-channel RGB-like dataset. Subsequently, they used DDPM to produce 2D, 3-channel synthetic ECG data, which was reverted to 1D signals using the GASF channel diagonals. Their findings highlight the Wasserstein GAN with Gradient Penalty model's consistent supremacy over DDPM. Neifar et al. [75] presented DiffECG, a versatile framework anchored on DDPMs for ECG signal synthesis. This model encompasses heartbeat generation, partial signal completion, and comprehensive heartbeat forecasting. Notably, DiffECG's synthetic ECG signals considerably augmented the performance of state-of-the-art ECG arrhythmia classification.
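The Gramian Angular Summation Field at the heart of this pipeline takes only a few lines of NumPy: rescale the series into [-1, 1], map each value to an angle, and take the cosine of pairwise angle sums. A minimal sketch follows (the min-max scaling convention is one common choice):

```python
import numpy as np

def gasf(x):
    """Gramian Angular Summation Field of a 1D series:
    scale to [-1, 1], set phi = arccos(x), return G_ij = cos(phi_i + phi_j)."""
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale into [-1, 1]
    phi = np.arccos(np.clip(x, -1, 1))
    return np.cos(phi[:, None] + phi[None, :])        # outer sum of angles

signal = np.sin(np.linspace(0, 4 * np.pi, 128))       # toy 1D signal segment
image = gasf(signal)                                  # (128, 128) 2D representation
```

Because the diagonal satisfies \(G_{ii} = \cos(2\phi_i) = 2x_i^2 - 1\), the scaled series can be recovered from the main diagonal up to sign, which is what makes the diagonal-based reversion to 1D signals mentioned above possible.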
Shome et al. [77] introduced the Region-Disentangled Diffusion Model (RDDM) to decode the intricate temporal dynamics of ECG, aiming for precise Photoplethysmography-to-ECG translation. RDDM sidesteps the limitations associated with indiscriminate noise addition by targeting specific regions of interest (ROI) and employs a reverse process to disentangle ROI and non-ROI denoising. Boasting efficiency and computational prowess, RDDM swiftly generates high-fidelity ECG from PPG in a mere 10 diffusion steps, outclassing DDPM and earlier models, especially on quantitative measures and the CardioBench platform.
Li, Chen et al. [78] employed DDPM to generate synthetic sleep samples, vital for automatic sleep stage classification. This study was pivotal in addressing the classification of sleep stages challenged by varying data distributions, inaccessible historical data, and evolving label granularity. While DDPM showcased more accurate results than WGAN-GP, the generative techniques fell short when juxtaposed with genuine historical datasets.
Tosato et al. [79] crafted a methodology for generating synthetic electroencephalogram (EEG) data using DDPMs. In this initiative, EEGs were first translated into electrode-frequency distribution maps (EFDMs), squared by padding, inverted, and eventually converted into RGB images. Interestingly, classifiers trained solely on genuine data achieved over a 90% average accuracy when assessed against the synthetic data. Furthermore, hybrid classifiers, trained on a mix of real and synthetic data, consistently outstripped classifiers limited to genuine data.
Duan et al. [80] rolled out DS-DDPM, a novel conditional model tailored for domain separation in brain dynamics. This model envisions human artifact removal as a generative denoising process, adept at both generating and discerning longstanding domain variances tied to human artifacts and invariant brain signals. Despite DS-DDPM's commendable performance in correlation analysis and classification, it is worth noting that the dataset encompassed EEG signals from only a limited number of subjects.
Mixed-type data generation
Generally, the synthesis of mixed data refers to the ability to process static data and time series data simultaneously. Ping et al. [69] developed TDSTF, an innovative model tailored for forecasting vital signs in the Intensive Care Unit (ICU), including metrics like Heart Rate, Systolic Blood Pressure, and Diastolic Blood Pressure. To adeptly manage sparse data, TDSTF transformed the sparse matrix into a triplet form and embedded a residual network founded on the Transformer architecture. The model exhibited a rapid and precise prediction capability for sparse time-series data in the ICU milieu, outclassing several benchmark models (MQ-RNN, DeepAR, DeepFactor and CSDI). A potential constraint, however, is the model’s limited input size tied to the Transformer, potentially impacting its efficiency.
Some authors [81,82,83,84] have adopted the LDM architecture for synthesizing mixed data. Naseer et al. [81] introduced ScoEHR, which combines an autoencoder with a continuous-time diffusion model. Real data is transformed into a low-dimensional space using the encoder of a pre-trained autoencoder. In the low-dimensional space, a forward SDE diffuses the data, and the reverse SDE is learned and used to produce new synthetic data. This synthetic data is then transformed back into its original form using the autoencoder's decoder to generate the final synthetic EHR data. A comparison of ScoEHR with state-of-the-art EHR generation models (MedGAN, MedWGAN, and MedBGAN) shows that ScoEHR outperforms these models while demonstrating a low risk of privacy disclosure.
He et al. [82] introduced FLEXGEN-EHR to overcome the performance degradation of diffusion models when generating missing patterns in heterogeneous tabular EHR data. Clinical EHR data is inherently complex and heterogeneous: some EHRs contain only static measurements, others only temporal measurements, and some a blend of both. To handle this, FLEXGEN-EHR first projects temporal and static features into a latent space using two separate encoders to obtain embeddings for the respective feature types. Then, a modified Gromov-Wasserstein-based manifold alignment algorithm, combined with available label information, is applied to align the two spaces. An LDM is trained on the fused representations. Finally, the latent space is decoded back into its original form using decoders. Experimental results demonstrate that, compared with MedGAN, CorGAN, TabDDPM, EHR-M-GAN, and VAE, FLEXGEN-EHR achieves higher fidelity and utility and can be effectively employed in privacy-sensitive settings.
Tian [83] introduced TIMEDIFF for generating diverse and realistic synthetic EHR time series using a DDPM with a bidirectional recurrent neural network (BRNN) architecture for high-utility time series data generation. By combining multinomial and Gaussian diffusions, TIMEDIFF can simultaneously generate both real-valued and discrete-valued time series directly. Experimental results demonstrate that TIMEDIFF outperforms state-of-the-art methods (EHR-M-GAN, GT-GAN, TimeGAN, RCGAN) by a large margin in terms of data utility. Additionally, TIMEDIFF requires less training effort compared to GANs and can generate useful synthetic samples for ML model development while ensuring patient privacy.
Zhong [84] proposed EHRPD, a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation. This approach addresses limitations in existing methods, which often replicate input visits, inadequately model temporal dependencies between visits, and fail to generate time-related information, a critical component in EHR data. EHRPD introduces a novel time-aware visit embedding module and a pioneering predictive DDPM. Additionally, it employs a predictive U-Net for further optimization. Experimental results show that, compared to MedGAN, EVA, TabDDPM, MedDiff, ScoEHR, HALO, and PromptEHR, EHRPD achieves superior utility and privacy preservation.
Applications of diffusion models in small molecules and proteins
Both small molecules and proteins are essential biomolecules that perform a wide range of biological functions. The rational design and synthesis of these molecules play a vital role in areas such as drug discovery, molecular docking, and antibody engineering. In recent years, diffusion models, which represent a leading class of deep generative frameworks, have demonstrated strong capabilities in modeling complex distributions and generating diverse molecular structures under conditional guidance. The following sections present recent progress in small molecule conformation prediction, drug design, protein structure generation, and specialized tasks in protein modeling. Table 4 summarizes key representative models, architectural features, and available code implementations.
Small molecule conformation prediction
Diffusion models have made significant progress in learning and generating accurate 3D conformations of small molecules. ConfGF [86] introduced score matching based on interatomic distance gradients to generate molecular conformations via Langevin dynamics. Building on this, DGSM [87] incorporates long-range interactions between non-bonded atoms by dynamically constructing molecular graphs. GEODIFF [88] introduces a rotationally and translationally equivariant Markov kernel and optimizes the variational lower bound for training. EDM [89] jointly models atomic types and coordinates using an E(3) equivariant network, significantly outperforming prior methods. The torsional diffusion model [90] reduces denoising steps by modeling torsional angles, achieving state-of-the-art results on the GEOM-DRUGS dataset.
Drug design
Diffusion models facilitate ligand generation within the vast chemical space (estimated between \(10^{23}\) and \(10^{60}\) pharmacologically relevant molecules). EEGSDE [91] uses time-dependent energy functions and equivariant noise prediction to guide molecular design for multiple properties. DiffLinker [92] enables fragment-based molecule construction with E(3)-equivariant neural networks. MiDi [99] incorporates formal charges and bond types into 3D molecular graphs using a mixed noise schedule, enhancing complex molecule generation. MDM [100] categorizes interatomic bonds by distance and introduces a VAE for diverse outputs. GEOLDM [101], a latent diffusion model, learns a point cloud-based latent space for efficient, chemically realistic molecule synthesis.
Protein structure generation
Diffusion models have emerged as powerful tools for generating protein structures with high accuracy and flexibility. ProteinSGM [93] adopts continuous-time SDEs and 6D coordinate representations to create variable-length protein structures. Chroma [94] uses constraints such as domain information and long-range graphs for efficient complex sampling. RFdiffusion [102] builds on RoseTTAFold and uses self-conditioning in reverse diffusion to generate accurate sequences. FrameDiff [103] removes the need for pretraining by applying a Brownian motion-based approach. A study [95] replicates folding by using internal backbone sequences to reconstruct complete proteins. EIGENFOLD [104] and related works [105] apply projection techniques and attention mechanisms for sequence-to-structure prediction. LatentDiff [106] compresses sequences into latent space using equivariant networks, while DiffSDS [107] reinterprets protein design as a language modeling task with masking.
Specialized tasks in protein design
Diffusion models also support advanced tasks such as motif-scaffolding, antibody design, and molecular docking. SMCDiff [96] stabilizes motifs via denoising of concatenated backbone structures. DiffAb [97] co-designs CDR sequences and antigen complexes, optimizing antibody structure. NOS [108], combined with LaMBO, generates high-quality sequences for antibody engineering. For molecular docking, DIFFDOCK [98] and DIFFDOCK-PP [109] model ligand-protein and protein-protein interactions in non-Euclidean space. PLANTAIN [110] uses L-BFGS optimization for pose refinement. DiffBP [111] contextualizes the target protein alongside molecular features, while another study [112] uses language models for ligand-protein structure distribution. DynamicBind [113] constructs an equivariant energy landscape for biologically relevant conformational transitions.
Recent advances in diffusion models have considerably expanded the possibilities in small molecule and protein modeling. While these tasks primarily support early-stage biomedical research, their implications in therapeutic development and precision medicine are profound. The development of rigorous benchmarks will help further assess their performance. Additionally, other applications such as scPhere and models for continuous hierarchical representation [114, 115] open promising directions in single-cell bioinformatics.
Conclusion
This article provides a concise overview of the working principles, classifications, and diverse applications of the diffusion model in biomedical informatics, spanning areas like medical digital imaging, structured electronic health records, physiological signal data generation, and small molecule structure prediction. In the domain of medical imaging, the diffusion model has shown immense promise. Through various frameworks such as denoising diffusion probabilistic models and stochastic differential equations, they offer cutting-edge solutions for tasks like image generation, reconstruction, segmentation, and denoising. Particularly in image generation, these models have been deployed to synthesize 2D/3D medical images and to reconstruct 3D cells from 2D cellular images. For image reconstruction, diffusion models have presented more efficient solutions for MRI and PET images. Moreover, in the segmentation domain, they have paved the way for generating labeled MRI images, presenting new techniques for vascular segmentation. In the realm of denoising, diffusion models have been utilized for diminishing noise and artifacts in retinal data and PET images. While this exposition has covered multiple imaging applications, due to space constraints, it offers only a representative selection, leaving out comprehensive coverage of other imaging applications, such as cross-modal image generation. Though these models demonstrate potential, the complexity of diffusion models often results in higher computational overheads. Furthermore, the quality of images generated by these models might occasionally fall short of other generative models, and their iterative nature could hinder applications demanding real-time feedback.
In the field of medical informatics, diffusion models have shown significant potential when applied to EHRs. Facing challenges related to data privacy, security, and data gaps, these models present a feasible solution, particularly in generating static data, time series data, and mixed-type data, and in imputing missing values. While diffusion models have demonstrated efficacy in producing realistic synthetic data, including discrete, continuous, and time-series formats, to tackle challenges in biomedical informatics like label sparsity and imbalanced classifications, they still possess intrinsic limitations. The authenticity and quality of the generated data remain points of contention, demanding verification against real datasets to ensure the reliability and applicability of synthetic datasets. Wei et al. [116] revealed a weak privacy guarantee in discrete denoising diffusion models, showing a surge in privacy leakage during the transition from the pure-noise phase to the synthetic clean-data phase, and that a faster decay in diffusion coefficients amplifies the privacy guarantee. Therefore, to meet practical needs, other privacy-preserving technologies may have to be incorporated. As a probabilistic generative model, the diffusion model achieves conditional generation by conditioning on the known part of the observations when imputing missing data. Considering the data and computing resource requirements of diffusion models, there is good potential to generate high-quality synthetic data by pre-training on large public datasets (such as MIMIC and eICU) and then fine-tuning on private data. Unlike images, EHR data is highly complex, and it is difficult for a single framework to cover all data types and variables; unifying EHR data templates and representations while maintaining the universality and scalability of the model is therefore a major challenge, and effective parameter fine-tuning strategies need to be further explored.
Diffusion models have also made significant strides in the design of small molecules and proteins. In the realm of small molecules, they have been utilized for molecular conformation prediction, drug design, and other specialized tasks, outpacing many conventional methodologies and reaching unprecedented levels of accuracy. In the protein sector, diffusion models not only offer innovative avenues for protein structure generation but have also been applied to unique tasks like antibody design and molecular docking. These models exhibit a keen understanding of the intricate interiors and interactions of complex biomacromolecules, opening new frontiers in biotechnology and drug discovery. However, the establishment of rigorous benchmark tests is essential to ensure the reliability of these models.
In summary, diffusion models are garnering increasing attention in the realm of biomedical research, albeit their applications are yet in the embryonic stages. These models exhibit promising potential in data generation, notwithstanding their inherent limitations. As we venture forth, securing the practical reliability and efficiency of these models is imperative. This necessitates interdisciplinary cooperation, stringent standardized benchmarking, and an unwavering commitment to technological advancements. We anticipate a proliferation of diffusion models in the ensuing years, propelling significant strides in various domains of biomedical informatics.
Data availability
No datasets were generated or analysed during the current study.
References
Goodfellow I, et al. Generative adversarial networks. Commun ACM. 2020;63(11):139–44.
Rezende DJ, Mohamed S, Wierstra D. Stochastic backpropagation and approximate inference in deep generative models. In: International Conference on Machine Learning. PMLR; 2014.
Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Adv Neural Inf Process Syst. 2020;33:6840–51.
Sohl-Dickstein J, et al. Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning. PMLR; 2015.
Song Y, Ermon S. Generative modeling by estimating gradients of the data distribution. Adv Neural Inf Process Syst. 2019;32.
Song Y, et al. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
Kazerouni A, et al. Diffusion models in medical imaging: a comprehensive survey. Med Image Anal. 2023;102846.
Shickel B, et al. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform. 2017;22(5):1589–604.
Harutyunyan H, et al. Multitask learning and benchmarking with clinical time series data. Sci Data. 2019;6(1):1–18.
Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc. 2018;25(10):1419–28.
Song J, Meng C, Ermon S. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
Song Y, et al. Consistency models. arXiv preprint arXiv:2303.01469, 2023.
Rombach R, et al. High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
Dhariwal P, Nichol A. Diffusion models beat gans on image synthesis. Adv Neural Inf Process Syst. 2021;34:8780–94.
Li H, et al. Srdiff: single image super-resolution with diffusion probabilistic models. Neurocomputing. 2022;479:47–59.
Lugmayr A, et al. Repaint: inpainting using denoising diffusion probabilistic models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Zimmermann RS, et al. Score-based generative classifiers. arXiv preprint arXiv:2110.00473, 2021.
Amit T, et al. Segdiff: image segmentation with diffusion probabilistic models. arXiv preprint arXiv:2112.00390, 2021.
Wyatt J, et al. Anoddpm: anomaly detection with denoising diffusion probabilistic models using simplex noise. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Kim B, Ye JC. Diffusion deformable model for 4D temporal medical image generation. arXiv preprint arXiv:2206.13295, 2022. https://doi.org/10.48550/arXiv.2206.13295.
Packhäuser K, et al. Generation of anonymous chest radiographs using latent diffusion models for training thoracic abnormality classification systems. arXiv preprint arXiv:2211.01323, 2022. https://doi.org/10.48550/arXiv.2211.01323.
Azadi Moghadam P, et al. A morphology focused diffusion probabilistic model for synthesis of histopathology images. arXiv preprint arXiv:2209.13167, 2022. https://doi.org/10.48550/arXiv.2209.13167.
Jiang L, et al. CoLa-diff: conditional latent diffusion model for multi-modal MRI synthesis. arXiv preprint arXiv:2303.14081, 2023. https://doi.org/10.48550/arXiv.2303.14081.
Chung H, Ye JC. Score-based diffusion models for accelerated MRI. arXiv preprint arXiv:2110.05243, 2021.
Fernandez V, et al. Can segmentation models be trained with fully synthetically generated data? In: international workshop on simulation and synthesis in medical imaging. Springer; 2022.
Kim B, Oh Y, Ye JC. Diffusion adversarial representation learning for self-supervised vessel segmentation. arXiv preprint arXiv:2209.14566, 2022.
Bieder F, et al. Memory-efficient 3D denoising diffusion models for medical image processing. Medical Imaging with Deep Learning. 2023.
Hu D, Tao YK, Oguz I. Unsupervised denoising of retinal OCT with diffusion probabilistic model. In: medical imaging, 2022: image processing. SPIE; 2022.
Gong K, et al. PET image denoising based on denoising diffusion probabilistic models. arXiv preprint arXiv:2209.06167, 2022.
Huijben EM, Pluim JP, van Eijnatten MA. Denoising diffusion probabilistic models for addressing data limitations in chest X-ray classification. Inform Med Unlocked. 2024;50:101575.
Kim HK, et al. A feasibility study on the adoption of a generative denoising diffusion model for the synthesis of fundus photographs using a small dataset. Discover Appl Sci. 2024;6(4):188.
Pinaya WH, et al. Brain imaging generation with latent diffusion models. In: MICCAI workshop on deep generative models. Springer; 2022.
Moghadam PA, et al. A morphology focused diffusion probabilistic model for synthesis of histopathology images. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023.
Waibel DJ, et al. A diffusion model predicts 3d shapes from 2d microscopy images. 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI). 2023. IEEE.
Kim B, Ye JC. Diffusion deformable model for 4D temporal medical image generation. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2022.
Balakrishnan G, et al. An unsupervised learning model for deformable medical image registration. Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
Jaderberg M, Simonyan K, Zisserman A. Spatial transformer networks. Adv Neural Inf Process Syst. 2015;28.
Di Ruberto C, et al. A feature learning framework for histology images classification. Emerging trends in applications and infrastructures for computational biology, bioinformatics, and systems biology: systems and applications. 2016;37–48.
Choi J, et al. Perception prioritized training of diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Grossman RL, et al. Toward a shared vision for cancer genomic data. N Engl J Med. 2016;375(12):1109–12.
Karras T, et al. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
Jiang L, et al. CoLa-diff: conditional latent diffusion model for multi-modal MRI synthesis. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2023.
Levac B, Jalal A, Tamir JI. Accelerated motion correction for MRI using score-based generative models. 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI). 2023. IEEE.
Cao C, et al. SPIRiT-Diffusion: SPIRiT-driven score-based generative modeling for vessel wall imaging. arXiv preprint arXiv:2212.11274, 2022.
Hyun CM, et al. Deep learning for undersampled MRI reconstruction. Phys Med Biol. 2018;63(13):135007.
Korkmaz Y, et al. Unsupervised MRI reconstruction via zero-shot learned adversarial transformers. IEEE Trans Med Imaging. 2022;41(7):1747–63.
Feng C-M, et al. Task transformer network for joint MRI reconstruction and super-resolution. Medical Image Computing and Computer Assisted Intervention-MICCAI 2021: 24th International Conference, Strasbourg, France, September 27-October 1, 2021, Proceedings, Part VI. Springer; 2021.
Korkmaz Y, et al. Deep MRI reconstruction with generative vision transformers. Machine Learning for Medical Image Reconstruction: 4th International Workshop, MLMIR 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, October 1, 2021, Proceedings. Springer; 2021.
Jalal A, et al. Robust compressed sensing mri with deep generative priors. Adv Neural Inf Process Syst. 2021;34:14938–54.
Song Y, Ermon S. Improved techniques for training score-based generative models. Adv Neural Inf Process Syst. 2020;33:12438–48.
Bakry D, Émery M. Diffusions hypercontractives. Séminaire de Probabilités XIX 1983/84: Proceedings. Springer; 2006. p. 177–206.
Zbontar J, et al. fastMRI: an open dataset and benchmarks for accelerated MRI. arXiv preprint arXiv:1811.08839, 2018.
Stanford University. 2022.
Heidari M, et al. Hiformer: hierarchical multi-scale representations using transformers for medical image segmentation. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023.
Chen J, et al. Transunet: transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306, 2021.
Cao H, et al. Swin-unet: unet-like pure transformer for medical image segmentation. European conference on computer vision. Springer; 2022.
Isensee F, et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18(2):203–11.
Zhang Z, et al. TransCT: dual-path transformer for low dose computed tomography. Medical Image Computing and Computer Assisted Intervention-MICCAI 2021: 24th International Conference, Strasbourg, France, September 27-October 1, 2021, Proceedings, Part VI. Springer; 2021.
Luthra A, et al. Eformer: edge enhancement based transformer for medical image denoising. arXiv preprint arXiv:2109.08044, 2021.
Gao Q, et al. CoreDiff: contextual error-modulated generalized diffusion model for low-dose CT denoising and generalization. arXiv preprint arXiv:2304.01814, 2023.
Oguz I, et al. Self-fusion for OCT noise reduction. In: medical imaging, 2020: Image processing. SPIE; 2020.
Zhou L, et al. Supervised learning with cyclegan for low-dose FDG PET image denoising. Med Image Anal. 2020;65:101770.
Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III. Springer; 2015.
Hoogeboom E, et al. Argmax flows and multinomial diffusion: learning categorical distributions. Adv Neural Inf Process Syst. 2021;34:12454–65.
He H, et al. MedDiff: generating electronic health records using accelerated denoising diffusion model. arXiv preprint arXiv:2302.04355, 2023.
Yuan H, Zhou S, Yu S. Ehrdiff: exploring realistic EHR synthesis with diffusion models. arXiv preprint, 2023.
Kuo NIH, Jorm L, Barbieri S. Synthetic health-related longitudinal data with mixed-type variables generated using diffusion models. arXiv preprint, 2023.
Ping C, et al. TDSTF: transformer-based diffusion probabilistic model for sparse time series forecasting. arXiv preprint, 2023.
Ceritli T, et al. Synthesizing mixed-type electronic health records using diffusion models. arXiv preprint, 2023.
Lee J, Park C. Restoration of time-series medical data with diffusion model. 2022 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia). 2022. p. 1–4.
Li H, et al. DeScoD-ECG: deep score-based diffusion model for ECG baseline wander and noise removal. IEEE J Biomed Health Inform. 2023.
Lopez Alcaraz JM, Strodthoff N. Diffusion-based time series imputation and forecasting with structured state space models. Trans Mach Learn Res. 2023;1–36.
Adib E, et al. Synthetic ECG signal generation using probabilistic diffusion models. IEEE Access. 2023;11:75818–28.
Neifar N, et al. DiffECG: a generalized probabilistic diffusion model for ECG signals synthesis. arXiv preprint, 2023.
Alcaraz JML, Strodthoff N. Diffusion-based conditional ECG generation with structured state space models. Comput Biol Med. 2023;163.
Shome D, Sarkar P, Etemad A. Region-disentangled diffusion model for high-fidelity PPG-to-ECG translation. arXiv preprint arXiv:2308.13568, 2023.
Li H, et al. Sleep stage classification with learning from evolving datasets. 2023.
Tosato G, Dalbagno CM, Fumagalli F. EEG synthetic data generation using probabilistic diffusion models. arXiv preprint, 2023.
Duan Y, et al. Domain-specific denoising diffusion probabilistic models for brain dynamics. arXiv preprint arXiv:2305.04200, 2023.
Naseer AA, et al. ScoEHR: generating synthetic electronic health records using continuous-time diffusion models. Machine Learning for Healthcare Conference. 2023. PMLR.
Bagherzadeh-Khiabani F, et al. A tutorial on variable selection for clinical prediction models: feature selection methods in data mining could improve the results. J Clin Epidemiol. 2016;71:76–85.
Tian M, et al. Reliable generation of EHR time series via diffusion models. arXiv preprint arXiv:2310.15290, 2023.
Zhong Y, et al. Synthesizing multimodal electronic health records via predictive diffusion models. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2024.
Kotelnikov A, et al. Tabddpm: modelling tabular data with diffusion models. International Conference on Machine Learning. 2023. PMLR.
Shi C, et al. Learning gradient fields for molecular conformation generation. International conference on machine learning. PMLR; 2021.
Luo S, et al. Predicting molecular conformation via dynamic graph score matching. Adv Neural Inf Process Syst. 2021;34:19784–95.
Xu M, et al. Geodiff: a geometric diffusion model for molecular conformation generation. arXiv preprint arXiv:2203.02923, 2022.
Hoogeboom E, et al. Equivariant diffusion for molecule generation in 3d. International conference on machine learning. 2022. PMLR.
Jing B, et al. Torsional diffusion for molecular conformer generation. Adv Neural Inf Process Syst. 2022;35:24240–53.
Bao F, et al. Equivariant energy-guided sde for inverse molecular design. arXiv preprint arXiv:2209.15408, 2022.
Igashov I, et al. Equivariant 3d-conditional diffusion models for molecular linker design. arXiv preprint arXiv:2210.05274, 2022.
Lee JS, Kim J, Kim PM. ProteinSGM: score-based generative modeling for de novo protein design. bioRxiv. 2022:2022.07.13.499967.
Ingraham J, et al. Illuminating protein space with a programmable generative model. bioRxiv. 2022:2022.12.01.518682.
Wu KE, et al. Protein structure generation via folding diffusion. 2022.
Trippe BL, et al. Diffusion probabilistic modeling of protein backbones in 3d for the motif-scaffolding problem. arXiv preprint arXiv:2206.04119, 2022.
Luo S, et al. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. Adv Neural Inf Process Syst. 2022;35:9754–67.
Corso G, et al. Diffdock: diffusion steps, twists, and turns for molecular docking. arXiv preprint arXiv:2210.01776, 2022.
Vignac C, et al. Midi: mixed graph and 3d denoising diffusion for molecule generation. arXiv preprint arXiv:2302.09048, 2023.
Huang L, et al. Mdm: molecular diffusion model for 3d molecule generation. Proceedings of the AAAI Conference on Artificial Intelligence. 2023.
Xu M, et al. Geometric latent diffusion models for 3d molecule generation. International Conference on Machine Learning. 2023. PMLR.
Watson JL, et al. De Novo design of protein structure and function with RFdiffusion. Nature. 2023;620(7976):1089–100.
Yim J, et al. SE(3) diffusion model with application to protein backbone generation. arXiv preprint arXiv:2302.02277, 2023.
Jing B, et al. EigenFold: generative protein structure prediction with diffusion models. arXiv preprint arXiv:2304.02198, 2023.
Ni B, Kaplan DL, Buehler MJ. Generative design of de novo proteins based on secondary-structure constraints using an attention-based diffusion model. Chem. 2023.
Fu C, et al. A latent diffusion model for protein structure generation. arXiv preprint arXiv:2305.04120, 2023.
Gao Z, Tan C, Li SZ. DiffSDS: a language diffusion model for protein backbone inpainting under geometric conditions and constraints. arXiv preprint arXiv:2301.09642, 2023.
Gruver N, et al. Protein design with guided discrete diffusion. arXiv preprint arXiv:2305.20009, 2023.
Ketata MA, et al. DiffDock-PP: rigid protein-protein docking with diffusion models. arXiv preprint arXiv:2304.03889, 2023.
Brocidiacono M, et al. PLANTAIN: diffusion-inspired pose score minimization for fast and accurate molecular docking. arXiv preprint, 2023.
Lin H, et al. Diffbp: generative diffusion of 3d molecules for target protein binding. arXiv preprint arXiv:2211.11214, 2022.
Nakata S, Mori Y, Tanaka S. End-to-end protein-ligand complex structure generation with diffusion-based generative models. BMC Bioinformatics. 2023;24(1):1–18.
Lu W, et al. DynamicBind: predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model. 2023.
Ding J, Regev A. Deep generative model embedding of single-cell RNA-Seq profiles on hyperspheres and hyperbolic spaces. Nat Commun. 2021;12(1):2554.
Mathieu E, et al. Continuous hierarchical representations with poincaré variational auto-encoders. Adv Neural Inf Process Syst. 2019;32.
Wei R, et al. On the inherent privacy properties of discrete denoising diffusion models. arXiv preprint arXiv:2310.15524, 2023.
Acknowledgements
Not applicable.
Funding
Luo et al. are partially funded by the Sichuan Science and Technology Program (2019YFS0147) and the 1·3·5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (ZYJC18010 and 2019HXFH022). Wang and Zhou are partially funded by the Carl V. Vartian Professorship. The funding bodies played no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.
Author information
Contributions
Jiawei Luo, Liren Yang, Yan Liu, Changbao Hu, Grant Wang, Yan Yang, Tie-Lin Yang and Xiaobo Zhou conceived the ideas and outline of this review, wrote the original manuscript, and proofread the whole article.
Ethics declarations
Ethics approval
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Luo, J., Yang, L., Liu, Y. et al. Review of diffusion models and its applications in biomedical informatics. BMC Med Inform Decis Mak 25, 390 (2025). https://doi.org/10.1186/s12911-025-03210-5