Assessing the causal effects of type 2 diabetes and obesity-related traits on COVID-19 severity
Human Genomics volume 19, Article number: 43 (2025)
Abstract
Background
Type 2 diabetes (T2D) and obesity-related traits are highly comorbid with coronavirus disease 2019 (COVID-19), but their causal relationships with disease severity remain unclear. While recent Mendelian randomization (MR) studies suggest a causal link between obesity-related traits and COVID-19 severity, findings regarding T2D are inconsistent, particularly when adjusting for body mass index (BMI). This study aims to clarify these relationships.
Methods
We applied various MR methods to assess the causal effects of BMI-adjusted T2D (T2DadjBMI) and obesity-related traits (BMI, waist circumference, and waist-hip ratio) on COVID-19 severity. Genetic instruments were obtained from large-scale genome-wide association studies (GWAS), including 898K participants for T2D and 2M for COVID-19 severity. To address potential bias from sample overlap, we conducted large-scale simulations comparing MR results from overlapping and independent samples.
Results
Our MR analysis identified a significant causal relationship between T2DadjBMI and increased COVID-19 severity (OR = 1.057, 95% CI = 1.012–1.105). Obesity-related traits were also causally associated with COVID-19 severity. Simulations confirmed that MR results remained robust to sample overlap, demonstrating consistency between overlapping and independent datasets.
Conclusions
These findings highlight the causal role of T2D and obesity-related traits in COVID-19 severity, emphasizing the need for targeted prevention and management strategies for high-risk populations. The robustness of our MR analysis, even in the presence of sample overlap, strengthens the reliability of these causal inferences.
Introduction
Recent studies have demonstrated that various clinical risk factors contribute to the increased severity of coronavirus disease 2019 (COVID-19), a highly transmissible infectious disease [1,2,3,4,5]. Global evidence indicates that severe COVID-19 is strongly associated with pre-existing comorbidities such as diabetes, obesity, respiratory diseases, hypertension, and cardiovascular disease. Specifically, epidemiological evidence suggests that type 2 diabetes (T2D) and obesity-related traits such as body mass index (BMI), waist circumference (WC), and waist-hip ratio (WHR) are significantly linked to adverse COVID-19 outcomes [6, 7]. Individuals with a higher polygenic risk score (PRS) for BMI have been found to be more susceptible to severe COVID-19 [8, 9]. Additionally, genetic predisposition for T2D is significantly associated with the severity and mortality of COVID-19, with T2D PRS demonstrating utility in estimating the risk of severe infection [10,11,12].
While associations between obesity-related traits, T2D, and COVID-19 severity have been established, it remains unclear whether these factors are causally related to disease severity. Mendelian randomization (MR) has been widely used to explore causal relationships between comorbidities and health outcomes using genetic variants [13]. Recent MR studies have identified a causal link between obesity-related traits and COVID-19 severity [14, 15]. However, findings regarding T2D have been inconsistent—some studies have demonstrated a causal relationship between T2D and COVID-19 severity when using BMI-unadjusted data [16, 17], whereas others have not observed a causal relationship when BMI-adjusted data is considered [18, 19]. This uncertainty necessitates further investigation into whether the observed effects are driven solely by T2D or influenced by BMI.
In this study, we evaluated the causal effects of T2D and obesity-related traits on two COVID-19 outcomes (SARS-CoV-2 infection and COVID-19 severity) using various MR methods. First, we performed genome-wide association study (GWAS) analyses on SARS-CoV-2 infection and COVID-19 severity using data from the UK Biobank (updated until December 2022) and compared the results with GWAS summary statistics from the Host Genetics Initiative (HGI, round 7, April 2022) [20, 21]. Second, we obtained GWAS summary statistics for T2D from the Diabetes Genetics Replication and Meta-analysis (DIAGRAM) consortium and for three obesity-related traits (BMI, WC, and WHR) from the Genetic Investigation of Anthropometric Traits (GIANT) consortium. We then applied multiple MR methods, including two-sample MR, two-stage least squares MR, and non-linear MR, to assess potential causal relationships. Lastly, we conducted large-scale simulations to investigate the impact of sample overlap on our MR analyses. Given that overlapping GWAS samples can introduce biases, we explored whether our findings were affected by overfitting. Recent research suggests that two-sample MR approaches remain robust even in cases of 100% sample overlap within large biobanks, except when employing MR-Egger regression [22]. To better interpret our findings, we performed simulations closely mimicking real data and assessed whether sample overlap could result in overestimation of causal effects.
Methods
Study design
We systematically investigated the causal relationships between T2D, obesity-related traits, and the risk of COVID-19 using a two-sample MR approach. MR studies must satisfy three fundamental assumptions: (1) the Single Nucleotide Polymorphisms (SNPs) must be directly associated with the exposure (T2D and obesity-related traits in this study); (2) the SNPs should be independent of the outcome (COVID-19 traits in this study) and of any known or unknown confounders; and (3) the effect of the instrumental variables (IVs) on the outcome should be mediated exclusively through the exposure of interest. To prevent sample overlap, genetic data for T2D, obesity-related traits, and COVID-19 were obtained from separate GWAS. The study methods adhered to the STROBE-MR guidelines [23]. The overall study design is shown in Fig. 1, and the baseline characteristics of the UK Biobank data are presented in Table 1.
Fig. 1 Overview of the study. We evaluated the causal effects of T2D and obesity-related traits on SARS-CoV-2 infection and COVID-19 severity using Mendelian randomization (MR) methods. GWAS data for COVID-19 outcomes were obtained from the UK Biobank and the HGI, while GWAS data for T2D, BMI, WC, and WHR were sourced from the DIAGRAM and GIANT consortia. We applied various MR methods, including two-sample MR, two-stage least squares MR, and non-linear MR. Additionally, we conducted large-scale simulations to evaluate the impact of sample overlap on MR analyses, demonstrating that overlapping samples do not result in the overestimation of causal effects.
GWAS for SARS-CoV-2 infection and COVID-19 severity
We utilized individual-level data from the UK Biobank and GWAS summary statistics for SARS-CoV-2 infection and COVID-19 severity from the HGI [20, 21]. Our analysis focused on 459,119 individuals of European ancestry from the UK Biobank [24]. We filtered SNPs based on minor allele frequency (MAF) > 0.01 and imputation quality > 0.8, resulting in a final set of 9,572,559 SNPs. The COVID-19 data in the UK Biobank included 101,271 cases and 357,848 controls for SARS-CoV-2 infection and 7,478 cases and 451,641 controls for COVID-19 severity (Table 2). For our analysis, we also used summary statistics for SARS-CoV-2 infection and hospitalized COVID-19 in individuals of European ancestry from the HGI [20, 21]. The SARS-CoV-2 infection data included 14,496,979 SNPs for 122,616 cases and 2,475,240 controls, and the hospitalized COVID-19 data included 12,469,432 SNPs for 24,274 cases and 2,061,529 controls (Table 2). Each contributing HGI study was filtered on MAF > 0.001 and imputation quality > 0.6.
GWAS for T2D and obesity-related traits
We utilized summary statistics for T2D and obesity-related traits from the DIAGRAM and GIANT consortia, respectively. We accessed two sets of summary statistics for T2D from the DIAGRAM consortium: T2D adjusted for BMI and unadjusted T2D [25]. The BMI-adjusted T2D data, restricted to participants of European ancestry, included 21,635,867 SNPs with a MAF > 0.0001 for 898,130 individuals (74,124 cases and 824,006 controls), nearly half of whom overlapped with the UK Biobank (Table 2). The unadjusted T2D data included 21,508,699 SNPs with a MAF > 0.0001 for 455,313 individuals (55,005 cases and 400,308 controls), with no overlap with the UK Biobank (Table 2). Meanwhile, summary statistics for obesity-related traits, such as BMI, WC and WHR, were obtained from the GIANT consortium [26, 27]. The GIANT summary statistics initially covered approximately 2.4M SNPs, which we imputed using Summary Statistics Imputation (SSIMP) [28, 29] with data from the 1000 Genomes Project as the reference panel. GWAS Z-statistics were imputed within 1 Mb-wide regions. After filtering with MAF > 0.01 and imputation quality > 0.8, we obtained 8,275,606 SNPs for BMI (N = 339,224), 8,326,651 SNPs for WC (N = 224,459), and 8,327,771 SNPs for WHR (N = 224,459), with no sample overlap with the UK Biobank.
Heritability and genetic correlation
We estimated the heritability (\(h^{2}\)) and the genetic correlation (\(r_{g}\)) between T2D, obesity-related traits, SARS-CoV-2 infection and severe COVID-19 using linkage disequilibrium score regression (LDSC) [30] with GWAS summary statistics. We utilized the 1000 Genomes Project data (phase 3 version 5) as a reference panel and restricted our analysis to individuals of European ancestry, focusing on common and well-imputed HapMap3 SNPs.
Selection of genetic instruments
Significant SNPs for T2D (p-value < \(1 \times 10^{-5}\)), obesity-related traits, SARS-CoV-2 infection, and severe COVID-19 were initially selected as the IVs for MR analyses. To refine the set of significant SNPs, we performed an LD clumping procedure using PLINK (--clump-r2 0.1, --clump-kb 500) with LD based on the 1000 Genomes Project reference panel of the European population. We computed \(R^{2}\) for each SNP and calculated the F statistic; an F statistic greater than 10 indicated a lower risk of weak instrument bias [31]. Because pleiotropic effects can introduce bias and inflate false positives in MR analysis, we removed them using two methods: the heterogeneity in dependent instruments (HEIDI) outlier method in generalized summary-data-based Mendelian randomization (GSMR) [32], and the MR Pleiotropy Residual Sum and Outlier (MR-PRESSO) test [33]. We excluded pleiotropic SNPs with a HEIDI p-value < 0.01 in GSMR. Cochran's Q statistic and the MR-PRESSO global test were used to identify heterogeneity between causal estimates, and the MR-PRESSO outlier test (p-value < 0.05) was employed to remove pleiotropic SNPs from the genetic instruments.
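As a rough illustration of the weak-instrument check described above, the sketch below computes a per-SNP \(R^{2}\) and F statistic from summary statistics. The formulas are widely used approximations and are an assumption here, since the text does not state which expressions were applied; the function names and the input values are hypothetical.

```python
def snp_r2(beta, maf):
    """Approximate variance explained by one SNP.

    Assumes beta is the per-allele effect on a standardized exposure and
    that genotypes follow Hardy-Weinberg proportions (both are assumptions).
    """
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def f_statistic(r2, n, k=1):
    """F statistic for k instruments explaining a fraction r2 of the
    exposure variance in a sample of size n."""
    return (r2 / k) * (n - k - 1) / (1.0 - r2)


# Hypothetical instrument: effect 0.02 SD per allele, MAF 0.3, N = 100,000
r2 = snp_r2(beta=0.02, maf=0.3)
f_stat = f_statistic(r2, n=100_000)
is_strong = f_stat > 10  # conventional rule-of-thumb threshold
print(f"R2 = {r2:.6f}, F = {f_stat:.2f}, passes F > 10: {is_strong}")
```

For a binary exposure such as T2D the effect sizes are on the log-odds scale, so this variance-explained approximation should be read as indicative only.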
Two-sample MR analysis
We performed a two-sample MR analysis to investigate a potential causal relationship between T2D, obesity-related traits and COVID-19 severity. Bi-directional MR analysis was conducted in GSMR, and p-values were adjusted using the false discovery rate (FDR) method. We also performed additional sensitivity MR analyses using Inverse Variance Weighted (IVW) [34, 35], MR-Egger [36], weighted median estimator (WME) [37] and weighted mode based estimator (WMBE) [38] methods in TwoSampleMR [39, 40]. MR results were considered significant if the FDR-adjusted p-value from GSMR was less than 0.05, and if IVW, MR-Egger, WME, and WMBE produced consistent results.
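To make the sensitivity analysis more concrete, the following minimal sketch shows a fixed-effect IVW estimate computed from per-SNP summary statistics, together with a Benjamini-Hochberg adjustment as one common FDR procedure. It is not the TwoSampleMR or GSMR implementation; the text does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption, and all input values are hypothetical.

```python
import math


def ivw_estimate(bx, by, se_y):
    """Fixed-effect IVW estimate: weighted mean of per-SNP Wald ratios
    by/bx with first-order weights (bx / se_y)^2."""
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical SNP-exposure effects, SNP-outcome effects and standard errors
bx = [0.10, 0.20, 0.15]
by = [0.010, 0.020, 0.015]
se_y = [0.005, 0.005, 0.005]

beta_ivw, se_ivw = ivw_estimate(bx, by, se_y)
odds_ratio = math.exp(beta_ivw)  # log-odds estimate expressed as an OR
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(beta_ivw, se_ivw, odds_ratio, adjusted)
```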
For linear MR analyses, we employed two-stage least-squares (TSLS) regression in a two-sample setting with individual-level data using the ivreg package in R [41]. TSLS regression first estimates the parameters of the first-stage regression in one dataset for the exposure (e.g., T2D or obesity-related traits), then uses these estimates and genotype data from a second dataset to construct a single PRS for the risk factor. In the second-stage regression, the outcome and PRS of the risk factor are no longer correlated due to the removal of confounding [42]. The PRS for each exposure was computed from the SNPs used in the GSMR analysis and served as the IV in the MR analysis. Each PRS was calculated as the sum of UK Biobank genotypes weighted by the corresponding SNP effects from the DIAGRAM and GIANT consortia. The estimate from the TSLS method is equivalent to the estimate from the IVW method in finite samples when the correlations between the IVs are exactly zero [43]. TSLS regression also produces three diagnostic tests: the weak instrument test, the Wu-Hausman test, and the Sargan test. The weak instrument test is conducted using an F-test on the first-stage regression. The Wu-Hausman test checks whether there is a significant difference between ordinary least squares (OLS) and TSLS results. The Sargan test checks the overidentifying restrictions, which is possible when there are more IVs than coefficients to estimate.
To investigate non-linear causal effects of T2D and obesity-related traits on SARS-CoV-2 infection and COVID-19 severity, we performed a non-linear MR analysis using non-linear MR [44], SUMnlmr [45] and PolyMR [46]. To assess the shape of the association between exposure and outcome using a single PRS, the non-linear MR stratified the population into strata based on the risk factor distribution and estimated a causal effect, referred to as the localized average causal effect (LACE), in each stratum. The fractional polynomial method performed meta-regression on these LACE estimates. SUMnlmr implements non-linear MR methods using stratified summarized data, while PolyMR requires individual-level genotype and phenotype data and conveniently fits multiple SNPs as IVs, using phenotype data adjusted for covariates in the MR analysis.
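The two-sample TSLS procedure described above can be sketched as follows. This is an illustrative NumPy version on simulated data, not the ivreg workflow used in the study: SNP weights are estimated in one simulated dataset, a PRS is built in a second, and the PRS then serves as the single instrument. The sample size, number of SNPs, allele frequency, and effect sizes are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, beta = 40_000, 15, 0.1  # assumed sample size, SNP count, causal effect


def draw(n_samples):
    """Simulate genotypes, a confounded exposure x and an outcome y."""
    G = rng.binomial(2, 0.3, size=(n_samples, p)).astype(float)
    u = rng.normal(size=n_samples)  # unobserved confounder
    x = 0.12 * G.sum(axis=1) + u + rng.normal(size=n_samples)
    y = beta * x + u + rng.normal(size=n_samples)
    return G, x, y


def ols(design, target):
    """Least-squares coefficients for target ~ design (intercept included)."""
    return np.linalg.lstsq(design, target, rcond=None)[0]


# Dataset A supplies the SNP weights for the exposure
G_a, x_a, _ = draw(n)
weights = ols(np.column_stack([np.ones(n), G_a]), x_a)[1:]

# Dataset B: build the PRS and use it as the instrument
G_b, x_b, y_b = draw(n)
prs = G_b @ weights
Z = np.column_stack([np.ones(n), prs])

x_hat = Z @ ols(Z, x_b)                                      # stage 1
beta_tsls = ols(np.column_stack([np.ones(n), x_hat]), y_b)[1]  # stage 2
beta_ols = ols(np.column_stack([np.ones(n), x_b]), y_b)[1]     # confounded

print(f"TSLS: {beta_tsls:.3f}  OLS: {beta_ols:.3f}  truth: {beta}")
```

In this setup the OLS slope absorbs the confounder while the TSLS slope stays near the simulated causal effect, which is the contrast the Wu-Hausman test is designed to detect. Standard errors from a naive second-stage regression are not valid and are omitted here.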
Large-scale simulation
We obtained 400K individuals with 100 independent SNPs (\(p = 100\)) with a MAF of 0.05 or higher from the UK Biobank. The model for generating exposure and outcome data is as follows: \(x_{i} = \sum_{j = 1}^{p} G_{ij} \gamma_{j} + u_{i} + \epsilon_{x_{i}}\), \(y_{i} = \sum_{j = 1}^{p} G_{ij} \alpha_{j} + \beta_{x} x_{i} + u_{i} + \epsilon_{y_{i}}\), with \(u_{i} \sim N(0, 1^{2})\), \(\epsilon_{x_{i}} \sim N(0, 1^{2})\), and \(\epsilon_{y_{i}} \sim N(0, 1^{2})\). Here, \(x_{i}\) and \(y_{i}\) represent the exposure and outcome values for the \(i\)th sample, \(G_{ij}\) is the genotypic value (0, 1, 2) for the \(j\)th SNP, \(\gamma_{j}\) is the IV strength, \(u_{i}\) is the confounding effect, \(\alpha_{j}\) is the pleiotropic effect, \(\beta_{x}\) is the causal effect, and \(\epsilon_{x_{i}}\), \(\epsilon_{y_{i}}\) are the error terms. We generated simulation data to mimic real datasets for T2D from the UK Biobank. The IV strength was set to 0.12 (\(\gamma_{j} = 0.12\)) and the confounding effect was set to 0.25 (\(u_{i} = 0.25\)). The causal effect was set to either 0 (\(\beta_{x} = 0\)), to verify type I error in the null causal scenario, or 0.1 (\(\beta_{x} = 0.1\)). Simulations were conducted for two setups: in setup 1, the sample size of the exposure data is equal to that of the outcome data, and in setup 2, the sample size of the exposure data is smaller than that of the outcome data. The MR methods used in the simulation study were TSLS, IVW, and GSMR. For setup 1, we set the sample sizes to 100K and 200K to examine the impact of varying sample sizes on MR analysis results. For setup 2, we fixed the sample size of the exposure data at 100K and varied the sample size of the outcome data, starting from 100K, then increasing to 200K and 300K. This allowed us to examine how differing sample sizes influence the results and robustness of MR analyses.
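A scaled-down sketch of this simulation design is given below. It departs from the study in several labeled ways: genotypes are drawn from a binomial distribution rather than taken from the UK Biobank, pleiotropic effects \(\alpha_{j}\) are set to zero, the confounder enters with unit variance as in the model equations, only IVW is evaluated, and the sample size, SNP count, and MAF are assumed values chosen for speed.

```python
import numpy as np


def simulate(n, p, gamma, beta, maf, rng):
    """Draw genotypes, exposure x and outcome y from the linear model
    (alpha_j = 0, i.e. no pleiotropy, is an assumption of this sketch)."""
    G = rng.binomial(2, maf, size=(n, p)).astype(float)
    u = rng.normal(size=n)  # shared confounder
    x = G @ np.full(p, gamma) + u + rng.normal(size=n)
    y = beta * x + u + rng.normal(size=n)
    return G, x, y


def snp_effects(G, t):
    """Per-SNP simple-regression slopes and standard errors for trait t."""
    Gc = G - G.mean(axis=0)
    tc = t - t.mean()
    ssg = (Gc ** 2).sum(axis=0)
    b = Gc.T @ tc / ssg
    resid_var = ((tc[:, None] - Gc * b) ** 2).sum(axis=0) / (len(t) - 2)
    return b, np.sqrt(resid_var / ssg)


def ivw(bx, by, se_y):
    """Fixed-effect IVW estimate from summary statistics."""
    w = bx ** 2 / se_y ** 2
    return np.sum(w * by / bx) / np.sum(w)


rng = np.random.default_rng(0)
n, p, gamma, beta, maf = 50_000, 20, 0.12, 0.1, 0.3  # assumed settings

G1, x1, y1 = simulate(n, p, gamma, beta, maf, rng)
G2, x2, y2 = simulate(n, p, gamma, beta, maf, rng)

bx1, _ = snp_effects(G1, x1)     # exposure GWAS in sample 1
by1, se1 = snp_effects(G1, y1)   # outcome GWAS in sample 1: 100% overlap
by2, se2 = snp_effects(G2, y2)   # outcome GWAS in sample 2: 0% overlap

est_overlap = ivw(bx1, by1, se1)
est_independent = ivw(bx1, by2, se2)
print(f"overlap: {est_overlap:.3f}  independent: {est_independent:.3f}")
```

Repeating the draw many times and recording the rejection rate at \(\beta_{x} = 0\) and the confidence-interval coverage at \(\beta_{x} = 0.1\) would give the type I error and coverage summaries reported for the full simulation.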
Results
GWAS analysis
We performed GWAS for SARS-CoV-2 infection and COVID-19 severity, as well as T2D, BMI, WC, and WHR using the UK Biobank with BOLT-LMM v2.4.1 [47] and SAIGE v1.1.9 software [48]. These statistics were adjusted for potential confounders such as age, age squared, gender, genotype principal components (PCs), assessment array, and genotyping array. For our analysis, we utilized summary statistics for SARS-CoV-2 infection and hospitalized COVID-19 from the HGI, and summary statistics for T2D, BMI, WC and WHR from the DIAGRAM and GIANT consortia. Details of the data sources are summarized in Table 2. The Manhattan plots for each dataset are shown in Additional file 1: Figs. S1 and S2.
We accessed two sets of summary statistics for T2D from the DIAGRAM consortium: unadjusted T2D and T2D adjusted for BMI (T2DadjBMI) (Fig. 2A). Through the clumping procedure, we obtained 462 SNPs from the unadjusted T2D data and 654 SNPs from the T2DadjBMI data, with 274 overlapping SNPs (Fig. 2B). The remaining 188 SNPs were significant in the unadjusted T2D data but not in the T2DadjBMI data. Therefore, it was necessary to determine whether these 188 SNPs were associated with T2D or BMI to exclude those associated with BMI only. We searched the 188 SNPs for those not previously reported as associated with T2D in the GWAS Catalog (as of August 14, 2024), which identified 106 SNPs. We removed these 106 SNPs from the 188 and then used the remaining 356 SNPs (188 - 106 = 82 SNPs, plus the 274 common SNPs) for our MR analysis. These SNPs were regarded as T2D-associated SNPs with BMI effects removed (T2DrmBMI). Meanwhile, summary statistics for obesity-related traits such as BMI, WC, and WHR were obtained from the GIANT consortium, and a clumping procedure was performed. This procedure yielded 317 independent SNPs for BMI, 197 SNPs for WC, and 142 SNPs for WHR. The prediction \(R^{2}\) and AUCs of PRS for T2D, BMI, WC, and WHR, based on SNPs significantly associated with exposures but not with outcomes, are presented in Additional file 1: Table S1.
Fig. 2 GWAS summary statistics for the unadjusted T2D data and the T2D adjusted for BMI (T2DadjBMI) data, and GSMR plots for the causal effects of T2DadjBMI and T2D with BMI effects removed (T2DrmBMI) on COVID-19 severity. a Scatter plot of the -log10 p-values from the unadjusted T2D and T2DadjBMI summary data. b Venn diagram of SNPs shared between the unadjusted T2D and T2DadjBMI data. The top plot represents the number of SNPs before the clumping procedure and the bottom plot represents the number of SNPs after the clumping procedure. c GSMR plot for the effect of T2DadjBMI (DIAGRAM) on COVID-19 severity (HGI). d GSMR plot for the effect of T2DrmBMI (DIAGRAM) on COVID-19 severity (HGI).
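For reference, the T2DrmBMI instrument count follows directly from the SNP numbers reported in this paragraph:

\[
462 - 274 = 188, \qquad 188 - 106 = 82, \qquad 82 + 274 = 356 .
\]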
Heritability and genetic correlation
We estimated heritability and genetic correlations between various phenotypes based on GWAS summary statistics. Additional file 1: Table S2 presents the estimated heritability in the diagonal elements and the estimated genetic correlations between pairs of traits in the off-diagonal elements. We observed that severe COVID-19 in the HGI dataset showed significant genetic correlations with T2DadjBMI (\(r_{g}\) = 0.16, p-value = \(7.6 \times 10^{-5}\)), BMI (\(r_{g}\) = 0.31, p-value = \(6.3 \times 10^{-11}\)), WC (\(r_{g}\) = 0.29, p-value = \(2.1 \times 10^{-8}\)), and WHR (\(r_{g}\) = 0.28, p-value = \(1.8 \times 10^{-7}\)). Additionally, the genetic correlations between SARS-CoV-2 infection in the HGI dataset and T2DadjBMI (\(r_{g}\) = 0.10, p-value = 0.01), BMI (\(r_{g}\) = 0.24, p-value = \(2.6 \times 10^{-8}\)), WC (\(r_{g}\) = 0.25, p-value = \(2.6 \times 10^{-8}\)), and WHR (\(r_{g}\) = 0.24, p-value = \(4.5 \times 10^{-6}\)) were also significant. The genetic correlations for SARS-CoV-2 infection and COVID-19 severity in the UK Biobank were similar to those obtained from the HGI dataset.
Causal relationship between T2D and COVID-19 severity
We performed two-sample MR analyses to estimate the causal effects of T2DadjBMI and T2DrmBMI in DIAGRAM on SARS-CoV-2 infection and COVID-19 severity in HGI. Due to the considerable sample overlap with T2DadjBMI, we conducted additional analyses with T2DrmBMI, which has no sample overlap, to provide further clarity. We employed GSMR with GWAS summary data and used IVW, MR-Egger, WME, WMBE, and TSLS to verify the GSMR results. The MR results from GSMR analyses are presented in Table 3, Fig. 2, and Additional file 1: Figs. S3 and S4. The MR results from IVW, MR-Egger, WME, WMBE, and TSLS are provided in Additional file 1: Tables S3 and S4, and Figs. S5 and S6. The GSMR analysis investigating the causal effects of T2DadjBMI and T2DrmBMI on SARS-CoV-2 infection in both the UK Biobank and HGI datasets did not show a significant association overall. However, the GSMR analysis investigating the relationship between T2DadjBMI and COVID-19 severity showed a significant causal association in the HGI dataset (OR = 1.057, 95% CI = 1.012–1.105), as shown in Table 3. Additionally, the GSMR analysis investigating the relationship between T2DrmBMI and COVID-19 severity also showed a significant causal association (OR = 1.177, 95% CI = 1.063–1.303) in Table 3. The sensitivity analysis results, including IVW, MR-Egger, WME, WMBE, and TSLS, showed consistent effect directions with the GSMR results, further supporting our findings. Notably, the IVW method showed significant causal associations of COVID-19 severity with T2DadjBMI (OR = 1.064, 95% CI = 1.013–1.117) and T2DrmBMI (OR = 1.183, 95% CI = 1.061–1.319), as shown in Additional file 1: Table S3. The results of the GSMR bi-directional analysis are provided in Additional file 1: Table S5. The analysis indicated that T2DrmBMI and COVID-19 severity have a causal relationship in both directions.
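Readers who want to compare the reported odds ratios on the log-odds scale can back-calculate an approximate effect size and standard error from an OR and its 95% CI, as sketched below. This assumes a normal approximation and a CI that is symmetric on the log scale; the helper names are hypothetical, and the derived standard error and p-value are approximations for illustration, not values reported by the study.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def log_scale_from_or(odds_ratio, ci_low, ci_high):
    """Approximate log-odds effect and standard error from an OR and 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    return beta, se


def two_sided_p(beta, se):
    """Two-sided p-value under a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


# Reported GSMR estimate for T2DadjBMI on COVID-19 severity (HGI)
beta_adj, se_adj = log_scale_from_or(1.057, 1.012, 1.105)
p_adj = two_sided_p(beta_adj, se_adj)

# Round trip: rebuild the lower CI bound from the derived quantities
ci_low_rebuilt = math.exp(beta_adj - Z_95 * se_adj)
print(beta_adj, se_adj, p_adj, ci_low_rebuilt)
```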
Causal relationship between obesity-related traits and COVID-19 severity
We also conducted MR analyses to investigate the causal effects of obesity-related traits (i.e., BMI, WC, and WHR) on SARS-CoV-2 infection and COVID-19 severity. The GSMR results are presented in Table 3, Fig. 3 and Additional file 1: Figs. S7–S9. Sensitivity analysis results from IVW, MR-Egger, WME, WMBE, and TSLS are provided in Additional file 1: Tables S3 and S4, as well as in Additional file 1: Figs. S10–S13. The MR analysis of obesity-related traits and their causal relationship with SARS-CoV-2 infection revealed differing results between the UK Biobank and HGI datasets. In the UK Biobank dataset, no significant causal relationship was found between these traits and SARS-CoV-2 infection. However, in the HGI dataset, a significant causal relationship was observed, as shown in Table 3 (BMI: OR = 1.155, 95% CI = 1.108–1.205; WC: OR = 1.144, 95% CI = 1.094–1.196; WHR: OR = 1.083, 95% CI = 1.03–1.139). This discrepancy may be attributed to the difference in SARS-CoV-2 infection prevalence between the two datasets: approximately 0.22 in the UK Biobank and 0.047 in the HGI. The substantial difference in prevalence likely influenced the MR results for causal association. In contrast, when examining the relationship between obesity-related traits and COVID-19 severity, both the UK Biobank and HGI datasets demonstrated a significant causal relationship. The GSMR method showed significant causal associations of COVID-19 severity with BMI (OR = 1.486, 95% CI = 1.358–1.625), WC (OR = 1.532, 95% CI = 1.391–1.686) and WHR (OR = 1.194, 95% CI = 1.071–1.331) in the HGI dataset, as presented in Table 3. These findings suggest that obesity-related traits have a consistent and significant impact on COVID-19 severity. Additionally, sensitivity analyses supported the GSMR results, showing consistent effect directions across different MR methods, which further reinforces the reliability of our findings regarding the relationship between obesity-related traits and COVID-19 severity.
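The prevalence figures quoted above can be reproduced from the case and control counts given in the Methods. The short check below uses only those reported counts; the function and variable names are illustrative.

```python
def prevalence(cases, controls):
    """Sample prevalence: cases as a fraction of all participants."""
    return cases / (cases + controls)


# SARS-CoV-2 infection
ukb_infection = prevalence(101_271, 357_848)
hgi_infection = prevalence(122_616, 2_475_240)

# COVID-19 severity (UK Biobank) and hospitalized COVID-19 (HGI)
ukb_severity = prevalence(7_478, 451_641)
hgi_severity = prevalence(24_274, 2_061_529)

print(f"infection: UKB {ukb_infection:.3f}, HGI {hgi_infection:.3f}")
print(f"severity:  UKB {ukb_severity:.4f}, HGI {hgi_severity:.4f}")
```

The infection prevalence differs between the two datasets by more than a factor of four, whereas the severity prevalences are of similar magnitude.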
The results of the GSMR bidirectional analysis, found in Additional file 1: Table S3, confirmed that both BMI and WC have a bidirectional causal relationship with COVID-19 severity in the HGI dataset.
Non-linear MR analysis was conducted to further investigate the causal effects of obesity-related traits on SARS-CoV-2 infection and COVID-19 severity. Significant non-linear causal associations between obesity-related traits and COVID-19 severity were observed using the non-linear MR and SUMnlmr methods, as presented in Additional file 1: Tables S6 and S7, and Figs. S14–S17. Although these significant non-linear causal associations were detected, we also note that the association between obesity-related traits and COVID-19 severity appeared largely linear, suggesting that a linear causal association is more likely. The results of the non-linear causal associations of obesity-related traits on SARS-CoV-2 infection and COVID-19 severity using the PolyMR method are shown in Additional file 1: Table S8 and Figs. S18 and S19.
Evaluating effects of overlapping sample in MR analyses
To evaluate the effect of sample overlap between HGI, DIAGRAM, and UK Biobank, we performed large-scale simulations using data that mimic real datasets in UK Biobank. We estimated the causal effect, pleiotropic effect, and confounding effect of T2D on COVID-19 severity based on the UK Biobank as follows. For our simulation, we considered 485 independent SNPs from our MR analysis to assess the causal effects of T2D on COVID-19 severity, with age, sex, BMI, and the top 4 PCs included as confounders. We computed the T2D PRS using those 485 SNPs. Given that the heritability of T2D and COVID-19 are 20% and 2.69%, respectively, we estimated the IV strength as 0.12 and the proportion of variance in COVID-19 severity explained by T2D PRS as 2.72%, and confounders as 5.59%. Subsequently, we estimated the causal effect as 0.1 and confounding effect as 0.25. These estimates were then used in the large-scale simulation.
For setup 1, the simulation results showed similar trends across the MR methods when comparing equal sample sizes of 100K and 200K for exposure and outcome data. Figure 4A shows the estimates when the causal effect is 0, while Fig. 4B shows the estimates when the causal effect is 0.1. Additional file 1: Table S9, Figs. S20A and S20B present the simulation results for cases where the sample size for exposure and outcome data is the same. In both scenarios, where the causal effect is 0 and 0.1, we observed that with a sample size of 100K, the estimates slightly increased when the samples overlapped. However, with a sample size of 200K, the estimates remained almost the same regardless of sample overlap. With a sample size of 100K, the Type 1 error slightly decreased when there was 100% sample overlap, but with a sample size of 200K, no noticeable change was observed due to sample overlap. This phenomenon, also observed in small-scale simulations, likely occurred because estimates tend to be smaller than the true value when there is no sample overlap and a confounding effect is present. When examining coverage in the presence of a causal effect, we found that coverage was higher with a sample size of 200K compared to 100K, indicating that larger sample sizes provide more accurate estimates of the true causal effect.
Fig. 4 Results of the large-scale simulation evaluating the impact of sample overlap between HGI, DIAGRAM, and the UK Biobank on causal effect estimates. a Large-scale simulation results with equal sample sizes for exposure and outcome data and no causal effect (\(\beta_{x}\) = 0). b Large-scale simulation results with equal sample sizes for exposure and outcome data and a causal effect of 0.1 (\(\beta_{x}\) = 0.1). c Large-scale simulation results with differing sample sizes for exposure and outcome data and no causal effect (\(\beta_{x}\) = 0). d Large-scale simulation results with differing sample sizes for exposure and outcome data and a causal effect of 0.1 (\(\beta_{x}\) = 0.1).
For setup 2, we investigated how the results of MR analysis change based on the size of the overlapping samples when the sample sizes of the exposure and outcome differ. Figure 4C shows the estimates when the causal effect is 0, while Fig. 4D shows the estimates when the causal effect is 0.1. Additional file 1: Table S10 and Additional file 1: Figs. S20C and S20D present the simulation results for cases where the sample sizes for exposure and outcome data differ. We observed that the estimates became more similar to the true values for both scenarios, where the causal effect is 0 and 0.1, as the sample size for outcome data increased. Additionally, when the sample size was large, the estimates remained almost the same regardless of sample overlap. Regarding Type 1 error and coverage, we found that with larger sample sizes, the results were similar irrespective of the degree of overlap. This indicates that larger sample sizes provide more robust estimates, minimizing the impact of sample overlap on MR analysis results.
The results of the large-scale simulations for setups 1 and 2 demonstrate that when the sample size is sufficiently large in biobank-scale data, the performance of different methods is nearly identical. Furthermore, in scenarios with considerable sample sizes, the outcomes are nearly identical regardless of the presence or absence of sample overlap. This indicates that with a substantial sample size, the influence of sample overlap on MR analysis outcomes is minimal, and the methodologies perform consistently well regardless of sample overlap.
Sensitivity analyses
To evaluate the robustness of our findings to the choice of p-value threshold in LD clumping, we performed a sensitivity analysis using a genome-wide significance threshold of p-value < \(5 \times 10^{-8}\) instead of the primary threshold of p-value < \(1 \times 10^{-5}\), applying GSMR. As expected, the number of instruments decreased, and some associations lost nominal significance due to reduced power. However, the overall pattern of results, including effect directions and magnitudes, remained consistent across all exposures: T2DadjBMI, T2DrmBMI, BMI, WC, and WHR. These findings suggest that our main conclusions are not sensitive to the choice of instrument selection threshold. Our decision to use a more lenient threshold in the primary analysis was based on prior methodological studies [49,50,51], which argue that including more variants, even those below genome-wide significance, can enhance MR performance, particularly in two-sample MR settings where any weak instrument bias tends to attenuate estimates toward the null.
Discussion
In this study, we found evidence of causal effects of BMI-adjusted T2D and obesity-related traits (i.e., BMI, WC, WHR) on COVID-19 severity through real data analysis and examined how bias changes with sample overlap across various MR methods using large-scale simulations. Using MR analysis, we estimated the causal effects of T2D adjusted for BMI (T2DadjBMI), T2D with BMI effects removed (T2DrmBMI), BMI, WC, and WHR on SARS-CoV-2 infection and COVID-19 severity. For T2DadjBMI and T2DrmBMI, no significant causal relationship with SARS-CoV-2 infection was found, but significant causal relationships with COVID-19 severity were detected in the HGI dataset. For obesity-related traits such as BMI, WC, and WHR, significant causal relationships with both SARS-CoV-2 infection and COVID-19 severity were observed in the HGI dataset. Our large-scale simulations indicated that two-sample MR methods yielded similar results with 100% sample overlap compared to 0% sample overlap, given a sufficiently large sample size in biobank-scale GWAS. This suggests that these methods are robust to sample overlap, providing consistent estimates of causal effects in large-scale analyses. The findings from both simulation and real data analysis suggest that large sample sizes mitigate bias introduced by sample overlap, ensuring robustness and reliability. The consistency of results across multiple MR methods further strengthens the validity of our findings.
The results of our MR analysis, which demonstrate significant causal relationships between T2D and COVID-19 severity, as well as between obesity-related traits and COVID-19 severity, have important implications for the management of infectious diseases. The direct impact of T2D and obesity on COVID-19 severity suggests that individuals with these conditions may be more vulnerable to severe viral infections due to a combination of physiological and pathological mechanisms. These include impaired immune function, chronic low-grade inflammation, dysregulated glucose and lipid metabolism, and increased platelet aggregation, all of which can worsen infectious disease outcomes. For example, hyperglycemia in T2D can impair neutrophil and macrophage activity, while insulin resistance and adiposity promote a pro-inflammatory cytokine environment, disrupting both innate and adaptive immune responses. In addition, heightened platelet activity and endothelial dysfunction, common in T2D and obesity, may contribute to the hypercoagulable state observed in severe COVID-19 cases. These mechanistic insights highlight the need to identify high-risk populations and implement targeted management and prevention strategies, such as early treatment, vaccination prioritization, and additional protective measures for patients with T2D and obesity. From a research standpoint, our findings underscore the importance of further studies aimed at elucidating the specific inflammatory mediators, immune pathways, and metabolic alterations that link chronic metabolic conditions with infectious disease severity. Genetic studies such as MR can continue to play a crucial role in clarifying causal relationships between metabolic and infectious diseases, ultimately contributing to the development of more effective prevention and treatment strategies.
In addition to T2D and obesity-related traits, atherogenic dyslipidemia, defined by elevated triglyceride levels and reduced HDL cholesterol, has also emerged as a key metabolic factor associated with increased COVID-19 severity in hospitalized patients, as reported by Bellia et al. [52]. Although this trait was not included in the present analysis, we acknowledge its potential role as both an independent risk factor and a confounder in the relationship between insulin resistance-related traits and COVID-19 outcomes. To maintain a clear analytical scope in this initial study, we focused on T2D and adiposity measures. However, future research will expand on these findings by applying MR to investigate the causal relationship between atherogenic dyslipidemia and COVID-19 severity, and to examine its potential role as a confounder in the causal pathway from T2D to COVID-19 outcomes. These additional analyses will help to further elucidate the complex interactions between lipid metabolism, insulin resistance, and disease severity in the context of SARS-CoV-2 infection.
This study has several limitations. First, we used HGI data collected before April 2022 and UK Biobank data collected until December 2022. However, the number of infections increased due to the Omicron variant after this collection period, so future research using more recent COVID-19 data is necessary. Second, when analyzing real data, we used LD clumping to select SNPs that met the criteria of \(r^{2} < 0.1\) and p-value \(< 1 \times 10^{-5}\). Further investigation is needed to determine if these criteria are optimal. Third, through large-scale simulations, we found that as the sample size increases, overfitting does not occur, and bias due to sample overlap decreases. However, we did not provide a theoretical basis for this finding, indicating the need for future research. Lastly, while we performed the simulations using independent SNPs, SNPs in real data are often in LD; therefore, additional simulations using SNPs in LD are required.
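The selection rule in the second limitation can be sketched as a greedy clumping pass. This is a simplified illustration under stated assumptions, not the procedure the authors ran: it takes a precomputed pairwise \(r^{2}\) matrix, ignores the physical distance window that clumping tools normally apply, and uses the hypothetical function name `ld_clump`. Only the two thresholds come from the text.

```python
import numpy as np


def ld_clump(pvals, r2, p_threshold=1e-5, r2_threshold=0.1):
    """Greedy LD clumping on a precomputed r^2 matrix (no distance window).

    SNPs are visited from the smallest p-value upward. A SNP is kept only if
    its p-value is below p_threshold and its r^2 with every SNP already kept
    is below r2_threshold. Returns kept SNP indices in order of selection.
    """
    pvals = np.asarray(pvals, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    order = np.argsort(pvals, kind="stable")
    candidates = [int(i) for i in order if pvals[i] < p_threshold]
    kept = []
    for i in candidates:
        if all(r2[i, j] < r2_threshold for j in kept):
            kept.append(i)
    return kept


# Toy example: SNP 0 and SNP 1 are in strong LD, SNP 3 is not significant.
example_pvals = [1e-8, 1e-6, 1e-9, 1e-3]
example_r2 = [
    [1.00, 0.80, 0.00, 0.00],
    [0.80, 1.00, 0.05, 0.00],
    [0.00, 0.05, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]
print(ld_clump(example_pvals, example_r2))  # prints [2, 0]
```

In the toy example, SNP 1 is dropped because of its LD with the more significant SNP 0, and SNP 3 is dropped by the p-value threshold. Tightening or loosening either threshold changes the number of retained instruments, which is the trade-off the limitation refers to.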
In conclusion, these robust and consistent results support a causal relationship between BMI-adjusted T2D and the severity of COVID-19. Our simulations showed that the impact of sample overlap decreases as sample size increases, leaving minimal bias, which supports the interpretation of our MR analysis results. This highlights the importance of managing T2D to mitigate the severity of COVID-19 outcomes. Additionally, BMI, WC, and WHR are also significantly related to COVID-19 severity, emphasizing the need to address obesity-related traits in managing infectious disease risk and outcomes.
Data availability
No datasets were generated or analysed during the current study.
References
Popkin BM, et al. Individuals with obesity and COVID-19: a global perspective on the epidemiology and biological relationships. Obes Rev. 2020;21: e13128.
Barron E, et al. Associations of type 1 and type 2 diabetes with COVID-19-related mortality in England: a whole-population study. Lancet Diabetes Endocrinol. 2020;8:813–22.
Aveyard P, et al. Association between pre-existing respiratory disease and its treatment, and severe COVID-19: a population cohort study. Lancet Respir Med. 2021;9:909–23.
Pranata R, Lim MA, Huang I, Raharjo SB, Lukito AA. Hypertension is associated with increased mortality and severity of disease in COVID-19 pneumonia: a systematic review, meta-analysis and meta-regression. J Renin Angiotensin Aldosterone Syst. 2020;21.
Aggarwal G, et al. Association of cardiovascular disease with coronavirus disease 2019 (COVID-19) severity: a meta-analysis. Curr Probl Cardiol. 2020;45: 100617.
Holman N, et al. Risk factors for COVID-19-related mortality in people with type 1 and type 2 diabetes in England: a population-based cohort study. Lancet Diabetes Endocrinol. 2020;8:823–33.
Cai Q, et al. Obesity and COVID-19 severity in a designated hospital in Shenzhen, China. Diabetes Care. 2020;43:1392–8.
Aung N, Khanji MY, Munroe PB, Petersen SE. Causal inference for genetic obesity, cardiometabolic profile and COVID-19 susceptibility: a Mendelian randomization study. Front Genet. 2020;11: 586308.
Zhu Z, et al. Association of obesity and its genetic predisposition with the risk of severe COVID-19: Analysis of population-based cohort data. Metabolism. 2020;112: 154345.
Lee A, et al. Type 2 diabetes and its genetic susceptibility are associated with increased severity and mortality of COVID-19 in UK Biobank. Communications Biology. 2024;7:122.
Chung W. Statistical models and computational tools for predicting complex traits and diseases. Genomics & Informatics. 2021;19: e36.
Chung W, et al. Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes. Nat Commun. 2019;10:569.
Sanderson E, et al. Mendelian randomization. Nature Reviews Methods Primers. 2022;2:6.
Yoshiji S, et al. Causal associations between body fat accumulation and COVID-19 severity: A Mendelian randomization study. Front Endocrinol. 2022;13: 899625.
Freuer D, Linseisen J, Meisinger C. Impact of body composition on COVID-19 susceptibility and severity: A two-sample multivariable Mendelian randomization study. Metabolism. 2021;118: 154732.
Ni J, Qiu L-J, Yin K-J, Chen G-M, Pan H-F. Shared genetic architecture between type 2 diabetes and COVID-19 severity. J Endocrinol Invest. 2023;46:501–7.
Gao M, et al. Associations between body composition, fat distribution and metabolic consequences of excess adiposity with severe COVID-19 outcomes: observational study and Mendelian randomisation analysis. Int J Obes. 2022;46:943–50.
Cao H, Baranova A, Wei X, Wang C, Zhang F. Bidirectional causal associations between type 2 diabetes and COVID-19. J Med Virol. 2023;95: e28100.
Huang C, et al. Human serum metabolites as potential mediators from type 2 diabetes and obesity to COVID-19 severity and susceptibility: evidence from mendelian randomization study. Metabolites. 2022;12:598.
COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur J Hum Genet. 2020;28:715–8.
COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature. 2021;600:472–7.
Minelli C, et al. The use of two-sample methods for Mendelian randomization analyses on single large datasets. Int J Epidemiol. 2021;50:1651–9.
Skrivankova VW, et al. Strengthening the reporting of observational studies in epidemiology using Mendelian randomization: the STROBE-MR statement. JAMA. 2021;326:1614–21.
Allen NE, Sudlow C, Peakman T, Collins R, UK Biobank. UK Biobank data: come and get it. Sci Transl Med. 2014;6:224ed4.
Mahajan A, et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet. 2018;50:1505–13.
Locke AE, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206.
Shungin D, et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature. 2015;518:187–96.
Rüeger S, McDaid A, Kutalik Z. Evaluation and application of summary statistic imputation to discover new height-associated loci. PLoS Genet. 2018;14: e1007371.
Rüeger S, McDaid A, Kutalik Z. Improved imputation of summary statistics for realistic settings. bioRxiv. 2017:203927.
Bulik-Sullivan BK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47:291–5.
Palmer TM, et al. Using multiple genetic variants as instrumental variables for modifiable risk factors. Stat Methods Med Res. 2012;21:223–42.
Zhu Z, et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat Commun. 2018;9:1–12.
Verbanck M, Chen C-Y, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50:693–8.
Johnson T, Uk S. Efficient calculation for multi-SNP genetic risk scores. In: American Society of Human Genetics Annual Meeting. Vol. 10. San Francisco; 2012.
Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37:658–65.
Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44:512–25.
Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40:304–14.
Hartwig FP, Davey Smith G, Bowden J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int J Epidemiol. 2017;46:1985–98.
Hemani G, et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife. 2018;7: e34408.
Hemani G, Tilling K, Davey Smith G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 2017;13: e1007081.
Sanderson E, Windmeijer F. A weak instrument F-test in linear IV models with multiple endogenous variables. Journal of econometrics. 2016;190:212–21.
Burgess S, Davies NM, Thompson SG. Bias due to participant overlap in two-sample Mendelian randomization. Genet Epidemiol. 2016;40:597–608.
Burgess S, Dudbridge F, Thompson SG. Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Stat Med. 2016;35:1880–906.
Staley JR, Burgess S. Semiparametric methods for estimation of a nonlinear exposure-outcome relationship using instrumental variables with application to Mendelian randomization. Genet Epidemiol. 2017;41:341–52.
Mason AM, Burgess S. Software Application Profile: SUMnlmr, an R package that facilitates flexible and reproducible non-linear Mendelian randomization analyses. Oxford University Press; 2022.
Sulc J, Sjaarda J, Kutalik Z. Polynomial Mendelian randomization reveals non-linear causal effects for obesity-related traits. Human Genetics and Genomics Advances. 2022;3.
Loh P-R, et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet. 2015;47:284.
Zhou W, et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018;50:1335–41.
Schmidt AF, et al. Genetic drug target validation using Mendelian randomisation. Nat Commun. 2020;11:3255.
Zhang R, Niu P-P, Li S, Li Y-S. Mendelian randomization analysis reveals causal effects of migraine and its subtypes on early-onset ischemic stroke risk. Sci Rep. 2024;14:31505.
Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013;9: e1003348.
Bellia A, et al. Atherogenic dyslipidemia on admission is associated with poorer outcome in people with and without diabetes hospitalized for COVID-19. Diabetes Care. 2021;44:2149–57.
Acknowledgements
We appreciate the individuals who participated in the study. This research has been conducted using the UK Biobank Resource (application numbers 45052, 58105, 77890). This research was supported by the Bio and Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (2021M3E5E3081425). This research was also supported by the NRF grant funded by the Korean government (2020R1C1C1A01012657) and Basic Science Research Program through NRF funded by the Ministry of Education (RS-2021-NR060140).
Author information
Contributions
W.C. and T.P. conceived and designed the experiments. J.S., G.K., S.P., A.L. and W.C. performed the experiments and analyzed the data. They were also responsible for generating the necessary data, materials, and analysis tools. J.S., G.K. and W.C. primarily wrote the manuscript. J.S., G.K., L.L., T.P. and W.C. reviewed and revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Seo, J., Kim, G., Park, S. et al. Assessing the causal effects of type 2 diabetes and obesity-related traits on COVID-19 severity. Hum Genomics 19, 43 (2025). https://doi.org/10.1186/s40246-025-00747-4
DOI: https://doi.org/10.1186/s40246-025-00747-4