
SMetABF: A rapid algorithm for Bayesian GWAS meta-analysis with a large number of studies included

Abstract

Bayesian methods are widely used in GWAS meta-analysis, but their considerable demands on computing time and memory pose great challenges for large-scale meta-analyses. In this research, we propose an algorithm named SMetABF to rapidly obtain the optimal asymptotic Bayes factor (ABF) in GWAS meta-analysis, in which shotgun stochastic search (SSS) is introduced to improve the Bayesian GWAS meta-analysis framework MetABF. Simulation studies confirm that SMetABF performs well in both speed and accuracy compared with exhaustive methods and MCMC. SMetABF is applied to real GWAS datasets to find several essential loci related to Parkinson’s disease (PD), and the results support an underlying relationship between PD and other autoimmune disorders. Developed as an R package and a web tool, SMetABF can serve as a useful tool to integrate different studies and identify more variants associated with complex traits.

Author summary

MetABF is a Bayesian GWAS meta-analysis framework, but its efficiency is restricted by the number of studies included. In this article, we propose SMetABF, which introduces SSS, an extension of traditional MCMC, to speed up the MetABF algorithm. We develop an R package and a web tool based on R Shiny to make SMetABF practical for biomedical research. By comparison with the exhaustive approach and MCMC, we validate the effectiveness of SSS in terms of speed and accuracy through simulations. We apply SMetABF to identify several important variants associated with Parkinson’s disease and other autoimmune diseases, and explore the relationship between them. We hope this method can benefit future GWAS meta-analyses, help to identify more risk variants associated with complex traits, and improve the prediction of diseases.

1 Introduction

Genome-wide association studies (GWAS), a powerful tool for finding associations between genetic variations and phenotypes, have received growing attention in statistical genetics and epidemiology [1]. Numerous variants, typically common single nucleotide polymorphisms (SNPs), have been identified as linked with complex traits. However, since a single variant’s genetic effect on a polygenic trait is relatively small, large sample sizes are often required to increase statistical power [2]. Besides, owing to population stratification and other unobserved confounders, the estimated effect sizes in different studies diverge or even contradict one another [3]. Therefore, it has become an increasingly important challenge to make full use of summary statistics from a wide range of studies and to obtain pooled statistics through meta-analysis [4], especially as data security and privacy requirements make individual-level data increasingly difficult to obtain [5, 6].

Either the fixed-effect model (FEM) [7] or the random-effect model (REM) [8] is conventionally used to derive a pooled effect size, depending on the assumption about heterogeneity [9]. However, the p-value depends on the sample size and the minor allele frequency (MAF) of the variant, so it is improper to use a single significance threshold [10]. Besides, the relationships between the true effect sizes in different studies are hard to account for in either FEM or REM [11]. In contrast, they are easy to incorporate into the model as a prior in the Bayesian framework. Bayesian methods are also popular among researchers because they are more intuitively interpretable [12]. Recently, a promising Bayesian method named MetABF was proposed [13]. Using GWAS summary statistics, it can conveniently estimate the pooled associations between multiple traits and genetic variants under different association models across studies. However, with the rapid growth of GWAS data, the method faces exponential growth in both time and memory consumption. Since it requires traversing $2^n$ subsets, represented by n-dimensional vectors, to compute the optimal ABF, the computation becomes almost impossible as the number of studies n increases.

In this article, we propose SMetABF, a method based on the Markov chain Monte Carlo (MCMC) method and its extension named shotgun stochastic search (SSS) [14], to speed up the process of subset selection. Simulations show that SSS is superior in speed, accuracy, and stability. Based on SSS, SMetABF quickly obtains the maximum ABF in a large-scale meta-analysis. SMetABF is implemented as an R package, and the code is available at https://github.com/sjl-sjtu/GWAS_meta.

2 Method

2.1 Asymptotic Bayes factor

Unlike the traditional framework, which bases statistical inference on the p-value, the Bayesian framework uses the Bayes factor (BF). The BF is defined as the ratio of the likelihoods of the observed data under the alternative hypothesis ($H_1$) and the null hypothesis ($H_0$),

$$\mathrm{BF} = \frac{P(D \mid H_1)}{P(D \mid H_0)} = \frac{\iint P(D \mid \beta, \gamma)\, \pi(\beta, \gamma \mid H_1)\, d\beta\, d\gamma}{\iint P(D \mid \beta, \gamma)\, \pi(\beta, \gamma \mid H_0)\, d\beta\, d\gamma},$$

where D stands for the observed data, β is the effect parameter we are interested in, γ is the parameter vector of confounders, and π(⋅) stands for the prior of β and γ. In general, BF > 1 favors $H_1$, whereas 0 < BF < 1 favors $H_0$. Since the BF is difficult to calculate directly in many studies, an asymptotic Bayes factor (ABF) has been proposed as an alternative [10]. If P(D|β, γ) is replaced by the asymptotic distribution of the estimator, $P(\hat{\beta} \mid \beta)$, and the marginal prior for β is used instead of the joint prior π(β, γ), the probability of obtaining the estimate $\hat{\beta}$ under each hypothesis replaces the probability of observing the data D, written as

$$\mathrm{ABF} = \frac{P(\hat{\beta} \mid H_1)}{P(\hat{\beta} \mid H_0)} = \frac{\int P(\hat{\beta} \mid \beta)\, \pi(\beta \mid H_1)\, d\beta}{\int P(\hat{\beta} \mid \beta)\, \pi(\beta \mid H_0)\, d\beta}.$$

For a study that measures the association between risk factors and a specific outcome, such as a GWAS, let $\hat{\beta}$ be the estimated size of the association. It is assumed to follow the normal distribution $\hat{\beta} \sim N(\beta, SE^2)$, where β is the true effect size of the variant and $SE$ is the estimated standard error. The true effect size β is also assumed to follow a normal distribution, $\beta \sim N(0, \sigma^2)$, where $\sigma^2$ is the prior variance of the true effect size. When σ = 0, the distribution of β degenerates to a point mass at β = 0; in other words, the genetic variant has no effect on the outcome. With $H_0$: σ = 0, the ABF can be calculated as

$$\mathrm{ABF} = \frac{f(\hat{\beta};\, 0,\, SE^2 + \sigma^2)}{f(\hat{\beta};\, 0,\, SE^2)},$$

where $f(x; m, s^2)$ is the probability density of the normal distribution $N(m, s^2)$ at x.
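For a single study, the ratio of two zero-mean normal densities has a closed form that can be handy for checking an implementation. The short derivation below assumes the ABF is the marginal density of the estimate under $H_1$, $N(0, SE^2 + \sigma^2)$, divided by that under $H_0$, $N(0, SE^2)$; the numerical values are illustrative only and are not taken from the paper.

```latex
\begin{aligned}
\mathrm{ABF}
 &= \frac{f(\hat{\beta};\, 0,\, SE^2 + \sigma^2)}{f(\hat{\beta};\, 0,\, SE^2)}
  = \sqrt{\frac{SE^2}{SE^2 + \sigma^2}}\,
    \exp\!\left( \frac{\hat{\beta}^2}{2}\left[ \frac{1}{SE^2} - \frac{1}{SE^2 + \sigma^2} \right] \right) \\
 &= \sqrt{\frac{SE^2}{SE^2 + \sigma^2}}\,
    \exp\!\left( \frac{\hat{\beta}^2 \sigma^2}{2\, SE^2\, (SE^2 + \sigma^2)} \right).
\end{aligned}
% Illustrative values: \hat{\beta} = 0.3, SE = 0.1, \sigma = 0.5
% \sqrt{0.01 / 0.26} \approx 0.196, exponent = 0.0225 / 0.0052 \approx 4.327,
% so ABF \approx 0.196 \times e^{4.327} \approx 15.
```

With σ = 0 the square root equals 1 and the exponent vanishes, so the ABF is exactly 1, matching the statement that σ = 0 corresponds to no effect.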

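Each study included in the meta-analysis provides an estimated effect size, $\hat{\beta}_i$. When there are n studies in the meta-analysis, let $\hat{\boldsymbol{\beta}} = (\hat{\beta}_1, \ldots, \hat{\beta}_n)^T$ be the estimated effect vector, which follows a multivariate normal distribution $\hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, V)$, where $\boldsymbol{\beta}$ stands for the true effect vector and V represents the covariance matrix of the estimated standard errors, expressed as

$$V = \begin{pmatrix}
SE_1^2 & r_{1,2}\, SE_1 SE_2 & \cdots & r_{1,n}\, SE_1 SE_n \\
r_{2,1}\, SE_2 SE_1 & SE_2^2 & \cdots & r_{2,n}\, SE_2 SE_n \\
\vdots & \vdots & \ddots & \vdots \\
r_{n,1}\, SE_n SE_1 & r_{n,2}\, SE_n SE_2 & \cdots & SE_n^2
\end{pmatrix},$$

in which $SE_i$ is the standard error of $\hat{\beta}_i$ in the i-th study, and $r_{i,j}$ is the correlation between the i-th and j-th studies. For each study, the prior effect size is $\sigma_i$, and the prior correlation coefficient between two studies is $\rho_{i,j}$; the prior matrix Σ is then

$$\Sigma = \begin{pmatrix}
\sigma_1^2 & \rho_{1,2}\, \sigma_1 \sigma_2 & \cdots & \rho_{1,n}\, \sigma_1 \sigma_n \\
\rho_{2,1}\, \sigma_2 \sigma_1 & \sigma_2^2 & \cdots & \rho_{2,n}\, \sigma_2 \sigma_n \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{n,1}\, \sigma_n \sigma_1 & \rho_{n,2}\, \sigma_n \sigma_2 & \cdots & \sigma_n^2
\end{pmatrix}.$$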

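With the estimated effect vector $\hat{\boldsymbol{\beta}}$ and the covariance matrix V of the estimated standard errors, the ABF for the meta-analysis can be calculated as

$$\mathrm{ABF} = \frac{f(\hat{\boldsymbol{\beta}};\, \mathbf{0},\, V + \Sigma)}{f(\hat{\boldsymbol{\beta}};\, \mathbf{0},\, V)},$$

where $f(\mathbf{x}; \mathbf{m}, S)$ here denotes the density of the multivariate normal distribution $N(\mathbf{m}, S)$ at $\mathbf{x}$.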

Similarly, H0 is defined as Σ being equal to the zero matrix.
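The multi-study ABF can be sketched numerically as follows. This is a minimal illustration, not the authors' R implementation: it assumes the ABF is the ratio of the zero-mean multivariate normal density with covariance V + Σ to the one with covariance V, works on the log10 scale to avoid overflow, and uses hypothetical function names (`mvn_logpdf_zero_mean`, `log10_abf`).

```python
import numpy as np

def mvn_logpdf_zero_mean(x, cov):
    """Natural-log density of the multivariate normal N(0, cov) at x."""
    x = np.asarray(x, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)

def log10_abf(beta_hat, se, sigma_prior, r=None):
    """log10 of f(beta_hat; 0, V + Sigma) / f(beta_hat; 0, V).

    beta_hat    : estimated effects, one per study
    se          : standard errors, one per study
    sigma_prior : n x n prior matrix Sigma
    r           : n x n correlation between studies (identity if omitted,
                  i.e. independent studies; this default is an assumption)
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    n = beta_hat.size
    r = np.eye(n) if r is None else np.asarray(r, dtype=float)
    V = r * np.outer(se, se)            # V[i, j] = r_ij * SE_i * SE_j
    log_h1 = mvn_logpdf_zero_mean(beta_hat, V + np.asarray(sigma_prior))
    log_h0 = mvn_logpdf_zero_mean(beta_hat, V)
    return (log_h1 - log_h0) / np.log(10.0)
```

When Σ is the zero matrix the two densities coincide and the function returns 0, that is, ABF = 1, which corresponds to H0.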

2.2 Prior

The assumption about heterogeneity among studies is critical in prior selection. Table 1 lists different prior models under the assumption that σ and ρ remain the same for all studies included in the meta-analysis.

Since both the null model and the complete model can be regarded as special cases of the subset model, the subset model is adopted in the meta-analysis. However, this raises the tricky issue of determining the optimal subset. A higher ABF is preferred in the meta-analysis because it means that H0 is less likely to be accepted, so the probability of a type II error decreases. In other words, it increases the statistical power, which equals one minus the probability of a type II error.
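One way to picture the subset model is as a prior matrix Σ in which studies outside the subset get a prior standard deviation of zero. The sketch below assumes a common σ and ρ for all studies, and assumes (as the simulation settings later suggest, since Table 1 is not reproduced here) that ρ = 1 corresponds to the fixed model and ρ = 0 to the independent model. The function name `prior_matrix` is hypothetical.

```python
import numpy as np

def prior_matrix(n, sigma, rho, subset=None):
    """Build the n x n prior matrix Sigma with common sigma and rho.

    subset is a 0/1 vector; studies marked 0 get prior standard deviation 0,
    so their rows and columns in Sigma vanish (assumed encoding of the
    subset model). subset=None includes every study (complete model).
    """
    inc = np.ones(n) if subset is None else np.asarray(subset, dtype=float)
    corr = np.full((n, n), float(rho))
    np.fill_diagonal(corr, 1.0)
    s = sigma * inc                      # per-study prior standard deviation
    return corr * np.outer(s, s)         # Sigma[i, j] = rho_ij * s_i * s_j

# Example: three studies, the second one excluded from the subset.
Sigma = prior_matrix(3, sigma=0.5, rho=0.7, subset=[1, 0, 1])
```

An all-zero subset yields the zero matrix (the null model), and an all-one subset yields the complete model, which is why both can be treated as special cases.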

2.3 Model selection

2.3.1 Subset-exhaustive.

The previous study [13] performs model selection by traversing all subsets to select the one with the highest ABF, named the subset-exhaustive method (EXH). For a meta-analysis including n studies, there are $2^n$ different subsets in total. Each subset must be taken as a prior in turn to calculate the ABF, after which the optimal one is selected. The time consumed therefore grows exponentially as the number of studies included increases. At the same time, the memory required to store all subsets ($2^n$ n-dimensional 0-1 vectors) also expands dramatically. Therefore, a method that finds a high ABF quickly becomes essential.
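The exhaustive search can be sketched as below. This is an illustration rather than the R code of [13]: it assumes independent studies (diagonal V), a common σ and ρ, and a subset encoded as a 0/1 vector; `log_abf` and `exhaustive_search` are hypothetical names.

```python
import itertools
import numpy as np

def log_abf(subset, beta_hat, se, sigma, rho):
    """Natural-log ABF for a 0/1 subset, assuming a diagonal V (independent
    studies) and a common prior sigma and rho for the included studies."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    s = sigma * np.asarray(subset, dtype=float)
    corr = np.full((s.size, s.size), float(rho))
    np.fill_diagonal(corr, 1.0)
    V = np.diag(np.asarray(se, dtype=float) ** 2)

    def log_density(cov):
        # log N(beta_hat; 0, cov) up to a constant that cancels in the ratio
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + beta_hat @ np.linalg.solve(cov, beta_hat))

    return log_density(V + corr * np.outer(s, s)) - log_density(V)

def exhaustive_search(beta_hat, se, sigma=0.5, rho=0.7):
    """Score all 2^n subsets and return (best subset, its log ABF, count)."""
    best_subset, best_score, evaluated = None, -np.inf, 0
    for subset in itertools.product((0, 1), repeat=len(beta_hat)):
        score = log_abf(subset, beta_hat, se, sigma, rho)
        evaluated += 1
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score, evaluated
```

Generating subsets lazily with `itertools.product` avoids storing all of them, but the number of ABF evaluations is still 2^n: 20 studies already require 1,048,576 evaluations, and each additional study doubles that.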

2.3.2 MCMC.

A commonly used method to quickly find the best subset is the Markov chain Monte Carlo method (MCMC). Here the MC3 algorithm [15, 16] is used to define the transition function, and the Metropolis-Hastings algorithm is used for sampling. The whole process is carried out as in Algorithm 1.

Algorithm 1 MCMC Pseudocode

Require: Ω: Universe of subsets.
Ensure: x: Sample with stable distribution.
x_0 ← random sample from Ω
for t = 0, …, T do
  Update nbd(x_t)                                       ⊳ Neighborhood
  Π(y | x_t) ← 1/|nbd(x_t)| if y ∈ nbd(x_t), else 0     ⊳ Transition Probability
  Generate y based on Π(· | x_t)                        ⊳ Alternative Model
  h ← min{1, ABF(y)/ABF(x_t)}                           ⊳ Discriminant Function
  Generate u from uniform distribution U(0, 1)
  if u < h then
    x_{t+1} ← y
  else
    x_{t+1} ← x_t
  end if
end for

  1. Randomly select a subset as the initial prior model x_0, and calculate its ABF.
  2. For the current model x_t, define the neighborhood as the set of subsets formed by adding an element to or deleting an element from the current subset, together with the current model itself. The proposal distribution assigns equal sampling probability to all models in the neighborhood, and the transition probability to any model outside the neighborhood is 0. Since every model has a neighborhood of the same size, the proposal distribution is symmetric.
  3. Generate the alternative prior model y according to the transition probability, and then calculate its ABF. The discriminant function is defined as h = min{1, ABF(y)/ABF(x_t)}, where ABF(x) represents the ABF value with subset x as the prior.
  4. Generate a random number u from the uniform distribution U(0, 1). If u < h, accept y as the new state, x_{t+1} = y; otherwise, x_{t+1} = x_t.
  5. Repeat steps 2-4 until the maximum number of iterations or a stable distribution is reached.

The first half of the entire iterative sequence is used for the warm-up and the second for the final sampling.
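The MCMC steps above can be sketched in a few lines. The sketch assumes the acceptance probability min{1, ABF(y)/ABF(x_t)}, which is the usual Metropolis-Hastings ratio for a symmetric proposal, independent studies (diagonal V), and a common σ and ρ. It reports the highest-ABF state visited after the warm-up half, which is one possible way to summarize the chain. The names `log_abf` and `mc3_search` are hypothetical, and this is not the authors' R function.

```python
import numpy as np

def log_abf(subset, beta_hat, se, sigma, rho):
    """Natural-log ABF for a 0/1 subset (diagonal V, common sigma and rho)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    s = sigma * np.asarray(subset, dtype=float)
    corr = np.full((s.size, s.size), float(rho))
    np.fill_diagonal(corr, 1.0)
    V = np.diag(np.asarray(se, dtype=float) ** 2)

    def log_density(cov):
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + beta_hat @ np.linalg.solve(cov, beta_hat))

    return log_density(V + corr * np.outer(s, s)) - log_density(V)

def mc3_search(beta_hat, se, sigma=0.5, rho=0.7, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(beta_hat)
    x = rng.integers(0, 2, size=n)               # step 1: random initial subset
    cur = log_abf(x, beta_hat, se, sigma, rho)
    best_x, best = None, -np.inf
    for t in range(n_iter):
        # step 2: uniform proposal over the n single flips plus "stay put"
        j = rng.integers(0, n + 1)
        y = x.copy()
        if j < n:
            y[j] = 1 - y[j]                      # add or delete study j
        # step 3: h = min(1, ABF(y) / ABF(x_t)), evaluated on the log scale
        new = log_abf(y, beta_hat, se, sigma, rho)
        h = np.exp(min(0.0, new - cur))
        # step 4: accept the proposal with probability h
        if rng.uniform() < h:
            x, cur = y, new
        # first half is warm-up; track the best state in the second half
        if t >= n_iter // 2 and cur > best:
            best_x, best = x.copy(), cur
    return tuple(int(v) for v in best_x), best
```

Each iteration evaluates only one candidate model, so the chain explores the neighborhood one flip at a time.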

2.3.3 Shotgun stochastic search.

Here we introduce an extension of MCMC for variable selection named shotgun stochastic search (SSS) [14]. It can be used to find the optimal ABF quickly by the following procedure (see Algorithm 2):

  1. Let Γ denote a set containing up to B optimal models. Randomly select the initial model x_0, set Γ = {x_0}, and calculate the score of the model S(x) = ABF(x).
  2. For the current model x_t, define the sets Γ+, Γ−, and Γ° as the models obtained by adding, deleting, or replacing one element of the current subset, respectively, and then define the neighborhood nbd(x_t) = Γ+ ∪ Γ− ∪ Γ°.
    Then update Γ = Γ ∪ nbd(x_t). If |Γ| > B, remove the (|Γ| − B) models with the lowest scores.
  3. Sample x+, x−, and x° from Γ+, Γ−, and Γ°, respectively, with the score S(x) as the sampling weight.
  4. Then take a sample from {x+, x−, x°}, with the score S(x) as the sampling weight, and let the sample be the new model x_{t+1}.
  5. Repeat steps 2-4 until the maximum number of iterations is reached.

Algorithm 2 SSS Pseudocode

Require: Ω: Universe of subsets.
Ensure: x: Sample with stable distribution.
x_0 ← random sample from Ω
Γ ← {x_0}
S(x) ← ABF(x)
for t = 0, …, T do
  Constitute Γ+, Γ−, Γ°
  nbd(x_t) ← Γ+ ∪ Γ− ∪ Γ°
  Update Γ ← Γ ∪ nbd(x_t)
  if |Γ| > B then
    Remove (|Γ| − B) models with lowest S
  end if
  Sample x+ from Γ+, weight = S(x)
  Sample x− from Γ−, weight = S(x)
  Sample x° from Γ°, weight = S(x)
  Sample x_{t+1} from {x+, x−, x°}, weight = S(x)
  if x_{t+1} satisfies stable distribution then
    break
  end if
end for
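A compact sketch of the SSS procedure follows. It is an illustration under stated assumptions, not the authors' R function: studies are treated as independent (diagonal V), σ and ρ are common to all studies, sampling weights proportional to the ABF are computed from log scores for numerical stability, the early-stopping check is omitted, and the names `log_abf`, `neighbours`, and `sss_search` are hypothetical.

```python
import numpy as np

def log_abf(subset, beta_hat, se, sigma, rho):
    """Natural-log ABF for a 0/1 subset (diagonal V, common sigma and rho)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    s = sigma * np.asarray(subset, dtype=float)
    corr = np.full((s.size, s.size), float(rho))
    np.fill_diagonal(corr, 1.0)
    V = np.diag(np.asarray(se, dtype=float) ** 2)

    def log_density(cov):
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + beta_hat @ np.linalg.solve(cov, beta_hat))

    return log_density(V + corr * np.outer(s, s)) - log_density(V)

def neighbours(x):
    """Return the add, delete, and replace neighbourhoods of a 0/1 tuple."""
    ones = [i for i, v in enumerate(x) if v == 1]
    zeros = [i for i, v in enumerate(x) if v == 0]

    def flip(idx):
        y = list(x)
        for i in idx:
            y[i] = 1 - y[i]
        return tuple(y)

    adds = [flip([i]) for i in zeros]
    dels = [flip([i]) for i in ones]
    swaps = [flip([i, j]) for i in ones for j in zeros]
    return adds, dels, swaps

def sss_search(beta_hat, se, sigma=0.5, rho=0.7, n_iter=50, B=10, seed=0):
    rng = np.random.default_rng(seed)
    cache = {}

    def score(m):                        # memoized S(x) = log ABF(x)
        if m not in cache:
            cache[m] = log_abf(m, beta_hat, se, sigma, rho)
        return cache[m]

    x = tuple(int(v) for v in rng.integers(0, 2, size=len(beta_hat)))
    top = {x: score(x)}                  # the set Gamma of best models so far
    for _ in range(n_iter):
        candidates = []
        for group in neighbours(x):      # Gamma+, Gamma-, Gamma° in turn
            if not group:                # e.g. nothing to delete from the empty subset
                continue
            s = np.array([score(m) for m in group])
            top.update(zip(group, s))
            w = np.exp(s - s.max())      # weights proportional to the ABF
            k = rng.choice(len(group), p=w / w.sum())
            candidates.append((group[k], s[k]))
        # keep only the B highest-scoring models
        top = dict(sorted(top.items(), key=lambda kv: kv[1], reverse=True)[:B])
        s = np.array([c[1] for c in candidates])
        w = np.exp(s - s.max())
        x = candidates[rng.choice(len(candidates), p=w / w.sum())][0]
    best = max(top, key=top.get)
    return best, top[best]
```

Unlike the MCMC sketch, every iteration scores the whole neighbourhood, so high-ABF models are recorded in Γ even when the chain itself moves elsewhere.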

The previous study [13] provided R code for EXH. Here, R functions for meta-analysis by MCMC and SSS are constructed.

3 Simulation

3.1 The construction of simulated datasets

Several parameters are specified to build the simulated datasets: the incidence of the disease in the population (p), the frequency of the major allele of the studied variant (f, which equals 1 − MAF under the assumption that the SNP has only two alleles), the effect size (odds ratio, OR), and the sample size of the case and control groups (n, assumed to be the same for both). For example, suppose A is the risk allele while G is the non-risk allele. If the dominant model is applied, both AA and AG can be considered equivalent risk genotypes while GG is non-risk. Suppose the baseline effect is α and the increased effect on prevalence by the risk genotype is θ, then where D represents the outcome (disease). Then we can get and

Then α and θ can be calculated. According to Bayes’ theorem, the probabilities of the risk and non-risk genotypes in the case and control groups can be calculated.

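The simulated genotypes in both the case and control groups can then be randomly generated from a binomial distribution, and the estimated odds ratio $\widehat{OR}$ can be calculated. The effect size is defined as β = ln OR, and similarly, $\hat{\beta} = \ln \widehat{OR}$. The standard error can be estimated from the contingency table, as $\widehat{SE} = \sqrt{1/a + 1/b + 1/c + 1/d}$, where a, b, c, and d are the four cell counts of the 2 × 2 table of genotype (risk or non-risk) by status (case or control).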

Suppose there are N studies included in the meta-analysis. For each study, the true effect $OR_i$ follows the normal distribution $N(OR, SE^2)$. The sample size $n_i$ in each study is generated as a random integer in a given range, and p and f remain the same for all studies. Through the process above, $\hat{\beta}_i$ and $SE_i$ of each study can be estimated.
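The simulation of one study can be sketched as follows. Several parts are assumptions rather than the paper's exact procedure: the population frequency of the risk genotype is passed in directly as a hypothetical parameter `q` (under the dominant model and Hardy-Weinberg equilibrium it would be one minus the squared frequency of the non-risk allele), the two genotype-specific disease risks are found by bisection so that they reproduce the prevalence p and the odds ratio, the standard error uses the usual Woolf formula for a 2 × 2 table, and a 0.5 continuity correction guards against empty cells.

```python
import numpy as np

def genotype_risks(p, q, odds_ratio, tol=1e-12):
    """Find p0 = P(D | non-risk) and p1 = P(D | risk) such that
    q * p1 + (1 - q) * p0 = p and the odds ratio between them is odds_ratio."""
    def p1_of(p0):
        return odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:                 # prevalence is increasing in p0
        mid = 0.5 * (lo + hi)
        if q * p1_of(mid) + (1.0 - q) * mid > p:
            hi = mid
        else:
            lo = mid
    p0 = 0.5 * (lo + hi)
    return p0, p1_of(p0)

def simulate_study(p, q, odds_ratio, n, rng):
    """Simulate n cases and n controls; return (beta_hat, se) on the log-OR scale."""
    p0, p1 = genotype_risks(p, q, odds_ratio)
    # Bayes' theorem: frequency of the risk genotype among cases and controls
    risk_in_cases = q * p1 / p
    risk_in_controls = q * (1.0 - p1) / (1.0 - p)
    a = rng.binomial(n, risk_in_cases)       # cases with the risk genotype
    c = rng.binomial(n, risk_in_controls)    # controls with the risk genotype
    b, d = n - a, n - c                      # cases / controls without it
    # 0.5 continuity correction (an assumption, not from the paper)
    a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    beta_hat = np.log(a * d / (b * c))
    se = np.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return beta_hat, se

rng = np.random.default_rng(0)
beta_hat, se = simulate_study(p=0.05, q=0.36, odds_ratio=1.5, n=1000, rng=rng)
```

Repeating `simulate_study` N times, with a study-specific odds ratio and sample size drawn each time, would give the vectors of estimates and standard errors that the meta-analysis takes as input.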

3.2 Results

The ABFs calculated under different true ORs are shown in Fig 1. The overall trends of the ABF obtained by EXH, MCMC, and SSS are consistent with one another and correspond to the p-value obtained by the traditional methods: as the true OR approaches 1, the p-value increases while the ABF decreases to 0. However, when the sample sizes of the included studies are small (Fig 1A), the p-value becomes unstable when the true OR is close to 1, which will affect the analysis. Besides, the ABF calculated by SSS almost coincides with the optimal ABF curve obtained by EXH, which demonstrates the validity of SMetABF.

Fig 1. The comparison under various true ORs.

The curves representing ABF (EXH) and ABF (SSS) nearly coincide, as do the curves representing p-value (FEM) and p-value (REM). The parameters are set as follows: p = 0.05, f = 0.8 (equivalently, MAF = 0.2), and the number of studies included (N) is 20 (Fig 1A) and 25 (Fig 1B), respectively. For the i-th study, OR_i ∼ N(OR, 0.01), and the sample size n_i is sampled from 100 to 2000 and from 100 to 5000, respectively.

https://doi.org/10.1371/journal.pcbi.1009948.g001

Figs 2 and 3 show the accuracy and speed of each algorithm under different priors and different numbers of iterations, respectively. Since SSS calculates the ABF for all models in the neighborhood in each iteration, it evaluates many more models than MCMC for the same number of iterations. Therefore, the number of iterations of SSS is set to 100, 200, 500, 1000, and 2000, while that of MCMC is set to 1000, 5000, 10000, and 20000. To compare the average ABF and the time consumed, each algorithm is repeated 100 times under each condition. The SSS algorithm reaches the maximum ABF in a short time with a small number of iterations. In contrast, the MCMC algorithm can hardly find the maximum ABF even with a longer running time.

Fig 2. The comparison in accuracy and speed of the three algorithms under different priors.

Priors setting: corr 1 (correlated model, σ = 0.5, ρ = 0.7); corr 2 (correlated model, σ = 0.5, ρ = 0.3); corr 3 (correlated model, σ = 0.8, ρ = 0.7); fixed (fixed model, σ = 0.5); indep (independent model, σ = 0.5).

https://doi.org/10.1371/journal.pcbi.1009948.g002

Fig 3. The comparison in accuracy and speed of the three algorithms under different iterations.

https://doi.org/10.1371/journal.pcbi.1009948.g003

When MCMC (10000 and 20000 iterations) and SSS (500 and 1000 iterations) are each repeated 100 times, as shown in Fig 4, the ABF values obtained by MCMC are relatively small, while the results of SSS are relatively stable and very close to the maximum ABF.

4 Application

4.1 Meta-analysis on the variants related to PD and other autoimmune disorders

Here SMetABF is applied to identify risk variants associated with Parkinson’s disease (PD), a common chronic neurodegenerative disease among the elderly population. Its common clinical manifestations include tremors, slow movement, and disorders of balance and posture. PD has been reported to be associated with both genetic variations [17] and environmental factors, including lifestyle factors such as smoking and drinking [18, 19], but the detailed mechanism remains unclear. Recent studies discuss the potential relationship between PD and autoimmune disorders [20]. To explore the underlying relationships, we conduct a GWAS meta-analysis across PD and three common autoimmune disorders: inflammatory bowel disease, multiple sclerosis, and systemic sclerosis.

Through websites hosting GWAS datasets, including DistiLD [21] (http://distild.jensenlab.org), Open Targets Genetics [22] (https://genetics.opentargets.org/) and GWAS Catalog [23] (https://www.ebi.ac.uk/gwas/), 59 studies for which the summary statistics ($\hat{\beta}$ and $SE$) are provided or can be calculated are included in this application. Tables 2 and 3 show detailed information about the studies included. A pure meta-analysis across 29 studies on PD is conducted first, and then all 59 studies are analyzed jointly to obtain a mixed association pattern. The effects of over 10 million variants are assessed through parallel computing. The Manhattan plots for both the pure pattern and the mixed pattern are shown in Fig 5.

Table 3. The information of studies on other autoimmune disorders included in the application.

https://doi.org/10.1371/journal.pcbi.1009948.t003

Fig 5. Manhattan plots.

A. Results of pure meta-analysis, which includes 29 studies on PD. B. Results of mixed meta-analysis, which includes 59 studies on PD, inflammatory bowel disease (including its two subtypes: Crohn’s disease and ulcerative colitis), multiple sclerosis, and systemic sclerosis. lg ABF: log10ABF.

https://doi.org/10.1371/journal.pcbi.1009948.g005

We find that PD is highly associated with several loci located within the gene SNCA on chromosome 4. A peak also appears on chromosome 17, around the genes MAPT, KANSL1, and NSF. When other autoimmune disorders are included in the meta-analysis, peaks appear on chromosomes 1 and 6 in the mixed pattern. Some significant variants can also be found on chromosome 4. Table 4 shows several SNPs identified in the analysis. Detailed results are available at https://figshare.com/articles/dataset/Table_S_zip/19179179.

The results support previous reports on several essential loci related to PD, such as SNCA, MAPT, and KANSL1 [17]. Additionally, comparing the mixed pattern with the pure pattern reveals some degree of underlying relationship between PD and other autoimmune disorders, since they share some risk variants. For example, variants on BTNL2 have a high ABF in both the pure pattern (∼$10^{17}$) and the mixed pattern (∼$10^{127}$), and the subsets indicate that BTNL2 is associated with all four disorders. For the top 4 variants on BTNL2 in the mixed pattern, on average 66.7% of the studies on PD, 65.7% of the studies on inflammatory bowel disease, 100% of the studies on multiple sclerosis, and 62.5% of the studies on systemic sclerosis are included in the final subsets used to calculate the ABF. However, although variants on SNCA also have a high ABF in both the pure and mixed patterns, most of the studies in the final subsets used to calculate the ABF are studies on PD, and the ABF values remain similar (∼$10^{135}$) in both patterns. In other words, SNCA has weaker associations with the other autoimmune disorders. The same is true for variants on MAPT. These two loci may relate more to diseases of the nervous system than to autoimmune disorders. In contrast, variants on KANSL1, reported as a factor in the immune system [71], show associations with both PD and other autoimmune disorders. In brief, the results of the meta-analysis indicate the presence of potential biological pathways and functional interactions between PD and autoimmune disorders. Tools like GESLM can use the shared variants to further identify causal variants [72].

4.2 Software

We implement all the algorithms as an R package named GWASmeta. In addition, to help researchers use SMetABF to quickly find key SNPs, we develop a web tool based on R Shiny. The requirements for the uploaded file can be found on the website. Multiple variants can be analyzed at once. This tool is accessible at https://sunjianle-sjtu.shinyapps.io/analycode.

5 Discussion

Meta-analysis has been widely conducted on GWAS data in recent years to discover essential loci associated with complex genetic diseases [73–75], satisfying the requirement of large sample sizes in GWAS. However, the traditional p-value method used in meta-analysis is facing increasing criticism. For example, it is not appropriate to use a single threshold since the p-value depends on the MAF and sample size [10]. Moreover, the complicated relationships between different studies are tricky to handle with traditional methods. FEM relies on the assumption that all studies in the meta-analysis share a common true effect size. REM allows the true effect size to vary across studies, but detailed information about that variation is hard to include in the model. In addition, the heterogeneity test used to decide between FEM and REM is often regarded as underpowered [11]. In contrast, the structure among different studies can be easily integrated into the Bayesian model as a prior. The BF compares the likelihoods of the observed data Y under the two hypotheses, P(Y|H0) and P(Y|H1), and is therefore a better alternative to the p-value. Based on the Bayesian framework, a useful statistical model named MetABF has been proposed, which can easily measure the associations between multiple phenotypes and variants at the same time using GWAS summary statistics but faces challenges in computation.

In this article, we propose SMetABF, an improved tool to obtain the optimal ABF in a large-scale meta-analysis efficiently. Through simulation, we confirm that SSS is superior to MCMC in terms of speed, accuracy, and stability. To a certain extent, our improvements overcome the computational problems caused by the increase in the number of studies included. We apply SMetABF to PD and other autoimmune disorders, illustrating its effectiveness. As more research is conducted on various traits in larger populations and GWAS summary statistics continue to accumulate, large-scale multi-phenotype meta-analyses will become possible through SMetABF. Another possible application is to analyze the effect sizes across different variants in one study, where σi represents the prior variation of the i-th variant on the outcome, and ρi,j stands for the linkage between different variants. Furthermore, since many traits related to complex diseases are correlated, it is necessary to consider the effect of multiple loci on the outcome across a large number of studies simultaneously [76]. In this case, the prior correlation matrix Σ will become a three-dimensional array, which will bring more challenges in computation.

The method still faces many challenges. The choice of prior parameters is one example. Sensitivity analysis reveals that different values of σ and ρ affect the ABF values but do not seem to change the relative effect sizes of different variants. Besides, the considerable size of the human genome still brings challenges in computation.

The pooled statistics derived through meta-analysis can be further used for other post-GWAS analyses, for example, to identify causal genes through statistical fine-mapping [77] or to infer the causal relationships between traits by Mendelian randomization [78]. GWAS summary statistics from different studies can be conveniently integrated into a powerful pooled statistic by SMetABF. We believe the method will benefit the integration of previous studies and help to reveal the genetic mechanisms of complex diseases.

References

  1. 1. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics. 2017;101(1):5–22. pmid:28686856
  2. 2. Evangelou E, Ioannidis JP. Meta-analysis methods for genome-wide association studies and beyond. Nature Reviews Genetics. 2013;14(6):379–389. pmid:23657481
  3. 3. McClellan J, King MC. Genetic heterogeneity in human disease. Cell. 2010;141(2):210–217. pmid:20403315
  4. 4. Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nature Reviews Genetics. 2017;18(2):117–127. pmid:27840428
  5. 5. Almadhoun N, Ayday E,Ulusoy Ö. Differential privacy under dependent tuples the case of genomic privacy. Bioinformatics. 2020;36(6):1696–1703. pmid:31702787
  6. 6. Mohammed Yakubu A, Chen YPP. Ensuring privacy and security of genomic data and functionalities. Briefings in Bioinformatics. 2019;21(2):511–526.
  7. 7. Pfeiffer R, Gail M, Pee D. On Combining Data From Genome-Wide Association Studies to Discover Disease-Associated SNPs. Statistical Science. 2009;24:547–560.
  8. 8. Han B, Eskin E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. American Journal of Human Genetics. 2011;88 5:586–98. pmid:21565292
  9. 9. Evangelou E, Ioannidis J. Meta-analysis methods for genome-wide association studies and beyond. Nature Reviews Genetics. 2013;14:379–389. pmid:23657481
  10. 10. Wakefield J. Bayes factors for genome-wide association studies: comparison with P-values. Genetic Epidemiology. 2009;33(1):79–86. pmid:18642345
  11. 11. Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods. 2010;1(2):97–111. pmid:26061376
  12. 12. Kruschke J, Liddell T. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review. 2018;25:178–206. pmid:28176294
  13. 13. Trochet H, Pirinen M, Band G, Jostins L, McVean G, Spencer CCA. Bayesian meta-analysis across genome-wide association studies of diverse phenotypes. Genetic Epidemiology. 2019;43(5):532–547. pmid:30920090
  14. 14. Hans C, Dobra A, West M. Shotgun Stochastic search for “Large p” regression. Journal of the American Statistical Association. 2007;102(478):507–516.
  15. 15. Raftery A, Madigan D, Hoeting J. Bayesian Model Averaging for Linear Regression Models. Journal of the American Statistical Association. 1997;92:179–191.
  16. 16. Lamnisos D, Griffin J, Steel M. Adaptive MC3 and Gibbs algorithms for Bayesian Model Averaging in Linear Regression Models. arXiv: Computation. 2013;.
  17. 17. Deng H, Wang P, Jankovic J. The genetics of Parkinson disease. Ageing research reviews. 2018;42:72–85. pmid:29288112
  18. 18. Kim R, Yoo D, Jung YJ, Han K, Lee JY. Sex differences in smoking, alcohol consumption, and risk of Parkinson’s disease: A nationwide cohort study. Parkinsonism & Related Disorders. 2020;71:60–65. pmid:31882374
  19. 19. Paul KC, Chuang YH, Shih IF, Keener A, Bordelon Y, Bronstein JM, et al. The association between lifestyle factors and Parkinson’s disease progression and mortality. Movement Disorders. 2019;34(1):58–66. pmid:30653734
  20. 20. McFarland NR, McFarland KN, Golde TE. Parkinson Disease and Autoimmune Disorders What Can We Learn From Genome-wide Pleiotropy? JAMA neurology. 2017;74(7):769–770. pmid:28586798
  21. 21. Pallejà A, Horn H, Eliasson S, Jensen L. DistiLD Database: diseases and traits in linkage disequilibrium blocks. Nucleic Acids Research. 2012;40:D1036–D1040. pmid:22058129
  22. 22. Ghoussaini M, Mountjoy E, Carmona M, Peat G, Schmidt EM, Hercules A, et al. Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic acids research. 2021;49(D1):D1311–D1320. pmid:33045747
  23. 23. MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic acids research. 2017;45(D1):D896–D901. pmid:27899670
  24. 24. Maraganore DM, De Andrade M, Lesnick TG, Strain KJ, Farrer MJ, Rocca WA, et al. High-resolution whole-genome association study of Parkinson disease. The American Journal of Human Genetics. 2005;77(5):685–693. pmid:16252231
  25. 25. Pankratz N, Wilk JB, Latourelle JC, DeStefano AL, Halter C, Pugh EW, et al. Genomewide association study for susceptibility genes contributing to familial Parkinson disease. Human Genetics. 2009;124(6):593–605. pmid:18985386
  26. 26. Satake W, Nakabayashi Y, Mizuta I, Hirota Y, Ito C, Kubo M, et al. Genome-wide association study identifies common variants at four loci as genetic risk factors for Parkinson’s disease. Nature Genetics. 2009;41(12):1303–1307. pmid:19915576
  27. 27. Simon-Sanchez J, Schulte C, Bras JM, Sharma M, Gibbs JR, Berg D, et al. Genome-wide association study reveals genetic risk underlying Parkinson’s disease. Nature Genetics. 2009;41(12):1308–1312. pmid:19915575
  28. 28. Sutherland GT, Halliday GM, Silburn PA, Mastaglia FL, Rowe DB, Boyle RS, et al. Do polymorphisms in the familial Parkinsonism genes contribute to risk for sporadic Parkinson’s disease? Movement disorders: official journal of the Movement Disorder Society. 2009;24(6):833–838. pmid:19224617
  29. 29. Edwards TL, Scott WK, Almonte C, Burt A, Powell EH, Beecham GW, et al. Genome-wide association study confirms SNPs in SNCA and the MAPT region as common risk factors for Parkinson disease. Annals of Human Genetics. 2010;74(2):97–109. pmid:20070850
  30. 30. Hamza TH, Zabetian CP, Tenesa A, Laederach A, Montimurro J, Yearout D, et al. Common genetic variation in the HLA region is associated with late-onset sporadic Parkinson’s disease. Nature Genetics. 2010;42(9):781–785. pmid:20711177
  31. 31. Saad M, Lesage S, Saint-Pierre A, Corvol JC, Zelenika D, Lambert JC, et al. Genome-wide association study confirms BST1 and suggests a locus on 12q24 as the risk loci for Parkinson’s disease in the European population. Human Molecular Genetics. 2011;20(3):615–627. pmid:21084426
  32. 32. Do CB, Tung JY, Dorfman E, Kiefer AK, Drabant EM, Francke U, et al. Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson’s disease. PLoS Genetics. 2011;7(6):e1002141. pmid:21738487
  33. 33. Liu X, Cheng R, Verbitsky M, Kisselev S, Browne A, Mejia-Sanatana H, et al. Genome-wide association study identifies candidate genes for Parkinson’s disease in an Ashkenazi Jewish population. BMC Medical Genetics. 2011;12(1):1–16.
  34. International Parkinson Disease Genomics Consortium, Nalls MA, Plagnol V, Hernandez DG, Sharma M, Sheerin UM, et al. Imputation of sequence variants for identification of genetic risks for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet (London, England). 2011;377(9766):641–649. pmid:21292315
  35. Spencer C, Plagnol V, Strange A, Gardner M, Paisán-Ruiz C, Band G, et al. Dissection of the genetics of Parkinson’s disease identifies an additional association 5’ of SNCA and multiple associated haplotypes at 17q21. Human Molecular Genetics. 2011;20:345–353. pmid:21044948
  36. Simón-Sánchez J, Van Hilten JJ, Van De Warrenburg B, Post B, Berendse HW, Arepalli S, et al. Genome-wide association study confirms extant PD risk loci among the Dutch. European Journal of Human Genetics. 2011;19(6):655–661. pmid:21248740
  37. Lill CM, Roehr JT, McQueen MB, Kavvoura FK, Bagade S, Schjeide BMM, et al. Comprehensive research synopsis and systematic meta-analyses in Parkinson’s disease genetics: The PDGene database. PLoS Genetics. 2012;8(3):e1002548. pmid:22438815
  38. Nalls MA, Pankratz N, Lill CM, Do CB, Hernandez DG, Saad M, et al. Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson’s disease. Nature Genetics. 2014;46(9):989–993. pmid:25064009
  39. Hill-Burns EM, Wissemann WT, Hamza TH, Factor SA, Zabetian CP, Payami H. Identification of a novel Parkinson’s disease locus via stratified genome-wide association study. BMC Genomics. 2014;15:118. pmid:24511991
  40. Foo JN, Tan LC, Irwan ID, Au WL, Low HQ, Prakash KM, et al. Genome-wide association study of Parkinson’s disease in East Asians. Human Molecular Genetics. 2017;26(1):226–232. pmid:28011712
  41. Chang D, Nalls M, Hallgrímsdóttir I, Hunkapiller J, van der Brug MP, Cai F, et al. A meta-analysis of genome-wide association studies identifies 17 new Parkinson’s disease risk loci. Nature Genetics. 2017;49:1511–1516. pmid:28892059
  42. Bandres-Ciga S, Ahmed S, Sabir MS, Blauwendraat C, Adarmes-Gómez AD, Bernal-Bernal I, et al. The Genetic Architecture of Parkinson Disease in Spain: Characterizing Population-Specific Risk, Differential Haplotype Structures, and Providing Etiologic Insight. Movement Disorders: Official Journal of the Movement Disorder Society. 2019;34(12):1851–1863. pmid:31660654
  43. Blauwendraat C, Heilbron K, Vallerga CL, Bandres-Ciga S, Von Coelln R, Pihlstrøm L, et al. Parkinson’s disease age at onset genome-wide association study: defining heritability, genetic loci, and α-synuclein mechanisms. Movement Disorders. 2019;34(6):866–875. pmid:30957308
  44. Nalls MA, Blauwendraat C, Vallerga CL, Heilbron K, Bandres-Ciga S, Chang D, et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. The Lancet Neurology. 2019;18(12):1091–1102. pmid:31701892
  45. Blauwendraat C, Reed X, Krohn L, Heilbron K, Bandres-Ciga S, Tan M, et al. Genetic modifiers of risk and age at onset in GBA associated Parkinson’s disease and Lewy body dementia. Brain. 2020;143(1):234–248. pmid:31755958
  46. Alfradique-Dunham I, Al-Ouran R, von Coelln R, Blauwendraat C, Hill E, Luo L, et al. Genome-wide association study Meta-analysis for Parkinson disease motor subtypes. Neurology Genetics. 2021;7(2). pmid:33987465
  47. Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599(7886):628–634. pmid:34662886
  48. Jiang L, Zheng Z, Fang H, Yang J. A generalized linear mixed model association tool for biobank-scale data. Nature Genetics. 2021;53(11):1616–1621. pmid:34737426
  49. Rodrigo LM, Nyholt DR. Imputation and Reanalysis of ExomeChip Data Identifies Novel, Conditional and Joint Genetic Effects on Parkinson’s Disease Risk. Genes. 2021;12(5):689. pmid:34064523
  50. Smeland OB, Shadrin A, Bahrami S, Broce I, Tesli M, Frei O, et al. Genome-wide Association Analysis of Parkinson’s Disease and Schizophrenia Reveals Shared Genetic Architecture and Identifies Novel Risk Loci. Biological Psychiatry. 2021;89(3):227–235. pmid:32201043
  51. Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, Kurki M, Koshiba S, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nature Genetics. 2021;53(10):1415–1424. pmid:34594039
  52. Anderson CA, Boucher G, Lees CW, Franke A, D’Amato M, Taylor KD, et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nature Genetics. 2011;43(3):246–252. pmid:21297633
  53. Jostins L, Ripke S, Weersma RK, Duerr RH, McGovern DP, Hui KY, et al. Host–microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature. 2012;491(7422):119–124. pmid:23128233
  54. Julià A, Domènech E, Chaparro M, García-Sánchez V, Gomollón F, Panés J, et al. A genome-wide association study identifies a novel locus at 6q22.1 associated with ulcerative colitis. Human Molecular Genetics. 2014;23(25):6927–6934. pmid:25082827
  55. Liu JZ, Van Sommeren S, Huang H, Ng SC, Alberts R, Takahashi A, et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nature Genetics. 2015;47(9):979–986. pmid:26192919
  56. Ostrowski J, Paziewska A, Lazowska I, Ambrozkiewicz F, Goryca K, Kulecka M, et al. Genetic architecture differences between pediatric and adult-onset inflammatory bowel diseases in the Polish population. Scientific Reports. 2016;6(1):1–10. pmid:28008999
  57. Yang SK, Hong M, Oh H, Low HQ, Jung S, Ahn S, et al. Identification of loci at 1q21 and 16q23 that affect susceptibility to inflammatory bowel disease in Koreans. Gastroenterology. 2016;151(6):1096–1099. pmid:27569725
  58. De Lange KM, Moutsianas L, Lee JC, Lamb CA, Luo Y, Kennedy NA, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nature Genetics. 2017;49(2):256–261. pmid:28067908
  59. Dönertaş HM, Fabian DK, Fuentealba M, Partridge L, Thornton JM. Common genetic associations between age-related diseases. Nature Aging. 2021;1(4):400–412. pmid:33959723
  60. Glanville KP, Coleman JR, O’Reilly PF, Galloway J, Lewis CM. Investigating pleiotropy between depression and autoimmune diseases using the UK Biobank. Biological Psychiatry Global Open Science. 2021;1(1):48–58. pmid:34278373
  61. Wu Y, Murray GK, Byrne EM, Sidorenko J, Visscher PM, Wray NR. GWAS of peptic ulcer disease implicates Helicobacter pylori infection, other gastrointestinal disorders and depression. Nature Communications. 2021;12(1):1–17. pmid:33608531
  62. Consortium IMSG. Risk alleles for multiple sclerosis identified by a genomewide study. New England Journal of Medicine. 2007;357(9):851–862.
  63. De Jager PL, Jia X, Wang J, De Bakker PI, Ottoboni L, Aggarwal NT, et al. Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nature Genetics. 2009;41(7):776–782. pmid:19525953
  64. Patsopoulos Nikolaos A, Bayer Pharma MS Genetics Working Group, the Steering Committees of Studies Evaluating IFNβ-1b & a CCR1-Antagonist, Consortium A, GeneMSA, International Multiple Sclerosis Genetics Consortium, et al. Genome-wide meta-analysis identifies novel multiple sclerosis susceptibility loci. Annals of Neurology. 2011;70(6):897–912. pmid:22190364
  65. Sawcer S, Hellenthal G, Pirinen M, Spencer CC, Patsopoulos NA, Moutsianas L, et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature. 2011;476(7359):214. pmid:21833088
  66. Beecham AH, Patsopoulos NA, Xifara DK, Davis MF, Kemppinen A, Cotsapas C, et al. Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nature Genetics. 2013;45(11):1353. pmid:24076602
  67. Andlauer TF, Buck D, Antony G, Bayas A, Bechmann L, Berthele A, et al. Novel multiple sclerosis susceptibility loci implicated in epigenetic regulation. Science Advances. 2016;2(6):e1501678. pmid:27386562
  68. International Multiple Sclerosis Genetics Consortium, ANZgene, IIBDGC, WTCCC2, Patsopoulos N, Baranzini S, Santaniello A, Shooshtari P, et al. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science. 2019;365(6460).
  69. Mayes MD, Bossini-Castillo L, Gorlova O, Martin JE, Zhou X, Chen WV, et al. Immunochip analysis identifies multiple susceptibility loci for systemic sclerosis. The American Journal of Human Genetics. 2014;94(1):47–61. pmid:24387989
  70. López-Isac E, Acosta-Herrera M, Kerick M, Assassi S, Satpathy AT, Granja J, et al. GWAS for systemic sclerosis identifies multiple risk loci and highlights fibrotic and vasculopathy pathways. Nature Communications. 2019;10(1):1–14. pmid:31672989
  71. Fejzo M, Chen H, Anderson L, McDermott M, Karlan B, Konecny G, et al. Analysis in epithelial ovarian cancer identifies KANSL1 as a biomarker and target gene for immune response and HDAC inhibition. Gynecologic Oncology. 2020;. pmid:33229045
  72. Lyu R, Sun J, Xu D, Jiang Q, Wei C, Zhang Y. GESLM algorithm for detecting causal SNPs in GWAS with multiple phenotypes. Briefings in Bioinformatics. 2021;22(6):bbab276. pmid:34323927
  73. Graff M, Scott RA, Justice AE, Young KL, Feitosa MF, Barata L, et al. Genome-wide physical activity interactions in adiposity-A meta-analysis of 200,452 adults. PLoS Genetics. 2017;13(4):e1006528. pmid:28448500
  74. Day F, Karaderi T, Jones MR, Meun C, He C, Drong A, et al. Large-scale genome-wide meta-analysis of polycystic ovary syndrome suggests shared genetic architecture for different diagnosis criteria. PLoS Genetics. 2018;14(12):e1007813. pmid:30566500
  75. Kalra G, Milon B, Casella AM, Herb BR, Humphries E, Song Y, et al. Biological insights from multi-omic analysis of 31 genomic risk loci for adult hearing difficulty. PLoS Genetics. 2020;16(9):e1009025. pmid:32986727
  76. Ray D, Boehnke M. Methods for meta-analysis of multiple traits using GWAS summary statistics. Genetic Epidemiology. 2018;42:134–145. pmid:29226385
  77. Uffelmann E, Huang QQ, Munung NS, de Vries J, Okada Y, Martin AR, et al. Genome-wide association studies. Nature Reviews Methods Primers. 2021;1(1):1–21.
  78. Pingault JB, O’Reilly PF, Schoeler T, Ploubidis GB, Rijsdijk F, Dudbridge F. Using genetic data to strengthen causal inference in observational research. Nature Reviews Genetics. 2018;19(9):566–580. pmid:29872216