Simulation-based inference on virtual brain models of disorders

Meysam Hashemi; Abolfazl Ziaeemehr; Marmaduke M Woodman; Jan Fousek; Spase Petkoski; Viktor K Jirsa

doi:10.1088/2632-2153/ad6230

Abbreviations

VBM	virtual brain model
SBI	simulation-based inference
SC	structural connectivity
FC	functional connectivity
FCD	functional connectivity dynamic
BOLD	blood-oxygen-level-dependent
fMRI	functional magnetic resonance imaging
ANNs	artificial neural networks
NFs	normalizing flows
MAF	masked autoregressive flow
NSF	neural spline flow
MCMC	Markov chain Monte Carlo

1. Introduction

One class of computational network models commonly used to analyze functional neuroimaging modalities, such as fMRI, MEG, and EEG, is the class of connectome-based models (Ghosh et al 2008, Sanz-Leon et al 2015, Bassett and Sporns 2017), also known as VBMs (Sanz Leon et al 2013, Jirsa et al 2023). Each node in these models corresponds to a brain region (Bullmore and Sporns 2009, Sporns 2016). The collective behavior of these nodes over time is then described using a dynamical model of average neuronal activity, known as neural mass models (Deco et al 2008, 2011, Breakspear 2017). The connectome-based models incorporate individual structural brain imaging data (Melozzi et al 2019, Hashemi et al 2021), typically diffusion-weighted imaging data, to estimate the edge weights (Hagmann et al 2007, Van Essen et al 2012, Schirner et al 2015, 2023). The SC imposes a constraint on the brain dynamics, allowing for the personalized simulation of the brain’s (ab)normal activities and their associated imaging recordings, potentially informing targeted interventions (Proix et al 2017, Olmi et al 2019, Jirsa et al 2023, Wang et al 2023).

During aging or some diseases, the SC is reported to deteriorate (Antonenko and Flöel 2013, Damoiseaux 2017, Zuo et al 2017), or be perturbed (Stam 2014, Ozdemir et al 2020, Menardi et al 2021), particularly with respect to the number of inter- and intra-hemispheric fibers within tracts and fiber density (Zimmermann et al 2018, Puxeddu et al 2020, Lavanga et al 2023, Petkoski et al 2023). These observations, however, do not map trivially on functional data, such as fMRI (Stumme et al 2022, Jockwitz et al 2023, Krämer et al 2023), and therefore to the statistical correlation or temporal dependency between the activities of different brain regions, as measured by FC. This could be due to several factors, notably those related to the structure of the generative model family: these features include a degenerate mapping between SC and FC, high dimensionality of the parameter space, and degeneracy induced by network effects in the latent state space, all introducing potential non-identifiability. These sources of indeterminacy stymie even state-of-the-art methods, which invites us to find ways to improve identifiability and employ algorithms better equipped for dealing with degeneracy.

State-of-the-art methods for inverting models of neuroimaging data vary in realism and complexity of the resulting estimate. Methods such as dynamical causal modeling (DCM) (Friston et al 2003, 2014) combine sophisticated statistical models and fixed variational schemes to address specific mechanistic hypotheses in neuroimaging datasets but scale poorly in terms of the size of the hypothesized network, and its mean-field variant ignores—by definition—the correlation between parameters. The Digital Twin Brain (Lu et al 2022), on the other hand, models a very large network of spiking neurons and employs a Kalman filter scheme (a type of recursive filtering algorithm) to fit the data, yet it remains difficult to assess both the realism of the network connectivity and the interpretability of the model parameters. The VBMs aim to push the frontier forward by combining scalability to realistically detailed whole-brain models with recent probabilistic inference schemes to yield reliable, mechanistically interpretable estimates (Hashemi et al 2020, 2021, 2023). The resulting improvements in the state-of-the-art are achieved through parameterizations, which decorrelate model parameters (e.g. to take advantage of parallel simulations). The resulting models scale well with data resources while remaining tractable for probabilistic inference on personalized whole-brain functional data. This work highlights the importance of exploring flexible and scalable alternative methods that may offer unique advantages in terms of realism, interpretability, and uncertainty quantification through Bayesian estimation for better downstream decision-making processes.

SBI (Cranmer et al 2020, Boelts et al 2022) or likelihood-free inference (Papamakarios et al 2019b, Brehmer et al 2020) sidesteps problems with MCMC sampling such as geometrical issues (Betancourt et al 2014, Betancourt 2016), immense computational cost (Baldy et al 2024), and algorithm sequentiality entirely: SBI uses the generative model to map samples from the prior (i.e background knowledge) to a corresponding set of low dimensional data features, and then it takes a maximum likelihood estimate of a Bayesian regression of model parameters on data features. In practice, to ensure the resulting approximate posterior density is sufficiently expressive, these methods employ deep neural networks to parametrize or construct directly the approximated density (Greenberg et al 2019, Gonçalves et al 2020, Hashemi et al 2023). In these deep network approaches, a simple base probability distribution (i.e. prior) is transformed into a more complex distribution (i.e. posterior) through a sequence of invertible transformations (Rezende and Mohamed 2015, Papamakarios et al 2019a). In particular, the advanced machine learning techniques based on unsupervised generative models offer efficient reconstruction of the probability density functions and highly expressive transformations with low-cost computation (Kobyzev et al 2020, Papamakarios et al 2021), hence, efficient Bayesian inference on complex high-dimensional systems. SBI implies computationally intensive simulation and training stages, but performed only once, and subsequent inference is extremely efficient as it requires only a forward pass of the neural network on a vector of data features to construct a posterior distribution (in order of seconds, due to amortized strategy; Hashemi et al 2023).

In this study, we use an exact mean-field model of spiking neurons at each parcellated brain region, displaying a variety of dynamics such as multi-stability, in which the emergent dynamics are constrained by personalized anatomical data (SC matrix). We aim to reliably estimate the full posterior distribution of control parameters in VBMs, by training deep neural networks on a low-dimensional representation of fMRI BOLD data, such as FC (Friston 1994, Greicius et al 2003, Honey et al 2009), and FCD (Zalesky et al 2014, Hansen et al 2015, Lurie et al 2020). We use the term spatio-temporal data features to refer to the statistical moments derived from BOLD time-series, while we refer to the summary statistics extracted from FC and FCD matrices as functional data features. The main motivation for using the SBI approach is to leverage the advantages of fast and parallel simulations for efficient and flexible inference using data features, accompanied by uncertainty quantification rather than a single point estimation (Wang et al 2019, Kong et al 2021).

Note that opting for SBI over MCMC sampling will not offer a straightforward solution to the inference challenges at the whole-brain level. A generative model defines the joint probability distribution of both observed data and latent variables or parameters of the model, enabling the generation of synthetic data samples. Hence, the performance of inference relies on both the level of information contained in the observed data, and the efficiency of model structure to consistently recognize that data. Accordingly, to address non-identifiability issues using generative models such as VBMs, we take the following approaches: (i) Enhancing the inference by incorporating more information, e.g. through data augmentation; (ii) Increasing the probability transition between states in latent dynamics, e.g. through intervention by stimulation; (iii) Restructuring the model configuration space, e.g. through hierarchical reparameterization, to facilitate more efficient exploration of the posterior distribution.

Our results on the uncertainty quantification using SBI show that FCD is more informative than FC for inference on perturbed connections, but using both data features provides stronger model evidence against the fitted data. Nevertheless, correlations between different brain regions and the associated dynamics (functional data features) do not provide sufficient information for inferring heterogeneous excitability at whole-brain level. We demonstrate that training deep neural density estimators (Papamakarios et al 2019a, Kobyzev et al 2020) by including the spatio-temporal data features provides more accurate inference on generative parameters. As an alternative to such data augmentation, the perturbation in brain dynamics by stimulation can provide an efficient remedy for the non-identifiability issue in the estimation of degradation in intra-hemispheric connections (within the limbic system). By prioritizing model structure over data, we show that the hierarchical structure in VBMs proves to be more effective than pooled (homogeneous) and unpooled (heterogeneous) modeling, to infer the control parameters, (such as regional excitabilities), from functional data.

2. Materials and methods

2.1. SC

The VBMs emphasize the structural network characteristics of the brain by representing the brain regions as nodes, which are connected via a SC matrix representing white matter fibre tracts (Sanz-Leon et al 2015). The $\mathrm{SC} \in \mathbb{R}^{\mathit{N}_{\mathit{n}} \times \mathit{N}_{\mathit{n}}}$ with N_n parcelled regions used in this study was taken from the open-source brain network modeling tool, the virtual brain (TVB; Sanz Leon et al 2013, Schirner et al 2015). The T1-weighted MRI images were processed to obtain the brain parcellation, whereas diffusion-weighted (DW-MRI) images were used for tractography. With the generated fiber tracts and with the regions defined by the brain parcellation, the connectome was built by counting the fibers connecting all regions. Using Desikan–Killiany parcellation (Desikan et al 2006) in the reconstruction pipeline, the patient’s brain was divided into $N_n = 88$ cortical and subcortical regions. The connectome was normalized so that the maximum value is equal to one.

2.2. The VBMs

To ensure a realistic in-silico experiment and therefore assess the function-structure link during the brain disorders via VBMs, we simulated the BOLD fMRI using a whole-brain network model that couples exact mean-field representation of spiking neurons through the weighted edges in the SC matrix. Assuming a Lorentzian distribution on the membrane potentials across decoupled brain regions, with half-width Δ centered at η, the regional brain dynamics can be described by a neural mass model derived analytically in the limit of infinitely all-to-all coupled quadratic integrate-and-fire neurons (Montbrió et al 2015).

By coupling the brain regions via an additive current in the average membrane potential equations, the dynamics of the whole-brain network can be described as follows (Rabuffo et al 2021, Fousek et al 2022):

$\begin{align} \tau\dot{r_i}(t) & = 2 r_i(t) v_i(t) + \dfrac{\Delta}{\pi \tau} \nonumber \\ \tau \dot{v_i}(t) & = v_i^2(t) - (\pi \tau r_i(t))^2 + J \tau r_i(t) + \eta + I_{stim}(t) + G \sum_{j = 1}^{N_n} SC_{ij}(r_{j}(t)-r_{i}(t))) + \xi(t), \end{align} \tag{ 1 }$

where v_i and r_i are the average membrane potential and firing rate, respectively, at the ith brain region. The excitability η, the synaptic weight J, the spread of the heterogeneous distribution Δ, and the characteristic time constant τ, can be tuned so that each decoupled node is in a bistable regime, exhibiting a down-state stable fixed point and an up-state stable focus in the 2D phase space (Montbrió et al 2015). The bistability is a fundamental property of regional brain dynamics to ensure a switching behavior in the data (e.g. to generate FCD), that has been recognized as representative of realistic dynamics observed empirically (Rabuffo et al 2021). The input current $I_\textrm{stim}$ represents the stimulation to selected brain regions, which increase the basin of attraction of the up-state in comparison to the down-state, while the fixed points move farther apart (Rabuffo et al 2021). Moreover, the parameter G is the network scaling parameter that modulates the overall impact of brain connectivity on the state dynamics (see supplementary, figure S1), $\textrm{SC}_{i,j}$ denotes the symmetrical connection weight between ith and jth nodes with ${i,j} \in \{1, 2, \ldots, N_n \}$ , and the dynamical noise $\xi(t) \sim \mathcal N(0, {\sigma}^2)$ follows a Gaussian distribution with mean zero and variance $\sigma^2 = 0.03$ .

In this study, the brain network model was implemented in TVB software (Sanz Leon et al 2013, Rabuffo et al 2021) and equipped with BOLD forward solution comprising the Balloon-Windkessel model (Stephan et al 2007) applied to the firing rate. In addition, we have implemented the model in C++, which is around 2 times faster than our Python implementation with Just-in-Time (JIT) compilation, and also CUDA-based GPU, which provides up to 100 times faster computational cost through parallelization (see supplementary, figure S2). The model simulation and parameter estimation were performed on a Linux machine with Intel Core i0-10900 CPU 2.80 GHz and 32 GB RAM.

The approach of virtually modeling the brain connectivity in disorders provides the basis to investigate whether a specific modification in connectome affects the observed brain function in a causal sense. We specifically tested whether the disorder is related to inter-hemispheric and intra-hemispheric connections in SC, by applying two spatial masks on the subject-specific connectome (normalized by the scaling factor G) as follows:

$\begin{align} \textrm{SC}_\textrm{disordered} = G \textrm{SC}_\textrm{healthy} - \alpha M_\textrm{inter} - \beta M_\textrm{intra}\end{align} \tag{ 2 }$

where parameters α and β are the unknown normalized intensity of degradation in SC (as the target of parameter estimation in the range [0–1]), according to the spatial masks $M_\textrm{inter}$ and $M_\textrm{intra}$ , on inter-hemispheric and intra-hemispheric connections, respectively. In this study, the $M_\textrm{intra}$ mask indicates the afferent and efferent connections originating from the limbic system (see figure 1). The motivation for these masks comes from the fact that they have been shown to capture the leading mechanism of healthy aging (Lavanga et al 2023) and Alzheimer’s Disease (Zimmermann et al 2018, Arbabyazd et al 2023).

Figure 1. Refer to the following caption and surrounding text. — **Figure 1.** The SBI-VBMs workflow to estimate the posterior distribution of generative parameters in brain disorders. (A) TVB reconstruction pipeline. The T1-MRI and DW-MRI images are processed to build a personalized brain connectome. (B) The virtual brain model (VBM). An exact neural mass model of spiking neurons is placed at each brain region, combined with anatomical data to generate various data features, such as FC and FCD. Estimating the generative parameters with perturbation in the SC using a masking approach, and stimulating certain regions, provides predictions on degradation in the connectome caused by disorders. (C) The SBI with deep neural density estimators. First, the model parameters are drawn randomly from a prior distribution. Then, the VBM simulator takes the parameters as input and generates summary statistics of the data as output. Next, a class of deep neural density estimators is trained on the pairs of random parameters and corresponding data features to learn the joint posterior distribution of the model parameters. Finally, we can quickly approximate the posterior for new data and make probabilistic predictions that are consistent with the observed data.
Download figure:
Standard image High-resolution image

2.3. FCD

To quantify the communication between different brain regions and associated dynamics, we computed both static FC and FCD, which is the time-variant representation of FC and reflects the fluctuation of the covariance matrix over time (see supplementary, figure S1). From the simulated BOLD signals generated by TVB, we extracted FC matrix by calculating the Pearson correlation between the BOLD signals of any two brain regions. In order to track the time-dependent changes in the FC, we computed the windowed FCD, with length of sliding window τ = 30 s and step size of $t_{w+1}-t_{w} = 6$ s (Hansen et al 2015, Arbabyazd et al 2020):

$\begin{align} \textrm{FCD}\left(w_i, w_j\right) = \textrm{corr}\left(\textrm{uppertri}\left(\textrm{FC}\left(w_i\right)\right), \textrm{uppertri}\left(\textrm{FC}\left(w_j\right)\right)\right)\end{align} \tag{ 3 }$

where, the (i, j) entry of the FCD matrix provided the Pearson correlation between the upper triangular parts of the two FC matrices $\textrm{FC}(w_i)$ and $\textrm{FC}(w_j)$ calculated at each sliding windows $w = 1, 2, \ldots, N_w$ . To assess whether the simulated BOLD time-series might have state transitions between up and down states, we quantified the brain’s fluidity by calculating the FCD variance to capture the transitioning behavior (see supplementary, figure S1).

2.4. Bayesian inference

The Bayesian approach offers a principled way for making inference, prediction, and quantifying uncertainty for decision-making process (Gelman et al 1995, Bishop 2006, Box and Tiao 2011). This approach naturally evaluates and incorporates uncertainties in the parameters and observations to drive meaningful conclusions from the data, with a broad range of applications (Bonomi et al 2016, Hashemi et al 2018, Khalvati et al 2019, Guimerà et al 2020, Broderick et al 2023, Sip et al 2023, Wang et al 2023). Parameter estimation within a Bayesian framework is treated as the quantification and propagation of uncertainty through probability distributions placed on the parameters, updated with the information provided by data (Hashemi et al 2020, 2021). Considering the joint probability distribution of the observation x and the unknown control parameters θ, referred to as the generative model. Then by the product rule: $p(\theta, x) = p(\theta \mid x) p(x)$ , equivalently, $p(\theta, x) = p(x \mid \theta)p(\theta)$ . Hence, given data x and model parameters θ, Bayes rule defines the posterior distribution as

$\begin{align} p\left(\theta \mid x\right) = \frac{p\left(\theta\right) p\left(x \mid \theta\right)}{p\left(x\right)}\end{align} \tag{ 4 }$

that combines and actualizes prior information (domain expertise) about unknown parameters with the knowledge acquired from observed data through the so-called likelihood function (data generating process). The prior information $p(\theta)$ is typically determined before seeing the data through beliefs and previous evidence about possible values of the parameters. The likelihood function $p(x \mid \theta)$ represents the probability of some observed outcomes given a certain set of parameters (the information about the parameters provided by the observed data). The denominator $p(x) = \int p(x\mid \theta)p(\theta)d\theta$ represents the probability of the data and it is known as evidence or marginal likelihood. In the context of inference, this term amounts to simply a normalization factor. Note that using Bayesian inference in this study, we aim to estimate the entire posterior distribution of the unknown parameters, i.e. the uncertainty over a range of plausible parameter values that generates the data, rather than a single point estimate for data fitting (e.g. the maximum likelihood estimation; Hashemi et al 2018, or the maximum a posteriori; Friston et al 2014, Razi et al 2015, Vattikonda et al 2021).

2.5. SBI

The key challenge to perform an efficient Bayesian inference is the evaluation of likelihood function $p(x \mid \theta)$ . This is typically intractable for high-dimensional models involving nonlinear latent variables (such as VBMs), as the likelihood evaluation requires an integral over all possible trajectories through the latent space that controls the generative process: $p(x \mid \theta) = \int p(x, z \mid \theta) \textrm{d}z$ , where $p(x, z \mid \theta)$ is the joint probability density of data x and unmeasured (hidden) latent variables z, given parameters θ. In particular, when dealing with high-dimensional latent space at whole-brain scales, the computational cost of evaluating the likelihood function can become prohibitive. This makes likelihood-based approaches, such as MCMC sampling, inapplicable (Hashemi et al 2023). SBI or likelihood-free inference performs efficient Bayesian inference for complex models where the calculation of likelihood function is either analytically or computationally intractable (Cranmer et al 2020). Instead of direct sampling from distributions and explicit evaluation of likelihood function, SBI sidesteps this problem by employing deep ANNs to learn an invertible transformation between parameters and data (more precisely, between the features of a simulated dataset and parameters of a parameterized approximation of the posterior distribution).

Taking prior distribution $p(\vec{\theta})$ over the parameters of interest $\vec{\theta}$ , a limited number of N simulations are generated for training step $\{(\vec{\theta}_i, \vec{x}_i)\}_{i = 1}^{N}$ , where $\vec{\theta_i} \sim p(\vec{\theta})$ and $\vec{x}_i$ is the simulated data given model parameters $\vec{\theta}_i$ . In other words, the training data set is an ensemble of N independent and identically distributed samples from the generative model $p(\vec{\theta}, \vec{x}) = p(\vec{\theta}) p(\vec{x} \mid \vec{\theta})$ , that can be run in parallel. After the training step, we are able to efficiently estimate the approximated posterior $q_{\phi}(\vec{\theta} \mid \vec{x})$ with learnable parameters φ, so that for the observed data $\vec{x}_\textrm{obs}$ : $q_{\phi}(\vec{\theta} \mid \vec{x}_\textrm{obs}) \simeq p(\vec{\theta} \mid \vec{x}_\textrm{obs})$ . For more details, see Hashemi et al (2023). Note that due to the amortized strategy, for any new observed data $\vec{x}_\textrm{obs}$ , we can readily approximate the true posterior $p(\vec{\theta} \mid \vec{x}_\textrm{obs})$ by a forward pass through the trained ANN (in the order of seconds).

A powerful technique in probabilistic machine learning for sampling and probability density estimation is NSF (Rezende and Mohamed 2015). NFs are a family of generative machine learning models that convert a base distribution into any complex target distribution, where both sampling and density estimation can be efficient and exact (Papamakarios et al 2019a, Kobyzev et al 2020). The aim of density estimation is to offer an accurate description of the underlying probability distribution of an observable data set, where the density itself is unknown. In particular, NFs embedded in the SBI approach, such as (sequential) neural posterior estimation (Lueckmann et al 2017, Gonçalves et al 2020), allow for the direct estimation of joint posterior distributions, and bypass the need for MCMC, while also potentially capturing degeneracy or multi-modalities. Recently advanced models such as MAF (Papamakarios et al 2017) and NSF (Durkan et al 2019) provide efficient and exact density evaluation and sampling from the joint distribution of high-dimensional random variables in a single neural-network pass. These generative models use deep neural networks to learn complex mappings between input data and their corresponding probability densities. They have achieved state-of-the-art performance with diverse applications, which efficiently represent rich structured and multi-modal posterior distributions, capturing complex dependencies and variations in the data distribution (Papamakarios et al 2019a, Kobyzev et al 2020). In this study, we use autoregressive flows (MAF and NSF models), which support invertible nonlinear transformations with low-cost computation (Kobyzev et al 2020). MAF uses affine transformations (Durkan et al 2019) with binary masks, eliminating the need for sequential recursion (Germain et al 2015). NSF leverages splines as a coupling function, enhancing the flexibility of the transformations while retaining exact invertibility (Papamakarios et al 2017). In this study, the MAF model comprises 5 flow transforms, each with two blocks and 50 hidden units, tanh nonlinearity and batch normalization after each layer. The NSF model consists of 5 flow transforms, two residual blocks of 50 hidden units each, ReLU nonlinearity, and 10 spline bins, all as implemented in the public sbi toolbox (Tejero-Cantero et al 2020). By training MAF/NSF on TVB simulations with random parameters, we are able to readily estimate the full posterior of parameters from low-dimensional data features, such as FC/FCD.

2.6. Model parameterization

We can parameterize the SBI-VBMs in three different ways:

Pooled structure assumes a common distribution across all individual units (here, the excitability), treating them collectively as a single unit with no variation in the sampling process. For simplicity, pooled (or homogeneous) modeling of excitability assumes a shared value across all brain regions (see supplementary, figure S3(A)). In this case, the set of generative parameters is low-dimensional as $\vec{\theta} = \{G, \alpha, \beta, \eta \} \in \mathbb{R}^{4}$ .
Unpooled structure treats individual units as being sampled independently, allowing for adaptation to diverse characteristics and behaviors within the dataset without assuming a common structure. Unpooled (or heterogeneous) modeling of excitability assumes distinct characteristics for each region without imposing a uniform structure at the whole-brain level (see supplementary, figure S3(B)). This approach leads to more precise and biologically plausible inferences about the brain’s (dys)functioning but significantly increases the dimensionality of the parameter space. In this case, the set of generative parameters becomes high-dimensional as $\vec{\theta} = \{G, \alpha, \beta, \eta_i \} \in \mathbb{R}^{N_n+3}$ , with $i \in {1, 2, \ldots, N_n}$ , and N_n being the number of brain regions.
Hierarchical (or partially pooled) structure offers a middle ground between the simplicity of a pooled structure and the complexity of a fully unpooled structure. This balance allows for more flexibility and efficiency in Bayesian modeling while maintaining biological plausibility (see supplementary, figure S3(C)). In this case, the set of generative parameters is optimal as $\vec{\theta} = \{G, \alpha, \beta, \mu_{\eta}, \sigma_{\eta} \} \in \mathbb{R}^{5}$ , and $\eta_i \sim \mathcal{N}(\mu_{\eta}, \sigma_{\eta}^{2})$ .

2.7. Sensitivity analysis

Sensitivity analysis is a necessary step to determine which model parameters mostly contribute to variations in the model’s behavior due to changes in the model’s input, i.e. the identifiability analysis (Hashemi et al 2018, 2023). A local sensitivity coefficient measures the influence of small changes in one model parameter on the model output, while the other parameters are held constant. This can be quantified by computing the curvature of objective function through the Hessian matrix (Bates and Watts 1980, Hashemi et al 2018):

$\begin{align} S\left(\vec{\theta}\right) = \mathcal{D} \left(\dfrac{\partial^2 \mathcal{E}\left({\vec{\theta}}\right)}{\partial \vec{\theta} \partial \vec{\theta}^{\top}} \mid_{{\vec{\theta}}^{\ast}}\right),\end{align} \tag{ 5 }$

where $\mathcal{D}$ denotes the main diagonal elements of a matrix, $\mathcal{E}$ indicates a fitness function, centered at estimated parameters ${\vec{\theta}}^{\ast}$ .

Using the SBI approach, after the training step and posterior estimation for a specific observation, we can also efficiently perform sensitivity analysis by calculating the eigenvalues and corresponding eigenvectors from (Tejero-Cantero et al 2020, Deistler et al 2022):

$\begin{align} M = \mathbb{E}_{p\left(\vec{\theta}|\vec{x}_\textrm{obs}\right)}\left[\nabla_{\theta}p\left(\vec{\theta}|\vec{x}_\textrm{obs}\right)^T\nabla_{\theta}p\left(\vec{\theta}|\vec{x}_\textrm{obs}\right)\right],\end{align} \tag{ 6 }$

which then does an eigendecomposition $M = Q \Lambda Q^{-1}$ . A strong eigenvalue of the so-called active subspaces (Constantine 2015) indicates that the gradient of the posterior is large, hence, the system output is sensitive to changes along the direction of the corresponding eigenvector.

2.8. Evaluation of posterior fit

To measure the reliability of the Bayesian inference using synthetic data, we evaluate the posterior z-scores (denoted by z) against the posterior shrinkage (denoted by s), which are defined as (Betancourt 2014):

$\begin{align} z & = | \dfrac{\bar \theta-\theta^\ast}{\sigma_\textrm{post}}|,\end{align} \tag{ 7 }$

$\begin{align} s & = 1- \dfrac{\sigma^2_\textrm{post}}{\sigma^2_\textrm{prior}},\end{align} \tag{ 8 }$

where $\bar \theta$ and $\theta^\ast$ are the posterior mean and the true values, respectively, whereas σ_prior, and σ_post indicate the standard deviations of the prior and the posterior, respectively. The posterior z-score quantifies how much the posterior distribution of a parameter encompasses its true value. The posterior shrinkage quantifies how much the posterior distribution contracts from the initial prior distribution. The concentration of estimations towards large shrinkages indicates that all the posteriors are well-identified, while the concentration towards small z-scores indicates that the true values are accurately encompassed in the posteriors.

3. Results

In the following, we first present the SBI-VBMs workflow (section 3.1). Then, we demonstrate the estimation of the level of degradation in the connectome including global, inter-, and intra-hemispheric degradation and investigate their identifiability. To address potential non-identifiability issues, we propose data feature augmentation, by incorporating both spatio-temporal and functional features (section 3.2). Next, we employ stimulation as an effective remedy for non-identifiability of intra-hemispheric degradation, by increasing the probability of transition between hidden states (section 3.3). Finally, we investigate the inference of generative parameters, including excitability. We explore different model structures–homogeneous (section 3.4), heterogeneous (section 3.5), and hierarchical (section 3.6)–to identify the most effective parameterization for Bayesian modeling of VBMs.

3.1. The SBI-VBMs workflow

Figure 1 illustrates the overview of our approach, referred to as SBI-VBMs, to make probabilistic predictions on brain disorders by estimating the joint posterior distribution of the generative parameters. At the first step, the non-invasive brain imaging data such as T1-weighted MRI, and DW-MRI are collected for a specific patient (figure 1(A)). With the generated fiber tracts and the regions defined by the brain parcellation, the connectome (i.e. the total set of links between brain regions) is constructed by counting the fibers connecting all regions. The SC matrix, whose entries represent the connection strength between brain regions, provides the basis for constructing a network model–TVB–which is capable of generating various functional brain imaging data at arbitrary locations (e.g. cortical and subcortical structures). This step constitutes the structural brain network component, which imposes a constraint on brain network dynamics, allowing the hidden dynamics to be inferred from the data.

Then, each brain network node is assigned a computational model of average neuronal activity, a so-called neural mass model (see figure 1(B)). Here, we use an exact mean-field approximation for a population of spiking neurons, which due to its rich dynamics, such as bi-stability, enables us to generate various spatio-temporal dynamics as observed experimentally. Subsequently, the perturbation of SC by spatial masking, the inter- and intra-hemispheric edges can provide an estimation of the level of degradation in the patient connectome caused by disorders (see equation (2)). Notably, the stimulation of certain brain regions can be used to increase the causal evidence in the structure-function relationship. Embedding the disordered SC into the VBM, then complements the generative brain model of disorders (see equation (1)). The set of generative parameters θ comprises the regional excitability parameters η_i, a global coupling parameter G describing the scaling of the brain’s SC, and parameters α and β indicating the levels of interhemispheric deterioration, and intrahemispheric degradation within the limbic system, respectively. With an optimal set of these parameters, VBMs can mimic essential data features observed in fMRI recordings, such as FC/FCD matrices.

Finally, we use SBI (see figure 1(C)) to estimate the posterior distribution of the generative parameters in brain disorders. For this purpose, we first draw the parameters randomly from a prior distribution, i.e. $\vec{\theta}_i \sim p(\vec{\theta})$ , for each ith simulation. The VBM simulator takes the parameters $\vec{\theta}$ as input and generates summary statistics $\vec{x}$ of the data as output. By preparing a training dataset through performing VBM simulations with random parameters, a class of deep neural density estimators (such as MAF or NSF model) is then trained on $\{(\vec{\theta}_i, \vec{x}_i)\}_{i = 1}^{N}$ with budget of N simulations, to learn an invertible transformation between data features and parameters of an approximated posterior distribution $p(\vec{\theta} \mid \vec{x})$ (see equation (4)). This is the primary advantage of this approach that the amortized inference at the subject level enables us to evaluate different hypotheses with uncertainty quantification at a negligible computational cost, typically in the order of a few seconds.

3.2. Inference on level of degradation in connectome

Here we show the estimation over the set of unknown generative parameters denoted by $\vec{\theta} = \{G, \alpha, \beta\} \in \mathbb{R}^{3}$ . Figure 2 shows the estimated posterior and the sensitivity analysis, illustrating the impact of data features on the inference process. For the training (here using MAF model), the parameters are drawn from a uniform prior in the range: $\theta_i \in \mathcal{U}(0, 1)$ . It can be seen that the SBI accurately estimates the parameters G and α using functional data features (FC and FCD) from a budget of 100k simulations (figures 2(A) and (B)). However, we observe diffuse posterior estimation for β mask, by relying only on the functional data features (figure 2(C)). This is quantified by posterior shrinkage, indicating well-identified Bayesian estimation for parameters G and α, but poorly identified β mask when using functional data features (figure 2(D)). This is due to the model’s insensitivity to intra-hemispheric deterioration compared to the whole-network and intra-hemispheric degradation, as calculated from the eigenvalues of the posterior density (figure 2(E)). This is in agreement with the sensitivity analysis conducted using the Hessian matrix, a metric describing the local curvature of a function based on its second partial derivatives (figure 2(F)). The sensitivity of the β mask is nearly zero using Kullback–Leibler divergence or Kolmogorov–Smirnov distance as the fitness function (see supplementary, table S1 and figure S4). Moreover, their values indicate more sensitivity of parameters estimation to FCD than FC (see supplementary, figure S5).

Figure 2. Refer to the following caption and surrounding text. — **Figure 2.** Inference and sensitivity analysis on the global coupling, inter- and intra-hemispheric degradation in a virtual brain, with $\vec{\theta} = \{G, \alpha, \beta\} \in \mathbb{R}^{3}$ . The posterior distribution estimated using 100k random simulations is shown (in red) for (A) the global coupling parameter G, (B) the inter-hemispheric deterioration α, and (C) the intra-hemispheric degradation β. The parameters are drawn from a uniform prior between zero and one (in blue). The ground truth values are G = 0.62, α = 0.3, β = 0.4 (in green). The training on the spatio-temporal features results in better estimation of model parameters (in dashed red) compared to functional features (in light red). (D) The posterior shrinkage close to one indicates well-identified Bayesian estimation. (E) The sensitivity analysis conducted using active subspaces explains the lack of posterior shrinkages in the estimation of β. (F) The Hessian values indicate no sensitivity to the β mask.
Download figure:
Standard image High-resolution image

To address the practical non-identifiability issue in estimating the β mask, we included the statistical characteristics of the BOLD time-series (such as the mean, variance, skewness, and kurtosis of BOLD time-series) in the training step. As shown in figures 2(A) and (C), the training on spatio-temporal data features results in the better estimation of model parameters, particularly in the estimation of β mask. This is also confirmed by posterior shrinkages, eigenvalues, and Hessian values, as shown in figures 2(D)–(F).

Furthermore, the effect of the simulation budget on the uncertainty quantification when training with the state-of-the-art deep neural density estimators (MAF and NSF) is shown in figure S6. We observed that both MAF and NSF models are compatible in uncertainty quantification (with different simulation budgets), but MAF was 2–4 times faster than NSF during the training process.

3.3. Inference on perturbation-based degradation in connectome

When integrating spatio-temporal data in training, some features such as mean and variance may become redundant, especially when using signal processing techniques like normalization or z-scoring. To address this issue, we have perturbed the brain dynamics through stimulation. Figure 3 illustrates the consequence of brain stimulation on the recovery of generative parameters denoted by $\vec{\theta} = \{G, \alpha, \beta\} \in \mathbb{R}^{3}$ . Here we used only functional data features for training in SBI.

Figure 3. Refer to the following caption and surrounding text. — **Figure 3.** Inference on the global, inter- and intra-hemispheric degradation in a virtual brain with perturbed dynamics induced by stimulation. Here we used only functional data features for training in SBI. (A), (E) The simulated BOLD data and corresponding FC and FCD matrices, under conditions of no-stimulation and with-stimulation, respectively. (B), (F) SBI provides accurate recovery of the parameter G, pre- and post-stimulation, respectively. (C), (G) Accurate posterior estimation of the α mask, before and after stimulation, respectively. (D), (H) SBI on β mask shows poor estimation when the stimulation is off, but reliable estimation when the stimulation is on, respectively. The violin plots in red show the estimated posterior densities. The black dots represent a linear regression on the maximum values of the estimated posteriors. Dashed green lines represent a perfect fit (ground truth values).
Download figure:
Standard image High-resolution image

Figures 3(A) and (E) show the simulated BOLD data and the corresponding FC and FCD matrices, in the absence and presence of intervention by stimulation, respectively. It can be seen that the BOLD data display enhanced structural patterns, as evidenced by the higher FC and the increased variance in FCD (the fluidity before stimulation was 0.001, and it increased to 0.004 after stimulation). Here the stimulation is induced by adding a repetitive step current, characterized by a duration of 5 s and an amplitude of $1.2~\mu {\textrm{A cm}^{-2}}$ to the membrane potential variable (through I_stim in equation (1)), within the limbic system.

Figures 3(B) and (F) show that SBI provides accurate recovery of global scaling factor G, before and after stimulation, respectively. This is consistent with our previous results indicating the model’s heightened sensitivity to this parameter. Figures 3(C) and (G) demonstrate the robustness of estimation on the α mask by closely matching the true values, regardless of the stimulation. This indicates the reliability of SBI in estimating the level of degradation between hemispheres, even by perturbation in brain dynamics.

Figures 3(D) and (H) show that SBI provides poor estimation on the β mask when the stimulation is off, but reliable inference when the stimulation is applied, respectively. This demonstrates that the intervention by stimulation is able to address the non-identifiability issue encountered when estimating degradation within brain hemispheres. Perturbing brain dynamics through stimulation increases the likelihood of transitions between brain states, thereby improving the quality of estimation by providing new observable responses for the inference process. Although it may be impractical to administer stimulation to all subjects, VBMs allow us to run simulations in silico and make inferences to evaluate our hypotheses when real experiments are prohibitive.

In the following, we extend the dimensionality of generative parameters to include the excitability parameter. We investigate different forms of model parameterization, such as pooled (homogeneous), unpooled (heterogeneous), and hierarchical (multi-level), to improve the efficiency of SBI-VBMs in higher dimensions, with no need for stimulation (see supplementary, figure S3).

3.4. Inference on homogeneous generative parameters

Here we show the estimation over set of unknown generative parameters denoted by $\vec{\theta} = \{G, \alpha, \beta, \eta, J, \Delta \} \in \mathbb{R}^{6}$ . In this case, we infer the brain dynamics from parameters located in multiple brain regions with similar or homogeneous characteristics. This form of model parameterization (see supplementary, figure S3(A)) reduces the spatial dimensions, hence the computational complexity of the inference process.

Figure 4 shows the estimated posterior distributions using both spatio-temporal and functional data features extracted from BOLD time-series. The diagonal panels show that the ground-truth values (in green vertical lines) are well under the support of the estimated posterior distributions (in red). Here, we used 100k random simulations from uniform priors (in blue) as: $G, \alpha, \beta \in \mathcal{U}(0, 1)$ , $\eta \in \mathcal{U}(-6, -3.5)$ , $J \in \mathcal{U}(1, 30)$ , and $\Delta \in \mathcal{U}(0.1, 2)$ .

Figure 4. Refer to the following caption and surrounding text. — **Figure 4.** Inference on homogeneous generative parameters in a virtual brain. Here the set of inferred parameters is $\vec{\theta} = \{G, \alpha, \beta, \eta, J, \Delta \} \in \mathbb{R}^{6}$ . The training process involved using both spatio-temporal and functional data features, with a budget of 100k simulations. The diagonal panels display the prior (in blue) and estimated posterior (in red). The true values (green lines) are well under the support of the estimated posterior distributions. The lower diagonal panels display the observed and predicted BOLD data and the corresponding FC/FCD matrices. The upper diagonal panels show the joint posterior between parameters, and their correlation values (ρ at the upper left corners). The ground-truth values are shown by green stars. High-probability areas are color-coded in yellow, while low-probability areas are represented in black.
Download figure:
Standard image High-resolution image

**Figure 4.** Inference on homogeneous generative parameters in a virtual brain. Here the set of inferred parameters is $\vec{\theta} = \{G, \alpha, \beta, \eta, J, \Delta \} \in \mathbb{R}^{6}$ . The training process involved using both spatio-temporal and functional data features, with a budget of 100k simulations. The diagonal panels display the prior (in blue) and estimated posterior (in red). The true values (green lines) are well under the support of the estimated posterior distributions. The lower diagonal panels display the observed and predicted BOLD data and the corresponding FC/FCD matrices. The upper diagonal panels show the joint posterior between parameters, and their correlation values (ρ at the upper left corners). The ground-truth values are shown by green stars. High-probability areas are color-coded in yellow, while low-probability areas are represented in black.
Download figure:
Standard image High-resolution image

Due to sufficient budget of simulations and the informativeness of the data features for training, hence the accurate parameter estimation, we can see a close agreement between the data features in the observed and predicted BOLD data (see FC/FCD matrices, shown in the lower diagonal panels). The joint posterior between parameters are shown at the upper diagonal panels, along with their correlation values (ρ shown at the upper left corners). It can be seen that there is a strong correlation between parameters η and J (with ρ = −0.9), also between η and Δ (with ρ = −0.9). This indicates a parameter degeneracy between the excitability and the spread of their distribution, as well the synaptic weight at whole-brain level. This can be due to the homogeneous parameterization used for inference, which ignores all variation among the units being sampled. Using MAF model, the training took around 45 min, whereas generating 10k samples from the posterior took less than 2 s. Supplementary figure S7 demonstrates that excluding the spatio-temporal data features (i.e. statistical characteristics of BOLD time-series) during training leads to poor parameter estimation.

3.5. Inference on heterogeneous generative parameters

Here we use SBI to estimate posterior distribution in the set of unknown generative parameters denoted by $\vec{\theta} = \{G, \alpha, \beta, \eta_i \} \in \mathbb{R}^{91}$ , where η_i with $i \in \{1, 2, \ldots, N_n = 88\}$ is the excitability parameter heterogeneously distributed across brain regions. This form of model parameterization (see supplementary, figure S3(B)) offers a comprehensive explanation for the variation in the excitability across brain regions.

Figure 5 shows the maximum of estimated posterior distribution for three different scenarios in setting the excitabilities across brain regions. It can be seen that SBI provides accurate posterior estimation for various scenarios of excitability maps across brain regions. Here, the training process involved using both spatio-temporal and functional data features. See supplementary, figure S8 for the setups and full estimated posteriors.

Figure 5. Refer to the following caption and surrounding text. — **Figure 5.** Inference on heterogeneous generative parameters in a virtual brain. Here the set of inferred parameters is $\vec{\theta} = \{G, \alpha, \beta, \eta_i \} \in \mathbb{R}^{91}$ . SBI provides accurate estimation for different configurations of excitabilities across brain regions. Both spatio-temporal and functional data features were used for training, with a budget of 1M simulations. The regions are color-coded based on their excitability values, with blue representing low-, red representing medium-, and green representing high- excitability. At top: axial view. At bottom: sagittal view.
Download figure:
Standard image High-resolution image

**Figure 5.** Inference on heterogeneous generative parameters in a virtual brain. Here the set of inferred parameters is $\vec{\theta} = \{G, \alpha, \beta, \eta_i \} \in \mathbb{R}^{91}$ . SBI provides accurate estimation for different configurations of excitabilities across brain regions. Both spatio-temporal and functional data features were used for training, with a budget of 1M simulations. The regions are color-coded based on their excitability values, with blue representing low-, red representing medium-, and green representing high- excitability. At top: axial view. At bottom: sagittal view.
Download figure:
Standard image High-resolution image

Importantly, our results indicate that using only the whole-brain functional features (FC and FCD) fails to offer sufficient information for inferring heterogeneous excitability across regions (see supplementary, figure S9).

Regarding the computational cost, using MAF model, the training on 1M simulations took around $10~h$ , whereas generating 10k samples from the posterior took less than 60 s. Supplementary figures S10 and S11 demonstrate that both MAF and NSF models were compatible in uncertainty quantification of heterogeneous parameters. Note that here we used 1M random simulations, compared to the 100k simulations used in homogeneous parameterization (figure 4). This indicates the demand for a very large simulation budget during training step to achieve accurate estimation based on a heterogeneous parameterization.

3.6. Inference on hierarchical generative parameters

Here we use SBI to estimate posterior distribution in the set of unknown generative parameters denoted by $\vec{\theta} = \{G, \alpha, \beta, \mu_{\eta}, \sigma_{\eta} \} \in \mathbb{R}^{5}$ , where $\eta_i \sim \mathcal{N}(\mu_{\eta},\,\sigma_{\eta}^{2})$ , indicating that excitabilities are sampled from a Gaussian distribution with mean centered at µ_η, and standard deviation of σ_η. This is a hierarchical parameterization on excitabilities (see supplementary, figure S3(C)), in which they are sampled from a population distribution so that the heterogeneous parameters η_i with $i \in \{1, 2, \ldots, N_n\}$ are now sampled from a generative distribution $\mathcal{N}(\mu_{\eta}, \sigma_{\eta})$ . Hence, only two hyperparameters µ_η and σ_η need to be estimated to explain the variation in excitability across the brain, rather than N_n excitability parameters.

Figure 6 shows the estimated posterior by SBI using the both spatio-temporal and functional data features for training. The diagonal panels show that the posterior (in red) accurately encompass the ground-truth values (in green vertical lines) used for generating the observed data. For training, we generated 100k random simulations from uniform priors (in blue) as: $G, \alpha, \beta \in \mathcal{U}(0, 1)$ , $\mu_{\eta} \in \mathcal{U}(-6, -4)$ , and $\sigma_{\eta} \in \mathcal{U}(0, 1)$ . Due to effective form of model parameterization, hence the accurate parameter estimation, we can see a close agreement between the data features in the observed and predicted BOLD data, such as the FC/FCD matrices (shown at the lower diagonal panels). The joint posterior between parameters are shown at the upper diagonal panels, along with their correlation values (ρ at the upper left corners). Using MAF model, the training took around $30~min$ , whereas generating 10k samples from the posterior took less than 1 s. In contrast to heterogeneous modeling, the hierarchical structure yields more informative posteriors when using only functional data features, even with a significantly smaller budget of simulations for training (see supplementary, figure S9 versus figure S12). These results indicate that hierarchical approach effectively takes into account the variability across brain regions while naturally incorporating a form of regularization to provide more stable and robust estimates.

Figure 6. Refer to the following caption and surrounding text. — **Figure 6.** Inference on hierarchical generative parameters in a virtual brain. Here the set of inferred parameters is $\vec{\theta} = \{G, \alpha, \beta, \mu_{\eta}, \sigma_{\eta} \} \in \mathbb{R}^{5}$ , with $\eta_i \sim \mathcal{N}(\mu_{\eta},\,\sigma_{\eta}^{2})$ . We used both spatio-temporal and functional data features, for the training on a budget of 100k simulations. The diagonal panels display the prior (in blue) and estimated posterior distributions (in red). The true values (green lines) are well under the support of the estimated posterior distributions. The lower diagonal panels display the observed and predicted BOLD data and the corresponding FC/FCD matrices. The upper diagonal panels show the joint posterior between parameters, and their correlation values (ρ at the lower left corners). The ground-truth values are shown by green stars. High-probability areas are color-coded in yellow, while low-probability areas are represented in black.
Download figure:
Standard image High-resolution image

**Figure 6.** Inference on hierarchical generative parameters in a virtual brain. Here the set of inferred parameters is $\vec{\theta} = \{G, \alpha, \beta, \mu_{\eta}, \sigma_{\eta} \} \in \mathbb{R}^{5}$ , with $\eta_i \sim \mathcal{N}(\mu_{\eta},\,\sigma_{\eta}^{2})$ . We used both spatio-temporal and functional data features, for the training on a budget of 100k simulations. The diagonal panels display the prior (in blue) and estimated posterior distributions (in red). The true values (green lines) are well under the support of the estimated posterior distributions. The lower diagonal panels display the observed and predicted BOLD data and the corresponding FC/FCD matrices. The upper diagonal panels show the joint posterior between parameters, and their correlation values (ρ at the lower left corners). The ground-truth values are shown by green stars. High-probability areas are color-coded in yellow, while low-probability areas are represented in black.
Download figure:
Standard image High-resolution image

Lastly, we evaluated the posterior fit in all aforementioned scenarios by plotting z-scores versus shrinkage (see supplementary, figures S13 and S14). We observed that using only functional data features (FC/FCD), we can achieve ideal estimation of the β mask through stimulation and of excitability η through hierarchical parameterization. However, when incorporating all available data features (both spatio-temporal and functional), we achieved an ideal Bayesian inversion for all parameters regardless of parameterization, except for the β mask, which requires stimulation for ideal estimation. These results highlight the importance of brain dynamic perturbation by stimulation and the use of effective Bayesian modeling such as hierarchical when spatio-temporal information is lacking.

4. Discussion

This study addresses the ongoing challenge of probabilistic inference on complex whole-brain dynamics using connectome-based models, by presenting SBI-VBMs. Our methodology (see figure 1) offers a compelling advantage in terms of parallel and fast simulations, especially when utilized with the computational power of GPUs (see supplementary, figure S2). We showed that by systematically examining how variations in input parameters affect the output, sensitivity analysis (see figures 2, S5, S13 and S14) is crucial in ensuring the robustness and reliability of inference results, thus for decision-making processes. Our results underscore the limitations of relying simply on functional data features for making inferences about degradation in brain connectivity and regional excitability (see figure 3 and supplementary, figure S9). Rather, we proposed the SBI-VBMs approach, which relies on expressive deep neural networks to easily incorporate all relevant information, such as spatio-temporal and functional features. This approach demonstrates its effectiveness in accurately estimating generative parameters associated with brain disorders (see figures 5 and 6). Furthermore, perturbing the brain dynamics through stimulation, which increases the likelihood of transitions between brain states, provides a valuable tool for resolving the issue of non-identifiability, especially in the context of intra-hemispheric connections (see figure 3). Ultimately, the hierarchical structure of SBI-VBMs emerges as an efficient parameterization, enabling precise and biologically meaningful inference on brain’s (dys)functioning (see figures 6 and S12).

Several previous studies in whole-brain modeling (Deco et al 2014, Wang et al 2019, Kong et al 2021, Cabral et al 2022) have used optimization methods to provide a single best value of an objective (or a cost) function, scoring the model performance against the observed data (such as minimizing Kolmogorov–Smirnov distance or maximizing the Pearson correlation between observed and generated FC/FCD). Such a parametric approach results in only a point estimation, and is limited to construct the relationship between parameters and the associated uncertainty (see figures 4 and 6). The optimization algorithms may easily get stuck in local extrema, requiring multi-start strategies to address the potential multi-modalities and parameter degeneracies. Moreover, the estimation depends critically on the form of objective function defined for optimization (Svensson et al 2012, Hashemi et al 2018), and the models involving differential equations often have strongly correlated and/or non-identifiable parameters. These issues can be addressed by using Bayesian inference which provides proper representation of distributions and dependence among parameters (Samaniego 2010, Hashemi et al 2018).

Bayesian approach has the advantage of yielding a joint probability distribution for model parameters. The posterior distribution encompasses all possible parameter combinations that produce a simulation output that explains the data with quantifying the uncertainty in estimation, which is critical for model comparison, selection and averaging, hypothesis testing, and decision-making process. Bayesian inference also allows us to integrate the prior knowledge in inference and prediction e.g. the physiological, anatomical, or clinical knowledge to maximize the model predictive power against measurements (Hashemi et al 2021, Baldy et al 2023, Wang et al 2023).

Learning the parameters of dynamical systems enables one to infer not only beliefs or probabilities of events but also the dynamics of events under changing conditions, for instance, changes induced by treatments or external interventions (Friston et al 2003, Pearl 2009, Jirsa et al 2023). A well-established Bayesian framework for inferring hidden neural state dynamics from neuroimaging data is the so-called DCM (Friston et al 2003). Our SBI approach shares a key aspect with DCM: Bayesian inference for causal effects by estimating the posterior distribution of parameters of a dynamical generative model. However, there are key differences in practice. The connectome-based approach used in this study considers the SC as fixed parameters which are obtained from non-invasive imaging data of individuals. In contrast, DCM relies on effective connectivity to explain the effects on observation induced by the causal changes in interactions among brain regions (Frässle et al 2017, 2018). Although effective connectivity can provide a better model fit to empirical data due to the high-dimensionality of parameter space, it may easily suffer from the non-identifiability issue, unless the changes in connectivity are constrained by the prior belief that there are transitions among a small number of brain connectivity states (Zarghami and Friston 2020). In DCM, the nonlinear ordinary differential equation representing the neural mass model is often approximated by its linearization around the system fixed points, whereas the nonlinear property of generative models to ensure a switching behavior in the data is maintained in our approach (see equation (1)). More importantly, inversion of a DCM involves minimizing the free energy (equivalently, the Kullback–Leibler divergence), in order to maximize the model evidence (Friston et al 2014, Razi et al 2015), while its mean-field variant ignores—by definition—the correlation between parameters. Our approach is equipped with state-of-the-art deep learning algorithms for Bayesian inference (e.g. MAF/NSF models), which systematically place a tighter upper bound on model evidence (Rezende and Mohamed 2015, Papadopoulou et al 2017) to deal with potential multi-modalities and degeneracies among parameters (Hashemi et al 2023).

Recently, we have proposed a whole-brain probabilistic framework, the so-called Bayesian Virtual Epileptic Patient (BVEP; Hashemi et al 2020), in which the generative model based on a system of high-dimensional and nonlinear stochastic ordinary differential equations was inverted using unbiased and automatic MCMC sampling algorithms. Although, we have shown that this non-parametric approach is able to accurately estimate the spatial map of epileptogenicity across whole-brain areas, it required a reparameterization over model configuration space to facilitate efficient exploration of the posterior distribution in terms of computational time and convergence diagnostics (Jha et al 2022). In the presence of metastability in the state space (see figures 1 and S1), MCMC methods require either more computational cost or intricately designed sampling strategies (Gabrié et al 2022, Jha et al 2022, Baldy et al 2023), whereas SBI allows for efficient Bayesian estimation without the access to the full knowledge on the state space representation of a system (Baldy et al 2024). Note that we used the data features derived from only firing rates, while the information related to membrane potential activities was treated as missing data (i.e. no access to the full state space behavior). More critically, this is highly beneficial if the model output such as simulated raw time-series poses discrepancy with the observed data, rather, it accurately explains the low-dimensional data features such as FC/FCD. Providing fast simulations, SBI can be applied to other whole-brain network models, since it requires neither model nor data features to be differentiable (Gonçalves et al 2020, Hashemi et al 2023). Nevertheless, finding low-dimensional but sufficient informative data features that can deal with parameter degeneracies, noise, network input, and scale to higher dimensions with a feasible budget of simulations are the challenges for applying this approach to other neuroimaging datasets.

In recent work by Sip et al (2023), a data-driven approach has been introduced to infer both the unknown generative dynamical system and the parameters varying across brain regions, while network nodes representing the brain regions are connected with realistic strengths derived from diffusion-weighted imaging data. This method offers a different perspective, leveraging the power of deep generative models (Variational Autoencoders) to focus on the role of regional variance of model parameters. This approach bypasses the need for specific mathematical form of the neural mass models at each region, allowing for increased flexibility and the potential to capture complex relationships within neuroimaging data. In contrast, the current study employs an exact mean-field description of spiking neurons to ensure both realistic simulations of brain activities and the interpretability of the estimations to develop effective interventions. However, determining causality in brain disorders requires careful consideration of multiple variables, including longitudinal data, multimodal imaging techniques, and consistency of statistical and computational methods with the data. Moreover, brain disorders often involve intricate interactions between genetic, environmental, and neurobiological factors, making it challenging to isolate specific causal factors.

In summary, the methodology employed in this study offers the advantage of parallel and fast simulations for flexible, scalable, and efficient probabilistic inference and can be applied to different connectome-based models. The key challenge lies in identifying low-dimensional yet informative data features that can effectively deal with the non-identifiability issues in the generative models. Nevertheless, hierarchical structures serve as an effective solution, especially in cases where informative data features or perturbing brain dynamics are not easily available. The applications of our approach to inferring the origin of brain diseases remain to be explored in future studies. This investigation holds promise for advancing our understanding and potentially informing targeted interventions to clinicians for different brain disorders.

Information sharing statement

All code is provided freely and is available at GitHub (https://github.com/ins-amu/SBI-VBMs).

Acknowledgments

This research has received funding from EU’s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreements No. 101147319 (EBRAINS 2.0 Project), No. 101137289 (Virtual Brain Twin Project), and government grant managed by the Agence Nationale de la Recherch reference ANR-22-PESN-0012 (France 2030 program). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Giovanni Rabuffo, and Paul Triebkorn for fruitful discussions. We acknowledge the use of Fenix Infrastructure resources, which are partially funded from the European Union’s Horizon 2020 research and innovation programme through the ICEI project under the Grant Agreement No. 800858.

Data availability statement

No new data were created or analysed in this study.

Author contributions

Conceptualization: M H, M W, S P., and V K J Methodology: M H, A Z, M W Software: M H, A Z, M W, and J F Investigation: M H, and A Z Visualization: M H, and A Z Supervision: S P, and V K J Funding acquisition: V K J Writing—original draft: M H Writing—review $\&$ editing: M H, A Z, M W, J F, S P, V K J.

Appendix: Supplementary

Figure S1. Refer to the following caption and surrounding text. — **Figure S1.** The simulated firing rates and the corresponding BOLD signals across brain regions, for different values of the network scaling parameter G (see equation (1)), which modulates the overall impact of the SC matrix (left column). The FC (middle column) and FCD (right column) matrices are shown for the simulated BOLD data. (A) When the global coupling parameter G is weak, the brain regions enter the monostable regime (down-state), resulting in low firing rates across regions. This is manifested as a lack of interactions between different brain regions, resulting in a random FC matrix. Consequently, there are no temporal dynamics in FC, leading to a zero switching index in the FCD matrix (with the variance of 0.001). (B) At the working point (here around G = 0.5 with noise level of $\sigma^2 = 0.03$ ), the brain regions transition into the bistable regime. This leads to structured transitions between low and high firing rates in those regions. Consequently, the brain demonstrates correlated activities between brain regions, as captured by the FC matrix. Additionally, there is recurrence in the brain’s large-scale dynamics, which is reflected by a non-zero switching index in the FCD matrix (with the variance of 0.009). (C) When the global coupling parameter G is strong, the brain regions enter the monostable regime (up-state), resulting in high firing rates across regions. Again, this results in decorrelated regional activities, which are represented by a random FC matrix and a zero switching index in the FCD matrix ((with the variance of 0.002)).
Download figure:
Standard image High-resolution image

**Figure S1.** The simulated firing rates and the corresponding BOLD signals across brain regions, for different values of the network scaling parameter G (see equation (1)), which modulates the overall impact of the SC matrix (left column). The FC (middle column) and FCD (right column) matrices are shown for the simulated BOLD data. (A) When the global coupling parameter G is weak, the brain regions enter the monostable regime (down-state), resulting in low firing rates across regions. This is manifested as a lack of interactions between different brain regions, resulting in a random FC matrix. Consequently, there are no temporal dynamics in FC, leading to a zero switching index in the FCD matrix (with the variance of 0.001). (B) At the working point (here around G = 0.5 with noise level of $\sigma^2 = 0.03$ ), the brain regions transition into the bistable regime. This leads to structured transitions between low and high firing rates in those regions. Consequently, the brain demonstrates correlated activities between brain regions, as captured by the FC matrix. Additionally, there is recurrence in the brain’s large-scale dynamics, which is reflected by a non-zero switching index in the FCD matrix (with the variance of 0.009). (C) When the global coupling parameter G is strong, the brain regions enter the monostable regime (up-state), resulting in high firing rates across regions. Again, this results in decorrelated regional activities, which are represented by a random FC matrix and a zero switching index in the FCD matrix ((with the variance of 0.002)).
Download figure:
Standard image High-resolution image

Figure S2. Refer to the following caption and surrounding text. — **Figure S2.** Benchmarking the simulation cost of the VBM using MPR neural mass model, with 2.5M time step on Intel(R) Core(TM) i9-10900 CPU 2.80 GHz (in red) and NVIDIA RTX A5000 GPU (in blue). GPUs deliver substantial speedups up to $100\mathrm{X}$ over multi-core CPUs.
Download figure:
Standard image High-resolution image

**Figure S2.** Benchmarking the simulation cost of the VBM using MPR neural mass model, with 2.5M time step on Intel(R) Core(TM) i9-10900 CPU 2.80 GHz (in red) and NVIDIA RTX A5000 GPU (in blue). GPUs deliver substantial speedups up to $100\mathrm{X}$ over multi-core CPUs.
Download figure:
Standard image High-resolution image

Figure S3. Refer to the following caption and surrounding text. — **Figure S3.** The form of parameterization in SBI-VBMs, to explains the observation denoted by *x_i* (e.g. BOLD data) through the hidden layer of *z_i* (e.g. firing rates in equation (1)). (A) Pooled (homogeneous) structure assumes a common distribution for excitability across all regions and no variation in the sampling process. Hence, the set of generative parameters is low-dimensional as $\vec{\theta} = \{G, \alpha, \beta, \eta \} \in \mathbb{R}^{4}$ . (B) Unpooled (heterogeneous) structure captures the distinct characteristics of individual regions without imposing a uniform structure at the whole-brain level. Hence, the set of generative parameters becomes high-dimensional as $\vec{\theta} = \{G, \alpha, \beta, \eta_i \} \in \mathbb{R}^{N_n+3}$ with $i \in \{1, 2, \ldots, N_n\}$ . (C) Hierarchical (or partial pooling) structure offers a balance between the simplicity of pooled models and the complexity of unpooled models. Hence, the set of generative parameters is optimal as $\vec{\theta} = \{G, \alpha, \beta, \mu_{\eta}, \sigma_{\eta}\} \in \mathbb{R}^{5}$ , with $\eta_i \sim \mathcal{N}(\mu_{\eta},\,\sigma_{\eta}^{2})$ . .
Download figure:
Standard image High-resolution image

**Figure S3.** The form of parameterization in SBI-VBMs, to explains the observation denoted by *x_i* (e.g. BOLD data) through the hidden layer of *z_i* (e.g. firing rates in equation (1)). (A) Pooled (homogeneous) structure assumes a common distribution for excitability across all regions and no variation in the sampling process. Hence, the set of generative parameters is low-dimensional as $\vec{\theta} = \{G, \alpha, \beta, \eta \} \in \mathbb{R}^{4}$ . (B) Unpooled (heterogeneous) structure captures the distinct characteristics of individual regions without imposing a uniform structure at the whole-brain level. Hence, the set of generative parameters becomes high-dimensional as $\vec{\theta} = \{G, \alpha, \beta, \eta_i \} \in \mathbb{R}^{N_n+3}$ with $i \in \{1, 2, \ldots, N_n\}$ . (C) Hierarchical (or partial pooling) structure offers a balance between the simplicity of pooled models and the complexity of unpooled models. Hence, the set of generative parameters is optimal as $\vec{\theta} = \{G, \alpha, \beta, \mu_{\eta}, \sigma_{\eta}\} \in \mathbb{R}^{5}$ , with $\eta_i \sim \mathcal{N}(\mu_{\eta},\,\sigma_{\eta}^{2})$ . .
Download figure:
Standard image High-resolution image

Figure S4. Refer to the following caption and surrounding text. — **Figure S4.** Kullback–Leibler (KL) divergence between the observed and predicted data for estimated parameters over the degree of degradation in the global, between and within connections (denoted by G, α, and β, respectively) in virtual brains. The estimation is performed by minimizing the KL divergence, which measures the difference between probability distributions. (A) and (B) display the distribution of FC (functional connectivity) and FCD (functional connectivity dynamics) matrix elements, respectively, for the observed and the estimated working point.
Download figure:
Standard image High-resolution image

Figure S5. Refer to the following caption and surrounding text. — **Figure S5.** Sensitivity analysis on the degree of degradation in a personalized connectome caused by virtual disorders, based on Kolmogorov–Smirnov (KS) distance (top row), and Kullback–Leibler divergence (KL) divergence (bottom row). The plots represent the results of a grid search over the KL/KS between the distribution of observed and predicted FC (in blue) and FCD (in red) values, computed for incremental increase in (A), (E) for the global scaling parameter G, and (B), (F) for the level of deterioration in inter-hemispheric connections α, and (C), (G) for the degree of intra-hemispheric degradation β within the limbic system. (D), (H) The Hessian values, quantifying the local curvature of KS, and KL distance, respectively, at the ground truth values (in green), using functional data features, show no sensitivity to the β mask. The Hessian values, quantifying the local curvature of KS distance at the ground truth values (in green), indicate more sensitivity to FCD than FC. Note that the finite confidence intervals in the profile of the β mask indicate a practical non-identifiability rather than structural non-identifiability, which demonstrates a flat valley that extends infinitely in both the upper and lower bound.
Download figure:
Standard image High-resolution image

Figure S6. Refer to the following caption and surrounding text. — **Figure S6.** The effect of the number of simulations on uncertainty estimation with training the state-of-the-art deep neural density estimators (MAF and NSF models). (A), (B), and (C) show the plot of posterior shrinkage versus the number of simulations for estimating the parameters G, α, and β, respectively, from FC/FCD, indicating that both MAF and NSF yield compatible results. (D) However, we observed that MAF was 2-4 times faster than NSF during the training process. (E) The plot of z-score versus shrinkage indicates an ideal Bayesian estimation for G and α, but not for the β mask.
Download figure:
Standard image High-resolution image

Figure S7. Refer to the following caption and surrounding text. — **Figure S7.** Inference on homogeneous generative parameters in a virtual brain using only functional data features. Here the set of inferred parameters is $\vec{\theta} = \{G, \alpha, \beta, \eta, J, \Delta \} \in \mathbb{R}^{6}$ . The training process involved using only functional data features (FC/FCD), with a budget of 100k simulations. The diagonal panels display the prior (in blue) and estimated posterior (in red). For most of parameters, the maximum a posterior deviates form true value (green lines) with a large uncertainty. The lower diagonal panels display the observed and predicted BOLD data and the corresponding FC/FCD matrices. The upper diagonal panels show the joint posterior between parameters, and their correlation values (ρ at the upper left corners). The ground-truth values are shown by green stars. High-probability areas are color-coded in yellow, while low-probability areas are represented in black.
Download figure:
Standard image High-resolution image

Figure S8. Refer to the following caption and surrounding text. — **Figure S8.** Inference on heterogeneous excitabilities η_i for 88 brain regions. SBI provides accurate posterior estimation (in red) using both spatio-temporal and functional data features for (A)–(C) different configurations in excitabilities across brain regions (ground truth in green dots). Here, we used 1M random simulations from uniform priors (in blue) as: $\eta_i \in \mathcal{U}(-6, -3.5)$ .
Download figure:
Standard image High-resolution image

**Figure S8.** Inference on heterogeneous excitabilities η_i for 88 brain regions. SBI provides accurate posterior estimation (in red) using both spatio-temporal and functional data features for (A)–(C) different configurations in excitabilities across brain regions (ground truth in green dots). Here, we used 1M random simulations from uniform priors (in blue) as: $\eta_i \in \mathcal{U}(-6, -3.5)$ .
Download figure:
Standard image High-resolution image

Figure S9. Refer to the following caption and surrounding text. — **Figure S9.** SBI on the set of generative parameters denoted by $\vec{\theta} = \{G, \alpha, \beta, \eta_i \} \in \mathbb{R}^{91}$ , where η_i with $i \in \{1, 2, \ldots, N_n = 88\}$ . The training process involved only functional data features, on a budget of 1M simulations. (A) The estimated posterior (in red) for parameters G, α, and β. The prior (in blue), for each was placed between zero and one. The ground truth values are G = 0.62, α = 0.3, β = 0.4 (in green). (B) The join posterior between parameters. (C) The plot of z-score versus shrinkage indicates an ideal estimation for G and α, but poor estimation for β and η_i. (D) The estimated posterior of η_i, for 88 regions. The prior on excitabilities (in blue) was placed as: $\eta_i \in \mathcal{U}(-6, -3.5)$ . This result indicate that relying on functional data is inadequate for accurate inference on heterogeneous excitabilities distributed across the whole brain regions.
Download figure:
Standard image High-resolution image

Figure S10. Refer to the following caption and surrounding text. — **Figure S10.** Estimation on heterogeneous generative parameters, using MAF model for training in SBI. Here the set of generative parameters is $\vec{\theta} = \{G, \alpha, \beta, \eta_i \} \in \mathbb{R}^{91}$ , with $i \in \{1, 2, \ldots, N_n = 88\}$ . The training process involved using both spatio-temporal and functional data features, with a budget of 1M simulations. (A) The estimated posterior (in red) for parameters G, α, and β. The prior (in blue), for these parameters was placed between [0,1]. The ground truth values are G = 0.62, α = 0.3, β = 0.4 (in green). (B) The join posterior between parameters. (C) The plot of z-score versus shrinkage indicates an ideal estimation for G, α, and η_i but poor estimation for β. (D) The estimated posteriors of η_i, for 88 regions (in red) demonstrate no shrinkages from prior (in blue). The prior on excitabilities was placed as: $\eta_i \in \mathcal{U}(-6, -3.5)$ . This result indicates that both spatio-temporal and functional data features provides an ideal Bayesian estimation on all generative parameters, except the intra-hemispheric degradation within the limbic system denoted by β mask.
Download figure:
Standard image High-resolution image

Figure S11. Refer to the following caption and surrounding text. — **Figure S11.** The same analysis as shown in figure S10, but using the NSF model for training in SBI. Both the MAF and NSF models are compatible in accuracy of estimation and uncertainty quantification of the generative parameters.
Download figure:
Standard image High-resolution image

Figure S12. Refer to the following caption and surrounding text. — **Figure S12.** The same analysis as shown in figure 5, but excluding the spatio-temporal data feature from training. In contrast to heterogeneous modeling (figure S9), the hierarchical modeling provides informative posteriors when using only functional data feature for inference, even with a significantly smaller budget of simulations.
Download figure:
Standard image High-resolution image

Figure S13. Refer to the following caption and surrounding text. — **Figure S13.** Evaluation of posterior fit in SBI-VBMs. The plot of z-score versus shrinkage by training MAF on only functional data features (FC/FCD) (A) Inference on parameters $\vec{\theta} = \{G, \alpha, \beta\} \in \mathbb{R}^{3}$ when the stimulation is off, but in (B) When the stimulation is on. (C) Inference on degradation and homogeneous parameters $\vec{\theta} = \{G, \alpha, \beta, \eta, J, \Delta\} \in \mathbb{R}^{6}$ . (D) Inference on degradation and heterogeneous parameters $\vec{\theta} = \{G, \alpha, \beta, \eta_i\} \in \mathbb{R}^{91}$ . (E) Inference on degradation and hierarchical parameters $\vec{\theta} = \{G, \alpha, \beta, \mu_{\eta}, \sigma_{\eta}\} \in \mathbb{R}^{5}$ . The concentration of estimates towards the bottom right or left of the plot implies an ideal or poor Bayesian inversion, respectively. Hence, by using only functional data features, we can achieve ideal estimation of the β mask through stimulation and of excitability η through hierarchical parameterization.
Download figure:
Standard image High-resolution image

Figure S14. Refer to the following caption and surrounding text. — **Figure S14.** The same analysis as shown in figure S13, but using both spatio-temporal and functional data features. The concentration of estimates towards the bottom right or left of the plot implies an ideal or poor Bayesian inversion, respectively. We can see an ideal Bayesian inversion for all parameters regardless of parameterization, except for the β mask, for which ideal estimation requires stimulation..
Download figure:
Standard image High-resolution image

Table S1. Sensitivity analysis conducted based on KS and KL distances between the distributions of values in the observed and predicted FC/FCD matrices.

Hessian value	G	α	β
KS distance between FC matrices	0.1022	0.0085	0.0016
KS distance between FCD matrices	0.2052	0.0213	0.0033
KL distance between FC matrices	0.4128	0.0337	0.0046
KL distance between FCD matrices	0.425 12	0.0484	0.0109

Dates

Peer review information