Precision Spectral Estimation at Sub-Hz Frequencies: Closed-Form Posteriors and Bayesian Noise Projection
Abstract: We present a Bayesian method for estimating spectral quantities in multivariate Gaussian time series. The approach, based on periodograms and Wishart statistics, yields closed-form expressions at any given frequency for the marginal posterior distributions of the individual power spectral densities, the pairwise coherence, and the multiple coherence, as well as for the joint posterior distribution of the full cross-spectral density matrix. In the context of noise projection, where one series is modeled as a linear combination of filtered versions of the others plus a background component, the method also provides closed-form posteriors for both the susceptibilities, i.e., the filter transfer functions, and the power spectral density of the background. Originally developed for the analysis of the data from the European Space Agency's LISA Pathfinder mission, the method is particularly well-suited to very-low-frequency data, where long observation times preclude averaging over large sets of periodograms, which would otherwise allow these to be treated as approximately normally distributed.

Index Terms: Signal processing, Spectral analysis, Spectral estimation, Time series decorrelation.

I. Introduction

Spectral analysis using Welch's method [1], [2] is the most common approach to noise characterization. This method estimates power spectral densities (PSD) and cross-spectral densities (CPSD) of multivariate noise time series from their properly normalized discrete Fourier transforms, known as periodograms.

To enhance estimate precision and quantify uncertainty, Welch's method divides the time series into $M$ equal-length, possibly overlapping segments, generating $M$ periodogram samples. A common frequentist approach estimates the PSD and CPSD by averaging these samples, with uncertainty proportional to the standard deviation of the mean. This relies on the central limit theorem, assuming that averaging rapidly yields Gaussian statistics suitable for confidence level predictions.

However, spectral resolution decreases with $M$, and in many applications, particularly those at very low frequency, $M$ must remain small, sometimes even $M = 1$. As $M$ decreases, the Gaussianity assumption becomes increasingly inaccurate, leading, for instance, to paradoxical outcomes such as non-negligible probabilities for negative PSD values.

At $M = 1$, the frequentist approach becomes entirely infeasible. In such cases, Bayesian inference, free from these limitations, remains the only viable approach.

We encountered this situation while analyzing data from the LISA Pathfinder mission [3], [4], [5]. The mission's objective was to precisely measure the noise spectrum of force disturbances acting on two nominally freely falling test masses in space, reaching acceleration levels as low as a few $\mathrm{fm\,s^{-2}/Hz^{1/2}}$ and frequencies down to approximately 20 µHz.

Unlike typical spectral estimation applications, where the goal is to extract a signal from a noisy stochastic process, LISA Pathfinder aimed to measure the noise itself with the highest possible accuracy, particularly at the lowest achievable frequencies.

While reviewing the literature for a consistent and practical Bayesian approach suited to our needs, we found that the fundamental principles had long been established. However, we could not find a detailed, practical method applicable to our data processing, leading us to develop one independently.

We applied this method to the data analysis of LISA Pathfinder in Refs. [3], [6], and briefly summarized its main features in the appendices of the second paper. Here, we provide a detailed description of the method, discussing its foundations, deriving key procedures rigorously, and presenting quantitative evidence of its validity through numerical simulations.

The paper is organized as follows: in Section II we define the key experimental quantities of multi-variate time series and derive their likelihood under the Gaussian data hypothesis; in Section III we build the Bayesian posteriors for all the related spectral quantities; in Section IV we discuss the case of noise projection, that is, the case where one series is modeled as a linear combination of filtered versions of the others, plus a background component of which one wants to estimate the spectrum; finally, in Section V we give some concluding remarks.
Preprint July 28, 2025
II. Periodograms, key periodogram functions and their likelihood

In this section, we recall a few basic concepts and results that we will use in our Bayesian inference method. We assume that data are Gaussian for the purpose of our analysis, something we found to be quantitatively true in the case of LISA Pathfinder [3].

A. Basic definitions and nomenclature

In our approach, we assume that we have acquired, synchronously and with sampling time $T$, the time series of $p$ real, stationary, Gaussian, zero-mean stochastic processes. We call $x_i[n]$, with $1 \le i \le p$, the sample of the $i$-th series taken at time $t = nT$.

The joint statistics of these samples is fully contained in the mean values of their products

$$\langle x_i[n]\,x_j[m]\rangle = \int_{-\infty}^{\infty} \Sigma_{i,j}(f)\, e^{i2\pi(m-n)fT}\, df, \tag{1}$$

and then in the Hermitian positive definite matrix $\Sigma(f)$, with elements $\Sigma_{i,j}(f)$, which, by definition, is the joint two-sided power cross-spectral density (CPSD) matrix of the $p$ stochastic processes at frequency $f$. Note that the diagonal element $\Sigma_{i,i}(f)$ is the power spectral density (PSD) of the process $x_i(t)$, while the off-diagonal $\Sigma_{i,j}(f)$ is the pair CPSD of $x_i(t)$ and $x_j(t)$.

Our goal is to infer each element of $\Sigma(f)$ independently at each frequency, without assuming any functional dependence on $f$.

The main tool for such an inference is the periodogram $X_i[k]$ calculated over an $N$-long segment of the multivariate time series. $X_i[k]$ is defined as:

$$X_i[k] = \sqrt{\frac{T}{N}}\, \sum_{n=0}^{N-1} x_i[n]\, w[n]\, e^{-2\pi i k n/N} \tag{2}$$

with $0 \le k \le N-1$ an integer, and $w[n]$ the coefficients of a suitable tapering window. Since $x_i[n]$ is real, $X_i[N-k] = X_i^*[k]$, so only the first $\lfloor N/2 \rfloor + 1$ of these coefficients carry independent information.

$X_i[k]$ is Gaussian, complex, and zero-mean. Key for the inference are the complex mean values:

$$\langle X_i[k]\,X_j^*[k']\rangle = \frac{T}{N}\int_{-\infty}^{\infty} \Sigma_{i,j}(f)\, \tilde w\!\left(\tfrac{2\pi}{N}k - 2\pi fT\right) \tilde w^*\!\left(\tfrac{2\pi}{N}k' - 2\pi fT\right) df \tag{3}$$

with $\tilde w(\phi) = \sum_{n=0}^{N-1} w[n]\, e^{-i\phi n}$ the Fourier sequence transform of the tapering window $w[n]$.

If, ideally, one could choose $w[n]$ such that

$$\tilde w\!\left(\tfrac{2\pi}{N}k - 2\pi fT\right) \tilde w^*\!\left(\tfrac{2\pi}{N}k' - 2\pi fT\right) = 2\pi N\, \delta_{kk'}\, \delta\!\left(\tfrac{2\pi}{N}k - 2\pi fT\right), \tag{4}$$

with $\delta_{kk'}$ the Kronecker delta of $k$ and $k'$, and $\delta(\phi)$ the Dirac delta of $\phi$, then $X_i[k]X_j^*[k]$ would be an unbiased estimator of $\Sigma_{i,j}(f = k/(NT))$, and $X_i[k]$ would be independent of $X_j^*[k']$ if $k \ne k'$.

In reality, $\tilde w(\phi)$ is a $2\pi$-periodic function with a central lobe at $\phi = 0$, and a sequence of strongly suppressed side lobes. We assume that the aliasing deriving from the periodicity of $\tilde w(\phi)$ has been made negligible by properly choosing a short enough sampling time $T$.

The width of the central lobe depends on the choice of the window, but is always of the form $\pm m(2\pi/N)$, with $m$ a small integer. Thus, if $|k - k'| > m$, then $X_i[k]$ and $X_j^*[k']$ may be treated as independent within a reasonable accuracy. Then from Eq. (3),

$$\langle X_i[k]\,X_j^*[k]\rangle \simeq \int_{-1/(2T)}^{1/(2T)} \Sigma_{i,j}(f)\, G\!\left(f - \frac{k}{NT}\right) df \tag{5}$$

with $G(f) = (T/N)\,|\tilde w(2\pi fT)|^2$. Note that $w[n]$ is always normalized such that $\int_{-1/(2T)}^{1/(2T)} G\!\left(f - \frac{k}{NT}\right) df = 1$, which, by Parseval's theorem, is equivalent to $\sum_{n=0}^{N-1} w[n]^2 = N$.

Thus, $X_i[k]X_j^*[k]$ is an estimator of $\Sigma_{i,j}(f)$ averaged over the band $f = (k \pm m)/(NT)$.

From now on, with $\Sigma(f)$, unless otherwise specified, we indicate this averaged version of the CPSD matrix.

As usual in statistics, the precision of the estimator can be increased by averaging over repeated measurements. To this aim, Welch's method prescribes splitting the available time series into $M$ segments¹, each of length $N$, averaging over them, and using the observed CPSD matrix $\Pi[k]$, with elements

$$\Pi_{ij}[k] = \frac{1}{M}\sum_{\ell=1}^{M} X_{i,(\ell)}[k]\, X_{j,(\ell)}^*[k], \tag{6}$$

as an estimator of $\Sigma(f = k/(NT))$.

¹ These segments do not need to be disjoint. It has been shown that, as the window $w[n]$ tapers their ends, some overlap between adjoining segments does not significantly change the statistical properties of the derived quantities with respect to the case of disjoint segments [1].

Thus, in summary, the matrices $\Pi[k]$, with $k \in [0, 2m, 4m, 6m, \dots, \lfloor N/4m \rfloor]$, are independent estimators of the matrices $\Sigma(f)$ with $f \in [0, 2m, 4m, 6m, \dots, \lfloor N/4m \rfloor]/(NT)$, with a spectral resolution of $\pm m/(NT)$.

One can show that $\Pi[k]$, which is Hermitian, is positive definite only if $M \ge p$. As positive definiteness is mandatory for an estimator of $\Sigma(f = k/(NT))$, the minimum number of periodograms one should average on is $p$.

Some common applications deviate from the spectral estimator with evenly spaced frequencies described above [3], [7], [8]. In those applications, at each frequency of interest $f$, one adjusts $M$, and then $N$, according to some averaging optimization criterion, and picks just one matrix $\Pi[k]$, with $k$ selected such that $f = k/(NT)$. $\Pi[k]$ is then an estimate of the CPSD at the frequency $f$. This procedure is repeated at each frequency of interest.

As $N$ depends on the frequency, the width of the spectral window is no longer frequency-independent, as it is in the case of uniform spacing. As a consequence, the independence of the CPSD estimators at different
frequencies must, in principle, be established frequency by frequency. An assessment of the methods in Refs. [3] and [8] shows that, for at least some choices of parameters, the estimators at different frequencies can be considered practically independent. We will use these methods in several examples presented in the remainder of the paper.

Some functions of $\Pi[k]$ that we will also use in the following are:

• the measured magnitude-squared cross-coherence (MSC) between the $i$-th and $j$-th time series,
$$|\hat\rho_{ij}[k]|^2 = \frac{|\Pi_{ij}[k]|^2}{\Pi_{ii}[k]\,\Pi_{jj}[k]}, \tag{7}$$
a useful diagnostic of the possible linear correlation between the two underlying processes;
• the multiple coherence [9], a useful generalization of $|\hat\rho_{ij}[k]|^2$ to the case of multiple series,
$$\hat R^2[k] = 1 - \left(\Pi_{11}[k]\,\Pi^{-1}_{11}[k]\right)^{-1}, \tag{8}$$
with $\Pi^{-1}_{i,j}$ the elements of the inverse $\Pi^{-1}$ of $\Pi$. This is used as a diagnostic of how much of the noise power in $x_1[n]$ is due to its correlation with the remaining series;
• the Schur complement of any sub-block $C$ in the decomposition of the Hermitian matrix $\Pi[k]$ as
$$\Pi[k] = \begin{pmatrix} A & B \\ B^\dagger & C \end{pmatrix}. \tag{9}$$
Here $A$ is a $q \times q$ matrix, $C$ is $r \times r$, $B$ is $q \times r$, and $q + r = p$. The Schur complement of the block $C$ in $\Pi[k]$ is:
$$\Pi[k]/C \equiv A - B\,C^{-1}B^\dagger. \tag{10}$$

We discuss in the following section the sampling distributions of all these quantities.

B. Sampling distribution of the CPSD matrix

Reference [9] shows that the joint sampling distribution of the elements of the matrix $W = M\Pi$, conditional on the theoretical CPSD matrix $\Sigma$, is a complex Wishart distribution, with probability density function (PDF):

$$p(W \mid \Sigma, M) = \frac{|W|^{M-p}}{\tilde\Gamma_p(M)\,|\Sigma|^{M}}\,\mathrm{etr}\!\left(-\Sigma^{-1}W\right). \tag{11}$$

Here, $|\cdot|$ is the determinant, etr the exponential trace $\mathrm{etr}(\cdot) = \exp(\mathrm{tr}(\cdot))$, and $\tilde\Gamma_p(M)$ is the multivariate complex Gamma function:

$$\tilde\Gamma_p(M) = \pi^{\frac{1}{2}p(p-1)} \prod_{i=1}^{p} \Gamma(M - i + 1),$$

with $M \ge p$. Note that we have dropped, for clarity, the explicit dependence of all quantities on frequency.

We denote this distribution² with $\mathcal{CW}(\Sigma, M)$. As expected, $\mathcal{CW}(\Sigma, M)$ is defined only if $M \ge p$, that is, if $W$ is positive definite.

² We use the symbolic expression $a \sim A$ to indicate that a random variable $a$ is distributed according to a distribution $A$. Thus, $W \sim \mathcal{CW}(\Sigma, M)$.

We use the conditional probability in Eq. (11) as the likelihood function for the inference of $\Sigma$.

C. Sampling distribution of derived quantities

The complex Wishart distribution describes the joint probability of all the elements of the matrix $W$. Starting from that, and employing its mathematical properties [9], [10], we give the sampling distributions of some derived quantities that we will use as likelihood functions for the Bayesian inference of the corresponding quantities.

a) Power spectral density: For $p = 1$, that is, in the case of a single univariate stochastic process, calling $\Pi$ the only element of $\Pi$, and $S$ the PSD of the process and only element of $\Sigma$, the PDF in Eq. (11) reduces to

$$p(M\Pi \mid S) = \frac{(M\Pi)^{M-1}}{\Gamma(M)\,S^{M}}\, e^{-M\Pi/S}. \tag{12}$$

This means that $\Pi \sim \Gamma(M, S/M)$, with $\Gamma(M, S/M)$ the Gamma distribution with shape parameter $M$ and scale parameter $S/M$. Equivalently, Eq. (12) implies that $2M\Pi/S$ is chi-square distributed with $2M$ degrees of freedom. Note that this result is also obtained by calculating the marginal distribution of any of the diagonal elements of $\Pi$ from the joint PDF in Eq. (11).

b) Magnitude squared coherence: Defining the theoretical $\rho_{ij}$,

$$\rho_{ij} = \frac{\Sigma_{i,j}}{\Sigma_{i,i}^{1/2}\,\Sigma_{j,j}^{1/2}}, \tag{13}$$

the sampling distribution of the MSC $|\hat\rho_{ij}|^2$ is [9], [11]:

$$p\!\left(|\hat\rho|^2 \,\middle|\, |\rho|^2\right) = (M-1)\left(1 - |\hat\rho|^2\right)^{M-2} \left(1 - |\rho|^2\right)^{M} {}_2F_1\!\left(M, M; 1; |\hat\rho|^2 |\rho|^2\right), \tag{14}$$

where ${}_2F_1$ represents Gauss' hypergeometric function. Note that this distribution only holds for $M > 1$ as, notoriously, when $M = 1$, $|\hat\rho_{ij}|^2 = 1$ holds exactly. More generally, for low values of $M$, $p(|\hat\rho|^2 \mid |\rho|^2)$ carries a significant bias toward $|\hat\rho|^2 > |\rho|^2$.

c) Multiple coherence: Similarly to the case of MSC, defining the theoretical $R^2$,

$$R^2 = 1 - \left(\Sigma_{11}\,\Sigma^{-1}_{11}\right)^{-1}, \tag{15}$$

with $\Sigma^{-1}_{i,j}$ the elements of $\Sigma^{-1}$, the PDF of the sample multiple coherence $\hat R^2$ is [9]:

$$p(\hat R^2 \mid R^2) = \frac{\Gamma(M)}{\Gamma(p-1)\,\Gamma(M-p+1)}\,(\hat R^2)^{p-2}(1 - \hat R^2)^{M-p}\,(1 - R^2)^{M}\,{}_2F_1\!\left(M, M; p-1; \hat R^2 R^2\right). \tag{16}$$

In the 2-D case, the multiple coherence and the MSC coincide.

d) Schur complement: finally, if $W$ is decomposed as

$$W = \begin{pmatrix} A & B \\ B^\dagger & C \end{pmatrix}, \tag{17}$$

with $C$ an $r \times r$ block, then the Schur complement of $C$ is

$$W/C \equiv A - B\,C^{-1}B^\dagger \tag{18}$$
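A minimal NumPy sketch of the estimator in Eqs. (2) and (6) follows, together with a Monte Carlo check of the Gamma sampling law in Eq. (12). It is an illustration under stated assumptions, not the authors' software: segments are disjoint, the taper is a periodic Hann window rescaled so that $\sum_n w[n]^2 = N$ (the normalization stated below Eq. (5)), and the input is unit-variance white noise sampled at $T = 1$, whose two-sided PSD is $S = 1$. The names `periodogram_matrix` and `periodic_hann` are hypothetical.

```python
import numpy as np

def periodogram_matrix(x, N, T, window):
    """Welch-averaged CPSD estimate Pi[k] following Eqs. (2) and (6).

    x      : real array of shape (p, n_samples), split into M disjoint segments
    N      : segment length
    T      : sampling time
    window : taper w[n] of length N, normalized so that sum(w**2) == N
    Returns a complex array of shape (N//2 + 1, p, p).
    """
    p, n_samples = x.shape
    M = n_samples // N
    segments = x[:, :M * N].reshape(p, M, N)
    # Eq. (2): X_i[k] = sqrt(T/N) * sum_n x_i[n] w[n] exp(-2*pi*i*k*n/N)
    X = np.sqrt(T / N) * np.fft.rfft(segments * window, axis=-1)
    # Eq. (6): Pi_ij[k] = (1/M) * sum_l X_{i,l}[k] * conj(X_{j,l}[k])
    return np.einsum("imk,jmk->kij", X, X.conj()) / M

def periodic_hann(N):
    """Periodic Hann taper rescaled so that sum(w**2) == N."""
    n = np.arange(N)
    w = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / N))
    return w * np.sqrt(N / np.sum(w**2))

rng = np.random.default_rng(1)
N, M, T, n_records = 256, 8, 1.0, 400
w = periodic_hann(N)
bins = np.arange(4, N // 2 - 4, 3)   # interior bins, three apart

samples = []
for _ in range(n_records):
    x = rng.standard_normal((1, M * N))   # white noise: two-sided PSD S = 1
    Pi = periodogram_matrix(x, N, T, w)
    samples.append(Pi[bins, 0, 0].real)
samples = np.concatenate(samples)

# Eq. (12): Pi ~ Gamma(M, S/M), hence mean S = 1 and variance S**2 / M
print(f"mean {samples.mean():.3f} (expected 1.000)")
print(f"var  {samples.var():.3f} (expected {1 / M:.3f})")
```

Bins three apart are kept because, for this particular taper, the correlation between bins (set by the DFT of $w[n]^2$) vanishes at lags of three or more bins, so the retained samples are independent; this plays the role of the integer $m$ in the text. With overlapping segments or other tapers the check holds only approximately.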
Figure 2: Plot of $p(S \mid \Pi)$ for $S \mid \Pi \sim \mathrm{inv}\Gamma(M, M\Pi)$, for a few different values of $M$. For the sake of clarity, the PDF is for the ratio $S/\Pi$.

Figure 3: Comparison of the PSD prediction accuracy of the direct frequentist method, its Student-t variant, and the Bayesian posterior in Eq. (20). The plots show the probability $p_{\rm miss}$ that the prediction misses the true value, divided by the estimated likelihood of that same event. For each method the simulation has been repeated for equal-tail credible intervals with likelihood $\ell(1) \approx 0.68$ (1σ), $\ell(2) \approx 0.95$ (2σ), and $\ell(3) \approx 0.997$ (3σ), and as a function of the number $M$ of periodograms in the available sample. The plots for the Bayesian case are barely distinguishable, as they are all superimposed on each other at $\simeq 1$, regardless of the value of $M$.

method that accounts for the fact that, in Gaussian statistics, $t = (S_t - \bar\Pi)/(s_\Pi/\sqrt{M})$ follows a Student's t-distribution with $M - 1$ degrees of freedom. Thus, in the calculation of the equal-tail credible intervals, it would be more accurate to replace $-k$ by the $(1-\ell)/2$-quantile of the Student-t distribution with $M - 1$ degrees of freedom, and $+k$ by the $(1+\ell)/2$-quantile.

To perform the comparison, we have done a simulation. Each trial of this simulation consists of the following steps.
• We extract $M$ samples $\Pi_i$ from a $\Gamma(1, 1)$ distribution, thus simulating $M$ periodograms of a process with true PSD $S_{\rm true} = 1$.
• From the samples above, we calculate the sample mean $\bar\Pi$ and standard deviation $s_\Pi$. From these we calculate the credible intervals with likelihoods $\ell(1)$, $\ell(2)$ and $\ell(3)$. We do this for all three methods: the direct frequentist method, the Student-t variant, and the Bayesian posterior in Eq. (20).
• We check which, if any, of these 9 intervals contain the true value $S_{\rm true} = 1$.

By performing a large number of trials, we estimate the probability $p_{\rm miss}$ that the true value $S = 1$ is not included within each estimated credible interval, and we compare it with the estimated likelihood of this same event, $\ell_{\rm miss} = 1 - \ell$. A consistent estimator should have $p_{\rm miss}/\ell_{\rm miss} \simeq 1$. The results of this simulation are shown in Figure 3.

The figure clearly shows that while the Bayesian estimate is consistent and unbiased, the frequentist method may have a probability of missing the true value significantly exceeding the estimated likelihood. The effect increases at low $M$ and may become rather large for tails beyond the $\ell(1)$ threshold even at $M \simeq 50$.

The effect is due to the fact that the use of Gaussian statistics predicts a credible interval significantly narrower than that predicted by the correct invΓ one. Thus, for the same random sample, the true value may belong to the latter, but fall outside the former.

B. Inference of the entire CPSD matrix

In the general case $p > 1$, some difficulty with the choice of the proper prior for $\Sigma$ makes the spectral inference more complex.

Let us start with the basic choice $p(\Sigma) = 1$ on the domain of positive definite complex matrices. With such a choice

$$\Sigma \mid W \sim \mathcal{CW}^{-1}(W, M - p) \tag{23}$$

with $\mathcal{CW}^{-1}$ the complex inverse Wishart distribution [15].

This distribution has a few problems. First, it carries some bias. Though defining what bias is for a matrix distribution may be difficult, it is worth inspecting the posterior predictive distribution of a future observation $\tilde W$ conditional on the observation of $W$. We find that $W^{-1}\cdot\tilde W \sim \mathcal{CB}^{II}_p(M, M - p)$, with $\mathcal{CB}^{II}_p(a, b)$ the matrix-variate type-2 complex Beta distribution [16].

Ref. [16] shows that $\langle W^{-1}\cdot\tilde W\rangle = I_p\,M/(M - 2p)$, with $I_p$ the $p \times p$ identity matrix, while for an unbiased estimation one would expect $\langle W^{-1}\cdot\tilde W\rangle = I_p$. Note that this mean value bias depends on the number of series considered together and becomes infinite when $M = 2p$.

That such a dependence of the bias on $p$ is paradoxical is well illustrated by the marginal distribution of the diagonal elements. Indeed, the marginal distribution of $\Sigma_{ii}$, that is, the estimate of the PSD of the $i$-th time series $S_i$, can be calculated [15] to be $S_i \mid \Pi_{ii} = \Sigma_{ii} \mid \Pi_{ii} \sim \mathrm{inv}\Gamma(M - 2p + 1, M\Pi_{ii})$, a distribution only defined for $M \ge 2p$, and different from the one obtained by considering the $i$-th series alone, $S_i \mid \Pi \sim \mathrm{inv}\Gamma(M - 1, M\Pi_{ii})$.

Thus, just assuming there are other $p - 1$ series that may be correlated with the one under study would change the inferred posterior for $S_i$, and would induce a bias increasing with $p$.

The situation is slightly better for the Jeffreys prior.
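The coverage test behind Figure 3 (Section III-A) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: it uses SciPy's `norm`, `t`, and `invgamma` distributions, builds the Student-t variant from the $(1 \mp \ell)/2$ quantiles of a t distribution with $M - 1$ degrees of freedom, and requires $M \ge 2$ because both frequentist recipes need a sample standard deviation. The function name `miss_ratios` is hypothetical.

```python
import numpy as np
from scipy import stats

def miss_ratios(M, level, n_trials=100_000, seed=0):
    """Estimate p_miss / l_miss for the three interval recipes of Figure 3.

    Each trial draws M single-segment periodograms Pi_i ~ Gamma(1, 1), i.e. a
    process with true PSD S_true = 1. Requires M >= 2.
    """
    rng = np.random.default_rng(seed)
    Pi = rng.gamma(shape=1.0, scale=1.0, size=(n_trials, M))
    mean = Pi.mean(axis=1)
    sem = Pi.std(axis=1, ddof=1) / np.sqrt(M)
    q_lo, q_hi = (1 - level) / 2, (1 + level) / 2

    k_gauss = stats.norm.ppf(q_hi)        # direct frequentist method
    k_t = stats.t.ppf(q_hi, df=M - 1)     # Student-t variant
    intervals = {
        "gaussian": (mean - k_gauss * sem, mean + k_gauss * sem),
        "student_t": (mean - k_t * sem, mean + k_t * sem),
        # Bayesian posterior S | Pi ~ invGamma(M, M * mean), equal-tail interval
        "bayes": (stats.invgamma.ppf(q_lo, M, scale=M * mean),
                  stats.invgamma.ppf(q_hi, M, scale=M * mean)),
    }
    l_miss = 1.0 - level
    return {name: np.mean((lo > 1.0) | (hi < 1.0)) / l_miss
            for name, (lo, hi) in intervals.items()}

for M in (2, 5, 20, 50):
    ratios = miss_ratios(M, level=0.95)
    print(M, {name: round(float(v), 2) for name, v in ratios.items()})
```

Because $M\bar\Pi/S$ follows a $\Gamma(M, 1)$ law whatever the value of $S$, the equal-tail $\mathrm{inv}\Gamma(M, M\bar\Pi)$ interval is also an exact frequentist confidence interval, which is why its ratio stays at $\simeq 1$ for every $M$.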
i.e., the fraction of the PSD of $x(t)$ contributed by the disturbances, a useful quantity one wants to estimate.

A. Inference of susceptibilities, residuals, and CPSD of disturbances in the general case

Our starting point is a key re-parametrization of the sample distribution in Eq. (11). For the sake of such re-parametrization, we need to introduce, in analogy with Eq. (29), the block partition of $W$ and $\Pi$

$$W = \begin{pmatrix} W_{xx} & W_{xy} \\ W_{xy}^\dagger & W_{yy} \end{pmatrix} = M \begin{pmatrix} \Pi_{xx} & \Pi_{xy} \\ \Pi_{xy}^\dagger & \Pi_{yy} \end{pmatrix}. \tag{34}$$

Two functions of $W$ that we also need in the following are:
1) the 'observed' residual noise PSD $\Pi_0$, which we define from
$$\Pi_0 = \frac{1}{M - r} \times \frac{1}{(W^{-1})_{xx}} \equiv \frac{1}{M - r} \times W_0, \tag{35}$$
with $(W^{-1})_{xx}$ the upper-left $1 \times 1$ block of $W^{-1}$;
2) the 'observed' susceptibility vector
$$\alpha_0 = W_{xy} \cdot W_{yy}^{-1}. \tag{36}$$

In Appendix A, we show that the sample distribution in Eq. (11) can be re-parametrized as:

$$p(W \mid S_{x_0x_0}, \alpha, S_{yy}) \propto \frac{1}{S_{x_0x_0}^{M}} \exp\!\left(-\frac{W_0}{S_{x_0x_0}}\right) \times \exp\!\left(-\frac{(\alpha - \alpha_0)\cdot W_{yy}\cdot(\alpha - \alpha_0)^\dagger}{S_{x_0x_0}}\right) \times \frac{1}{|S_{yy}|^{M}}\,\mathrm{etr}\!\left(-S_{yy}^{-1}W_{yy}\right). \tag{37}$$

Thus, the distribution splits into two independent parts, one depending on $S_{x_0x_0}$ and $\alpha$ but not on $S_{yy}$, and one that only depends on $S_{yy}$. Thus, if one selects a prior of the kind $p(S_{x_0x_0}, \alpha) \times p(S_{yy})$, then the posterior also splits into the product of the joint posterior for $S_{x_0x_0}$ and $\alpha$ with the posterior for $S_{yy}$ alone. The estimate of the latter reduces to the estimate of the CPSD that we have already treated. From now on, we then focus on the estimate of $S_{x_0x_0}$ and $\alpha$ only.

The most realistic, least informative prior for $S_{x_0x_0}$, as for all other PSDs we have met, is again $p(S_{x_0x_0}) \propto 1/S_{x_0x_0}$, independently of the value of $\alpha$.

On the other hand, the components of $\alpha$ are, in the language of statistics, location parameters. If they can be assumed independent of each other, then, for each of them, the least informative prior is just $p(\alpha_i) = 1$.

From the above considerations, it follows that a sound non-informative joint prior for $S_{x_0x_0}$ and $\alpha$ is $p(S_{x_0x_0}, \alpha) = 1/S_{x_0x_0}$, with which their properly normalized joint posterior becomes:

$$p(S_{x_0x_0}, \alpha \mid W) = \frac{W_0^{M-r}}{\Gamma(M - r)\, S_{x_0x_0}^{M-r+1}} \exp\!\left(-\frac{W_0}{S_{x_0x_0}}\right) \times \frac{|W_{yy}|}{(\pi S_{x_0x_0})^{r}} \exp\!\left(-(\alpha - \alpha_0)\cdot\frac{W_{yy}}{S_{x_0x_0}}\cdot(\alpha - \alpha_0)^\dagger\right). \tag{38}$$

This is equivalent to stating that:

$$S_{x_0x_0} \mid W \sim \mathrm{inv}\Gamma(M - r, W_0) \tag{39}$$

and

$$\alpha \mid W, S_{x_0x_0} \sim \mathcal{CN}\!\left(\alpha_0, S_{x_0x_0}W_{yy}^{-1}\right), \tag{40}$$

with $\mathcal{CN}(\alpha_0, S_{x_0x_0}W_{yy}^{-1})$ the complex, circularly symmetric $r$-variate Gaussian distribution, with covariance matrix $S_{x_0x_0}W_{yy}^{-1}$ and mean value $\alpha_0$.

Note that the marginal distribution of $S_{x_0x_0}$ is already given by Eq. (39), while that of $\alpha$ may be obtained by integrating Eq. (38) over $S_{x_0x_0}$. By performing this integration we get that

$$\alpha \sim \mathcal{C}t_r\!\left(\alpha_0, W_0W_{yy}^{-1}, M - r\right), \tag{41}$$

the latter being the complex multivariate t-distribution for an $r$-long complex vector, with mean value $\alpha_0$, scale factor $W_0W_{yy}^{-1}$ and $M - r$ degrees of freedom.

This means that the real and imaginary parts of $\alpha$, recast into the $2r$-long real vector $\alpha_R = \begin{pmatrix} \mathrm{Re}\,\alpha \\ \mathrm{Im}\,\alpha \end{pmatrix}$, follow a joint multivariate Student $t_{2r}$ distribution [18]

$$\alpha_R \sim t_{2r}\!\left(\alpha_{0,R}, \Omega, 2(M - r)\right), \tag{42}$$

with $2(M - r)$ degrees of freedom, mean value $\alpha_{0,R} = \begin{pmatrix} \mathrm{Re}\,\alpha_0 \\ \mathrm{Im}\,\alpha_0 \end{pmatrix}$, and a scale matrix given by

$$\Omega = \frac{\Pi_0}{2}\begin{pmatrix} \mathrm{Re}\,W_{yy}^{-1} & \mathrm{Im}\,W_{yy}^{-1} \\ -\mathrm{Im}\,W_{yy}^{-1} & \mathrm{Re}\,W_{yy}^{-1} \end{pmatrix}. \tag{43}$$

From this joint marginal distribution, we also get the marginal distributions of the single components of $\alpha_R$, which are univariate t distributions with $2(M - r)$ degrees of freedom [18], and scale parameter given by the corresponding diagonal element of $\Omega$.

Note that the covariance of the elements of $\alpha_R$, $\left((M - r)/(M - r - 1)\right)\Omega$, decreases with decreasing $\Pi_0$, the PSD of residuals. This is expected as, for a given value of the total PSD, a small $\Pi_0$ implies a large contribution of the disturbances and then a large signal-to-noise ratio for the components of $\alpha_R$.

It is also straightforward to calculate that $\Omega \propto M^{-1}$, so that this signal-to-noise ratio, as expected, also increases with increasing averaging.

The model discussed so far assumes the disturbances $y_i(t)$ are measured with negligible readout noise. It is therefore important to consider, before concluding this section, the consequences of applying the method when such noise is in reality not negligible.

Let us consider first the estimate of $S_{x_0x_0}$. Our method
in reality estimates $1/\Sigma^{-1}_{11}$, whatever the detailed form of $\Sigma$ is. Indeed, our starting point is $1/W^{-1}_{11}$, whose sampling distribution is $1/W^{-1}_{11} \sim \Gamma(M - r, 1/\Sigma^{-1}_{11})$.

From this, and the distribution in Eq. (20), one can derive the sampling distribution of our Bayesian estimate for $S_{x_0x_0}$. We find that this distribution is $\beta'(M - r, M - r, 1/\Sigma^{-1}_{11,\rm true})$, a distribution whose median is equal to $1/\Sigma^{-1}_{11}$, and whose relative uncertainty only depends on $M - r$. This confirms that the method gives a median-unbiased estimate of $1/\Sigma^{-1}_{11}$.

In the presence of readout noise, $1/\Sigma^{-1}_{11} \ne S_{x_0x_0}$, as the true form of $\Sigma$ is not that in Eq. (29). Indeed, in the simplest model of additive noise, the measured disturbance is $y_i(t) + n_i(t)$, with $n_i(t)$ a zero-mean stationary process independent of all the $y_i$'s. In this case, the lower diagonal block of $\Sigma$ becomes $S_{yy} + S_n$, with $S_n$ a diagonal matrix whose generic element $S_{n_in_i}$ is the PSD of $n_i(t)$.

Working out the formula for $1/\Sigma^{-1}_{11}$ in the general case is a bit cumbersome. It becomes particularly simple if $S_{yy}$ is also diagonal, that is, if the disturbances are mutually uncorrelated. One can readily calculate that in this case

$$1/\Sigma^{-1}_{11} = S_{x_0x_0} + \sum_{i=1}^{r} |\alpha_i|^2\, \frac{S_{y_iy_i}\,S_{n_in_i}}{S_{y_iy_i} + S_{n_in_i}} \tag{44}$$

with $S_{y_iy_i}$ the PSD of $y_i(t)$. One can recognize that in the limit of dominant readout noise $1/\Sigma^{-1}_{11} \to S_{xx}$. In other words, a dominant readout noise, as expected, completely obscures any correlation between $x(t)$ and the $y$'s.

Furthermore, within the same simplification of uncorrelated disturbances, the product $\sum_{j=2}^{p} \Sigma_{1,j}\Sigma^{-1}_{j,k}$, with $k > 1$, which in the noiseless limit is $(S_{xy}\cdot S_{yy}^{-1})_{k-1} = \alpha_{k-1}$, becomes instead

$$\sum_{j=2}^{p} \Sigma_{1,j}\Sigma^{-1}_{j,k} = \alpha_{k-1}\, \frac{S_{y_{k-1}y_{k-1}}}{S_{y_{k-1}y_{k-1}} + S_{n_{k-1}n_{k-1}}}. \tag{45}$$

Thus, in the presence of significant readout noise, our method overestimates the PSD of the residuals, underestimates the absolute value of the susceptibility, and should only be used for an upper limit on $S_{x_0x_0}$.

We have used the approach described in this section to decorrelate the effect of the temperature from the acceleration data series of LPF [3]. To further test its validity, in particular with respect to bias, we have also studied a simulated case. This is discussed in the next section.

B. A test simulation

We have generated a time series $x(t) = x_0(t) + \sum_{i=1}^{3} n_i z_i(t)$, with all series Gaussian and zero-mean, and with $n_i$ three real coefficients. We have also generated the three "observed" disturbances $y_i(t) = h(t) * z_i(t)$, with $h(t)$ the impulse response of a low-pass filter, and with $*$ indicating the time-convolution.

Within this simple model, the susceptibilities become $\alpha_i(f) = n_i/h(f)$, with $h(f)$ the frequency response of the filter, that is, the Fourier transform of $h(t)$. The susceptibilities are then complex, frequency-dependent, and non-causal.

For the simulation, we selected the PSD of the residuals $x_0$ to consist of a $\propto 1/f^2$ low-frequency tail merging into a plateau extending up to some double-pole roll-off. More explicitly⁴ (see Figure 7):

$$S_{x_0}(f) = \left|\frac{1 - e^{-2\pi f_1T}}{1 - e^{-2\pi f_1T}e^{-i2\pi fT}} \cdot \frac{1 - e^{-2\pi f_2T}}{1 - e^{-2\pi f_2T}e^{-i2\pi fT}}\right|^2 + \left|\frac{1 - e^{-T/\tau}e^{-i2\pi f_0T}}{1 - e^{-T/\tau}e^{-i2\pi fT}}\right|^2 \tag{46}$$

with $T = 1$ s the sampling time, $f_1 = 0.10$ Hz and $f_2 = 0.11$ Hz the two roll-off frequencies, and $f_0 = 1$ mHz the cross-over frequency between the tail and the plateau.

⁴ Note that in this section we show and calculate single-sided PSDs, as this is the standard practice. Discussion, results, and susceptibilities are not affected by this choice.

The disturbances are in the form $z_i(t) = c_{lf,i}\,z_{lf,i}(t) + c_{hf,i}\,z_{hf,i}(t) + c_i\,z_0(t)$, where all the time series on the right-hand side are Gaussian, zero-mean and mutually independent, and the coefficients $c_{lf,i}$, $c_{hf,i}$ and $c_i$ are real and randomly selected.

The $z_{lf,i}(t)$ and $z_0(t)$ share the same PSD

$$S_{lf}(f) = \left|\frac{1 - e^{-T/\tau_1}e^{-i2\pi f_0T}}{1 - e^{-T/\tau_1}e^{-i2\pi fT}} \cdot \frac{1 - e^{-T/\tau_2}e^{-i2\pi f_0T}}{1 - e^{-T/\tau_2}e^{-i2\pi fT}}\right|^2 \tag{47}$$

with $\tau_1 = 1.0 \times 10^5$ s and $\tau_2 = 1.1 \times 10^5$ s. For $f \gg 1/\tau_1, 1/\tau_2$ this PSD amounts to a $\propto 1/f^4$ low-frequency tail with unit value at $f = f_0$.

The $z_{hf,i}(t)$ have PSD

$$S_{hf}(f) = \left|\frac{1 - e^{-T/\tau}e^{-i2\pi f_0T}}{1 - e^{-T/\tau}e^{-i2\pi fT}}\right|^2, \tag{48}$$

again a $\propto 1/f^2$ tail with unit value at $f = f_0$. The presence of the shared series $z_0(t)$ induces correlation among the $z$'s, with CPSD $S_{z_iz_j}(f) = c_ic_j\,S_{lf}(f)$.

All PSDs above must be intended to be zero for $|f| \ge 1/(2T)$. With this prescription, they can be read as discrete-time Fourier transforms of the corresponding discrete-time autocorrelation, and their shape allows a straightforward implementation as auto-regressive moving average (ARMA) stochastic processes.

In Figure 7, we illustrate the ASD of the simulated time series.

As for the filter $h(t)$, its transfer function is

$$h(f) = \frac{1 - e^{-2\pi T/\tau_a}}{1 - e^{-2\pi T/\tau_a}e^{-i2\pi Tf}} \cdot \frac{1 - e^{-2\pi T/\tau_b}}{1 - e^{-2\pi T/\tau_b}e^{-i2\pi Tf}} \tag{49}$$

with $\tau_a = 2000$ s and $\tau_b = 2001$ s, and zero for $|f| \ge 1/(2T)$. One can recognize the transfer function of a discrete-time two-pole infinite impulse response low-pass filter, which is easily implemented again as an ARMA filter on the discrete time series of the $z$'s.

We think that all in all this model possesses many features of a realistic situation (complex frequency dependence of the PSD, cross-correlation among disturbances, high data dynamic range, complex susceptibility, etc.) to give a meaningful test of the method.

Figure 7: ASD of time series used in the simulation. The dashed lines represent the 'true values', that is, those used to generate the simulated data, which have been calculated from Eqs. (46) to (48) and from one random extraction of the set of numerical coefficients $n_i$, $c_{lf,i}$, $c_{hf,i}$ and $c_i$. The noisy lines are the averages of the estimated ASD from 100 different simulations generated from the true spectrum above. Time series are sampled with $T = 1$ s and last $5 \times 10^5$ s. Frequency-dependent data partition for periodogram calculation is performed according to the method of Ref. [8]. We used the Nuttall four-coefficient minimal-side-lobe spectral window [19].

Figure 8: Example of noise decorrelation for one sample of the multivariate time series $\{x(t), y_1(t), y_2(t), y_3(t)\}$ with the same set of numerical coefficients used for the data in Figure 7. Top panel. Black data points: ASD of $x(t)$ estimated using the posterior in Eq. (20). Dots represent the medians of the ASD posteriors, while error bars delimit their $\ell(1)$ ($\simeq 68.5\%$ likelihood) equal-tail credible intervals. Red data points: ASD of $x_0(t)$ estimated using the marginal posterior in Eq. (39). Dots and bars have the same meaning as for the black data points. ASDs have been estimated with the Nuttall window, and the frequency separation is such that nearest neighbors may have a linear correlation in the 10-30% range. Correlation between second-nearest neighbors is negligible. Red dashed line: true value from Eq. (46). Lower panel. Black data points: posterior distribution for the multiple coherence coefficient $R^2$ for the same data. The posterior is that in Eq. (26); dots are medians, and bars delimit the $\ell(2)$ ($\simeq 95.5\%$ likelihood) symmetric-tail credible intervals. Red dashed line: corresponding true value for $R^2$, calculated as $1 - S_{x_0}(f)/S_x(f)$, with $S_x(f)$ the true spectral density of $x(t)$.

In Figure 8 we show the result of the decorrelation on one example of a $5 \times 10^5$ s multivariate time series generated as described above.

The figure clearly shows that the method, at least for this example, gives an unbiased estimate of the ASD of the residuals. The ASD indeed fluctuates, within the uncertainties predicted by the posterior in Eq. (20), around the true value in Eq. (46). The figure also shows, for reference, the estimate of the multiple coherence coefficient $R^2$ from the posterior in Eq. (27). The plot indicates that at $f \simeq 1$ mHz, where $M \simeq 30$, the method allows one to detect a $\simeq 10\%$ contribution of the disturbances to the total PSD.

In Figure 9 we show the estimate of the susceptibility $\alpha_1(f)$ for the same set of data used for Figure 8, and compare it with the true value $n_1/h(f)$, with $h(f)$ from Eq. (49). The figure is limited to $f \lesssim 1$ mHz, as above that frequency the susceptibility is in practice compatible with $\alpha_1 = 0$, as expected from the fact that the contribution of the disturbances to the total power becomes undetectable. Within this frequency range the estimate appears unbiased and in agreement with the true value within the uncertainties predicted by the proper marginal t-distribution.

C. The case of real frequency-independent susceptibilities

In many practical circumstances, one can safely assume that $\alpha$ is a real frequency-independent vector. If this is the case, $\alpha$ becomes a common parameter in the sampling distribution of the $W$'s at all frequencies. Thus, to build up a posterior for $\alpha$, one needs to consider the joint likelihood of all the $W$'s for a given value of $\alpha$. We anticipate that, in this case, we do not get a closed-form posterior distribution,
Acknowledgments

This work has been supported in part by Agenzia Spaziale Italiana (ASI), Project No. 2017-29-H.1-2020 “Attività per la fase A della missione LISA”, and Project No. 2024-36-HH.0-2024 “Attività per la fase B2/C della missione LISA”. The authors would like to acknowledge various useful discussions with all the members of the Trento LISA group.

Figure 11: The joint posterior for the three susceptibilities $\alpha_1$, $\alpha_2$, and $\alpha_3$. The red surface delimits a credible region with $\simeq \ell(1)$ likelihood, while the cyan, semi-transparent surface delimits a credible region with $\simeq \ell(2)$ likelihood. The green axes cross at the true value $\alpha = n$, with $n$ the vector with components $n_i$ used in the simulation.
V. Conclusions
In conclusion, we have presented a set of Bayesian low-
bias closed-form posteriors—based on simple and physically
meaningful priors—for the most commonly estimated
quantities at a given frequency in the spectral analysis
of multivariate time series, and in particular in noise
projection of physical instruments.
The distributions of some of these posteriors are available
within the main software platforms, which makes the calculation
of credible intervals and other statistical quantities
particularly simple. For the others, we give the explicit form
of the PDF that can be used to numerically calculate the
relevant statistical quantities. For the reader’s convenience,
these posteriors are summarized in Table I.
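As an illustration of the preceding remark, the sketch below evaluates equal-tail credible intervals with SciPy for the PSD posterior $S \mid \Pi \sim \mathrm{inv}\Gamma(M, M\Pi)$ (Figure 2), for $S_{x_0x_0}$ (Eq. (39)), and for the real and imaginary parts of each susceptibility (marginals of Eqs. (42) and (43)). It assumes that `scipy.stats.invgamma(a, scale=b)` matches the $\mathrm{inv}\Gamma(a, b)$ convention used here (density $\propto S^{-a-1}e^{-b/S}$), and it reads the diagonal elements of $\Omega$ as squared scale parameters. The function names and the input matrix are hypothetical; the input is set to the expected value $W = M\Sigma$ of an invented configuration, for which $\alpha_0$ reproduces $\alpha$ exactly.

```python
import numpy as np
from scipy import stats

def psd_interval(Pi, M, level=0.6827):
    """(low, median, high) of S | Pi ~ invGamma(M, M*Pi), equal-tail."""
    q = [(1 - level) / 2, 0.5, (1 + level) / 2]
    return stats.invgamma.ppf(q, M, scale=M * Pi)

def noise_projection_summary(W, M, level=0.6827):
    """Closed-form noise-projection summaries at one frequency.

    W : Hermitian (r+1)x(r+1) matrix M*Pi, ordered as (x, y_1, ..., y_r).
    Returns alpha_0 (Eq. 36) and (low, median, high) for S_x0x0 (Eq. 39)
    and for Re/Im of each alpha_i (univariate marginals of Eqs. 42-43).
    """
    W = np.asarray(W, dtype=complex)
    r = W.shape[0] - 1
    q = [(1 - level) / 2, 0.5, (1 + level) / 2]
    Wyy_inv = np.linalg.inv(W[1:, 1:])
    alpha0 = W[0, 1:] @ Wyy_inv                     # Eq. (36)
    W0 = 1.0 / np.linalg.inv(W)[0, 0].real          # W_0 of Eq. (35)
    Pi0 = W0 / (M - r)
    S_x0 = stats.invgamma.ppf(q, M - r, scale=W0)   # Eq. (39)
    # t marginals with 2(M - r) dof; diagonal of Omega (Eq. 43) taken as squared scale
    scale = np.sqrt(0.5 * Pi0 * np.diag(Wyy_inv).real)
    t_q = stats.t.ppf(q, df=2 * (M - r))
    re = alpha0.real[:, None] + scale[:, None] * t_q
    im = alpha0.imag[:, None] + scale[:, None] * t_q
    return {"alpha0": alpha0, "S_x0": S_x0, "Re_alpha": re, "Im_alpha": im}

# Hypothetical configuration with r = 2 uncorrelated disturbances
M = 20
alpha = np.array([0.5 + 0.2j, -0.3j])
S_yy = np.diag([1.0, 4.0]).astype(complex)
S_x0 = 0.1
Sxy = alpha @ S_yy
Sxx = S_x0 + (alpha @ S_yy @ alpha.conj()).real
Sigma = np.block([[np.array([[Sxx]]), Sxy[None, :]],
                  [Sxy.conj()[:, None], S_yy]])

out = noise_projection_summary(M * Sigma, M)
print(out["alpha0"])    # reproduces alpha up to rounding for this input
print(out["S_x0"])
print(psd_interval(Pi=2.0, M=5))
```

For real data, `W` would instead be $M$ times the averaged matrix $\Pi[k]$ of Eq. (6) at the frequency of interest.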
For the case of noise projection, we have shown with
simulations that the method is capable of retrieving,
with negligible bias, a residual whose ASD is orders of
magnitude smaller than the part due to the measured
disturbances, and we have also investigated the robustness
of the method in the presence of readout noise in the
disturbance measurement.
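The readout-noise behavior summarized above (Section IV-A, Eq. (44)) can be reproduced with a short Monte Carlo at a single frequency bin. Instead of simulating time series, the sketch draws $W \sim \mathcal{CW}(\Sigma, M)$ directly as a sum of $M$ outer products of circular complex Gaussian vectors, and compares the median over trials of the posterior median of $S_{x_0x_0}$ from Eq. (39) with $1/\Sigma^{-1}_{11}$ and with Eq. (44). All numerical values (susceptibilities, PSDs, $M$, number of trials) are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
M, r, n_trials = 20, 2, 4000
alpha = np.array([0.5 + 0.2j, -0.3j])   # hypothetical susceptibilities
S_y = np.array([1.0, 4.0])              # PSDs of uncorrelated disturbances
S_x0 = 0.1                              # residual PSD

def measured_sigma(S_n):
    """CPSD matrix of (x, y + n) for x = alpha . y + x0 and readout-noise PSDs S_n."""
    Sigma = np.zeros((r + 1, r + 1), dtype=complex)
    Sigma[0, 0] = S_x0 + np.sum(np.abs(alpha) ** 2 * S_y)
    Sigma[0, 1:] = alpha * S_y
    Sigma[1:, 0] = np.conj(alpha * S_y)
    Sigma[1:, 1:] = np.diag(S_y + S_n)
    return Sigma

def posterior_medians(Sigma):
    """Posterior medians of S_x0x0 ~ invGamma(M - r, W0) for n_trials draws W ~ CW(Sigma, M)."""
    L = np.linalg.cholesky(Sigma)
    g = (rng.standard_normal((n_trials, M, r + 1))
         + 1j * rng.standard_normal((n_trials, M, r + 1))) / np.sqrt(2)
    Z = g @ L.T                                    # each row has covariance Sigma
    W = np.einsum("tmi,tmj->tij", Z, Z.conj())     # W = sum_l z_l z_l^H
    W0 = 1.0 / np.linalg.inv(W)[:, 0, 0].real      # Eq. (35)
    return stats.invgamma.ppf(0.5, M - r, scale=W0)

results = {}
for label, S_n in (("noiseless", np.zeros(r)), ("readout noise", np.full(r, 0.5))):
    Sigma = measured_sigma(S_n)
    target = 1.0 / np.linalg.inv(Sigma)[0, 0].real                     # 1/Sigma^-1_11
    eq44 = S_x0 + np.sum(np.abs(alpha) ** 2 * S_y * S_n / (S_y + S_n))  # Eq. (44)
    est = np.median(posterior_medians(Sigma))
    results[label] = (est, target, eq44)
    print(f"{label:14s} median estimate {est:.4f}  "
          f"1/Sigma^-1_11 {target:.4f}  Eq. (44) {eq44:.4f}")
```

Without readout noise the estimate tracks $S_{x_0x_0}$; with readout noise it tracks the larger value of Eq. (44), consistent with the statement that the method then provides only an upper limit on $S_{x_0x_0}$.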
Multiple coherence | $R^2$ | – | $p(R^2 \mid \hat R^2) = \dfrac{(M+1)(1-R^2)^M\,{}_2F_1(M, M; p-1; \hat R^2 R^2)}{{}_3F_2(1, M, M; M+2, p-1; \hat R^2)}$ | Sect. III-D
Noise Projection | | | |

Table I: Summary of closed-form posteriors presented in this paper. For the meaning of the symbols, please refer to the section indicated in the rightmost column.
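The multiple-coherence row of Table I can be checked numerically. Integrating the ${}_2F_1$ series term by term shows that ${}_3F_2(1, M, M; M+2, p-1; \hat R^2)/(M+1)$ is exactly $\int_0^1 (1-R^2)^M\,{}_2F_1(M, M; p-1; \hat R^2R^2)\,dR^2$, so the row corresponds to the likelihood of Eq. (16) under a flat prior on $R^2 \in [0, 1]$. The sketch below verifies that this PDF integrates to one. SciPy provides `scipy.special.hyp2f1` but, as far as we know, no ${}_3F_2$, so a plain power series (valid for $\hat R^2 < 1$) is used; `mpmath.hyp3f2` would be an alternative. The function names are hypothetical.

```python
import numpy as np
from scipy import integrate, special

def hyp3f2(a, b, c, d, e, z, rtol=1e-15, max_terms=100_000):
    """Power series for the generalized hypergeometric 3F2(a, b, c; d, e; z), |z| < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        # ratio of consecutive terms of the 3F2 series
        term *= (a + k) * (b + k) * (c + k) / ((d + k) * (e + k) * (k + 1)) * z
        total += term
        if abs(term) < rtol * abs(total):
            break
    return total

def multiple_coherence_posterior(R2, R2_hat, M, p):
    """Posterior PDF of R^2 given the sample multiple coherence R2_hat (Table I row)."""
    norm = hyp3f2(1.0, M, M, M + 2.0, p - 1.0, R2_hat)
    return (M + 1) * (1 - R2) ** M * special.hyp2f1(M, M, p - 1, R2_hat * R2) / norm

for M, p, R2_hat in [(10, 3, 0.5), (5, 2, 0.2), (15, 4, 0.6)]:
    total, _ = integrate.quad(multiple_coherence_posterior, 0.0, 1.0, args=(R2_hat, M, p))
    print(M, p, R2_hat, round(total, 6))   # should be close to 1 if the normalization holds
```

The same function can be integrated numerically to obtain medians and equal-tail credible intervals for $R^2$, as plotted in the lower panel of Figure 8.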
References
[1] P. D. Welch, “The use of fast Fourier transform for the esti-
mation of power spectra: a method based on time averaging
over short, modified periodograms,” IEEE Trans. Audio and
Electroacoustics, vol. 15, no. 2, pp. 70–73, 1967.
[2] A. Papoulis and S. Pillai, Probability, Random Variables, and
Stochastic Processes, ser. McGraw-Hill series in electrical engi-
neering: Communications and signal processing. Tata McGraw-
Hill, 2002.
[3] M. Armano et al., “In-depth analysis of LISA Pathfinder perfor-
mance results: Time evolution, noise projection, physical models,
and implications for LISA,” Phys. Rev. D, vol. 110, p. 042004,
Aug 2024.
[4] ——, “Beyond the Required LISA Free-Fall Performance: New
LISA Pathfinder Results down to 20 µHz,” Phys. Rev. Lett., vol.
120, p. 061101, Feb. 2018.
[5] ——, “Sub-Femto-g Free Fall for Space-Based Gravitational
Wave Observatories: LISA Pathfinder Results,” Phys. Rev. Lett.,
vol. 116, p. 231101, 2016.