Precision Spectral Estimation at Sub-Hz Frequencies: Closed-Form Posteriors and Bayesian Noise Projection

The document presents a Bayesian method for estimating spectral quantities in multivariate Gaussian time series, particularly suitable for very-low-frequency data. It addresses limitations of frequentist approaches in spectral estimation, especially when the number of segments is small, and provides closed-form posteriors for power spectral densities and coherence measures. The method was developed for analyzing data from the LISA Pathfinder mission, aiming to accurately measure noise spectra at low frequencies.


Precision spectral estimation at sub-Hz frequencies: closed-form posteriors and Bayesian noise projection

Lorenzo Sala¹,², Stefano Vitale¹
¹ Department of Physics, University of Trento, I-38123 Trento, Italy
² Trento Institute for Fundamental Physics and Applications, TIFPA/INFN, I-38123 Trento, Italy

arXiv:2507.20846v1 [astro-ph.IM] 28 Jul 2025

Abstract—We present a Bayesian method for estimating spectral quantities in multivariate Gaussian time series. The approach, based on periodograms and Wishart statistics, yields closed-form expressions at any given frequency for the marginal posterior distributions of the individual power spectral densities, the pairwise coherence, and the multiple coherence, as well as for the joint posterior distribution of the full cross-spectral density matrix. In the context of noise projection—where one series is modeled as a linear combination of filtered versions of the others, plus a background component—the method also provides closed-form posteriors for both the susceptibilities, i.e., the filter transfer functions, and the power spectral density of the background. Originally developed for the analysis of the data from the European Space Agency's LISA Pathfinder mission, the method is particularly well-suited to very-low-frequency data, where long observation times preclude averaging over large sets of periodograms, which would otherwise allow these to be treated as approximately normally distributed.

Index Terms—Signal processing, Spectral analysis, Spectral estimation, Time series decorrelation.

I. Introduction

Spectral analysis using Welch's method [1], [2] is the most common approach to noise characterization. This method estimates power spectral densities (PSD) and cross-spectral densities (CPSD) of multivariate noise time series from their properly normalized discrete Fourier transforms, known as periodograms.

To enhance estimate precision and quantify uncertainty, Welch's method divides the time series into M equal-length, possibly overlapping segments, generating M periodogram samples. A common frequentist approach estimates the PSD and CPSD by averaging these samples, with uncertainty proportional to the standard deviation of the mean. This relies on the central limit theorem, assuming that averaging rapidly yields Gaussian statistics suitable for confidence level predictions.

However, spectral resolution decreases with M, and in many applications, particularly those at very low frequency, M must remain small, sometimes even M = 1. As M decreases, the Gaussianity assumption becomes increasingly inaccurate, leading, for instance, to paradoxical outcomes such as non-negligible probabilities for negative PSD values. At M = 1, the frequentist approach becomes entirely infeasible. In such cases, Bayesian inference, free from these limitations, remains the only viable approach.

We encountered this situation while analyzing data from the LISA Pathfinder mission [3], [4], [5]. The mission's objective was to precisely measure the noise spectrum of force disturbances acting on two nominally freely falling test masses in space, reaching acceleration levels as low as a few fm s⁻²/Hz^{1/2} and frequencies down to approximately 20 µHz.

Unlike typical spectral estimation applications, where the goal is to extract a signal from a noisy stochastic process, LISA Pathfinder aimed to measure the noise itself with the highest possible accuracy, particularly at the lowest achievable frequencies.

While reviewing the literature for a consistent and practical Bayesian approach suited to our needs, we found that the fundamental principles had long been established. However, we could not find a detailed, practical method applicable to our data processing, leading us to develop one independently.

We applied this method to the data analysis of LISA Pathfinder in Refs. [3], [6], and briefly summarized its main features in the appendices of the second paper. Here, we provide a detailed description of the method, discussing its foundations, deriving key procedures rigorously, and presenting quantitative evidence of its validity through numerical simulations.

The paper is organized as follows: in Section II we define the key experimental quantities of multi-variate time series and derive their likelihood under the Gaussian data hypothesis; in Section III we build the Bayesian posteriors for all the related spectral quantities; in Section IV we discuss the case of noise projection, that is the case where one series is modeled as a linear combination of filtered versions of the others, plus a background component of which one wants to estimate the spectrum; finally in Section V we give some concluding remarks.

Contact information: [email protected], [email protected]
Preprint July 28, 2025

II. Periodograms, key periodogram functions and their likelihood

In this section, we recall a few basic concepts and results that we will use in our Bayesian inference method. We assume that data are Gaussian for the purpose of our analysis, something we found to be quantitatively true in the case of LISA Pathfinder [3].

A. Basic definitions and nomenclature

In our approach, we assume that we have acquired, synchronously and with sampling time T, the time series of p real, stationary, Gaussian, zero-mean stochastic processes. We call x_i[n], with 1 ≤ i ≤ p, the sample of the i-th series taken at time t = nT.

The joint statistics of these samples is fully contained in the mean values of their products

⟨x_i[n] x_j[m]⟩ = ∫_{−∞}^{∞} Σ_{i,j}(f) e^{i2π(m−n)fT} df,  (1)

and then in the Hermitian positive definite matrix Σ(f), with elements Σ_{i,j}(f), which, by definition, is the joint two-sided power cross-spectral density (CPSD) matrix of the p stochastic processes at frequency f. Note that the diagonal element Σ_{i,i}(f) is the power spectral density (PSD) of the process x_i(t), while the off-diagonal Σ_{i,j}(f) is the pair CPSD of x_i(t) and x_j(t).

Our goal is to infer each element of Σ(f) independently at each frequency, without assuming any functional dependence on f.

The main tool for such an inference is the periodogram X_i[k] calculated over an N-long segment of the multivariate time series. X_i[k] is defined as:

X_i[k] = √(T/N) Σ_{n=0}^{N−1} x_i[n] w[n] e^{−2πikn/N}  (2)

with 0 ≤ k ≤ N − 1 an integer, and w[n] the coefficients of a suitable tapering window. Since X_i[N − k] = X_i*[k], only the first ⌊N/2⌋ + 1 of these coefficients carry independent information.

X_i[k] is Gaussian, complex, and zero-mean. Key for the inference are the complex mean values:

⟨X_i[k] X_j*[k′]⟩ = (T/N) ∫_{−∞}^{∞} Σ_{i,j}(f) w̃(2πk/N − 2πfT) w̃*(2πk′/N − 2πfT) df  (3)

with w̃(ϕ) = Σ_{n=0}^{N−1} w[n] e^{−iϕn} the Fourier sequence transform of the tapering window w[n].

If, ideally, one could choose w[n] such that

w̃(2πk/N − 2πfT) w̃*(2πk′/N − 2πfT) = 2πN δ_{kk′} δ(2πk/N − 2πfT),  (4)

with δ_{kk′} the Kronecker delta of k and k′, and δ(ϕ) the Dirac delta of ϕ, then X_i[k] X_j*[k] would be an unbiased estimator of Σ_{ij}(f = k/(NT)), and X_i[k] would be independent of X_j*[k′] if k ≠ k′.

In reality, w̃(ϕ) is a 2π-periodic function with a central lobe at ϕ = 0, and a sequence of strongly suppressed side lobes. We assume that the aliasing deriving from the periodicity of w̃(ϕ) has been made negligible by properly choosing a short enough sampling time T.

The width of the central lobe depends on the choice of the window, but is always of the form ±m(2π/N), with m a small integer. Thus, if |k − k′| > m, then X_i[k] and X_j*[k′] may be treated as independent within a reasonable accuracy. Then from Eq. (3),

⟨X_i[k] X_j*[k]⟩ ≃ ∫_{−1/(2T)}^{1/(2T)} Σ_{i,j}(f) G(f − k/(NT)) df  (5)

with G(f) = (T/N) |w̃(2πfT)|². Note that w[n] is always normalised such that ∫_{−1/(2T)}^{1/(2T)} G(f − k/(NT)) df = 1.

Thus, X_i[k] X_j*[k] is an estimator of Σ_{i,j}(f) averaged over the band f = (k ± m)/(NT). From now on, with Σ(f), unless otherwise specified, we indicate this averaged version of the CPSD matrix.

As usual in statistics, the precision of the estimator can be increased by averaging over repeated measurements. To this aim, Welch's method prescribes to split the available time series into M segments¹, each of length N, average over them, and use the observed CPSD matrix Π[k], with elements

Π_{ij}[k] = (1/M) Σ_{ℓ=1}^{M} X_{i,(ℓ)}[k] X_{j,(ℓ)}*[k],  (6)

as an estimator of Σ(f = k/(NT)).

Thus, in summary, the matrices Π[k], with k ∈ [0, 2m, 4m, 6m, ..., ⌊N/4m⌋], are independent estimators of the matrices Σ(f) with f ∈ [0, 2m, 4m, 6m, ..., ⌊N/4m⌋]/(NT), with a spectral resolution of ±m/(NT).

One can show that Π[k], which is Hermitian, is positive definite only if M ≥ p. As the positive definiteness is mandatory for an estimator of Σ(f = k/(NT)), the minimum number of periodograms one should average on is p.

Some common applications deviate from the spectral estimator with evenly spaced frequencies described above [3], [7], [8]. In those applications, at each frequency of interest f, one adjusts M, and then N, according to some averaging optimization criterion, and picks just one matrix Π[k], with k selected such that f = k/(NT). Π[k] is then an estimate of the CPSD at the frequency f. This procedure is repeated at each frequency of interest.

As N depends on the frequency, the width of the spectral window is no longer frequency-independent, as it is in the case of uniform spacing. As a consequence, the independence of the CPSD estimators at different

¹ These segments do not need to be disjoint. It has been shown that, as the window w[n] tapers their ends, some overlap between adjoining segments does not significantly change the statistical properties of the derived quantities with respect to the case of disjoint segments [1].

frequencies must, in principle, be established frequency by frequency. An assessment of the methods in Refs. [3] and [8] shows that, for at least some choices of parameters, the estimators at different frequencies can be considered practically independent. We will use these methods in several examples presented in the remainder of the paper.

Some functions of Π[k] that we will also use in the following are:

• the measured magnitude-squared cross-coherence (MSC) between the i-th and j-th time series,

|ρ̂_{ij}[k]|² = |Π_{ij}[k]|² / (Π_{ii}[k] Π_{jj}[k]);  (7)

a useful diagnostics of the possible linear correlation between the two underlying processes;

• the multiple coherence [9], a useful generalization of |ρ̂_{ij}[k]|² to the case of multiple series,

R̂²[k] = 1 − (Π_{11}[k] (Π⁻¹)_{11}[k])⁻¹,  (8)

with (Π⁻¹)_{i,j} the elements of the inverse Π⁻¹ of Π. This is used as a diagnostic of how much of the noise power in x_1[n] is due to its correlation to the remaining series;

• the Schur complement of any sub-block C in the decomposition of the Hermitian matrix Π[k] as

Π[k] = [A, B; B†, C].  (9)

Here A is a q × q matrix, C is r × r, B is q × r, and q + r = p. The Schur complement of the block C in Π[k] is:

Π[k]/C ≡ A − B C⁻¹ B†.  (10)

We discuss in the following section the sampling distributions of all these quantities.

B. Sampling distribution of the CPSD matrix

Reference [9] shows that the joint sampling distribution of the elements of the matrix W = MΠ, conditional to the theoretical CPSD matrix Σ, is a complex Wishart distribution, with probability density function (PDF):

p(W | Σ, M) = |W|^{M−p} / (Γ̃_p(M) |Σ|^M) etr(−Σ⁻¹ W)  (11)

Here, |·| is the determinant, etr the exponential trace etr(·) = exp(tr(·)), and Γ̃_p(M) is the multivariate complex Gamma function:

Γ̃_p(M) = π^{p(p−1)/2} ∏_{i=1}^{p} Γ(M − i + 1).

Note that we have dropped, for clarity, the explicit dependence of all quantities on frequency.

We denote this distribution² with CW(Σ, M). As expected, CW(Σ, M) is defined only if M ≥ p, that is, if W is positive definite. Thus, W ∼ CW(Σ, M).

We use the conditional probability in Eq. (11) as the likelihood function for the inference of Σ.

C. Sampling distribution of derived quantities

The complex Wishart distribution describes the joint probability of all the elements of the matrix W. Starting from that, and employing its mathematical properties [9], [10], we give the sampling distributions of some derived quantities that we will use as likelihood functions for the Bayesian inference of the relative quantities.

a) Power spectral density: For p = 1, that is, in case of a single univariate stochastic process, calling Π the only element of Π, and S the PSD of the process and only element of Σ, the PDF in Eq. (11) reduces to

p(MΠ | S) = (MΠ)^{M−1} / (Γ(M) S^M) e^{−MΠ/S}.  (12)

This means that Π ∼ Γ(M, S/M), with Γ(M, S/M) the Gamma distribution with shape parameter M and scale parameter S/M. Equivalently, Eq. (12) implies that 2MΠ/S is chi-square distributed with 2M degrees of freedom. Note that this result is also obtained by calculating the marginal distribution of any of the diagonal elements of Π from the joint PDF in Eq. (11).

b) Magnitude squared coherence: Defining the theoretical ρ_{ij},

ρ_{ij} = Σ_{i,j} / (Σ_{i,i}^{1/2} Σ_{j,j}^{1/2}),  (13)

the sampling distribution of the MSC, |ρ̂_{ij}|², is [9], [11]:

p(|ρ̂|² | |ρ|²) = (M − 1)(1 − |ρ̂|²)^{M−2} (1 − |ρ|²)^M ₂F₁(M, M, 1, |ρ̂|²|ρ|²)  (14)

where ₂F₁ represents Gauss' hypergeometric function. Note that this distribution only holds for M > 1 as, notoriously, when M = 1, |ρ̂_{ij}|² = 1 holds exactly. More in general, for low values of M, p(|ρ̂|² | |ρ|²) carries a significant bias toward |ρ̂|² > |ρ|².

c) Multiple coherence: Similarly to the case of MSC, defining the theoretical R²:

R² = 1 − (Σ_{11} (Σ⁻¹)_{11})⁻¹  (15)

with (Σ⁻¹)_{i,j} the elements of Σ⁻¹, the PDF of the sample multiple coherence R̂² is [9]:

p(R̂² | R²) = Γ(M) / (Γ(p − 1) Γ(M − p + 1)) (R̂²)^{p−2} (1 − R̂²)^{M−p} (1 − R²)^M ₂F₁(M, M, p − 1, R̂²R²)  (16)

with M ≥ p. In the 2-D case, the multiple coherence and the MSC coincide.

d) Schur complement: finally, if W is decomposed as:

W = [A, B; B†, C]  (17)

with C r × r, then the Schur complement of C,

W/C ≡ A − B C⁻¹ B†,  (18)

is distributed as CW(Σ/S, M − r), where Σ/S is the Schur complement of the corresponding r × r block in Σ [12], [13, p. 539]. This result holds only if M > r.

² We use the symbolic expression a ∼ A to indicate that a random variable a is distributed according to a distribution A.

is distributed as CW(Σ/S, M −r), where Σ/S is the Schur


complement of the r × r block in Σ [12],[13, p. 539]. This
result holds only if M > r.

III. Bayesian inference for spectral quantities

We now use the results from the previous section to


perform Bayesian inference of the theoretical distribution
underlying a set of observed spectral quantities.
Our starting point is the likelihood in Eq. (11), which,
when multiplied by an appropriate prior distribution p(Σ), Figure 1: Cumulative density function cdf of the posterior
yields the Bayesian posterior for the theoretical CPSD predictive distribution of a future observation Π̃, con-
matrix Σ. Since Σ is the only free parameter in the sample ditional on the past one Π, for the three prior options
distribution, this posterior fully captures the statistical discussed in the text: flat, Jeffreys, and 1/S 2 . The function
information of the stochastic processes under investigation. is calculated at Π̃ = Π and plotted as a function of the
The key step in this approach is selecting a suitable prior. number of averaged periodograms M .
Before addressing the general case for p > 1, we begin with
the simpler, yet illuminating case of p = 1, the inference
of the PSD of a single stochastic process, which provides tion of a further observation Π̃, conditional on the past
valuable insight for the general case. observation Π, the PDF of which is, by definition:
Z ∞
p(Π̃|Π) = p(Π̃|S)p(S|Π)dS. (22)
A. Inference of the PSD for a single stochastic process 0

The calculation gives Π̃|Π ∼ β ′ M, M̃ , 1, Π with β ′ the



When p = 1, Eq. (11) becomes Eq. (12), and to build a
posterior for the PSD S we need a prior p(S). beta prime distribution. The integer M̃ is M̃ = M − 1,
M̃ = M , M̃ = M + 1 for the flat, Jeffreys and 1/S 2 priors
We have considered three options.
respectively.
1) The uniform, non-informative prior p(S) = Θ(S), It seems reasonable that a posterior with minimum
with Θ(S) the Heaviside theta function. With this bias should assign equal or similar probabilities to future
choice, the posterior distribution of S conditional on observations larger than the past observation, Π̃ ≥ Π,
the observation of Π is and to those smaller Π̃ ≤ Π. This means that the
S|Π ∼ invΓ(M − 1, M Π) (19) cumulative distribution function (cdf) c(Π̃|Π) should obey
c(Π̃ = Π|Π) ≃ 1. In Figure 1, we plot c(Π̃ = Π|Π) as a
with invΓ the inverse gamma distribution. This poste- function of M for the three different priors. The figure
rior is only defined for M > 1. clearly shows that, within this definition of bias, the only
2) Jeffreys non-informative prior [14]. Calculating the unbiased choice is the Jeffreys prior.
Fisher information I(S) from Eq. (12),
p as prescribed In conclusion, given that the Jeffreys prior is unbiased,
by Jeffreys formula, we get p(S) ∝ I(S) ∝ 1/S, for defined down to M = 1, invariant under reparametrization,
S ≥ 0. and based on a very realistic assumption about the lack
As p(log(s)) = S × p(S), the Jeffreys prior is uniform of prior knowledge on the order of magnitude of S, we
as a function of log(s) and corresponds then to a definitely adopt it as the preferred choice.
complete lack of prior knowledge even on the order of As a consequence, we adopt the posterior for S in Eq. (20).
magnitude of S, a rather realistic description of the A plot of the PDF of this posterior for a few choices of M
situation in most cases of noise calibration. is shown in Figure 2.
Note that the main property of the Jeffreys prior is the Note that at low values of M the PDF is rather skew,
invariance under re-parametrization. Thus, the switch with rather asymmetric equal probability tails around the
S → log(S) does not change the prior probability of median. This shows that a naive use of Gaussian statistics
an event. may become highly inaccurate.
With the Jeffreys prior: To further illustrate this point, we compare the predic-
tions of the posterior in Eq. (20) to those of the simplified,
S|Π ∼ invΓ(M, M Π) (20)
Gaussian-based frequentist method,  which defines
√  equal-
3) For comparison we have also considered a prior p(S) = tail credible intervals for S as S ∈ Π ± ksΠ / M , where
1/S 2 that yields the posterior sΠ is the periodogram sample standard deviation. The
parameter k determines the likelihood ℓ(k) of the interval.
S|Π ∼ invΓ(M + 1, M Π) (21)
Since this method is based on Gaussian statistics, it yields
The three posteriors above carry some bias. To quantify, ℓ(1) ≈ 0.68, ℓ(2) ≈ 0.95, and ℓ(3) ≈ 0.997.
it is useful to calculate the posterior predictive distribu- We also include in the comparison a variant of this

Figure 2: Plot of p(S|Π) for S|Π ∼ invΓ(M, MΠ), for a few different values of M. For the sake of clarity, the PDF is for the ratio S/Π.

Figure 3: Comparison of the PSD prediction accuracy of the direct frequentist method, its Student-t variant, and the Bayesian posterior in Eq. (20). The plots show the probability p_miss that the prediction misses the true value, divided by the estimated likelihood of that same event. For each method the simulation has been repeated for equal-tail credible intervals with likelihood ℓ(1) ≈ 0.68 (1σ), ℓ(2) ≈ 0.95 (2σ), and ℓ(3) ≈ 0.997 (3σ), and as a function of the number M of periodograms in the available sample. The plots for the Bayesian case are barely distinguishable as they are all superimposed on each other at ≃ 1, regardless of the value of M.

method that accounts for the fact that, in Gaussian statistics, t = (S − Π)/(s_Π/√M) follows a Student's t-distribution with M − 1 degrees of freedom. Thus, in the calculation of the equal-tail credible intervals, it would be more accurate to replace −k by the (1 − ℓ)/2-quantile of the Student-t distribution with M − 1 degrees of freedom, and +k by the (1 + ℓ)/2-quantile.

To perform the comparison, we have done a simulation. Each trial of this simulation consists of the following steps.

• We extract M samples Π_i from a Γ(1, 1) distribution, thus simulating M periodograms of a process with true PSD S_true = 1.

• From the samples above, we calculate the sample mean Π and standard deviation s_Π. From these we calculate the credible intervals with likelihoods ℓ(1), ℓ(2) and ℓ(3). We do this for all three methods: the direct frequentist method, the Student-t variant, and the Bayesian posterior in Eq. (20).

• We check which, if any, of these 9 intervals contains the true value S_true = 1.

By performing a large number of trials, we estimate the probability p_miss that the true value S = 1 is not included within each estimated credible interval, and we compare it with the estimated likelihood of this same event, ℓ_miss = 1 − ℓ. A consistent estimator should have p_miss/ℓ_miss ≃ 1. The results of this simulation are shown in Figure 3.

The figure clearly shows that while the Bayesian estimate is consistent and unbiased, the frequentist method may have a probability of missing the true value significantly exceeding the estimated likelihood. The effect increases at low M and may become rather large for tails beyond the ℓ(1) threshold even at M ≃ 50.

The effect is due to the fact that the use of Gaussian statistics predicts a credible interval significantly narrower than that predicted by the correct invΓ one. Thus, for the same random sample, the true value may belong to the latter, but fall outside the former.

B. Inference of the entire CPSD matrix

In the general case p > 1, some difficulty with the choice of the proper prior for Σ makes the spectral inference more complex.

Let us start with the basic choice p(Σ) = 1 on the positive definite complex matrices domain. With such a choice

Σ | W ∼ CW⁻¹(W, M − p)  (23)

with CW⁻¹ the complex inverse Wishart distribution [15]. This distribution has a few problems. First, it carries some bias. Though defining what bias is for a matrix distribution may be difficult, it is worth inspecting the posterior predictive distribution of a future observation W̃ conditional on the observation of W.

We find that W⁻¹ · W̃ ∼ CB^{II}_p(M, M − p), with CB^{II}_p(a, b) the matrix-variate type-2 complex Beta distribution [16].

Ref. [16] shows that ⟨W⁻¹ · W̃⟩ = I_p M/(M − 2p), with I_p the p × p identity matrix, while for an unbiased estimation one would expect ⟨W⁻¹ · W̃⟩ = I_p. Note that this mean value bias depends on the number of series considered together and becomes infinite when M = 2p.

That such dependence of the bias on p is paradoxical is well illustrated by the marginal distribution of the diagonal elements. Indeed, the marginal distribution of Σ_ii, that is the estimate of the PSD of the i-th time series S_i, can be calculated [15] to be S_i | Π_ii = Σ_ii | Π_ii ∼ invΓ(M − 2p + 1, MΠ_ii), a distribution only defined for M ≥ 2p, and different from that one gets by considering the i-th series alone, S_i | Π ∼ invΓ(M − 1, MΠ_ii).

Thus, just assuming there are other p − 1 series that may be correlated with the one under study would change the inferred posterior for S_i, and would induce a bias increasing with p.

The situation is slightly better for the Jeffreys prior.

This can be calculated to be [17] p(Σ) = |Σ|^{−p}, yielding

Σ | W ∼ CW⁻¹(W, M).  (24)

With this choice, ⟨W⁻¹ · W̃⟩ = I_p M/(M − p). Still S_i | Π = Σ_ii | Π ∼ invΓ(M − p + 1, MΠ_ii), instead of the almost unbiased posterior S_i | Π ∼ invΓ(M, MΠ_ii) that one gets from the Jeffreys prior in the p = 1 case. Thus, the bias is somewhat reduced but still paradoxically depends on p.

The prior that gives the marginal posterior Σ_ii | Π ∼ invΓ(M, MΠ_ii), consistent with that estimated from the i-th series alone and the Jeffreys prior, is p(Σ) = |Σ|^{−2p+1}. For this:

Σ | W ∼ CW⁻¹(W, M + p − 1)  (25)

Such prior falls within the class of non-informative priors for Q = Σ⁻¹ discussed in [17], p(Q) ∝ |Q|^{−K} etr[−QΛ]. Indeed, remembering that the Jacobian of the transformation Q → Σ is |Σ|^{−2p} [15], this prior corresponds to K = −1 and Λ = 0, that is p(Q) ∝ |Q|⁻¹.

Note that for the posterior in Eq. (25), ⟨W⁻¹ · W̃⟩ = I_p M/(M − 1). As the distribution only holds for M > 1, and actually the entire multidimensional Bayesian inference only holds for M ≥ p, the bias remains smaller than p/(p − 1) and independent of p.

Based on the discussion above, we recommend the p(Q) ∝ |Q|⁻¹ prior when in need of inferring the whole CPSD at a given frequency.

When only particular functions of the CPSD are needed, as in some of the following sections, the proper priors will be formulated in terms of those functions and not of the whole CPSD.

Figure 4: The equal-tail, ℓ(2) (≃ 0.95) likelihood credible intervals for MSC (error bars) as a function of the observed value of |ρ̂|² and of the number of averaged periodograms M. The central dots are the values of the median. The dashed line |ρ|² = |ρ̂|² is given for reference. For the sake of clarity, for different values of M we plot |ρ|² at slightly shifted values of |ρ̂|².

C. Inference of the MSC

We use Eq. (14) to derive the posterior distribution for the theoretical MSC |ρ_ij|². As 1 ≥ |ρ_ij|² ≥ 0, the least informative prior appears to be one constant in that same interval. This choice and Eq. (14) yield:

p(|ρ|² | |ρ̂|²) = (M + 1)(1 − |ρ|²)^M (1 − |ρ̂|²)^{M−2} ₂F₁(M, M, 1, |ρ̂|²|ρ|²) / ₂F₁(2, 2, 2 + M, |ρ̂|²)  (26)

As already anticipated, MSC is mostly used, within noise characterization, as a diagnostic for the existence of linear correlation between two processes. To illustrate to what extent such a diagnostic parameter is effective, we plot in Figure 4 the ℓ(2) (≃ 0.95) likelihood, equal-tail credible interval, and the median predicted by the posterior in Eq. (26). We do that as a function of both |ρ̂|² and M.

The figure shows that one reaches a reasonable confidence that some correlation exists between the two processes only when both M and |ρ̂|² are large enough. For instance, this confidence is never reached for M = 2, only if |ρ̂|² ≳ 0.6 for M = 5, and even for M = 20, one would require |ρ̂|² ≳ 0.2.

Note that the values of the median are always found below the 'unbiased' line |ρ|² = |ρ̂|². This bias is only apparent, and in reality, it compensates for the already mentioned significant bias of the sample distribution toward high values.

To check this, we have numerically calculated the posterior predictive distribution of a future observation |ρ̃|² conditional on the past observation |ρ̂|². We give in Figure 5 a contour plot of p(|ρ̃|² | |ρ̂|²) for M = 5 that is clearly symmetric around the line |ρ̃|² = |ρ̂|², thus showing the lack of real bias of our posterior.

Figure 5: Contour plot of the probability density function p(|ρ̃|² | |ρ̂|²) of the posterior predictive distribution of a future observation |ρ̃|² conditional on the past observation |ρ̂|². The calculation is for M = 5.

D. Inference of R²

Assuming also for R² a flat prior, Eq. (16) yields:

p(R² | R̂²) = (M + 1)(1 − R²)^M ₂F₁(M, M, p − 1, R̂²R²) / ₚF_q((1, M, M); (M + 2, p − 1); R̂²)  (27)

where ₚF_q is the generalized hypergeometric function, here with upper parameters (1, M, M) and lower parameters (M + 2, p − 1).

As said, R² is a generalization of MSC for the case p > 2. We will show (Eqs. (32), (33)) that, if x_1[n] is a linear combination of the remaining p − 1 processes, plus a residual one, then R² measures the fraction of the total PSD of x_1[n] that is due to the linear combination of the other processes. So ideally, for completely uncorrelated processes, R² = 0, while, in the opposite case of negligible residual, R² = 1.

Similarly to what we have done for the MSC, to get a sense of the effectiveness of this measure, we plot in Figure 6 the ℓ(2) (≃ 0.95) likelihood, equal-tail credible interval, and the median predicted by the posterior in Eq. (27). We do that as a function of both R̂² and M in the case p = 5.

Figure 6: The equal-tail, ℓ(2) (≃ 0.95) likelihood credible intervals for the multiple coherence R² (error bars) as a function of the observed value of R̂² and of the number of averaged periodograms M. The calculation is for a p = 5-variate stochastic process. The central dots are the values of the median. The dashed line R² = R̂² is given for reference. For the sake of clarity, for different values of M we plot R² at slightly shifted values of R̂².

The plot shows that also R̂² carries a very significant bias toward higher values, and that the posterior compensates for such large bias. Again, to conclude that a significant fraction of the noise power in x_1[n] is contributed by the part correlated with the remaining series, one needs comparatively large values both of M and of R̂².

IV. Noise projection and time series decorrelation

We now consider the case where the p-variate stochastic process consists of a "main" process x(t) and r = p − 1 "disturbances" y_i(t), with 1 ≤ i ≤ r, modeled as

x(t) = x_0(t) + Σ_{i=1}^{r} ∫_{−∞}^{+∞} α_i(t − t′) y_i(t′) dt′  (28)

Here, x_0(t)—the 'residual'—is a process independent of the y_i(t)'s. We refer to the Fourier transforms of the functions α_i(t), denoted by α_i(f), as the 'susceptibilities'.

Our goal is to estimate the PSD S_x0x0 of the residual x_0(t), and the susceptibilities α_i(f). This task, often referred to as noise projection or noise decorrelation, is common in what is known as noise hunting, which involves identifying the source of noise in the main data series of a physical apparatus by examining correlations with other independently measured disturbances that may have coupled into the primary measurement. The noise hunting carried out for LISA Pathfinder was no exception, and it required us to develop the approach described below.

We consider two cases:

1) When α_i(t) is a general function. This allows estimation of both α_i(f) and S_x0x0(f) independently at each frequency.

2) When α_i(t) = α_i δ(t), with α_i constant. In this case, α_i(f) = α_i becomes frequency-independent, and thus a global parameter of the estimate, while S_x0x0(f) remains dependent on frequency.

It is convenient for the rest of the discussion to make the following block partition of Σ:

Σ = [S_xx, S_xy; S_xy†, S_yy]  (29)

where S_xx is just the PSD of x(t), S_yy is the r × r CPSD matrix of the disturbances, and S_xy is the 1 × r vector of the CPSD between x(t) and all the disturbances y_i(t). Note that, for clarity, we have omitted the explicit dependence of all quantities on frequency. We continue to do so in the rest.

Within the model in Eq. (28),

S_xx = S_x0x0 + Σ_{i,j=1}^{r} α_i α_j* S_{y_i,y_j} = S_x0x0 + α · S_yy · α†  (30)

where we have introduced the r-long vector α with components α_i. Furthermore,

(S_xy)_j = Σ_{i=1}^{r} α_i S_{y_i,y_j}, that is, S_xy = α · S_yy,  (31)

so that α = S_xy · S_yy⁻¹, a relation that will be useful in the following.³

Note that S_x0x0 is the Schur complement Σ/S_yy of S_yy in the matrix Σ:

S_x0x0 = S_xx − S_xy · S_yy⁻¹ · S_xy† = Σ/S_yy = 1/(Σ⁻¹)₁₁  (32)

and that the multiple coherence is

R² = 1 − S_x0x0/S_xx,  (33)

³ This relation is exactly true only if, at a given frequency f, S_xy(f) and S_yy(f) are the exact values at f and not those smoothed over the spectral window (see Eq. (3)) that we are using here. This spectral smoothing may bias the estimation of α(f) should it have a strong dependency on frequency. Such bias can be mitigated by properly reducing the width of the spectral window.
8

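The block-partition relations above lend themselves to a quick numerical sanity check. The sketch below (Python with NumPy; the 3-variate CPSD matrix is made up for illustration) verifies that the Schur complement of Eq. (32) equals $1/(\Sigma^{-1})_{11}$ and that Eqs. (30) and (31) are mutually consistent.

```python
# Sanity check of Eqs. (29)-(33) on a hypothetical p = 3 (r = 2) CPSD matrix.
import numpy as np

rng = np.random.default_rng(0)

# Random Hermitian positive-definite CPSD matrix Sigma
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Sigma = A @ A.conj().T + 3 * np.eye(3)

S_xx = Sigma[0, 0].real        # PSD of x(t)
S_xy = Sigma[0, 1:]            # 1 x r CPSD vector
S_yy = Sigma[1:, 1:]           # r x r CPSD matrix of the disturbances

alpha = S_xy @ np.linalg.inv(S_yy)                                  # Eq. (31)
S_x0x0 = S_xx - (S_xy @ np.linalg.inv(S_yy) @ S_xy.conj()).real     # Eq. (32)
R2 = 1 - S_x0x0 / S_xx                                              # Eq. (33)

# Eq. (32): Schur complement equals 1/(Sigma^{-1})_11
assert np.isclose(S_x0x0, 1 / np.linalg.inv(Sigma)[0, 0].real)
# Eq. (30): S_xx = S_x0x0 + alpha . S_yy . alpha^dagger
assert np.isclose(S_xx, S_x0x0 + (alpha @ S_yy @ alpha.conj()).real)
print(S_x0x0, R2)
```

Since $\Sigma$ is Hermitian positive definite, $R^2$ always falls in $[0, 1)$, as the text requires.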
i.e., the fraction of the PSD of $x(t)$ contributed by the disturbances, a useful quantity one wants to estimate.

A. Inference of susceptibilities, residuals, and CPSD of disturbances in the general case

Our starting point is a key re-parametrization of the sample distribution in Eq. (11). For the sake of such re-parametrization, we need to introduce, in analogy with Eq. (29), the block partition of $W$ and $\Pi$

$$W = \begin{pmatrix} W_{xx} & W_{xy} \\ W_{xy}^{\dagger} & W_{yy} \end{pmatrix} = M \begin{pmatrix} \Pi_{xx} & \Pi_{xy} \\ \Pi_{xy}^{\dagger} & \Pi_{yy} \end{pmatrix} \qquad (34)$$

Two functions of $W$ that we also need in the following are:
1) the "observed" residual noise PSD $\Pi_0$, which we define from

$$\Pi_0 = \frac{1}{M - r} \times \frac{1}{(W^{-1})_{xx}} \equiv \frac{1}{M - r} \times W_0 \qquad (35)$$

with $(W^{-1})_{xx}$ the upper-left $1 \times 1$ block of $W^{-1}$;
2) the "observed" susceptibility vector

$$\alpha_0 = W_{xy} \cdot W_{yy}^{-1} \qquad (36)$$

In Appendix A we show that the sample distribution in Eq. (11) can be re-parametrized as:

$$p(W | S_{x_0 x_0}, \alpha, S_{yy}) \propto \frac{1}{S_{x_0 x_0}^{M}} \exp\left(-\frac{W_0}{S_{x_0 x_0}}\right) \times \exp\left(-\frac{(\alpha - \alpha_0) \cdot W_{yy} \cdot (\alpha - \alpha_0)^{\dagger}}{S_{x_0 x_0}}\right) \times \frac{1}{|S_{yy}|^{M}}\, \mathrm{etr}\left(-S_{yy}^{-1} W_{yy}\right) \qquad (37)$$

Thus, the distribution splits into two independent parts, one depending on $S_{x_0 x_0}$ and $\alpha$ but not on $S_{yy}$, and one that only depends on $S_{yy}$. If one selects a prior of the kind $p(S_{x_0 x_0}, \alpha) \times p(S_{yy})$, then the posterior also splits into the product of the joint posterior for $S_{x_0 x_0}$ and $\alpha$, and the posterior for $S_{yy}$ alone. The estimate of the latter reduces to the estimate of the CPSD that we have already treated. From now on we focus on the estimate of $S_{x_0 x_0}$ and $\alpha$ only.

The most realistic, least informative prior for $S_{x_0 x_0}$, as for all other PSDs we have met, is again $p(S_{x_0 x_0}) \propto 1/S_{x_0 x_0}$, independently of the value of $\alpha$. On the other hand, the components of $\alpha$ are, in the language of statistics, location parameters. If they can be assumed independent of each other, then, for each of them, the least informative prior is just $p(\alpha_i) = 1$.

From the above considerations, it follows that a sound non-informative joint prior for $S_{x_0 x_0}$ and $\alpha$ is $p(S_{x_0 x_0}, \alpha) = 1/S_{x_0 x_0}$, with which their properly normalized joint posterior becomes:

$$p(S_{x_0 x_0}, \alpha | W) = \frac{(W_0)^{M-r}}{\Gamma(M - r)\, S_{x_0 x_0}^{M - r + 1}} \exp\left(-\frac{W_0}{S_{x_0 x_0}}\right) \times \frac{|W_{yy}|}{\pi^{r} S_{x_0 x_0}^{r}} \exp\left(-(\alpha - \alpha_0) \cdot \frac{W_{yy}}{S_{x_0 x_0}} \cdot (\alpha - \alpha_0)^{\dagger}\right) \qquad (38)$$

This is equivalent to stating that:

$$S_{x_0 x_0} \,|\, W \sim \mathrm{inv}\Gamma(M - r, W_0) \qquad (39)$$

and

$$\alpha \,|\, W, S_{x_0 x_0} \sim \mathcal{CN}\left(\alpha_0, S_{x_0 x_0} W_{yy}^{-1}\right) \qquad (40)$$

with $\mathcal{CN}(\alpha_0, S_{x_0 x_0} W_{yy}^{-1})$ the complex, circularly symmetric $r$-variate Gaussian distribution with mean value $\alpha_0$ and covariance matrix $S_{x_0 x_0} W_{yy}^{-1}$.

Note that the marginal distribution of $S_{x_0 x_0}$ is already given by Eq. (39), while that of $\alpha$ may be obtained by integrating Eq. (38) over $S_{x_0 x_0}$. By performing this integration we get that

$$\alpha \sim \mathrm{ct}_r\left(\alpha_0, W_0 W_{yy}^{-1}, M - r\right) \qquad (41)$$

the latter being the complex multivariate t-distribution for an $r$-long complex vector, with mean value $\alpha_0$, scale factor $W_0 W_{yy}^{-1}$, and $M - r$ degrees of freedom.

This means that the real and imaginary parts of $\alpha$, recast into the $2r$-long real vector $\alpha_R = \begin{pmatrix} \mathrm{Re}\,\alpha \\ \mathrm{Im}\,\alpha \end{pmatrix}$, follow a joint multivariate Student $t_{2r}$ distribution [18]

$$\alpha_R \sim t_{2r}\left(\alpha_{0,R}, \Omega, 2(M - r)\right) \qquad (42)$$

with $2(M - r)$ degrees of freedom, mean value $\alpha_{0,R} = \begin{pmatrix} \mathrm{Re}\,\alpha_0 \\ \mathrm{Im}\,\alpha_0 \end{pmatrix}$, and a scale matrix given by

$$\Omega = \frac{1}{2}\, \Pi_0 \begin{pmatrix} \mathrm{Re}\, W_{yy} & \mathrm{Im}\, W_{yy} \\ -\mathrm{Im}\, W_{yy} & \mathrm{Re}\, W_{yy} \end{pmatrix}^{-1} \qquad (43)$$

From this joint marginal distribution, we also get the marginal distributions of the single components of $\alpha_R$, which are univariate t distributions with $2(M - r)$ degrees of freedom [18], and scale parameter given by the corresponding element in $\Omega$.

Note that the covariance of the elements of $\alpha_R$, $((M - r)/(M - r - 1))\,\Omega$, decreases with decreasing $\Pi_0$, the PSD of the residuals. This is expected as, for a given value of the total PSD, a small $\Pi_0$ implies a large contribution of the disturbances, and then a large signal-to-noise ratio for the components of $\alpha_R$. It is also straightforward to calculate that $\Omega \propto M^{-1}$, so that this signal-to-noise ratio, as expected, also increases with increasing averaging.

The model discussed so far assumes the disturbances $y_i(t)$ are measured with negligible readout noise. It is therefore important to consider, before concluding this section, the consequences of applying the method when such noise is in reality not negligible.

Let us consider first the estimate of $S_{x_0 x_0}$. Our method
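The "observed" quantities of Eqs. (35) and (36) and the marginal posterior of Eq. (39) are straightforward to evaluate numerically. The sketch below (NumPy/SciPy; the averaged periodogram matrix $W$, and the values of $M$ and $p$, are synthetic placeholders) computes $\Pi_0$, $\alpha_0$, and an equal-tail credible interval from the inverse-Gamma posterior.

```python
# Sketch of the single-frequency estimates of Sect. IV-A on made-up data:
# Pi_0 of Eq. (35), alpha_0 of Eq. (36), and the credible interval from
# S_x0x0 | W ~ invGamma(M - r, W_0), Eq. (39).
import numpy as np
from scipy.stats import invgamma

rng = np.random.default_rng(1)
M, p = 30, 3            # number of averaged periodograms and process dimension
r = p - 1

# Averaged periodogram matrix W: sum of M rank-1 terms, Hermitian positive definite
Z = rng.normal(size=(p, M)) + 1j * rng.normal(size=(p, M))
W = Z @ Z.conj().T

W_xy = W[0, 1:]
W_yy = W[1:, 1:]
alpha_0 = W_xy @ np.linalg.inv(W_yy)        # Eq. (36)
W_0 = 1 / np.linalg.inv(W)[0, 0].real       # W_0 = 1/(W^{-1})_xx
Pi_0 = W_0 / (M - r)                        # Eq. (35)

# Marginal posterior of S_x0x0, Eq. (39): inverse Gamma, shape M - r, scale W_0
post = invgamma(a=M - r, scale=W_0)
median = post.median()
lo, hi = post.ppf(0.16), post.ppf(0.84)     # ~68% equal-tail credible interval
print(Pi_0, alpha_0, (lo, median, hi))
```

As a cross-check, $W_0$ computed this way coincides with the Schur complement $W_{xx} - W_{xy} W_{yy}^{-1} W_{xy}^{\dagger}$, consistently with Eq. (64) of Appendix A.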
in reality estimates $1/(\Sigma^{-1})_{11}$, whatever the detailed form of $\Sigma$ is. Indeed, our starting point is $1/(W^{-1})_{11}$, whose sampling distribution is $1/(W^{-1})_{11} \sim \Gamma\left(M - r, 1/(\Sigma^{-1})_{11}\right)$.

From this, and the distribution in Eq. (20), one can derive the sampling distribution of our Bayesian estimate for $S_{x_0 x_0}$. We find that this distribution is $\beta'\left(M - r, M - r, 1/(\Sigma^{-1})_{11,\mathrm{true}}\right)$, a distribution whose median is equal to $1/(\Sigma^{-1})_{11}$, and whose relative uncertainty only depends on $M - r$. This confirms that the method gives an unbiased estimate of $1/(\Sigma^{-1})_{11}$.

In the presence of readout noise, $1/(\Sigma^{-1})_{11} \neq S_{x_0 x_0}$, as the true form of $\Sigma$ is not that in Eq. (29). Indeed, in the simplest model of additive noise, the measured disturbance is $y_i(t) + n_i(t)$, with $n_i(t)$ a zero-mean stationary process independent of all the $y_i$'s. In this case, the lower diagonal block of $\Sigma$ becomes $S_{yy} + S_n$, with $S_n$ a diagonal matrix whose generic element $S_{n_i,n_i}$ is the PSD of $n_i(t)$.

Working out the formula for $1/(\Sigma^{-1})_{11}$ in the general case is a bit cumbersome. It becomes particularly simple if also $S_{yy}$ is diagonal, that is, if the disturbances are mutually uncorrelated. One can readily calculate that in this case

$$\frac{1}{(\Sigma^{-1})_{11}} = S_{x_0 x_0} + \sum_{i=1}^{r} |\alpha_i|^2\, \frac{S_{y_i,y_i}\, S_{n_i,n_i}}{S_{y_i,y_i} + S_{n_i,n_i}} \qquad (44)$$

with $S_{y_i,y_i}$ the PSD of $y_i(t)$. One can recognize that, in the limit of dominant readout noise, $1/(\Sigma^{-1})_{11} \to S_{xx}$. In other words, a dominant readout noise, as expected, completely obscures any correlation between $x(t)$ and the $y$'s.

Furthermore, within the same simplification of uncorrelated disturbances, the product $\sum_{j=2}^{p} \Sigma_{1,j} \Sigma_{j,k}^{-1}$, with $k > 1$, which, in the noiseless limit, is $(S_{xy} \cdot S_{yy}^{-1})_{k-1} = \alpha_{k-1}$, becomes instead

$$\sum_{j=2}^{p} \Sigma_{1,j} \Sigma_{j,k}^{-1} = \alpha_{k-1}\, \frac{S_{y_{k-1},y_{k-1}}}{S_{y_{k-1},y_{k-1}} + S_{n_{k-1},n_{k-1}}}. \qquad (45)$$

Thus, in the presence of significant readout noise, our method overestimates the PSD of the residuals, underestimates the absolute value of the susceptibility, and should only be used for an upper limit on $S_{x_0 x_0}$.

We have used the approach described in this section to decorrelate the effect of the temperature from the acceleration data series of LPF [3]. To further test its validity, in particular with respect to bias, we have also studied a simulated case. This is discussed in the next section.

B. A test simulation

We have generated a time series $x(t) = x_0(t) + \sum_{i=1}^{3} n_i z_i(t)$, with all series Gaussian and zero-mean, and with $n_i$ three real coefficients. We have also generated the three "observed" disturbances $y_i(t) = h(t) * z_i(t)$, with $h(t)$ the impulse response of a low-pass filter, and with $*$ indicating time-convolution.

Within this simple model, the susceptibilities become $\alpha_i(f) = n_i/h(f)$, with $h(f)$ the frequency response of the filter, that is, the Fourier transform of $h(t)$. The susceptibilities are then complex, frequency-dependent, and non-causal.

For the simulation, we selected the PSD of the residuals $x_0$ to consist of a $\propto 1/f^2$ low-frequency tail merging into a plateau extending up to some double-pole roll-off. More explicitly⁴ (see Figure 7):

$$S_{x_0}(f) = \left|\frac{\left(1 - e^{-2\pi f_1 T}\right)\left(1 - e^{-2\pi f_2 T}\right)}{\left(1 - e^{-2\pi f_1 T} e^{-i 2\pi f T}\right)\left(1 - e^{-2\pi f_2 T} e^{-i 2\pi f T}\right)}\right|^2 + \left|\frac{1 - e^{-T/\tau} e^{-i 2\pi f_0 T}}{1 - e^{-T/\tau} e^{-i 2\pi f T}}\right|^2 \qquad (46)$$

with $T = 1$ s the sampling time, $f_1 = 0.10$ Hz and $f_2 = 0.11$ Hz the two roll-off frequencies, and $f_0 = 1$ mHz the cross-over frequency between the tail and the plateau.

⁴ Note that in this section we show and calculate single-sided PSDs, as this is the standard practice. Discussion, results, and susceptibilities are not affected by this choice.

The disturbances are in the form $z_i(t) = c_{lf,i}\, z_{lf,i}(t) + c_{hf,i}\, z_{hf,i}(t) + c_i\, z_0(t)$, where all the time series on the right-hand side are Gaussian, zero-mean and mutually independent, and the coefficients $c_{lf,i}$, $c_{hf,i}$ and $c_i$ are real and randomly selected.

The $z_{lf,i}(t)$ and $z_0(t)$ share the same PSD

$$S_{lf}(f) = \left|\frac{\left(1 - e^{-T/\tau_1} e^{-i 2\pi f_0 T}\right)\left(1 - e^{-T/\tau_2} e^{-i 2\pi f_0 T}\right)}{\left(1 - e^{-T/\tau_1} e^{-i 2\pi f T}\right)\left(1 - e^{-T/\tau_2} e^{-i 2\pi f T}\right)}\right|^2 \qquad (47)$$

with $\tau_1 = 1.0 \times 10^5$ s and $\tau_2 = 1.1 \times 10^5$ s. For $f \gg 1/\tau_1, 1/\tau_2$ this PSD amounts to a $\propto 1/f^4$ low-frequency tail with unit value at $f = f_0$.

The $z_{hf,i}(t)$ have PSD

$$S_{hf}(f) = \left|\frac{1 - e^{-T/\tau} e^{-i 2\pi f_0 T}}{1 - e^{-T/\tau} e^{-i 2\pi f T}}\right|^2 \qquad (48)$$

again a $\propto 1/f^2$ tail with unit value at $f = f_0$. The presence of the shared series $z_0(t)$ induces correlation among the $z$'s, with CPSD $S_{z_i,z_j}(f) = c_i c_j S_{lf}(f)$.

All PSDs above must be intended to be zero for $|f| \ge 1/(2T)$. With this prescription, they can be read as discrete-time Fourier transforms of the corresponding discrete-time autocorrelations, and their shape allows a straightforward implementation as auto-regressive moving average (ARMA) stochastic processes.

In Figure 7, we illustrate the ASD of the simulated time series.

As for the filter $h(t)$, its transfer function is

$$h(f) = \frac{\left(1 - e^{-2\pi T/\tau_a}\right)\left(1 - e^{-2\pi T/\tau_b}\right)}{\left(1 - e^{-2\pi T/\tau_a} e^{-i 2\pi T f}\right)\left(1 - e^{-2\pi T/\tau_b} e^{-i 2\pi T f}\right)} \qquad (49)$$

with $\tau_a = 2000$ s and $\tau_b = 2001$ s, and zero for $|f| \ge 1/(2T)$. One can recognize the transfer function of a discrete-time two-pole infinite impulse response low-pass filter, which is easily implemented, again, as an ARMA filter on the discrete time series of the $z$'s.

We think that, all in all, this model possesses many features of a realistic situation: complex frequency depen-
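The spectral shapes above are simple to evaluate directly. The sketch below checks the stated properties of Eq. (48): unit value at $f = f_0$ and a $\propto 1/f^2$ tail well above the pole. The value $\tau = 10^5$ s is an assumption for illustration only, as the text fixes $\tau_1$ and $\tau_2$ in Eq. (47) but not $\tau$ itself.

```python
# Evaluating the one-pole PSD shape of Eq. (48); tau = 1e5 s is an assumed
# placeholder, T = 1 s and f0 = 1 mHz are as in the text.
import numpy as np

T, f0, tau = 1.0, 1e-3, 1e5

def S_hf(f):
    """|(1 - e^{-T/tau} e^{-i 2 pi f0 T}) / (1 - e^{-T/tau} e^{-i 2 pi f T})|^2."""
    a = np.exp(-T / tau)
    num = 1 - a * np.exp(-2j * np.pi * f0 * T)
    den = 1 - a * np.exp(-2j * np.pi * f * T)
    return np.abs(num / den) ** 2

# Unit value at the cross-over frequency f0 ...
assert np.isclose(S_hf(f0), 1.0)
# ... and a ~1/f^2 tail for f >> 1/tau: one decade up in f, two decades down in PSD
ratio = S_hf(3e-3) / S_hf(3e-2)
assert np.isclose(ratio, 100.0, rtol=0.02)
print(ratio)
```

The small departure of the ratio from exactly 100 reflects the discrete-time (periodic in $1/T$) nature of these spectra, which is why the text prescribes cutting them at $|f| = 1/(2T)$.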
dence of PSD, cross-correlation among disturbances, high data dynamic range, complex susceptibility, etc., to give a meaningful test of the method.

Figure 7: ASD of time series used in the simulation. The dashed lines represent the "true values", that is, those used to generate the simulated data, calculated from Eqs. (46) to (48) and from one random extraction of the set of numerical coefficients $n_i$, $c_{lf,i}$, $c_{hf,i}$ and $c_i$. The noisy lines are the averages of the estimated ASD from 100 different simulations generated from the true spectrum above. Time series are sampled with $T = 1$ s and last $5 \times 10^5$ s. Frequency-dependent data partition for the periodogram calculation is performed according to the method of Ref. [8]. We used the Nuttall four-coefficient minimal-side-lobe spectral window [19].

In Figure 8 we show the result of the decorrelation on one example of a $5 \times 10^5$ s multivariate time series generated as described above.

The figure clearly shows that the method, at least for this example, gives an unbiased estimate of the ASD of the residuals. The ASD indeed fluctuates, within the uncertainties predicted by the posterior in Eq. (20), around the true value in Eq. (46). The figure also shows, for reference, the estimate of the multiple coherence coefficient $R^2$, from the posterior in Eq. (27). The plot indicates that at $f \simeq 1$ mHz, where $M \simeq 30$, the method allows one to detect a $\simeq 10\%$ contribution of the disturbances to the total PSD.

In Figure 9 we show the estimate of the susceptibility $\alpha_1(f)$ for the same set of data used for Figure 8, and compare it with the true value $n_1/h(f)$, with $h(f)$ from Eq. (49). The figure is limited to $f \lesssim 1$ mHz, as above that frequency the susceptibility is in practice compatible with $\alpha_1 = 0$, as expected from the fact that the contribution of the disturbances to the total power becomes undetectable. Within this frequency range the estimate appears unbiased and in agreement with the true value within the uncertainties predicted by the proper marginal t-distribution.

Figure 8: Example of noise decorrelation for one sample of the multivariate time series $\{x(t), y_1(t), y_2(t), y_3(t)\}$ with the same set of numerical coefficients used for the data in Figure 7. Top panel. Black data points: ASD of $x(t)$ estimated using the posterior in Eq. (20). Dots represent the medians of the ASD posteriors, while error bars delimit their $\ell(1)$ ($\simeq 68.5\%$ likelihood) equal-tail credible intervals. Red data points: ASD of $x_0(t)$ estimated using the marginal posterior in Eq. (39). Dots and bars have the same meaning as for the black data points. ASDs have been estimated with the Nuttall window, and the frequency separation is such that nearest neighbors may have a linear correlation in the 10–30% range. Correlation between second-nearest neighbors is negligible. Red dashed line: true value from Eq. (46). Lower panel. Black data points: posterior distribution for the multiple coherence coefficient $R^2$ for the same data. The posterior is that in Eq. (26), dots are medians, and bars delimit the $\ell(2)$ ($\simeq 95.5\%$ likelihood) symmetric-tail credible intervals. Red dashed line: corresponding true value for $R^2$, calculated as $1 - S_{x_0}(f)/S_x(f)$, with $S_x(f)$ the true spectral density of $x(t)$.

C. The case of real frequency-independent susceptibilities

In many practical circumstances, one can safely assume that $\alpha$ is a real, frequency-independent vector. If this is the case, $\alpha$ becomes a common parameter in the sampling distribution of the $W$'s at all frequencies. Thus, to build up a posterior for $\alpha$, one needs to consider the joint likelihood of all the $W$'s for a given value of $\alpha$. We anticipate that, in this case, we do not get a closed-form posterior distribution,
but a useful form that can be integrated by using the Markov Chain Monte Carlo approach.

Figure 9: Black data points: susceptibility $\alpha_1(f)$ of $x(t)$ to $y_1(t)$ for the example data of Figure 8, from its marginal Student t-posterior. Dots represent the medians of the posteriors, while error bars delimit their symmetric-tail $\ell(1)$ credible intervals. Red dashed line: true susceptibility value $n_1/h(f)$, with $h(f)$ from Eq. (49).

To build this posterior, let us call $W_i$ the sample at frequency $f_i$, obtained by averaging over $M_i$ periodograms, and similarly let us indicate the corresponding theoretical quantities with $S_{x_0 x_0,i}$ and $\Sigma_{yy,i}$.

Let us also assume that $f_i$ and $f_{i+1}$ are sufficiently far apart that $W_i$ and $W_{i+1}$ may be treated as independent, so that the likelihood of the $W_i$ is just the product of their marginal likelihoods $\prod_{i=1}^{N_f} p\left(W_i | S_{x_0 x_0,i}, \alpha, S_{yy,i}\right)$, with $N_f$ the number of considered frequencies.

In addition, we assume that $S_{x_0 x_0,i}$ and $\Sigma_{yy,i}$ have independent prior distributions, that the prior for $S_{x_0 x_0,i}$ is, as before, $\propto 1/S_{x_0 x_0,i}$, and that the priors for the components of $\alpha$ are, again as before, independent and uniform.

Using also the fact that, for real $\alpha$, Eq. (60) gives $W_{x_0 x_0,i} = W_{xx,i} - 2\alpha \cdot \mathrm{Re}(W_{xy,i}) + \alpha \cdot \mathrm{Re}(W_{yy,i}) \cdot \alpha$, we get for the logarithm $\Lambda$ of the joint posterior of all parameters

$$\Lambda = -\sum_{i=1}^{N_f} (M_i + 1) \log\left(S_{x_0 x_0,i}\right) - \sum_{i=1}^{N_f} \frac{W_{xx,i}}{S_{x_0 x_0,i}} + 2\alpha \cdot \sum_{i=1}^{N_f} \frac{\mathrm{Re}(W_{xy,i})}{S_{x_0 x_0,i}} - \alpha \cdot \left(\sum_{i=1}^{N_f} \frac{\mathrm{Re}(W_{yy,i})}{S_{x_0 x_0,i}}\right) \cdot \alpha \qquad (50)$$

plus an independent term that only contains $S_{yy,i}$ and that is not used here.

The posterior in Eq. (50) can be used for the MCMC estimate of the parameter posterior distribution. We have used this method extensively within the data processing of LISA Pathfinder [3]. We have also applied it to simulated noise with the same properties as that discussed in Section IV-B, except that here the disturbances have not been filtered, that is, $y_i(t) = z_i(t)$. With this prescription, the susceptibilities are just the real, frequency-independent numbers $\alpha_i = n_i$.

An example of the result of such a simulation is presented in Figures 10 and 11.

Figure 10: Example of noise decorrelation for one sample of the multivariate time series $\{x(t), y_1(t), y_2(t), y_3(t)\}$ with real, frequency-independent susceptibilities. Meanings of quantities are the same as those in the upper panel of Figure 8. The posterior for the ASD of $x_0(t)$ has been obtained with an MCMC integration of the posterior in Eq. (50). Data were taken at every other frequency of the data in Figure 8, both to simplify the calculation and to ensure their mutual independence.

The figure shows again that the method gives a consistent and unbiased estimate of all the $S_{x_0 x_0,i}$ and all the $\alpha_i$.

Before closing this section, it is worth noticing that one advantage of this simultaneous fit to the data at all frequencies is that one can also include data at very low frequency, where the condition $M_i \ge p$ may be violated. This is shown as follows.

First, the distribution of a singular $W_i$, obtained from $M_i < p$ periodograms, is the singular complex Wishart distribution [20]:

$$p\left(W_i | \Sigma_i\right) = \frac{\pi^{M_i (M_i - p)}\, |\Lambda_i|^{M_i - p}}{\widetilde{\Gamma}_p(M_i)\, |\Sigma_i|^{M_i}}\, \mathrm{etr}\left(-\Sigma_i^{-1} W_i\right), \qquad (51)$$

with $|\Lambda_i|$ the product of the non-zero eigenvalues of $W_i$.

The dependence of the likelihood in Eq. (51) on $\Sigma$ and $M$ is the same as that in Eq. (11). Thus the difference between the two likelihoods makes no difference for the derivation of the posterior.

Furthermore, the stability of the posterior in Eq. (50) requires that the sums $\sum_{i=1}^{N_f} \mathrm{Re}(W_{xy,i})/S_{x_0 x_0,i}$ and $\sum_{i=1}^{N_f} \mathrm{Re}(W_{yy,i})/S_{x_0 x_0,i}$ be full rank, not the individual $W_i$. As the rank of a sum of positive semi-definite matrices is
larger than or equal to the maximum rank of the terms in the sum [21], it suffices that just one of the $W_i$ be of full rank to give full rank to both sums. As, in practice, all the $W_i$ above a certain frequency are full rank, the posterior is well defined even if a few terms at the lowest frequencies have $M_i < p$.

Figure 11: The joint posterior for the three susceptibilities $\alpha_1$, $\alpha_2$, and $\alpha_3$. The red surface delimits a credible region with $\simeq \ell(1)$ likelihood, while the cyan, semi-transparent surface delimits a credible region with $\simeq \ell(2)$ likelihood. The green axes cross at the true value $\alpha = n$, with $n$ the vector with components $n_i$ used in the simulation.

V. Conclusions

In conclusion, we have presented a set of Bayesian, low-bias, closed-form posteriors, based on simple and physically meaningful priors, for the most commonly estimated quantities at a given frequency in the spectral analysis of multivariate time series, and in particular in the noise projection of physical instruments.

The distributions of some of these posteriors are available within the main software platforms, which makes the calculation of credible intervals and other statistical quantities particularly simple. For the others, we give the explicit form of the PDF that can be used to numerically calculate the relevant statistical quantities. For the reader's convenience, these posteriors are summarized in Table I.

For the case of noise projection, we have shown with simulations that the method is capable of retrieving, with negligible bias, a residual whose ASD is orders of magnitude smaller than the part due to the measured disturbances, and we have also investigated the robustness of the method in the presence of readout noise in the disturbance measurement.

Still on noise projection, in addition to the single-frequency closed-form posterior, we have also presented a simple likelihood to be used in multi-frequency noise projection in case the susceptibilities may be confidently assumed to be real and frequency-independent.

We want to stress again that these results originate from the experience of data processing at very low frequency, from µHz to Hz [3], in which, due to the length of the required measurement time, only comparatively few periodograms are available; they should be particularly suitable for any situation in which a similar limitation in the number of available periodograms may occur.

Finally, as for almost all commonly used results in data processing, those presented here are also based on Gaussian statistics of the time series under processing. The applicability of the results then depends on how safely the data may be considered Gaussian.

Acknowledgments

This work has been supported in part by Agenzia Spaziale Italiana (ASI), Project No. 2017-29-H.1-2020 "Attività per la fase A della missione LISA", and Project No. 2024-36-HH.0-2024 "Attività per la fase B2/C della missione LISA". The authors would like to acknowledge various useful discussions with all the members of the Trento LISA group.
General Spectral Estimation

- Single-series PSD, symbol $S$ (Sect. III-A). Posterior: $S \sim \mathrm{inv}\Gamma(M, M\Pi)$, with PDF
  $$p(S) = \frac{(M\Pi/S)^{M}}{S\,\Gamma(M)}\, e^{-M\Pi/S}$$
- Multivariate-series CPSD, symbol $\Sigma$ (Sect. III-B). Posterior: $\Sigma \sim \mathcal{CW}^{-1}(W, M + p - 1)$, with PDF
  $$p(\Sigma) = \frac{|W|^{M+p-1}}{\widetilde{\Gamma}_p(M+p-1)\, |\Sigma|^{M+2p-1}}\, \mathrm{etr}\left(-\Sigma^{-1} W\right)$$
- Two-series MSC, symbol $|\rho|^2$ (Sect. III-C). PDF:
  $$p\left(|\rho|^2\right) = (M+1)\left(1 - |\rho|^2\right)^{M}\left(1 - |\hat\rho|^2\right)^{M-2}\, \frac{{}_2F_1\!\left(M, M; 1; |\hat\rho|^2 |\rho|^2\right)}{{}_2F_1\!\left(2, 2; 2+M; |\hat\rho|^2\right)}$$
- Multiple coherence, symbol $R^2$ (Sect. III-D). PDF:
  $$p\left(R^2\right) = (M+1)\left(1 - R^2\right)^{M}\, \frac{{}_2F_1\!\left(M, M; p-1; \hat{R}^2 R^2\right)}{{}_3F_2\!\left(1, M, M; M+2, p-1; \hat{R}^2\right)}$$

Noise Projection

- PSD of residual (marginal), symbol $S_{x_0 x_0}$ (Sect. IV). Posterior: $S_{x_0 x_0} \sim \mathrm{inv}\Gamma(M - r, W_0)$, with PDF
  $$p(S) = \frac{(W_0/S)^{M-r}}{S\,\Gamma(M - r)}\, e^{-W_0/S}$$
- Susceptibilities (marginal), symbol $\alpha_R$ (Sect. IV). Posterior: $\alpha_R \sim t_{2r}\left(\alpha_{0,R}, \Omega, 2(M - r)\right)$, with PDF
  $$p(\alpha_R) = \frac{\Gamma(M)}{\pi^{r}\,(2(M - r))^{r}\,\Gamma(M - r)\,|\Omega|^{1/2}} \left(1 + \frac{(\alpha_R - \alpha_{0,R}) \cdot \Omega^{-1} \cdot (\alpha_R - \alpha_{0,R})}{2(M - r)}\right)^{-M}$$
- Susceptibilities (conditional to $S_{x_0 x_0}$), symbol $\alpha$ (Sect. IV). Posterior: $\alpha\,|\,S_{x_0 x_0} \sim \mathcal{CN}\left(\alpha_0, S_{x_0 x_0} W_{yy}^{-1}\right)$, with PDF
  $$p(\alpha) = \frac{|W_{yy}|}{\pi^{r} S_{x_0 x_0}^{r}}\, e^{-(\alpha - \alpha_0) \cdot \frac{W_{yy}}{S_{x_0 x_0}} \cdot (\alpha - \alpha_0)^{\dagger}}$$

Table I: Summary of closed-form posteriors presented in this paper. For the meaning of the symbols, please refer to the section indicated for each entry.
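As a check of the conditional-posterior entry of Table I, the sketch below evaluates, for a single disturbance ($r = 1$) and entirely arbitrary numbers, the density of $\mathcal{CN}(\alpha_0, S_{x_0 x_0} W_{yy}^{-1})$ and verifies that it integrates to one over the complex plane.

```python
# Normalization check of the conditional posterior in Table I, for r = 1:
# pdf(alpha) = (W_yy / (pi S_x0x0)) exp(-|alpha - alpha_0|^2 W_yy / S_x0x0).
# S, w and alpha_0 are arbitrary placeholder values.
import numpy as np

S, w = 1.5, 2.0                 # S_x0x0 and the (scalar) W_yy
alpha_0 = 0.3 - 0.8j

def pdf(alpha):
    return (w / (np.pi * S)) * np.exp(-np.abs(alpha - alpha_0) ** 2 * w / S)

# Riemann sum over a grid in Re(alpha), Im(alpha), centered on alpha_0
x = np.linspace(-6, 6, 601)
dx = x[1] - x[0]
re, im = np.meshgrid(x + alpha_0.real, x + alpha_0.imag)
total = pdf(re + 1j * im).sum() * dx * dx
assert np.isclose(total, 1.0, atol=1e-4)
print(total)
```

The same construction, with $W_{yy}$ an $r \times r$ Hermitian matrix and $|W_{yy}|$ its determinant, gives the general $r$-variate normalization quoted in the table.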

Appendix A
Derivation of the noise-projection posterior

Starting from the complex Wishart distribution Eq. (11), we derive the joint posterior of the decorrelation parameters in Eq. (38). This was synthetically discussed in [3], and we repeat it here, in more detail, for the reader's convenience.

We start by defining the block matrix $U$

$$U = \begin{pmatrix} 1 & \alpha \\ 0 & I \end{pmatrix} \qquad (52)$$

where $I$ is the $r \times r$ identity matrix. $U$ performs the linear transformation $x_0 \to x$, $y_i \to y_i$. Its inverse is obtained from $U$ by simply replacing $\alpha$ with $-\alpha$.

A second important block matrix is

$$\Sigma' = \begin{pmatrix} S_{x_0 x_0} & 0 \\ 0 & S_{yy} \end{pmatrix} \qquad (53)$$

It is straightforward to show that

$$\Sigma = U \Sigma' U^{\dagger} \qquad (54)$$

where $U^{\dagger}$ is the conjugate transpose of the matrix $U$. As the determinant of $U$ is $|U| = 1$, then

$$|\Sigma| = |\Sigma'| = S_{x_0 x_0}\, |S_{yy}|. \qquad (55)$$

The exponent in Eq. (11) can be rewritten in terms of $\Sigma'$ as:

$$\mathrm{etr}\left(-\Sigma^{-1} W\right) = \mathrm{etr}\left(-\Sigma'^{-1} W'\right) \qquad (56)$$

with $W'$ defined as $W' = U^{-1} W (U^{\dagger})^{-1}$. As

$$\Sigma'^{-1} = \begin{pmatrix} 1/S_{x_0 x_0} & 0 \\ 0 & S_{yy}^{-1} \end{pmatrix} \qquad (57)$$

by using the block decomposition

$$W' = \begin{pmatrix} W'_{x_0 x_0} & W'_{x_0 y} \\ (W'_{x_0 y})^{\dagger} & W_{yy} \end{pmatrix} \qquad (58)$$

we get

$$\mathrm{etr}\left(-\Sigma'^{-1} W'\right) = e^{-\frac{W'_{x_0 x_0}}{S_{x_0 x_0}}}\, \mathrm{etr}\left(-S_{yy}^{-1} \cdot W'_{yy}\right) = e^{-\frac{W'_{x_0 x_0}}{S_{x_0 x_0}}}\, \mathrm{etr}\left(-S_{yy}^{-1} \cdot W_{yy}\right) \qquad (59)$$

where, in the last term, we have used the straightforward result $W'_{yy} = W_{yy}$.

$W'_{x_0 x_0}$ in Eq. (59) may be written as:

$$W'_{x_0 x_0} = W_{xx} - \alpha \cdot W_{xy}^{\dagger} - W_{xy} \cdot \alpha^{\dagger} + \alpha \cdot W_{yy} \cdot \alpha^{\dagger} \qquad (60)$$

By introducing

$$\alpha_0 = W_{xy} \cdot W_{yy}^{-1}, \qquad (61)$$

we get

$$\alpha \cdot W_{xy}^{\dagger} = \alpha \cdot W_{yy} \cdot \alpha_0^{\dagger}, \qquad W_{xy} \cdot \alpha^{\dagger} = \alpha_0 \cdot W_{yy} \cdot \alpha^{\dagger} \qquad (62)$$

so that

$$W'_{x_0 x_0} = W_{xx} + (\alpha - \alpha_0) \cdot W_{yy} \cdot (\alpha - \alpha_0)^{\dagger} - W_{xy} \cdot W_{yy}^{-1} \cdot W_{xy}^{\dagger}. \qquad (63)$$

But

$$W_{xx} - W_{xy} \cdot W_{yy}^{-1} \cdot W_{xy}^{\dagger} = W / W_{yy} = \frac{1}{(W^{-1})_{1,1}}, \qquad (64)$$

thus finally

$$W'_{x_0 x_0} = \frac{1}{(W^{-1})_{1,1}} + (\alpha - \alpha_0) \cdot W_{yy} \cdot (\alpha - \alpha_0)^{\dagger} \qquad (65)$$

As $W'_{x_0 x_0}$ is independent of $S_{yy}$, the probability density in Eq. (11) splits in the product of two pieces, one that contains only $\alpha$ and $S_{x_0 x_0}$, and one that contains only $S_{yy}$. Using Eq. (65), and the definition in Eq. (35), we get:

$$p(W | S_{x_0 x_0}, \alpha, S_{yy}) = \frac{|W|^{M-p}}{\widetilde{\Gamma}_p(M)} \times \frac{e^{-\frac{(M-r)\Pi_0}{S_{x_0 x_0}}}}{S_{x_0 x_0}^{M}}\, e^{-\frac{(\alpha - \alpha_0) \cdot W_{yy} \cdot (\alpha - \alpha_0)^{\dagger}}{S_{x_0 x_0}}} \times \frac{\mathrm{etr}\left(-S_{yy}^{-1} W_{yy}\right)}{|S_{yy}|^{M}} \qquad (66)$$

This is used for Eq. (37).

References

[1] P. D. Welch, "The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms," IEEE Trans. Audio and Electroacoustics, vol. 15, no. 2, pp. 70–73, 1967.
[2] A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes, ser. McGraw-Hill series in electrical engineering: Communications and signal processing. Tata McGraw-Hill, 2002.
[3] M. Armano et al., "In-depth analysis of LISA Pathfinder performance results: Time evolution, noise projection, physical models, and implications for LISA," Phys. Rev. D, vol. 110, p. 042004, Aug 2024.
[4] ——, "Beyond the Required LISA Free-Fall Performance: New LISA Pathfinder Results down to 20 µHz," Phys. Rev. Lett., vol. 120, p. 061101, 2 2018.
[5] ——, "Sub-Femto-g Free Fall for Space-Based Gravitational Wave Observatories: LISA Pathfinder Results," Phys. Rev. Lett., vol. 116, p. 231101, 2016.
[6] M. Armano, H. Audley, J. Baird, M. Bassan, P. Binetruy, M. Born, D. Bortoluzzi, E. Castelli, A. Cavalleri, A. Cesarini et al., "Nano-Newton electrostatic force actuators for femto-Newton-sensitive measurements: System performance test in the LISA Pathfinder mission," Phys. Rev. D, vol. 109, p. 102009, May 2024.
[7] M. Tröbs and G. Heinzel, "Improved spectrum estimation from digitized time series on a logarithmic frequency axis," Measurement, vol. 39, pp. 120–129, 02 2006.
[8] S. Vitale, L. Sala, and D. Vetrugno, "An optimised variant of the periodogram method for noise power spectral density estimation in LISA hardware testing," University of Trento, Tech. Rep. LISA-UTN-INST-TN-0035, 2025.
[9] N. R. Goodman, "Statistical Analysis Based on a Certain Multivariate Complex Gaussian Distribution (An Introduction)," The Annals of Mathematical Statistics, vol. 34, no. 1, pp. 152–177, 03 1963.
[10] D. K. Nagar and A. K. Gupta, "Expectations of Functions of Complex Wishart Matrix," Acta Applicandae Mathematicae, vol. 113, no. 3, pp. 265–288, 03 2011.
[11] G. Carter, C. Knapp, and A. Nuttall, "Estimation of the magnitude-squared coherence function via overlapped fast Fourier transform processing," IEEE Transactions on Audio and Electroacoustics, vol. 21, no. 4, pp. 337–344, 08 1973.
[12] D. V. Ouellette, "Schur complements and statistics," Linear Algebra and its Applications, vol. 36, pp. 187–295, 03 1981.
[13] C. R. Rao, Ed., Linear Statistical Inference and its Applications, ser. Wiley Series in Probability and Statistics. Hoboken, NJ, USA: John Wiley & Sons, Inc., 1973.
[14] H. Jeffreys, "An invariant form for the prior probability in estimation problems," Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, vol. 186, no. 1007, pp. 453–461, 1946.
[15] P. Shaman, "The inverted complex Wishart distribution and its application to spectral estimation," J. Multivar. Anal., vol. 10, no. 1, pp. 51–59, 03 1980.
[16] A. M. Mathai, S. B. Provost, and H. J. Haubold, Multivariate statistical analysis in the real and complex domains. Springer Nature, 2022.
[17] L. Svensson and M. Lundberg, "On posterior distributions for signals in Gaussian noise with unknown covariance matrix," IEEE Transactions on Signal Processing, vol. 53, no. 9, pp. 3554–3571, 2005.
[18] E. Cornish, "The Multivariate t-Distribution Associated with a Set of Normal Sample Deviates," Australian Journal of Physics, vol. 7, no. 4, p. 532, 1954.
[19] A. H. Nuttall, "Some windows with very good sidelobe behavior," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 1, pp. 84–91, 1981.
[20] T. Ratnarajah and R. Vaillancourt, "Complex singular Wishart matrices and applications," Computers & Mathematics with Applications, vol. 50, no. 3, pp. 399–411, 2005.
[21] Mathematics Stack Exchange, "Is the rank of the sum of two positive semi-definite matrices larger than their individual ranks?" (version: 2017-02-20). [Online]. Available: https://math.stackexchange.com/q/2153772
