© CHAPPUIS HALDER & CO
Back-testing of Expected
Shortfall: Main challenges and
methodologies
By Leonard BRIE with Benoit GENEST and Matthieu ARSAC
Global Research & Analytics1

1 This work was supported by the Global Research & Analytics Dept. of Chappuis Halder & Co. Many collaborators from Chappuis Halder & Co. have been involved in the writing and the reflection around this paper; hence we would like to send special thanks to Claire Poinsignon, Mahdi Kallel, Mikaël Benizri and Julien Desnoyers-Chehade.
Executive Summary
In the context of an ever-changing regulatory environment over recent years, banks have
witnessed the drafting and publication of several regulatory guidelines and requirements intended
to frame and structure their internal Risk Management.
Among these guidelines, one has been specifically designed for the risk measurement of market
activities. In January 2016, the Basel Committee on Banking Supervision (BCBS) published
the Fundamental Review of the Trading Book (FRTB). Among the multiple evolutions it discusses,
the BCBS presents the technical context in which the potential loss estimation has
changed from a Value-at-Risk (VaR) computation to an Expected Shortfall (ES) evaluation.
The advantages of an ES measure are well established; however, this measure is
also known for its major drawback: it is difficult to back-test. Therefore, after recalling
the context around the VaR and ES models, this white paper reviews ES back-testing
findings and insights across several methodologies; these have either been drawn from the latest
publications or have been developed by the Global Research & Analytics (GRA) team of
Chappuis Halder & Co.
As a conclusion, it has been observed that the existing methods rely on strong assumptions and
that they may lead to inconsistent results. The methodologies proposed in this paper
also show that even though the ES97.5% metric is close to a VaR99% metric, it is not as easily
back-tested as a VaR metric; this is mostly due to the non-elicitability of the ES measure.
Keywords: Value-at-Risk, Expected Shortfall, Back-testing, Basel III, FRTB, Risk
Management
JEL Classification: C02, C63, G01, G21, G17
Table of Contents

1. Introduction
2. Context
   2.1. Value-at-Risk
        2.1.1. VaR Definition
        2.1.2. Risk Measure Regulation
        2.1.3. VaR Calculation
        2.1.4. VaR Back-Testing
   2.2. Expected Shortfall
        2.2.1. ES Definition
        2.2.2. ES Regulatory Framework
        2.2.3. ES Calculation
        2.2.4. VaR vs. ES
        2.2.5. ES Back-Testing
3. ES Back-Testing
   3.1. Existing Methods
        3.1.1. Wong's Saddle-point Technique
        3.1.2. Righi and Ceretta
        3.1.3. Emmer, Kratz and Tasche
        3.1.4. Summary of the methods
   3.2. Alternative Methods
        3.2.1. ES Benchmarking
        3.2.2. Bootstrap
        3.2.3. Quantile Approaches
        3.2.4. Summary of the methods
4. Applications of the ES methodology and back-testing
   4.1. ES simulations
   4.2. Back-test of the ES using our alternative methods
5. Conclusion
1. Introduction
Following recent financial crises and their disastrous impacts on the industry, regulators are
imposing tighter monitoring of banks so that they can survive extreme market conditions.
More recently, the Basel Committee on Banking Supervision (BCBS) announced a change in
the Market Risk measure used for Capital requirements in its Fundamental Review of the
Trading Book (FRTB), moving from the Value-at-Risk (VaR) to the Expected Shortfall (ES).
However, while the ES captures risks more efficiently than the VaR, it also has one main downside:
it is difficult to back-test. This leads to a situation where banks use the ES to
perform Capital calculations and then perform the back-testing on a VaR. The focus for banks'
research is now to find ways to back-test using the ES, as it can be expected that regulators
will require this in the near future.
This paper aims at presenting the latest developments in the field of ES back-testing
methodologies and introducing new methodologies developed by the Global Research &
Analytics (GRA) team of Chappuis Halder & Co.
First, the context in which the back-testing of the Expected Shortfall takes place
will be presented. This context starts with the calculation and back-testing methodologies of the
Value-at-Risk, followed by a focus on the ES, analysing its calculation and how it differs from
the previous risk measure. The main issues of ES back-testing will then be exposed and
discussed.
Second, back-testing methodologies for the ES will be reviewed in detail, beginning with
methodologies that have already been presented in previous years and then with alternative ones
introduced by the research department of Chappuis Halder & Co.
Third, some of the alternative back-testing methodologies will be simulated on a hypothetical
portfolio and a comparison of the methodologies will be conducted.
2. Context
Recall that in January 2016, the Basel Committee on Banking Supervision (BCBS) issued its
final guidelines on the Fundamental Review of the Trading Book (FRTB). The purpose of the
FRTB is to address shortcomings that both regulations and internal risk processes failed to capture
during the 2008 crisis. It marks a strategic reversal and the regulators' acceptance of:
- a convergence between risk measurement methods;
- an integrated assessment of risk types (from a silo risk assessment to a more
comprehensive risk identification);
- an alignment between prudential and accounting rules.
One of the main requirements and evolutions of the FRTB is the switch from a Value-at-Risk
(VaR) to an Expected Shortfall risk measurement approach. Hence, banks now face the paradox
of using the ES for the computation of their Market Risk Capital requirements and the Value-at-Risk
for the back-testing. This situation is mainly due to the difficulty of finding an ES back-testing
methodology that is both mathematically consistent and practically implementable.
However, it can be expected that upcoming regulations will require banks to back-test the ES.
The following sections aim at recalling the existing Market Risk back-testing framework,
as most of the notions presented must be understood for the following chapters of this article.
The VaR will therefore be presented first, given that its calculation and back-testing lay the
foundation of this paper. Then, a focus will be made on the ES, by analysing its calculation and
the way it differs from the VaR. Finally, the main issues concerning the back-testing of this new
measure will be explained. Possible solutions will be the subject of the next chapter.
2.1. Value-at-Risk
2.1.1. VaR Definition
The VaR was first devised by Dennis Weatherstone, former CEO of J.P. Morgan, in the
aftermath of the 1987 stock market crash. This new measure soon became an industry standard
and was eventually added to the Basel I Accord in 1996.
“The Value-at-Risk (VaR) defines a probabilistic method of measuring the potential loss in
portfolio value over a given time period and for a given distribution of historical returns. The
VaR is expressed in dollars or percentage losses of a portfolio (asset) value that will be equalled
or exceeded only X percent of the time. In other words, there is an X percent probability that
the loss in portfolio value will be equal to or greater than the VaR measure.
For instance, assume a risk manager computes the daily 5% VaR as $10,000. The VaR (5%)
of $10,000 indicates that there is a 5% chance that, on any given day, the portfolio will
experience a loss of $10,000 or more."1
Figure 1 - Probability distribution of a Value-at-Risk with 95% confidence level and 1-day time horizon (parametric VaR under a standard normal distribution N(0,1))
Estimating the VaR requires the following parameters:
1 Financial Risk Management book 1, Foundations of Risk Management; Quantitative Analysis, page 23
• The distribution of P&L – can be obtained either from a parametric assumption or
from non-parametric methodologies using historical values or Monte Carlo simulations;
• The Confidence Level – the probability that the loss will not be equal to or greater than the
VaR;
• The Time Horizon – the given time period on which the probability is true.
One can note that the VaR can either be expressed in value ($, £, €, etc.) or in return (%) of an
asset value.
The regulator demands a time horizon of 10 days for the VaR. However, this 10-day VaR is
estimated from a 1-day result, since an N-day VaR is usually assumed equal to the square root
of N multiplied by the 1-day VaR, under the commonly used assumption of independent and
identically distributed P&L returns:

$$VaR_{\alpha,\,N\,days} = \sqrt{N} \times VaR_{\alpha,\,1\,day}$$
2.1.2. Risk Measure Regulation
From a regulatory point of view, the Basel III Accords require not only the use of the traditional
VaR, but also three additional measures:
• Stressed VaR calculation;
• A new Incremental Risk Charge (IRC) which aims to cover the Credit Migration Risk
(i.e. the loss that could come from an external / internal ratings downgrade or upgrade);
• A Comprehensive Risk Measure for credit correlation (CRM) which estimates the price
risk of covered credit correlation positions within the trading book.
The Basel Committee has fixed parameters for each of these risk measures, which are presented
in the following table:
| | VaR | Stressed VaR | IRC | CRM |
|---|---|---|---|---|
| Confidence Level | 99% | 99% | 99.9% | 99.9% |
| Time Horizon | 10 days | 10 days | 1 year | 1 year |
| Frequency of calculation | Daily | Weekly | - | - |
| Historical Data | 1 previous year | 1 stressed year | - | - |
| Back-Test | Yes | No | - | - |
2.1.3. VaR Calculation
VaR calculation is based on the estimation of the P&L distribution. Three methods are used by
financial institutions for VaR calculation: one parametric (Variance-Covariance) and two
non-parametric (Historical and Monte Carlo).
1. Variance-Covariance: this parametric approach consists in assuming the normality of
the returns. Correlations between risk factors are constant and the delta (or price
sensitivity to changes in a risk factor) of each portfolio constituent is constant. Using
the correlation method, the Standard Deviation (volatility) of each risk factor is
extracted from the historical observation period. The potential effect of each component
of the portfolio on the overall portfolio value is then worked out from the component’s
delta (with respect to a particular risk factor) and that risk factor’s volatility.
2. Historical VaR: this is the method most frequently used in banks. It consists in
applying historical shocks to risk factors to yield a P&L distribution for each scenario
and then computing the percentile.
3. Monte-Carlo VaR: this approach consists in assessing the P&L distribution based on
a large number of simulations of risk factors. The risk factors are calibrated using
historical data. Each simulation will be different, but in total the simulations will
aggregate to the chosen statistical parameters.
For more details about these three methods, one can refer to the Chappuis Halder & Co. white paper1 on the Value-at-Risk.

1 Value-at-Risk: Estimation methodology and best practices.
Other methods, such as the "Exponentially Weighted Moving Average" (EWMA), "Autoregressive
Conditional Heteroskedasticity" (ARCH) or its GARCH(1,1) variant, exist but are
not addressed in this paper.
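To make the historical and Monte Carlo approaches concrete, here is a minimal Python sketch (our own illustration, not drawn from the paper; the function names and the normal model used for the Monte Carlo case are assumptions):

```python
import numpy as np

def historical_var(returns, alpha=0.99):
    # VaR as the (1 - alpha) quantile of the 1-day return distribution,
    # reported as a positive loss
    return -np.quantile(np.asarray(returns), 1 - alpha)

def monte_carlo_var(mu, sigma, alpha=0.99, n_sim=100_000, seed=0):
    # simulate returns from a calibrated model (assumed normal here)
    rng = np.random.default_rng(seed)
    simulated = rng.normal(mu, sigma, n_sim)
    return -np.quantile(simulated, 1 - alpha)
```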
2.1.4. VaR Back-Testing
As mentioned earlier, financial institutions are required to use specific risk measures for Capital
requirements. However, they must also ensure that the models used to calculate these risk
measures are accurate. These tests, also called back-testing, are therefore as important as the
value of the risk measure itself. From a regulatory point of view, the back-testing of the risk
measure used for Capital requirements is an obligation for banks.
However, in the case of the ES, for which no sound back-testing methods have yet been found,
regulators had to find a temporary solution. All this led to the paradoxical situation where the
ES is used for Capital requirements calculations whereas the back-testing is still being
performed on the VaR. In its Fundamental Review of the Trading Book (FRTB), the Basel
Committee includes the results of VaR back-testing in the Capital calculations as a multiplier.
Financial institutions are required to back-test their VaR at least once a year, over a period of
one year. The VaR back-testing methodologies used by banks mostly fall into three categories of
tests: coverage tests (required by regulators), distribution tests, and independence tests
(optional).
Coverage tests: these tests assess whether the number of exceedances during the tested year is
consistent with the quantile of loss the VaR is supposed to reflect.
Before going into details, it seems important to explain how this number of exceedances is
computed. Each day of the tested year, the return of day [t] is compared with the
calculated VaR of the previous day [t-1]. It is considered an exceedance if the day-t return is a loss
greater than the day-(t-1) VaR. At the end of the year, the total number of exceedances is
obtained by summing up all exceedance occurrences.
The main coverage tests are Kupiec's "proportion of failures" (PoF) test1 and the Basel
Committee's Traffic Light coverage test. Only the latter will be detailed here.
The Traffic Light coverage test dates back to 1996, when the Basel Committee first introduced
it. It defines "light zones" (green, yellow and red) depending on the number of exceedances
observed for a certain VaR confidence level. The colour of the zone determines the amount
of additional capital charges needed (green to red, red being the most punitive).
| Zone | Exceptions (out of 250) | Cumulative probability |
|---|---|---|
| Green | 0 | 8.11% |
| | 1 | 28.58% |
| | 2 | 54.32% |
| | 3 | 75.81% |
| | 4 | 89.22% |
| Yellow | 5 | 95.88% |
| | 6 | 98.63% |
| | 7 | 99.60% |
| | 8 | 99.89% |
| | 9 | 99.97% |
| Red | 10 | 99.99% |

Table 1 - Traffic Light coverage test (Basel Committee, 1996), with a coverage of 99%
Ex: let's say a bank chooses to back-test its 99% VaR using the last 252 days of data. It observes
6 exceedances during the year. The VaR measure therefore falls into the "yellow zone": the
back-test is not rejected, but the bank needs to add a certain amount of capital.
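The cumulative probabilities of Table 1 follow directly from the binomial distribution of exceedances. The sketch below (our illustration; the 95% and 99.99% zone cut-offs are read from the Basel table above) reproduces the zoning:

```python
from scipy.stats import binom

def traffic_light_zone(x, n_days=250, coverage=0.99):
    # cumulative probability of observing at most x exceedances of a 99% VaR
    cum_prob = binom.cdf(x, n_days, 1 - coverage)   # e.g. x = 4 -> 89.22%
    if cum_prob < 0.95:
        return "green", cum_prob
    if cum_prob < 0.9999:
        return "yellow", cum_prob
    return "red", cum_prob

print(traffic_light_zone(6))   # ('yellow', 0.9863...), as in the example
```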
Distribution tests: these tests (Kolmogorov-Smirnov test, Kuiper's test, Shapiro-Wilk test,
etc.) look for the consistency of VaR measures through the entire loss distribution. They assess
the quality of the P&L distribution that the VaR measure characterizes.
Ex: instead of only applying a simple coverage test on a 99% quantile of loss, the same
coverage test is applied on different quantiles of loss (98%, 95%, 90%, 80%, etc.)
Independence tests: these tests assess some form of independence in a Value-at-Risk
measure's performance from one period to the next. A failed independence test will raise doubts
about the coverage or distribution back-test results obtained for that VaR measure.
1 Kupiec (1995) introduced a variation on the binomial test called the proportion of failures (PoF) test. The PoF test works with the binomial distribution approach. In addition, it uses a likelihood ratio to test whether the probability of exceptions is synchronized with the probability "p" implied by the VaR confidence level. If the data suggests that the probability of exceptions differs from p, the VaR model is rejected.
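A minimal sketch of the PoF test (our implementation of the description above, with assumed function names):

```python
import numpy as np
from scipy.stats import chi2

def kupiec_pof(x, N, p=0.01, test_level=0.95):
    # x exceedances over N days, tested against exception probability p
    phat = x / N
    log_h0 = (N - x) * np.log(1 - p) + x * np.log(p)
    log_h1 = ((N - x) * np.log(1 - phat) + x * np.log(phat)) if 0 < x < N else 0.0
    lr = -2.0 * (log_h0 - log_h1)               # ~ chi2(1) under H0
    return lr, lr < chi2.ppf(test_level, df=1)  # True: VaR model not rejected
```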
To conclude, this section presented the different methodologies used for VaR
calculation and back-testing. However, this risk measure has been widely criticized over the
past years. Among the different arguments, one can note its inability to predict or cover
losses during a stressed period, a lack of efficiency that the 2008 crisis unfortunately revealed.
Also, its inability to predict the tail loss (i.e. extreme and rare losses) makes it difficult for
banks to anticipate the severity of the losses encountered. The BCBS therefore decided to retire the
well-established measure and replace it with the Expected Shortfall. The following section
aims at describing this new measure and explaining how it differs from the VaR.
2.2. Expected Shortfall
The Expected Shortfall (ES), aka Conditional VaR (CVaR), was first introduced in 2001 as a
more coherent measure than the VaR. The following years saw many debates comparing the
VaR and the ES, but it was not until 2013 that the BCBS decided to shift and adopt the ES as the new
risk measure.
In this section are presented the different methodologies of ES calibration and the main
differences between the ES and the VaR. Finally, the main issues concerning the ES back-testing
will be introduced; they are the focus of the following chapter.
2.2.1. ES Definition
The FRTB defines the ES as the "expected value of those losses beyond a given confidence level",
over a certain time horizon. In other words, the t-day ES gives the average loss that can be expected
over t days when the loss exceeds the t-day VaR.
For example, let’s assume a Risk Manager uses the historical VaR and ES. The observed 97.5%
VaR is $1,000 and there were 3 exceedances ($1,200; $1,100; $1,600). The calibrated ES is
therefore $1,300.
Figure 2 - Expected Shortfall (97.5%) illustration
2.2.2. ES Regulatory framework
The Basel 3 accords introduced the ES as the new measure of risk for capital requirement. As
for the VaR, the parameters for ES calculation are fixed by the regulators. The following table
highlights the regulatory requirements for the ES compared with those of the VaR.
| | VaR | Expected Shortfall |
|---|---|---|
| Confidence Level | 99% | 97.5% |
| Time Horizon | 10 days | 10 days |
| Frequency of calculation | Daily | Daily |
| Historical Data | 1 previous year | 1 stressed year |
| Back-Test | Yes | Not for the moment |
One can notice that the confidence level is lower for the ES than for the VaR. This difference
is due to the fact that, at a given confidence level, the ES is systematically greater than the VaR;
keeping a 99% confidence level would have been overly conservative, leading to a much larger
capital reserve for banks.
2.2.3. ES Calculation
The calibration of the ES is based on the same methodologies as the VaR's. It mainly consists in
estimating the right P&L distribution, which can be done using one of the 3 following methods:
variance-covariance, historical, or Monte Carlo simulations. These methodologies are
described in part 2.1.3.
Once the P&L distribution is known, the Expected Shortfall is calculated as the mean of returns
exceeding the VaR.
$$ES_{\alpha,t}(X) = -\frac{1}{1-\alpha} \int_{\alpha}^{1} P_t^{-1}(u)\,du$$

Where:
- X is the P&L distribution;
- t is the time point;
- α is the confidence level;
- $P_t^{-1}(\alpha)$ is the inverse of the VaR function of X at a time t and for a given confidence level α.
One must note that the ES is calibrated on a stressed period as it is actually a stressed ES in the
FRTB. The chosen period corresponds to the worst 250 days for the bank’s current portfolio in
recent memory.
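As a minimal sketch (our illustration), a historical ES can be computed directly from a vector of P&L values:

```python
import numpy as np

def expected_shortfall(pnl, alpha=0.975):
    # VaR taken as the (1 - alpha) quantile of the P&L, i.e. a loss level
    pnl = np.asarray(pnl)
    var_level = np.quantile(pnl, 1 - alpha)
    # ES as the mean of the P&L values beyond the VaR, reported as a positive loss
    return -pnl[pnl <= var_level].mean()
```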
2.2.4. VaR vs. ES
This section aims at showing the main differences (advantages and drawbacks) between the
VaR and the ES. The following list is not exhaustive and will be summarized in Table 2:
• Amount: given a confidence level X%, the VaR X% is always lower than the ES X%,
due to the definition of the ES as the mean of losses beyond the VaR. This is, in fact, the
reason why the regulatory confidence level changed from 99% (VaR) to 97.5% (ES), as
banks could not have coped with such a high amount of capital otherwise.
• Tail loss information: as mentioned earlier, one of the main drawbacks of the VaR is
its inability to predict tail losses. Indeed, the VaR predicts the probability of an event
but does not consider its severity. For example, a 99% VaR of 1 million predicts that,
during the following 100 days, 1 loss will exceed 1 million, but it makes no
difference between a loss of 1.1 million and one of 1 billion. The ES, on the other hand, is more
reliable as it gives information on the average amount of the loss that can be
expected.
• Consistency: the ES can be shown to be a coherent risk measure, contrary to the VaR. In
fact, the VaR lacks a mathematical property called sub-additivity, meaning that the sum of the
risk measures (RM) of 2 separate portfolios A and B should be equal to or greater than the
risk measure of the merger of these 2 portfolios:

$$RM_A + RM_B \geq RM_{A+B}$$

However, one can notice that the VaR does not always satisfy this
property, which means that in some cases it does not reflect the risk reduction from
diversification effects. Nonetheless, apart from theoretical cases, the lack of
sub-additivity of the VaR rarely seems to have practical consequences.
• Stability: the ES appears to be less stable than the VaR when it comes to the distribution.
For fat-tailed distributions for example, the errors in estimating an ES are much greater
than those of a VaR. Reducing the estimation error is possible but requires increasing
the sample size of the simulation. For the same error, an ES is costlier than the VaR
under a fat-tailed distribution.
• Cost / Time consumption: ES calibration seems to require more time and data storage
than the VaR's. First, processing the ES systematically requires more work than
processing the VaR (VaR calibration being a prerequisite for ES calculation). Second,
the calibration of the ES requires more scenarios than for the VaR, which means either
more data storage or more simulations, both of which are costly and time consuming.
Third, most banks do not want to lose their VaR framework, having spent vast amounts
of time and money on its development and implementation. These banks are likely to
calculate the ES as a mean of several VaRs, which will heavily weigh down calibration
time.
• Facility to back-test: one of the major issues for the ES is its difficulty to be back-tested.
Researchers and financial institutions have been exploring this subject for some years now
but still struggle to find a solution that is both mathematically consistent and practically
implementable. This difficulty is mainly due to the fact that ES back-testing is characterised as
"model dependent" (contrary to the VaR's, which is not). This point is explained in the
following section.
• Elicitability: elicitability corresponds to the existence of a statistical scoring measure that
allows simulated estimates to be compared with observed data. The main purpose of such a
measure is to assess the relevance and accuracy of the model used for simulation. To
achieve this, one introduces a scoring function S(x, y) which is used to evaluate the
performance of the forecasts x given the observations y. Examples of
scoring functions are the squared error S(x, y) = (x − y)² and the absolute error
S(x, y) = |x − y|. Given this definition and the nature of the Expected Shortfall,
one can understand that the ES is not elicitable, since there is no concrete observed data
to be compared with the forecasts.
| | VaR | ES |
|---|---|---|
| Amount | - | Originally greater than the VaR, hence the change of regulatory confidence level from 99% to 97.5% |
| Tail loss information | Does not give information on the severity of the loss | Gives the average amount of the loss that can be expected |
| Consistency | Lack of sub-additivity: VaR(1+2) > VaR(1) + VaR(2) is possible | Consistent (sub-additive) |
| Stability | Relatively stable | Less stable: the estimation error can be high for some distributions |
| Cost / Time consumption | - | Always greater than the VaR's |
| Facility to back-test | Easy to back-test | Difficult to back-test, mainly because ES back-testing is model dependent |
| Elicitability | Is elicitable | Isn't elicitable |

Table 2 - VaR versus ES: main advantages and drawbacks
2.2.5. ES Back-Testing
As mentioned above, the main issue with the ES is its difficulty to be back-tested. Although
researchers and institutions have been searching for solutions to this issue for more than
10 years now, no solution seems to satisfy both mathematical properties and practical
requirements. Moreover, following the FRTB evolution and its change from VaR to ES for
Capital requirements, it has become a priority to be consistent in the use of the risk measure (i.e.
using the same measure for both Capital calculations and back-testing).
One can wonder why the ES is so difficult to back-test. The main issue is that
ES back-testing is characterized as model dependent, unlike VaR back-testing, which is model
independent. Both notions are described in the following paragraphs.
Let's consider the traditional VaR back-testing and why it is not applicable to the ES. When
back-testing the VaR, one looks each day at the return of day t to see whether it exceeded the VaR[t-1].
The number of exceedances, corresponding to the sum of exceedance occurrences, is then
compared to the quantile the VaR is supposed to reflect.
If one considers that the P&L distribution is likely to change over time, the VaR levels to which
the returns are compared also change over time. One would therefore count exceedances
over a threshold that possibly changes every day; the exact
same return could be considered an exceedance one day and not on another.
When back-testing the VaR, although the reference value (i.e. the VaR) changes over time,
calculating the total number of exceedances still makes sense, as the results converge.
This mathematical property characterizes VaR back-testing as model
independent: results remain consistent when the P&L distribution changes during the year.
In the case of the ES, however, one would not only look at the number of exceedances but also
at their values. This additional information complicates the task, as there is no convergence when
looking at the mean of the exceedances. To make sense, the P&L distribution (or, more exactly,
the VaR) should remain constant over the time horizon. The back-testing of the ES is therefore
characterized as model dependent.
The characterization of ES back-testing as model dependent is one of the main issues that
financial institutions face. Unlike for the VaR, they cannot simply compare the theoretical ES
with the observed value at the end of the year, since in most cases the latter value does not make
sense.
This constraint, combined with limitations in both data storage and implementation time, makes
it difficult for financial institutions and researchers to find new ways to back-test the ES.
The following section presents the main results and findings of the last 10 years of
research, as well as the alternative solutions introduced by the Global Research & Analytics team
of Chappuis Halder & Co.
3. ES Back-Testing
As mentioned earlier, the purpose of this chapter is to present the latest developments in terms
of ES back-testing methodologies and to introduce new methodologies developed by the Global
Research & Analytics (GRA) team of Chappuis Halder & Co.
3.1. Existing Methods
3.1.1. Wong’s Saddle point technique
Wong (2008) proposed a parametric method for the back-testing of the ES. The purpose of the
methodology is to find the probability density function of the Expected Shortfall, defined as the
mean of returns exceeding the VaR. Once this distribution is found, one can find the
confidence level using the Lugannani and Rice formula, which provides the probability of finding
a theoretical ES inferior to the sample (i.e. observed) ES. The result of the back-test depends
on this "p-value": given a confidence level of 95%, the p-value must be greater than 5%
for the test to be accepted.
The method relies on 2 major steps:
1. Inversion formula: find the PDF from the moment-generating function
2. Saddle-point technique: approximate the resulting integral
Figure 3 - Overview of Wong's Approach
The ideas of the parametric method proposed by Wong (2008) are as follows. Let R = {R1, R2, R3, ...} be the portfolio returns, with predetermined CDF and PDF denoted by φ and f respectively. We denote by q = φ⁻¹(α) the theoretical α-quantile of the returns.
The statistic used to determine the observed expected shortfall is the following:

$$ES_N^{\alpha} = -\bar{X} = -\frac{\sum_{t=1}^{N} R_t \, I_{\{R_t < q\}}}{\sum_{t=1}^{N} I_{\{R_t < q\}}}$$

where $I_{\{x < q\}}$ is the indicator of whether the value x is less than $VaR_{\alpha} = q$.
The purpose of this method is to analytically estimate the density of this statistic and then see where the observed value is positioned with respect to this density.
We denote by n the realised quantity $\sum_{t=1}^{N} I_{\{R_t < q\}}$, which is the number of exceedances observed in our sample, and by $X_t$ the realised return exceedances below the α-quantile q. The observed expected shortfall is then:

$$\widetilde{ES_N^{\alpha}} = -\bar{x} = -\frac{\sum_{t=1}^{n} X_t}{n}$$
Reminder of the moment-generating function:
The moment-generating function (MGF) provides an alternative way of describing a random variable, and completely determines the behaviour and properties of the probability distribution of the random variable X:

$$M_X(t) = \mathbb{E}\left[e^{tX}\right]$$

The inversion formula that allows the density to be recovered from the MGF is the following:

$$f_X(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iut} M_X(iu)\,du \qquad (1.1)$$

One of the known features of the moment-generating function is the following:

$$M_{\alpha X + \beta Y}(t) = M_X(\alpha t)\cdot M_Y(\beta t) \qquad (1.2)$$
PROPOSITION 1:
Let X be a continuous random variable with density $\alpha^{-1} f(x)$, $x \in (-\infty, q)$. The moment-generating function of X is then given by:

$$M_X(t) = \alpha^{-1} \exp(t^2/2)\,\varphi(q - t) \qquad (1.3)$$

and its derivatives with respect to t are given by:

$$M_X'(t) = t\,M_X(t) - \alpha^{-1}\exp(qt)\,f(q) \qquad (1.4)$$

$$M_X''(t) = t\,M_X'(t) + M_X(t) - q\,\alpha^{-1}\exp(qt)\,f(q) \qquad (1.5)$$

$$M_X^{(m)}(t) = t\,M_X^{(m-1)}(t) + (m-1)\,M_X^{(m-2)}(t) - q^{m-1}\alpha^{-1}\exp(qt)\,f(q), \quad m \geq 3 \qquad (1.6)$$

Using these, one can also show that the mean and variance of X are obtained easily:

$$\mu_X = \mathbb{E}[X] = -\frac{f(q)}{\alpha}$$

$$\sigma_X^2 = var[X] = 1 - \frac{q\,f(q)}{\alpha} - \mu_X^2$$
The Lugannani and Rice formula
Lugannani and Rice (1980) provide a method used to determine the cumulative distribution
function of the statistic $\bar{X}$.
It is supposed that the moment-generating function of the variable $X_t = R_t I_{\{R_t < q\}}$ is known. Using property (1.2), one can compute $M_{\bar{X}}(t) = \left(M_X\left(\frac{t}{n}\right)\right)^n$ and, via the inversion formula, obtain:

$$f_{\bar{X}}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \left(M_X\left(i\frac{t}{n}\right)\right)^n dt = \frac{n}{2\pi} \int_{-\infty}^{\infty} e^{n(K[it] - itx)}\,dt$$

where $f_{\bar{X}}(x)$ denotes the PDF of the sample mean and $K[t] = \ln M_X[t]$ is the cumulant-generating function of $f_X(x)$. Then the tail probability can be written as:

$$P(\bar{X} > \bar{x}) = \int_{\bar{x}}^{q} f_{\bar{X}}(t)\,dt = \frac{1}{2\pi i} \int_{\Omega - i\infty}^{\Omega + i\infty} e^{n(K[t] - t\bar{x})}\,\frac{dt}{t}$$

where Ω is a saddle-point1 satisfying:

$$K'(\Omega) = \bar{x} \qquad (1.9)$$

The saddle-point is obtained by solving the following expression, deduced from (1.3) and (1.4):

$$K'(\Omega) = \frac{M'(\Omega)}{M(\Omega)} = \Omega - \exp(q\Omega - \Omega^2/2)\,\frac{f(q)}{\varphi(q - \Omega)} = \bar{x} \qquad (1.10)$$

Finally, Lugannani and Rice propose to approximate this integral as follows:
PROPOSITION 2:
Let Ω be a saddle-point satisfying equation (1.9) and define:

$$\epsilon = \Omega \sqrt{n\,K''(\Omega)}$$

$$\delta = sgn(\Omega)\sqrt{2n\left(\Omega\bar{x} - K(\Omega)\right)}$$

where $sgn(\Omega)$ equals 0 when Ω = 0, −1 when Ω < 0 and 1 when Ω > 0. Then the tail probability of $\bar{X}$ being less than or equal to the sample mean $\bar{x}$ is given by:

$$P(\bar{X} \leq \bar{x}) = \begin{cases} \varphi(\delta) - f(\delta)\left(\dfrac{1}{\epsilon} - \dfrac{1}{\delta}\right) + O\left(n^{-3/2}\right) & \text{for } \bar{x} < q \text{ and } \bar{x} \neq \mu_X \\[2mm] 1 & \text{for } \bar{x} > q \\[2mm] \dfrac{1}{2} + \dfrac{K^{(3)}(0)}{6\sqrt{2\pi n\left(K''(0)\right)^3}} + O\left(n^{-3/2}\right) & \text{for } \bar{x} = \mu_X \end{cases}$$
1 In mathematics, a saddle point or minimax point is a point on the surface of the graph of a function where the slopes (derivatives) of orthogonal function components defining the surface become zero (a stationary point) but are not a local extremum on both axes.
Once the tail probability is obtained, one can compute the observed expected shortfall $\widetilde{ES_N^{\alpha}}$ and carry out a one-tailed back-test to check whether this value is too large. The null and alternative hypotheses can be written as:

$$H_0: \widetilde{ES_N^{\alpha}} = \overline{ES_N^{\alpha}} \quad \text{versus} \quad H_1: \widetilde{ES_N^{\alpha}} > \overline{ES_N^{\alpha}}$$

where $\overline{ES_N^{\alpha}}$ denotes the theoretical expected shortfall under the null hypothesis.
The p-value of this hypothesis test is simply given by the Lugannani and Rice formula as:

$$p_{value} = P(\bar{X} \leq \bar{x})$$
Example:
For a portfolio composed of one S&P 500 stock, it is assumed that the bank has predicted that the daily P&L log-returns are i.i.d. and follow a normal distribution calibrated on the observations of the year 2014. All the observations of the year 2015 are then normalised, so one can consider that the sample follows a standard normal distribution N(0,1). Using Wong's method described above, the steps to follow in order to back-test the ES under these assumptions and with α = 2.5% are:
1. Calculate the theoretical α-quantile: $q = \varphi^{-1}(2.5\%) = -1.96$
2. Calculate the observed ES of the normalised log-returns of 2015: $\bar{X} = -2.84$, $n = 19$
3. Solve equation (1.10) to find the saddle-point: $\Omega = -3.23$
4. Calculate $K[\Omega]$ and $K''[\Omega]$ where

$$K''[t] = \frac{d}{dt}\,\frac{M'(t)}{M(t)} = \frac{M''(t)\,M(t) - M'(t)^2}{M(t)^2}$$

In our case, we found: $K[\Omega] = 8.80$ and $K''[\Omega] = 2.49$
5. Calculate the tail probability of $\widetilde{ES_N^{\alpha}}$ and compare it to the confidence level tolerated by the p-value test: $P\left(ES_N^{\alpha} \leq \widetilde{ES_N^{\alpha}}\right) \approx 0$
In this example, the null hypothesis is rejected. Not only does it show that the movements of 2015 cannot be explained by the movements of 2014, but it also shows that the hypothesis of a normal distribution of the log-returns is not likely to be true.
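The whole procedure can be condensed into a few lines. The sketch below is our own Python implementation of the steps above for i.i.d. standard normal returns (function names are ours, the sign conventions follow the standard Lugannani and Rice formulation, and intermediate values may differ slightly from the worked example):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def wong_pvalue(x_bar, n, alpha=0.025):
    # x_bar = -ES_obs is the mean of the n exceedances; it must lie below q
    q = norm.ppf(alpha)                          # theoretical alpha-quantile
    mills = lambda t: np.exp(norm.logpdf(q - t) - norm.logcdf(q - t))
    K = lambda t: 0.5 * t**2 + norm.logcdf(q - t) - np.log(alpha)  # ln M_X(t)
    K1 = lambda t: t - mills(t)                  # K'(t)
    def K2(t):                                   # K''(t)
        r = mills(t)
        return 1.0 - (q - t) * r - r**2
    # saddle point solving K'(Omega) = x_bar, cf. (1.9)-(1.10)
    omega = brentq(lambda t: K1(t) - x_bar, -100.0, 50.0)
    eps = omega * np.sqrt(n * K2(omega))
    delta = np.sign(omega) * np.sqrt(2.0 * n * (omega * x_bar - K(omega)))
    # Lugannani-Rice approximation of P(Xbar <= x_bar), cf. Proposition 2
    return norm.cdf(delta) - norm.pdf(delta) * (1.0 / eps - 1.0 / delta)

# With the magnitudes of the example (x_bar = -2.84, n = 19), the p-value
# is close to 0, so the null hypothesis is rejected.
print(wong_pvalue(-2.84, 19))
```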
3.1.2. Righi and Ceretta
The method of Righi and Ceretta is less restrictive than Wong's in the sense that the law of
the returns may vary from one day to another. However, it requires knowledge of the
truncated distribution below the negative VaR level.
Figure 4 - Righi and Ceretta - Calculating the Observed statistic test
Figure 5 - Righi and Ceretta - Simulating the statistic test
Figure 6 - Righi and Ceretta – Overview
In their article, they consider that the portfolio log-returns follow a generalized autoregressive conditional heteroscedastic (GARCH(p,q)) model, which is largely applied in finance:

$$r_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t$$

$$\sigma_t^2 = \omega + \sum_{i=1}^{p} \rho_i \varepsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$$

where $r_t$ is the log-return, $\mu_t$ the conditional mean, $\sigma_t^2$ the conditional variance and $\varepsilon_t$ the shock over the expected value of an asset in period t; ω, ρ and β are parameters; $z_t$ represents the white noise series, which can assume many probability distribution and density functions, denoted respectively $F_t$ and $f_t$.
The interest of this model lies mainly in the fact that the truncated distribution properties, mainly the α-quantile and the ES, can easily be predicted:

$$Q_{\alpha,t} = \mu_t + \sigma_t F^{-1}(\alpha)$$

$$ES_t = \mu_t + \sigma_t\,\mathbb{E}\left[z_t \,\middle|\, z_t < F^{-1}(\alpha)\right] \qquad (2.1)$$

One can also calculate the dispersion of the truncated distribution as follows:

$$SD_t = \sqrt{Var\left(\mu_t + \sigma_t z_t \,\middle|\, z_t < F^{-1}(\alpha)\right)} = \sigma_t \sqrt{Var\left(z_t \,\middle|\, z_t < F^{-1}(\alpha)\right)} \qquad (2.2)$$
The ES and SD are mainly calculated via Monte-Carlo simulations. In some cases, it is possible
to have their parametric formulas:
1. Case where $z_t$ is normal |
It is assumed that $z_t$ is a standard Gaussian noise N(0,1), which is a very common case. The expectation of the truncated normal distribution is then:

$$\mathbb{E}[z_t \,|\, z_t < Q] = -\frac{f(Q)}{F(Q)}$$

Substituting this expression in the expectation of (2.1), one obtains:

$$ES_t = \mu_t - \sigma_t\,\frac{f\left(F^{-1}(\alpha)\right)}{\alpha}$$

The variance of a truncated normal distribution below a value Q is given by:

$$Var[z_t \,|\, z_t < Q] = 1 - Q\,\frac{f(Q)}{F(Q)} - \left(\frac{f(Q)}{F(Q)}\right)^2$$

Substituting this expression in the variance term of formula (2.2), one deduces:

$$SD_t = \sigma_t \cdot \left[1 - F^{-1}(\alpha)\,\frac{f\left(F^{-1}(\alpha)\right)}{\alpha} - \left(\frac{f\left(F^{-1}(\alpha)\right)}{\alpha}\right)^2\right]^{1/2}$$
2. Case where $z_t$ follows a Student's distribution |
It is assumed that $z_t$ is a Student's t distributed random variable with v degrees of freedom. One can show that the truncated expectation is as follows:

$$\mathbb{E}[z_t \,|\, z_t < Q] = \frac{1}{2\sqrt{v}\,F(Q)\,\beta\left(\frac{v}{2},\frac{1}{2}\right)}\left(Q^2\,GH\left(\frac{1+v}{2},\,1;\,2;\,-\frac{Q^2}{2}\right)\right)$$

Substituting this expression in the expectation of (2.1), one obtains:

$$ES_t = \mu_t + \sigma_t\left(\frac{1}{2\sqrt{v}\,\alpha\,\beta\left(\frac{v}{2},\frac{1}{2}\right)}\left(F^{-1}(\alpha)^2\,GH\left(\frac{1+v}{2},\,1;\,2;\,-\frac{F^{-1}(\alpha)^2}{2}\right)\right)\right)$$

where β(·,·) and GH(·,·;·;·) are the Beta and Gauss hypergeometric functions, conforming to:

$$\beta(a,b) = \int_0^1 u^{a-1}(1-u)^{b-1}\,du$$

$$GH(a,b;c;z) = \sum_{k=0}^{\infty}\frac{(a)_k (b)_k}{(c)_k}\,\frac{z^k}{k!}$$

where $(\cdot)_k$ denotes the ascending factorial.
Similarly, the SD is deduced from the variance of a truncated Student's t distribution:

$$Var[z_t \,|\, z_t < Q] = \frac{1}{3\sqrt{v}\,F(Q)\,\beta\left(\frac{v}{2},\frac{1}{2}\right)}\,Q^3\,GH\left(\frac{1+v}{2},\,\frac{3}{2};\,\frac{5}{2};\,-\frac{Q^2}{2}\right)$$

Again, substituting this variance term in (2.2), one obtains an analytical form of the standard deviation of the truncated distribution:

$$SD_t = \sigma_t \cdot \left[\frac{1}{3\sqrt{v}\,\alpha\,\beta\left(\frac{v}{2},\frac{1}{2}\right)}\,F^{-1}(\alpha)^3\,GH\left(\frac{1+v}{2},\,\frac{3}{2};\,\frac{5}{2};\,-\frac{F^{-1}(\alpha)^2}{2}\right)\right]^{1/2}$$
Once the ES and SD are expressed and computed, for each day in the forecast period on which a violation of the predicted Value-at-Risk occurs, the following test statistics are defined:

$$BT_t = \frac{r_t - ES_t}{SD_t} \qquad (2.3)$$

$$HT = \frac{z_t - \mathbb{E}[z_t \,|\, z_t < Q]}{\sqrt{Var[z_t \,|\, z_t < Q]}} \qquad (2.4)$$

where $z_t$ is the realisation of the random variable $Z_t$ (in the GARCH process it is supposed that $Z_t$ is i.i.d., but this is not necessarily the case).
The idea of Righi and Ceretta is to see where the value of $BT_t$ is situated with respect to the "error" distribution of the estimator $HT = \frac{Z_t - \mathbb{E}[Z_t | Z_t < Q]}{\sqrt{Var[Z_t | Z_t < Q]}}$ by calculating the probability $\mathbb{P}(HT < BT_t)$ and then taking the median (or possibly the average) of these probabilities over time as a p-value, compared against a certain confidence level p.
They propose to calculate this distribution using Monte-Carlo simulations following this algorithm (a code sketch follows the list):
1) Generate N samples of n i.i.d. random variables $u_{ij}$ under the distribution F, i = 1, ..., n; j = 1, ..., N;
2) Estimate for each sample the quantities $\mathbb{E}[u_{ij} \,|\, u_{ij} < q(u_{ij})]$ and $Var[u_{ij} \,|\, u_{ij} < q(u_{ij})]$, where $q(u_{ij})$ is the α-th worst observation of the sample;
3) Calculate for each realisation $u_{ij}$ the quantity $h_{ij} = \frac{u_{ij} - \mathbb{E}[u_{ij} | u_{ij} < q(u_{ij})]}{\sqrt{Var[u_{ij} | u_{ij} < q(u_{ij})]}}$, which is a realisation of the random variable HT defined above;
4) Given the actual $BT_t$, estimate $\mathbb{P}(HT < BT_t)$ using the sample $h_{ij}$ as an empirical distribution of HT;
5) Determine the test p-value as the median of the $\mathbb{P}(HT < BT_t)$ and compare the value to the test level fixed at p.
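A minimal sketch of this algorithm for a standard normal F (our reading of steps 1-5, in which the $h_{ij}$ are built from the tail observations of each simulated sample; names and defaults are assumptions):

```python
import numpy as np

def righi_ceretta_pvalue(bt_values, alpha=0.025, n=250, n_mc=5000, seed=0):
    rng = np.random.default_rng(seed)
    h = []
    for _ in range(n_mc):                  # step 1: samples of n i.i.d. draws
        u = rng.standard_normal(n)
        q = np.quantile(u, alpha)          # alpha-th worst observation
        tail = u[u < q]                    # observations below the quantile
        # steps 2-3: standardise the tail by its empirical mean and std
        h.append((tail - tail.mean()) / tail.std())
    h = np.concatenate(h)                  # empirical distribution of HT
    # step 4: P(HT < BT_t) for each violation day; step 5: median p-value
    probs = np.array([np.mean(h < bt) for bt in bt_values])
    return np.median(probs), probs
```

The median p-value is then compared to the chosen test level in order to accept or reject the back-test.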
The methodology has been applied to the test portfolio of normalised daily returns over the 2014-2015 period. The results, where $z_t$ is a standard Gaussian noise N(0,1), are the following:

| Inputs | Distribution | Standard Normal |
|---|---|---|
| | Degrees of freedom | none |
| | Confidence level of the ES | 97.5% |
| | Scenario | 05/08/2014 |
| BT Results | VaR (th.) | 1.96 |
| | ES (th.) | 2.34 |
| | VaR (obs.) | 2.38 |
| | Number of exceedances | 12 |
| | ES (obs.) | 2.49 |
| | var(X < -VaR obs.) | 0.0941 |
| | Critical value - median | 0.00% |
| | Critical value - mean | 0.00% |
| Output | Final output | PASS |

Table 3 - Summary of the Righi and Ceretta implementation

For a test level fixed at 97.5%, the Righi and Ceretta methodology gives satisfactory results, with a pass for both the median and mean computations. Finally, one can conclude that this methodology is acceptable; nevertheless, it relies on a parametric assumption that may not fit the portfolio, which is not captured by the test statistics.
The table below displays the exceedance values and the associated test statistics:

| Exceedance # | Exceedance value | Test statistic | P(HT < BT) |
|---|---|---|---|
| 1 | -2.128 | 1.179 | 2.4% |
| 2 | -2.128 | 1.180 | 2.9% |
| 3 | -2.304 | 0.606 | 1.7% |
| 4 | -1.996 | 1.610 | 4.3% |
| 5 | -2.361 | 0.420 | 1.4% |
| 6 | -2.879 | -1.270 | 0.3% |
| 7 | -2.831 | -1.111 | 0.4% |
| 8 | -3.082 | -1.930 | 0.2% |
| 9 | -3.077 | -1.916 | 0.2% |
| 10 | -2.681 | -0.623 | 0.6% |
| 11 | -2.396 | 0.306 | 1.3% |
| 12 | -2.014 | 1.550 | 4.1% |

Table 4 - Exceedance rates and test statistics of the portfolio
3.1.3. Emmer, Kratz and Tasche
The method presented by Emmer et al. (2013) consists in replacing the ES back-testing with
a VaR back-testing. This substitution relies on the approximation of the ES as a mean of several
VaR levels, according to the following formula:
$$ES_{\alpha} = \frac{1}{1-\alpha}\int_{\alpha}^{1} VaR_u\,du = \lim_{N \to +\infty}\frac{1}{N}\sum_{k=0}^{N-1} VaR_{\alpha + k\frac{1-\alpha}{N}} \approx \frac{1}{5}\left(VaR_{\alpha} + VaR_{\alpha+\frac{1-\alpha}{5}} + VaR_{\alpha+2\cdot\frac{1-\alpha}{5}} + VaR_{\alpha+3\cdot\frac{1-\alpha}{5}} + VaR_{\alpha+4\cdot\frac{1-\alpha}{5}}\right)$$

Hence, assuming α = 97.5%, the formula becomes:

$$ES_{97.5\%} \approx \frac{1}{5}\left(VaR_{97.5\%} + VaR_{98\%} + VaR_{98.5\%} + VaR_{99\%} + VaR_{99.5\%}\right)$$
Therefore, by back-testing the VaR at the 97.5%, 98%, 98.5%, 99% and 99.5% levels, one should
complete the back-testing of the ES. If all these levels of VaR are validated, then the ES should be
considered validated as well.
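As an illustration (our own sketch, with assumed function names), the substitution can be written in a few lines:

```python
import numpy as np

def es_as_mean_of_vars(returns, alpha=0.975, n_levels=5):
    # VaR_u taken as the u-quantile of the losses (negated returns)
    losses = -np.asarray(returns)
    levels = alpha + np.arange(n_levels) * (1 - alpha) / n_levels
    var_estimates = np.quantile(losses, levels)    # VaR 97.5%, 98%, ..., 99.5%
    # back-testing the ES then amounts to back-testing each VaR level
    return var_estimates.mean(), dict(zip(np.round(levels, 4), var_estimates))
```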
However, this methodology has several drawbacks, since one should determine an appropriate N
that ensures that the average of the VaRs converges to the ES; otherwise the approximation would
carry too many uncertainties. Depending on the value of N, the tests could become impossible to
implement due to high computation time (calculation of N different VaRs). For instance, in the
Emmer et al. proposal, it is assumed that the convergence is obtained for N = 5, which explains the
mean performed over 5 Value-at-Risk levels.
Finally, one should also propose an adapted traffic-light table, since it may be not relevant
or too restrictive to require a pass on all the VaR levels.
3.1.4. Summary of the methods
In this section, the three different methods are summarised in terms of application and
implementation, as well as their drawbacks:
Wong’s method |
Figure 7 - Summary of Wong’s methodology
Righi and Ceretta Method |
Figure 8 - Summary of Righi and Cereta’s methodology
Emmer, Kratz and Tasche Method |
Figure 9 - Summary of Emmer, Kratz and Tasche’s methodology
3.2. Alternative Methods
In the following sections are presented alternative methods introduced by the Global Research
& Analytics (GRA) department of Chappuis Halder & Co.
First of all, it is important to note that some of the following methods rely on a major hypothesis:
the consistency of the theoretical VaR over a given period of time. This strong (and
not often met) assumption is due to the use of what is called the "observed ES".
The observed ES reflects the realised average loss (beyond the 97.5% quantile) over a 1-year period, as illustrated in the formula below:

$$ES_{obs} = \frac{\sum_{t=1}^{250} X_{t+1}\,I(X_{t+1} > VaR_t)}{N}$$

where $X_t$ corresponds to the return of day t and N is the number of exceedances during the year ($N = \sum_{t=1}^{250} I(X_{t+1} > VaR_t)$, with I the indicator function).1
However, this value only makes sense as long as the theoretical VaR (or, more broadly, the P&L distribution used for calibration) does not change during this time period. Should the opposite occur, one would look at losses beyond a level that changes with time, and the average of these losses would lose any meaning.
3.2.1. ES Benchmarking
This method focuses on the distance $ES_{th} - ES_{obs}$ between the theoretical ES (obtained by
calibration) and the observed ES (corresponding to realised returns). The main goal of this
methodology is to make sure the distance $ES_{th} - ES_{obs}$ at the back-testing date is located within
a confidence interval. This interval can be found by recreating a distribution from historical
values. The output of the back-test depends on the position of the observed distance: if the value
is within the interval, the back-test is accepted; otherwise it is rejected.
The historical distribution is based on 5 years of returns (i.e. 5 × 250 values). For each day of these
5 years, the distance $ES_{th} - ES_{obs}$ is calculated, with $ES_{th}$ and $ES_{obs}$ computed as described in the
introduction of this section. The 1,250 values collected can therefore be used to build a distribution
that fits historical behaviour.
1 As mentioned in part 2.1.1, the VaR is calculated with a 1-day time horizon. Therefore, the return that is compared to the VaR[t] is the return X[t+1].
Figure 10 - Illustration of ES Benchmarking Methodology
One of the main downsides of the methodology is that it relies on the notion of observed ES.
However, as mentioned earlier, this particular value requires a constant VaR, which is not often
met in reality.
Finally, once the confidence interval is obtained, one can use it in order to back-test the
simulated ES over the future back-testing horizon, as sketched below.
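A minimal sketch of the test (our illustration, assuming the 1,250 daily distances have already been computed):

```python
import numpy as np

def es_benchmark_test(distances_5y, distance_today, ci=0.95):
    # confidence interval from the historical distribution of ES_th - ES_obs
    lo, hi = np.quantile(np.asarray(distances_5y),
                         [(1 - ci) / 2, 1 - (1 - ci) / 2])
    accepted = lo <= distance_today <= hi   # inside the interval -> accepted
    return accepted, (lo, hi)
```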
3.2.2. Bootstrap
This methodology focuses on the value of the observed ES. As for the previous methodology,
the goal of this method is to verify that the observed ES is located within a confidence interval.
The latter can be found by recreating a distribution from historical values using the bootstrap
approach, which is detailed below. The output of the test depends on the position of the observed
ES (at the back-testing date): if the value is in the interval, the back-test is accepted; otherwise it is
rejected.
In this methodology, the bootstrap approach is used to build a larger vector of returns
in order to find the distribution of the ES as the annual mean of returns exceeding the VaR. This
approach consists in simulating returns using only values from a historical sample. The vector
formed by all simulated values therefore only contains historical data that came from the
original sample.
The overall methodology relies on four steps, as illustrated in Figure 11 and sketched in code after the list:
1. The sample vector is obtained and contains the returns of 1 year of data;
2. The bootstrap method is used to create a bigger vector, filled only with values from the
sample vector. This vector will be called the "Bootstrap Vector";
3. The final vector, used for compiling the distribution, is obtained by selecting only the
returns exceeding the VaR from the bootstrap vector;
4. The distribution is reconstructed using the final vector.
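A minimal sketch of these steps (our illustration; the VaR level is assumed to be quoted as a negative return, so exceedances are returns below it):

```python
import numpy as np

def bootstrap_es_interval(returns, var_level, n_boot=10_000, ci=0.95, seed=0):
    rng = np.random.default_rng(seed)
    r = np.asarray(returns)                 # step 1: the 1-year sample vector
    es_samples = []
    for _ in range(n_boot):
        boot = rng.choice(r, size=r.size, replace=True)  # step 2: bootstrap vector
        tail = boot[boot < var_level]       # step 3: returns exceeding the VaR
        if tail.size:                       # keep draws with exceedances
            es_samples.append(tail.mean())
    # step 4: distribution of the ES and its confidence interval
    lo, hi = np.quantile(es_samples, [(1 - ci) / 2, 1 - (1 - ci) / 2])
    return lo, hi   # the observed ES is accepted if it lies within [lo, hi]
```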
Figure 11 - Illustration of Bootstrap Methodology
3.2.3. Quantile Approaches
Whereas the Expected Shortfall is usually expressed as a value of the loss (i.e. in £, $, etc.), the
two methodologies Quantile 1 and Quantile 2 choose to focus on the ES as a quantile, i.e. a
probability value of the P&L distribution. The two methods differ in the choice of the quantile
adopted for the approach.
The following paragraphs describe the two different options for the choice of the quantile.
Quantile 1
This methodology focuses on the quantile of the observed ES (at the back-testing date); in other words,
it answers the question: "which probability is associated with the value of a specific Expected
Shortfall in the P&L distribution?"
One must notice that this quantile is not the confidence level of the Expected Shortfall. Indeed,
let's take a confidence level of 97.5%, as requested by the regulation. It is possible to estimate
an observed VaR and therefore an observed ES as the mean of returns exceeding the VaR. The
observed quantile can be found by looking at the P&L distribution and spotting the probability
associated with the ES value. The ES being strictly greater than the VaR, this quantile will always
be strictly greater than 97.5%, as illustrated in the sketch after Figure 12.
Figure 12 - Calculation of the Quantile Q1
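A minimal sketch of the Quantile 1 computation (our illustration, expressing both the ES and the losses as positive numbers):

```python
import numpy as np

def quantile_of_es(returns, es_obs):
    # probability level occupied by the observed ES in the loss distribution;
    # by construction it is strictly above the 97.5% VaR level
    losses = -np.asarray(returns)
    return float(np.mean(losses <= es_obs))   # e.g. ~0.99 for a 97.5% ES
```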
Quantile 2
This methodology looks at the stressed-retroactive quantile of the observed ES (at the back-testing
date); that is to say, it answers the question: "to which quantile would the observed ES
at the time of the back-testing correspond if it were observed in the reference stressed period used for
calibration?"
Figure 13 - Calculation of the quantile Q2
Back-testing methodology
Once the choice of the quantile computation is made, the approach is the same for the two
methodologies: it consists in verifying that the quantile calculated at the date of the back-testing
is located in the confidence interval obtained from a reconstructed historical distribution. If the
quantile is within the confidence interval, the back-test is accepted; otherwise it is rejected.
The distribution is obtained using the same framework as for the ES Benchmarking
methodology (see Section 3.2.1). The quantile is computed each day over 5 years (the observed
quantile for the first method, the stressed-retroactive quantile for the second). These
1,250 values are used to build a historical distribution of the chosen quantile, and the confidence
interval immediately follows.
Figure 14 - Illustration of the Quantile 1 Methodology
3.2.4. Summary of the methods
ES Benchmarking |
Figure 15- Summary of the ES Benchmarking methodology
Quantile method |
Figure 16 - Summary of the Quantile methodology
Bootstrap method |
Figure 17 - Summary of the Bootstrap methodology
4. Applications of the ES methodology and back-testing
4.1. ES simulations
In this section, the back-testing approaches presented in Section 3.2 are applied to
simulations of the S&P 500 index1. Instead of performing back-testing on
parametric distributions, it has been decided to perform a back-testing exercise on simulated
values of an equity (the S&P 500 index) based on Monte Carlo simulations. The historical levels of
the S&P 500 are displayed in Figure 18 below:
Figure 18 - S&P 500 Level – From January 2011 to December 2015
A stochastic model has been used in order to forecast the one-day return of the stock price,
which is then compared to the observed returns. The stochastic model relies on a Geometric
Brownian Motion (hereafter GBM) and the simulations are performed with a daily reset, as
could be done in a context of Market Risk estimation.
The stochastic differential equation (SDE) of a GBM used to diffuse the stock price is as follows:

$$dS = S(\mu\,dt + \sigma\,dW)$$

And the closed-form solution of the SDE is:

$$S(t) = S(0)\,e^{\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t)}$$
Where:
− S is the stock price;
− μ is the expected return;
− σ is the standard deviation of the expected return;
− t is the time;
− W(t) is a Brownian Motion.

1 The time period and data selected (from January 2011 to December 2015) are arbitrary; one would obtain similar results and findings with other data.
Simulations are performed on a day-to-day basis over the year 2011, and 1,000 scenarios are
produced per time point. Thanks to these simulations, it is therefore possible to compute
a one-day VaR99% as well as a one-day ES97.5% of the return, which are then compared to
the observed return.
Both VaR and ES are computed as follows:

$$VaR_{99\%}(t) = Q_{99\%}\left(\frac{S_{sim}(t) - S_{obs}(t-1)}{S_{obs}(t-1)}\right)$$

$$ES_{97.5\%}(t) = \frac{\sum_{i=1}^{n}\left(\dfrac{S_{sim}(t) - S_{obs}(t-1)}{S_{obs}(t-1)}\right) I_{\left\{\frac{S_{sim}(t) - S_{obs}(t-1)}{S_{obs}(t-1)} \geq VaR_{97.5\%}(t)\right\}}}{\sum_{i=1}^{n} I_{\left\{\frac{S_{sim}(t) - S_{obs}(t-1)}{S_{obs}(t-1)} \geq VaR_{97.5\%}(t)\right\}}}$$

where n is the total number of scenarios per time point t.
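A minimal sketch of this simulation step (our illustration; mu and sigma are assumed calibrated on historical daily returns, and the VaR is read as the worst 1% of simulated returns, consistent with the negative levels shown in Figure 19):

```python
import numpy as np

def one_day_var_es(s_prev, mu, sigma, n_scen=1000, seed=0):
    rng = np.random.default_rng(seed)
    dt = 1.0 / 252.0                                 # one trading day
    # GBM closed-form solution over one day, n_scen scenarios
    s_sim = s_prev * np.exp((mu - 0.5 * sigma**2) * dt
                            + sigma * np.sqrt(dt) * rng.standard_normal(n_scen))
    r = s_sim / s_prev - 1.0                         # simulated 1-day returns
    var_99 = np.quantile(r, 0.01)                    # one-day VaR99%
    var_975 = np.quantile(r, 0.025)
    es_975 = r[r <= var_975].mean()                  # mean beyond the 97.5% VaR
    return var_99, es_975
```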
The figure below shows the results of our simulations and computations of the VaR99% and ES97.5%:

Figure 19 - Observed returns vs. simulations - From January 2011 to January 2012
Figure 19 shows that, in comparison to the observed daily returns, the VaR99% and the ES97.5%
give the same level of conservativeness. This is further illustrated by Figure 20, where it is
observed that the levels of the VaR and the Expected Shortfall are close.

Figure 20 - Comparison of the VaR99% with the ES97.5% - From January 2011 to January 2012

When looking at Figure 20, one can notice that the ES97.5% does not always lead to more
conservative results in comparison to the VaR99%. This is explained by the fact that the ES is
the mean of the values above the VaR97.5%; consequently, and depending on the Monte Carlo
simulations, it is realistic to observe an ES97.5% slightly below the VaR99%.
Finally, when looking at the simulations, one can conclude that both risk measures are really
close. Indeed, considering the distribution of the spread between the simulated VaR99% and the
ES97.5% (see Figure 21 below), it is observed that 95% of the spreads between both risk measures
lie within the ]-0.1%, 0.275%] interval.
Figure 21 – Spread VaR99% vs. ES97.5% - January 2011 to September 2015
Following the computation of these simulated ES values, it can be concluded that, in comparison
to a VaR measure, the ES is not an overly conservative or severe measure. Given these
findings, and knowing that the ES is a more consistent measure than the VaR (due
to the way it is estimated), it can be accepted as a suitable risk measure provided that a reliable
approach is used to back-test the results.
4.2. Back-test of the ES using our alternative methods
Following the previous conclusions, it has been decided to focus on some of the approaches
defined in section 3.2. To this end, an observed ES has been computed based on the daily
VaR97.5% (obtained via the MC simulations) and the observed returns over the year following
the simulation date. Its expression is as follows:
$$ES_{obs}(t) = \frac{\sum_{i=1}^{m} R(t+i)\; I_{\{R(t+i)\,\le\,VaR_{97.5\%}(t)\}}}{\sum_{i=1}^{m} I_{\{R(t+i)\,\le\,VaR_{97.5\%}(t)\}}}$$
Where m is the number of days in the year following the date t and R(t) the daily return observed
at date t:
$$R(t) = \frac{S_{obs}(t) - S_{obs}(t-1)}{S_{obs}(t-1)}$$
As presented in section 3.2.1, this observed ES has been compared to the theoretical daily
simulated ES.
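A minimal sketch of this forward-looking observed ES could be written as follows, assuming `obs_returns` holds the observed daily returns and `var_t` the day-t simulated VaR97.5% (both names are illustrative):

```python
import numpy as np

def observed_es_forward(obs_returns, var_t, t, m=250):
    """Average the next m observed daily returns that breach the day-t VaR97.5%."""
    window = np.asarray(obs_returns)[t + 1:t + m + 1]   # year following date t
    tail = window[window <= var_t]                      # returns beyond the VaR
    return tail.mean() if tail.size else float("nan")   # undefined without breaches
```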
Figure 22 – Comparison of the theoretical ES against the observed ES - From January 2011 to January 2012
Figure 22 shows both theoretical and observed ES computed over the year 2011, whereas Figure
23 presents the distribution of the distance $ES_{th} - ES_{obs}$.
Figure 23 – Distribution of the distance $ES_{th} - ES_{obs}$ - From January 2011 to January 2012
Figure 23 shows that, over the year 2011, the observed ES is lower than the theoretical ES in
98% of the cases, and 95% of the distances lie within the [0.04%; 0.95%] range. Based on these
results, it is possible to define a 95% confidence interval for the future comparison of the
observed ES vs. the theoretical ES, in order to assess the accuracy and conservativeness of the
theoretical ES. This confidence interval has been applied to the data of the years 2012, 2013
and 2014, where:
- a positive result is obtained when the distance between the theoretical and observed
ES is below the lower bound (i.e. the theoretical ES is conservative);
- a neutral result is obtained when the distance is within the confidence interval (the
theoretical and observed ES are assumed close);
- a negative result is obtained when the distance is above the upper bound (i.e. the
theoretical ES lacks conservativeness).
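For illustration, this classification rule can be sketched as below, with the bounds taken from the 2011 calibration above ([0.04%; 0.95%]):

```python
def classify(distance, lower=0.0004, upper=0.0095):
    """Classify a distance ES_th - ES_obs against the 95% interval from 2011."""
    if distance < lower:
        return "positive"   # theoretical ES is conservative
    if distance <= upper:
        return "neutral"    # theoretical and observed ES assumed close
    return "negative"       # theoretical ES lacks conservativeness
```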
Results are presented in Table 5 below, where it is noted that the interval computed over the
year 2011 leads to satisfactory results since a majority of positive results are observed.
Table 5 – Results of the ES back-testing – 2012, 2013 and 2014

Year     Positive        Neutral         Negative
         #      %        #      %        #      %
2012     72     28.7%    179    71.3%    -      0.0%
2013     174    69.3%    77     30.7%    -      0.0%
2014     159    63.3%    92     36.7%    -      0.0%
Total    405    53.7%    348    46.2%    0      0.0%
The benefit of this approach is that it provides a way to back-test the level of the simulated
ES via the computation of thresholds based on the results of the previous year.
Nevertheless, one can challenge the way the observed ES is computed. Indeed, instead of
relying on a forward-looking approach, it could be computed via a backward-looking approach:
$$ES_{obs}(t) = \frac{\sum_{i=1}^{m} R(t-i)\; I_{\{R(t-i)\,\le\,VaR_{97.5\%}(t)\}}}{\sum_{i=1}^{m} I_{\{R(t-i)\,\le\,VaR_{97.5\%}(t)\}}}$$
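Under the same assumptions as the forward-looking sketch above, the backward-looking variant simply indexes past returns:

```python
import numpy as np

def observed_es_backward(obs_returns, var_t, t, m=250):
    """Average the past m observed daily returns that breach the day-t VaR97.5%."""
    window = np.asarray(obs_returns)[t - m:t]           # year preceding date t
    tail = window[window <= var_t]                      # returns beyond the VaR
    return tail.mean() if tail.size else float("nan")   # undefined without breaches
```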
This approach has been tested on the 2013 data. Figure 24 shows both theoretical and observed
ES computed over the year 2013 using a backward-looking approach, whereas Figure 25 shows
the results of the forward-looking methodology.
Figure 24 – Comparison of the theoretical ES against the observed ES (backward looking)- January 2013 to January 2014
Figure 25 – Comparison of the theoretical ES against the observed ES (forward looking)- January 2013 to January 2014
The comparison of Figure 24 and Figure 25 reveals that the backward-looking approach
leads to a more conservative and consistent computation of the observed ES, since the distance
between the simulations and the observations is marginal. Furthermore, the
backward-looking approach can be implemented on a daily basis, whereas the forward-
looking approach relies on future observations of returns.
The results of the backward-looking approach have been used to recalibrate the
interval; as expected, the new interval is narrower and is equal to [-0.07%; 0.26%]. The
results of the ES back-testing are presented in Table 6:
Table 6 – Results of the ES back-testing (backward looking) – 2014 and 2015

Year     Positive        Neutral         Negative
         #      %        #      %        #      %
2014     78     31.1%    172    68.5%    1      0.4%
2015     141    56.2%    21     8.4%     89     35.5%
Total    227    30.1%    495    49.3%    282    28.1%
When looking at Table 6, one can notice that the ES back-testing relying on the backward-
looking approach leads to more situations where the simulated ES is underestimated, which is
explained by the interval being smaller.
Overall, this shows the complexity of back-testing the ES: it is less straightforward
than a VaR back-testing and depends on the definition of the observed ES. Furthermore, it
can be noted in Table 5 and Table 6 that using the ES as a risk measure could lead
to instability in the back-testing results over the years, which shows the importance of
defining a proper back-testing methodology.
Then, it has been decided to test the bootstrap alternative method presented in Section 3.2.
As a first step, a sample vector corresponding to the one-day returns of the year 2011 has been
computed. As a second step, the bootstrap vector has been constructed; this vector is filled
with values selected at random (with replacement) from the sample vector ten thousand times.
Figure 26 below shows the bootstrap vector distribution:
Figure 26 – Bootstrap Vector – Random sampling of the 2011 one-day returns - 10 000 observations
Finally, for each date of the year 2011, a final vector containing all the values exceeding the
daily estimated VaR is constructed. For instance, as of January 6, 2011, the estimated VaR97.5%
is −1.97%, which leads to the following final vector distribution:
Figure 27 – Final Vector – 06.01.2011 – VaR97.5% = -1.97%
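A minimal sketch of this bootstrap procedure, under the assumptions above (the names below are illustrative), could be:

```python
import numpy as np

def bootstrap_vector(sample_returns, n_draws=10_000, seed=0):
    """Draw n_draws values at random (with replacement) from the observed
    one-day returns, forming the bootstrap vector."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.asarray(sample_returns), size=n_draws, replace=True)

def bootstrapped_es(boot, var_t):
    """'Final vector': bootstrapped returns beyond the day-t VaR97.5%,
    averaged to give the bootstrapped ES97.5%."""
    final = boot[boot <= var_t]
    return final.mean() if final.size else float("nan")

# e.g. as of 6 January 2011, with VaR97.5% = -1.97% (figure from the text):
# boot = bootstrap_vector(returns_2011)
# es_boot = bootstrapped_es(boot, -0.0197)
```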
As such, for each time point of the year 2011, it is possible to estimate a Bootstrapped ES97.5%
which will be used as a reference value to back-test the simulated ES97.5%. The results of the ES
back-testing are presented in Figure 28 below:
Figure 28 – ES Comparisons – January 2011 to January 2012
When looking at Figure 28, one can compare the bootstrapped ES97.5% and the observed ES as
reference values for back-testing purposes. In particular, it is observed that both curves are
similar for the first 9 months, after which the observed ES decreases significantly. Hence, the
bootstrapped ES appears to be more stable over the year and would be a more reliable
value to back-test the simulated ES, since no breaches are observed, whereas the
observed ES fails over the last months. The same exercise has been done on the data
of the year 2015, with an observed ES computed using a backward-looking approach.
Results are displayed in Figure 29, where similar conclusions are drawn.
Figure 29 - ES Comparisons – 02 January 2015 to 31 December 2015
5. Conclusion
This white paper presented the latest developments in terms of ES back-testing methodologies
and introduced new methodologies developed by the Global Research & Analytics (GRA) team
of Chappuis Halder & Co. After presenting and testing several methods from the
literature, it has been concluded that these methods may not be fit for the purpose of regulatory
back-testing, since they rely on questionable assumptions or require heavy computation time.
Then, in order to highlight the specificities of the back-testing of the Expected Shortfall, it has
been decided to implement and test the alternative methods presented in this article.
Overall, it has been concluded that the complexity of back-testing the Expected Shortfall
lies in a proper definition of the observed ES, which should serve as a reference value for
back-testing. Indeed, the estimation of a simulated Expected Shortfall is quite
straightforward since it relies on the computation of the simulated Value-at-Risk; this is not the
case for the computation of the observed Expected Shortfall. In order to perform an
apples-to-apples comparison, one can’t simply compare a simulated daily Expected Shortfall to a
daily observed return. Knowing that the Expected Shortfall corresponds to the average value
beyond the worst loss defined under a specific quantile, it seems natural to introduce these
features while estimating the observed Expected Shortfall.
Hence, in order to propose a relevant back-testing of the simulated ES, one should first decide
on the assumptions used for the computation of the observed ES. For example, it is important
to choose whether it has to be computed with a backward- or forward-looking approach, the
number of time points to use, the frequency of the calculations, etc.
These assumptions need to be chosen wisely in order to calibrate a relevant confidence
interval for ES comparisons. Indeed, it has been shown in this article that the back-testing
results can be different and unstable depending on the computation methodology of the
observed ES.
That’s why, on the basis of the tests performed in this article, it has been observed that the most
reliable back-testing results came from the computation of a bootstrapped ES, since it has the
advantage of considering a P&L distribution that is constant over the time horizon, which
produced a stable yet conservative level of confidence.
References
− Consultative Document – Fundamental Review of the Trading Book: A Revised Market Risk
Framework, Basel Committee on Banking Supervision, January 2014
− Minimum Capital Requirements for Market Risk, Basel Committee on Banking Supervision,
January 2016
− Introducing Three Model-Independent, Non-Parametric Back-test Methodologies for
Expected Shortfall, Carlo Acerbi and Balazs Szekely, December 2014
− Individual and Flexible Expected Shortfall Backtesting, Marcelo Righi and Paulo Sergio
Ceretta, June 2013
− Backtesting Expected Shortfall: The Design and Implementation of Different Backtests, Lisa
Wimmerstedt, August 2015
− Backtesting Trading Risk of Commercial Banks Using Expected Shortfall, Woon K. Wong,
2008
− Techniques for Verifying the Accuracy of Risk Measurement Models, P. H. Kupiec, 1995

More Related Content

PDF
CH&Co Latest White Paper on VaR
Genest Benoit
 
PDF
CH&Co. latest white paper on VaR
Genest Benoit
 
PDF
Increase Hazard Discovery and Minimize Errors in your Process Hazard Analyses...
R. Maqbool Qadir
 
PDF
Risk modelling hot topics
Genest Benoit
 
PPT
Presentazione tesi
LucaGravina
 
PDF
Backtesting Value at Risk and Expected Shortfall with Underlying Fat Tails an...
Stefano Bochicchio
 
PDF
Market Risk Modelling after Basel III: New Challenges for Banks and Supervisors
Jean-Paul Laurent
 
PDF
VaR Or Expected Shortfall
Alex Kouam
 
CH&Co Latest White Paper on VaR
Genest Benoit
 
CH&Co. latest white paper on VaR
Genest Benoit
 
Increase Hazard Discovery and Minimize Errors in your Process Hazard Analyses...
R. Maqbool Qadir
 
Risk modelling hot topics
Genest Benoit
 
Presentazione tesi
LucaGravina
 
Backtesting Value at Risk and Expected Shortfall with Underlying Fat Tails an...
Stefano Bochicchio
 
Market Risk Modelling after Basel III: New Challenges for Banks and Supervisors
Jean-Paul Laurent
 
VaR Or Expected Shortfall
Alex Kouam
 

Similar to Expected shortfall-back testing (20)

PPTX
Backtesting var
eguno_12
 
PPT
stress testing for market risk finance.ppt
mahtobibha
 
PDF
Chappuis Halder & Co - VaR estimation solutions
Augustin Beyot
 
PDF
CH&CO - VaR methodology whitepaper - 2015
C Louiza
 
PDF
Introduction to VaR
TOSHI STATS Co.,Ltd.
 
PDF
Market Liquidity Risk
Chris Chan
 
PDF
CH&Cie white paper value-at-risk in tuburlent times_VaR
Thibault Le Pomellec
 
PDF
Risk Europe 2002 Retail Bank Va R Pdf Min
Bank Risk Advisors
 
PPTX
Chapter 8-The VaR Approach.pptx
MohsinAli822745
 
PDF
Historical Simulation with Component Weight and Ghosted Scenarios
simonliuxinyi
 
PDF
Value-at-Risk in Turbulence Time
GRATeam
 
PDF
The value at risk
Jibin Lin
 
PDF
2008 implementation of va r in financial institutions
crmbasel
 
PDF
Making the best out of Value at Risk in a Basel III context
Jean-Paul Laurent
 
PPTX
Value at Risk
Clarus Financial Technology
 
PDF
VaR Approximation Methods
Cognizant
 
PPTX
Financial Risk Mgt - Lec 3 by Dr. Syed Muhammad Ali Tirmizi
Dr. Muhammad Ali Tirmizi., Ph.D.
 
PDF
Determinants of the implied equity risk premium in Brazil
FGV Brazil
 
PPTX
Financial Risk Mgt - Lec 2 by Dr. Syed Muhammad Ali Tirmizi
Dr. Muhammad Ali Tirmizi., Ph.D.
 
PPTX
Chapter 4 - Risk Management - 2nd Semester - M.Com - Bangalore University
Swaminath Sam
 
Backtesting var
eguno_12
 
stress testing for market risk finance.ppt
mahtobibha
 
Chappuis Halder & Co - VaR estimation solutions
Augustin Beyot
 
CH&CO - VaR methodology whitepaper - 2015
C Louiza
 
Introduction to VaR
TOSHI STATS Co.,Ltd.
 
Market Liquidity Risk
Chris Chan
 
CH&Cie white paper value-at-risk in tuburlent times_VaR
Thibault Le Pomellec
 
Risk Europe 2002 Retail Bank Va R Pdf Min
Bank Risk Advisors
 
Chapter 8-The VaR Approach.pptx
MohsinAli822745
 
Historical Simulation with Component Weight and Ghosted Scenarios
simonliuxinyi
 
Value-at-Risk in Turbulence Time
GRATeam
 
The value at risk
Jibin Lin
 
2008 implementation of va r in financial institutions
crmbasel
 
Making the best out of Value at Risk in a Basel III context
Jean-Paul Laurent
 
VaR Approximation Methods
Cognizant
 
Financial Risk Mgt - Lec 3 by Dr. Syed Muhammad Ali Tirmizi
Dr. Muhammad Ali Tirmizi., Ph.D.
 
Determinants of the implied equity risk premium in Brazil
FGV Brazil
 
Financial Risk Mgt - Lec 2 by Dr. Syed Muhammad Ali Tirmizi
Dr. Muhammad Ali Tirmizi., Ph.D.
 
Chapter 4 - Risk Management - 2nd Semester - M.Com - Bangalore University
Swaminath Sam
 
Ad

More from Genest Benoit (11)

PDF
climate risk 7 proposals
Genest Benoit
 
PDF
Basel II IRB Risk Weight Functions : Demonstration and Analysis
Genest Benoit
 
PDF
Model Risk Management | How to measure and quantify model risk?
Genest Benoit
 
PDF
EAD Parameter : A stochastic way to model the Credit Conversion Factor
Genest Benoit
 
PDF
Article cem into sa-ccr
Genest Benoit
 
PDF
Comments on Basel Op Risk proposal finally published ...
Genest Benoit
 
PDF
Ch gra wp_cat_bonds_2017
Genest Benoit
 
PDF
Gra wp modelling perspectives
Genest Benoit
 
PDF
Data Science by Chappuis Halder & Co.
Genest Benoit
 
PDF
Booklet_GRA_RISK MODELLING_Second Edition (002).compressed
Genest Benoit
 
PDF
CHCie - Booklet GRA.compressed
Genest Benoit
 
climate risk 7 proposals
Genest Benoit
 
Basel II IRB Risk Weight Functions : Demonstration and Analysis
Genest Benoit
 
Model Risk Management | How to measure and quantify model risk?
Genest Benoit
 
EAD Parameter : A stochastic way to model the Credit Conversion Factor
Genest Benoit
 
Article cem into sa-ccr
Genest Benoit
 
Comments on Basel Op Risk proposal finally published ...
Genest Benoit
 
Ch gra wp_cat_bonds_2017
Genest Benoit
 
Gra wp modelling perspectives
Genest Benoit
 
Data Science by Chappuis Halder & Co.
Genest Benoit
 
Booklet_GRA_RISK MODELLING_Second Edition (002).compressed
Genest Benoit
 
CHCie - Booklet GRA.compressed
Genest Benoit
 
Ad

Recently uploaded (20)

PDF
Illuminating the Future: Universal Electrification in South Africa by Matthew...
Matthews Bantsijang
 
PDF
SCB EIC expects CLMV outlook to face diverging risks amid global trade headwinds
SCBEICSCB
 
PDF
Cryptocurrency Wallet Security Protecting Your Digital Assets.pdf
Kabir Singh
 
PPTX
d and f block elements chapter 4 in class 12
dynamicplays04
 
PPTX
办理加利福尼亚大学圣芭芭拉分校文凭|购买UCSB毕业证录取通知书学位证书
1cz3lou8
 
PDF
Why Most People Misunderstand Risk in Personal Finance.
Harsh Mishra
 
PDF
Torex to Acquire Prime Mining - July 2025
Adnet Communications
 
PPT
financial system chapter 1 overview of FS
kumlachewTegegn1
 
PPTX
creation economic value Chapter 2 - PPT.pptx
ahmed5156
 
PPTX
Principles of Management buisness sti.pptx
CarToonMaNia5
 
PDF
Tran Quoc Bao named in Fortune - Asia Healthcare Leadership Index 2025
Gorman Bain Capital
 
PPT
The reporting entity and financial statements
Adugna37
 
PDF
Asia’s Top 10 Hospital CEOs Transforming Healthcare in 2025
Gorman Bain Capital
 
PPTX
Unit1_Managerial_Economics_SEM 1-PPT.pptx
RISHIRISHI87
 
PPTX
Session 1 FTP 2023 25th June 25 TRADE FINANCE
NarinderKumarBhasin
 
PPTX
US inequality along numerous dimensions
Gaetan Lion
 
PPTX
Accounting for fixed ASSETS AND INTANGIBLE ASSETS
Adugna37
 
PDF
PROBABLE ECONOMIC SHOCKWAVES APPROACHING: HOW BAYER'S GLYPHOSATE EXIT IN THE ...
Srivaanchi Nathan
 
PPTX
PUrposive-commmunicatuon112uospptxyynsns
yunaselle7
 
Illuminating the Future: Universal Electrification in South Africa by Matthew...
Matthews Bantsijang
 
SCB EIC expects CLMV outlook to face diverging risks amid global trade headwinds
SCBEICSCB
 
Cryptocurrency Wallet Security Protecting Your Digital Assets.pdf
Kabir Singh
 
d and f block elements chapter 4 in class 12
dynamicplays04
 
办理加利福尼亚大学圣芭芭拉分校文凭|购买UCSB毕业证录取通知书学位证书
1cz3lou8
 
Why Most People Misunderstand Risk in Personal Finance.
Harsh Mishra
 
Torex to Acquire Prime Mining - July 2025
Adnet Communications
 
financial system chapter 1 overview of FS
kumlachewTegegn1
 
creation economic value Chapter 2 - PPT.pptx
ahmed5156
 
Principles of Management buisness sti.pptx
CarToonMaNia5
 
Tran Quoc Bao named in Fortune - Asia Healthcare Leadership Index 2025
Gorman Bain Capital
 
The reporting entity and financial statements
Adugna37
 
Asia’s Top 10 Hospital CEOs Transforming Healthcare in 2025
Gorman Bain Capital
 
Unit1_Managerial_Economics_SEM 1-PPT.pptx
RISHIRISHI87
 
Session 1 FTP 2023 25th June 25 TRADE FINANCE
NarinderKumarBhasin
 
US inequality along numerous dimensions
Gaetan Lion
 
Accounting for fixed ASSETS AND INTANGIBLE ASSETS
Adugna37
 
PROBABLE ECONOMIC SHOCKWAVES APPROACHING: HOW BAYER'S GLYPHOSATE EXIT IN THE ...
Srivaanchi Nathan
 
PUrposive-commmunicatuon112uospptxyynsns
yunaselle7
 

Expected shortfall-back testing

  • 1. © CHAPPUIS HALDER & CO Back-testing of Expected Shortfall: Main challenges and methodologies By Leonard BRIE with Benoit GENEST and Matthieu ARSAC Global Research & Analytics1 1 This work was supported by the Global Research & Analytics Dept. of Chappuis Halder & Co. Many collaborators from Chappuis Halder & Co. have been involved in the writing and the reflection around this paper; hence we would like to send out special thanks to Claire Poinsignon, Mahdi Kallel, Mikaël Benizri and Julien Desnoyers-Chehade
  • 2. © Global Research & Analytics Dept.| 2018 | All rights reserved 2 Executive Summary In a context of an ever-changing regulatory environment over the last years, Banks have witnessed the draft and publication of several regulatory guidelines and requirements in order to frame and structure their internal Risk Management. Among these guidelines, one has been specifically designed for the risk measurement of market activities. In January 2016, the Basel Committee on Banking Supervision (BCBS) published the Fundamental Review of the Trading Book (FRTB). Amid the multiple evolutions discussed in this paper, the BCBS presents the technical context in which the potential loss estimation has changed from a Value-at-Risk (VaR) computation to an Expected Shortfall (ES) evaluation. The many advantages of an ES measure are not to be demonstrated, however this measure is also known for its major drawback: its difficulty to be back-tested. Therefore, after recalling the context around the VaR and ES models, this white paper will review ES back-testing findings and insights along many methodologies; these have either been drawn from the latest publications or have been developed by the Global Research & Analytics (GRA) team of Chappuis Halder & Co. As a conclusion, it has been observed that the existing methods rely on strong assumptions and that they may lead to inconsistent results. The developed methodologies proposed in this paper also show that even though the ES97.5% metric is close to a VaR99,9% metric, it is not as easily back-tested as a VaR metric; this is mostly due to the non-elicitability of the ES measure. Keywords: Value-at-Risk, Expected Shortfall, Back-testing, Basel III, FRTB, Risk Management EL Classification: C02, C63, G01, G21, G17
  • 3. © Global Research & Analytics Dept.| 2018 | All rights reserved 3 Table of Contents Table of Contents ....................................................................................................................... 3 1. Introduction ........................................................................................................................ 4 2. Context ............................................................................................................................... 4 2.1. Value-at-Risk............................................................................................................... 5 2.1.1. VaR Definition ..................................................................................................... 5 2.1.2. Risk Measure Regulation ..................................................................................... 6 2.1.3. VaR Calculation ................................................................................................... 6 2.1.4. VaR Back-Testing................................................................................................ 7 2.2. Expected Shortfall ....................................................................................................... 9 2.2.1. ES Definition........................................................................................................ 9 2.2.2. ES Regulatory framework.................................................................................. 10 2.2.3. ES Calculation.................................................................................................... 10 2.2.4. VaR vs. ES ......................................................................................................... 11 2.2.5. ES Back-Testing................................................................................................. 12 3. ES Back-Testing............................................................................................................... 14 3.1. Existing Methods....................................................................................................... 14 3.1.1. Wong’s Saddle point technique.......................................................................... 14 3.1.2. Righi and Ceretta................................................................................................ 17 3.1.3. Emmer, Kratz and Tasche.................................................................................. 22 3.1.4. Summary of the three methods........................................................................... 23 3.2. Alternative Methods .................................................................................................. 25 3.2.1. ES Benchmarking............................................................................................... 25 3.2.2. Bootstrap ............................................................................................................ 26 3.2.3. Quantile Approaches.......................................................................................... 27 4. Applications of the ES methodology and back-testing .................................................... 31 4.1. ES simulations........................................................................................................... 31 4.2. Back-test of the ES using our alternative methods.................................................... 34 5. 
Conclusion........................................................................................................................ 41
  • 4. © Global Research & Analytics Dept.| 2018 | All rights reserved 4 1. Introduction Following recent financial crises and their disastrous impacts on the industry, regulators are proposing tighter monitoring on banks so that they can survive in extreme market conditions. More recently, the Basel Committee on Banking Supervision (BCBS) announced a change in the Market Risk measure used for Capital requirements in its Fundamental Review of the Trading Book (FRTB), moving from the Value-at-Risk (VaR) to the Expected Shortfall (ES). However, if the ES captures risks more efficiently than the VaR, it also has one main downside which is its difficulty to be back-tested. This leads to a situation where banks use the ES to perform Capital calculations and then perform the back-testing on a VaR. The focus for banks’ research is now to try to find ways to back-test using the ES, as it can be expected that regulators will require so in a near-future. This paper aims at presenting the latest developments in the field of ES back-testing methodologies and introducing new methodologies developed by the Global Research & Analytics (GRA) team of Chappuis Halder & Co. First, a presentation of the context in which the back-testing of Expected Shortfall takes place will be provided. This context starts with calculation and back-testing methodologies of the Value-at-Risk, followed by a focus on the ES, analysing its calculation and how it defers from the previous risk measure. The main issues of ES back-testing will then be exposed and discussed. Second, back-testing methodologies for ES will be reviewed in detail, beginning with methodologies that have already been presented in previous years and then with alternative ones introduced by the research department of Chappuis Halder &Co. Third, some of the alternative back-testing methodology will be simulated on a hypothetical portfolio and a comparison of the methodologies will be conducted. 2. Context Recall that in January 2016, the Basel Committee on Banking Supervision (BCBS) issued its final guidelines on the Fundamental Review of the Trading Book (FRTB). The purpose of the FRTB is to cover shortcomings that both regulations and internal risk processes failed to capture during the 2008 crisis. It shows a strategic reversal and the acceptance of regulators for: - a convergence between risk measurement methods; - an integrated assessment of risk types (from a silo risk assessment to a more comprehensive risk identification); - an alignment between prudential and accounting rules. One of the main requirements and evolutions of the FRTB is the switch from a Value-at-Risk (VaR) to an Expected Shortfall risk measurement approach. Hence, Banks now face the paradox of using the ES for the computation of their Market Risk Capital requirements and the Value- at-Risk for the back-testing. This situation is mainly due to the difficulty of finding an ES back- testing methodology that is both mathematically consistent and practically implementable. However, it can be expected that upcoming regulations will require banks to back-test the ES.
  • 5. © Global Research & Analytics Dept.| 2018 | All rights reserved 5 The following sections aim at reminding the existing Market Risk framework for back-testing, as most of the presented notions must be understood for the following chapters of this article. The VaR will therefore be presented at first, given that its calculation and back-testing lay the foundation of this paper. Then, a focus will be made on the ES, by analysing its calculation and the way it defers from the VaR. Finally, the main issues concerning the back-testing of this new measure will be explained. Possible solutions will be the subject of the next chapter. 2.1. Value-at-Risk 2.1.1. VaR Definition The VaR was first devised by Dennis Weatherstone, former CEO of J.P. Morgan, on the aftermath of the 1987 stock market crash. This new measure soon became an industry standard and was eventually added to Basel I Accord in 1996. “The Value-at-Risk (VaR) defines a probabilistic method of measuring the potential loss in portfolio value over a given time period and for a given distribution of historical returns. The VaR is expressed in dollars or percentage losses of a portfolio (asset) value that will be equalled or exceeded only X percent of the time. In other words, there is an X percent probability that the loss in portfolio value will be equal to or greater than the VaR measure. For instance, assume a risk manager performing the daily 5% VaR as $10,000. The VaR (5%) of $10,000 indicates that there is a 5% of chance that on any given day, the portfolio will experience a loss of $10,000 or more.”1 Figure 1 - Probability distribution of a Value-at-Risk with 95% Confidence Level and 1day Time Horizon (Parametric VaR expressed as with a Normal Law N(0,1)) Estimating the VaR requires the following parameters: 1 Financial Risk Management book 1, Foundations of Risk Management; Quantitative Analysis, page 23
  • 6. © Global Research & Analytics Dept.| 2018 | All rights reserved 6 • The distribution of P&L – can be obtained either from a parametric assumption or from non-parametric methodologies using historical values or Monte Carlo simulations; • The Confidence Level – the probability that the loss will not be equal or superior to the VaR; • The Time Horizon – the given time period on which the probability is true. One can note that the VaR can either be expressed in value ($, £, €, etc.) or in return (%) of an asset value. The regulator demands a time horizon of 10 days for the VaR. However, this 10 days VaR is estimated from a 1-day result, since a N days VaR is usually assumed equal to the square root of N multiplied by the 1-day VaR, under the commonly used assumption of independent and identically distributed P&L returns. ܸܴܽ∝ ,ேௗ௔௬௦ = √ܰ × ܸܴܽ∝,ଵௗ௔௬ 2.1.2. Risk Measure Regulation From a regulatory point of view, Basel III Accords require not only the use of the traditional VaR, but also of 3 other additional measures: • Stressed VaR calculation; • A new Incremental Risk Charge (IRC) which aims to cover the Credit Migration Risk (i.e. the loss that could come from an external / internal ratings downgrade or upgrade); • A Comprehensive Risk Measure for credit correlation (CRM) which estimates the price risk of covered credit correlation positions within the trading book. The Basel Committee has fixed parameters for each of these risk measures, which are presented in the following table: VaR Stressed VaR IRC CRM Confidence Level 99% 99% 99.9% 99.9% Time Horizon 10 days 10 days 1 year 1 year Frequency of calculation Daily Weekly - - Historical Data 1 previous year 1 stressed year - - Back-Test Yes No - - 2.1.3. VaR Calculation VaR calculation is based on the estimation of the P&L distribution. Three methods are used by financial institutions for VaR calculation: one parametric (Variance-Covariance), and two not parametric (Historical and Monte-Carlo). 1. Variance-Covariance: this parametric approach consists in assuming the normality of the returns. Correlations between risk factors are constant and the delta (or price sensitivity to changes in a risk factor) of each portfolio constituent is constant. Using the correlation method, the Standard Deviation (volatility) of each risk factor is
  • 7. © Global Research & Analytics Dept.| 2018 | All rights reserved 7 extracted from the historical observation period. The potential effect of each component of the portfolio on the overall portfolio value is then worked out from the component’s delta (with respect to a particular risk factor) and that risk factor’s volatility. 2. Historical VaR: this method is the most frequently used method in banks. It consists in applying historical shocks on risk factors to yield a P&L distribution for each scenario and then compute the percentile. 3. Monte-Carlo VaR: this approach consists in assessing the P&L distribution based on a large number of simulations of risk factors. The risks factors are calibrated using historical data. Each simulation will be different but in total the simulations will aggregate to the chosen statistical parameters. For more details about these three methods, one can refer to the Chappuis Halder & Co.’s white paper1 on the Value-at-Risk. Other methods such as “Exponentially Weighted Moving Average” (EWMA), “Autoregressive Conditional Heteroskedasticity” (ARCH) or the “declined (G)ARCH (1,1) model” exist but are not addressed in this paper. 2.1.4. VaR Back-Testing As mentioned earlier, financial institutions are required to use specific risk measures for Capital requirements. However, they must also ensure that the models used to calculate these risk measures are accurate. These tests, also called back-testing, are therefore as important as the value of the risk measure itself. From a regulatory point of view, the back-testing of the risk measure used for Capital requirements is an obligation for banks. However, in the case of the ES for which no sound back-testing methods have yet been found, regulators had to find a temporary solution. All this lead to the paradoxical situation where the ES is used for Capital requirements calculations whereas the back-testing is still being performed on the VaR. In its Fundamental Review of the Trading Book (FRTB), the Basel Committee includes the results of VaR back-testing in the Capital calculations as a multiplier. Financial institutions are required to back-test their VaR at least once a year, and on a period of 1 year. The VaR back-testing methodologies used by banks mostly fall into 3 categories of tests: coverage tests (required by regulators), distribution tests, and independence tests (optional). Coverage tests: these tests assess if the number of exceedances during the tested year is consistent with the quantile of loss the VaR is supposed to reflect. Before going into details, it seems important to explain how this number of exceedances is computed. In fact, each day of the tested year, the return of the day [t] is compared with the calculated VaR of the previous day [t-1]. It is considered an exceedance if the t return is a loss 1 Value-at-Risk: Estimation methodology and best practices.
  • 8. © Global Research & Analytics Dept.| 2018 | All rights reserved 8 greater than the t-1 VaR. At the end of the year, the total number of exceedances during the year can be obtained by summing up all exceedances occurrences. The main coverage tests are Kupiec’s “proportion of failures” (PoF)1 and The Basel Committee’s Traffic Light coverage test. Only the latter will be detailed here. The Traffic Light coverage test dates back to 1996 when the Basel Committee first introduced it. It defines “light zones” (green, yellow and red) depending on the number of exceedances observed for a certain VaR level of confidence. The colour of the zone determines the amount of additional capital charges needed (from green to red being the most punitive). Zone Exceptions (out of 250) Cumulative probability Green 0 8.11% 1 28.58% 2 54.32% 3 75.81% 4 89.22% Yellow 5 95.88% 6 98.63% 7 99.60% 8 99.89% 9 99.97% Red 10 99.99% Table 1 - Traffic Light coverage test (Basel Committee, 1996), with a coverage of 99% Ex: let’s say a bank chooses to back-test its 99% VaR using the last 252 days of data. It observes 6 exceedances during the year. The VaR measures therefore falls into the “yellow zone”. The back-test is not rejected but the bank needs to add a certain amount of capital. Distribution tests: these tests (Kolmogorov-Smirnov test, Kuiper’s test, Shapiro-Wilk test, etc.) look for the consistency of VaR measures through the entire loss distribution. It assesses the quality of the P&L distribution that the VaR measure characterizes. Ex: instead of only applying a simple coverage test on a 99% quantile of loss, we apply the same coverage test on different quantiles of loss (98%, 95%, 90%, 80%, etc.) Independence tests: these tests assess some form of independence in a Value-at-Risk measure’s performance from one period to the next. A failed independence test will raise doubts on a coverage or distribution back-test results obtained for that VaR measure. 1 Kupiec (1995) introduced a variation on the binomial test called the proportion of failures (PoF) test. The PoF test works with the binomial distribution approach. In addition, it uses a likelihood ratio to test whether the probability of exceptions is synchronized with the probability “p” implied by the VaR confidence level. If the data suggests that the probability of exceptions is different than p, the VaR model is rejected.
  • 9. © Global Research & Analytics Dept.| 2018 | All rights reserved 9 To conclude, in this section were presented the different methodologies used for VaR calculation and back-testing. However, this risk measure has been widely criticized during the past years. Among the different arguments, one can notice its inability to predict or cover the losses during a stressed period, the 2008 crisis unfortunately revealing this lack of efficiency. Also, its incapacity to predict the tail loss (i.e. extreme and rare losses) makes it difficult for banks to predict the severity of the loss encountered. The BCBS therefore decided to retire the well-established measure and replace it by the Expected Shortfall. The following section will aim at describing this new measure and explain how it defers from the VaR. 2.2. Expected Shortfall The Expected Shortfall (ES), aka Conditional VaR (CVaR), was first introduced in 2001 as a more coherent method than the VaR. The following years saw many debates comparing the VaR and the ES but it’s not until 2013 that the BCBS decided to shift and adopt ES as the new risk measure. In this section are presented the different methodologies of ES calibration and the main differences between the ES and the VaR. Finally, an introduction of the main issues concerning the ES back-testing will be made, which will be the focus of the following chapter. 2.2.1. ES Definition FRTB defines the ES as the “expected value of those losses beyond a given confidence level”, over certain time horizon. In other words, the t-ES gives the average loss that can be expected in t-days when the returns are above the t-VaR. For example, let’s assume a Risk Manager uses the historical VaR and ES. The observed 97.5% VaR is $1,000 and there were 3 exceedances ($1,200; $1,100; $1,600). The calibrated ES is therefore $1,300. Figure 2 – Expected shortfall (97.5%) illustration VaR97,5 ES97,5
  • 10. © Global Research & Analytics Dept.| 2018 | All rights reserved 10 2.2.2. ES Regulatory framework The Basel 3 accords introduced the ES as the new measure of risk for capital requirement. As for the VaR, the parameters for ES calculation are fixed by the regulators. The following table highlights the regulatory requirements for the ES compared with those of the VaR. VaR Expected Shortfall Confidence Level 99% 97.5% Time Horizon 10 days 10 days Frequency of calculation Daily Daily Historical Data 1 previous year 1 stressed year Back-Test Yes Not for the moment One can notice that the confidence level is lower for the ES than for the VaR. This difference is due to the fact that the ES is systematically greater than the VaR and keeping a 99% confidence level would have been overly conservative, leading to a much larger capital reserve for banks. 2.2.3. ES Calculation The calibration of ES is based on the same methodologies as the VaR’s. It mainly consists in estimating the right P&L distribution, which can be done using one of the 3 following methods: variance-covariance, historical and Monte-Carlo simulations. These methodologies are described in part 2.1.2. Once the P&L distribution is known, the Expected Shortfall is calculated as the mean of returns exceeding the VaR. ‫ܵܧ‬∝,௧ሺܺሻ = − 1 1−∝ න ܲ௧ ିଵ ሺ‫ݑ‬ሻ݀‫ݑ‬ ଵ ∝ Where : - X is the P&L distribution; - t is the time point; - ∝ is the confidence level; - ܲ௧ ିଵ ሺ∝ሻ is the inverse of the VaR function of X at a time t and for a given ∝ confidence level. One must note that the ES is calibrated on a stressed period as it is actually a stressed ES in the FRTB. The chosen period corresponds to the worst 250 days for the bank’s current portfolio in recent memory.
  • 11. © Global Research & Analytics Dept.| 2018 | All rights reserved 11 2.2.4. VaR vs. ES This section aims at showing the main differences (advantages and drawbacks) between the VaR and the ES. The following list is not exhaustive and will be summarized in Table 2: • Amount: given a confidence level X%, the VaR X% is always inferior to the ES X%, due to the definition of ES as the mean of losses beyond the VaR. This is, in fact, the reason why the regulatory confidence level changed from 99% (VaR) to 97.5% (ES), as banks couldn’t have coped with such a high amount of capital otherwise. • Tail loss information: as mentioned earlier, one of the main drawbacks of the VaR is its inability to predict tail losses. Indeed, the VaR predicts the probability of an event but does not consider its severity. For example, a 99% VaR of 1 million predicts that during the following 100 days, 1 loss will exceed 100k, but it doesn’t make any difference between a loss of 1.1 million or 1 billion. The ES on the other hand is more reliable as it does give information on the average amount of the loss than can be expected. • Consistency: the ES can be shown as a coherent risk measure contrary to the VaR. In fact, the VaR lacks a mathematical property called sub-additivity, meaning the sum of risk measures (RM) of 2 separate portfolios A and B should be equal or greater than the risk measure of the merger of these 2 portfolios. ܴ‫ܯ‬஺ + ܴ‫ܯ‬஻ ≥ ܴ‫ܯ‬஺ା஻ However, in the case of the VaR, one can notice that it does not always satisfy this property which means that in some cases, it does not reflect the risk reduction from diversification effects. Nonetheless, apart from theoretical cases, the lack of sub- additivity of the VaR rarely seems to have practical consequences. • Stability: the ES appears to be less stable than the VaR when it comes to the distribution. For fat-tailed distributions for example, the errors in estimating an ES are much greater than those of a VaR. Reducing the estimation error is possible but requires increasing the sample size of the simulation. For the same error, an ES is costlier than the VaR under a fat-tailed distribution. • Cost / Time consumption: ES calibration seems to require more time and data storage than the VaR’s. First, processing the ES systematically requires more work than processing the VaR (VaR calibration being a requirement for ES calculation). Second, the calibration of the ES requires more scenarios than for the VaR, which means either more data storage or more simulations, both of which are costly and time consuming. Third, most banks don’t want to lose their VaR framework, having spent vast amount of time and money on its development and implementation. These banks are likely to calculate the ES as a mean of several VaR, which will heavily weigh down calibration time. • Facility to back-test: one of the major issue for ES is its difficulty to be back-tested. Research and financial institutions have been exploring this subject for some years now but still struggle to find a solution that is both mathematically consistent and practically implementable. This difficulty is mainly due to the fact that ES is characterised as “model dependant” (contrary to the VaR which is not). This point is to be explained in the following section.
  • 12. © Global Research & Analytics Dept.| 2018 | All rights reserved 12 • Elicitability: The elicitability corresponds to the definition of a statistical measure that allows to compare simulated estimates with observed data. The main purpose of this measure is to assess the relevance and accuracy of the model used for simulation. To achieve this, one will introduce a scoring function S(x, y) which is used to evaluate the performance of x (forecasts) given some values on y (observations). Examples of scoring functions are squared errors where S(x, y) = (x−y)² and absolute errors where S(x, y) = |x − y|. Given this definition and due to the nature of the Expected Shortfall, one will understand that the ES is not elicitable since there is no concrete observed data to be compared to the forecasts. VaR ES Amount - Originally greater than the VaR, but change of regulatory confidence level from 99% to 97.5% Tail loss information Does not give information on the severity of the loss Gives the average amount of the loss that can be expected Consistency Lack of sub-additivity: VaR1+2 > VaR1 + VaR2 Consistent Stability Relatively stable Less stable: the estimation error can be high for some distribution Cost / Time consumption - Always greater than the VaR's Facility to back-test Easy to back-test Difficult to back-test due mainly due to the fact that the back- testing of ES is model dependant Elicitability Is elicitable Isn’t elicitable Table 2 - VaR versus ES: main advantages and drawbacks 2.2.5. ES Back-Testing As mentioned above, the main issue with the ES is its difficulty to be back-tested. Although research and institutions have already been searching for solutions to this issue for more than 10 years now, no solution seems to satisfy both mathematical properties and practical requirements. Moreover, following the FRTB evolution and its change from VaR to ES for Capital requirements, it has become a priority to be consistent in the use of risk measure (i.e. using the same measure for both Capital calculations and back-testing). One can wonder why the ES is so difficult to back-test. The main issue is due to the fact that ES back-testing is characterized as model dependent, unlike the VaR which is model independent. Both notions will be described in the following paragraphs. Let’s consider the traditional VaR back-testing and why it is not applicable to the ES. When back-testing VaR, one would look each day at the return “t” to see if it exceeded the VaR[t-1].
  • 13. © Global Research & Analytics Dept.| 2018 | All rights reserved 13 The number of exceedances, corresponding to the sum of exceedance occurrences, would then be compared to the quantile the VaR is supposed to reflect. If one considers that the P&L distribution is likely to change over time, the VaR levels, to which the returns are compared with, also change over time. One would therefore look at the number of exceedances over a value that possibly changes every day. To illustrate this point, the exact same return could be considered as an exceedance one day, and not on another day. When back-testing the VaR, although the reference value (i.e. the VaR) changes over time, calculating the total number of exceedances still makes sense as one can find convergence of the results. This mathematical property characterizes the VaR back-testing as model independent: results are still consistent when the P&L distribution changes during the year. In the case of the ES however, one would not only look at the number of exceedances but also their values. This additional information complicates the task as there is no convergence when looking at the mean of the exceedances. To make sense, the P&L distribution (or more exactly the VaR) should remain constant during the time horizon. The back-testing of ES is therefore characterized as model dependent. The characterization of ES back-testing as model dependent is one of the main issue that financial institutions experience. Unlike the VaR, they cannot only compare the theoretical ES with the observed value at the end of the year since in most cases the later value does not make sense. This constraint, combined with limitations in both data storage and time-implementation, makes it difficult for financial institutions and researchers to find new ways to back-test the ES. The following section aims at presenting the main results and findings of the last 10 years of research and presents alternative solutions introduced by the Global Research & Analytics team of Chappuis Halder & Co.
  • 14. © Global Research & Analytics Dept.| 2018 | All rights reserved 14 3. ES Back-Testing As mentioned earlier, the purpose of this chapter is to present the latest developments in terms of ES back-testing methodologies and to introduce new methodologies developed by the Global Research & Analytics (GRA) team of Chappuis Halder & Co. 3.1. Existing Methods 3.1.1. Wong’s Saddle point technique Wong (2008) proposed a parametric method for the back-testing of the ES. The purpose of the methodology is to find the probability density function of the Expected Shortfall, defined as a mean of returns exceeding the VaR. Once such distribution is found, one can find the confidence level using the Lugannani and Rice formulas, which provide the probability to find a theoretical ES inferior to the sample (i.e. observed) ES. The results of the back-test depend on this “p-value”: given a confidence level of 95%, the p-value must be at least superior to 5% to accept the test. The method relies on 2 major steps: 1. Inversion formula: find the PDF knowing the moment-generating function 2. Saddle point Technique to approximate the integral Figure 3 - Overview of Wong's Approach The ideas of the parametric method proposed by Wong (2008) are as follows. Let ܴ ൌ ሼܴଵ, ܴଶ, ܴଷ … ሽ be the portfolio returns which has predetermined CDF and PDF denoted by φ and ݂ respectively. We denote by q ൌ φିଵ ሺߙሻ the theoretical α-quantile of the returns. The statistic used to determine the observed expected shortfall is the following: ‫ܵܧ‬ே ఈ = −ܺത = − ෍ ܴ௧‫ܫ‬ሼோ೟ழ୯ሽ ୒ ௧ୀଵ ෍ ‫ܫ‬ሼோ೟ழ୯ሽ ୒ ௧ୀଵ Where ‫ܫ‬ ሼ୶ழ୯ሽ is the logical test whether the value x is less than the ܸܴܽఈ ൌ ‫ݍ‬ The purpose of this method is to analytically estimate the density of this statistic and then see where is positioned the observed value with respect to this density.
  • 15. © Global Research & Analytics Dept.| 2018 | All rights reserved 15 Are denoted by ݊, the realised quantity ෍ ‫ܫ‬ሼோ೟ழ୯ሽ ୒ ௧ୀଵ which is the number of exceedances observed in our sample, and ܺ‫ݐ‬ the realised return exceedances below the ߙ-quantile q. The observed expected shortfall is then: ‫ܵܧ‬ே ఈ෪ ൌ −‫̅ݔ‬ ൌ − ∑ ܺ௧ ୬ ௧ୀଵ ݊ Reminder of the moment-generating function: The moment-generating function (MGF) provides an alternative way for describing a random variable, which completely determines the behaviour and properties of the probability distribution of the random variable X: ‫ܯ‬௑ሺ‫ݐ‬ሻ ൌ ॱሾ݁௧௑ሿ The inversion formula that allows to find the density once we have the MGF is the following: ݂௑ሺ‫ݐ‬ሻ ൌ 1 2ߨ න ݁ି௜௨௧ ‫ܯ‬௑ሺ݅‫ݑ‬ሻ݀‫ݑ‬ ஶ ିஶ (1.1) One of the known features of the moment-generating function is the following: ‫ܯ‬௑ሺߙܺ + ߚܻሻ ൌ ‫ܯ‬௑ሺߙ‫ݐ‬ሻ ∙ ‫ܯ‬௒ሺߚ‫ݐ‬ሻ (1.2) PROPOSITION 1: Let ܺ be a continuous random variable with a density ߙିଵ ݂ሺ‫ݔ‬ሻ, ‫ݔ‬ ∊ ሺ−∞, ‫ݍ‬ሻ. The moment generating function of ܺ is then given by: ‫ܯ‬௑ሺ‫ݐ‬ሻ = ߙିଵ ݁‫݌ݔ‬ሺ‫ݐ‬ଶ 2⁄ ሻ݂ሺ‫ݍ‬ − ‫ݐ‬ሻ (1.3) and its derivatives with respect to ‫ݐ‬ are given by: ‫ܯ‬௑ ᇱ ሺ‫ݐ‬ሻ = ‫ݐ‬ ∙ ‫ܯ‬௑ሺ‫ݐ‬ሻ − ߙିଵ ∙ ݁‫݌ݔ‬ሺ‫ݐݍ‬ሻ ∙ ݂ሺ‫ݍ‬ሻ (1.4) ‫ܯ‬௑ ᇱᇱሺ‫ݐ‬ሻ = ‫ݐ‬ ∙ ‫ܯ‬௑ ᇱ ሺ‫ݐ‬ሻ + ‫ܯ‬௑ሺ‫ݐ‬ሻ − ‫ߙ ݍ‬ିଵ ∙ ݁‫݌ݔ‬ሺ‫ݐݍ‬ሻ ∙ ݂ሺ‫ݍ‬ሻ (1.5) ‫ܯ‬௑ ሺ௠ሻ ሺ‫ݐ‬ሻ ൌ ‫ݐ‬ ∙ ‫ܯ‬௑ ሺ௠ିଵሻ ሺ‫ݐ‬ሻ + ሺ݉ − 1ሻ‫ܯ‬௑ ሺ௠ିଵሻ ሺ‫ݐ‬ሻ − ‫ݍ‬௠ିଵ ߙିଵ ∙ ݁‫݌ݔ‬ሺ‫ݐݍ‬ሻ ∙ ݂ሺ‫ݍ‬ሻ where ݉ ≥3 (1.6) Using these, we can also show that the mean and variance of ܺ can be obtained easily: ߤ௑ = ॱሾܺሿ = − ݂ሺ‫ݍ‬ሻ ߙ ߪ௑ ଶ = ‫ݎܽݒ‬ሾܺሿ = 1 − ‫݂ݍ‬ሺ‫ݍ‬ሻ ߙ − ߤ௑ ଶ The Lugannani and Rice formula Lugannani and Rice (1980) provide a method which is used to determine the cumulative density function of the statistic ܺത (1.1).
  • 16. © Global Research & Analytics Dept.| 2018 | All rights reserved 16 It is supposed that the moment-generating function of the variable ܺ௧ = ܴ௧‫ܫ‬ሼோ೟ழ୯ሽ is known. Using the property (1.2), one can compute ‫ܯ‬ଡ଼ഥሺ‫ݐ‬ሻ = ൬‫ܯ‬௑ ቀ ௧ ௡ ቁ൰ ௡ and via the inversion formula will obtain: ݂ଡ଼ഥሺ‫ݔ‬ሻ = 1 2ߨ න ݁ି௜௧௫ ቆ‫ܯ‬௑ ൬݅ ‫ݐ‬ ݊ ൰ቇ ௡ ݀‫ݐ‬ ஶ ିஶ = ݊ 2ߨ න ݁௡ሺ௄ሾ௜௧ሿି௜௧௫ሻ ݀‫ݐ‬ ஶ ିஶ where ݂ଡ଼ഥሺ‫ݔ‬ሻ denotes the PDF of the sample mean and ‫ܭ‬ሾ‫ݐ‬ሿ = ݈݊ ‫ܯ‬௑ሾ‫ݐ‬ሿ is the cumulative- generating function of ݂௑ሺ‫ݔ‬ሻ. Then the tail probability can be written as: ܲሺܺത > ‫̅ݔ‬ሻ = න ݂ଡ଼ഥሺ‫ݐ‬ሻ݀‫ݐ‬ = ௤ ௫̅ 1 2ߨ ݅ න ݁௡ሺ௄ሾ௧ሿି௧௫̅ሻ Ωା௜ஶ Ωି௜ஶ ݀‫ݐ‬ ‫ݐ‬ where Ω is a saddle-point1 satisfying: (1.9) The saddle-point is obtained by solving the following expression deduced from (1.4) and (1.5): (1.10) Finally, Lugannani and Rice propose to approximate this integral as follows: PROPOSITION 2: Let Ω be a saddle-point satisfying the equation (1.9) and define: ߳ = Ω ඥ݊‫ܭ‬ᇱᇱሺΩሻ ߜ = ‫݊݃ݏ‬ሺΩሻට2݊ ቀΩ‫ܵܧ‬ே ఈ෪ − ‫ܭ‬ሺΩሻቁ where ‫݊݃ݏ‬ሺΩሻ equals to zero when Ω = 0, or takes the value of 1/ሺ−1) when Ω < 0/ሺΩ > 0). Then the tail probability of ‫̅ݔ‬ less than or equal to the sample mean ‫̅ݔ‬ is given by ܲሺܺത ≤ ‫̅ݔ‬ሻ = ‫ە‬ ۖۖ ‫۔‬ ۖۖ ‫ۓ‬ φሺߜሻ − ݂ሺߜሻ ∙ ቆ 1 ߳ − 1 ߜ + ܱ൫݊ିଷ ଶ⁄ ൯ቇ ݂‫ݎ݋‬ ‫̅ݔ‬ < ‫ݍ‬ ܽ݊݀ ‫̅ݔ‬ ≠ ߤ௑ 1 ݂‫ݎ݋‬ ‫ݔ‬ഥ > ‫ݍ‬ − 1 2 + ‫ܭ‬ሺଷሻ ሺ0ሻ 6ට2ߨ݊൫‫ܭ‬ᇱᇱሺΩሻ൯ ଷ + ܱ൫݊ିଷ ଶ⁄ ൯ ݂‫ݎ݋‬ ‫̅ݔ‬ = ߤ௑ 1 In mathematics, a saddle point or minimax point is a point on the surface of the graph of a function where the slopes (derivatives) of orthogonal function components defining the surface become zero (a stationary point) but are not a local extremum on both axes. ‫ܭ‬ᇱሺΩሻ = ‫̅ݔ‬ ‫ܭ‬ᇱሺΩሻ = ‫ܯ‬ᇱሺΩሻ ‫ܯ‬ሺΩሻ = Ω − expሺ‫ݍ‬Ω − Ωଶ 2⁄ ሻ ݂ሺ‫ݍ‬ሻ φሺq − Ωሻ = ‫̅ݔ‬
  • 17. © Global Research & Analytics Dept.| 2018 | All rights reserved 17 Once the tail probability is obtained, one can compute the observed expected shortfall ‫ܵܧ‬ே ఈ෪ and carry out a one-tailed back-test to check whether this value is too large. The null and alternative hypotheses can be written as: H0: ‫ܵܧ‬ே ఈ෪ = ‫ܵܧ‬ே ఈതതതതത versus H1: ‫ܵܧ‬ே ఈ෪ > ‫ܵܧ‬ே ఈതതതതത where ‫ܵܧ‬ே ఈതതതതത denotes the theoretical expected shortfall under the null hypothesis. The p-value of this hypothesis test is simply given by the Lugannani and Rice formula as Example: For a portfolio composed of one S&P500 stock, it is assumed that the bank has predicted that the daily P&L log-returns are i.i.d and follow a normal distribution calibrated on the observations of the year 2014. Then all the observations of the year 2015 are normalised so one can consider that the sample follows a standard normal distribution ࣨሺ0,1ሻ. Using Wong’s method described above, the steps to follow in order to back-test the ‫ܵܧ‬ under these assumptions and with ߙ = 2.5% are: 1. Calculate the theoretical ߙ-quantile: ‫ݍ‬ = −φିଵ ሺ2.5%ሻ = 1.96 2. Calculate the observed ES of the normalized log-returns of 2015: ܺത = −2.84, ݊ = 19 3. Solve the equation (1.10) to find the saddle-point: Ω = −3.23 4. Calculate ‫ܭ‬ሾΩሿ and ‫ܭ‬ᇱᇱሾΩሿ where ‫ܭ‬ᇱᇱሾtሿ = ݀ ݀‫ݐ‬ ‫ܯ‬ᇱሺ‫ݐ‬ሻ ‫ܯ‬ሺ‫ݐ‬ሻ = ‫ܯ‬ᇱᇱሺ‫ݐ‬ሻ‫ܯ‬ሺ‫ݐ‬ሻ − ‫ܯ‬ᇱሺ‫ݐ‬ሻଶ ‫ܯ‬ሺ‫ݐ‬ሻଶ In our case, we found: ‫ܭ‬ሾΩሿ = 8.80 and ‫ܭ‬ᇱᇱሾΩሿ = 2.49 5. Calculate the tail probability of ‫ܵܧ‬ே ఈ෪ and compare to the level of confidence tolerated by the ‫݌‬௩௔௟௨௘test: ܲ൫‫ܵܧ‬ே ఈ ≤ ‫ܵܧ‬ே ఈ෪ ൯~0 In this example, the null hypothesis is rejected. Not only does it show that the movements of 2015 cannot be explained by the movements of 2014, but it also shows that the hypothesis of a normal distribution of the log-returns is not likely to be true. 3.1.2. Righi and Ceretta The method of Righi and Ceretta is less restrictive than Wong’s one in the sense that the law of the returns may vary from one day to another. However, it requires the knowledge of the truncated distribution below the negative VaR level. ‫݌‬௩௔௟௨௘ = ܲሺܺത ≤ ‫̅ݔ‬ሻ
3.1.2. Righi and Ceretta

The method of Righi and Ceretta is less restrictive than Wong's in the sense that the law of the returns may vary from one day to another. However, it requires knowledge of the truncated distribution below the negative VaR level.

Figure 4 - Righi and Ceretta - Calculating the observed test statistic

Figure 5 - Righi and Ceretta - Simulating the test statistic

Figure 6 - Righi and Ceretta - Overview
In their article, they consider that the portfolio log-returns follow a generalized autoregressive conditional heteroscedastic model, $GARCH(p,q)$, which is widely applied in finance:

\[
r_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{p}\rho_i\,\varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j\,\sigma_{t-j}^2
\]

where $r_t$ is the log-return, $\mu_t$ the conditional mean, $\sigma_t^2$ the conditional variance and $\varepsilon_t$ the shock over the expected value of an asset in period $t$; $\omega$, $\rho$ and $\beta$ are parameters; $z_t$ represents the white noise series, which can assume many probability distributions, with CDF and PDF denoted respectively $F_t$ and $f_t$.

The interest of this model lies mainly in the fact that the properties of the truncated distribution, in particular the $\alpha$-quantile and the ES, can easily be predicted:

\[
Q_{\alpha,t} = \mu_t + \sigma_t F^{-1}(\alpha), \qquad ES_t = \mu_t + \sigma_t\,\mathbb{E}\left[z_t \mid z_t < F^{-1}(\alpha)\right] \tag{2.1}
\]

One can also calculate the dispersion of the truncated distribution as follows:

\[
SD_t = \sqrt{Var\left(\mu_t + \sigma_t z_t \mid z_t < F^{-1}(\alpha)\right)} = \sigma_t\sqrt{Var\left(z_t \mid z_t < F^{-1}(\alpha)\right)} \tag{2.2}
\]

The ES and SD are mainly calculated via Monte-Carlo simulations. In some cases, parametric formulas are available:

1. Case where $z_t$ is normal | It is assumed that $z_t$ is a standard Gaussian noise $\mathcal{N}(0,1)$, which is a very common case. The expectation of the truncated normal distribution is then:

\[
\mathbb{E}\left[z_t \mid z_t < Q\right] = \frac{f(Q)}{F(Q)}
\]

Substituting this expression into equation (2.1), one obtains:

\[
ES_t = \mu_t + \sigma_t\,\frac{f\left(F^{-1}(\alpha)\right)}{\alpha}
\]

The variance of a truncated normal distribution below a value $Q$ is given by:

\[
Var\left[z_t \mid z_t < Q\right] = 1 - Q\,\frac{f(Q)}{F(Q)} - \left(\frac{f(Q)}{F(Q)}\right)^2
\]

Substituting this expression into the variance term of formula (2.2), it follows that:

\[
SD_t = \sigma_t\cdot\left[1 - F^{-1}(\alpha)\,\frac{f\left(F^{-1}(\alpha)\right)}{\alpha} - \left(\frac{f\left(F^{-1}(\alpha)\right)}{\alpha}\right)^2\right]^{1/2}
\]
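As a minimal sketch of these Gaussian parametric formulas (the function name and conventions are ours: the tail mean enters as a positive magnitude, as in the formulas above, while the signed quantile $F^{-1}(\alpha)$ is used in the variance term):

```python
import numpy as np
from scipy.stats import norm

def gaussian_truncated_es_sd(mu_t, sigma_t, alpha=0.025):
    """Parametric ES_t and SD_t of (2.1)-(2.2) when z_t ~ N(0,1)."""
    Q = norm.ppf(alpha)            # signed alpha-quantile, about -1.96 for 2.5%
    lam = norm.pdf(Q) / alpha      # f(Q) / F(Q), since F(Q) = alpha
    es_t = mu_t + sigma_t * lam    # about 2.34 for mu=0, sigma=1, alpha=2.5%
    sd_t = sigma_t * np.sqrt(1.0 - Q * lam - lam ** 2)
    return es_t, sd_t
```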
2. Case where $z_t$ follows a Student's distribution | It is assumed that $z_t$ is a Student's $t$ distributed random variable with $v$ degrees of freedom. One can show that the truncated expectation is as follows:

\[
\mathbb{E}\left[z_t \mid z_t < Q\right] = \frac{1}{2\sqrt{v}\,F(Q)\,\beta\left(\frac{v}{2},\frac{1}{2}\right)}\left(Q^2\,GH\left(\frac{1+v}{2}, 1;\, 2;\, -\frac{Q^2}{2}\right)\right)
\]

Substituting this expression into the expectation of (2.1), one obtains:

\[
ES_t = \mu_t + \sigma_t\left(\frac{1}{2\sqrt{v}\,\alpha\,\beta\left(\frac{v}{2},\frac{1}{2}\right)}\left(F^{-1}(\alpha)^2\,GH\left(\frac{1+v}{2}, 1;\, 2;\, -\frac{F^{-1}(\alpha)^2}{2}\right)\right)\right)
\]

where $\beta(\cdot,\cdot)$ and $GH(\cdot,\cdot\,;\cdot\,;\cdot)$ are the Beta and Gauss hypergeometric functions:

\[
\beta(a,b) = \int_0^1 u^{a-1}(1-u)^{b-1}\,du, \qquad GH(a,b;c;z) = \sum_{k=0}^{\infty}\frac{(a)_k\,(b)_k}{(c)_k}\,\frac{z^k}{k!}
\]

where $(\cdot)_k$ denotes the ascending factorial. Similarly, the variance of a truncated Student's $t$ distribution is:

\[
Var\left[z_t \mid z_t < Q\right] = \frac{1}{3\sqrt{v}\,F(Q)\,\beta\left(\frac{v}{2},\frac{1}{2}\right)}\,Q^3\,GH\left(\frac{1+v}{2}, \frac{3}{2};\, \frac{5}{2};\, -\frac{Q^2}{2}\right)
\]

Again, substituting this variance term into (2.2), one obtains an analytical form of the standard deviation of the truncated distribution:

\[
SD_t = \sigma_t\cdot\left[\frac{1}{3\sqrt{v}\,\alpha\,\beta\left(\frac{v}{2},\frac{1}{2}\right)}\,F^{-1}(\alpha)^3\,GH\left(\frac{1+v}{2}, \frac{3}{2};\, \frac{5}{2};\, -\frac{F^{-1}(\alpha)^2}{2}\right)\right]^{1/2}
\]

Once the ES and SD are expressed and computed, for each day in the forecast period on which a violation of the predicted Value-at-Risk occurs, the following test statistics are defined:

\[
BT_t = \frac{r_t - ES_t}{SD_t} \tag{2.3}
\]

\[
HT = \frac{z_t - \mathbb{E}\left[z_t \mid z_t < Q\right]}{\sqrt{Var\left[z_t \mid z_t < Q\right]}} \tag{2.4}
\]

where $z_t$ is the realisation of the random variable $Z_t$ (in the GARCH process, $Z_t$ is supposed to be i.i.d., but this is not necessarily the case).
The idea of Righi and Ceretta is to see where the value of $BT_t$ is situated with respect to the "error" distribution of the estimator

\[
HT = \frac{Z_t - \mathbb{E}\left[Z_t \mid Z_t < Q\right]}{\sqrt{Var\left[Z_t \mid Z_t < Q\right]}}
\]

by calculating the probability $\mathbb{P}(HT < BT_t)$ and then taking the median (or possibly the average) of these probabilities over time as a p-value, compared against a given confidence level $p$. They propose to calculate this distribution using Monte-Carlo simulations, following the algorithm below (a sketch in code follows the list):

1) Generate $N$ samples of $n$ i.i.d. random variables $u_{ij}$ under the distribution $F$, $i = 1, \dots, n$; $j = 1, \dots, N$;
2) Estimate for each sample the quantities $\mathbb{E}\left[u_{ij} \mid u_{ij} < q(u_{ij})\right]$ and $Var\left[u_{ij} \mid u_{ij} < q(u_{ij})\right]$, where $q(u_{ij})$ is the $\alpha$-th worst observation of the sample;
3) Calculate for each realisation $u_{ij}$ the quantity
\[
h_{ij} = \frac{u_{ij} - \mathbb{E}\left[u_{ij} \mid u_{ij} < q(u_{ij})\right]}{\sqrt{Var\left[u_{ij} \mid u_{ij} < q(u_{ij})\right]}}
\]
which is a realisation of the random variable $HT$ defined above;
4) Given the actual $BT_t$, estimate $\mathbb{P}(H_t < BT_t)$ using the sample $h_{ij}$ as an empirical distribution of $H_t$;
5) Determine the test p-value as the median of $\mathbb{P}(H_t < BT_t)$ and compare this value to the test level fixed at $p$.
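A minimal sketch of this algorithm for a standard Gaussian noise (our own function names; the truncated moments of step 2 are estimated empirically from each sample's tail, which is one possible reading of the algorithm):

```python
import numpy as np

def righi_ceretta_pvalue(bt_values, alpha=0.025, n=250, N=10_000, seed=42):
    """Monte-Carlo p-value following steps 1)-5) above, for F = standard normal.
    bt_values: the observed statistics BT_t on the days where the VaR is violated."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((N, n))          # 1) N samples of n iid draws
    k = max(int(np.ceil(alpha * n)), 2)            # rank of the alpha-th worst point
    h = []
    for u in samples:
        q_u = np.sort(u)[k - 1]                    # empirical alpha-quantile
        tail = u[u <= q_u]                         # observations below q(u)
        # 2)-3) standardise the tail by its estimated truncated mean / std
        h.append((tail - tail.mean()) / tail.std(ddof=1))
    h = np.concatenate(h)                          # empirical distribution of HT
    probs = [np.mean(h < bt) for bt in bt_values]  # 4) P(HT < BT_t) per violation
    return np.median(probs), probs                 # 5) median as the test p-value
```

Applied to the twelve violations of Table 4 below, the median of the last column would play the role of the p-value.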
The methodology has been applied to the test portfolio of normalised daily returns over the years 2014 and 2015. The results, where $z_t$ is a standard Gaussian noise $\mathcal{N}(0,1)$, are the following:

Table 3 - Summary of the Righi and Ceretta implementation

Inputs:
- Distribution*: Standard Normal
- Degrees of freedom*: none
- Confidence level of the ES*: 97.5%
- Scenario: 05/08/2014

BT Results:
- VaR_th: 1.96
- ES_th: 2.34
- VaR_obs: 2.38
- Number of exceedances: 12
- ES_obs: 2.49
- var(X < -VaR_obs): 0.0941

Output:
- Critical value** (median): 0.00%
- Critical value** (mean): 0.00%

Final output: PASS

For a test level fixed at 97.5%, the Righi and Ceretta methodology gives satisfactory results, with a pass for both the median and mean computations. One can conclude that this methodology is acceptable; nevertheless, it relies on a parametric assumption that may not fit the portfolio, which is not captured by the test statistics.

The table below displays the exceedance rates and the associated test statistics:

Table 4 - Exceedance rates and test statistics of the portfolio

Exceedance # | Exceedance value | Test statistic | P(HT < BT)
1 | -2.128 | 1.179 | 2.4%
2 | -2.128 | 1.180 | 2.9%
3 | -2.304 | 0.606 | 1.7%
4 | -1.996 | 1.610 | 4.3%
5 | -2.361 | 0.420 | 1.4%
6 | -2.879 | -1.270 | 0.3%
7 | -2.831 | -1.111 | 0.4%
8 | -3.082 | -1.930 | 0.2%
9 | -3.077 | -1.916 | 0.2%
10 | -2.681 | -0.623 | 0.6%
11 | -2.396 | 0.306 | 1.3%
12 | -2.014 | 1.550 | 4.1%

3.1.3. Emmer, Kratz and Tasche

The method presented by Emmer et al. (2013) consists in replacing the ES back-testing with a VaR back-testing. This substitution relies on the approximation of the ES as a mean of several VaR levels, according to the following formula:

\[
ES_{\alpha} = \frac{1}{1-\alpha}\int_{\alpha}^{1} VaR_u\,du = \lim_{N\to+\infty}\frac{1}{N}\sum_{k=0}^{N-1} VaR_{\alpha+k\left(\frac{1-\alpha}{N}\right)} \approx \frac{1}{5}\left(VaR_{\alpha} + VaR_{\alpha+\frac{1-\alpha}{5}} + VaR_{\alpha+2\cdot\frac{1-\alpha}{5}} + VaR_{\alpha+3\cdot\frac{1-\alpha}{5}} + VaR_{\alpha+4\cdot\frac{1-\alpha}{5}}\right)
\]

Hence, assuming $\alpha = 97.5\%$, the formula becomes:

\[
ES_{97.5\%} = \frac{1}{5}\left(VaR_{97.5\%} + VaR_{98\%} + VaR_{98.5\%} + VaR_{99\%} + VaR_{99.5\%}\right)
\]

Therefore, by back-testing the VaR at the 97.5%, 98%, 98.5%, 99% and 99.5% levels, one completes the back-testing of the ES: if all these VaR levels are validated, then the ES should be considered validated as well. However, this methodology has several drawbacks. One should determine an appropriate $N$ ensuring that the average of the VaRs converges to the ES, otherwise the approximation carries too much uncertainty; yet a large $N$ makes the test impractical because of the computation time ($N$ different VaRs to calculate and back-test). In the Emmer et al. proposal, the convergence is assumed to be reached for $N = 5$, which explains the mean performed over 5 Value-at-Risk levels. Finally, one should also propose an adapted traffic-light table, since requiring a pass on all the VaR levels may be irrelevant or too restrictive.
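A minimal sketch of this substitution (our own function names; losses are counted positively, and the per-level acceptance thresholds are deliberately left open since, as noted above, an adapted traffic-light table would be needed):

```python
import numpy as np

def es_as_var_average(simulated_pnl, alpha=0.975):
    """Approximate ES_alpha as the mean of five VaR levels
    (97.5%, 98%, 98.5%, 99%, 99.5% when alpha = 97.5%)."""
    levels = alpha + np.arange(5) * (1.0 - alpha) / 5.0
    losses = -np.asarray(simulated_pnl)                  # losses > 0
    vars_by_level = {u: np.quantile(losses, u) for u in levels}
    return np.mean(list(vars_by_level.values())), vars_by_level

def var_exceedance_counts(realised_pnl, vars_by_level):
    """Back-test each VaR level by counting exceedances; the ES is deemed
    validated only if every level passes its own VaR back-test."""
    losses = -np.asarray(realised_pnl)
    return {u: int(np.sum(losses > v)) for u, v in vars_by_level.items()}
```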
3.1.4. Summary of the methods

This section summarises the three methods in terms of application and implementation, as well as their drawbacks:

Wong's method |

Figure 7 - Summary of Wong's methodology
Righi and Ceretta method |

Figure 8 - Summary of Righi and Ceretta's methodology

Emmer, Kratz and Tasche method |

Figure 9 - Summary of Emmer, Kratz and Tasche's methodology
3.2. Alternative Methods

The following sections present alternative methods introduced by the Global Research & Analytics (GRA) department of Chappuis Halder & Co.

First of all, it is important to note that some of the following methods rely on a major hypothesis: the consistency of the theoretical VaR over a given period of time. This strong (and not often met) assumption is due to the use of what is called the "observed ES". The observed ES reflects the realised average loss beyond the 97.5% quantile over a 1-year period, as illustrated in the formula below:

\[
ES_{obs} = \frac{\sum_{t=1}^{250} X_{t+1}\, I(X_{t+1} > VaR_t)}{N}
\]

where $X_t$ corresponds to the return of day $t$ and $N$ is the number of exceedances during the year ($N = \sum_{t=1}^{250} I(X_{t+1} > VaR_t)$, with $I$ the indicator function)¹. However, this value only makes sense as long as the theoretical VaR (or, more broadly, the P&L distribution used for calibration) does not change over the period. Should the opposite occur, one would be looking at losses beyond a level that changes with time, and the average of these losses would lose any meaning.

¹ As mentioned in part 2.1.1, VaR is calculated with a 1-day time horizon. Therefore, the return compared to VaR[t] is the return X[t+1].

3.2.1. ES Benchmarking

This method focuses on the distance $ES_{th} - ES_{obs}$ between the theoretical ES (obtained by calibration) and the observed ES (corresponding to realised returns). The main goal of the methodology is to make sure that the distance $ES_{th} - ES_{obs}$ at the back-testing date is located within a confidence interval. This interval is found by recreating a distribution from historical values. The output of the back-test depends on the position of the observed distance: if the value is within the interval, the back-test is accepted; otherwise it is rejected.

The historical distribution is based on 5 years of returns (i.e. 5 × 250 values). For each day of these 5 years, $ES_{th}$, $ES_{obs}$ and the distance $ES_{th} - ES_{obs}$ are calculated as described in the introduction of this section. The 1,250 values collected can then be used to build a distribution that fits the historical behaviour, as sketched in code below.
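A minimal sketch of the two building blocks (our own function names; the exceedance convention $X > VaR$ follows the indicator in the formula above):

```python
import numpy as np

def observed_es(returns, var_th):
    """Observed ES: average of the daily returns exceeding the (assumed
    constant) theoretical VaR over the 250-day window."""
    x = np.asarray(returns)
    exceed = x[x > var_th]                 # I(X_{t+1} > VaR_t)
    return exceed.mean() if exceed.size else np.nan

def benchmarking_interval(es_th_series, es_obs_series, level=0.95):
    """Two-sided confidence interval of the distance ES_th - ES_obs,
    read off the 5-year (1,250-point) historical distribution."""
    d = np.asarray(es_th_series) - np.asarray(es_obs_series)
    lo, hi = np.quantile(d, [(1.0 - level) / 2.0, 1.0 - (1.0 - level) / 2.0])
    return lo, hi   # accept the back-test if today's distance lies in [lo, hi]
```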
Figure 10 - Illustration of the ES Benchmarking methodology

One of the main downsides of the methodology is that it relies on the notion of observed ES. As mentioned earlier, this particular value requires a constant VaR, which is not often met in reality. Finally, once the confidence interval is obtained, one can use it to back-test the simulated ES over the future back-testing horizon.

3.2.2. Bootstrap

This methodology focuses on the value of the observed ES. As for the previous methodology, the goal is to verify that the observed ES is located within a confidence interval. The latter is found by recreating a distribution from historical values using the bootstrap approach detailed below. The output of the test depends on the position of the observed ES at the back-testing date: if the value is in the interval, the back-test is accepted; otherwise it is rejected.

In this methodology, the bootstrap approach is used to build a larger vector of returns in order to find the distribution of the ES as the annual mean of returns exceeding the VaR. The approach consists in simulating returns using only values from a historical sample; the vector formed by all simulated values therefore only contains data drawn from the original sample. The overall methodology relies on the following steps, as illustrated in Figure 11 and sketched in code below:

1. The sample vector is obtained; it contains the returns of 1 year of data;
2. The bootstrap method is used to create a bigger vector, filled only with values from the sample vector. This vector is called the "bootstrap vector";
3. The final vector, used for compiling the distribution, is obtained by selecting from the bootstrap vector only the returns exceeding the VaR;
4. The distribution is reconstructed using the final vector.
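A minimal sketch of steps 1-3 (our own function names; the exceedance convention follows the observed-ES formula of Section 3.2):

```python
import numpy as np

def bootstrap_final_vector(sample_returns, var_level, n_boot=10_000, seed=0):
    """Resample the 1-year sample vector with replacement to build the
    bootstrap vector, then keep only the returns exceeding the VaR."""
    rng = np.random.default_rng(seed)
    boot = rng.choice(np.asarray(sample_returns), size=n_boot, replace=True)
    return boot[boot > var_level]          # the "final vector" of step 3

# Step 4: the final vector gives the empirical ES distribution; e.g. its mean
# is a bootstrapped ES and np.quantile(final, [0.025, 0.975]) a 95% band.
```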
Figure 11 - Illustration of the Bootstrap methodology

3.2.3. Quantile Approaches

Whereas the Expected Shortfall is usually expressed as a value of the loss (i.e. in £, $, etc.), the two methodologies Quantile 1 and Quantile 2 choose to focus on the ES as a quantile, i.e. a probability value of the P&L distribution. The two methods differ in the choice of the quantile adopted; the following paragraphs describe the two options.

Quantile 1

This methodology focuses on the quantile of the observed ES at the back-testing date; in other words, it answers the question: "Which probability is associated with the value of a specific Expected Shortfall in the P&L distribution?"

One must notice that this quantile is not the confidence level of the Expected Shortfall. Indeed, consider a confidence level of 97.5%, as requested by the regulation. It is possible to estimate an observed VaR and therefore an observed ES as a mean of the returns exceeding the VaR. The observed quantile is then found by looking at the P&L distribution and spotting the probability associated with the ES value. The ES being strictly greater than the VaR, this quantile will always be strictly greater than 97.5%.

Figure 12 - Calculation of the Quantile Q1
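A minimal sketch of the Quantile 1 computation (our own function name; it assumes the distribution is expressed in loss terms, with losses counted positively, so that the resulting quantile exceeds 97.5% as in Figure 12):

```python
import numpy as np

def quantile_q1(pnl_losses, es_obs):
    """Q1: empirical probability associated with the observed ES in the
    P&L (loss) distribution."""
    losses = np.sort(np.asarray(pnl_losses))
    return np.searchsorted(losses, es_obs, side="right") / losses.size
```

Computed each day over 5 years, these quantiles form the historical distribution from which the confidence interval of the back-test is derived, as described below.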
Quantile 2

This methodology looks at the stressed-retroactive quantile of the observed ES at the back-testing date; that is to say, it answers the question: "To which quantile would the observed ES at the time of the back-testing correspond, had it been observed in the reference stressed period used for calibration?"

Figure 13 - Calculation of the Quantile Q2

Back-testing methodology

Once the choice of the quantile computation is made, the approach is the same for the two methodologies: it consists in verifying that the quantile calculated at the back-testing date is located within the confidence interval obtained from a reconstructed historical distribution. If the quantile is within the confidence interval, the back-test is accepted; otherwise it is rejected.

The distribution is obtained using the same framework as for the ES Benchmarking methodology (see Section 3.2.1). The quantile is computed each day over 5 years (the observed quantile for the first method, the stressed-retroactive quantile for the second). These 1,250 values are used to build a historical distribution of the chosen quantile, from which the confidence interval immediately follows.

Figure 14 - Illustration of the Quantile 1 methodology

3.2.4. Summary of the methods

ES Benchmarking |
Figure 15 - Summary of the ES Benchmarking methodology

Quantile method |

Figure 16 - Summary of the Quantile methodology
Bootstrap method |

Figure 17 - Summary of the Bootstrap methodology
4. Applications of the ES methodology and back-testing

4.1. ES simulations

In this section, the back-testing approaches presented in Section 3.2 are applied to simulations of the S&P 500 index¹. Instead of performing back-testing on parametric distributions, it has been decided to perform a back-testing exercise on simulated values of an equity (the S&P 500 index) based on Monte Carlo simulations. The historical levels of the S&P 500 are displayed in Figure 18 below:

¹ The time period and data selected (from January 2011 to December 2015) are arbitrary; one would obtain similar results and findings with other data.

Figure 18 - S&P 500 level - From January 2011 to December 2015

A stochastic model has been used to forecast the one-day return of the stock price, which is then compared to the observed returns. The model relies on a Geometric Brownian Motion (hereafter GBM), and the simulations are performed with a daily reset, as could be done in a context of market risk estimation. The stochastic differential equation (SDE) of a GBM diffusing the stock price is:

\[
dS = S(\mu\,dt + \sigma\,dW)
\]

and the closed-form solution of the SDE is:

\[
S(t) = S(0)\,e^{\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t)}
\]

where:
- $S$ is the stock price
- $\mu$ is the expected return
- $\sigma$ is the standard deviation of the expected return
- $t$ is the time
- $W(t)$ is a Brownian Motion

Simulations are performed on a day-to-day basis over the year 2011, with 1,000 scenarios produced per time point. From these simulations, it is possible to compute a one-day VaR99% as well as a one-day ES97.5% of the return, which are then compared to the observed return. Both VaR and ES are computed as follows:

\[
VaR_{99\%}(t) = Q_{99\%}\left(\frac{S_{sim}(t) - S_{obs}(t-1)}{S_{obs}(t-1)}\right)
\]

\[
ES_{97.5\%}(t) = \frac{\sum_{i=1}^{n}\left(\frac{S_{sim}(t) - S_{obs}(t-1)}{S_{obs}(t-1)}\right) I_{\left\{\frac{S_{sim}(t) - S_{obs}(t-1)}{S_{obs}(t-1)} \geq VaR_{97.5\%}(t)\right\}}}{\sum_{i=1}^{n} I_{\left\{\frac{S_{sim}(t) - S_{obs}(t-1)}{S_{obs}(t-1)} \geq VaR_{97.5\%}(t)\right\}}}
\]

where $n$ is the total number of scenarios per time point $t$.
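A minimal sketch of one daily-reset step (our own function name; $\mu$ and $\sigma$ are assumed annualised, and returns are signed so the risk measures sit in the left tail):

```python
import numpy as np

def gbm_one_day_var_es(s_prev, mu, sigma, dt=1.0 / 250.0, n_sims=1_000, seed=0):
    """One-day Monte-Carlo with the closed-form GBM solution, returning the
    VaR99% and ES97.5% of the simulated one-day returns."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n_sims) * np.sqrt(dt)         # Brownian increments
    s_sim = s_prev * np.exp((mu - 0.5 * sigma ** 2) * dt + sigma * w)
    r = (s_sim - s_prev) / s_prev                         # simulated returns
    var_99 = np.quantile(r, 0.01)                         # 1% worst return
    tail = r[r <= np.quantile(r, 0.025)]                  # beyond the 97.5% VaR
    return var_99, tail.mean()                            # (VaR99%, ES97.5%)
```

Repeating this with `s_prev` reset to each day's observed close reproduces the daily-reset design described above.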
The figure below shows the results of our simulations and the computations of the VaR99% and ES97.5%:

Figure 19 - Observed returns vs. simulations - From January 2011 to January 2012

Figure 19 shows that, in comparison to the observed daily returns, the VaR99% and the ES97.5% give the same level of conservativeness. This is further illustrated in Figure 20, where it can be observed that the levels of VaR and Expected Shortfall are close.

Figure 20 - Comparison of the VaR99% with the ES97.5% - From January 2011 to January 2012

When looking at Figure 20, one can notice that the ES97.5% does not always lead to more conservative results than the VaR99%. This is explained by the fact that the ES is the mean of the values beyond the VaR97.5%; consequently, and depending on the Monte Carlo simulations, it is realistic to observe an ES97.5% slightly below the VaR99%. Finally, when looking at the simulations, one can conclude that both risk measures are really close. Indeed, considering the distribution of the spread between the simulated VaR99% and the ES97.5% (see Figure 21 below), it is observed that 95% of the spreads between the two risk measures lie within the ]-0.1%, 0.275%] interval.
Figure 21 - Spread VaR99% vs. ES97.5% - January 2011 to September 2015

Following the computation of these simulated ES, it can be concluded that, in comparison to a VaR measure, the ES is not an overly conservative or severe measure. Given these findings, and knowing that the ES is a more consistent measure than the VaR (due to the way it is estimated), it can be accepted as a suitable risk measure provided that a reliable approach is used to back-test the results.

4.2. Back-test of the ES using our alternative methods

Following the previous conclusions, it has been decided to focus on some of the approaches defined in Section 3.2. An observed ES has therefore been computed based on the daily VaR97.5% (obtained via the MC simulations) and the observed returns over the year following the simulation date. Its expression is as follows:

\[
ES_{obs}(t) = \frac{\sum_{i=1}^{m} R(t+i)\, I_{\{R(t+i) \geq VaR_{97.5\%}(t)\}}}{\sum_{i=1}^{m} I_{\{R(t+i) \geq VaR_{97.5\%}(t)\}}}
\]

where $m$ is the number of days in the year following the date $t$ and $R(t)$ is the daily return observed at date $t$:

\[
R(t) = \frac{S_{obs}(t) - S_{obs}(t-1)}{S_{obs}(t-1)}
\]

As presented in Section 3.2.1, this observed ES has been compared to the theoretical daily simulated ES.
Figure 22 - Comparison of the theoretical ES against the observed ES - From January 2011 to January 2012

Figure 22 shows both the theoretical and the observed ES computed over the year 2011, whereas Figure 23 presents the distribution of the distance $ES_{th} - ES_{obs}$.

Figure 23 - Distribution of the distance $ES_{th} - ES_{obs}$ - From January 2011 to January 2012

Figure 23 shows that, over the year 2011, the observed ES is lower than the theoretical ES in 98% of the cases, and that 95% of the distances lie within [0.04%; 0.95%]. Based on these results, it is possible to define a 95% confidence interval for the future comparison of the observed ES against the theoretical ES, in order to assess the accuracy and conservativeness of the theoretical ES. This confidence interval has been applied to the data of the years 2012, 2013 and 2014, where:
- a positive result is obtained when the distance between the theoretical and observed ES is below the lower bound (i.e. the theoretical ES is conservative),
- a neutral result is obtained when the distance is within the confidence interval (the theoretical and observed ES are assumed close),
- a negative result is obtained when the distance is above the upper bound (i.e. the theoretical ES lacks conservativeness).

Results are presented in Table 5 below, where it can be noted that the interval computed over the year 2011 leads to satisfactory results, since a majority of positive results is observed.

Table 5 - Results of the ES back-testing - 2012, 2013 and 2014

Year | Positive # | Positive % | Neutral # | Neutral % | Negative # | Negative %
2012 | 72 | 28.7% | 179 | 71.3% | - | 0.0%
2013 | 174 | 69.3% | 77 | 30.7% | - | 0.0%
2014 | 159 | 63.3% | 92 | 36.7% | - | 0.0%
Total | 405 | 53.7% | 348 | 46.2% | 0 | 0.0%

The benefit of this approach is that it gives a way to back-test the level of the simulated ES via the computation of thresholds based on the results of the previous year. Nevertheless, one can challenge the way the observed ES is computed. Indeed, instead of relying on a forward-looking approach, it could be computed via a backward-looking approach:

\[
ES_{obs}(t) = \frac{\sum_{i=1}^{m} R(t-i)\, I_{\{R(t-i) \geq VaR_{97.5\%}(t)\}}}{\sum_{i=1}^{m} I_{\{R(t-i) \geq VaR_{97.5\%}(t)\}}}
\]

This approach has been tested on the 2013 data. Figure 24 shows both the theoretical and the observed ES computed over the year 2013 using the backward-looking approach, whereas Figure 25 shows the results of the forward-looking methodology.
Figure 24 - Comparison of the theoretical ES against the observed ES (backward-looking) - January 2013 to January 2014

Figure 25 - Comparison of the theoretical ES against the observed ES (forward-looking) - January 2013 to January 2014

The comparison of Figure 24 and Figure 25 reveals that the backward-looking approach leads to a more conservative and consistent computation of the observed ES, since the distance between the simulations and the observations is marginal. Furthermore, the backward-looking approach can be implemented on a daily basis, whereas the forward-looking one relies on future observations of returns. The results of the backward-looking approach have been used to recalibrate the interval; as expected, the new interval is narrower and equal to [-0.07%; 0.26%]. The results of the ES back-testing are presented in Table 6:
Table 6 - Results of the ES back-testing (backward-looking) - 2014 and 2015

Year | Positive # | Positive % | Neutral # | Neutral % | Negative # | Negative %
2014 | 78 | 31.1% | 172 | 68.5% | 1 | 0.4%
2015 | 141 | 56.2% | 21 | 8.4% | 89 | 35.5%
Total | 227 | 30.1% | 495 | 49.3% | 282 | 28.1%

When looking at Table 6, one can notice that the ES back-testing relying on the backward-looking approach leads to more situations where the simulated ES is underestimated, which is explained by the interval being smaller. Overall, this shows the complexity of back-testing the ES: it is less straightforward than a VaR back-testing and depends on the definition of the observed ES. Furthermore, it can be noted from Table 5 and Table 6 that using the ES as a risk measure can lead to instability in the back-testing results over the years, which shows the importance of defining a proper back-testing methodology.

It has then been decided to test the bootstrap alternative method presented in Section 3.2. As a first step, a sample vector corresponding to the one-day returns of the year 2011 has been computed. As a second step, the bootstrap vector has been constructed; this vector is filled with values of the sample vector selected randomly ten thousand times. Figure 26 below shows the bootstrap vector distribution:

Figure 26 - Bootstrap vector - Random sampling of the 2011 one-day returns - 10,000 observations
Finally, for each date of the year 2011, a final vector containing all the values exceeding the daily estimated VaR is built. For instance, as of 6 January 2011, the estimated VaR97.5% is -1.97%, which leads to the following final vector distribution:

Figure 27 - Final vector - 06.01.2011 - VaR97.5% = -1.97%

As such, for each time point of the year 2011, it is possible to estimate a bootstrapped ES97.5%, which is used as a reference value to back-test the simulated ES97.5%. The results of the ES back-testing are presented in Figure 28 below:

Figure 28 - ES comparisons - January 2011 to January 2012
When looking at Figure 28, one can compare the bootstrapped ES97.5% and the observed ES as reference values for back-testing purposes. In particular, it is observed that both curves are similar over the first 9 months, after which the observed ES significantly decreases. Hence, the bootstrapped ES appears to be more stable over the year and would be a more reliable value to back-test the simulated ES, since no breaches are observed, in contrast to the observed ES, which fails over the last months.

The same exercise has been performed on the data of the year 2015, with an observed ES computed using the backward-looking approach. Results are displayed in Figure 29, where similar conclusions are drawn.

Figure 29 - ES comparisons - 2 January 2015 to 31 December 2015
5. Conclusion

This white paper presented the latest developments in terms of ES back-testing methodologies and introduced new methodologies developed by the Global Research & Analytics (GRA) team of Chappuis Halder & Co.

After presenting and testing several methods found in the literature, it has been concluded that these methods may not be fit for the purpose of a regulatory back-testing, since they rely on questionable assumptions or heavy computation time. Then, in order to highlight the specificities of the back-testing of the Expected Shortfall, the alternative methods presented in this article have been implemented and tested.

Overall, it has been concluded that the complexity of back-testing the Expected Shortfall lies in a proper definition of the observed ES, which should serve as a reference value for the back-test. Indeed, while the estimation of a simulated Expected Shortfall is quite straightforward (it relies on the computation of the simulated Value-at-Risk), the computation of the observed Expected Shortfall is not. In order to perform an apples-to-apples comparison, one cannot simply compare a simulated daily Expected Shortfall to a daily observed return: knowing that the Expected Shortfall corresponds to the average value beyond the worst loss defined by a specific quantile, it is natural to introduce these features when estimating the observed Expected Shortfall.

Hence, in order to propose a relevant back-testing of the simulated ES, one should first decide on the assumptions used for the computation of the observed ES: whether it is computed with a backward- or forward-looking approach, the number of time points used, the frequency of the calculations, etc. These assumptions need to be chosen wisely in order to calibrate a relevant confidence interval for the ES comparisons. Indeed, it has been shown in this article that the back-testing results can differ and be unstable depending on the computation methodology of the observed ES. On the basis of the tests performed in this article, the most reliable back-testing results came from the computation of a bootstrapped ES, since it has the advantage of considering a P&L distribution that is constant over the time horizon, which produced a stable but conservative level of confidence.
References

- Basel Committee on Banking Supervision, Consultative Document - Fundamental Review of the Trading Book: A Revised Market Risk Framework, January 2014
- Basel Committee on Banking Supervision, Minimum Capital Requirements for Market Risk, January 2016
- Carlo Acerbi and Balazs Szekely, Introducing Three Model-Independent, Non-Parametric Back-test Methodologies for Expected Shortfall, December 2014
- Susanne Emmer, Marie Kratz and Dirk Tasche, What Is the Best Risk Measure in Practice? A Comparison of Standard Measures, 2013
- Marcelo Righi and Paulo Sergio Ceretta, Individual and Flexible Expected Shortfall Backtesting, June 2013
- Lisa Wimmerstedt, Backtesting Expected Shortfall: The Design and Implementation of Different Backtests, August 2015
- Woon K. Wong, Backtesting Trading Risk of Commercial Banks Using Expected Shortfall, 2008
- P. H. Kupiec, Techniques for Verifying the Accuracy of Risk Measurement Models, 1995