## Overview of VaR Backtesting

*Market risk* is the risk of losses in positions arising from
movements in market prices. Value-at-risk (VaR) is one of the main measures of financial
risk. VaR is an estimate of how much value a portfolio can lose in a given time period
with a given confidence level. For example, if the one-day 95% VaR of a portfolio is
10MM, then there is a 95% chance that the portfolio loses less than 10MM the following
day. In other words, only 5% of the time (or about once in 20 days) the portfolio losses
exceed 10MM.

For many portfolios, especially trading portfolios, VaR is computed daily. At the close of the following day, the actual profits and losses for the portfolio are known and can be compared to the VaR estimated the day before. You can use this daily data to assess the performance of VaR models, which is the goal of VaR backtesting. The performance of VaR models can be measured in different ways. In practice, many different metrics and statistical tests are used to identify VaR models that perform poorly or that perform better than others. As a best practice, use more than one criterion to backtest the performance of VaR models, because every test has strengths and weaknesses.

Suppose that you have VaR limits and corresponding returns or profits and losses for
days *t* = 1,…,*N*. Use VaR_{t} to
denote the VaR estimate for day *t* (determined on day
*t* − 1). Use *R*_{t} to denote the actual return or
profit and loss observed on day *t*. Profits and losses are expressed
in monetary units and represent value changes in a portfolio. The corresponding VaR
limits are also given in monetary units. Returns represent the change in portfolio value
as a proportion (or percentage) of its value on the previous day. The corresponding VaR
limits are also given as a proportion (or percentage). The VaR limits must be produced
from existing VaR models. Then, to perform a VaR backtesting analysis, provide these
limits and their corresponding returns as data inputs to the VaR backtesting tools in
Risk Management Toolbox™.
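
The inputs described above can be turned into an exception series with a few lines of code. The following is a minimal sketch rather than the toolbox implementation. It assumes that VaR limits are reported as positive numbers and that day *t* counts as an exception when the loss exceeds the limit, that is, when *R*_{t} < −VaR_{t}. The function name `count_exceptions` and the sample data are hypothetical.

```python
def count_exceptions(returns, var_limits):
    """Return a 0/1 failure indicator per day and the total number of exceptions.

    Assumption: VaR limits are positive numbers, and day t is an exception
    when the loss exceeds the limit, that is, when R_t < -VaR_t.
    """
    if len(returns) != len(var_limits):
        raise ValueError("returns and var_limits must have the same length")
    failures = [1 if r < -v else 0 for r, v in zip(returns, var_limits)]
    return failures, sum(failures)


# Hypothetical data: five daily returns and a constant 2% VaR limit.
returns = [0.004, -0.025, 0.011, -0.019, -0.031]
var_limits = [0.02] * 5
failures, x = count_exceptions(returns, var_limits)
print(failures, x)  # [0, 1, 0, 0, 1] 2
```

The resulting indicator series and the exception count *x* are the only quantities that the tests described in this topic need, together with *N* and the VaR level.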

The toolbox supports these VaR backtests:

- Binomial test
- Traffic light test
- Kupiec’s tests
- Christoffersen’s tests
- Haas’s tests

### Binomial Test

The most straightforward test is to compare the observed number of exceptions,
*x*, to the expected number of exceptions. From the properties
of a binomial distribution, you can build a confidence interval for the expected
number of exceptions, using either exact probabilities from the binomial distribution or a
normal approximation. The `bin` function uses a normal
approximation. By computing the probability of observing *x*
exceptions, you can compute the probability of wrongly rejecting a good model when
*x* exceptions occur. This is the *p*-value
for the observed number of exceptions *x*. For a given test
confidence level, a straightforward accept-or-reject result in this case is to fail
the VaR model whenever *x* is outside the test confidence interval
for the expected number of exceptions. “Outside the confidence
interval” can mean too many exceptions, or too few exceptions. Too few
exceptions might be a sign that the VaR model is too conservative.

The test statistic is

$${Z}_{bin}=\frac{x-Np}{\sqrt{Np(1-p)}}$$

where *x* is the number of failures, *N* is the
number of observations, and *p* = 1 – VaR level.
The binomial test statistic is approximately distributed as a standard normal
variable.
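
As an illustration of the statistic above, the following sketch computes *Z*_{bin} and a two-sided *p*-value from the standard normal distribution. It is not the `bin` function itself. The name `binomial_test`, the two-sided rejection rule, and the default test confidence level of 95% are assumptions made for this example.

```python
import math


def binomial_test(x, n_obs, var_level, test_level=0.95):
    """Normal-approximation binomial test for the number of VaR exceptions.

    x          number of observed exceptions
    n_obs      number of observations N
    var_level  VaR confidence level, for example 0.95
    test_level confidence level of the test (assumed default: 0.95)
    """
    p = 1.0 - var_level
    z = (x - n_obs * p) / math.sqrt(n_obs * p * (1.0 - p))
    # Two-sided p-value under a standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    reject = p_value < 1.0 - test_level
    return z, p_value, reject


# 20 exceptions in 250 days at the 95% VaR level (12.5 expected).
z, p_value, reject = binomial_test(20, 250, 0.95)
print(round(z, 2), reject)  # 2.18 True
```

Because the rule is two-sided, the same function also rejects a model with far fewer exceptions than expected, which matches the remark that too few exceptions can indicate an overly conservative model.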

For more information, see References for Jorion and `bin`.

### Traffic Light Test

A variation on the binomial test proposed by the Basel Committee is the
*traffic light test* or *three zones
test*. For a given number of exceptions *x*, you can
compute the probability of observing up to *x* exceptions. That is,
any number of exceptions from 0 to *x*, or the cumulative
probability up to *x*. The probability is computed using a binomial
distribution. The three zones are defined as follows:

- The “red” zone starts at the number of exceptions where this probability equals or exceeds 99.99%. It is unlikely that so many exceptions come from a correct VaR model.
- The “yellow” zone covers the number of exceptions where the probability equals or exceeds 95% but is smaller than 99.99%. Even though there is a high number of violations, the violation count is not exceedingly high.
- Everything below the yellow zone is “green.” If you have too few failures, they fall in the green zone. Only too many failures lead to model rejections.
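
The zone definitions above can be sketched with an exact binomial cumulative probability. This is an illustrative example, not the `tl` function. The names `binomial_cdf` and `traffic_light_zone` are hypothetical, and the thresholds are the 95% and 99.99% values quoted in the list.

```python
import math


def binomial_cdf(x, n_obs, p):
    """Probability of observing up to x exceptions in n_obs days.

    Uses exact integer binomial coefficients; suitable for the modest
    exception counts that occur in backtesting windows.
    """
    return sum(
        math.comb(n_obs, k) * p**k * (1.0 - p) ** (n_obs - k)
        for k in range(x + 1)
    )


def traffic_light_zone(x, n_obs, var_level):
    """Classify x exceptions as 'green', 'yellow', or 'red'."""
    p = 1.0 - var_level
    prob = binomial_cdf(x, n_obs, p)
    if prob >= 0.9999:
        return "red"
    if prob >= 0.95:
        return "yellow"
    return "green"


# 250 observations at the 99% VaR level.
for x in (4, 5, 9, 10):
    print(x, traffic_light_zone(x, 250, 0.99))
```

For 250 observations at the 99% VaR level, this sketch places 4 exceptions in the green zone, 5 through 9 in the yellow zone, and 10 in the red zone.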

For more information, see References for Basel Committee on Banking Supervision and
`tl`.

### Kupiec’s POF and TUFF Tests

Kupiec (1995) introduced a variation on the binomial test called the proportion of
failures (POF) test. The POF test works with the binomial distribution approach. In
addition, it uses a likelihood ratio to test whether the probability of exceptions
is consistent with the probability *p* implied by the VaR
confidence level. If the data suggests that the probability of exceptions is
different from *p*, the VaR model is rejected. The POF test
statistic is

$$L{R}_{POF}=-2\mathrm{log}\left(\frac{{\left(1-p\right)}^{N-x}{p}^{x}}{{\left(1-\frac{x}{N}\right)}^{N-x}{\left(\frac{x}{N}\right)}^{x}}\right)$$

where *x* is the number of failures, *N* is the
number of observations, and *p* = 1 – VaR level.

This statistic is asymptotically distributed as a chi-square variable with 1 degree of freedom. The VaR model fails the test if this likelihood ratio exceeds a critical value. The critical value depends on the test confidence level.
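
The POF likelihood ratio can be evaluated on the log scale, which avoids underflow for large *N*. The following is a sketch under stated assumptions, not the `pof` function: the names `pof_test` and `_xlogy` are hypothetical, the term 0·log(0) is treated as 0 so that *x* = 0 and *x* = *N* are handled, and the chi-square tail probability with 1 degree of freedom is computed through the complementary error function.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def pof_test(x, n_obs, var_level, test_level=0.95):
    """Kupiec proportion-of-failures likelihood ratio test."""
    p = 1.0 - var_level
    p_hat = x / n_obs  # observed exception rate
    log_null = _xlogy(n_obs - x, 1.0 - p) + _xlogy(x, p)
    log_alt = _xlogy(n_obs - x, 1.0 - p_hat) + _xlogy(x, p_hat)
    lr = -2.0 * (log_null - log_alt)
    # Chi-square(1) survival function: P(X > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value, p_value < 1.0 - test_level


# 10 exceptions in 250 days at the 99% VaR level (2.5 expected).
lr, p_value, reject = pof_test(10, 250, 0.99)
print(round(lr, 2), reject)  # 12.96 True
```

When the observed rate *x*/*N* equals *p*, the numerator and denominator of the ratio coincide and the statistic is zero, so the model passes at any test confidence level.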

Kupiec also proposed a second test called the time until first failure (TUFF). The
TUFF test looks at when the first exception occurred. If it happens too soon, the
test fails the VaR model. Checking only the first exception leaves much information
out; specifically, whatever happened after the first exception is ignored. The TBFI
test extends the TUFF approach to include all the failures. See `tbfi`.

The TUFF test is also based on a likelihood ratio, but the underlying distribution
is a geometric distribution. If *n* is the number of days until the
first exception, the test statistic is given by

$$L{R}_{TUFF}=-2\mathrm{log}\left(\frac{p{\left(1-p\right)}^{n-1}}{\left(\frac{1}{n}\right){\left(1-\frac{1}{n}\right)}^{n-1}}\right)$$

This statistic is asymptotically distributed as a chi-square variable with 1
degree of freedom. For more information, see References for
Kupiec, `pof`, and `tuff`.
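
The TUFF statistic depends only on *n* and *p*, so it is short to sketch. The example below is illustrative and is not the `tuff` function. The names `tuff_test` and `_xlogy` are hypothetical, and the case *n* = 1 is handled by treating 0·log(0) as 0.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def tuff_test(n_first, var_level, test_level=0.95):
    """Kupiec time-until-first-failure likelihood ratio test.

    n_first is the number of days until the first exception (1-based).
    """
    p = 1.0 - var_level
    p_hat = 1.0 / n_first  # exception rate implied by the first failure
    log_null = math.log(p) + _xlogy(n_first - 1, 1.0 - p)
    log_alt = math.log(p_hat) + _xlogy(n_first - 1, 1.0 - p_hat)
    lr = -2.0 * (log_null - log_alt)
    # Chi-square(1) survival function: P(X > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value, p_value < 1.0 - test_level


# First exception on day 1 at the 99% VaR level.
lr, p_value, reject = tuff_test(1, 0.99)
print(round(lr, 2), reject)  # 9.21 True
```

The statistic is zero when the first exception arrives on day 1/*p*, for example on day 20 at the 95% VaR level.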

### Christoffersen’s Interval Forecast Tests

Christoffersen (1998) proposed a test to measure whether the probability of observing an exception on a particular day depends on whether an exception occurred on the previous day. Unlike tests of the unconditional probability of observing an exception, Christoffersen’s test measures the dependency between consecutive days only. The test statistic for independence in Christoffersen’s interval forecast (IF) approach is given by

$$L{R}_{CCI}=-2\mathrm{log}\left(\frac{{\left(1-\pi \right)}^{{n}_{00}+{n}_{10}}{\pi}^{{n}_{01}+{n}_{11}}}{{\left(1-{\pi}_{0}\right)}^{{n}_{00}}{\pi}_{0}^{{n}_{01}}{\left(1-{\pi}_{1}\right)}^{{n}_{10}}{\pi}_{1}^{{n}_{11}}}\right)$$

where

- *n*_{00} = Number of periods with no failures followed by a period with no failures.
- *n*_{10} = Number of periods with failures followed by a period with no failures.
- *n*_{01} = Number of periods with no failures followed by a period with failures.
- *n*_{11} = Number of periods with failures followed by a period with failures.

and

- *π*_{0} is the probability of having a failure on period *t*, given that no failure occurred on period *t* − 1: *π*_{0} = *n*_{01} / (*n*_{00} + *n*_{01})
- *π*_{1} is the probability of having a failure on period *t*, given that a failure occurred on period *t* − 1: *π*_{1} = *n*_{11} / (*n*_{10} + *n*_{11})
- *π* is the probability of having a failure on period *t*: *π* = (*n*_{01} + *n*_{11}) / (*n*_{00} + *n*_{01} + *n*_{10} + *n*_{11})

This statistic is asymptotically distributed as a chi-square with 1 degree of freedom. You can combine this statistic with the frequency POF test to get a conditional coverage (CC) mixed test:

$$L{R}_{CC}=L{R}_{POF}+L{R}_{CCI}$$

This test is asymptotically distributed as a chi-square variable with 2 degrees of freedom.
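
To make the transition counts concrete, the following sketch computes the independence statistic from a 0/1 failure series and converts a combined CC statistic into a *p*-value. It is an illustration only and does not reproduce the `cci` or `cc` functions. In particular, it counts transitions only between consecutive observed days and makes no special assumption about the day before the sample starts. The names `cci_statistic`, `cc_p_value`, and `_xlogy` are hypothetical.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def cci_statistic(failures):
    """Christoffersen independence likelihood ratio from a 0/1 series."""
    if len(failures) < 2:
        raise ValueError("need at least two observations")
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, curr in zip(failures, failures[1:]):
        counts[(int(prev), int(curr))] += 1
    n00, n01 = counts[(0, 0)], counts[(0, 1)]
    n10, n11 = counts[(1, 0)], counts[(1, 1)]

    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    pi0 = n01 / (n00 + n01) if (n00 + n01) else 0.0
    pi1 = n11 / (n10 + n11) if (n10 + n11) else 0.0

    log_null = _xlogy(n00 + n10, 1.0 - pi) + _xlogy(n01 + n11, pi)
    log_alt = (
        _xlogy(n00, 1.0 - pi0) + _xlogy(n01, pi0)
        + _xlogy(n10, 1.0 - pi1) + _xlogy(n11, pi1)
    )
    return -2.0 * (log_null - log_alt)


def cc_p_value(lr_pof, lr_cci):
    """p-value of LR_CC = LR_POF + LR_CCI under chi-square with 2 dof."""
    # The chi-square(2) survival function is exp(-x / 2).
    return math.exp(-(lr_pof + lr_cci) / 2.0)


# Hypothetical series: 90 quiet days followed by 10 consecutive failures.
clustered = [0] * 90 + [1] * 10
print(cci_statistic(clustered) > 3.841)  # True: exceeds the 95% critical value
```

A series in which failures are as likely after a failure as after a quiet day gives *π*_{0} = *π*_{1} = *π*, and the statistic is zero.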

For more information, see References for Christoffersen, `cc`, and `cci`.

### Haas’s Time Between Failures or Mixed Kupiec’s Test

Haas (2001) extended Kupiec’s TUFF test to incorporate the time information between all the exceptions in the sample. Haas’s test applies the TUFF test to each exception in the sample and aggregates the results into a time between failures independence (TBFI) test statistic:

$$L{R}_{TBFI}=-2{\displaystyle {\sum}_{i=1}^{x}\mathrm{log}}\left(\frac{p{\left(1-p\right)}^{{n}_{i}-1}}{\left(\frac{1}{{n}_{i}}\right){\left(1-\frac{1}{{n}_{i}}\right)}^{{n}_{i}-1}}\right)$$

In this statistic, *p* = 1 – VaR level and
*n*_{i} is the number of
days between failures *i*-1 and *i* (or until the
first exception for *i* = 1). This statistic is asymptotically
distributed as a chi-square variable with *x* degrees of freedom,
where *x* is the number of failures.
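
The aggregation over exceptions can be sketched as a loop that measures the gap to the previous failure and adds one TUFF-style term per exception. This example is not the `tbfi` function. The names `tbfi_statistic`, `_tuff_lr`, and `_xlogy` are hypothetical, days are numbered from 1, and the first gap is measured from the start of the sample, as described above. Only the statistic and the degrees of freedom are returned, because the chi-square tail probability with *x* degrees of freedom is not available in the Python standard library.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _tuff_lr(n, p):
    """Single TUFF-style likelihood ratio term for a gap of n days."""
    log_null = math.log(p) + _xlogy(n - 1, 1.0 - p)
    log_alt = math.log(1.0 / n) + _xlogy(n - 1, 1.0 - 1.0 / n)
    return -2.0 * (log_null - log_alt)


def tbfi_statistic(failures, var_level):
    """Return (LR_TBFI, x) for a 0/1 failure series.

    x is the number of failures, which is also the number of degrees
    of freedom of the asymptotic chi-square distribution.
    """
    p = 1.0 - var_level
    lr = 0.0
    x = 0
    last_failure_day = 0  # 0 marks the start of the sample
    for day, failed in enumerate(failures, start=1):
        if failed:
            lr += _tuff_lr(day - last_failure_day, p)
            last_failure_day = day
            x += 1
    return lr, x


# Hypothetical series: two failures on consecutive days at the 99% VaR level.
lr, x = tbfi_statistic([1, 1], 0.99)
print(round(lr, 2), x)  # 18.42 2
```

Adding the POF statistic for the same series to the returned value gives the mixed TBF statistic, with *x* + 1 degrees of freedom.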

As with Christoffersen’s test, you can combine this test with the frequency POF test to get a TBF mixed test, sometimes called Haas’s mixed Kupiec’s test:

$$L{R}_{TBF}=L{R}_{POF}+L{R}_{TBFI}$$

This test is asymptotically distributed as a chi-square variable with
*x*+1 degrees of freedom. For more information, see References for
Haas, `tbf`, and `tbfi`.

## References

[1] Basel Committee on Banking Supervision, *Supervisory framework
for the use of “backtesting” in conjunction with the internal
models approach to market risk capital requirements.* January 1996,
https://www.bis.org/publ/bcbs22.htm.

[2] Christoffersen, P. "Evaluating Interval Forecasts."
*International Economic Review.* Vol. 39, 1998, pp.
841–862.

[3] Cogneau, P. “Backtesting Value-at-Risk: how good is the
model?” *Intelligent Risk*, PRMIA, July 2015.

[4] Haas, M. *"New Methods in Backtesting."* Financial
Engineering, Research Center Caesar, Bonn, 2001.

[5] Jorion, P. *Financial Risk Manager Handbook.*
*6th Edition*, Wiley Finance, 2011.

[6] Kupiec, P. "Techniques for Verifying the Accuracy of Risk Management
Models." *Journal of Derivatives.* Vol. 3, 1995, pp.
73–84.

[7] McNeil, A., Frey, R., and Embrechts, P. *Quantitative Risk
Management.* Princeton University Press, 2005.

[8] Nieppola, O. “Backtesting Value-at-Risk Models.” Helsinki School of Economics, 2009.

## See Also

`varbacktest` | `tl` | `bin` | `pof` | `tuff` | `cc` | `cci` | `tbf` | `tbfi` | `summary` | `runtests` | `select` | `plot` | `exceptions` | `append`