Alternative ARIMA Model Representations

Mathematical Development of regARIMA to ARIMAX Model Conversion

ARIMAX models and regression models with ARIMA errors are closely related, and the choice of which to use is generally dictated by your goals for the analysis. If your objective is to fit a parsimonious model to data and forecast responses, then there is very little difference between the two models.

If you are more interested in preserving the usual interpretation of a regression coefficient as a measure of sensitivity, i.e., the effect of a unit change in a predictor variable on the response, then use a regression model with ARIMA errors. Regression coefficients in ARIMAX models do not possess that interpretation because of the dynamic dependence on the response [1].

Suppose that you have the parameter estimates from a regression model with ARIMA errors, and you want to see how the model structure compares to ARIMAX model. Or, suppose you want some insight as to the underlying relationship between the two models.

The ARIMAX model is (t = 1,...,T):

 $Η\left(L\right){y}_{t}=c+{X}_{t}\beta +Ν\left(L\right){\epsilon }_{t},$ (1)
where

• yt is the univariate response series.

• Xt is row t of X, which is the matrix of concatenated predictor series. That is, Xt is observation t of each predictor series.

• β is the regression coefficient.

• c is the regression model intercept.

• $Η\left(L\right)=\varphi \left(L\right){\left(1-L\right)}^{D}\Phi \left(L\right)\left(1-{L}^{s}\right)=1-{\eta }_{1}L-{\eta }_{2}{L}^{2}-...-{\eta }_{P}{L}^{P},$ which is the degree P lag operator polynomial that captures the combined effect of the seasonal and nonseasonal autoregressive polynomials, and the seasonal and nonseasonal integration polynomials. For more details on notation, see Multiplicative ARIMA Model.

• $Ν\left(L\right)=\theta \left(L\right)\Theta \left(L\right)=1+{\nu }_{1}L+{\nu }_{2}{L}^{2}+...+{\nu }_{Q}{L}^{Q},$ which is the degree Q lag operator polynomial that captures the combined effect of the seasonal and nonseasonal moving average polynomials.

• εt is a white noise innovation process.

The regression model with ARIMA errors is (t = 1,...,T)

 $\begin{array}{l}{y}_{t}=c+{X}_{t}\beta +{u}_{t}\\ A\left(L\right){u}_{t}=B\left(L\right){\epsilon }_{t},\end{array}$ (2)
where

• ut is the unconditional disturbances process.

• $A\left(L\right)=\varphi \left(L\right){\left(1-L\right)}^{D}\Phi \left(L\right)\left(1-{L}^{s}\right)=1-{a}_{1}L-{a}_{2}{L}^{2}-...-{a}_{P}{L}^{P},$ which is the degree P lag operator polynomial that captures the combined effect of the seasonal and nonseasonal autoregressive polynomials, and the seasonal and nonseasonal integration polynomials.

• $B\left(L\right)=\theta \left(L\right)\Theta \left(L\right)=1+{b}_{1}L+{b}_{2}{L}^{2}+...+{b}_{Q}{L}^{Q},$ which is the degree Q lag operator polynomial that captures the combined effect of the seasonal and nonseasonal moving average polynomials.

The values of the variables defined in Equation 2 are not necessarily equivalent to the values of the variables in Equation 1, even though the notation might be similar.

Consider Equation 2, the regression model with ARIMA errors. Use the following operations to convert the regression model with ARIMA errors to its corresponding ARIMAX model.

1. Solve for ut.

$\begin{array}{l}{y}_{t}=c+{X}_{t}\beta +{u}_{t}\\ {u}_{t}=\frac{B\left(L\right)}{A\left(L\right)}{\epsilon }_{t}.\end{array}$

2. Substitute ut into the regression equation.

$\begin{array}{c}{y}_{t}=c+{X}_{t}\beta +\frac{B\left(L\right)}{A\left(L\right)}{\epsilon }_{t}\\ A\left(L\right){y}_{t}=A\left(L\right)c+A\left(L\right){X}_{t}\beta +B\left(L\right){\epsilon }_{t}.\end{array}$

3. Solve for yt.

 $\begin{array}{c}{y}_{t}=A\left(L\right)c+A\left(L\right){X}_{t}\beta +\sum _{k=1}^{P}{a}_{k}{y}_{t-k}+B\left(L\right){\epsilon }_{t}\\ =A\left(L\right)c+{Z}_{t}\Gamma +\sum _{k=1}^{P}{a}_{k}{y}_{t-k}+B\left(L\right){\epsilon }_{t}.\end{array}$ (3)
In Equation 3,

• A(L)c = (1 – a1a2 –...– aP)c. That is, the constant in the ARIMAX model is the intercept in the regression model with ARIMA errors with a nonlinear constraint. Though applications, such as simulate, handle this constraint, estimate cannot incorporate such a constraint. In the latter case, the models are equivalent when you fix the intercept and constant to 0.

• In the term A(L)Xtβ, the lag operator polynomial A(L) filters the T-by-1 vector Xtβ, which is the linear combination of the predictors weighted by the regression coefficients. This filtering process requires P presample observations of the predictor series.

• arima constructs the matrix Zt as follows:

• Each column of Zt corresponds to each term in A(L).

• The first column of Zt is the vector Xtβ.

• The second column of Zt is a sequence of d2 NaNs (d2 is the degree of the second term in A(L)), followed by the product ${L}^{{d}_{j}}{X}_{t}\beta$. That is, the software attaches d2 NaNs at the beginning of the T-by-1 column, attaches Xtβ after the NaNs, but truncates the end of that product by d2 observations.

• The jth column of Zt is a sequence of dj NaNs (dj is the degree of the jth term in A(L)), followed by the product ${L}^{{d}_{j}}{X}_{t}\beta$. That is, the software attaches dj NaNs at the beginning of the T-by-1 column, attaches Xtβ after the NaNs, but truncates the end of that product by dj observations.

.

• Γ = [1 –a1 –a2 ... –aP]'.

The arima converter removes all zero-valued autoregressive coefficients of the difference equation. Subsequently, the arima converter does not associate zero-valued autoregressive coefficients with columns in Zt, nor does it include corresponding, zero-valued coefficients in Γ.

4. Rewrite Equation 3,

${y}_{t}=\left(1-\sum _{k=1}^{P}{a}_{k}\right)c+{X}_{t}\beta -\sum _{k=1}^{P}{a}_{k}{X}_{t-k}\beta +\sum _{k=1}^{P}{a}_{k}{y}_{t-k}+{\epsilon }_{t}+\sum _{k=1}^{Q}{\epsilon }_{t-k}.$

For example, consider the following regression model whose errors are ARMA(2,1):

 $\begin{array}{c}{y}_{t}=0.2+0.5{X}_{t}+{u}_{t}\\ \left(1-0.8L+0.4{L}^{2}\right){u}_{t}=\left(1+0.3L\right){\epsilon }_{t}.\end{array}$ (4)
The equivalent ARMAX model is:

$\begin{array}{c}{y}_{t}=0.12+\left(0.5-0.4L+0.2{L}^{2}\right){X}_{t}+0.8{y}_{t-1}-0.4{y}_{t-2}+\left(1+0.3L\right){\epsilon }_{t}\\ =0.12+{Z}_{t}\Gamma +0.8{y}_{t-1}-0.4{y}_{t-2}+\left(1+0.3L\right){\epsilon }_{t},\end{array}$

or

$\left(1-0.8L+0.4{L}^{2}\right){y}_{t}=0.12+{Z}_{t}\Gamma +\left(1+0.3L\right){\epsilon }_{t},$

where Γ = [1 –0.8 0.4]' and

${Z}_{t}=0.5\left[\begin{array}{ccc}{x}_{1}& NaN& NaN\\ {x}_{2}& {x}_{1}& NaN\\ {x}_{3}& {x}_{2}& {x}_{1}\\ ⋮& ⋮& ⋮\\ {x}_{T}& {x}_{T-1}& {x}_{T-2}\end{array}\right].$

This model is not integrated because all of the eigenvalues associated with the AR polynomial are within the unit circle, but the predictors might affect the otherwise stable process. Also, you need presample predictor data going back at least 2 periods to, for example, fit the model to data.

Show Conversion in MATLAB®

Illustrate the conversion in MATLAB® by model simulation and estimation.

Specify the regression model with ARIMA errors in Equation 4.

MdlregARIMA0 = regARIMA('Intercept',0.2,'AR',{0.8 -0.4}, ...
'MA',0.3,'Beta',[0.3 -0.2],'Variance',0.2);

Generate presample observations and predictor data.

rng(1); % For reproducibility
T = 100;
maxPQ = max(MdlregARIMA0.P,MdlregARIMA0.Q);
numObs  = T + maxPQ;            % Adjust number of observations to account for presample
XregARIMA = randn(numObs,2);    % Simulate predictor data
u0 = randn(maxPQ,1);            % Presample unconditional disturbances u(t)
e0 = randn(maxPQ,1);            % Presample innovations e(t)

Simulate data from the regression model with ARIMA errors MdlregARIMA0.

rng(100) % For consistent seed with later call
[y1,e1,u1] = simulate(MdlregARIMA0,T,'U0',u0, ...
'E0',e0,'X',XregARIMA);

Convert the regression model with ARIMA errors to an ARIMAX model.

[MdlARIMAX0,XARIMAX] = arima(MdlregARIMA0,'X',XregARIMA);
MdlARIMAX0
MdlARIMAX0 =
arima with properties:

Description: "ARIMAX(2,0,1) Model (Gaussian Distribution)"
Distribution: Name = "Gaussian"
P: 2
D: 0
Q: 1
Constant: 0.12
AR: {0.8 -0.4} at lags [1 2]
SAR: {}
MA: {0.3} at lag [1]
SMA: {}
Seasonality: 0
Beta: [1 -0.8 0.4]
Variance: 0.2

Generate presample responses for the ARIMAX model to ensure consistency with the regression model with ARIMA errors. Simulate data from the ARIMAX model.

y0 = MdlregARIMA0.Intercept + XregARIMA(1:maxPQ,:)*MdlregARIMA0.Beta' + u0;
rng(100) % For consistent seed with earlier call
y2 = simulate(MdlARIMAX0,T,'Y0',y0,'E0',e0,'X',XARIMAX);

figure
plot(y1,'LineWidth',3)
hold on
plot(y2,'r:','LineWidth',2.5)
hold off
title("\bf Simulated Paths")
legend("regARIMA Model","ARIMAX Model",'Location','best')

The simulated paths are equal because the arima converter enforces the nonlinear constraint when it converts the regression model intercept to the ARIMAX model constant.

Fit a regression model with ARIMA errors to the simulated data.

MdlregARIMA0 = regARIMA('ARLags',[1 2],'MALags',1);
EstMdlregARIMA = estimate(MdlregARIMA0,y1,'E0',e0,'U0',u0,'X',XregARIMA);

Regression with ARMA(2,1) Error Model (Gaussian Distribution):

Value      StandardError    TStatistic      PValue
________    _____________    __________    __________

Intercept     0.14074        0.1014         1.3879         0.16518
AR{1}         0.83061        0.1375         6.0407      1.5349e-09
AR{2}        -0.45402        0.1164        -3.9007      9.5927e-05
MA{1}         0.42803       0.15145         2.8262       0.0047109
Beta(1)       0.29552      0.022938         12.883       5.597e-38
Beta(2)      -0.17601      0.030607        -5.7506      8.8941e-09
Variance      0.18231      0.027765         6.5663      5.1569e-11

Fit an ARIMAX model to the simulated data.

MdlARIMAX = arima('ARLags',[1 2],'MALags',1);
EstMdlARIMAX = estimate(MdlARIMAX,y2,'E0',e0,'Y0',...
y0,'X',XARIMAX);

ARIMAX(2,0,1) Model (Gaussian Distribution):

Value      StandardError    TStatistic      PValue
________    _____________    __________    __________

Constant    0.084996      0.064217         1.3236         0.18564
AR{1}        0.83136       0.13634         6.0975      1.0775e-09
AR{2}       -0.45599       0.11788        -3.8683       0.0001096
MA{1}          0.426       0.15753         2.7043       0.0068446
Beta(1)        1.053       0.13685         7.6949      1.4166e-14
Beta(2)      -0.6904       0.19262        -3.5843      0.00033796
Beta(3)      0.45399       0.15352         2.9572       0.0031047
Variance     0.18112      0.028836          6.281      3.3635e-10

Convert the estimated regression model with ARIMA errors EstMdlregARIMA to an ARIMAX model.

ConvertedMdlARIMAX = arima(EstMdlregARIMA,'X',XregARIMA)
ConvertedMdlARIMAX =
arima with properties:

Description: "ARIMAX(2,0,1) Model (Gaussian Distribution)"
Distribution: Name = "Gaussian"
P: 2
D: 0
Q: 1
Constant: 0.087737
AR: {0.830611 -0.454025} at lags [1 2]
SAR: {}
MA: {0.428031} at lag [1]
SMA: {}
Seasonality: 0
Beta: [1 -0.830611 0.454025]
Variance: 0.182313

The estimated ARIMAX model constant is not equal to the ARIMAX model constant converted from the regression model with ARIMA errors. In other words, EstMdlARIMAX.Constant is 0.084996 and ConvertedMdlARIMAX.Constant = 0.087737. The reason for the discrepancy is estimate does not enforce the nonlinear constraint that the arima converter enforces. As a result, the other estimates are close, but not equal.

References

[1] Hyndman, R. J. (2010, October). "The ARIMAX Model Muddle." Rob J. Hyndman. Retrieved May 4, 2017 from https://robjhyndman.com/hyndsight/arimax/.