VECM model for high-frequency trading

Question

Niccolò Ghionzoli, 31 January 2022


https://kr.mathworks.com/matlabcentral/answers/1639790-vecm-model-for-high-frequency-trading

Answered: Vinayak, 9 February 2024

microsoft simulation level 5.m

Good morning. I am a student and I have to analyse a dataset built from LOBSTER data (Microsoft, level 5, 21 June 2012). I hope someone in the community can help me, because I don't know how to proceed.

https://lobsterdata.com/info/DataSamples.php

I need to estimate a VECM, but the function I am using does not work at all.

https://it.mathworks.com/help/econ/modeling-the-united-states-economy.html

Do you have any suggestions for improving my code? The code I am developing is attached.

I also want to report something that may be related to the problem and may help to solve it.

Last Saturday evening I ran the code again. After manipulating the input data to turn data_vector into a sort of time-varying vector, I got similar results for the H1 and H2 Johansen tests (rejections up to rank 7 for H2 and up to rank 8 for H1), and the likelihood ratio test worked. However, the VECM estimation still returned errors, so I modified something; now H1 rejects only up to rank 2, as it did before Saturday evening, and the LR test does not work at all.
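Statements such as "rejection up to rank 7" come from the sequence of trace tests that jcitest runs, one for each null hypothesis H0: rank <= k (k = 0, 1, ..., n-1). The selected rank is the first k whose null is not rejected. A minimal sketch, assuming the Econometrics Toolbox is available; the synthetic series Y are hypothetical and only stand in for data_vector:

```matlab
% Hypothetical synthetic data: series 1 and 2 share one stochastic trend,
% series 3 is an independent random walk, so the true cointegrating rank is 1.
rng(0);
T = 500;
commonTrend = cumsum(randn(T, 1));
Y = [commonTrend + randn(T, 1), commonTrend + randn(T, 1), cumsum(randn(T, 1))];

% Johansen trace test, model H2 (no deterministic terms), one lagged difference
h = jcitest(Y, 'Model', 'H2', 'lags', 1, 'Display', 'off');

% Depending on the release, h is a table (variables r0, r1, ...) or a logical array
if istable(h)
    decisions = logical(h{1, :});
else
    decisions = logical(h(1, :));
end

% Column k+1 tests H0: rank <= k; take the first k whose null is NOT rejected
firstNotRejected = find(~decisions, 1);
if isempty(firstNotRejected)
    selectedRank = size(Y, 2);   % every null rejected: full rank (stationary levels)
else
    selectedRank = firstNotRejected - 1;
end
fprintf('Selected cointegrating rank: %d\n', selectedRank);
```

With the question's data, replace Y with data_vector and 'lags', 1 with 'lags', P-1.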

The problem lies in the part of the code below. Please ignore the section related to the ADF and KPSS tests. This is the faulty code:

data_vector = [log_book_1_ask log_book_1_bid lev_1_ask lev_2_ask lev_3_ask lev_1_bid lev_2_bid lev_3_bid BUY_t SELL_t];
P = 15; % number of lags
[h,pValue,stat,cValue,mleH2] = jcitest(data_vector, 'lags', P-1, 'Model', 'H2');
[h,pValue,stat,cValue,mleH1] = jcitest(data_vector, 'lags', P-1, 'Model', 'H1');

% The likelihood ratio test
r = 7; % Cointegrating rank
uLogL = mleH2.r7.rLL; % Loglikelihood of the unrestricted H2 model for r = 7
rLogL = mleH1.r7.rLL; % Loglikelihood of the restricted H1 model for r = 7
[h,pValue,stat,cValue] = lratiotest(uLogL, rLogL, r);
% higher value for uLogL

% Create the VEC model object
time_message = mess(:,1);
my_time = time_message(1:end);

[Mdl,se] = estimate(vecm(size(data_vector,2),r,P-1), data_vector, 'Model', 'H2');
toFit = vecm(Mdl.NumSeries, Mdl.Rank, Mdl.P - 1);
toFit.Constant(abs(Mdl.Constant ./ se.Constant) < 2) = 0;
toFit.ShortRun{1}(abs(Mdl.ShortRun{1} ./ se.ShortRun{1}) < 2) = 0;
toFit.Adjustment(abs(Mdl.Adjustment ./ se.Adjustment) < 2) = 0;
Fit = estimate(toFit, data_vector, 'Model', 'H2');
B = [Fit.Cointegration ; Fit.CointegrationConstant' ; Fit.CointegrationTrend'];

figure
plot(my_time, [data_vector ones(size(data_vector,1),1) (-(Fit.P - 1):(size(data_vector,1) - Fit.P))'] * B)
title('Cointegrating Relations')
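In the jcitest model hierarchy, H2 (no intercepts or trends) is nested in H1 (intercepts in the cointegrating relations and in the data), so at a fixed rank H2 is the restricted model and its loglikelihood cannot exceed H1's. The code above passes the H2 loglikelihood as the unrestricted one. A minimal sketch of the comparison with the roles the other way round, on hypothetical synthetic data, assuming the mles output is the structure with fields r0, r1, ... that the question's code already indexes (mleH2.r7.rLL). The degrees of freedom n (one intercept per equation) is this sketch's assumption for the H2 versus H1 restriction count:

```matlab
% Hypothetical synthetic data: two series sharing a stochastic trend plus a random walk
rng(1);
T = 500;
commonTrend = cumsum(randn(T, 1));
Y = [commonTrend + randn(T, 1), commonTrend + randn(T, 1), cumsum(randn(T, 1))];
n = size(Y, 2);
r = 1;                               % cointegrating rank assumed for the comparison
rankField = sprintf('r%d', r);       % rank-r results are stored in field 'r1', 'r2', ...

[~, ~, ~, ~, mleH2] = jcitest(Y, 'Model', 'H2', 'lags', 1, 'Display', 'off');
[~, ~, ~, ~, mleH1] = jcitest(Y, 'Model', 'H1', 'lags', 1, 'Display', 'off');

% H2 (no deterministic terms) supplies the restricted loglikelihood,
% H1 (intercepts allowed) the unrestricted one
rLogL = mleH2.(rankField).rLL;
uLogL = mleH1.(rankField).rLL;

% Assumed restriction count: H1 adds an intercept to each of the n equations
dof = n;
[hLR, pValueLR] = lratiotest(uLogL, rLogL, dof);
fprintf('uLogL = %.2f, rLogL = %.2f, reject H2: %d (p = %.3f)\n', ...
    uLogL, rLogL, hLR, pValueLR);
```

For the question's data, Y becomes data_vector, 'lags', 1 becomes 'lags', P-1, and r the rank chosen from the Johansen tests.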

1 Comment

Niccolò Ghionzoli, 31 January 2022

Warning: Rank deficient, rank = 126, tol = 5.417920e-08.
> In vecm/estimate>johansen (line 1295)
In vecm/estimate (line 643)

Warning: Rank deficient, rank = 126, tol = 5.417920e-08.
> In vecm/estimate>johansen (line 1296)
In vecm/estimate (line 643)

Warning: Rank deficient, rank = 126, tol = 5.417920e-08.
> In vecm/estimate>johansen (line 1342)
In vecm/estimate (line 643)

Out of memory.
Error in varm/estimate (line 417)
D{t-P} = Z(:,solve).*W(t);
Error in vecm/estimate (line 886)
[MDL,sigmaVARX,logL,residuals,errorCovarianceBlocks] = estimate(VAR, dY, 'X', [Y1(P+1:end,:) X], 'MaxIterations', maxIterations);


These are the errors I got when running the code above.


Answer 1

Vinayak, 9 February 2024


https://kr.mathworks.com/matlabcentral/answers/1639790-vecm-model-for-high-frequency-trading#answer_1405436


Hi Niccolò,

The code you shared analyzes intraday limit order book data for Microsoft (MSFT) from LOBSTER data files. The rank-deficiency warnings mean that the regressors in the Johansen step of vecm/estimate are not linearly independent: some series, or their lagged differences, are redundant. You should therefore focus on removing redundant variables and reducing dimensionality:

1. Variable selection: identify redundant variables from the correlation matrix and keep the ones your analysis requires.

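correlation_matrix = corr(data_vector);
% Upper triangle only (offset 1): each pair is found once and the diagonal is excluded.
% The second output of find gives column indices, not linear indices into the matrix.
[~, col] = find(triu(abs(correlation_matrix) > 0.9, 1));
% Drop only the second variable of each highly correlated pair
variables_to_remove = unique(col);
data_vector(:, variables_to_remove) = [];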

2. If you still have a large number of variables, use principal component analysis (PCA) to reduce dimensionality:

% The fifth output of pca is the percentage of variance explained by each component
[coeff, ~, ~, ~, explained] = pca(data_vector);
explained_variance_ratio = cumsum(explained) / 100;
num_components_to_keep = find(explained_variance_ratio >= 0.95, 1); % Keep 95% of variance
data_vector_pca = data_vector * coeff(:, 1:num_components_to_keep);
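Correlation flags near-duplicates, but the rank warnings are about exact (numerical) linear dependence, which can be checked directly. One possible reading of the warnings, offered as an assumption: with 10 series and P-1 = 14 lagged differences the Johansen regressions have 10 × 14 = 140 columns, and a shortfall to rank 126 of 14 = one series × 14 lags is consistent with one column of data_vector being constant or a linear combination of the others. A minimal sketch on a hypothetical demo matrix; for the real data, replace data_demo with data_vector:

```matlab
% Hypothetical demo matrix: column 3 is an exact linear combination of
% columns 1 and 2, column 4 is an independent random walk.
rng(2);
X = cumsum(randn(200, 2));
data_demo = [X, X(:, 1) - 2 * X(:, 2), cumsum(randn(200, 1))];

dY = diff(data_demo);   % the VECM regressors are lagged differences
fprintf('columns: %d, rank of levels: %d, rank of differences: %d\n', ...
    size(data_demo, 2), rank(data_demo), rank(dY));

% Flag every column that does not raise the rank of the columns before it
% (this also catches constant series, whose differences are all zero)
redundant = [];
prevRank = 0;
for j = 1:size(dY, 2)
    currRank = rank(dY(:, 1:j));
    if currRank == prevRank
        redundant(end + 1) = j; %#ok<AGROW>
    end
    prevRank = currRank;
end
disp(redundant)
```

Which column gets flagged depends on the column order; any one member of a dependent group can be dropped.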

The out-of-memory error is caused by the size of the estimation problem. Reducing the number of variables may already resolve it, but you can also free memory (for example, clear large variables you no longer need) and estimate the model on the data in batches:

% Define batch size
batch_size = 1000; % Adjust based on your memory constraints
P = 15; % Number of lags
r = 2; % Number of cointegrating relationships (example value)
% Determine the number of batches
num_batches = ceil(size(data_vector, 1) / batch_size);
Fits = cell(num_batches, 1); % Store the estimated model of each batch
% Perform VECM estimation in batches
for i = 1:num_batches
    % Select data for the current batch
    start_idx = (i - 1) * batch_size + 1;
    end_idx = min(i * batch_size, size(data_vector, 1));
    data_batch = data_vector(start_idx:end_idx, :);
    % Rough check: skip a (last) batch without a presample plus
    % at least one row per regressor in each equation
    if size(data_batch, 1) <= P + (P - 1) * size(data_batch, 2) + r
        fprintf('Skipped batch %d/%d (too few rows)\n', i, num_batches);
        continue
    end
    % Perform VECM estimation on the current batch
    toFit = vecm(size(data_batch, 2), r, P-1);
    Fits{i} = estimate(toFit, data_batch, 'Model', 'H2');
    % Additional processing or analysis on the current batch if needed
    % Display progress
    fprintf('Processed batch %d/%d\n', i, num_batches);
end
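To see why memory and batch size matter, it helps to count what each VECM fit estimates with the question's sizes. A back-of-the-envelope sketch; counting the r error-correction terms as regressors in each equation follows Johansen's second-stage regression and is this sketch's assumption, not output from vecm/estimate:

```matlab
% Sizes from the question and the batch example
n = 10;             % series in data_vector
P = 15;             % lags, so the VECM has P - 1 = 14 short-run terms
r = 2;              % cointegrating rank used in the batch example
batch_size = 1000;  % rows per batch

shortRunCoeffs   = (P - 1) * n^2;     % entries of the 14 n-by-n short-run matrices
adjustmentCoeffs = n * r;             % entries of the n-by-r adjustment matrix
regressorsPerEq  = (P - 1) * n + r;   % lagged differences plus error-correction terms
covarianceParams = n * (n + 1) / 2;   % distinct entries of the innovations covariance

fprintf('short-run coefficients: %d\n', shortRunCoeffs);
fprintf('adjustment coefficients: %d\n', adjustmentCoeffs);
fprintf('regressors per equation: %d, rows per batch: %d\n', regressorsPerEq, batch_size);
fprintf('covariance parameters: %d\n', covarianceParams);
```

With r = 7 as in the question, the adjustment matrix has 70 entries and each equation has 147 regressors, so a 1000-row batch leaves relatively few observations per estimated coefficient.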