Most of the run time is spent in the line:
A(row,col) = A(row, col) - temp;
The editor shows an MLint warning that this kind of sparse indexing operation, which changes the number of non-zero elements, is slow. Using a full matrix instead would require 27 GB, which is beyond my available RAM, so I cannot compare the speed.
Using the suggested prod will improve the readability of the code (as will auto-indentation), but has only a tiny effect on the speed.
You can accelerate the code by replacing
A = sparse(Y*N*T, N*Y*(T+1));
with a version that pre-allocates space for the non-zero elements:
A = spalloc(Y*N*T, N*Y*(T+1), 1e7);
With Y=30 this reduces the runtime from 34 to 16 seconds.
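A minimal sketch of the pre-allocation, in case it helps to try it on a small problem first. The sizes Y, N and T, the estimate nzEstimate and the random indices are placeholders I made up for illustration; they are not the values from the question:

```matlab
% Hypothetical small sizes; replace them with the real Y, N and T.
Y = 3; N = 4; T = 5;
nRows = Y*N*T;
nCols = N*Y*(T+1);
nzEstimate = 200;   % assumed upper bound for the number of non-zeros

% All-zero sparse matrix with storage reserved for nzEstimate entries
A = spalloc(nRows, nCols, nzEstimate);

% Element-wise updates as in the original loop. Random indices and values
% stand in for the real row, col and temp.
for k = 1:100
    row  = randi(nRows);
    col  = randi(nCols);
    temp = rand;
    A(row, col) = A(row, col) - temp;
end

% nnz is the number of stored non-zeros, nzmax the allocated storage
fprintf('nnz = %d, nzmax = %d\n', nnz(A), nzmax(A));
```

If nnz(A) approaches nzmax(A) in the real problem, the estimate is probably too small and MATLAB has to grow the storage again during the loop, so a larger third argument to spalloc may be worth trying.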