How to enhance the performance of for-loops and cell-arrays (related to statistical calculations)?

The following code calculates some performance measures for different periods. No error messages occurred, but the processing time is very long.
F = 'runoff.txt'; % name of the file
D = 'C:\Users\heute\model\results\model_standalone\'; % absolute or relative path of base directory
S = dir(fullfile(D,'results*'));
X = [S.isdir] & ~ismember({S.name},{'.','..'});
N = {S(X).name};
L = cell(size(N));
C = cell(size(N));
for k = 1:numel(N)
T = fullfile(D,N{k},F);
fid = fopen(T,'rt');
fmt = ['%s',repmat('%f',1,6)];
opt = {'HeaderLines',1,'CollectOutput',true};
Z = textscan(fid,fmt,opt{:});
fclose(fid);
L{k} = Z{1}; % timestamp
C{k} = Z{2}; % data
%
Qs = C{k}(:,6); % define the simulated runoff, as column 6 in each cell array
%
% define the periods for computing performance measures
sdatelim_neu = [datenum(2013,10,01,00,00,00) datenum(2016,10,01,00,00,00)];
dt = 1/24;
date = sdatelim_neu(1):dt:sdatelim_neu(2);
date_runoff = transpose(date);
%
sdatelim1 = [datenum(2014,05,01,00,00,00) datenum(2014,10,01,00,00,00)];
dt = 1/24;
sdate_sdatelim1 = sdatelim1(1):dt:sdatelim1(2);
%
sdatelim2 = [datenum(2015,05,01,00,00,00) datenum(2015,10,01,00,00,00)];
sdate_sdatelim2 = sdatelim2(1):dt:sdatelim2(2);
%
sdatelim3 = [datenum(2016,05,01,00,00,00) datenum(2016,10,01,00,00,00)];
sdate_sdatelim3 = sdatelim3(1):dt:sdatelim3(2);
%
% loop over the different periods
for s = 1:length(sdate_sdatelim1);
for a = 1:length(sdate_sdatelim2);
for b = 1:length(sdate_sdatelim3);
j = find(date_runoff >= sdate_sdatelim1(s) & date_runoff < sdate_sdatelim1(k)+dt) & find(date_runoff >= sdate_sdatelim2(a) & date_runoff < sdate_sdatelim2(a)+dt) & find(date_runoff >= sdate_sdatelim3(b) & date_runoff < sdate_sdatelim3(b)+dt);
f_1k = 1-cov(Qs(j) - Qo)/var(Qo); %NSE
f_2k = sqrt(mean((Qs(j) - Qo).^2)); %RMSE
f_3k = abs(mean(Qs(j)- Qo)); %BIAS
%Qo is the observed runoff -> imported from file
%
% write into matrix YA -> for use in further analysis
YA = [f_1k, f_2k, f_3k];
end
end
end
end
As a test case, I ran this code for two input files (each has 26280 rows in column 6). In the end, however, several thousand input files should be processed.
How can I reduce the computing time?
Or is there an error within the for-loop over the different periods? Or is this:
Qs = C{k}(:,6); % define the simulated runoff, as column 6 in each cell array
an inefficient command?
(I use MATLAB R2012a)
Comments: 7
Glazio on 6 Jun 2017
@Walter Roberson: The code is trying to calculate the root mean square error (RMSE), BIAS and NSE for each input file (runoff.txt) and should consider only certain periods for the calculation.
The goal is a matrix YA that contains the combination of performance measures for each runoff file.
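The three measures named in this comment can be sketched as below. This is only an illustration: it assumes Qs and Qo are equally long column vectors already restricted to the periods of interest, and the handles nse, rmse and bias as well as the sample values are made up for the example. It uses the usual Nash-Sutcliffe definition (one minus the sum of squared errors divided by the sum of squared deviations of the observations from their mean), which differs from the cov/var expression in the question's code whenever the mean error is not zero.

```matlab
% Hypothetical sample data: observed and simulated runoff (made-up values)
Qo = [1.0; 2.0; 3.0; 4.0];
Qs = [1.5; 2.5; 2.5; 4.5];

% Anonymous functions, which also work in older releases such as R2012a
nse  = @(s,o) 1 - sum((s - o).^2) / sum((o - mean(o)).^2);  % Nash-Sutcliffe efficiency
rmse = @(s,o) sqrt(mean((s - o).^2));                       % root mean square error
bias = @(s,o) abs(mean(s - o));                             % absolute mean error

% One row of the result matrix, in the order [NSE, RMSE, BIAS]
YA_row = [nse(Qs,Qo), rmse(Qs,Qo), bias(Qs,Qo)];
```

For these sample values the row works out to NSE 0.8, RMSE 0.5 and BIAS 0.25.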
Glazio on 6 Jun 2017
@Stephen Cobeldick, thanks for your help.
What exactly do you mean by:
"It does not seem to be necessary store the data from all files, as you only seem to process the data from the current file."?
In the end, all results should be in YA.
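One way to read the quoted remark is that only the three numbers per file need to survive each loop pass, not the raw data of every file. A minimal sketch of that idea follows, assuming a preallocated YA with one row per file; nFiles and the stand-in data are hypothetical and replace the textscan calls of the real code.

```matlab
nFiles = 3;                  % hypothetical number of runoff files
Qo = (1:5)';                 % made-up observed series
YA = zeros(nFiles,3);        % preallocated: one row [NSE, RMSE, BIAS] per file

for k = 1:nFiles
    Qs  = Qo + (k-1);        % stand-in for the simulated data read from file k
    err = Qs - Qo;           % simulation error for this file
    YA(k,:) = [1 - sum(err.^2)/sum((Qo - mean(Qo)).^2), ...
               sqrt(mean(err.^2)), ...
               abs(mean(err))];
    % Qs is overwritten on the next pass; only YA(k,:) is kept from file k
end
```

With this layout the cell arrays L and C from the question are not needed, and memory use stays constant no matter how many files are processed.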


Accepted Answer

dpb on 6 Jun 2017
Edited: dpb on 8 Jun 2017
S = 'runoff.txt';
O = 'runoff_observed.txt';
D = 'C:\Users\heute\model\results\model_standalone\';
d = dir(fullfile(D,'results*')); % list of directories
fmtS = ['%{dd.MM.yyyy-HH:mm}D' repmat('%*f',1,5) '%f']; % simulated format string
fmtO = ['%{dd.MM.yyyy HH:mm}D' '%f']; % observed format string
L = length(d); % number of subdirs
YA = zeros(L,3); % preallocate
for k = 1:L % iterate over subdirs
    fid = fopen(fullfile(D,d(k).name,S),'rt'); % open simulated
    Z = textscan(fid,fmtS,'headerlines',2,'collectoutput',1); % read simulated
    fclose(fid);
    dtS = Z{1}; % timestamp simulated (datetime)
    Qs = Z{2}; % simulated data
    fid = fopen(fullfile(D,d(k).name,O),'rt'); % open observed
    Z = textscan(fid,fmtO,'headerlines',1,'collectoutput',1); % read observed
    fclose(fid);
    dtO = Z{1}; % timestamp observed (datetime)
    Qo = Z{2}; % observed data
    % define the periods for computing performance measures
    yr1 = 2014; yr2 = 2016; % years to compute over output
    ix = isbetween(dtO,datetime(yr1,05,01),datetime(yr1,10,01)); % first year
    for yr = yr1+1:yr2 % subsequent years
        ix = ix | isbetween(dtO,datetime(yr,05,01),datetime(yr,10,01));
    end
    % f_1k, f_2k, f_3k (NSE, RMSE, BIAS) are not computed in this listing;
    % presumably they come from Qs(ix) and Qo(ix) as in the question
    YA(k,:) = [f_1k, f_2k, f_3k];
end
ADDENDUM
Cleaned up to incorporate the changes from the conversation below, except for opening the reference file; handle that as needed. The code above should then return the L records in the output array.
ERRATUM
NB: Remove the (1) index from the year reference yr in the loop to get the subsequent years after the first. I inadvertently left it in when I copied the line.
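The year loop described in the erratum can be sketched with serial date numbers, which may matter here because datetime and isbetween are, as far as I know, only available from R2014b onward, while the question mentions R2012a. This is a sketch under assumptions: the time axis t is rebuilt from the limits given in the question rather than read from a file, and the simulated and observed series are taken to share that axis.

```matlab
dt = 1/24;                                           % hourly step, in days
t  = (datenum(2013,10,1):dt:datenum(2016,10,1))';    % serial date axis as in the question
yr1 = 2014; yr2 = 2016;                              % years to compute over

ix = false(size(t));                                 % start with nothing selected
for yr = yr1:yr2
    % OR in the window 1 May to 1 Oct of the current year yr (not yr1)
    ix = ix | (t >= datenum(yr,5,1) & t <= datenum(yr,10,1));
end
```

The mask can then be applied once per file, for example as Qs(ix) and Qo(ix), which replaces the three nested loops and the find calls of the original code with a single logical index.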
Comments: 15
Glazio on 8 Jun 2017
@dpb: Thanks for your help and patience, now it seems as if the right results are delivered :-)
dpb on 8 Jun 2017
Well, not unless you made the fix I noted above. Without it the code runs, but it processes the first year three times, as before.
MORAL: ALWAYS debug thoroughly; running without error doesn't guarantee correctness!
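A cheap check of the kind this moral suggests is to count how many distinct years the selection mask actually covers. The sketch below reproduces the "first year three times" mistake on a made-up daily time axis and shows that the count exposes it; the names ixBug, nYearsBug and isOk are hypothetical.

```matlab
t = (datenum(2013,10,1):1:datenum(2016,10,1))';   % made-up daily axis for the example
yr1 = 2014; yr2 = 2016;

% Buggy variant: the window always refers to the first year, yr1
ixBug = false(size(t));
for yr = yr1:yr2
    ixBug = ixBug | (t >= datenum(yr1,5,1) & t <= datenum(yr1,10,1));
end

% Sanity check: the mask should cover every year from yr1 to yr2
v = datevec(t(ixBug));                  % columns: year, month, day, ...
nYearsBug = numel(unique(v(:,1)));      % distinct years actually selected
isOk = (nYearsBug == yr2 - yr1 + 1);    % false here, which flags the bug
```

The buggy loop runs without any error message, yet the mask covers a single year instead of three, which is exactly the case a check like this is meant to catch.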
