speed up for loop

Question

Mina Mino 2023년 4월 7일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1943344-speed-up-for-loop

댓글: 埃博拉酱 2023년 4월 9일

Hi everone. I have a problem with my code which was written in the following form. it take a long time and it is too time consuming.

In this code, I need to read 700 .hgt files with the size of 3600*3600 and make distincts 3*3 matrix from each of them.

I woulde be thankful if anyone could help me. Thanks in advance for your attenuation and help.

clc
clear all
close all
format long g
fid=fopen('list2.txt','r');
if fid==-1
    disp('there is an error')
else
end
S = textscan(fid,'%s');
fclose(fid);
i=1;
fnames= S{i,i};
tic
NA=[];
for fiile=1:700
    varargout = readhgt(fnames{fiile});
    LAT=varargout.lat;LAT(end)=[];
    LON=varargout.lon;LON(end)=[];
    Z=varargout.z;Z(:,end)=[];
    Z(end,:)=[];
    %        Z_vec=reshape(Z,[12967201,1]);
    ROUGH=[];
    for row=1:3:3597
        for col=1:3:3597
            A=[Z(row,col) Z(row,col+1) Z(row,col+2);Z(row+1,col) Z(row+1,col+1) Z(row+1,col+2);Z(row+2,col) Z(row+2,col+1) Z(row+2,col+2)];
            Dif=(double(A)-mean(mean(A))).^2;
            rough=sqrt(mean(mean(Dif)));
            ROUGH=[ROUGH;mean([LAT(row) LAT(row+1) LAT(row+2)]) mean([LON(col) LON(col+1) LON(col+2)]) rough];
        end 
    end 
    NA=[NA;ROUGH];
end 
toc

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

actually the following part takes the most of time. in this part I am exploiting dependent 3*3 matrixs from a file with 3600*3600 elements.

for row=1:3:3597
    for col=1:3:3597
        A=[Z(row,col) Z(row,col+1) Z(row,col+2);Z(row+1,col) Z(row+1,col+1) Z(row+1,col+2);Z(row+2,col) Z(row+2,col+1) Z(row+2,col+2)];
        Dif=(double(A)-mean(mean(A))).^2;
        rough=sqrt(mean(mean(Dif)));
        ROUGH=[ROUGH;mean([LAT(row) LAT(row+1) LAT(row+2)]) mean([LON(col) LON(col+1) LON(col+2)]) rough];
    end 
end 

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

埃博拉酱 2023년 4월 7일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1943344-speed-up-for-loop#answer_1211609

편집: 埃博拉酱 2023년 4월 7일

MATLAB Online에서 열기

When it comes to file I/O operations, parfor is usually not very helpful because the disk is always single-threaded. In contrast, eliminating for loops with vectorized methods can effectively improve performance.

fid=fopen('list2.txt');
if fid==-1
	error('there is an error');
end
S = textscan(fid,'%s');
fclose(fid);
fnames= S{1};
tic
NA=cell(700,1);
for file=1:700
	varargout = readhgt(fnames{file});
	Z=varargout.z(1:3600,1:3600);
	Z=std(reshape(Z',3,size(Z,1)/3,3,size(Z,2)/3),1,[1,3]);
	[LAT,LON]=meshgrid(mean(reshape(varargout.lat(1:3600),3,[]),1),mean(reshape(varargout.lon(1:3600),3,[]),1));
	NA{file}=[LAT(:),LON(:),Z(:)];
end
NA=vertcat(NA{:});
toc

Above I assume your files all contain 3601×3601 Z data. You can do more corrections and optimizations with your real data.

댓글 수: 6
이전 댓글 4개 표시이전 댓글 4개 숨기기

Mina Mino 2023년 4월 7일

편집: Walter Roberson 2023년 4월 8일

MATLAB Online에서 열기

thank for your nice help. it really work for my code. I have the same problem with the following for. it is too time consuming. could you please help me with this as well?thank in advance for your consideration.

%%
% begining the code%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FT=[];step=18/110;
parfor  d=1:12*10^(6) 
    isamap_36=find(DOY==day_of_year(d));
    isamap_9=find(DOY_R==day_of_year(d));
    
    Lon_d=Lon(isamap_36);
    Lat_d=Lat(isamap_36);
    VWC_SMAP_d=VWC(isamap_36);
    roughness_d=roughness(isamap_36);
    TM_d=TM(isamap_36);
    SM_d=SM_36(isamap_36);
    
    LAT_R_d=LAT_R(isamap_9);
    LON_R_d= LON_R(isamap_9);
    VWC_R_9_d=VWC_R(isamap_9);  
    SM_R_9_d=SM_R(isamap_9);
    ROUGH_R_9_d=ROUGH_R(isamap_9);
    
    if (~isempty(isamap_36) && ~isempty(isamap_9))
        
        lon_smap_9_index= find(LON_R_d <=lon_cy(d)+step & LON_R_d>=lon_cy(d)-step);
        lat_smap_9_index= find(LAT_R_d <=lat_cy(d)+step & LAT_R_d>=lat_cy(d)-step);
        intsec_smap_9=intersect(lon_smap_9_index,lat_smap_9_index); 
        VWC_SMAP_g_9=VWC_R_9_d(intsec_smap_9);roughness_g_9=ROUGH_R_9_d(intsec_smap_9);SM_g_9=SM_R_9_d(intsec_smap_9);
        
        lon_smap_36_index= find(Lat_d <=lat_cy (d)+step & Lat_d>=lat_cy(d)-step);
        lat_smap_36_index= find(Lon_d <=lon_cy(d)+step & Lon_d>=lon_cy(d)-step);
        intsec_smap_36=intersect(lon_smap_36_index,lat_smap_36_index); 
        
        VWC_SMAP_g_36=VWC_SMAP_d(intsec_smap_36);roughness_g_36=roughness_d(intsec_smap_36);TM_d_36=TM_d(intsec_smap_36) ;SM_g_36=SM_d(intsec_smap_36);
        
        point_1=[lat_cy(d) lon_cy(d)];
        ptCloud_1=[latitu longi];
        idx_land  = knnsearch(ptCloud_1,point_1);
        
        ptCloud_2=[clay(:,1) clay(:,2)];
        idx_clay  = knnsearch(ptCloud_2,point_1);
        
        
        if (~isempty(intsec_smap_36)&& ~isempty(intsec_smap_9)) 
            
            FT=[day_of_year(d) lon_cy(d) lat_cy(d) inc_cy(d) ref_cy(d) mean(VWC_SMAP_g_9) mean(roughness_g_9) mean(SM_g_9) mean(VWC_SMAP_g_36) mean(roughness_g_36) mean(TM_d_36) mean(SM_g_36) clay(idx_clay,3) land_cover(idx_land)];
            S=FT;
            parsave(sprintf('output%d.mat', d),S );
        end
    end
end

Walter Roberson 2023년 4월 8일

When it comes to file I/O operations, parfor is usually not very helpful because the disk is always single-threaded

Disk controllers can have multiple DMA lanes. The operating system might be single threaded in disk I/O at some level, but it can launch a DMA and then return -- some multiple DMA can be happening simultaneously. The controllers have room for several commands and buffers, and for spinning drives will actively re-order commands to improve performance. Meanwhile, each controller can be handling multiple drives, each of which can support queuing of multiple commands. Each drive can be DMA'ing with its respective controller. The connection to the individual drives does not necessarily support multiple DMA lanes.

So one connection at a time between a drive and a controller, but the controller is dealing with multiple drives simultaneously, and is quite possibly handling multiple communications simultaneously -- DMA'ing into multiple buffers getting ready to flip them to the O/S for example.

Does having multiple drives per controller channel help? Maybe. In a RAID kind of scenario, especially if you have mirrored drives then multiple drives on the channel can lead to opportunities to dispatch an I/O operation to whichever drive will be ready for it first. But on one channel, you might not be able to have multiple drives transferring data in the same direction.

This leads to the general principle that it is typically safe and productive to have one drive per controller channel, and multiple controllers, that the infrastructure can be doing simultaneous I/O in such circumstances.

For any particular drive, the general principle is that having 2 to 3 requests queued at the same time is often optimal, allowing the controller to reorder the requests according to what is spinning by.

On high performance systems, it might be practical to have one I/O stream per drive head .

Beyond that.. for any one file, unless you go to a lot of trouble, you tend to get conflict trying to do multiple I/O with the same file, as the same head might be needed (because it is on the same track). But different files tend to be in different tracks.

... which is to say that having a modest number of different files per drive being accessed by parallel workers can improve performance, and as you spread the load out over different drives you can definitely improve performance. The single threading is not as bad of a restrictions as you might think.

(Single threading can be necessary in order to provide "atomic" operations, such as the fact that operating systems must guarantee that file renames must at all user-accessible times either give the old name or the new name, that there is no user-accessible case in which both names or neither name is accessible.)

埃博拉酱 2023년 4월 9일

@Walter Roberson For multiple files uniformly distributed across multiple drives, parallel I/O can theoretically improve performance, but this may require elaborated load balancing algorithms for these drives, which is very tricky. Judging from the code shown by OP, this is probably far beyond his programming ability.

댓글을 달려면 로그인하십시오.

Answer 2

chicken vector 2023년 4월 7일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1943344-speed-up-for-loop#answer_1211529

MATLAB Online에서 열기

Try parallel computing:

% Get CPU's numebr of cores:
[~, numberOfCores] = evalc('feature(''numcores'')');
% Start parallel pool:
parpool(numberOfCores)
% Parallelise <for> loop:
parfor file = 1 : 700
    
   % dostuff 
end

Just be carefull and read the documentation about how to save restults from parfor iteratively, as you need to pre-allocate the output variable and you will have some other constraints.

I suggest you to read the documentation and see some examples.

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

chicken vector 2023년 4월 7일

편집: chicken vector 2023년 4월 7일

MATLAB Online에서 열기

The first advice that I can give you is, as Matlab suggests you, initialise your variable ROUGH before the for loop.

This is particularly important when dealing with big dataset because at each loop iteration the size of ROUGH is changed and the RAM of your computer needs to re-allocate the data inside it to have the bits representing ROUGH close to each other.

Now imagine how time expensive this could be when you do it (3597/3)^2 times, when you could save this time by "reserving" your RAM spots before the loop by substituting:

ROUGH = [];

With:

ROUGH = zeros(3597, 3);

The second advice is to use vectorisation as much as possible because it is much faster than using for loops.

Vectorisation can be seen as a sort of parallel computation rather than in sequence.

Consider as an example that I want to obtain the element-wise difference between two matrices. The following 3 methods obtain the same result, but performances are quite different:

N = 500;
x = rand(N);
y = rand(N);
% For loop without initial allocation:
tic
z = [];
for i = 1 : N
    for j = 1 : N
        z(i,j) = x(i,j) - y(i,j);
    end
end
T1 = toc
% For loop with initial allocation:
tic
z = zeros(N);
for i = 1 : N
    for j = 1 : N
        z(i,j) = x(i,j) - y(i,j);
    end
end
T2 = toc
% Vectorisation:
tic
z = x - y;
T3 = toc

On my computer I obtain:

T1 = 0.0147
T2 = 0.0064
T3 = 0.0005

To exploit vectorisation in your code, try to have a look at reshape.

First thing that comes to my mind is to re-organise your Z, LAT, LON variables into matrices of size [3, (3597^2)/3] so that you can already vectorise along one dimension.

Nontheless this might be a bit complex so I suggest you to begin with the initialisation of ROUGH.

In my code saves half the computational time with a 500x500 matrix which only has 2 % of the elements you have in a 3597x3597 matrix, since it scales with the square of the dimension.

TLDR:

Try this:

ROUGH = zeros(3597,3);
select = (0:2);
current_rough_row = 0;
for row = 1 : 3 : N
    for col = 1 : 3 : N
        current_rough_row = current_rough_row + 1;
        A = Z(row + select, col + select);
        Dif = (A - mean(A,'all')).^2;
        rough = sqrt(mean(Dif,'all'));
        ROUGH(current_rough_row,:) = [mean(LAT(row + select)) mean(LON(col + select)) rough];
    end
end

And do the same with NA:

NA = zeros(3597*700, 3); % Instead of: NA = [];
NA[700*(fiile-1)+1 : 700*fiile] = ROUGH; % Instead of NA = [NA;ROUGH];

Mina Mino 2023년 4월 7일

thanks for you response and help.

댓글을 달려면 로그인하십시오.

speed up for loop

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 6
이전 댓글 4개 표시이전 댓글 4개 숨기기

추가 답변 (1개)

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

speed up for loop

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 6 이전 댓글 4개 표시이전 댓글 4개 숨기기

추가 답변 (1개)

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 6
이전 댓글 4개 표시이전 댓글 4개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기