I know this is probably a novice question, but I am quite a MATLAB novice. The while loop in my script becomes ridiculously slow as the table "nonapattern" grows. Is there a way to speed it up? Thank you.

Question

Mark Bodner, 29 July 2018


Edited: jonas, 4 August 2018

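counter=1;
searchsize=254;
patternsize=92378;
j=1;
i=1;
newlist = zeros(100,2);            % only 100 rows preallocated; newlist grows beyond that
while counter<patternsize          % '<' stops at patternsize-1
    while i<searchsize             % '<' stops at searchsize-1
        i                          % no semicolon: prints i on every iteration
        if isequal(pinellas{i,3},nonapattern{counter,1})
            newlist(j,1)=pinellas{i,1};   % braces extract the cell contents
            newlist(j,2)=pinellas{i,2};
            j=j+1;
        end
        i=i+1;
    end
    counter=counter+1;
    i=1;
end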

Pattern trajectory is the script that matches the patterns in "Data" against the list in "nonapattern". When "nonapattern" becomes large (e.g. a table of around 90,000 x 2 elements), the script takes days to run. Thanks so much for any suggestions or help to make it run faster.


Accepted Answer

jonas, 29 July 2018


Edited: jonas, 31 July 2018


It looks like the size of the matrix increases with each new entry. Read about preallocation, and about preallocating matrices of unknown size.
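For illustration, a minimal sketch of the difference on made-up numbers (not the poster's data): the first loop grows its output one element at a time, while the second preallocates an upper bound and trims the unused rows at the end, which is one common way to handle a result of unknown size.

```matlab
% Hypothetical example: collect the even numbers from 1..n into a column.
n = 1e4;
vals = 1:n;

% Growing the array inside the loop: MATLAB has to reallocate and copy
% the array each time it gets bigger.
grown = [];
for k = 1:n
    if mod(vals(k), 2) == 0
        grown(end+1, 1) = vals(k); %#ok<AGROW>
    end
end

% Final size unknown in advance: preallocate an upper bound (n rows),
% fill it with a running counter, then trim the unused rows once.
prealloc = zeros(n, 1);
count = 0;
for k = 1:n
    if mod(vals(k), 2) == 0
        count = count + 1;
        prealloc(count) = vals(k);
    end
end
prealloc = prealloc(1:count);
```

Both loops produce the same column vector; only the second avoids repeated reallocation.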

Apart from that, the original script loops through one cell array, nonapattern, and finds the matching strings, including duplicates, in a second cell array, data. Some numeric data is then extracted from the matched rows of data. Faster code is given below:

Load data

[~,nonapattern]=xlsread('nonapattern.xlsx');
[numdata,data]=xlsread('Data.xlsx');

Find the strings that appear in both cell arrays

[C,ia,ib] = intersect(nonapattern,data)

C =

3×1 cell array
    {'SO5 SO6 SOA SOB SOC SOD SOE SOG SOO'}
    {'SO5 SO6 SOA SOB SOD SOE SOG SOH SOO'}
    {'SO5 SO6 SOD SOE SOF SOG SOK SOM SON'}
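A toy-sized sketch of what intersect returns, using hypothetical strings in place of the real spreadsheets: each common string appears only once in C, even though 'A B C' occurs twice in data, which is why a separate step is needed to find the duplicates.

```matlab
% Hypothetical toy data standing in for nonapattern and data
nonapattern = {'A B C'; 'A B D'; 'X Y Z'};
data        = {'A B D'; 'A B C'; 'Q R S'; 'A B C'};

% C holds each string found in both arrays, once, in sorted order;
% ia and ib index one occurrence of each string in nonapattern and data.
[C, ia, ib] = intersect(nonapattern, data);
disp(C)
```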

Next, find the duplicates, i.e. every row of data that contains each common string

index=cellfun(@(x)find(ismember(data,x)==1),C,'uniformoutput',false)

index =

3×1 cell array
    {5×1 double}
    {4×1 double}
    {2×1 double}
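The same step on hypothetical toy data: for a single-column data, find(ismember(data, x)) returns every matching row, duplicates included. The ==1 comparison is dropped in this sketch because ismember already returns a logical array, so the result is unchanged.

```matlab
% Hypothetical toy data: data contains 'A B C' twice
data = {'A B D'; 'A B C'; 'Q R S'; 'A B C'};
C    = {'A B C'; 'A B D'};   % common strings, e.g. from intersect

% For each common string, list every row of data where it occurs.
% ismember(data, x) returns a logical array the same size as data.
index = cellfun(@(x) find(ismember(data, x)), C, 'UniformOutput', false);
```

The output index has the same size as C (here 2×1), one cell per common string.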

Grab the corresponding numerical data from columns 1 and 2 of numdata

out=cellfun(@(x)numdata(x,1:2),index,'uniformoutput',false)

out =

3×1 cell array
    {5×2 double}
    {4×2 double}
    {2×2 double}
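A different technique from the cellfun approach above, shown as a sketch on hypothetical toy data: one ismember call over all of data returns, for every row, whether it matches and which pattern it matches, so no per-string loop is needed. It assumes, as this answer does, that row k of numdata belongs to row k of data.

```matlab
% Hypothetical toy data: text strings plus two numeric columns per row
data        = {'A B D'; 'A B C'; 'Q R S'; 'A B C'};
numdata     = [10 11; 20 21; 30 31; 40 41];
nonapattern = {'A B C'; 'A B D'; 'X Y Z'};

% One ismember call over all of data: tf marks rows of data found in
% nonapattern, loc gives the index of the matching pattern (0 if none).
[tf, loc] = ismember(data, nonapattern);

% Keep matching rows only and pull out numeric columns 1 and 2.
matchRows = find(tf);
result = [loc(tf), numdata(matchRows, 1:2)];   % [pattern index, col 1, col 2]
```

The result is one flat matrix rather than one cell per pattern; its first column records which pattern each row matched, so it can be grouped afterwards if needed.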

Comments: 11

Mark Bodner, 4 August 2018

Solved the problem, and the code works great for the data files you wrote it for. I'm having a problem, though, with the indexing part of the code when I apply it to other Excel files with different dimensions. The first three lines work great, but the last two formatting lines have trouble when I adapt them to these files ("state.xlsx", 41351 x 26, and "Pinellas.xlsx", 757107 x 17). The code finds the intersections between state and Pinellas and puts them in "D" just fine. But when I try to find the repeats, "indexP" seems fine except for the first cell, which winds up as a 6315992 x 1 double. The last line of code (where I try to format everything, grabbing columns 8 and 18) just fails with "Index in position 1 exceeds array bounds (must not exceed 41351)", 41351 being the number of rows in "state". I guess I just don't understand how these last two formatting lines work. The code as I currently adapted it is

[numdataP,patternsizeP]=xlsread('state.xlsx');

[~,dataP]=xlsread('pinellas.xlsx');

[D,ic,id] = intersect(patternsizeP,dataP)

indexP=cellfun(@(x)find(ismember(dataP,x)==1),D,'uniformoutput',false)

outP=cellfun(@(x)numdataP(x,8:18),indexP,'uniformoutput',false)

Could you shed any light on how this formatting works, so that I can generalize the script? Thanks so much once again.

jonas, 4 August 2018

Edited: jonas, 4 August 2018


I am a bit confused because I don't understand the structure of your new data. Now you are working with 2D cell arrays, which is fine, but what are the dimensions of numdata? Feel free to upload the new data if you want me to take a look.

Anyway, let's break the code down line by line, using my original notation.

[C,ia,ib] = intersect(nonapattern,data)

You said this works fine, but I suspect there is a problem with the input here. I would take a look at the content of C{1} to make sure it looks OK. The next line of code:

index=cellfun(@(x)find(ismember(data,x)==1),C,'uniformoutput',false)

goes over each unique string in C, one by one, and finds its matches in data. The function ismember outputs a logical array with the same size as data, containing ones where there is a match and zeros otherwise. The find function then takes this array and outputs the linear indices of the matches, i.e. of the ones. It seems C{1} matches 6315992 times, which is not necessarily wrong, but makes me believe something sketchy is going on with the content of that cell.
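To make this concrete, a small sketch with a hypothetical 3×2 cell array (not the poster's data): ismember keeps the 2D shape, and find counts the matches down the columns, so an index can be larger than the number of rows.

```matlab
% Hypothetical 3x2 cell array of strings (a 2D version of data)
data2D = {'A' 'B'; 'C' 'A'; 'A' 'D'};

% ismember returns a logical array with the same size as data2D
mask = ismember(data2D, 'A');   % [1 0; 0 1; 1 0]

% find on that array returns linear indices, counted down the columns:
% (1,1) -> 1, (3,1) -> 3, (2,2) -> 5
idx = find(mask);
```

Here idx contains 5 although data2D has only 3 rows, so using idx as a row number would fail.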

out=cellfun(@(x)numdata(x,1:2),index,'uniformoutput',false);

The problem is in this line of code, which only works if both C and data are single column cell arrays. The reason is that the previous line of code outputs linear indices, as opposed to subscripts.

What are linear indices? They describe the position of an element in a 2D array as if every column were stacked on top of the next to form one long 1D array. For example, in a 3×2 matrix the element in row 1, column 2 has linear index 4.
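The same numbering in code, on a made-up 3×2 matrix: sub2ind converts a row/column pair to a linear index, and ind2sub converts linear indices back to subscripts.

```matlab
% A 3x2 matrix whose entries equal their own linear index
A = reshape(1:6, 3, 2);      % [1 4; 2 5; 3 6]

% Row 1, column 2 sits at linear index 4 (the 3 rows of column 1 come first)
k = sub2ind(size(A), 1, 2);

% ind2sub converts linear indices back into row/column subscripts
[r, c] = ind2sub(size(A), [1 4 6]);
```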

The next line

out=cellfun(@(x)numdata(x,1:2),index,'uniformoutput',false);

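breaks down because it uses linear indices as if they were row numbers. When data has more than one column, linear indices can be larger than the number of rows, which can produce an "Index in position 1 exceeds array bounds" error.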

This can easily be fixed. Called with two outputs, the find function returns row and column subscripts instead of linear indices:

[row,col]=find(ismember(data,x))
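A minimal sketch of that fix on hypothetical data, assuming row k of numdata corresponds to row k of the text array (with xlsread this alignment is worth checking, because the numeric output can drop leading text-only rows such as headers). The row subscripts from find can then index numdata directly; unique is used here because the same row may match in more than one column, which is a design choice rather than something the original code specifies.

```matlab
% Hypothetical data: a 3x2 text array and numeric data with matching rows
data2D  = {'A' 'A'; 'C' 'B'; 'D' 'A'};
numdata = [10 11; 20 21; 30 31];

% Two outputs: row and column subscripts instead of linear indices
[row, col] = find(ismember(data2D, 'A'));

% Row 1 matches in both columns; unique keeps each matching row once
matchRows = unique(row);
out = numdata(matchRows, 1:2);
```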

However, I don't understand the structure of your new numdata so I cannot write the new code for you.
