Fastest way to match elements in two vectors and return indices?

Question

Daniel 2020년 1월 29일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/502663-fastest-way-to-match-elements-in-two-vectors-and-return-indices

댓글: Daniel 2020년 1월 30일

I have the code below to assemble data taken from several files. The outer loop isn't really important. Once in it, it first uses two files to construct a grid of x,y coordinates. Then, in each of the xyf files, there are four columns: x coordinates, y coordinates, u data, and v data. The x,y pairs are predetermined and match those constructed in the grid. However, there isn't always data in the xyf files for every x,y pair, so a simple reshape won't work. For example:

x1 y1 u11 v11

x1 y3 u13 v13

Here, there's no u or v data reported at x1,y2, so it just skips to the next one. The goal is to assemble the u and v data each into an array whose first two dimensions match the dimensions of the grid and whose third dimension is the number of xyf files. I've done this below by finding the index of each x and y coordinate on the grid I construct and then putting each u and v in the right place using those indices. It's slow. I only do this once for each data set, so changing the format of the input data is probably not worth it. It seems there should at least be a way to remove the finds from the loop, but I haven't found a way that preserves the order of the matches. I've attached sample files to use. Can this go faster?

cal = 0.0136; % cm/pix
dt = 0.005; % seconds
vcal = dt/cal; % seconds*pix/cm
folders = {'folder'};
names = {'name'};
for k = 1:length(folders)
    path = ['mypath' folders{k}];
    cd (path);
    pivgridI = importdata('grdUsedI_001.dat',' ',2); 
    pivgridJ = importdata('grdUsedJ_001.dat',' ',2);
    xgrid = pivgridJ.data(1,:); xgrid = [xgrid(2:end),xgrid(end)+32]; % ad hoc fix
    ygrid = pivgridI.data(:,1); 
    n = length(xgrid); m = length(ygrid); % expected size of space array
    [Xq,Yq] = meshgrid(xgrid,ygrid);
    files = dir([path '\xyf*.dat']); % get file names
    L = length(files);
    ui = NaN(m,n,L); vi = ui;
    for i = 1:L
        fid = fopen([path '\' files(i).name]);
        Data = textscan(fid,'%f %f %f %f %f');
        fclose(fid);
        Data = cell2mat(Data);
        x = Data(:,2);
        y = Data(:,3);
        u = Data(:,4);
        v = Data(:,5); 
        for j = 1:length(x)
            [~,xind] = find(Xq == x(j),1);
            [yind,~] = find(Yq == y(j),1);
            ui(yind,xind,i) = u(j);
            vi(yind,xind,i) = v(j);
        end
    end
    save(names{k},'ui','vi')
end

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

per isakson 2020년 1월 29일

편집: per isakson 2020년 1월 29일

Is it possible for you to make a Minimal working example ?

Daniel 2020년 1월 29일

You're right! Sorry about that! The page reloaded and wouldn't come up again because of a "planned outage". I couldn't find the post, so I thought it hadn't worked and posted again.

I believe you have a minimal working example with the files I've attached. If you can make it faster for the first xyf file, then the whole thing will be faster because it just repeats, right?

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

the cyclist 2020년 1월 29일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/502663-fastest-way-to-match-elements-in-two-vectors-and-return-indices#answer_412722

I have to admit that I have not dug into your code, but it sounds like the ismember function might be useful.

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

Daniel 2020년 1월 29일

It looks to me like ismember does not keep every match of the same values, only the first. I need them all.

댓글을 달려면 로그인하십시오.

Answer 2

Guillaume 2020년 1월 29일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/502663-fastest-way-to-match-elements-in-two-vectors-and-return-indices#answer_412737

편집: Guillaume 2020년 1월 29일

MATLAB Online에서 열기

I haven't tried to fully understand your code. A few comments:

Don't cd into directories. Use fullfile to build paths, so instead of

cd (path);
pivgridI = importdata('grdUsedI_001.dat',' ',2); 

use

pivgridI = importdata(fullfile(path, 'grdUsedI_001.dat'),' ',2);

It's safer and faster.

Similarly use fullfile to build paths instead of using string concatenation
I'd recommend more targeted import functions than importdata. In particular, readmatrix or readtable are better overall.

Anyway, to answer your question something like this would be much simpler:

pivGridI = readmatrix('grdUsedI_001.txt', 'NumHeaderLines', 2);
pivGridJ = readmatrix('grdUsedJ_001.txt', 'NumHeaderLines', 2);
%haven't fully understood what you're doing afterward
[Yq, Xq] = ndgrid(pivGridI(:, 1), [pivGridJ(1, 2:end), pivGridJ(1, end)+32]);
ui = nan(size(Xq));
vi = nan(size(Xq));
xyf = readtable('xyf_001_000099.txt')
xyf.Properties.VariableNames = {'row', 'x', 'y', 'u', 'v'}
[found, where] = ismember(xyf{:, {'x', 'y'}}, [Xq(:), Yq(:)], 'rows');
assert(all(found), 'Some coordinates in xyf file are not part of the grid');
ui(where) = xyf.u;
vi(where) = xyf.v;

Note that some locations are duplicated in your xyf file. It's unclear why and it looks like the u and v are the same for the duplicates so it probably doesn't matter. With the above, the last value for each location is the one that will end up in the array.

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

Guillaume 2020년 1월 30일

MATLAB Online에서 열기

Those repeated values are not duplicates and do matter.

Then you need to explain what should be done with them, since you can only put one value in the corresponding element of u of an v.

" Both of those (I'm pretty sure) only provide the first or last, but not all elements that meet the criteria"

No, the code I wrote do give you the destination (in where) for each xy pair of xyf including duplicated ones. I then used standard indexing to store the corresponding values. When given the same destination for several elements, standard indexing stores the last of these elements in the destination. The code can be easily change to do something else (store the mean, sum, first?). For example, if you wanted to store the sum of the u and v that are at the same x and y, you'd replace the last lines with

[row, col] = sub2ind(size(Xq), where);
ui = accumarray([row, col], xyf.u, size(Xq), @sum, NaN);  %and there' no longer a need to preallocate ui and vi
vi = accumarray([row, col], xyf.v, size(Xq), @sum, NaN);

Daniel 2020년 1월 30일

Let me try to clarify. Say the xyf file looked like this:

x1 y1 u11 v11

x2 y1 u21 v21

x2 y2 u22 v22

Note that there is no entry for x1,y2. I also know ahead of time all the possible values of x and y. For example, they could be (32,32), (32,64), (64,32), and (64,64) here. In this case, there's no data at (32,64). I want to read this data in and create the following two arrays:

u = [u11, NaN; u21, u22] and v = [v11, NaN; v21, v22]

Note that there is a NaN for the element corresponding to x1,y2. Your coding is more advanced than mine is, so is this what yours will do?

댓글을 달려면 로그인하십시오.

Fastest way to match elements in two vectors and return indices?

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

답변 (2개)

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

Fastest way to match elements in two vectors and return indices?

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

답변 (2개)

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기