Mean distance function upgrade question

Question

0 votes

Dear Team,

The code below calculates the mean distance from each point in group1 to its nearest point in group2. For a few thousand points (x,y,z) it works fine, but with group1 = 70000 points and group2 = 80000 points it is far too slow. What should I add or change in the code to make it faster?

data = table2array(readtable("test.xlsx"));
group1 = length(data(~isnan(data(:,1))));
group2 = length(data(~isnan(data(:,5))));
tic
for i=1:group1
    display(i);
    minval = inf;        
    for j=1:group2
        point(i,j) = sqrt((data(j,5)-data(i,1))^2+(data(j,6)-data(i,2))^2+(data(j,7)-data(i,3))^2);
        if point(i,j)<minval
            minval = point(i,j);
        end
    end
    values(i) = minval;
end
avg = mean(values);
toc

Thanks in advance


Answer 1

Chm on 31 Oct 2022

0 votes

Thanks a lot Team!

you are amazing!!


Answer 2

Torsten on 31 Oct 2022

Edited: Torsten on 31 Oct 2022

1 vote

I don't know whether you have enough RAM for this. Note that the distance matrix pdist2(group1,group2) will be 70000 x 80000 in your case, which is about 45 GB in double precision.

group1 = [1 3 -5; 2 -1 4; 3 4 90];
group2 = [0 4 7; 3 3 -56];
m = mean(min(pdist2(group1,group2).'))
m = 33.7672
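
If the full matrix does not fit into memory, the same result can be computed block by block. The following is a minimal sketch and not part of the original answer: blockSize is an assumed tuning parameter, and only a blockSize x size(group2,1) slice of the distance matrix exists at any time (with blockSize = 1000 and 80000 points in group2, that is about 640 MB).

```matlab
% Block-wise nearest-neighbour distances with pdist2
% (pdist2 requires the Statistics and Machine Learning Toolbox).
group1 = [1 3 -5; 2 -1 4; 3 4 90];
group2 = [0 4 7; 3 3 -56];

blockSize = 2;                 % assumed value; tune to the available RAM
n1 = size(group1, 1);
values = zeros(n1, 1);         % nearest distance for each point of group1

for first = 1:blockSize:n1
    last = min(first + blockSize - 1, n1);
    % Distances from this block of group1 to all points of group2
    D = pdist2(group1(first:last, :), group2);
    values(first:last) = min(D, [], 2);   % row minima of the block
end

m = mean(values);
```

For the small example above this gives the same m as the one-line version.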

Comments: 1

Chm on 31 Oct 2022

Edited: Chm on 31 Oct 2022

Thanks a lot, Torsten, for your prompt reply. I will check it and let you know. I have 32 GB of RAM.

Answer 3

Jan on 31 Oct 2022

Edited: Jan on 1 Nov 2022

0 votes

data = table2array(readtable("test.xlsx"));
% group1 = length(data(~isnan(data(:,1))));  Faster:
group1 = nnz(~isnan(data(:,1)));
group2 = nnz(~isnan(data(:,5)));
tic
values = zeros(group1, 1);  % Pre-allocate
for i = 1:group1
    % Wastes time: display(i);
    % Do you really need the huge point(i,j) array? If not, collect the data
    % in a scalar:
    minval = inf;
    for j = 1:group2
        % Avoid the expensive SQRT while searching for the minimum:
        point = (data(j,5)-data(i,1))^2 + ...
                (data(j,6)-data(i,2))^2 + ...
                (data(j,7)-data(i,3))^2;
        if point < minval
            minval = point;
        end
    end
    values(i) = sqrt(minval);  % One SQRT is enough
end
avg = mean(values);
toc

Vectorizing the inner loop is most likely faster. Inside the loop over i, replace the loop over j by:

    point = (data(1:group2,5) - data(i,1)).^2 + ...
            (data(1:group2,6) - data(i,2)).^2 + ...
            (data(1:group2,7) - data(i,3)).^2;
    values(i) = sqrt(min(point));  % One SQRT is enough

Now avoid creating the submatrices repeatedly:

values = zeros(group1, 1);  % Pre-allocate!
A = data(1:group2, 5:7);    % Points of group 2
B = data(1:group1, 1:3);    % Points of group 1
for i = 1:group1
    point     = sum((A - B(i, :)).^2, 2);
    values(i) = sqrt(min(point));  % One SQRT is enough
end
avg = mean(values);

Compare this with the nice and clean pdist2 method suggested by Torsten.
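
Before switching methods on the full data set, it may be worth confirming on a small sample that the vectorized loop reproduces the original double loop. A sketch, assuming random test points in place of test.xlsx and hypothetical sizes n1 and n2:

```matlab
n1 = 50;  n2 = 60;            % assumed small sizes for a quick check
B = rand(n1, 3);              % stands in for data(1:group1, 1:3)
A = rand(n2, 3);              % stands in for data(1:group2, 5:7)

% Reference: the double loop from the question
ref = zeros(n1, 1);
for i = 1:n1
    minval = inf;
    for j = 1:n2
        d = sqrt(sum((A(j, :) - B(i, :)).^2));
        minval = min(minval, d);
    end
    ref(i) = minval;
end

% Vectorized inner loop (implicit expansion needs R2016b or later)
values = zeros(n1, 1);
for i = 1:n1
    values(i) = sqrt(min(sum((A - B(i, :)).^2, 2)));
end

maxDiff = max(abs(values - ref));
fprintf('Largest deviation between the two methods: %g\n', maxDiff);
```

The deviation should be zero or at the level of rounding error.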

Comments: 3

Jan on 31 Oct 2022

Edited: Jan on 31 Oct 2022

n = 3e4;
data = rand(n, 7);
tic
values = zeros(n, 1);  % Pre-allocate!
for i = 1:n
    minval = inf;        
    for j = 1:n
        point = (data(j,5)-data(i,1))^2 + ...
                (data(j,6)-data(i,2))^2 + ...
                (data(j,7)-data(i,3))^2;
        if point < minval
            minval = point;
        end
    end
    values(i) = sqrt(minval);  % One SQRT is enough
end
avg = mean(values);
toc
Elapsed time is 5.615727 seconds.
tic
values = zeros(n, 1);  % Pre-allocate!
for i = 1:n
    point = (data(:,5) - data(i,1)).^2 + ...
            (data(:,6) - data(i,2)).^2 + ...
            (data(:,7) - data(i,3)).^2;
    values(i) = sqrt(min(point));  % One SQRT is enough    
end
avg = mean(values);
toc
Elapsed time is 4.345840 seconds.
tic
values = zeros(n, 1);  % Pre-allocate!
A = data(:, 5:7);
B = data(:, 1:3);
for i = 1:n
    point     = sum((A - B(i, :)).^2, 2);
    values(i) = sqrt(min(point));  % One SQRT is enough    
end
avg = mean(values);
toc
Elapsed time is 2.549547 seconds.
tic
m = mean(min(pdist2(data(:, 5:7), data(:, 1:3))));
toc
Elapsed time is 3.068626 seconds.

Please check the timings on your own machine. I currently see strange effects in the forum.
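
A single tic/toc measurement can be noisy. One simple way to check the timings locally is to repeat the run a few times and look at the median. This is only a sketch: nRuns and the reduced n are assumed values chosen so that the check finishes quickly.

```matlab
n = 2000;                      % assumed size, smaller than in the runs above
data = rand(n, 7);
A = data(:, 5:7);
B = data(:, 1:3);

nRuns = 5;                     % assumed number of repetitions
t = zeros(nRuns, 1);
for r = 1:nRuns
    tStart = tic;
    values = zeros(n, 1);
    for i = 1:n
        values(i) = sqrt(min(sum((A - B(i, :)).^2, 2)));
    end
    avg = mean(values);
    t(r) = toc(tStart);        % elapsed time of this repetition
end

fprintf('Median of %d runs: %.4f s\n', nRuns, median(t));
```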

Torsten on 31 Oct 2022

@Jan

"Compare this with the nice and clean pdist2 method suggested by Torsten."

Too memory-intensive if the goal is only the row minima.

I think your second suggestion is a good compromise.
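
Another way to get only the row minima without forming the distance matrix is a nearest-neighbour search. This is a different technique from the ones discussed in this thread and is shown as a sketch only: it assumes the Statistics and Machine Learning Toolbox (which pdist2 also requires) and uses its knnsearch function. It has not been timed here.

```matlab
group1 = [1 3 -5; 2 -1 4; 3 4 90];
group2 = [0 4 7; 3 3 -56];

% For every row of group1, find the nearest row of group2.
% The second output holds the corresponding Euclidean distances.
[~, d] = knnsearch(group2, group1);   % d is size(group1,1)-by-1

m = mean(d);
```

For the small example this gives the same mean as the pdist2 version.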

Jan on 1 Nov 2022

Locally in my R2018b installation this is the fastest:

S  = 0;
a5 = data(:, 5);
a6 = data(:, 6);
a7 = data(:, 7);
for i = 1:n   % Faster with PARFOR!
    p = (a5 - data(i, 1)).^2 + ...
        (a6 - data(i, 2)).^2 + ...
        (a7 - data(i, 3)).^2;
    S = S + sqrt(min(p));    
end
avg = S / n;
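
The PARFOR variant mentioned in the comment could look roughly as follows. This is a sketch under assumptions: n and the random data are stand-ins, and the query points are copied into a separate matrix b so that parfor can slice it by row. S acts as a reduction variable. Without the Parallel Computing Toolbox, parfor simply runs like an ordinary for loop.

```matlab
n = 2000;                % assumed size, small enough for a quick test
data = rand(n, 7);       % random stand-in for the spreadsheet data
a5 = data(:, 5);
a6 = data(:, 6);
a7 = data(:, 7);
b  = data(:, 1:3);       % query points, sliced by row inside the loop

S = 0;
parfor i = 1:n
    bi = b(i, :);        % one row of the sliced variable b
    p = (a5 - bi(1)).^2 + ...
        (a6 - bi(2)).^2 + ...
        (a7 - bi(3)).^2;
    S = S + sqrt(min(p));    % reduction: order of summation may vary
end
avg = S / n;
```

Whether this pays off depends on the number of workers and on the overhead of starting the pool, so it is worth timing both versions.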
