Find double repetitions in a (sorted) array.

Given an array submitted in a form of struct field, containing integer numbers. For convenience, let's assume that the numbers are already sorted in ascending order:
>> s.x
ans =
2
ans =
2
ans =
5
ans =
5
ans =
5
ans =
8
ans =
8
Find indexes of elements, which occur exact 2 times:
ind =
1 2 6 7

댓글 수: 4

Steven Lord
Steven Lord 2017년 10월 20일
Okay, done.
If you're having difficulty solving this problem, why don't you show us what you've tried and whatever warning, error, or unexpected result you received so we may be able to offer some suggestions.
bbb_bbb
bbb_bbb 2017년 10월 20일
편집: bbb_bbb 2017년 10월 20일
I have built a for loop, but I see that it is not optimal for speed, first of all because ind changes size every loop iteration. I think there is more concise way...
% Assignment of a struct with a field containing integer numbers
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
% Finding double repetitions
d=diff([s.x]); j=0; ind=[];
for i=1:numel(d)
if d(i)==0
j=j+1;
else
if j==1
ind(end+1)=i-1;
ind(end+1)=i;
end
j=0;
end
end
Jan
Jan 2017년 10월 21일
편집: Jan 2017년 10월 21일
The iterative growing of arrays is a standard mistake from the view point of efficiency. Simply pre-allocate:
d = diff(x);
j = 0;
ind = zeros(1, numel(d));
indi = 1;
for i=1:numel(d)
if d(i)==0
j=j+1;
else
if j==1
ind(indi) = i-1;
ind(indi+1) = i;
indi = indi + 2;
end
j=0;
end
end
ind = ind(1:indi-1);
This does not catch the case, if the last two elements are equal.
This does not catch the case, if the last two elements are equal.
Adding this line repairs this:
if d(end)==0, d(end+1)=1; end
This variant seems to be the fastest.

댓글을 달려면 로그인하십시오.

 채택된 답변

Andrei Bobrov
Andrei Bobrov 2017년 10월 20일
편집: Andrei Bobrov 2017년 10월 23일

2 개 추천

x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
[~,~,g] = unique(x); % OR for last versions of MATLAB: g = findgroups(x)
c = accumarray(g,1:numel(x),[],@(x){x});
out = cell2mat(c(cellfun(@numel,c) == 2));
or
[a,~,g] = unique(x);
out = find(ismember(x,a(accumarray(g,1) == 2)));
or (FIXED)
out = reshape(strfind([1,diff(x(:)')~=0,1],[1 0 1]) + [0;1],[],1);
out = reshape(bsxfun(@plus,strfind([1,diff(x(:)')~=0,1],[1 0 1]),[0;1]),[],1); % for old MATLAB

댓글 수: 13

bbb_bbb
bbb_bbb 2017년 10월 20일
Spasibo. But can't test it - no findgroups in R2015a.
You can replace it by a call to UNIQUE:
[~,~,gId] = unique( x ) ;
and then pass gId instead of the output of FINDGROUPS.
Andrei Bobrov
Andrei Bobrov 2017년 10월 21일
Thanks Cedric!
Spasibo. That works. Problem is that it is very time-consumpting as to compare to other variants.
x=randi([1,1e6],1e5,1); x=sort(x); % creates a 1e5 elements vector
tic
[count,edges] = histcounts(x, 0.5 : max(x)+0.5);
ind=find(ismember(x,find(count==2)));
toc
tic
d=diff(x); j=0; ind=[];
for i=1:numel(d)
if d(i)==0
j=j+1;
else
if j==1
ind(end+1)=i-1;
ind(end+1)=i;
end
j=0;
end
end
toc
tic
[~,~,g] = unique(x); % OR for last versions of MATLAB: g = findgroups(x)
c = accumarray(g,1:numel(x),[],@(x){x});
out = cell2mat(c(cellfun(@numel,c) == 2));
toc
Elapsed time is 0.096384 seconds.
Elapsed time is 0.004652 seconds.
Elapsed time is 0.451423 seconds.
Jan
Jan 2017년 10월 21일
cellfun('prodofsize', c) is remarkably faster than cellfun(@numel, c).
@ Andrei Bobrov
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
out = reshape(strfind([1,diff(x)~=0],[1 0 1]) + [0;1],[],1);
Error using horzcat
Dimensions of matrices being concatenated are not consistent.
Hm!
out = reshape(strfind([1,diff(x(:)')~=0],[1 0 1]) + [0;1],[],1);
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
out = reshape(strfind([1,diff(x(:)')~=0],[1 0 1]) + [0;1],[],1);
Error using +
Matrix dimensions must agree.
you use old MATLAB, for you:
out = reshape(bsxfun(@plus,strfind([1,diff(x(:)')~=0],[1 0 1]),[0;1]),[],1);
This does not catch the case, if the last two elements are equal.
x= [2; 2; 4; 4; 4; 6; 6; 8; 8; 8; 8; 9; 9];
out = reshape(bsxfun(@plus,strfind([1,diff(x(:)')~=0],[1 0 1]),[0;1]),[],1)
out =
1
2
6
7
Andrei Bobrov
Andrei Bobrov 2017년 10월 23일
:), fixed!
This is ok. I sligtly modified it for speed. The fastest and concisiest algorithm of all suggested!
out = strfind([true,diff(x')~=0,true],[1 0 1]);
out = reshape([out;out+1],1,[]);
Andrei Bobrov
Andrei Bobrov 2017년 10월 23일
Thank you mister Bbb_bbb.

댓글을 달려면 로그인하십시오.

추가 답변 (4개)

Rik
Rik 2017년 10월 20일
편집: Rik 2017년 10월 20일

0 개 추천

It always pays off to get rid of loops and/or pre-allocating your output.
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
x=[s.x];
%only newer releases: 0.000778 seconds
tic
count=histcounts(x,0.5 : max(x)+0.5);
ind=find(sum(x==find(count==2)'));
toc
%should work on most releases: 0.000628 seconds
tic
count=histcounts(x,0.5 : max(x)+0.5);
count=find(count==2);
ind=find(sum(repmat(x,length(count),1)==repmat(count',1,length(x))));
toc
%your loop: 0.001100 seconds
tic
d=diff(x); j=0; ind=[];
for i=1:numel(d)
if d(i)==0
j=j+1;
else
if j==1
ind(end+1)=i-1;
ind(end+1)=i;
end
j=0;
end
end
toc

댓글 수: 8

find(sum(x==find(count==2)'))
Error using ==
Matrix dimensions must agree.
Rik
Rik 2017년 10월 20일
편집: Rik 2017년 10월 20일
Ah yes, forgot to mention: the implicit expanding only works on newer releases (I think it works from 2015b or 2016a).
You do the expanding explicitly like this:
logical_array = repmat(x,length(a),1)==repmat(count',1,length(x));
ind=find(sum(logical_array));
bbb_bbb
bbb_bbb 2017년 10월 20일
What is a?
bbb_bbb
bbb_bbb 2017년 10월 20일
편집: bbb_bbb 2017년 10월 20일
I tryed this variant without cycle, and it proved by far slower. What is the reason for that? I guess ismember function is slow.
% array of random numbers
x=randi([1,1e6],1e5,1); x=sort(x);
tic
[count,edges,bin] = histcounts(x, 0.5 : max(x)+0.5);
ind=find(ismember(x,find(count==2)));
toc
Elapsed time is 0.106030 seconds.
tic
d=diff(x); j=0; ind=[];
for i=1:numel(d)
if d(i)==0
j=j+1;
else
if j==1
ind(end+1)=i-1;
ind(end+1)=i;
end
j=0;
end
end
toc
Elapsed time is 0.004015 seconds.
Rik
Rik 2017년 10월 20일
a was the dummy variable name I used when I jotted down the code instead of count. I edited my previous comment accordingly.
And yes, ismember can be quite slow, but it's flexibility makes it suitable for many jobs. Although in this case, histcounts might be the culprit.
What you need to think about is that some function have an overhead to get started, but will scale much better. I would suggest testing it out on a real example and use the debugging tools to analyse the run time (the 'run and time' button).
bbb_bbb
bbb_bbb 2017년 10월 20일
편집: bbb_bbb 2017년 10월 20일
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
tic
count=histcounts(x,0.5 : max(x)+0.5);
count=find(count==2);
ind=find(sum(repmat(x,length(count),1)==repmat(count',1,length(x))));
toc
Error using ==
Matrix dimensions must agree.
repmat(x,length(count),1) % 22x1 double
repmat(count',1,length(x)) % 2x11 double
That's because x has a different shape:
x1= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x1));
x2=[s.x];
x1 is 11x1 and x2 is 1x11
This variant isn't working at big arrays:
x=randi([1,1e6],1e5,1); x=sort(x)';
count=histcounts(x,0.5 : max(x)+0.5);
count=find(count==2);
ind=find(sum(repmat(x,length(count),1)==repmat(count',1,length(x))));
Error using repmat
Maximum variable size allowed by the program is exceeded.

댓글을 달려면 로그인하십시오.

Image Analyst
Image Analyst 2017년 10월 20일

0 개 추천

You didn't tag it as homework. Is it? This will do it:
% Assignment of a struct with a field containing integer numbers
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
numbers = [s.x]
[groupNumber, groupValue] = findgroups(numbers)
counts = histcounts(groupNumber)
ofGroupSize2 = find(counts == 2) % Find those only if they have a length of 2.
values = groupValue(ofGroupSize2)
indexes = find(ismember(numbers, values))

댓글 수: 2

bbb_bbb
bbb_bbb 2017년 10월 20일
편집: bbb_bbb 2017년 10월 21일
No, its no homework - so called "just-for-fun project".
[groupNumber, groupValue] = findgroups(numbers)
Undefined function or variable 'findgroups'.
Matlab 2015a
Image Analyst
Image Analyst 2017년 10월 20일
편집: Image Analyst 2017년 10월 21일
You can use regionprops() instead of findgroups() if you have an old version and have the Image Processing Toolbox. See my separate answer with demo code.

댓글을 달려면 로그인하십시오.

Jan
Jan 2017년 10월 21일

0 개 추천

Your code looks like the input is sorted. The other approaches do not have this limitation. If it is really sorted:
d = [true; diff(x) ~= 0]; % TRUE if values change
b = x(d); % Elements without repetitions
k = find([d', true]); % Indices of changes
n = diff(k);
is2 = find(n==2);
ind4 = reshape([k(is2); k(is2)+1], 1, []);
Code taken from FEX: RunLength.
Image Analyst
Image Analyst 2017년 10월 21일
편집: Image Analyst 2017년 10월 21일

0 개 추천

If you have the Image Processing Toolbox, you can use regionprops():
% Assignment of a struct with a field containing integer numbers
x= [2; 2; 5; 5; 5; 8; 8; 13; 13; 13; 13];
s = struct('x', num2cell(x));
numbers = [s.x] % A labeled "image"
% Find lengths of each run of numbers plus the indexes where they occur.
props = regionprops(numbers, 'Area', 'PixelIdxList')
% Extract from structure into one vector.
allLengths = [props.Area]
% Find those only if they have a length of 2.
ofGroupSize2 = find(allLengths == 2)
% Find indexes of those runs with length 2.
indexes = [props(ofGroupSize2).PixelIdxList]
% Shape into row vector
indexes = reshape(sort(indexes(:)), 1, [])

댓글 수: 1

bbb_bbb
bbb_bbb 2017년 10월 21일
This works, but the variant is the longest (6.2 sec on 1e6 elements vector). The fastest is 0.03 sec.

댓글을 달려면 로그인하십시오.

카테고리

도움말 센터File Exchange에서 Structures에 대해 자세히 알아보기

질문:

2017년 10월 20일

댓글:

2017년 10월 23일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by