Finding the first index of a row where all integers of a defined list have occurred

Steve on 27 Jan 2023
Commented: Steve on 28 Jan 2023
I have a 100x100 matrix of values (1, 2, 3, 4) selected at random:
x=randi([1,4],100,100);
Reading left to right, I want to find the first index in each row at which all of the numbers (1, 2, 3, 4) have appeared, irrespective of order or repeats.
For example, in the random data (4,2,3,2,4,4,1,1,2,4,2,3...), all numbers have appeared at index 7.
Does this require a loop?


Accepted Answer

John D'Errico on 27 Jan 2023
Given a vector, can you simply find the first index where all have occurred? Just use ismember.
x=randi([1,4],1,100)
x = 1×100
4 1 1 4 2 4 2 4 1 1 4 1 4 4 4 4 4 3 4 3 4 4 4 4 3 4 1 2 4 1
[~,locs] = ismember(1:4,x)
locs = 1×4
2 5 18 1
max(locs)
ans = 18
So max(locs) is the index at which point ALL of those elements have been seen. No loop was needed for that part. You could now just use a loop on the rows of a matrix with multiple rows.
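A minimal sketch of that row loop, run on a small hand-made matrix rather than the random 100x100 data: the first row is the example from the question, and the second is the vector X used later in this thread, padded with two 1s to the same length. One addition that is not in the answer: when a row never contains one of the values, ismember reports 0 for it, so max(locs) would return an index at which not everything has appeared. The sketch marks such rows NaN, which is an assumed convention.

```matlab
% Row loop around the ismember/max idea. The NaN marker for rows that never
% contain every value is an assumed convention, not part of the answer.
x = [4 2 3 2 4 4 1 1 2 4 2 3;
     2 1 2 5 1 5 4 3 2 1 1 1;
     1 1 2 2 1 2 1 2 1 2 1 2];
vals = 1:4;                             % values that must all appear
firstAll = nan(size(x,1), 1);           % one result per row
for r = 1:size(x,1)
    % locs(k) = first column of row r holding vals(k), or 0 if it never occurs
    [~, locs] = ismember(vals, x(r,:));
    if all(locs > 0)
        firstAll(r) = max(locs);        % the latest "first occurrence"
    end
end
firstAll
```

For this x the result is [7; 8; NaN]: the third row contains only 1s and 2s, so it never has all four values.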
Can you do this without a loop? Well, yes. The way to do it would require tools like mat2cell, and then cellfun. That is, you could convert the array to a cell array of vectors. Then use cellfun to apply the ismember operation (written as an m-file) to each vector.
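A sketch of that mat2cell/cellfun route, on the same kind of small hand-made matrix. To keep it in one self-contained script, it swaps the ismember m-file for an anonymous function built on find (there is no inline way to take max of ismember's second output inside an anonymous function). The Inf/NaN handling for rows missing a value is an assumption added here.

```matlab
% Loop-free sketch: split the matrix into rows, then apply a per-row function.
% Uses find inside an anonymous function instead of an ismember m-file helper.
x = [4 2 3 2 4 4 1 1 2 4 2 3;
     2 1 2 5 1 5 4 3 2 1 1 1;
     1 1 2 2 1 2 1 2 1 2 1 2];
vals = 1:4;                                          % values that must all appear
rows = mat2cell(x, ones(size(x,1),1), size(x,2));    % one 1-by-n cell per row
% First index of each value in row r; Inf when the value never occurs.
firstIdx = @(r) arrayfun(@(v) min([find(r == v, 1), Inf]), vals);
firstAll = cellfun(@(r) max(firstIdx(r)), rows);     % latest "first occurrence" per row
firstAll(isinf(firstAll)) = NaN                      % rows that never contain all values
```

It returns the same [7; 8; NaN] as a plain row loop, but it is noticeably harder to read, which is the point being made here.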
Is it worth writing code that you don't really understand, and that does what you want no more efficiently than a simple loop? A complicated-looking line of code is not better than a simple loop. Remember, you need to understand that code in order to debug and maintain it in the future.
John D'Errico on 28 Jan 2023
Edited: John D'Errico on 28 Jan 2023
Lol. No problem. There are probably other ways I could have done it too, but let's take it apart on an example vector.
X = [2 1 2 5 1 5 4 3 2 1];
So the first occurrence of a 1 is the second element. Right? 2 happens at the first element. 3 happens where? The 8th element. Finally, where does the 4 first happen? At element 7.
Now, let's read the help for ismember.
help ismember
 ISMEMBER True for set member.
    LIA = ISMEMBER(A,B) for arrays A and B returns an array of the same
    size as A containing true where the elements of A are in B and false
    otherwise.

    LIA = ISMEMBER(A,B,'rows') for matrices A and B with the same number
    of columns, returns a vector containing true where the rows of A are
    also rows of B and false otherwise.

    [LIA,LOCB] = ISMEMBER(A,B) also returns an array LOCB containing the
    lowest absolute index in B for each element in A which is a member of
    B and 0 if there is no such index.

    [LIA,LOCB] = ISMEMBER(A,B,'rows') also returns a vector LOCB containing
    the lowest absolute index in B for each row in A which is a member
    of B and 0 if there is no such index.

    The behavior of ISMEMBER has changed. This includes:
      - occurrence of indices in LOCB switched from highest to lowest
      - tighter restrictions on combinations of classes

    If this change in behavior has adversely affected your code, you may
    preserve the previous behavior with:

       [LIA,LOCB] = ISMEMBER(A,B,'legacy')
       [LIA,LOCB] = ISMEMBER(A,B,'rows','legacy')

    Examples:

       a = [9 9 8 8 7 7 7 6 6 6 5 5 4 4 2 1 1 1]
       b = [1 1 1 3 3 3 3 3 4 4 4 4 4 9 9 9]

       [lia1,locb1] = ismember(a,b)
       % returns
       lia1 = [1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1]
       locb1 = [14 14 0 0 0 0 0 0 0 0 0 0 9 9 0 1 1 1]

       [lia,locb] = ismember([1 NaN 2 3],[3 4 NaN 1])
       % NaNs compare as not equal, so this returns
       lia = [1 0 0 1], locb = [4 0 0 1]

    Class support for inputs A and B, where A and B must be of the same
    class unless stated otherwise:
       - logical, char, all numeric classes (may combine with double arrays)
       - cell arrays of strings (may combine with char arrays)
         -- 'rows' option is not supported for cell arrays
       - objects with methods SORT (SORTROWS for the 'rows' option), EQ and NE
         -- including heterogeneous arrays derived from the same root class

    See also ISMEMBERTOL, INTERSECT, UNION, UNIQUE, UNIQUETOL, SETDIFF,
             SETXOR, SORT, SORTROWS.

    Documentation for ismember
       doc ismember

    Other uses of ismember
       categorical/ismember   datetime/ismember      mtree/ismember
       cell/ismember          double/ismember        sym/ismember
       codistributed/ismember duration/ismember      tabular/ismember
       dataset/ismember       gpuArray/ismember      tall/ismember
What does the second output argument tell us? It indicates where in X each element first occurs. So we have
[~,locs] = ismember([1 2 3 4],X)
locs = 1×4
2 1 8 7
Is that not exactly where I said those elements first appear?
Now, what happens when you use max on a vector? It tells you the LARGEST element in the vector.
max(locs)
ans = 8
Now go back and think about the goal here. We want to know at what point ALL of the elements in [1 2 3 4] have been found in the original vector X. No sooner, no later.
So max(locs) yields exactly what you wanted. Does it work only for a vector? Well yes. But as has been said, there is nothing wrong with a loop over the rows of your matrix. Apply ismember as I have just done for each row of your array.
Loops are not inherently inefficient. Yes, avoid doubly and more deeply nested loops if you can. But good code wants to be readable code, and you need to maintain your code, so being able to read it easily is important. With experience, the code we write tends to have fewer loops. But only when a loop is shown to be a significant performance bottleneck should you worry about it.
Steve on 28 Jan 2023
Thanks. I see, the max index of the first occurrence is where the last number appears, and therefore all numbers have appeared at this point. Nice solution. Cheers!


More Answers (1)

the cyclist on 27 Jan 2023
I 100% agree with @John D'Errico's take on this, which is that any non-loop solution here is (probably) going to be sufficiently obfuscated that "future you" will regret avoiding the loops.
One reason there is no "natural" way to avoid loops is that find and ismember don't really have "row-wise" implementations (to my knowledge) that do what you are trying to do. (There is a bit of discussion of this in the forum.)
Here is an example of an obfuscated solution:
% The input
rng default
N = 4;
x=randi([1,N],100,100);
% The algorithm
% (Technically, I suppose this is a for loop, but not over your rows or columns)
loc = zeros(height(x),N);
for val = 1:N
[~,loc(:,val)] = max(x==val,[],2);
end
output = max(loc,[],2)
output = 100×1
5 5 5 5 12 15 12 8 6 5
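One caveat that applies to this version as well as to the ismember one: if a row never contains some value, x==val is all false in that row, and max(...,[],2) then reports index 1 for it, so the final max can return an index at which not all values have appeared. A sketch of the same column-wise loop with a guard, on a small hand-made matrix whose third row lacks 3 and 4; marking such rows NaN is an assumed convention, not part of the answer.

```matlab
% Column-wise loop over the N values (not over rows), with a guard for values
% that never occur in a row. NaN for such rows is an assumed convention.
x = [4 2 3 2 4 4 1 1 2 4 2 3;
     2 1 2 5 1 5 4 3 2 1 1 1;
     1 1 2 2 1 2 1 2 1 2 1 2];
N = 4;
loc = zeros(size(x,1), N);
for val = 1:N
    [hit, loc(:,val)] = max(x == val, [], 2);   % hit is false where val never occurs
    loc(~hit, val) = Inf;                       % keep max below from using index 1
end
output = max(loc, [], 2);
output(isinf(output)) = NaN                     % rows that never contain all N values
```

Here output is [7; 8; NaN]; without the Inf guard the third row would wrongly report 3.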
Personally, I would use the loop, with John's solution.

