ranking (ordering) values with repeats
Hello Community,
I'm hoping some of you have a clever solution to this problem. I'm looking for a fast and efficient way to rank (order) a vector of numbers in a particular way when repeated values arise.
To make it simple, suppose I have a row vector:
data = [-1 2 0 -2 0]
I know I can rank them using the 3rd output of "unique":
>> [~,~,rnk] = unique(data)
rnk =
2 4 3 1 3
What I like about this is that it assigns the same rank to the repeated zeros. What I don't like about this is that the top rank is now "4" even though I have 5 values. I would prefer this:
>> rnk = myrank(data)
rnk =
2 5 3 1 3
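The ranking asked for here is usually called standard competition ranking ("1224" ranking): each element's rank is 1 plus the number of elements strictly smaller than it. A brute-force sketch of that definition can serve as a reference to check faster versions against. It assumes a row vector and MATLAB R2016b or later (or Octave) for implicit expansion, and it builds an n-by-n comparison matrix, so it only suits short vectors:

```matlab
data = [-1 2 0 -2 0];
% lessThan(i,j) is true when data(i) < data(j) (uses implicit expansion)
lessThan = data.' < data;
% Rank = 1 + number of strictly smaller elements, so ties share the lowest rank
rnk = 1 + sum(lessThan, 1)
```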
I've also played around with the second output of "sort" quite a bit, but since that output gives the indices of the sorted values within the original array, there is no simple way (that I've found) to give repeated values the same rank.
I'm just wondering if there is something simple that I'm missing.
Thanks!
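The third output of unique can be turned into the preferred ranks with one extra step. This is a sketch with illustrative variable names: count how many elements fall in each unique group with accumarray, then give group k the rank 1 plus the number of elements in groups 1 to k-1.

```matlab
data = [-1 2 0 -2 0];
% ic maps each element to its group among the sorted unique values
[~, ~, ic] = unique(data);
ic = ic(:);                      % force a column; orientation varies across releases
% Number of elements in each unique group
counts = accumarray(ic, 1);
% Competition rank of group k = 1 + number of elements in groups 1..k-1
groupRank = cumsum([1; counts(1:end-1)]);
% Look up each element's group rank, keeping the input's shape
rnk = reshape(groupRank(ic), size(data))
```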
Accepted Answer
Oleg Komarov
23 Mar 2012
If you have the Statistics Toolbox (though it's a bit of overkill):
floor(tiedrank([-1 2 0 -2 0]))
ans =
2 5 3 1 3
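Note that tiedrank gives tied values the average of the ranks they span, so floor() recovers the lowest rank only when ties come in pairs: three equal values spanning ranks 4 to 6 get a tied rank of 5, which floor() leaves at 5 rather than 4.
Otherwise, without the toolbox: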
data = [-1 2 0 -2 0];
% Sort data
[srt, idxSrt] = sort(data);
% Flag the repetitions (elements equal to their predecessor)
idxRepeat = [false diff(srt) == 0];
% Dense rank: ties share a rank and no ranks are skipped
rnkNoSkip = cumsum(~idxRepeat);
% Start from the sorted positions 1..n
rnk = 1:numel(data);
% Adjust for ties (and skip)
rnk(idxRepeat) = rnkNoSkip(idxRepeat);
% Sort back
rnk(idxSrt) = rnk
rnk =
2 5 3 1 3
sunbeam
6 Mar 2013
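There is something wrong here. For
data = [1 -3 -3 2 23 23];
the result is
rnk =
3 1 1 4 5 4
but both 23s should get rank 5. Repeated entries take their dense rank from rnkNoSkip, which equals the correct (skipping) rank only when no earlier ties have occurred.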
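A minimal fix that keeps the structure of the accepted answer: instead of giving repeated entries their dense rank, look up the sorted position where their run of equal values starts (runStart is a name introduced here for illustration). With the data from the comment above this gives 3 1 1 4 5 5, and with the question's data it still gives 2 5 3 1 3.

```matlab
data = [1 -3 -3 2 23 23];
[srt, idxSrt] = sort(data);
idxRepeat = [false diff(srt) == 0];
% Sorted position at which each run of equal values begins
runStart = find(~idxRepeat);
% cumsum(~idxRepeat) is the run number of each sorted element (the
% original rnkNoSkip); every element takes its run's starting position
rnkSorted = runStart(cumsum(~idxRepeat));
% Scatter the ranks back to the original element order
rnk = zeros(size(data));
rnk(idxSrt) = rnkSorted
```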
Tommaso Fornaciari
4 Dec 2016
Hi, is there a way to assign equal observations two different consecutive ranks? Following on from the original question, the output I would need is 2 5 4 1 3 or 2 5 3 1 4.
Thank you
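If equal values should get distinct consecutive ranks, the second output of sort is enough on its own. This sketch relies on sort being stable (equal elements keep their original order), so the earlier of two ties gets the lower rank. For the opposite tie order, one option is to rank the reversed vector and reverse the result back:

```matlab
data = [-1 2 0 -2 0];
% Earlier ties ranked first: the stable sort keeps equal values in input order
[~, idxSrt] = sort(data);
rnk = zeros(size(data));
rnk(idxSrt) = 1:numel(data)
% Later ties ranked first: rank the reversed vector, then reverse back
[~, idxRev] = sort(fliplr(data));
rnkRev = zeros(size(data));
rnkRev(idxRev) = 1:numel(data);
rnkLaterFirst = fliplr(rnkRev)
```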
More Answers (4)
Raph
4 May 2015
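It should also work with sort() and ismember(). For repeated values, the second output of ismember is the lowest matching index in data_sorted, which is the position where that value first appears after sorting, i.e. its tied rank (this relies on the lowest-index behavior documented for ismember in current MATLAB releases):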
data_sorted = sort(data);
[~, rnk] = ismember(data,data_sorted)
Bradley Stiritz
28 May 2016
Very impressive, Raph! Thanks for your contribution. Excellent use of built-in vectorized functions.
sunbeam
6 Mar 2013
This should work. I couldn't figure out how to do it without a loop, but at least this only loops over the duplicate entries. Someone let me know if you come up with a better way.
function outrank = rankWithDuplicates(data,mode)
% R = rankWithDuplicates(data,mode) ranks the values in the data variable
% according to size, allowing for duplicates. Whereas sort actually
% rearranges the input, and therefore duplicates get assigned different
% indices, rankWithDuplicates will simply output the rank order allowing
% ties for duplicate entries. For example,
%
% rankWithDuplicates([1 1 5 8 8 10])
%
% will output [1 1 3 4 4 6]; and if these entries are shuffled like
%
% rankWithDuplicates([8 1 5 1 10 8])
%
% the output will be [4 1 3 1 6 4].
%
% INPUT: data, a vector of real numbers.
% mode, an optional input which can be 'ascend' or 'descend'
%
% OUTPUT: the rank order of the input data.
%
if nargin < 2
    mode = 'ascend';
end
% Work on a row vector so the concatenation below lines up
if iscolumn(data)
    data = data.';
end
% Sort data
[srt, idxSrt] = sort(data, mode);
% Flag elements equal to their predecessor in sorted order
idxRepeat = [false diff(srt) == 0];
% Loop through the duplicates and maintain the rank: each repeat inherits
% the rank of the entry before it, so a whole run shares its first rank.
% (I'm not sure this loop is necessary, but it's the only way I could get it done.)
rnk = 1:numel(data);
for i = find(idxRepeat)
    rnk(i) = rnk(i-1);
end
% Map the ranks back to the original element order
outrank = zeros(size(data));
outrank(idxSrt) = rnk;
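The loop can be replaced by a running maximum: zero out the repeated entries of rnk, and cummax then carries the rank of the first element of each run forward. This is a sketch under the same conventions as the function above; the variable direction stands in for its mode argument and is renamed only to avoid shadowing the built-in mode function. It is shown with 'descend' (highest value gets rank 1):

```matlab
data = [8 1 5 1 10 8];
direction = 'descend';          % or 'ascend'
[srt, idxSrt] = sort(data, direction);
% Flag elements equal to their predecessor in sorted order
idxRepeat = [false diff(srt) == 0];
rnk = 1:numel(data);
% Zero out the repeats; the running maximum then copies the rank of the
% first element of each run onto the rest of that run
rnk(idxRepeat) = 0;
rnk = cummax(rnk);
% Map the ranks back to the original element order
outrank = zeros(size(data));
outrank(idxSrt) = rnk
```

With direction = 'ascend' the same lines give 4 1 3 1 6 4, matching the example in the help text above.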
Jeyamugan T
7 Apr 2017
I wrote this code for another purpose, but it may be useful for this problem.
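function [rkList]=arrayRankEx(O)
% Rank the elements of row vector O, giving tied values the lowest rank.
% Note: this loops forever if O contains NaN, because NaN == NaN is false.
cO=sort(O);
n=size(O,2);
rkList=zeros(1,n);
in=1;
while(in<=n)
    % Give every element equal to the in-th sorted value the rank in
    out=1;
    co=0;
    while(out<=n && in<=n)
        if(O(out)==cO(in))
            rkList(out)=in;
            co=co+1;
        end
        out=out+1;
    end
    % Skip past all co copies of that value
    in=in+co;
end
end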
>>[5 7 -2 1 -1 0 0 1 5 3]
ans =
5 7 -2 1 -1 0 0 1 5 3
>> arrayRankEx([5 7 -2 1 -1 0 0 1 5 3])
ans =
8 10 1 5 2 3 3 5 8 7
Benjamin Levy
16 Nov 2017
Not sure if this is still a 'live' thread, but with tied values given the average of the ranks they span, the code should report these ranks for ascending order: 2.0000 5.0000 3.5000 1.0000 3.5000.
Now suppose your data set is data = [ 11 20 2 14 15 11 13 20 7 9 1 5 17... 7 5 16 3 5 20 ]; your answer for ascending order (correcting for ties), displayed with sortrows([ data' ranks ],2), should give column 1 = data, column 2 = ranks:
1.0000 1.0000
2.0000 2.0000
3.0000 3.0000
5.0000 5.0000
5.0000 5.0000
5.0000 5.0000
7.0000 7.5000
7.0000 7.5000
9.0000 9.0000
11.0000 11.0000
11.0000 11.0000
11.0000 11.0000
13.0000 13.0000
14.0000 14.0000
15.0000 15.0000
16.0000 16.0000
17.0000 17.0000
20.0000 19.0000
20.0000 19.0000
20.0000 19.0000
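For the averaged convention shown above, a toolbox-free sketch: find where each run of equal sorted values starts and ends, and give every element of the run the mean of those two positions (runStart, runEnd and runId are illustrative names). Using the five values from the original question, it gives 2 5 3.5 1 3.5:

```matlab
data = [-1 2 0 -2 0];
[srt, idxSrt] = sort(data);
% True where a new run of equal values begins in sorted order
isNewRun = [true diff(srt) ~= 0];
runId = cumsum(isNewRun);                 % run number of each sorted element
runStart = find(isNewRun);                % first sorted position of each run
runEnd = [runStart(2:end) - 1, numel(srt)];  % last sorted position of each run
avgRank = (runStart + runEnd) / 2;        % mean of the positions a run spans
% Scatter the averaged ranks back to the original element order
ranks = zeros(size(data));
ranks(idxSrt) = avgRank(runId)
```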
Note that there are several places in the sorted data with consecutive runs of the same integer (e.g., ...5 5 7 7 ).
Using your code and my data set, and the same final sort, I have (column 1 data, column 2 ranks):
1 1
2 2
3 3
5 4
5 4
5 4
7 5
11 7
7 7
11 7
9 9
11 10
13 13
20 13
20 13
14 14
15 15
16 16
17 17
20 18
Nataraja M
26 Mar 2018
Hello Sir, I used the command above, sortrows([ data' ranks ],2), to rank vectors from highest to lowest, but I get the error "Not enough input arguments". Can you please help me solve this error? Thank you.