Efficient SORT function with random tie-breaking rule

조회 수: 4 (최근 30일)
Matteo
Matteo 2012년 3월 1일
편집: Andrea Nardin 2020년 12월 3일
Hi, I'm trying to find if it already exists a function that do the exactly same thing of the built-in "sort" function except for the tie-breaking rule. Indeed, the sort function, in case of tie, maintains the initial order of the vector/matrix. On the contrary, I want to find a sort function that, in case of tie, choose at random the order of the entries. It already exists?? there exists some user function already written??
Just to inform you, i tried to write this:
function [sorted,idx]=randomSort2(m,dim,mode)
dim=2;
[sorted,idx]=sort(m,dim,mode);
diff=sorted(:,2:end)-sorted(:,1:end-1);
u=find(min(diff,[],2)==0);
for i=1:length(u)
k1=randperm(size(m,2));
[sorted(i,:),k2]=sort(m(i,k1),dim,mode);
idx(i,:)=k1(k2);
end
but obviously when there are a lot of ties it takes some seconds to perform. i want somethin more efficient, if it exists..
Thank you, Matteo
  댓글 수: 2
Sean de Wolski
Sean de Wolski 2012년 3월 1일
So you want to randomly select the index in the case of duplicates, since the value will be the same?
Matteo
Matteo 2012년 3월 1일
yeah, because what I actually want to work on is the index matrix returned by the sort function!!!

댓글을 달려면 로그인하십시오.

답변 (4개)

Sean de Wolski
Sean de Wolski 2012년 3월 30일
For some reason this question came back to me when I was doing something completely unrelated the computer the other day. Will the values always be integers? If so, you could add a random value between -0.5 and 0.5 to the matrix, sort(), and then round(). Since the values are integers they will not switch positions with other integers, but will be random against equal integers:
m=[1 3 2;
2 1 2;
2 1 2];
[sorted,idx] = randomSort(m,2,'ascend')
Where randomSort() is this:
function [v idx] = randomSort(m,dim,direction)
[v,idx] = sort(m+(rand(size(m))-0.5),dim,direction);
v = round(v); %back to integers
This will handle matrices, dimensions changes etc.
  댓글 수: 1
Andrea Nardin
Andrea Nardin 2020년 12월 3일
편집: Andrea Nardin 2020년 12월 3일
This is brilliant, it should be the top answer if integer numbers is the case. However it can be generalized to non-integers if you know (or compute each time) the minimum absolute difference among the set of numbers.
Of course if the min difference is infinitely small, then there is a finite precision problem in representing random numbers in such a small interval and so ties could not be differenciated much. But the solution could work for many cases.

댓글을 달려면 로그인하십시오.


Sean de Wolski
Sean de Wolski 2012년 3월 1일
One way:
x = [1 2 2 2 3 4 4 5];
[v,idx] = sort(x);
idxc = cumsum([1 logical(diff(v))]);
idx = cell2mat(accumarray(idxc',idx',[],@(x){x(randperm(numel(x)))}))
I'm sure there's a better way that doesn't use accumarray which requires building and destroying the cell array. Whatever this other method is would scale better to multiple dimensions as well.

Matteo
Matteo 2012년 3월 1일
Thanks Sean,
but I add immediately one comment because maybe I haven't cleared my point. I want to use the "randomSort" on a MATRIX!! that in order to sort INDIPENDENTLY (as sort already can do!) EACH ROW.
e.g.
m=[1 3 2;
2 1 2;
2 1 2];
[sorted,idx]=randomSort(m)
sorted=[1 2 3;
1 2 2;
1 2 2];
idx=[1 3 2;
2 1 3;
2 3 1]
% notice the 2nd and 3rd rows (random tie-breaking rule!!)
then I don't need just an efficient way to sort a vector! but a matrix in the previous way!
Thanks, Matteo
  댓글 수: 2
Sean de Wolski
Sean de Wolski 2012년 3월 1일
or a for-loop with each row of a matrix!
Matteo
Matteo 2012년 3월 2일
this is what I'm trying to avoid, because I will generally have many many rows and I'm scared about the processing time! but I'm giving up about it and maybe accept this solution as a compromise.

댓글을 달려면 로그인하십시오.


Matteo
Matteo 2012년 3월 2일
However I tried your method and my method (using both a for-loop on the rows of the matrix) on a matrix of dimension 100000x20 and I found that my method takes about 3 secs whereas your method about 55 secs!!
Just to know I've measured the processing time for every row which is 0.00003 secs for my method and 0.0005 for your method.
Someone has other ideas about it???

카테고리

Help CenterFile Exchange에서 Creating and Concatenating Matrices에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by