Efficient pairwise logical operation on all combinations of vectors?
I have an algorithm I'm trying to implement more efficiently. These are the basic steps:
Let X be an n-by-p matrix. Let vector Xi be the i-th column of X, and vector Xj be the j-th column of X. For every combination of Xi and Xj, I want to test whether each element of Xi is larger than the corresponding element in Xj. Because Xi > Xj is different from Xj > Xi, I am interested in all permutations.
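For concreteness, here is a minimal brute-force sketch of that description (the names n, p, and G are mine, purely for illustration):
n = 5; p = 4;
X = rand(n, p);
G = cell(p, p); % G{i,j}(k) is true where X(k,i) > X(k,j)
for i = 1:p
    for j = 1:p
        G{i,j} = X(:,i) > X(:,j);
    end
end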
I am working in MATLAB, so my current approach is to loop over the p columns of X. On each loop, I select the vector Xi, repeat it p times to form an n-by-p matrix (let's call it XI), and then do XI > X. The most efficient way I can think of to write this is:
bsxfun(@gt, X(:,i), X);
I have profiled the code, and this operation is far and away the slowest part.
I know that if I could re-frame this operation as matrix arithmetic, it would be much faster. But if that is possible, I do not know how.
Is it possible to do this operation faster? If not this operation specifically, maybe some more context will help. What I really want to do is identify whether elements in Xi are exceptionally large relative to the rest of the sample. The next step is to count the number of true values in the logical matrix resulting from (XI > X). The following are the lines of code of chief interest, written on 3 lines to make profiling easier. X in this case is 37,000 x 10,000:
x = bsxfun(@gt, X(:,i), X); % 19.748 s (100 calls)
y = sum(x ,2); % 9.819 s (100 calls)
b = y >= rth; % 0.007 s (100 calls)
I thought it might make a difference to transpose X at the outset, since summing over columns rather than rows might be faster. But expanding over rows instead of columns might be slower... here are the numbers with transposed X. No real difference:
x = bsxfun(@gt, X(i,:), X); % 19.353 s (100 calls)
y = sum(x ,1); % 11.549 s (100 calls)
b = y >= rth; % 0.007 s (100 calls)
It may be that this operation just takes time ... but I have the sense that it should be possible to do this faster. I am not interested in writing my own C-extension, but if there is a way to do better within the current MATLAB toolkit, I would love to hear how.
2 Comments
John D'Errico
28 May 2018
Note that this is not valid MATLAB syntax:
x = bsxfun(@gt, X(i,:), X), 2); % 19.353 s (100 calls)
You have a spare closing paren there, and an extra ,2, which suggests this is a copy-paste error.
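My guess (not confirmed anywhere in the thread) is that the pasted line originally had the comparison nested inside a sum, something like:
y = sum(bsxfun(@gt, X(i,:), X), 2); % hypothetical reconstruction, not the poster's actual code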
Accepted Answer
John D'Errico
28 May 2018
Note that a big part of the problem may be that you are just doing a LOT of tests, and grabbing pretty large chunks of RAM to do them.
That bsxfun call you do will create a matrix of size 37000x10000 for each i. Since it is logical, that is not too bad, but each element is still stored as one byte, thus roughly 0.37 GB to allocate on every call.
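To put a number on that (a quick back-of-the-envelope check):
bytes = 37000 * 10000; % logicals are stored one byte per element: 3.7e8 bytes
gigabytes = bytes / 1e9 % roughly 0.37 GB allocated on every call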
You do not say what version of MATLAB you are using. Current releases of MATLAB allow implicit expansion without BSXFUN, so I checked the time on that as well.
X = rand(37000,10000);
timeit(@() bsxfun(@gt, X(:,1), X))
ans =
0.51082
timeit(@() X(:,1) > X)
ans =
0.59996
That was in R2017b. (Rats. I thought I had gotten around to doing the upgrade. Or maybe I have, and I am just running the wrong release. Whatever.)
Anyway, big computations can take big time. We get spoiled by fast computers (and yours seems about 2.5 times faster than mine), expecting everything to happen immediately.
You ask about whether this is doable using linear algebra. That depends on your definition of what is big. So, yes, you MAY be able to do this more quickly, if you are willing to be a little loose on how you do the computations.
[Xdp,Xtags] = sort(ones(1,37000)*X,'descend');
See that this will be IMMENSELY faster than what you are doing, but it uses a different way of looking at which columns are big. So the 10 largest columns, by one measure, are:
Xtags(1:10)
ans =
9493 4864 9289 54 8955 1644 3724 4277 8419 608
The measure I chose was simply which columns have the largest dot product with a vector of ones, which I could have written using sum, but writing it as a dot product makes the idea more explicit.
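For what it is worth, the two spellings give the same measure; a trivial check, using the same X as above:
n = size(X,1);
colScore = ones(1,n) * X; % the dot-product form used above
colScore2 = sum(X,1); % the same measure written with sum
max(abs(colScore - colScore2)) % tiny (summation order may differ slightly)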
Xdp(1:10)
ans =
18713 18694 18693 18689 18687 18683 18681 18676 18676 18675
You do say that size of the elements is important. (Hey, sometimes size does matter.) Are the largest elements important? If so, then this might be a better measure:
[Xdp,Xtags] = sort(sum(X.^2),'descend');
Really, it depends on how you define size in your context. Which of these two vectors is larger?
x1 = [1 1 10]
x2 = [2 2 2]
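To make the ambiguity concrete (my own quick check, not part of the original answer): by elementwise counting, x2 "beats" x1 in 2 of 3 positions, yet the usual p-norms all rank x1 higher.
sum(x1 > x2) % ans = 1
sum(x2 > x1) % ans = 2
[norm(x1,1) norm(x2,1)] % 12.0000 6.0000
[norm(x1,2) norm(x2,2)] % 10.0995 3.4641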
So in effect, you MIGHT be able to choose a column p-norm to do your work. If you really do want to count the number of elements that are larger, then using a small value of p will give you a decent measure. Thus:
p = 0.1;
[Xdp,Xtags] = sort(sum(X.^p),'descend');
Xtags(1:10)
ans =
4864 6561 8031 1563 2228 7500 769 9493 2595 9689
So by this measure, column 4864 is large compared to the other columns.
I'm not sure what your real need is here, so it is difficult to know if your problem can be restructured for more speed. Or if it is really important to do exactly what you are doing now, as you are doing it.
More Answers (0)