Using histcounts to determine loose data mode
As a form of filtering, I'm using histcounts to grab something akin to the mode of a data set. The idea is to lean on histcounts' automatic binning algorithm for the initial grouping, then regroup so that every run of adjacent non-zero-count bins is merged into a single bin. Finally, I take the edges of the highest-count bin from this second grouping and use them for other bits of data processing.
data = [randi([0,8000],1,20),randi([140000,260000],1,228)];
[xCounts,xEdges] = histcounts(data);
t1 = find(xCounts); t2 = diff([0,diff(t1)==1,0]); % Finds the gaps between populated bin groups
t3 = t1(t2>0); t4 = t1(t2<0); % Starting and ending indices of bin groups
and I'm stuck here. I know that corresponding elements of t3 and t4 give the first and last indices of each group in xCounts (e.g. group 1 is xCounts(t3(1):t4(1))), but I can't figure out how to get a properly vectorized version of sum(xCounts(t3:t4)). The loop version is simple:
xCountsNew = zeros(1,numel(t3));
for i = 1:numel(t3)
    xCountsNew(i) = sum(xCounts(t3(i):t4(i)));
end
but I'm trying to improve my vectorization/minimize loops.
So there are really three questions here:
1) Is this a decent way to get a loose mode of a data set?
2) How can I vectorize the above for loop?
3) Should I vectorize the above for loop? I have learned that for loops are generally faster than arrayfun calls, but I feel like there's a way to vectorize the loop without using arrayfun or similar.
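For question 2, one loop-free option is to take differences of a cumulative sum at the group boundaries. The following is a minimal sketch rather than the poster's code. The data and the explicit bin edges are made up so that the result is reproducible, and the run detection works on the occupancy mask xCounts > 0 instead of t1/t2, because the diff(t1)==1 test above skips a populated bin that has no populated neighbour.

```matlab
% Hypothetical data and explicit edges, chosen only for reproducibility
data  = [1 2 6 7 8, 21 22 23 26 27 28 29, 41];
edges = 0:5:45;
[xCounts, xEdges] = histcounts(data, edges);   % xCounts = [2 3 0 0 3 4 0 0 1]

% Locate runs of populated bins from the occupancy mask
isPop = xCounts > 0;
d  = diff([0, isPop, 0]);
t3 = find(d == 1);         % first bin of each run
t4 = find(d == -1) - 1;    % last bin of each run

% Sum of each run = difference of cumulative sums at the run boundaries
cs = [0, cumsum(xCounts)];
xCountsNew = cs(t4 + 1) - cs(t3);

% Edges of the highest-count group
[~, g] = max(xCountsNew);
modeRange = xEdges([t3(g), t4(g) + 1]);
```

Whether this beats the loop is something to time on real data. With only a handful of groups, the loop is unlikely to be a bottleneck.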
Comments: 1
Gabriel Stanley
March 23, 2023
Edited: Gabriel Stanley
March 23, 2023
Answers (2)
I am not certain what you want to do.
The histcounts function has a third output, bin, that gives the index of the bin each element was assigned to, so it can be used to pull out the elements that fall in any particular bin.
x = randn(1,25)
[xCounts,xEdges,Bin] = histcounts(x,7)
[~,idx] = max(xCounts)
AssignedToLargestBin = x(Bin == idx)
BinsIdx = ismember(Bin,idx+[-1 0 1])
MaxAdjacentBins = x(BinsIdx) % Return Elements Of 'x' From Largest & Two Adjacent Bins
You can of course set ‘idx’ to be whatever you like, and this remains straightforward if there is more than one index, as illustrated here.
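The same third output can also carry the grouping from the question, so that the samples in the largest merged group are selected directly. This is only a sketch under stated assumptions: the data and edges are hypothetical, the names groupOfBin and groupOfData are introduced here for illustration, and every sample is assumed to fall inside the edges (histcounts returns a bin index of 0 otherwise, which would break the lookup).

```matlab
% Hypothetical data and explicit edges, chosen only for reproducibility
data  = [1 2 6 7 8, 21 22 23 26 27 28 29, 41];
edges = 0:5:45;
[xCounts, ~, Bin] = histcounts(data, edges);

% Label each populated bin with the number of the run it belongs to
isPop      = xCounts > 0;
startsRun  = diff([0, isPop]) == 1;       % true at the first bin of each run
groupOfBin = cumsum(startsRun) .* isPop;  % 0 for empty bins

% Map every sample to its group, then count samples per group
groupOfData = groupOfBin(Bin);
groupCounts = accumarray(groupOfData(:), 1).';

% Samples that belong to the highest-count group
[~, g] = max(groupCounts);
inLargestGroup = data(groupOfData == g);
```

Here accumarray does the per-group summation without an explicit loop, and groupOfData == g plays the role of Bin == idx above.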
I think that you have a case of the XY problem here. Namely, you are asking about your solution to a particular problem, but I suspect there is a more direct way to solve your actual problem.
I'm guessing here, but it seems that you have a data sample, and you want to estimate where the maximal density of that sample is. Is that right?
If that is right, then you either
- Know the functional form of the underlying distribution, OR
- You do not
Again, I'm guessing, but it seems like you don't.
If both of my guesses are correct, then I would use the ksdensity function to make an empirical estimate of the underlying continuous distribution, and see where the maximum is (using a sufficiently fine grid).
rng default
data = 0.5 + 0.1*randn(1,300); % Using randn instead of rand, so that there is truly a mode
xi = 0 : 0.005 : 1;
data_pdf = ksdensity(data,xi);
figure
hold on
histogram(data,"Normalization","pdf")
plot(xi,data_pdf)
[maxPdf,indexToMax] = max(data_pdf);
xiOfMax = xi(indexToMax)
Of course, that's a lot of guesswork on my part. But it is typically better to tell us the problem you are trying to solve, in addition to your method.
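If a range around the mode is needed rather than a single point, one possibility is to keep the contiguous stretch of the grid around the peak where the estimated density stays above some fraction of its maximum. This is a sketch only: the fraction frac = 0.5 is an arbitrary assumption that would need tuning for real data, and ksdensity requires the Statistics and Machine Learning Toolbox.

```matlab
rng default
data = 0.5 + 0.1*randn(1,300);   % same synthetic sample as above
xi = 0 : 0.005 : 1;
data_pdf = ksdensity(data, xi);
[maxPdf, indexToMax] = max(data_pdf);

frac  = 0.5;                          % hypothetical threshold, not a recommendation
above = data_pdf >= frac * maxPdf;    % grid points with "high enough" density

% Find the run of grid points above the threshold that contains the peak
d = diff([0, above, 0]);
runStart = find(d == 1);
runEnd   = find(d == -1) - 1;
k = find(runStart <= indexToMax & runEnd >= indexToMax, 1);

modeRange = xi([runStart(k), runEnd(k)]);                         % limits around the mode
inRange   = data(data >= modeRange(1) & data <= modeRange(2));    % samples inside them
```

The limits in modeRange can then stand in for the bin edges that the histogram approach was meant to supply.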
