Using histcounts to determine loose data mode

Question

Gabriel Stanley on 22 Mar 2023

https://kr.mathworks.com/matlabcentral/answers/1933600-using-histcounts-to-determine-loose-data-mode

Edited: Gabriel Stanley on 23 Mar 2023

As a form of filtering, I'm using histcounts to grab something akin to the mode of a data set. The idea is to lean on histcounts' automatic binning algorithm for the initial data grouping, then merge each run of adjacent non-zero-count bins into a single bin. Finally, I take the edges of the highest-count merged bin and use those for other bits of data processing.

data = [randi([0,8000],1,20), randi([140000,260000],1,228)]; % 20 spurious + 228 valid samples in one row vector
[xCounts,xEdges] = histcounts(data);
t1 = xCounts > 0; t2 = diff([0,t1,0]); % +1 where a populated bin group starts, -1 just past where it ends
t3 = find(t2>0); t4 = find(t2<0)-1; % Starting and ending indices of bin groups (single-bin groups included)

And I'm stuck here. I know that corresponding elements of t3 and t4 are the first and last indices of each group in xCounts (e.g. group 1 is xCounts(t3(1):t4(1))), but I can't figure out how to get a properly vectorized version of sum(xCounts(t3:t4)). The loop version is simple:

xCountsNew = zeros(1,numel(t3));
for i = 1:numel(t3)
    xCountsNew(i) = sum(xCounts(t3(i):t4(i))); % total count of the i-th bin group
end

but I'm trying to improve my vectorization/minimize loops.

So there are really three questions here:

1) Is this a decent way to get a loose mode of a data set?

2) How can I vectorize the above for loop?

3) Should I vectorize the above for loop? I have learned that for loops are generally faster than arrayfun calls, but I feel like there's a way to vectorize the loop without using arrayfun or similar.
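
For question 2, one loop-free option is to take differences of a running sum. The following is only a minimal sketch on made-up counts, assuming t3 and t4 hold the first and last bin index of each group; the variable c is a hypothetical helper that does not appear in the original code.

```matlab
% Hypothetical example counts; zeros separate the populated bin groups.
xCounts = [3 5 0 0 2 0 7 1 4];

% Mark populated bins and find where each run starts and ends.
t2 = diff([0, xCounts > 0, 0]);
t3 = find(t2 > 0);        % first bin of each group: [1 5 7]
t4 = find(t2 < 0) - 1;    % last bin of each group:  [2 5 9]

% Running sum with a leading zero, so that sum(xCounts(a:b)) = c(b+1) - c(a).
c = [0, cumsum(xCounts)];
xCountsNew = c(t4 + 1) - c(t3);   % [8 2 12]
```

This gives the same result as the loop, since each group total is the difference between the running sum at the end of the group and the running sum just before its start. For the small number of bins that histcounts typically returns, any speed difference from the loop is likely to be negligible.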

Comments: 1

Gabriel Stanley on 23 Mar 2023

Edited: Gabriel Stanley on 23 Mar 2023

First, my apologies to you both for the delayed response (I'm currently working on two separate efforts on different networks).

I think I just failed to emphasize the filtering part of my basic question. I've revised the data set in the initial problem to more accurately represent the kind of data I'll be getting. To emphasize: for any given data set, across many data sets, I know that most of the data should fall into a single range (what that range is will differ between data sets), but because of measurement tolerances some spurious data will have gotten through my first filter and will sit in other ranges. Thus my idea for a second filter is to lean on histcounts to give a low-computational-cost estimate of the data groups, and to pick the highest-count group (where each group is defined as a contiguous set of histogram bins) as the valid data, treating the rest as junk.
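
The second filter described above could be sketched end to end as follows. This is an illustration under assumptions, not a tested implementation: the names firstBin, lastBin, groupCounts and validData are hypothetical, and the seed is fixed only to make the run repeatable.

```matlab
rng(0);  % fixed seed, only so the sketch is repeatable
data = [randi([0,8000],1,20), randi([140000,260000],1,228)];

[xCounts, xEdges] = histcounts(data);

% Locate runs of adjacent populated bins.
d = diff([0, xCounts > 0, 0]);
firstBin = find(d == 1);
lastBin  = find(d == -1) - 1;

% Total count of each run, taken from a running sum.
c = [0, cumsum(xCounts)];
groupCounts = c(lastBin + 1) - c(firstBin);

% Outer edges of the highest-count run.
[~, g] = max(groupCounts);
lo = xEdges(firstBin(g));
hi = xEdges(lastBin(g) + 1);

% Using <= hi is safe here: the bin after a run is empty, or the run ends at
% the last bin, whose right edge histcounts treats as inclusive.
validData = data(data >= lo & data <= hi);
```

Whether the spurious values end up in a separate group depends on the bin width that histcounts chooses, so the result should be checked against a few real data sets before relying on it.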

I'm currently trying out each of your solutions to see which gets me closer to what I'm actually going for.

Answer 1

Star Strider on 22 Mar 2023

https://kr.mathworks.com/matlabcentral/answers/1933600-using-histcounts-to-determine-loose-data-mode#answer_1198660

I am not certain what you want to do.

The histcounts function has a third output, bin, that gives the index of the bin each element was assigned to. You can use it to pick out the elements of the data that fell into a particular bin.

x = randn(1,25)
x = 1×25
    0.4748    0.0557   -0.4042    1.5613    0.5029    0.0587    0.3570   -0.3454    0.6755   -1.3716    1.1648   -0.1539   -0.8742    0.7570    0.0337   -0.0293    1.1485   -0.7709   -0.2480    0.2435    0.4373    0.4816   -0.2855   -1.2080    0.8539
[xCounts,xEdges,Bin] = histcounts(x,7)
xCounts = 1×7
     2     2     4     5     7     3     2
xEdges = 1×8
   -1.6000   -1.1400   -0.6800   -0.2200    0.2400    0.7000    1.1600    1.6200
Bin = 1×25
     5     4     3     7     5     4     5     3     5     1     7     4     2     6     4     4     6     2     3     5     5     5     3     1     6
[~,idx] = max(xCounts)
idx = 5
AssignedToLargestBin = x(Bin == idx)
AssignedToLargestBin = 1×7
    0.4748    0.5029    0.3570    0.6755    0.2435    0.4373    0.4816
BinsIdx = ismember(Bin,idx+[-1 0 1])
BinsIdx = 1×25 logical array
   1   1   0   0   1   1   1   0   1   0   0   1   0   1   1   1   1   0   0   1   1   1   0   0   1
MaxAdjacentBins = x(BinsIdx)                        % Return Elements Of 'x' From Largest & Two Adjacent Bins
MaxAdjacentBins = 1×15
    0.4748    0.0557    0.5029    0.0587    0.3570    0.6755   -0.1539    0.7570    0.0337   -0.0293    1.1485    0.2435    0.4373    0.4816    0.8539

You can of course set ‘idx’ to be whatever you like, and this stays straightforward if there is more than one index, as illustrated here.
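
The fixed ±1 neighbourhood used above can be extended to a whole run of adjacent populated bins by labelling each bin with a group number. This is a rough sketch on made-up integer data with explicit edges; groupOfBin, groupCounts and keep are hypothetical names, and it assumes every element falls inside the edges, so that Bin contains no zeros.

```matlab
x = [12 15 18 22 27 71 13 24 95];            % hypothetical sample
[xCounts, ~, Bin] = histcounts(x, 0:10:100);  % xCounts = [0 4 3 0 0 0 0 1 0 1]

% Label each populated bin with the number of the run it belongs to (0 = empty bin).
isPop = xCounts > 0;
d = diff([0, isPop, 0]);
groupOfBin = cumsum(d(1:end-1) == 1) .* isPop;   % [0 1 1 0 0 0 0 2 0 3]

% Total count per run, then keep the elements whose bin lies in the largest run.
groupCounts = accumarray(groupOfBin(isPop).', xCounts(isPop).');   % [7; 1; 1]
[~, g] = max(groupCounts);
keep = groupOfBin(Bin) == g;
inLargestGroup = x(keep);                        % [12 15 18 22 27 13 24]
```

Selecting by bin index avoids comparing values against the bin edges.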


Answer 2

the cyclist on 23 Mar 2023

https://kr.mathworks.com/matlabcentral/answers/1933600-using-histcounts-to-determine-loose-data-mode#answer_1199365

I think that you have a case of the XY problem here. Namely, you are asking about your solution to a particular problem, but I suspect there is a more direct way to solve your actual problem.

I'm guessing here, but it seems that you have a data sample, and you want to estimate where the maximal density of that sample is. Is that right?

If that is right, then you either

1) know the functional form of the underlying distribution, OR
2) do not.

Again, I'm guessing, but it seems like you don't.

If both of my guesses are correct, then I would use the ksdensity function to make an empirical estimate of the underlying continuous distribution, and see where the maximum is (using a sufficiently fine grid).

rng default
data = 0.5 + 0.1*randn(1,300); % Using randn instead of rand, so that there is truly a mode
xi = 0 : 0.005 : 1;
data_pdf = ksdensity(data,xi);
figure
hold on
histogram(data,"Normalization","pdf")
plot(xi,data_pdf)
[maxPdf,indexToMax] = max(data_pdf);
xiOfMax = xi(indexToMax)
xiOfMax = 0.5000

Of course, that's a lot of guesswork on my part. But it is typically better to tell us the problem you are trying to solve, in addition to your method.
