Calculating the probability of a data point in a histogram

Curious Mind on 7 May 2018
Answered: Steven Lord on 8 May 2018
Hello:
The image below is a histogram of a large data set (90×1 double, in blue) and a single data point (in red). I would like to compute the probability of the red data point relative to the blue data. I could add up the counts to the left of the red bar and divide by the total count (90), but I want MATLAB code that does this more efficiently, probably without even using the histogram. Thank you.

Accepted Answer

Steven Lord on 8 May 2018
Change the Normalization property of the histogram object then get the appropriate element of the Values property of that object.
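rng default              % reset the random number generator so the results are reproducible
x = randn(10000,1);      % 10000 standard normal samples
h = histogram(x)         % no semicolon, so the histogram object's properties are displayed
h.Values(10)             % count in bin 10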
Since the default Normalization method is 'count', this will tell you that there are 133 elements of x that fall into bin 10. [Since I used rng default, you should get the exact same random numbers in x as I did and so generate the exact same histogram.]
h.Normalization = 'probability';
h.Values(10)
Now h.Values(10) is 0.0133, which makes sense: 133 / 10000 (the total number of points) = 0.0133.
If you wanted to get the same information without actually bringing up the plot, the histcounts function also lets you specify a 'Normalization' method.
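A minimal sketch of the histcounts route, reusing the same x as above. Bin 10 is picked only to mirror the example; histcounts chooses its own bin edges here, and the sketch does not depend on them matching the edges the plotted histogram used.

```matlab
rng default
x = randn(10000,1);

% Bin probabilities without creating a figure
[p, edges] = histcounts(x, 'Normalization', 'probability');

% Raw counts on the same edges, for comparison
counts = histcounts(x, edges);

p(10)                      % fraction of x that falls in bin 10
counts(10) / numel(x)      % same number, computed by hand
sum(p)                     % probabilities over all bins add up to 1 (up to rounding)
```

The second output, edges, is worth keeping: it tells you which interval each entry of p refers to.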
And I'd guess that histogram you showed was created with something more like 900 data points than 90. According to the Y limits, each of the 5 central bars contains more than 90 elements, assuming you're using the default 'count' Normalization. Still not Big Data, but bigger.

More Answers (1)

Image Analyst on 7 May 2018
You need to know the edges of the bin, e1 and e2. Then you can simply do
percentageInBin = sum(data>=e1 & data < e2) / numel(data);
No histogram needed if you just need it for that one red bin.
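Here is a rough sketch of that idea next to the "count everything to the left of the red bar" version from the question. The original 90 points and the red value are not available, so data, xq, e1 and e2 below are made-up stand-ins. Note that the result is a fraction between 0 and 1; multiply by 100 if you actually want a percentage.

```matlab
rng default
data = randn(90,1);      % stand-in for the 90x1 data set (assumed; the original is not given)
xq   = 0.5;              % stand-in for the red data point (assumed value)

% Fraction of data in one bin [e1, e2), same logic as the one-liner above
e1 = 0; e2 = 1;          % hypothetical bin edges
fracInBin = sum(data >= e1 & data < e2) / numel(data);

% Fraction of data at or below xq: the "count everything to the left" idea,
% i.e. the empirical CDF evaluated at xq
fracLeft = mean(data <= xq);
```

Which of the two numbers you want depends on what "probability of the data point" is supposed to mean: the share of the data in the same bin, or the share of the data that is no larger than the point.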
By the way, it made me snicker when you described 90 elements as large. It literally would have to be around a million times that big before anyone might start considering it large.
  3 Comments
Curious Mind on 8 May 2018
Also, if I have, say, a 20×1 double vector dataM, can I get the probabilities of all the rows in dataM at once against the data with 90 elements?
Image Analyst on 8 May 2018
Just the bar in red.
To do it without explicitly computing a histogram array, you'd have to do it one bin at a time. Much better to simply get the histogram and divide the counts array by the total counts. Why can't you compute the histogram?
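For the 20-point follow-up, one possible sketch is to bin the reference data once and then look up every query point in a single call. The data and dataM vectors here are random stand-ins for the real ones, and using discretize for the lookup is my assumption about a convenient way to do it, not something either answer prescribes.

```matlab
rng default
data  = randn(90,1);     % stand-in for the 90x1 reference data (assumed)
dataM = randn(20,1);     % stand-in for the 20x1 query points (assumed)

% Bin the reference data once and normalize counts to probabilities
[p, edges] = histcounts(data, 'Normalization', 'probability');

% Bin index of every query point; NaN if it lies outside the binned range
idx = discretize(dataM, edges);

% Probability of the bin each query point lands in (0 if outside all bins)
pBin = zeros(size(dataM));
inRange = ~isnan(idx);
pBin(inRange) = p(idx(inRange));

% Fraction of reference data at or below each query point, all at once.
% data.' is 1x90 and dataM is 20x1, so the comparison expands to 20x90
% (implicit expansion, R2016b or later).
pLeft = mean(data.' <= dataM, 2);
```

pBin answers "how likely is the bin this point falls in", while pLeft answers "what share of the data lies to the left of this point"; both come out as 20×1 vectors with one entry per row of dataM.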
