Extracting data from histogram plots
Hello. I'm trying to process some data from some chemical analyses I did a while ago. I have 3 types of data: particle diameter, nitrogen content (%), and sulfur content (%). I've already managed to organize the particle diameter data into a histogram plot with something like 50 bins. Now, I'd like to figure out the average nitrogen and sulfur content of the particles in each bin. I'm not sure how to do this, though, and I haven't found any obvious tutorials that explain it. Any advice?
Accepted Answer
3 methods to group data and compute mean for each group
Each method deals with empty bins differently.
discretize + splitapply
Use discretize to group each value into the bins used in histogram and then splitapply to compute the mean for each group. Note that each bin must contain at least one data point.
Example: compute the mean of data in bins defined by edges.
rng default % for reproducibility of this demo
data = rand(1,100)*100;
edges = 0:10:100;
binID = discretize(data,edges)
binID = 1×100
9 10 2 10 7 1 3 6 10 10 2 10 10 5 9 2 5 10 8 10 7 1 9 10 7 8 8 4 7 2
a = splitapply(@mean,data,binID)
a = 1×10
5.3838 15.2780 26.0259 35.6310 46.5284 55.4195 66.1338 75.5438 83.3041 94.1885
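If some bins may be empty, one possible workaround is to renumber the occupied bins with findgroups before calling splitapply, then put the results back into a vector with one slot per bin. This is only a sketch under assumptions: the data and edges are made up for the demo, and the names inRange, occupiedBins, groupMeans, and binMeans are illustrative.

```matlab
% Sketch: let splitapply cope with empty bins by renumbering the occupied ones.
rng default                          % for reproducibility of this demo
data  = randn(100,1) + 10;           % values cluster around 10
edges = 0:3:15;                      % 5 bins; the lowest bins are expected to be empty
binID = discretize(data, edges);     % bin index per value (NaN if outside the edges)

inRange = ~isnan(binID);             % ignore values that fall outside the edges
[G, occupiedBins] = findgroups(binID(inRange));   % G runs 1..K with no gaps
groupMeans = splitapply(@mean, data(inRange), G); % one mean per occupied bin

% Put the K means back into a vector with one entry per bin; empty bins stay NaN.
numBins  = numel(edges) - 1;
binMeans = nan(numBins, 1);
binMeans(occupiedBins) = groupMeans;
```

Here NaN, rather than 0, marks an empty bin, so an empty bin cannot be confused with a bin whose mean really is 0.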
discretize + groupsummary
Use discretize to group each value into the bins and then groupsummary to compute the mean of each group. When working with vectors, the first two arguments must be column vectors.
Note that the output vector skips empty bins. See additional outputs to groupsummary to identify which bins are represented in the first output.
s = groupsummary(data(:),binID(:),'mean')
s = 10×1
5.3838
15.2780
26.0259
35.6310
46.5284
55.4195
66.1338
75.5438
83.3041
94.1885
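As a sketch of how the additional output might be used, the second output of groupsummary can map each returned mean back to its bin number. The data and edges below are made up for the demo, the variable names are illustrative, and the code assumes that the second output is a numeric column vector when there is a single grouping vector.

```matlab
% Sketch: use the second output of groupsummary to see which bins are represented.
rng default                          % for reproducibility of this demo
data  = randn(100,1) + 10;
edges = 0:3:15;                      % 5 bins; the lowest bins are expected to be empty
binID = discretize(data, edges);

% s holds one mean per represented group; occupiedBins holds the matching bin numbers.
[s, occupiedBins] = groupsummary(data, binID, 'mean');

% Expand to one entry per bin, leaving NaN where a bin has no data.
numBins  = numel(edges) - 1;
binMeans = nan(numBins, 1);
keep = ~isnan(occupiedBins);         % a NaN group can appear if values fall outside the edges
binMeans(occupiedBins(keep)) = s(keep);
```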
discretize + accumarray
Use discretize to group each value into the bins and then accumarray to compute the mean of all bins.
Note that empty bins are represented by a 0.
m = accumarray(binID(:),data,[],@mean)
m = 10×1
5.3838
15.2780
26.0259
35.6310
46.5284
55.4195
66.1338
75.5438
83.3041
94.1885
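When a mean of 0 is a legitimate result, a 0 for empty bins can be ambiguous. A possible variation, sketched here with made-up data and illustrative names, passes a size and a fill value to accumarray so that empty bins come back as NaN and the output always has one row per bin.

```matlab
% Sketch: accumarray with an explicit output size and NaN as the fill value.
rng default                          % for reproducibility of this demo
data  = randn(100,1) + 10;
edges = 0:3:15;                      % 5 bins; the lowest bins are expected to be empty
binID = discretize(data, edges);

numBins = numel(edges) - 1;
% The third argument fixes the output at one row per bin;
% the fifth argument replaces the default fill value of 0.
m = accumarray(binID, data, [numBins 1], @mean, NaN);
```

Note that accumarray needs positive integer subscripts, so values outside the edges (which discretize marks as NaN) would have to be removed first.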
Comparison of these methods when some bins are empty
data = randn(100,1)+10; % expected range: ~6 : ~13
edges = 0:3:15; % 5 bins but the first two will be empty
binID = discretize(data, edges);
m = accumarray(binID,data,[],@mean)
m = 5×1
0
0
8.4699
10.1766
12.7170
s = groupsummary(data,binID(:),'mean')
s = 3×1
8.4699
10.1766
12.7170
a = splitapply(@mean,data,binID)
Error using splitapply
For N groups, every integer between 1 and N must occur at least once in the vector of group numbers.
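Applied to the original question, the idea is to bin by diameter and average the other two columns with the same bin numbers. The following is only a sketch with hypothetical stand-in data: diameter, nitrogen, sulfur, and the edges are invented here, and in practice the edges would be the ones used for the histogram.

```matlab
% Hypothetical stand-ins for the measured columns (one row per particle).
rng default                               % for reproducibility of this demo
diameter = 0.1 + 7.9*rand(200,1);         % microns
nitrogen = max(0, 5*randn(200,1));        % percent; many particles have 0
sulfur   = max(0, 3*randn(200,1));        % percent; many particles have 0

edges   = linspace(0.1, 8.0, 51);         % 50 bins (assumed; reuse the histogram's edges)
numBins = numel(edges) - 1;
binID   = discretize(diameter, edges);    % bins are defined by diameter only

inRange = ~isnan(binID);                  % drop particles outside the edges
% Average each composition column within the diameter bins; empty bins give NaN.
meanN = accumarray(binID(inRange), nitrogen(inRange), [numBins 1], @mean, NaN);
meanS = accumarray(binID(inRange), sulfur(inRange),   [numBins 1], @mean, NaN);
binCount = accumarray(binID(inRange), 1, [numBins 1]);  % particles per bin
```

Zeros in the nitrogen or sulfur columns are ordinary data values and are averaged like any other value. Only the group numbers (binID) have to be positive integers.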
Comments: 7
Haley Royer
March 10, 2023
Is there any way to discretize based on a logarithmic scale? My data is organized so that the smallest bins have the smallest size (e.g., bin 1 ranges from 0.1 to 0.175 microns) and the largest bins have the largest size (e.g., bin 60 ranges from 7.499 to 8.058 microns). Each bin size is different and increases in size from the smallest to the largest bin.
Adam Danz
March 10, 2023
Yes, the second input argument for discretize is a vector of bin edges which you can define any way you'd like. If you're happy with your histogram bin edges, you can use the same edges to discretize your data.
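For illustration, logarithmically spaced edges could be built with logspace and passed straight to discretize. This is a sketch under assumptions: the diameters are randomly generated, and the edges are only an approximation built from the range mentioned above, so they will not necessarily match the instrument's actual bins.

```matlab
% Hypothetical diameters in microns, spread over roughly 0.1 to 8.058.
rng default                               % for reproducibility of this demo
loExp = log10(0.1);
hiExp = log10(8.058);
diameter = 10.^(loExp + (hiExp - loExp)*rand(500,1));

% 60 logarithmically spaced bins need 61 edges; bin width grows with diameter.
edges = logspace(loExp, hiExp, 61);
binID = discretize(diameter, edges);      % bin number (1..60) for each particle

% If a histogram already exists, its edges can be reused instead:
% h = histogram(diameter, edges);
% edges = h.BinEdges;
```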
Torsten
March 10, 2023
If you had read the documentation of "discretize", you would know that the parameter "edges" is exactly what you are asking for.
Haley Royer
March 10, 2023
I already read the discretize documentation. I just missed the edges parameter. No need to be a dick about it.
Haley Royer
March 10, 2023
Thank you, Adam. That's helpful advice. I think I've got what I need now. I appreciate the guidance.
Haley Royer
March 11, 2023
Hi again. I've run into another issue. When trying to use splitapply, I get the following error:
"Group numbers must be a vector of positive integers, and cannot be a sparse vector."
My understanding is that because I have values in my column that are zero, splitapply cannot be used. Some of the particles I'm looking at don't have nitrogen or sulfur, but I still have to average a group such as 0 0 0 5.0 2.5. Any way to get around this?
Adam Danz
March 11, 2023
Let's keep it civil here.
As you mentioned, if one of the bins has no values, then splitapply won't work.
I'll add alternatives to my answer.
