Hello,
I have data for the size of rocks for two independent locations, A and B.
The data list rock sizes from smallest to largest (yellow column).
The red column shows the percentage of rocks under that size (out of 100%).
The green column shows the cumulative percentage (out of 100%).
**A normality test indicates that the rock data (yellow columns) are not normally distributed.**
**When I apply a log transform to the data, they become normally distributed and plot as a straight line.**
*I want to compare rock sizes between the two locations.*
**What is the best way to compare rock sizes (yellow columns) given that these values are thresholds, not actual rock sizes?**

Accepted Answer

Star Strider on 20 Jan 2020

1 vote

If both samples have the same distribution (regardless of what that distribution is), and you are comparing two samples, the ranksum test is likely the most appropriate.

4 Comments

Sarah Yun on 20 Jan 2020
Hi Star,
Thank you.
My confusion comes from the fact that the data are not ACTUAL rock sizes, but some kind of threshold.
Is it possible to find the ACTUAL rock size from this data?
Star Strider on 20 Jan 2020
My pleasure!
The actual rock size data seem to have been discarded. What is left is essentially a histogram, with the bins being the yellow columns and the relative counts being the red columns.
One option is to create a single vector of rock threshold sizes (yellow column) covering both sites (using the linspace function), then use interp1 with that common size vector and the red columns from each site separately to estimate the frequencies of rocks with those common sizes at each site. Do not extrapolate, so that the rock sizes that are not shared by both sites will be NaN for site A. You can then use isnan to eliminate the NaN values from the site A interpolation.
Then use the interpolation results with ranksum. This is likely as close as you can get for derived data.
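The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the numeric values are hypothetical placeholders (the real yellow and red columns were not posted), the variable names are made up, and the grid length of 50 is arbitrary. The ranksum function requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical placeholder values; substitute the real columns.
sizeA = [2 4 8 16 32 64];      % yellow column, site A (size thresholds)
pctA  = [5 10 20 30 25 10];    % red column, site A (percent per size)
sizeB = [4 8 16 32 64 128];    % yellow column, site B
pctB  = [10 15 25 25 15 10];   % red column, site B

% Common size vector spanning both sites (50 points is an arbitrary choice).
sizeCommon = linspace(min([sizeA sizeB]), max([sizeA sizeB]), 50);

% Linear interpolation; by default interp1 returns NaN outside the range
% of the sample points, so nothing is extrapolated.
pctAi = interp1(sizeA, pctA, sizeCommon);
pctBi = interp1(sizeB, pctB, sizeCommon);

% Keep only the sizes covered by both sites.
valid = ~isnan(pctAi) & ~isnan(pctBi);
pctAi = pctAi(valid);
pctBi = pctBi(valid);

% Wilcoxon rank sum test on the interpolated relative frequencies.
[p, h] = ranksum(pctAi, pctBi);
fprintf('p = %.4f, reject H0 at 5%%: %d\n', p, h);
```

Note that this compares the interpolated relative frequencies on the shared size range, not individual rock sizes, since those are no longer available.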
Sarah Yun on 20 Jan 2020
Thank you Star.
Is it possible to just do ranksum on the yellow column (the bins) for both data sets? Or would this violate a rule?
Star Strider on 20 Jan 2020
My pleasure.
It would likely be more appropriate to use it on the red columns, since (as I understand it) those are the relative frequencies of the sizes.
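Applied directly to the red columns, without any interpolation, that would look something like the sketch below. The values and variable names are hypothetical placeholders, not the posted data; the two vectors do not need to be the same length.

```matlab
% Hypothetical placeholder values; substitute the real red columns.
pctA = [5 10 20 30 25 10];       % red column, site A (percent per size bin)
pctB = [10 15 20 25 15 10 5];    % red column, site B (lengths may differ)

% Two-sided Wilcoxon rank sum test (Statistics and Machine Learning Toolbox).
[p, h, stats] = ranksum(pctA, pctB);
fprintf('p = %.4f, h = %d, rank sum statistic = %g\n', p, h, stats.ranksum);
```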
