How to fix this boxplot? My data has many zero values, and I want to exclude them from my analysis.
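% Flatten each 2-D slice of wint2_r into one column of w_r
w_r = zeros(size(wint2_r,1)*size(wint2_r,2), size(wint2_r,3));  % preallocate
for k = 1:size(wint2_r,3)
    wn_r = wint2_r(:,:,k);
    w_r(:,k) = wn_r(:);
end
figure(2);
clf;
subplot(2,1,1)
boxplot(w_r)              % all values, zeros included
subplot(2,1,2)
w_r(w_r == 0) = NaN;      % mark zeros as missing; boxplot ignores NaN
boxplot(w_r)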
3 Comments
dpb
19 May 2016
There aren't any rewards for seeing how little information you can post, Sofie. Give us some context here and put the fully-explained question in the question, don't try to ask the question in the title.
The short answer would be to use the second option above, except that instead of setting the values to NaN, simply remove them entirely.
w_r(w_r == 0) = [];
That, of course, changes the dataset drastically and may produce something that looks nice on a plot but has no meaning--only you can determine whether that makes any sense to do. At least one question to answer in that regard: if the dataset has so many zeros, why is that? Are they really zero, or no response, or what?
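One caveat on the one-liner above: when w_r is a matrix, deleting elements by logical index collapses it into a row vector, so the per-column grouping is lost. A minimal sketch of one way to drop the zeros and keep the groups, using a made-up matrix w_demo (hypothetical, not the poster's data):

```matlab
% Hypothetical stand-in for w_r: 4 samples in each of 3 groups, many zeros.
w_demo = [0 0 1.5; 0 2 0; 3 0 0; 0 4 2.5];

% Deleting by logical index collapses the matrix into a row vector,
% so the column (group) membership is lost.
flat = w_demo;
flat(flat == 0) = [];
disp(size(flat))

% Keep the grouping instead: collect the nonzero values and their columns.
keep = (w_demo ~= 0);
[~, grp] = find(keep);      % column index of each kept value (column-major order)
vals = w_demo(keep);        % kept values, in the same column-major order

figure;
boxplot(vals, grp)          % one box per original column, zeros excluded
```

Since boxplot treats NaN as missing, this should give the same boxes as the NaN version in the question; the grouping-variable form just makes the removal explicit.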
Answers (1)
dpb
21 May 2016
Edited: dpb
21 May 2016
OK, on reflection I believe my previous comment actually is an Answer -- so deleted it and moved it with some refinements here.
The question of trying to force a plot to have a given appearance by arbitrarily removing values from the dataset is simply, in my opinion, misguided. Even if it were to look the way you thought it should, what would it mean since the data that made the plot are no longer the actual data?
Since the box edges are the 25th and 75th percentiles, respectively, if there are so many repeated values at the low end (zero or any other) that they comprise 75% or more of the total, then the correct "answer" for the boxplot is as shown: the median and the outliers. The two percentile points coincide at a single location and so can't be shown as a box.
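To see the collapse numerically, here is a small sketch with a made-up sample x (80 zeros and 20 positive values; the numbers are purely illustrative). It uses prctile, which, like boxplot, comes from the Statistics and Machine Learning Toolbox:

```matlab
% Made-up sample: 80 zeros and 20 positive values (80% zeros).
x = [zeros(80,1); (1:20)'];

q = prctile(x, [25 50 75]);   % lower quartile, median, upper quartile
fracZero = mean(x == 0);      % fraction of the sample that is exactly zero

fprintf('fraction of zeros: %.2f\n', fracZero);
fprintf('Q1 = %g, median = %g, Q3 = %g\n', q);
```

With this sample all three quartiles land on zero, so the box has no height and every nonzero value is drawn as an outlier.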
You're misreading the second plot; there are no(*) negative values, the bottom whisker appears to be at identically zero and the median is somewhere in the neighborhood of 2. The y-axis has been scaled to leave a little visual space below the origin so the bottom whiskers don't lie on the bounding box, as they would if the lower axis limit were zero.
(*) There may be a negative value in the second bar as an outlier; perhaps there's roundoff present in the original dataset and there is one (or a few) very small negative value(s)? What does min(x) return for that case?
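A quick way to run that check per column is sketched below. The matrix w_demo is hypothetical (zeros already replaced by NaN, with one tiny negative value planted on purpose); substitute the real w_r:

```matlab
% Hypothetical data: zeros already set to NaN, one tiny negative value planted.
w_demo = [NaN 0.8; 2.1 -1e-12; 1.7 NaN];

colMin = min(w_demo, [], 1);   % per-column minimum; min skips NaN by default
nNeg   = sum(w_demo < 0, 1);   % number of negative entries in each column

disp(colMin)
disp(nNeg)
```

A nonzero count in nNeg would confirm that the apparent negative outlier is real data rather than an artifact of the axis scaling.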
In summary, it looks to me like the boxplot function is working precisely as advertised; the question is what the data themselves mean. For that question, we have no information here; only you can judge, as noted before, since the meaning depends upon just what the values represent and how/why there are so many zero responses. I still suspect that removing those values simply to get a plot that looks "pretty" or as expected is not the right answer.
2 Comments
dpb
24 May 2016
Edited: dpb
24 May 2016
Well, we can't know that going in; you've got to tell us enough context so we don't go down these blind alleys.
But, in my mind that raises the question of what distinguishes '0' as "no ice" from '0.0' as "no ice motion"? There's got to be a limit of resolution for the measurement, so one presumes there must be at least some locations that are stable?
So, what was the result of min for the given case 2 for all the data? With this information on what the data represent, is it not possible that the motion could be a retrograde one and, therefore, negative? Seems plausible to me, but other than the name I don't know anything about which ice or motion relative to what, etc., so it is still conjecture.
Does the Answer not satisfy the question given the caveat that you should simply select the non-zero data? Is not then the appearance of zero at the bottom of the whisker simply a manifestation of there being nonzero values small enough to be indistinguishable at the resolution of the graph?
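One way to test that last point is to compare the smallest nonzero magnitude with the range the plot has to cover. A rough sketch on made-up values (w_demo is hypothetical, not the ice-motion data):

```matlab
% Made-up values: mostly zeros, a few small and a few large nonzero entries.
w_demo = [0 0.003 0; 1.9 0 2.4; 0 2.2 0.001];

nz       = w_demo(w_demo ~= 0);   % nonzero values only
smallest = min(abs(nz));          % smallest nonzero magnitude
span     = max(nz) - min(nz);     % range the y-axis must cover

fprintf('smallest nonzero magnitude: %g\n', smallest);
fprintf('relative to plotted range:  %g\n', smallest / span);
```

If that ratio is tiny, a whisker ending at the smallest nonzero value will be drawn visually on top of zero even though no zeros remain in the data.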