kstest - normal?

7 views (last 30 days)
Ian on 31 Mar 2011
Latest comment: the cyclist on 30 Jan 2020
Hi, I am confused from reading the description from the 'kstest' function. Usually '1' means true and '0' means false, and the purpose of this function is to test whether or not a set of data is normally distributed. However, what I gather from reading the description, '0' is returned when the data is normally distributed, and '1' is returned when the data is not normally distributed.
Is this the correct interpretation? The example is also a little confusing:
x = -2:1:4
x =
    -2    -1     0     1     2     3     4
[h,p,k,c] = kstest(x,[],0.05,0)
h =
0
p =
0.13632
k =
0.41277
c =
0.48342
These data are linear, not normally distributed. Yet kstest returns '0', which means it classifies these data as normal. Is this a limitation of kstest with small data samples?
From what I read, the resolution is to use the 'smaller' or 'larger' tag to correct for this problem, but is there any clear cut-off for what counts as 'smaller' and what counts as 'larger'?
Lastly, if I were to use this test in a publication and say that our data were 'normal' (this function returned 0) or failed to be classified as 'normal' (this function returned 1), and I used the 'smaller' or 'larger' tags, how does that change the name of the test? It can't be the same test if it returns different values. How would I explain this?

Accepted Answer

Andrew Newell on 31 Mar 2011
Your example (taken from the documentation) "illustrates the difficulty of testing normality in small samples." If you plot
normplot(x)
you'll see that the deviations from a standard normal distribution occur in the two outer points. It doesn't take a lot more data to get a reasonable result, though:
x = -2:0.5:4;
[h,p,k,c] = kstest(x,[],0.05,0)
h =
1
p =
0.0245
k =
0.3947
c =
0.3614
Keep in mind, too, their comment about the Lilliefors test - it is more likely to be the one you want.
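For reference, here is a minimal sketch of how the Lilliefors test could be applied to the same data (this assumes the Statistics Toolbox; lillietest estimates the mean and variance from the sample rather than assuming a standard normal):
x = -2:0.5:4;
[h,p] = lillietest(x)   % h = 0 means normality is not rejected at the default alpha = 0.05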
2 Comments
the cyclist on 31 Mar 2011
Andrew, I think you meant "normplot(x)" rather than "normpdf(x)" here.
Andrew Newell on 31 Mar 2011
Oops!


More Answers (2)

the cyclist on 31 Mar 2011
Ian,
There are lots of things that need to be addressed here. I'll try to cover as much as I can.
First, in your little example, you only have seven data points. Therefore, the statistical test you are applying has very little power to distinguish between normal and non-normal distributions. Note that if you added even one more point, x=-2:1:5, the K-S test would have rejected the null hypothesis, though. I hope that the real study you are planning to submit has more data than this!
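As a quick sketch of that comparison (using the same older kstest(x,cdf,alpha,tail) syntax as the example in the question; newer releases use name-value pairs such as kstest(x,'Alpha',0.05) instead):
x7 = -2:1:4;              % seven points: the null hypothesis is not rejected
x8 = -2:1:5;              % eight points: the null hypothesis is rejected
h7 = kstest(x7,[],0.05,0)
h8 = kstest(x8,[],0.05,0)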
The test certainly does not "classify these data as normal"! It fails to reject the hypothesis that the data are normally distributed. That's an important distinction. Given this dataset, you should not say your data are normal.
The data [-2 -1 0 1 2 3 4] are not, in and of themselves, "linear". They are seven data points that you just happen to know you generated linearly.
The resolution of this issue is not to use the additional arguments "larger" or "smaller". Those arguments are more related to one's expectation that the distribution being sampled is skewed toward one side or the other of normal. I don't think those are relevant here. (But, the way it would be described, if it were relevant, would be to say you used a one-sided KS test rather than two-sided.)
There are other tests of normality that may also be useful to you: jbtest and lillietest.
I would say that if it is important to distinguish normality, then, sadly, you do not have enough data to do so confidently.
6 Comments
N on 29 Jan 2020
On a side note related to the definition of the tails:
  • when using 'Tail' set to 'smaller', we are testing whether the distribution is left skewed
  • when using 'Tail' set to 'larger', we are testing whether the distribution is right skewed
Is this correct?
the cyclist on 30 Jan 2020
% Set random number seed to default
rng default
% Generate data that is clearly shifted larger than standard normal
% (I'm not sure I would refer to this as "right skewed", but I think this is what you mean.)
N = 1000;
x = randn(N,1) + 5;
% Null hypothesis that the distribution is larger than standard normal is NOT rejected
h_larger = kstest(x,'Tail',"larger")
% Null hypothesis that the distribution is unequal to standard normal IS rejected
h_unequal = kstest(x,'Tail',"unequal")
% Null hypothesis that the distribution is smaller than standard normal IS rejected
h_smaller = kstest(x,'Tail',"smaller")



Matt Tearle on 31 Mar 2011
The output is the more likely hypothesis, not a true/false. Hence, h = 0 means the null hypothesis (H0), which is that the data come from the assumed distribution.
The smaller/larger options are for performing one-sided tests, e.g. if your data came from a normal distribution with positive mean.
Other than that, see Andrew's answer. In particular, look at lillietest and jbtest.
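As an illustration, here is a minimal sketch of jbtest on the example data (assuming the Statistics Toolbox; like lillietest, jbtest does not require you to specify the mean and variance in advance):
x = -2:0.5:4;
[h_jb,p_jb] = jbtest(x)   % h_jb = 0 means normality is not rejected at the default alpha = 0.05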
2 Comments
the cyclist on 31 Mar 2011
h=0 does not mean that the null hypothesis is the more likely hypothesis. It means only that the null hypothesis cannot be rejected at the specified level of confidence.
Matt Tearle on 31 Mar 2011
Yes, but given that it returns a single value 0 or 1, I was trying to find a way to phrase that this return is the "decision" (H0 or H1), rather than a true/false.

