How to check for data normality using kstest?

Views: 111 (last 30 days)
DANIEL KONG LEN HAO on 16 September 2021
Commented: Rik on 18 September 2021
Suppose I have a data set of about 100 numbers, as listed below. How do I properly determine whether or not this data set is normally distributed using kstest()? The description mentions subtracting the mean and then dividing by the standard deviation before passing the data to kstest(), but do I need to do that in this case?
Dataset = [64 66 80 66 76 55 57 72 76 68 81 70 82 80 71 74 83 80 76 78 72 74 76 65 61 75 68 80 88 73 76 71 70 74 70 76 66 72 80 75 81 82 84 86 71 82 77 78 80 78 88 77 73 72 74 68 75 62 65 71 72 75 72 75 76 73 81 71 61 61 71 81 73 67 77 77 80 57 70 73 80 75 70 75 74 70 68 80 85 81 71 80 80 78 75 75 80 76 82 75 57];
PS: I'm only testing whether the data is normal. I must use kstest for this.

Accepted Answer

Rik on 16 September 2021
If you want to test if your data is from a standard normal distribution you should not change it before calling kstest.
If you want to test if your data is normally distributed (but not necessarily from the standard normal distribution), you will first have to normalize it by subtracting the mean and dividing by the standard deviation.
Which of the two is relevant for your case depends on your context. I'm guessing you want the second one, otherwise you don't need the test.
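For example, a minimal sketch of the second case (using the variable name Dataset from the question; kstest is part of the Statistics and Machine Learning Toolbox):
% Standardize the sample, then test it against the standard normal N(0,1).
z = (Dataset - mean(Dataset)) / std(Dataset);  % subtract the mean, divide by the standard deviation
[h, p] = kstest(z)                             % h = 0 => normality is not rejected at the 5% level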
2 Comments
DANIEL KONG LEN HAO on 18 September 2021
Alright, thank you! I was looking for normal distribution alone. One more question: does a smaller p-value (probability) in the ks-test mean the data is more likely or less likely to follow a normal distribution?
Rik on 18 September 2021
That is easy to determine: since your data is absolutely not from a standard normal distribution, you can feed it your unaltered data and see the result. You can also read the documentation:
help kstest
KSTEST Single sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.
    H = KSTEST(X) performs a Kolmogorov-Smirnov (K-S) test to determine if a
    random sample X could have come from a standard normal distribution, N(0,1).
    H indicates the result of the hypothesis test:
       H = 0 => Do not reject the null hypothesis at the 5% significance level.
       H = 1 => Reject the null hypothesis at the 5% significance level.

    X is a vector representing a random sample from some underlying
    distribution, with cumulative distribution function F. Missing
    observations in X, indicated by NaNs (Not-a-Number), are ignored.

    [H,P] = KSTEST(...) also returns the asymptotic P-value P.

    [H,P,KSSTAT] = KSTEST(...) also returns the K-S test statistic KSSTAT
    defined above for the test type indicated by TAIL.

    [H,P,KSSTAT,CV] = KSTEST(...) returns the critical value of the test CV.

    [...] = KSTEST(X,'PARAM1',val1,'PARAM2',val2,...) specifies one or more of
    the following name/value pairs:

       Parameter   Value
       'alpha'     A value ALPHA between 0 and 1 specifying the significance
                   level. Default is 0.05 for 5% significance.
       'CDF'       CDF is the c.d.f. under the null hypothesis. It can be
                   specified either as a ProbabilityDistribution object or as
                   a two-column matrix. Default is the standard normal, N(0,1).
       'Tail'      A string indicating the type of test. The one-sample K-S
                   test tests the null hypothesis that F = CDF (that is,
                   F(x) = CDF(x) for all x) against the alternative specified
                   by TAIL:
                      'unequal' -- "F not equal to CDF" (two-sided test) (Default)
                      'larger'  -- "F > CDF" (one-sided test)
                      'smaller' -- "F < CDF" (one-sided test)

    Let S(X) be the empirical c.d.f. estimated from the sample vector X, F(X)
    be the corresponding true (but unknown) population c.d.f., and CDF be the
    known input c.d.f. specified under the null hypothesis. For TAIL =
    'unequal', 'larger', and 'smaller', the test statistics are
    max|S(X) - CDF(X)|, max[S(X) - CDF(X)], and max[CDF(X) - S(X)], respectively.

    In the matrix version of CDF, column 1 contains the x-axis data and
    column 2 the corresponding y-axis c.d.f data. Since the K-S test statistic
    will occur at one of the observations in X, the calculation is most
    efficient when CDF is only specified at the observations in X. When
    column 1 of CDF represents x-axis points independent of X, CDF is
    're-sampled' at the observations found in the vector X via interpolation.
    In this case, the interval along the x-axis (the column 1 spread of CDF)
    must span the observations in X for successful interpolation.

    The decision to reject the null hypothesis is based on comparing the
    p-value P with ALPHA, not by comparing the statistic KSSTAT with the
    critical value CV. CV is computed separately using an approximate formula
    or by interpolation in a table. The formula and table cover the range
    0.01<=ALPHA<=0.2 for two-sided tests and 0.005<=ALPHA<=0.1 for one-sided
    tests. CV is returned as NaN if ALPHA is outside this range. Since CV is
    approximate, a comparison of KSSTAT with CV may occasionally lead to a
    different conclusion than a comparison of P with ALPHA.

    See also KSTEST2, LILLIETEST, CDFPLOT.

    Documentation for kstest
       doc kstest
[h,p]=kstest([64 66 80 66 76 55 57 72 76 68 81 70 82 80 71 74 83 80 76 78 72 74 76 65 61 75 68 80 88 73 76 71 70 74 70 76 66 72 80 75 81 82 84 86 71 82 77 78 80 78 88 77 73 72 74 68 75 62 65 71 72 75 72 75 76 73 81 71 61 61 71 81 73 67 77 77 80 57 70 73 80 75 70 75 74 70 68 80 85 81 71 80 80 78 75 75 80 76 82 75 57])
h = logical
1
p = 3.2646e-90
So you can see your answer here: a small p-value means it is less likely that the data come from a normal distribution.
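As a follow-up sketch (based on the 'CDF' name/value pair documented in the help text above, not something from the original thread), the same normality question can also be posed without standardizing the data, by fitting a normal distribution to the sample and passing it as the null-hypothesis c.d.f.:
% Fit a normal distribution to the sample and use it as the c.d.f. under the
% null hypothesis; fitdist expects a column vector, hence Dataset(:).
pd = fitdist(Dataset(:), 'Normal');   % estimates mu and sigma from the data
[h, p] = kstest(Dataset, 'CDF', pd)   % compare p against the default alpha of 0.05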


More Answers (0)

Release

R2018b
