Are the Skewness and Kurtosis Useful Statistics?

Thanks so much for reading our publication. We hope you find it informative and useful. Happy charting and may the data always support your position.


Dr. Bill McNeese
BPI Consulting, LLC

Comments (31)

  • bill February 28, 2016 Reply

    Below is the e-mail Dr. Westfall sent concerning the description of kurtosis as a measure of peakedness. It is printed with his permission. It led to the rewriting of the article to remove the peakedness definition of kurtosis.

    Thank you for making your information publicly available. I often point students to the internet for supplemental information, and some of yours is valuable. However, your description of kurtosis as “essentially useless for SPC” misses the point by a wide mark.

    Kurtosis has nothing to do with “peakedness”. It is a measure of outliers (special, rather than common, causes of variation, in Deming’s terms), and a large part of SPC is about identifying them and correcting the special causes when possible. Thus, if you see a large kurtosis statistic, you know you have a quality control problem that warrants further investigation.

    Here is a simple explanation showing why kurtosis measures outliers and not “peakedness”.

    Consider the following data set: 0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 1

    Kurtosis is the expected value of the (z-values)^4. Here are the (z-values)^4: 6.51, 0.30, 5.33, 0.45, 0.00, 0.30, 6.51, 0.00, 0.45, 0.30, 0.00, 6.51, 0.00, 0.00, 0.30, 0.00, 27.90, 0.00, 0.30, 0.45

    The average is 2.78, and that is an estimate of the kurtosis. (Subtract 3 if you want excess kurtosis.)

    Now, replace the last data value with 999 so it becomes an outlier: 0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 999

    Now, here are the (z-values)^4: 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 360.98

    The average is 18.05, and that is an estimate of the kurtosis. (Subtract 3 if you want excess kurtosis.)

    Clearly, only the outlier(s) matter. Nothing about the “peak” or the data near the middle matters. Further, it is clear that kurtosis has very positive implications for SPC in its detection of outliers.

    Here is a paper that elaborates: Westfall, P.H. (2014). Kurtosis as Peakedness, 1905 – 2014. R.I.P. The American Statistician, 68, 191–195.

    May I suggest that you either modify or remove your description of kurtosis. It does a disservice to consumers and users of statistics, and ultimately harms your own business because it presents information that is completely off the mark as factual.

    Many thanks,

    Peter Westfall
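
The arithmetic in the e-mail above can be reproduced with a short sketch. It assumes the z-values use the population standard deviation (dividing by n), which is the choice that matches the quoted figures; the function name `kurtosis_from_z4` is made up for this illustration.

```python
def kurtosis_from_z4(data):
    """Return the (z-value)^4 list and its average, using divide-by-n variance."""
    n = len(data)
    mean = sum(data) / n
    # Assumption: population variance (divide by n), which matches the e-mail's numbers.
    var = sum((x - mean) ** 2 for x in data) / n
    z4 = [((x - mean) ** 2 / var) ** 2 for x in data]
    return z4, sum(z4) / n


original = [0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 1]
with_outlier = original[:-1] + [999]  # replace the last value with 999

z4_a, kurt_a = kurtosis_from_z4(original)
z4_b, kurt_b = kurtosis_from_z4(with_outlier)

print(round(kurt_a, 2))  # 2.78
print(round(kurt_b, 2))  # 18.05
# Fraction of the summed (z-values)^4 that comes from the single outlier
print(z4_b[-1] / sum(z4_b))
```

In the second data set almost all of the sum comes from the one outlier, which is the point the e-mail makes.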

    • Jackson October 9, 2018 Reply

      Excellent way of explaining, and a nice article for getting information on the topic of my presentation, which I am going to deliver at an institution of higher education.

  • Ad van der ven April 4, 2016 Reply

    I have many samples, let us say 500, with say 50 cases within each sample. For each sample I compute the skewness and kurtosis based on the 50 observations. In the scatter plot of sample skewness against sample kurtosis (500 data points) I observe a curved cloud of points. When I used 500 simulated data sets, each with 50 measurements generated from an exponential distribution, I again found the curved cloud of points. Theoretically, however, the skewness is equal to 2 and the kurtosis equal to 6. Can you elaborate on this? My e-mail address is ………
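
The simulation described in this comment can be sketched as follows. The seed, the rate of the exponential distribution, and the helper name `skew_kurt` are assumptions made for illustration, and the divide-by-n moment formulas may differ from whatever software the commenter used. One thing that likely contributes to the curved cloud is a known inequality: for any data set, the moment-based excess kurtosis is at least the squared skewness minus 2, so the points cannot fall below that parabola. Small samples from a skewed distribution also tend to give estimates below the theoretical 2 and 6 on average.

```python
import random


def skew_kurt(sample):
    """Moment-based skewness and excess kurtosis (divide-by-n moments)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(0)  # arbitrary seed, only for repeatability
# 500 samples of 50 exponential observations each
results = [skew_kurt([rng.expovariate(1.0) for _ in range(50)]) for _ in range(500)]

mean_skew = sum(s for s, _ in results) / len(results)
mean_kurt = sum(k for _, k in results) / len(results)
print(mean_skew, mean_kurt)

# Every (skewness, excess kurtosis) pair sits on or above the parabola k = s^2 - 2
print(all(k >= s * s - 2 - 1e-9 for s, k in results))
```

Plotting `results` as a scatter plot should show the cloud hugging that parabola from above.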

  • Pavan July 19, 2016 Reply

    A very informative and insightful article. But there is one small typo, I think. The description of Figure 3 says: “Figure 3 is an example of dataset with negative skewness. It is the mirror image essentially of Figure 2. The skewness is -0.514. In this case, Sbelow is larger than Sabove. The right-hand tail will typically be longer than the left-hand tail.” The bold part (the last sentence) should be “The left-hand tail will typically be longer than the right-hand tail.” Please correct me if I am wrong. Thanks, Pavan

    • bill July 20, 2016 Reply

      Thanks Pavan. You are correct. I fixed the typo.

  • sotha August 10, 2016 Reply

    Shouldn't the kurtosis for a normal distribution be 3? And skewness is 0…

    • bill August 10, 2016 Reply

      Please see the equation for a4 above. It will give 3 for a normal distribution. But many software packages (including Excel) use the formula below it, which subtracts 3 and so gives 0 for a normal distribution.
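
The two conventions in this reply can be compared with a small sketch. It assumes that a4 is the usual fourth standardized moment and that the second formula is the bias-adjusted one Excel documents for its KURT function; the function names are made up for this illustration.

```python
import math
import random


def a4(data):
    """Fourth standardized moment; close to 3 for normally distributed data."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2


def excel_style_kurt(data):
    """Bias-adjusted excess kurtosis (the formula documented for Excel's KURT)."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample std dev
    z4_sum = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return scale * z4_sum - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


rng = random.Random(0)  # arbitrary seed
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
a4_normal = a4(normal_sample)
kurt_normal = excel_style_kurt(normal_sample)
print(a4_normal)    # close to 3
print(kurt_normal)  # close to 0
```

For small samples the two formulas differ by more than the constant 3, because the second one also applies sample-size corrections.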

  • Anita December 1, 2016 Reply

    Please, I need your help. I'm doing project work on skewness and kurtosis and their applications. Could you please help me with some of the areas of application of skewness and kurtosis, and also with the scope and delimitations of such a study? Thanks.

    • bill December 1, 2016 Reply

      Hello Anita,
      I am not sure what you are asking. You can find applications by searching the internet. For example, they are used by some stock traders to help determine when to sell or buy stocks. Please e-mail me at [email protected] if you need more.

  • Kennedy Best December 5, 2016 Reply

    1. Questions: What does the little i mean in the variable Xi?
    2. Impressive: I thought the overall article was well-written and had good examples.
    3. Needs Improvement: It would be helpful to have simpler problems as a basis for each example and for each skewness and kurtosis topic.

    • bill December 6, 2016 Reply

      Thanks for the comment. The little i simply denotes the ith result.

  • Thomas Stevenson January 3, 2017 Reply

    Your descriptions of Figures 4 and 5 seem backward. In Figure 4 the far tails (m=60, m=140) have the same weight as the central region (m=100). Wouldn't that be heavy tailed? Likewise for Figure 5, the tail region is short relative to the central region (i.e. light rather than heavy).

    • bill January 3, 2017 Reply

      Heavy or light has to do with the tails. The uniform distribution in Figure 4 has no tails. It is "light" in tails. The other has long tails, so it is heavy in tails.

      • Thomas Stevenson January 4, 2017 Reply

        Maybe broad or tight would be better descriptors as heavy and light imply high and low frequency at least in my mind.

        • bill January 4, 2017 Reply

          I would agree with those descriptors.

  • Roh August 28, 2017 Reply

    From Figure 8, the kurtosis seems to converge somewhat to its 'true' value as the number of data points increases. However, in my empirical tests, the kurtosis simply keeps increasing with the number of data points, going beyond the 'true' kurtosis as well. What could be the reason for this? I don't find it intuitive. Thanks.

    • bill August 29, 2017 Reply

      n is the sample size. As it increases, the kurtosis will approach that of the normal distribution, 0 or 3 depending on what equation you use. How are you doing your empirical testing?
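
The behavior described in this reply can be checked with a minimal sketch, assuming the data really are drawn from a normal distribution and using the excess (subtract 3) convention. The seed and the list of sample sizes are arbitrary choices for illustration.

```python
import random


def excess_kurtosis(sample):
    """Moment-based excess kurtosis (fourth standardized moment minus 3)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(1)  # arbitrary seed
sizes = [10, 100, 1_000, 10_000, 100_000]
by_size = {n: excess_kurtosis([rng.gauss(0.0, 1.0) for _ in range(n)]) for n in sizes}

for n, k in by_size.items():
    print(n, round(k, 3))
```

The values for small n scatter widely around 0 and settle close to 0 only for the largest samples.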

  • Yu-Cheng March 7, 2018 Reply

    One small typo in the description of Figure 1: "there are 3 65’s, 6 65’s". It should be "there are 3 65’s, 6 75’s".

    • bill March 7, 2018 Reply

      Corrected. Thanks for letting me know.

  • Peter Westfall November 7, 2018 Reply

    Thanks for revising the information about kurtosis. There are still a couple of small issues that should be addressed, though.

    1. The graph showing "high kurtosis" is misleading in the way that it presents "heavy tails". The graph actually looks similar to a .5*beta(.5,1) + .5*(-beta(.5,1)) distribution, which has light tails (bounded between -1 and 1), negative excess kurtosis, but an infinite peak. For a better example, consider simulating data from a T(5) distribution and drawing the histogram. There, the positive kurtosis more correctly appears as the presence of occasional outliers. The "heavy tailedness" of kurtosis is actually hard to see in a histogram because, despite the fact that the tails are heavy, they are still close to 0 and hence difficult to see. A better way to demonstrate the tailedness of high kurtosis is to use a normal q-q plot, which makes the heavy tails very easy to see.

    2. The argument that the kurtosis is not a good estimate of the "population" (or "process") parameters is true, but it is not a compelling argument against using the statistic for quality control or SPC. A high kurtosis alerts you to the presence of outlier(s), commonly known as out-of-control conditions, possibly indicating special causes of variation at work. Of course, such cases should be followed up by a plot of some sort, but just the fact that the kurtosis indicates such a condition tells you that it is indeed useful and applicable for SPC. There is no need for the "population" framework here, as Deming would agree, considering that this is an analytic (not enumerative) study. So the argument that kurtosis is not useful for SPC is overstated at best, and not supportable at worst.

    Peter Westfall
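
The suggestion in point 1 can be sketched with the standard library alone. This is an illustration under stated assumptions: the sample size, the seed, and the helper `t5` (which builds a T(5) draw as a standard normal divided by the square root of a chi-square with 5 degrees of freedom over 5) are choices made here, not part of the comment. The q-q points pair each sorted, standardized observation with the standard normal quantile at the same plotting position.

```python
import math
import random
from statistics import NormalDist


def t5(rng):
    """One draw from Student's t with 5 degrees of freedom."""
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(5))
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 5)


rng = random.Random(2)  # arbitrary seed
n = 5000
sample = sorted(t5(rng) for _ in range(n))

mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
z = [(x - mean) / sd for x in sample]
excess_kurt = sum(v ** 4 for v in z) / n - 3.0

# Normal q-q points: (theoretical normal quantile, observed standardized value)
std_normal = NormalDist()
theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
qq_points = list(zip(theoretical, z))

print(round(excess_kurt, 2))
print(qq_points[0], qq_points[-1])  # the extremes stray farthest from the line y = x
```

Plotting `qq_points` should show the middle of the data following the line y = x while both ends bend away from it, which is the heavy-tail signature that is hard to see in a histogram.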

  • Rob C June 20, 2019 Reply

    A few posts above is a suggested correction to a typo in the description of Figure 1: “there are 3 65’s, 6 75’s”. This actually introduced another typo. In my viewing of Figure 1, the correct description ought to be “there are 3 65’s, 6 70’s and 9 75’s”.

    • bill June 21, 2019 Reply

      Thanks for the correction of the correction! It has been changed.

  • Madison Butler August 29, 2020 Reply

    Wouldn't a useful measurement be the rate at which kurtosis approaches 0? If kurtosis is a measurement highly dependent on sample size, we should measure to what degree the kurtosis of a population depends on sample size as a measurement of kurtosis itself.

    • bill August 30, 2020 Reply

      Hello, isn't that what Figures 7 and 8 are doing? Taking different sample sizes from a population? Sample size has to be pretty large before the kurtosis value starts to level off.

  • Tom Barson September 5, 2020 Reply

    This is a useful article, but the conclusion seems strange. The skewness, say, of a sample says something about the distribution of that sample. Whether it's valid for the population is a question that, yes, depends on sample size – but that's just as true of a histogram and, unlike a histogram, skewness can't be manipulated by bin widths, etc. Seems like you can play all day with histogram bin widths – but if your first take shows a distribution that is bunched roughly in the middle, why not use skewness and your rules of thumb to confirm that instead of teasing the histogram?

    • bill September 7, 2020 Reply

      Thanks Tom. Agree you can change the look of a histogram by changing the bin widths, etc. The sample skewness does tell you about the sample – just not about the distribution it came from unless the sample size is large.
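
The point about bin widths can be illustrated with a small sketch that reuses the 20-value data set from Dr. Westfall's e-mail earlier in the thread. The helper names and the two bin widths are assumptions chosen for illustration. The same data give differently shaped histograms under the two widths, while the moment-based skewness is a single number that does not depend on binning.

```python
def histogram(data, width):
    """Count values falling in bins [k*width, (k+1)*width), keyed by left edge."""
    counts = {}
    for x in data:
        left = (x // width) * width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


def skewness(data):
    """Moment-based skewness (divide-by-n moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


data = [0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 1]

narrow = histogram(data, 1)  # a mound peaking at 2
wide = histogram(data, 3)    # only two bins, the first much taller
skew = skewness(data)

print(narrow)
print(wide)
print(round(skew, 2))
```

As the reply notes, that number still describes only the sample unless the sample size is large.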

  • student January 24, 2021 Reply

    Can you help me with this? My lecturer asked me this question: ‘What can you tell about the skewness and kurtosis of the weight and length of ikan selat in the lake?’
    Weight (g): skewness = 1.038, kurtosis = 3.546
    Total length (cm): skewness = 1.112, kurtosis = 3.725

    • bill January 24, 2021 Reply

      What do you think you can tell?

  • APL June 30, 2021 Reply

    A very nice explanation. Thank you Dr. Bill McNeese.

  • jade February 5, 2022 Reply

    What is the evaluation if the skewness is exactly 1 or 0.5?
