Deciding Which Distribution Fits Your Data Best


Thanks so much for reading our publication. We hope you find it informative and useful. Happy charting and may the data always support your position.

Sincerely,

Dr. Bill McNeese
BPI Consulting, LLC

Comments (20)

  • HEIMONMI April 24, 2018 Reply

    In the Goodness of Fit Information by Distribution table, how do you get a negative value, i.e., the Weibull log-likelihood of -190.3?

  • Anurag Chakraborty July 19, 2019 Reply

    Firstly, thank you so much for this wonderful article that explains the procedure for determining the right distribution for a given set of data. I have only one question. What is the connection between Table 1 and the values presented in Table 2? For example, are the goodness-of-fit test results for the different candidate distributions in Table 2 calculated from the distribution parameters in Table 1? Am I right to understand that the values presented in Table 2 couldn't have been calculated without the data from Table 1?

    • bill July 20, 2019 Reply

      Thank you for your kind words. You are correct. Table 1 gives the parameters for each of the distributions. Table 2 uses those parameters to determine goodness of fit, etc.
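The two-step relationship described in this reply (estimate parameters first, then score the fit with them) can be sketched in Python. This is only an illustration under stated assumptions: the data are synthetic, the candidate list is arbitrary, and the Kolmogorov-Smirnov statistic stands in for whichever goodness-of-fit statistics the article's Table 2 actually reports. It also shows why a log-likelihood is often negative: whenever the fitted density values are below 1, their logarithms are negative, and so is their sum.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data (not from the article): 200 draws from a Weibull.
rng = np.random.default_rng(0)
data = stats.weibull_min.rvs(c=1.5, scale=10.0, size=200, random_state=rng)

# Candidate distributions chosen for illustration only.
candidates = {
    "weibull": stats.weibull_min,
    "lognormal": stats.lognorm,
    "gamma": stats.gamma,
}

results = {}
for name, dist in candidates.items():
    # Step 1 (the role of Table 1): estimate parameters by maximum likelihood,
    # with the location fixed at 0 so each fit has two free parameters.
    params = dist.fit(data, floc=0)
    # Step 2 (the role of Table 2): use those parameters to score the fit.
    log_lik = float(np.sum(dist.logpdf(data, *params)))
    ks_stat = float(stats.kstest(data, dist.cdf, args=params).statistic)
    results[name] = {"params": params, "log_lik": log_lik, "ks": ks_stat}

for name, r in results.items():
    print(f"{name:10s} log-likelihood = {r['log_lik']:9.2f}  KS = {r['ks']:.3f}")
```

A higher (less negative) log-likelihood and a smaller KS statistic both point toward a better fit, but neither can be computed until the parameters from step 1 are available.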

  • Rahul February 21, 2020 Reply

    How can I determine which distribution fits my data best in R? Please help me get past this point.

    • bill February 21, 2020 Reply

      Sorry, I do not use R.

    • Rimitti January 11, 2022 Reply

      In R, use the packages fitdistrplus and actuar. You can also find examples on YouTube (distribution fitting in R, etc.).

  • Daniel Barreto May 11, 2020 Reply

    Excellent article, sir. What should I do if my data do not fit any of these standard distributions? I used Minitab, and for all distributions p < 0.005.

    • bill May 11, 2020 Reply

      Did you try to transform the data using the Box-Cox or Johnson transformations? If they don't work and none of the distributions fit, you are pretty much out of luck. What are you trying to do?
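For readers who want to try the Box-Cox step outside of Minitab, here is a minimal Python sketch. It assumes strictly positive data (a requirement of Box-Cox), uses synthetic skewed data rather than anything from the article, and uses the Shapiro-Wilk test as one possible normality check. The Johnson transformation is not covered here.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed, strictly positive data (Box-Cox requires x > 0).
rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.8, size=300)

# With no lambda supplied, scipy estimates it by maximum likelihood and
# returns both the transformed values and the estimated lambda.
transformed, lam = stats.boxcox(data)

# Compare a normality test before and after the transformation.
p_before = stats.shapiro(data).pvalue
p_after = stats.shapiro(transformed).pvalue

print(f"estimated lambda = {lam:.3f}")
print(f"Shapiro-Wilk p-value before = {p_before:.4g}, after = {p_after:.4g}")
```

If the p-value after the transformation is still very small, the transformation has not produced approximately normal data, which is the situation this thread goes on to discuss.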

      • Daniel Barreto May 11, 2020 Reply

        I tried the Box-Cox and Johnson transformations, but neither gave a good fit. I have been trying to do a capability study for basis weight on a paper machine. Normally we produce this special product, with basis weight from 900 to 1000, for just 2 days every month (every hour we take a sample to check the basis weight). So I organized all the data from 2018 and 2019 (24 runs) in a spreadsheet and then realized that the distribution is not normal, and with individual distribution identification I could not fit the data to any available distribution. Do you think the procedure is correct? What should I do? Thank you so much for the help.

        • bill May 11, 2020 Reply

          Is the process in control? If not, that may be why the data look non-normal. I quite often see that short runs start high or low for different runs, which might cause the histogram to look non-normal. Send me your data and I will take a look at it ([email protected]). If that is not the issue, I would just do a histogram and add specs to see if it looks like it is capable.

  • Aaron June 22, 2020 Reply

    Many thanks for sharing this informative article. I was wondering how you calculated the LRT values?

    • bill June 22, 2020 Reply

      Likelihood-ratio test statistic = 2 * L(A) - 2 * L(B),
      where L(A) is the log likelihood for the three-parameter distribution and L(B) is the log likelihood for the two-parameter distribution. Use the chi-square distribution in Excel to compute LRT, which is the right-tail probability (p-value) for that statistic:
      LRT = CHIDIST(likelihood-ratio test statistic, 1)
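The same calculation can be sketched in Python. Excel's CHIDIST returns the right-tail probability of the chi-square distribution, which corresponds to `scipy.stats.chi2.sf`. The function name `lrt_p_value` and the example log-likelihoods below are hypothetical and chosen only for illustration.

```python
from scipy import stats


def lrt_p_value(loglik_three_param: float, loglik_two_param: float, df: int = 1) -> float:
    """Right-tail chi-square probability for the likelihood-ratio statistic.

    df is the number of extra parameters in the larger model
    (1 when comparing a three-parameter fit with a two-parameter fit).
    """
    statistic = 2.0 * loglik_three_param - 2.0 * loglik_two_param
    # Guard against tiny negative values caused by numerical optimization noise.
    statistic = max(statistic, 0.0)
    return float(stats.chi2.sf(statistic, df))


# Hypothetical log-likelihoods, for illustration only.
p = lrt_p_value(loglik_three_param=-188.0, loglik_two_param=-190.3)
print(f"LRT p-value = {p:.4f}")
```

A small p-value suggests the extra parameter improves the fit by more than chance alone would explain. A large one suggests the two-parameter distribution is adequate.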

  • Anonymous August 15, 2020 Reply

    Excellent article, thank you so much

  • Rahul August 21, 2020 Reply

    I would like to know the following about distribution analysis:
    1) When we have a large dataset with many features (many x variables: x1, x2, x3, …), what is the approach to determine the distribution?
    2) Should we find the distribution for each variable separately, compare them with each other, and process them further if they are not well distributed?
    3) Should we find the distribution for only the important variables and do the same thing if they are not well distributed?
    4) Should we find the distribution for the Y (target) variable and do the same thing if it is not well distributed?
    5) Should we find the distribution of the x variables relative to the target variable?

    • bill August 21, 2020 Reply

      I am not sure what you are doing, but I would find the distribution for each variable separately. I assume they are independent of each other.

  • Sadia October 23, 2020 Reply

    I am trying to generate random data that follow a bimodal distribution. First, could you please explain the details of a bimodal distribution? Then, how to generate random data using this distribution. Lastly, how to identify if data follows bimodal distribution? Your reply will be greatly appreciated.  

    • bill October 23, 2020 Reply

      A bimodal distribution is one that has two peaks. Suppose you want to do this for a normal distribution. You can randomly generate numbers for the situation where the average = 100 and the standard deviation = 10. Then generate another set with an average of 80 and a standard deviation of 10. You can use any values you want. Combine them and you have a bimodal distribution.
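The recipe in this reply can be sketched in Python as follows. The helper names are hypothetical. One caveat worth knowing: for two equally sized normal groups with the same standard deviation, the combined density has two distinct peaks only when the means are more than two standard deviations apart. Means of 100 and 80 with a standard deviation of 10 sit exactly on that boundary, so the sketch also tries a standard deviation of 5, which is an assumption added here to make the two peaks visible.

```python
import numpy as np
from scipy import stats


def mixture_sample(mean_a, mean_b, sd, n_each, rng):
    """Generate two normal samples and combine them, as described in the reply."""
    a = rng.normal(mean_a, sd, n_each)
    b = rng.normal(mean_b, sd, n_each)
    return np.concatenate([a, b])


def count_density_peaks(mean_a, mean_b, sd):
    """Count local maxima of the 50/50 mixture density evaluated on a fine grid."""
    lo = min(mean_a, mean_b) - 4 * sd
    hi = max(mean_a, mean_b) + 4 * sd
    x = np.linspace(lo, hi, 2001)
    density = 0.5 * stats.norm.pdf(x, mean_a, sd) + 0.5 * stats.norm.pdf(x, mean_b, sd)
    interior = density[1:-1]
    is_peak = (interior > density[:-2]) & (interior > density[2:])
    return int(is_peak.sum())


rng = np.random.default_rng(42)
sample = mixture_sample(100.0, 80.0, 5.0, n_each=5000, rng=rng)

print("peaks with sd = 10:", count_density_peaks(100.0, 80.0, 10.0))
print("peaks with sd = 5: ", count_density_peaks(100.0, 80.0, 5.0))
print("combined sample size:", sample.size)
```

For real data, where the true density is unknown, a histogram of the combined values is the usual first check for two peaks.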

  • Collinz February 24, 2021 Reply

    Very resourceful article. Do you have a more detailed ebook about these concepts?

    • bill February 24, 2021 Reply

      We would like to do that – just an issue of time.  Thanks.
