Search

Poisson Regression Help

This page shows how to perform Poisson regression using SPC for Excel.

In Poisson regression, the response variables (Y) are counting data – integers – and must be greater than or equal to 0.  For example, you might want to examine the number of traffic accidents at a given intersection.  You collect data on the number of accidents, which is the response variable, and the time of day (morning, afternoon, night) and weather conditions (sunny, raining, cloudy), which are the predictor variables.  The Poisson regression allows you to  determine which predictors have a statistically significant impact on the response variable.

The response variable must be counting data (0, 1, 2, 10, 50, 1000, etc.) and cannot have any values less than 0.  It must also follow a Poisson distribution.

Example

A chemist is interested in decreasing the blemishes on plastic bottles.  He has two factors he is interested in: the size of the cavity and the temperature.  He has collected 30 data points. The model he is using is:

Blemish Defects = Exp(Y)

where Y is a function of the cavity size and temperature.

Data Entry

Enter the data into a spreadsheet as shown below. The data can be downloaded here. The data must be in columns with the variable names in the first cell of the column.

Poisson Regression Output

There are four new worksheets added during the analysis:

The number in parentheses is the number of regressions in the workbook. This allows you to keep track of the worksheets that go together when you remove observations, remove variables, or transform the Y variable and rerun the regression. The ranges containing the results on the first three sheets listed are protected. This is necessary for the program to find the information needed to rerun the regression. The four worksheets are described below.

Data

This worksheet contains the data used in the regression analysis.

Summary

Most of the output for the Poisson regression is shown on the Summar worksheet. The top part of the Summary sheet shows the link function used and the deviance table.

The deviance table is used to define which variables have a significant impact on the response variable.  If the p-value for the source is less than 0.05, the variable has a significant impact on the response.  If the p-value for the source is greater than 0.05, you might consider removing it from the model, as shown below.    In this example, both the cavity size and temperature are statistically significant.  If the p-value is less than 0.05, it is shown in red.

Temperature is a continuous predictor.  Since it is statistically significant, it means that the coefficient for temperature is different than 0. Cavity size is a categorical predictor.  If a categorical predictor is statistically significant, then it means that the levels (like small and large) don’t have the same average (for blemish defects).

The next output on the Summary worksheet is the predictor’s table.

This contains the coefficients, standard error, t statistic, p value, the 95 % confidence interval for the coefficients and the VIF. Coefficients with p values less than 0.05 are statistically significant. These will be in red also if they are less than 0.05.

The regression model and model statistics are shown next on the summary sheet.

The model is given.  If categorical predictors are present, then there is a model for each categorical level for each categorical predictor. The model statistics are then given. The deviance R2 and adjusted deviance R2 are given.   The larger the percentages, the better the model is.  Three others statistics are given:

• Akaike Information Criterion (AIC)
• AICc (Akaike’s Corrected Information Criterion)
• BIC (Bayesian Information Criterion)

These statistics are used to compare this model with other models. The smaller the values the better.

The Goodness of Fit table is then shown on the Summary worksheet.

The value for the deviance is the error deviance from the Deviance Table above while the Pearson Chi-Squared value is the square of the Peason’s deviance (see below).   The major reason for the Goodness of Fit table is to see how well the data fits the model.  If there is a small p-value, then there is not a good fit.

The last portion of the Summary sheet is the Predict Results section.

Enter values for the cavity size and temperature and then select Predict.  The software will provide  the result along with the 95% confidence limits.

Residuals

The residuals worksheet contains the observation number, the observed values, predicted values, and the residuals using the options selected early or the defaults.

Regression Charts

There are two charts that are automatically created in this worksheet: a normal probability plot for the residuals and the predicted values versus observed values chart.  The normal probability plot of the raw residuals is shown below. The residuals should fall around the straight line.

The predicted values versus observed values chart is shown below. A good model will have the points close to the line.

Additional charts are available from the “Revise” button on the residuals worksheet. See “Revising the Poisson Regression” below.

Revising the Poisson Regression

SPC for Excel allows you to add additional charts on the Regression Charts worksheet, or to revise the regression by removing observations or removing variables. To access these options, select the “Revise” button on the residuals worksheet (see figure above).  You will get the form below.

These three options are discussed below.