# Binary Logistic Regression Help

Binary logistic regression is used to determine the relationship between predictors (continuous and/or categorical) and a response variable that has only two possible outcomes. The outcomes can be yes/no, pass/fail, or any other binary response. For example, you may do a study to determine who will or will not buy a product, or who will or will not vote in an election. The binary logistic regression methodology identifies which predictors have a statistically significant impact on the response variable.

The example below shows how to use binary logistic regression in SPC for Excel.  This page contains the following:

### Example

A student wants to determine the impact that age, party affiliation, and sex have on whether a person votes or not. The student collects data on 25 randomly selected people. Please see this link for how SPC for Excel handles categorical predictors if you have them.

### Data Entry

Enter the data into a spreadsheet as shown below. The data can be downloaded here. The data must be in columns with the variable names in the first cell of the column.

### Binary Logistic Regression Output

There are four new worksheets added during the analysis:

The number in parentheses is the number of regressions in the workbook. This allows you to keep track of the worksheets that go together when you remove observations, remove variables, or transform the Y variable and rerun the regression. The ranges containing the results on the first three sheets listed are protected. This is necessary for the program to find the information needed to rerun the regression. The four worksheets are described below.

#### Data

This worksheet contains the data used in the regression analysis.

#### Summary

Most of the output for the binary logistic regression is shown on the Summary worksheet. The top part of the Summary sheet shows the link function and the response summary. This includes the number of successes and non-successes.
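For reference, the model behind a logit link can be written as follows. This assumes the logit link, which is the usual choice for binary logistic regression; the link function shown on your Summary sheet may differ. The symbols $p$ (probability of the event), $\beta_j$ (coefficients), and $x_j$ (predictors) are notation introduced here for illustration.

$$
\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
$$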

The deviance table is shown next.

It is used to determine which variables have a significant impact on the response variable. If the p-value for a source is less than 0.05, the variable has a significant impact on the response, and the p-value is shown in red. If the p-value for a source is greater than 0.05, you might consider removing the variable from the model, as shown below. In this example, Age is statistically significant.
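The idea behind a deviance-based test can be sketched in a few lines of Python. This is a minimal illustration, not SPC for Excel's own calculation: it assumes the test compares the deviance of a model without the predictor to the deviance of a model with it, and that the predictor uses one degree of freedom (a categorical predictor with more than two levels would use more). The responses and fitted probabilities are hypothetical values made up for the example.

```python
import math

def binomial_deviance(y, p):
    """Deviance for 0/1 responses y and fitted probabilities p (-2 * log-likelihood)."""
    log_lik = 0.0
    for yi, pi in zip(y, p):
        log_lik += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -2.0 * log_lik

def chi2_pvalue_1df(x):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical data, for illustration only (not the voting example)
y = [1, 0, 1, 1, 0, 0, 1, 0]
p_reduced = [0.5] * 8                               # fitted values without the predictor
p_full = [0.8, 0.3, 0.7, 0.9, 0.2, 0.4, 0.6, 0.1]   # fitted values with the predictor

# The drop in deviance from adding the predictor is the test statistic
drop = binomial_deviance(y, p_reduced) - binomial_deviance(y, p_full)
p_value = chi2_pvalue_1df(drop)
significant = p_value < 0.05
```

A large drop in deviance gives a small p-value, which is what flags a predictor as significant.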

Age is a continuous predictor. If a continuous predictor is statistically significant, it means that its coefficient (here, the coefficient for Age) is different from 0. Party is a categorical predictor. If a categorical predictor is statistically significant, it means that the levels (like Rep and Dem) don’t have the same average response.

The next output on the Summary worksheet is the predictors table.

This contains the coefficients, standard errors, t statistics, p-values, the 95% confidence intervals for the coefficients, and the VIFs. Coefficients with p-values less than 0.05 are statistically significant and are shown in red.
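The relationship between a coefficient, its standard error, the test statistic, the p-value, and the confidence interval can be sketched as below. This is a minimal sketch that assumes a normal (Wald) approximation; the software labels the statistic a t statistic and may use a different reference distribution, so its numbers may differ slightly. The function name `wald_summary` and the input values are hypothetical.

```python
from statistics import NormalDist

def wald_summary(coef, std_err, confidence=0.95):
    """Return (statistic, two-sided p-value, lower limit, upper limit) for one coefficient."""
    std_normal = NormalDist()
    stat = coef / std_err
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(stat)))
    # Critical value for the requested confidence level (about 1.96 for 95%)
    crit = std_normal.inv_cdf(0.5 + confidence / 2.0)
    return stat, p_value, coef - crit * std_err, coef + crit * std_err

# Hypothetical coefficient and standard error, for illustration only
stat, p_value, lower, upper = wald_summary(coef=0.12, std_err=0.05)
```

If the confidence interval does not contain 0, the p-value is below 0.05 under the same approximation.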

The regression model and model statistics are shown next on the summary sheet.

The model is given. If categorical predictors are present, then there is a model for each categorical level for each categorical predictor. The model statistics are then given. The deviance R² and adjusted deviance R² are given. The larger the percentages, the better the model is. Three other statistics are given:

• Akaike Information Criterion (AIC)
• AICc (Akaike’s Corrected Information Criterion)
• BIC (Bayesian Information Criterion)

These statistics are used to compare this model with other models. The smaller the values the better.
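The three criteria can be computed from the model's log-likelihood using their standard textbook definitions, sketched below. Whether SPC for Excel uses exactly these forms is an assumption; the function name and the input values are hypothetical.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, AICc, BIC) from the log-likelihood, parameter count, and sample size."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    # Small-sample correction; requires n_obs > n_params + 1
    aicc = aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = -2.0 * log_likelihood + n_params * math.log(n_obs)
    return aic, aicc, bic

# Hypothetical values: log-likelihood of -12.0, 4 estimated parameters, 25 observations
aic, aicc, bic = information_criteria(log_likelihood=-12.0, n_params=4, n_obs=25)
```

All three penalize extra parameters, so a model with more predictors scores better only if the added predictors improve the fit enough to offset the penalty.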

The Goodness of Fit table is then shown on the Summary worksheet.

The value for the deviance is the error deviance from the Deviance Table above, while the Pearson Chi-Squared value is the sum of the squared Pearson residuals (see below). The Hosmer-Lemeshow statistic is a goodness of fit test for logistic regression. The main purpose of the Goodness of Fit table is to show how well the model fits the data. A small p-value indicates that the fit is not good.
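As a rough sketch, the Pearson chi-squared statistic for 0/1 responses is the sum of the squared Pearson residuals. The function name and the data below are hypothetical, and the software may group observations before computing the statistic.

```python
def pearson_chi_squared(y, p):
    """Sum of squared Pearson residuals for 0/1 responses y and fitted probabilities p."""
    return sum((yi - pi) ** 2 / (pi * (1.0 - pi)) for yi, pi in zip(y, p))

# Hypothetical responses and fitted probabilities, for illustration only
y = [1, 0, 1, 0]
p = [0.5, 0.5, 0.5, 0.5]
chi_sq = pearson_chi_squared(y, p)
```

Observations whose fitted probability is far from the observed outcome contribute the most to the statistic.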

The last portion of the Summary sheet is the Predict Results section.

Enter values for age, party, and sex, and then select Predict. The software will provide the predicted result along with the 95% confidence limits.
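A prediction of this kind can be sketched as follows, assuming a logit link and confidence limits built on the linear-predictor scale and then converted to probabilities. The coefficients, the 0/1 coding of the categorical predictors, and the standard error of the linear predictor are all hypothetical; they are not taken from the voting example.

```python
import math

def inverse_logit(eta):
    """Convert a linear predictor value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict_probability(intercept, coefs, values, se_linear, z_crit=1.96):
    """Return (probability, lower limit, upper limit) for one set of predictor values."""
    eta = intercept + sum(b * x for b, x in zip(coefs, values))
    lower = inverse_logit(eta - z_crit * se_linear)
    upper = inverse_logit(eta + z_crit * se_linear)
    return inverse_logit(eta), lower, upper

# Hypothetical model: intercept, then coefficients for age, party (0/1), and sex (0/1)
prob, lower, upper = predict_probability(
    intercept=-3.0, coefs=[0.08, 0.5, -0.2], values=[40, 1, 0], se_linear=0.6
)
```

Building the limits on the linear-predictor scale keeps them between 0 and 1 after the conversion.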

#### Residuals

The residuals worksheet contains the observation number, the observed values, the predicted values, and the residuals, using the options selected earlier or the defaults.
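Three commonly used residual types for a binary response are sketched below: raw, Pearson, and deviance residuals. Which types SPC for Excel offers is not stated here, so treat these as the standard definitions rather than the software's exact options. The function names are hypothetical.

```python
import math

def raw_residual(y, p):
    """Observed 0/1 value minus the predicted probability."""
    return y - p

def pearson_residual(y, p):
    """Raw residual scaled by the binomial standard deviation."""
    return (y - p) / math.sqrt(p * (1.0 - p))

def deviance_residual(y, p):
    """Signed square root of the observation's contribution to the deviance."""
    contribution = -2.0 * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return math.copysign(math.sqrt(contribution), y - p)

# Hypothetical observation: the event occurred (y = 1), predicted probability 0.5
residuals = (raw_residual(1, 0.5), pearson_residual(1, 0.5), deviance_residual(1, 0.5))
```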

#### Regression Charts

There are two charts that are automatically created in this worksheet: a normal probability plot for the residuals and the predicted values versus observed values chart.  The normal probability plot of the raw residuals is shown below. The residuals should fall around the straight line.

The predicted values versus observed values chart is shown below. A good model will have the points close to the line.

Additional charts are available from the “Revise” button on the residuals worksheet. See “Revising the Binary Logistic Regression” below.

### Revising the Binary Logistic Regression

SPC for Excel allows you to add charts to the Regression Charts worksheet, or to revise the regression by removing observations or removing variables. To access these options, select the “Revise” button on the residuals worksheet (see figure above). You will get the form below.

These three options are discussed below.