Residuals Analysis (ANOVA)

This worksheet contains a table with the residuals analysis. A portion of the table for this example is shown below, followed by a description of each column. In these descriptions, p is the number of terms in the model and n is the number of runs. Note: if you rerun an ANOVA in a workbook that already exists, the “Residuals” worksheet and the “Residual Plots” chart sheet will be replaced with the new data.

Standard Run Number: the run number assigned by the program during the design setup.

Actual Run Number: the actual run number assigned by the program during the design setup; will be the same as the standard run number if the option to randomize the runs was not selected.

Predicted Value: the value of the response variable predicted from the model containing all the sources of variation

Residual: the difference between the observed value and the predicted value

Leverage: the amount of leverage (influence) the run has on the predicted value; the leverage values are the diagonal elements h_ii of the hat matrix H = X(X’X)^-1X’, where X is the model matrix; if the leverage for a run is greater than 2p/n, the run is a high-leverage point and should be investigated further
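The leverage check can be sketched in a few lines of NumPy. This is an illustration only, not the program's own code: the data, the one-factor model, and the variable names are assumptions made for the example.

```python
import numpy as np

# Hypothetical data: 5 runs, with one factor setting far from the others.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.column_stack([np.ones_like(x), x])   # model matrix: intercept + one term

n, p = X.shape                               # n runs, p terms in the model
H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix H = X (X'X)^-1 X'
leverage = np.diag(H)                        # h_ii, one value per run

threshold = 2 * p / n                        # rule of thumb from the text
high_leverage = leverage > threshold

for run, (h, flag) in enumerate(zip(leverage, high_leverage), start=1):
    print(f"run {run}: leverage = {h:.3f}{'  <- high leverage' if flag else ''}")
```

The leverages always sum to p, which is a quick sanity check on any hat matrix computed this way.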

Standardized Residuals: provide a rough check for outliers; determined by dividing each residual by the square root of the mean square error; any value outside +/- 3 is a possible outlier

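Internally Studentized Residuals: take into account the inequality of variances across the factor space; any value outside +/- 3 is a possible outlier; defined as follows, where e_i is the residual for run i, h_ii is its leverage, and sigma squared is the mean square error:

r_i = e_i / sqrt(sigma^2 (1 - h_ii))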

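Externally Studentized Residuals: use a different estimate of sigma than MSE in the above equation; sigma is estimated from the data set with the i^th observation removed (e_i is the residual for run i and h_ii is its leverage):

S_(i)^2 = [(n - p) MSE - e_i^2 / (1 - h_ii)] / (n - p - 1)

The externally studentized residual is then defined as (any value outside +/- 3 is a possible outlier):

t_i = e_i / sqrt(S_(i)^2 (1 - h_ii))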
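The three residual types can be compared side by side in a short NumPy sketch. The data and names below are hypothetical, and the formulas are the standard textbook definitions, assumed here to match what the worksheet reports.

```python
import numpy as np

# Hypothetical data: 5 runs fitted with an intercept and one term.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 20.5])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

b, *_ = np.linalg.lstsq(X, y, rcond=None)        # least-squares coefficients
e = y - X @ b                                    # residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)    # leverages h_ii
mse = e @ e / (n - p)                            # mean square error

# Standardized: residual divided by sqrt(MSE).
standardized = e / np.sqrt(mse)

# Internally studentized: also scales by sqrt(1 - h_ii).
internal = e / np.sqrt(mse * (1 - h))

# Externally studentized: sigma^2 estimated with run i left out.
s2_deleted = ((n - p) * mse - e**2 / (1 - h)) / (n - p - 1)
external = e / np.sqrt(s2_deleted * (1 - h))

for run in range(n):
    print(f"run {run + 1}: std = {standardized[run]: .3f}, "
          f"internal = {internal[run]: .3f}, external = {external[run]: .3f}")
```

The closed form for the deleted variance avoids refitting the model n times; refitting without run i and dividing that residual sum of squares by n - p - 1 gives the same number.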

DFFITS: measures the deletion influence of run i, that is, how much the predicted value for run i changes when that run is removed from the fit; if the absolute value is greater than 2*sqrt(p/n), the run is influential
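To make the "deletion" idea concrete, the sketch below computes DFFITS from the usual shortcut formula (externally studentized residual times sqrt(h_ii / (1 - h_ii))) and then cross-checks one run by actually refitting without it. The data and names are hypothetical, and the shortcut formula is the standard textbook one, assumed to correspond to the worksheet column.

```python
import numpy as np

# Hypothetical data: 5 runs fitted with an intercept and one term.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 20.5])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
mse = e @ e / (n - p)

# Externally studentized residuals (sigma estimated with run i removed).
s2_deleted = ((n - p) * mse - e**2 / (1 - h)) / (n - p - 1)
t = e / np.sqrt(s2_deleted * (1 - h))

# Shortcut formula for DFFITS and the cutoff from the text.
dffits = t * np.sqrt(h / (1 - h))
influential = np.abs(dffits) > 2 * np.sqrt(p / n)

# Cross-check for the last run: refit without it and compare predictions.
i = n - 1
keep = np.arange(n) != i
b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
dffits_by_deletion = (X[i] @ b - X[i] @ b_i) / np.sqrt(s2_deleted[i] * h[i])

print("DFFITS:", np.round(dffits, 3))
print("influential:", influential)
print("run", i + 1, "by deletion:", round(float(dffits_by_deletion), 3))
```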

Cook’s Distance: indicates the difference between the calculated b values (the regression coefficients) and the values one would have obtained had a run been excluded; all distances should be of about equal magnitude; if not, there is reason to believe that the run biased the estimation of the regression coefficients; runs with values greater than 1 are influential; defined as follows, where r_i is the internally studentized residual and h_ii is the leverage of run i:

D_i = (r_i^2 / p) * h_ii / (1 - h_ii)
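The link between Cook's distance and the change in the b values can be checked numerically. The sketch below uses hypothetical data and the standard textbook formulas (assumed to match the worksheet): it computes the distance from the internally studentized residual and leverage, then recomputes it by dropping each run and measuring how far the coefficients move.

```python
import numpy as np

# Hypothetical data: 5 runs fitted with an intercept and one term.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 20.5])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
mse = e @ e / (n - p)

# Shortcut: D_i = (r_i^2 / p) * h_ii / (1 - h_ii)
r = e / np.sqrt(mse * (1 - h))               # internally studentized residuals
cooks_d = r**2 / p * h / (1 - h)

# Same quantity from the coefficient shift when run i is excluded:
# D_i = (b_(i) - b)' X'X (b_(i) - b) / (p * MSE)
cooks_by_deletion = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    d = b_i - b
    cooks_by_deletion[i] = d @ (X.T @ X) @ d / (p * mse)

print("Cook's distance:", np.round(cooks_d, 3))
print("influential (> 1):", cooks_d > 1)
```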

Standard Error of Estimated Mean: the square root of (mean square residual * leverage) for each observation

95% Lower CL: the lower confidence limit for the predicted value (based on the t distribution for alpha = 0.05 and the degrees of freedom of the mean square residual)

95% Upper CL: the upper confidence limit for the predicted value (based on the t distribution for alpha = 0.05 and the degrees of freedom of the mean square residual)
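The standard error and the two confidence limits fit together as predicted value +/- t * standard error. The following is a minimal sketch with hypothetical data; the critical value 3.182 is the tabulated two-sided 95% t value for 3 degrees of freedom, hard-coded here as an assumption so the example needs only NumPy.

```python
import numpy as np

# Hypothetical data: 5 runs fitted with an intercept and one term.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 20.5])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b                                    # predicted values
e = y - y_hat
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)    # leverages
mse = e @ e / (n - p)                            # mean square residual

# Standard error of the estimated mean for each run.
se = np.sqrt(mse * h)

# t critical value for alpha = 0.05 and n - p = 3 degrees of freedom
# (table value; a statistics library would normally supply this).
t_crit = 3.182

lower = y_hat - t_crit * se                      # 95% lower CL
upper = y_hat + t_crit * se                      # 95% upper CL

for run in range(n):
    print(f"run {run + 1}: predicted = {y_hat[run]:.3f}, SE = {se[run]:.3f}, "
          f"95% CL = ({lower[run]:.3f}, {upper[run]:.3f})")
```

Because the standard error grows with leverage, runs far from the center of the design get wider intervals.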
