Attribute Gage R&R Studies: Comparing Appraisers

May 2010

Sometimes a measurement system produces a result that falls into one of a finite number of categories. The simplest of these is a go/no go gage, which simply tells you whether a part passes or fails; there are only two possible outcomes. Other attribute measurement systems can have multiple categories such as very good, good, poor and very poor. In this newsletter, we will use the simple go/no go gage to understand how an attribute gage R&R study works. This is the first in a series of newsletters on attribute gage R&R studies and focuses on comparing appraisers.

Many folks use the manual Measurement Systems Analysis, 3rd edition, to help them understand their Gage R&R studies. Information on this manual can be found at this website: www.aiag.org. This newsletter follows the procedures there but provides more details about the calculations.

Example Data

Suppose you are in charge of a production process that makes widgets. The process is not capable of meeting specifications. You produce widgets that are out of specification. The process is in control and, as of yet, your Black Belt group has not figured out how to make it capable of meeting specifications. Your only alternative, at this time, is to perform 100% inspection of the parts and separate the parts that are within specifications from those that are out of specifications.

You have selected an attribute go/no go gage to use. This gage will simply tell if the part is within specifications. It does not tell you how "close" the result is to the nominal; only that it is within specifications.

To determine the effectiveness of the go/no go gage, you decide to conduct an attribute gage R&R study. You select three appraisers (Bob, Tom and Sally). You find 30 parts to use in the trial. Each of these parts was measured using a variable gage and rated as passing (within specifications) or failing (out of specification).

Each appraiser measures each part three times using the go/no go gage and the results are recorded. The parts must be run in random order without the appraiser knowing which parts he/she is measuring. In other words, randomize the 30 parts and have an appraiser measure each part. Then randomize the order again and repeat the measurement.
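The randomization described above can be sketched in a few lines of Python. This is only an illustration, not part of the MSA manual procedure: the function name make_run_orders and the fixed seeds are our own assumptions, chosen so the run orders can be reproduced.

```python
import random

def make_run_orders(n_parts=30, n_trials=3, seed=None):
    """Return one independently shuffled part order per trial."""
    rng = random.Random(seed)
    parts = list(range(1, n_parts + 1))
    # rng.sample returns a new shuffled list and leaves `parts` unchanged
    return [rng.sample(parts, k=n_parts) for _ in range(n_trials)]

# One set of three run orders per appraiser (seeds are arbitrary, for illustration only)
appraisers = ["Bob", "Tom", "Sally"]
run_orders = {name: make_run_orders(seed=i) for i, name in enumerate(appraisers)}

for name, orders in run_orders.items():
    print(name, "trial 1 starts with parts", orders[0][:5])
```

Whoever records the results keeps the run order; the appraiser should see only the part, not its number.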

The results from the study are shown below. P indicates the part passed (within specifications), while F indicates that the part failed (out of specifications). The first column is the reference value for the part. It represents the "true" value of the part based on the variable gage measurements.

Table 1: Attribute Gage R&R Study Results

Reference Part Bob-1 Bob-2 Bob-3 Tom-1 Tom-2 Tom-3 Sally-1 Sally-2 Sally-3
P 1 P P P P P P P P P
P 2 P P P P P P P P P
F 3 F F F F F F F F F
F 4 F F F F F F F F F
F 5 F F F F F F F F F
P 6 P P F P P F P F F
P 7 P P P P P P P F P
P 8 P P P P P P P P P
F 9 F F F F F F F F F
P 10 P P P P P P P P P
P 11 P P P P P P P P P
F 12 F F F F F F F P F
P 13 P P P P P P P P P
P 14 P P F P P P P F F
P 15 P P P P P P P P P
P 16 P P P P P P P P P
P 17 P P P P P P P P P
P 18 P P P P P P P P P
P 19 P P P P P P P P P
P 20 P P P P P P P P P
P 21 P P F P F P F P F
F 22 F F P F P F P P F
P 23 P P P P P P P P P
P 24 P P P P P P P P P
F 25 F F F F F F F F F
F 26 F P F F F F F F P
P 27 P P P P P P P P P
P 28 P P P P P P P P P
P 29 P P P P P P P P P
F 30 F F F F F P F F F

Between Appraiser Comparisons

We will use a cross-tabulation table to compare appraisers to each other. There is a cross-tabulation table for each pair of appraisers, so there are three in this case: Bob compared to Tom, Bob compared to Sally, and Tom compared to Sally. We will demonstrate the calculations using Bob and Tom. The first thing to do is to examine how Bob and Tom appraised the parts, trial by trial. This is shown in the table below. As can be seen in the table, Bob and Tom agreed most of the time. There were 7 times out of 90 samples where they disagreed; these occurred on parts 14, 21, 22, 26 and 30.

Table 2: Comparing Bob and Tom

Part  Bob  Tom      Part  Bob  Tom      Part  Bob  Tom
1     P    P        11    P    P        21    P    P
1     P    P        11    P    P        21    P    F    *
1     P    P        11    P    P        21    F    P    *
2     P    P        12    F    F        22    F    F
2     P    P        12    F    F        22    F    P    *
2     P    P        12    F    F        22    P    F    *
3     F    F        13    P    P        23    P    P
3     F    F        13    P    P        23    P    P
3     F    F        13    P    P        23    P    P
4     F    F        14    P    P        24    P    P
4     F    F        14    P    P        24    P    P
4     F    F        14    F    P    *   24    P    P
5     F    F        15    P    P        25    F    F
5     F    F        15    P    P        25    F    F
5     F    F        15    P    P        25    F    F
6     P    P        16    P    P        26    F    F
6     P    P        16    P    P        26    P    F    *
6     F    F        16    P    P        26    F    F
7     P    P        17    P    P        27    P    P
7     P    P        17    P    P        27    P    P
7     P    P        17    P    P        27    P    P
8     P    P        18    P    P        28    P    P
8     P    P        18    P    P        28    P    P
8     P    P        18    P    P        28    P    P
9     F    F        19    P    P        29    P    P
9     F    F        19    P    P        29    P    P
9     F    F        19    P    P        29    P    P
10    P    P        20    P    P        30    F    F
10    P    P        20    P    P        30    F    F
10    P    P        20    P    P        30    F    P    *

* = Bob and Tom disagreed on this trial

A blank cross-tabulation table is shown below. 

Table 3: Blank Cross-Tabulation Table 

                        Tom
                        Fail    Pass    Total
Bob     Fail  Count
              Expected
        Pass  Count
              Expected
Total         Count
              Expected

 

The first step is to determine how often each of the following occurred in the data:

  • How often did both Bob and Tom pass the same part? - 59
  • How often did both Bob and Tom fail the same part? - 24
  • How often did Bob pass the part and Tom fail the part? - 3
  • How often did Bob fail the part and Tom pass the part? - 4
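The four tallies above can be reproduced from the Table 1 results with a short script. This is a minimal sketch; the way the data is encoded (one three-letter string per part, one letter per trial) is our own choice for illustration.

```python
from collections import Counter

# Go/no go results from Table 1: one string per part (parts 1-30), one letter per trial
bob = ("PPP PPP FFF FFF FFF PPF PPP PPP FFF PPP "
       "PPP FFF PPP PPF PPP PPP PPP PPP PPP PPP "
       "PPF FFP PPP PPP FFF FPF PPP PPP PPP FFF").split()
tom = ("PPP PPP FFF FFF FFF PPF PPP PPP FFF PPP "
       "PPP FFF PPP PPP PPP PPP PPP PPP PPP PPP "
       "PFP FPF PPP PPP FFF FFF PPP PPP PPP FFP").split()

# Pair the two appraisers' ratings trial by trial and count each (Bob, Tom) combination
counts = Counter(
    (b, t)
    for bob_part, tom_part in zip(bob, tom)
    for b, t in zip(bob_part, tom_part)
)

for (b, t), n in sorted(counts.items()):
    print(f"Bob {b}, Tom {t}: {n}")
```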

These results are then added into the table and the row and column totals calculated as shown below.

Table 4: Cross-Tabulation Table with Counts Added

                        Tom
                        Fail    Pass    Total
Bob     Fail  Count     24      4       28
              Expected
        Pass  Count     3       59      62
              Expected
Total         Count     27      63      90
              Expected

It is sometimes easier to see the differences in appraisers if one uses percentages as shown in the table below. Bob failed a part a total of 28 times. When Bob failed a part, Tom failed that same part 24 times out of 28 or 86% of the time. However, Tom passed that same part 4 times out of 28 or 14%. Bob passed a part a total of 62 times. When Bob passed a part, Tom passed that same part 59 times out of 62 or 95% of the time; Tom failed that part 3 times out of 62 times or 5% of the time.

Table 5: Counts as Percentages

                Tom
                Fail    Pass    Total   % Fail  % Pass
Bob     Fail    24      4       28      86%     14%
        Pass    3       59      62      5%      95%
Total           27      63      90
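The row percentages are simply each count divided by its row total. A small sketch of that arithmetic, using the Bob and Tom counts (the dictionary layout is just one convenient way to hold the table):

```python
# Counts from the Bob/Tom cross-tabulation: outer key is Bob's call, inner key is Tom's call
table = {
    "Fail": {"Fail": 24, "Pass": 4},
    "Pass": {"Fail": 3, "Pass": 59},
}

row_percent = {}
for bob_call, row in table.items():
    row_total = sum(row.values())
    # Share of Bob's calls in this row that Tom rated Fail or Pass
    row_percent[bob_call] = {tom_call: 100 * n / row_total for tom_call, n in row.items()}

for bob_call, row in row_percent.items():
    print(f"Bob {bob_call}: Tom Fail {row['Fail']:.0f}%, Tom Pass {row['Pass']:.0f}%")
```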

You can also look at columns to help understand the agreement. Tom failed a part a total of 27 times. Bob failed that same part 24 of those times, but passed it 3 times. Tom passed a part 63 times; Bob agreed with him 59 times, but failed the part the other 4 times.

The next step is to determine the expected counts. This is the count you would expect if there were no difference between the two appraisers. This is done by using the row and column totals. The expected count for any cell is R*C/T, where R is the row total, C is the column total, and T is the overall total (90 in this example).

This can appear confusing. The expected value is based on the hypothesis of no association, that is, that there is no difference between the appraisers. If this is true, then the proportion of counts in a single column is the same for all rows. Consider the Tom Fail column in the table above. Under this hypothesis of no difference, both rows have the same probability that a count falls in this column. The best estimate of this common probability is the column total (27) divided by the overall total (90):

Probability for Tom Fail Column = Column Total/Overall Total = 27/90 = 0.3

Then the expected number of counts in the top cell of that column (Bob Fail, Tom Fail) is the total number of counts for that row times the probability:

Expected Count = Row Total * Column Probability = 28 * 0.3 = 8.4

The expected counts for the rest of the cells are shown in the table below.

Table 6: Cross-Tabulation Table with Expected Counts Added

                        Tom
                        Fail    Pass    Total
Bob     Fail  Count     24      4       28
              Expected  8.4     19.6    28
        Pass  Count     3       59      62
              Expected  18.6    43.4    62
Total         Count     27      63      90
              Expected  27      63      90
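The expected counts follow directly from the row and column totals. Here is a minimal sketch of the R*C/T calculation for the Bob and Tom table; the variable names are ours.

```python
# Observed counts: rows are Bob (Fail, Pass), columns are Tom (Fail, Pass)
observed = [[24, 4],
            [3, 59]]

row_totals = [sum(row) for row in observed]        # R for each row
col_totals = [sum(col) for col in zip(*observed)]  # C for each column
total = sum(row_totals)                            # T, the overall total

# Expected count for each cell under the hypothesis of no association: R*C/T
expected = [[r * c / total for c in col_totals] for r in row_totals]

for label, row in zip(["Bob Fail", "Bob Pass"], expected):
    print(label, [round(x, 1) for x in row])
```

Each row and column of expected counts adds up to the same total as the observed counts, which is a quick check on the arithmetic.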

The cross-tabulation tables are designed so you can assess the level of agreement between the appraisers. The cross-tabulation tables for Bob and Sally and then Tom and Sally are shown below.

Table 8: Cross-Tabulation Table for Bob and Sally

                        Sally
                        Fail    Pass    Total
Bob     Fail  Count     24      4       28
              Expected  9.3     18.7    28
        Pass  Count     6       56      62
              Expected  20.7    41.3    62
Total         Count     30      60      90
              Expected  30      60      90

Table 9: Cross-Tabulation Table for Tom and Sally

                        Sally
                        Fail    Pass    Total
Tom     Fail  Count     23      4       27
              Expected  9.0     18.0    27
        Pass  Count     7       56      63
              Expected  21.0    42.0    63
Total         Count     30      60      90
              Expected  30      60      90

Kappa Values

A measure of agreement between appraisers can be found by using Cohen's kappa value. This compares two appraisers who are measuring the same parts. Kappa can range from 1 to -1. A kappa value of 1 represents perfect agreement between the two appraisers. A kappa value of -1 represents perfect disagreement between the two appraisers. A kappa value of 0 means the agreement is no better than that expected by chance alone. So, kappa values close to 1 are desired.

Kappa is calculated using the following equation:

kappa = (po - pe)/(1 - pe)

where

po = the sum of the actual counts in the diagonal cells/overall total

pe = the sum of the expected counts in the diagonal cells/overall total

The sum of counts in the diagonal cells is the sum of the counts where the appraisers agreed (both either passed or failed a part). The sum of expected counts is the same thing but you use the expected counts instead of the counts.

Using Bob and Tom's data, the value of kappa is calculated as shown below.

po = (24 + 59)/90 = 0.922

pe = (8.4 + 43.4)/90 = 0.576

kappa = (po - pe)/(1 - pe) = (0.922 - 0.576)/(1 - 0.576) = 0.82
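The kappa calculation can be written as a small function that takes the table of counts. This is a sketch under our own naming (cohen_kappa is not from the MSA manual); it computes the expected diagonal counts from the row and column totals rather than reading them from the table.

```python
def cohen_kappa(observed):
    """Cohen's kappa for a square table of counts (rows: first appraiser, columns: second)."""
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    # po: share of all counts that fall in the diagonal (agreement) cells
    p_o = sum(observed[i][i] for i in range(len(observed))) / total
    # pe: sum of the expected diagonal counts (R*C/T) divided by T
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (p_o - p_e) / (1 - p_e)

# Bob (rows) versus Tom (columns), in Fail, Pass order
kappa_bob_tom = cohen_kappa([[24, 4], [3, 59]])
print(round(kappa_bob_tom, 2))  # 0.82
```

Note that the function divides by (1 - pe), so it will fail if both appraisers put every part in the same single category.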

The table below summarizes the calculations of kappa for the three cases.

Table 10: Kappa Values

         Bob     Tom     Sally
Bob      -       0.82    0.75
Tom      0.82    -       0.72
Sally    0.75    0.72    -
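All three kappa values can also be checked directly from the raw study results, without building the cross-tabulation tables first. The sketch below assumes the same encoding of the Table 1 data used for illustration (one three-letter string per part); the helper name kappa is ours.

```python
from collections import Counter
from itertools import combinations

# Go/no go results from Table 1: one string per part (parts 1-30), one letter per trial
ratings = {
    "Bob": ("PPP PPP FFF FFF FFF PPF PPP PPP FFF PPP "
            "PPP FFF PPP PPF PPP PPP PPP PPP PPP PPP "
            "PPF FFP PPP PPP FFF FPF PPP PPP PPP FFF").split(),
    "Tom": ("PPP PPP FFF FFF FFF PPF PPP PPP FFF PPP "
            "PPP FFF PPP PPP PPP PPP PPP PPP PPP PPP "
            "PFP FPF PPP PPP FFF FFF PPP PPP PPP FFP").split(),
    "Sally": ("PPP PPP FFF FFF FFF PFF PFP PPP FFF PPP "
              "PPP FPF PPP PFF PPP PPP PPP PPP PPP PPP "
              "FPF PPF PPP PPP FFF FFP PPP PPP PPP FFF").split(),
}

def kappa(a, b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Flatten each appraiser's results into one string of 90 trial ratings
flat = {name: "".join(parts) for name, parts in ratings.items()}
kappas = {(a, b): kappa(flat[a], flat[b]) for a, b in combinations(flat, 2)}

for (a, b), k in kappas.items():
    print(f"{a} vs {b}: {k:.2f}")
```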

The MSA manual referenced above says:

"A general rule of thumb is that values of kappa greater than 0.75 indicate good to excellent agreement (with a maximum kappa = 1); values less than 0.40 indicate poor agreement."

Based on these results, the appraisers are at or very near the 0.75 mark that indicates good to excellent agreement.

Another article (Landis, J.R. and Koch, G.G. (1977) "The measurement of observer agreement for categorical data" in Biometrics, Vol. 33, pp. 159-174) provides the following interpretation of kappa:

  • Poor agreement = Less than 0.00
  • Slight agreement = 0.00 to 0.20
  • Fair agreement = 0.21 to 0.40
  • Moderate agreement = 0.41 to 0.60
  • Substantial agreement = 0.61 to 0.80
  • Almost perfect agreement = 0.81 to 1.00

Next month we will continue the newsletters on attribute gage R&R studies. We will make use of the reference column in the data above - the "true" value of the part and see how each appraiser stacks up against the reference. We will then look at the confidence intervals for each appraiser. Hope to have you back then.

Thanks so much for reading our publication. We hope you find it informative and useful. Happy charting and may the data always support your position.

Sincerely,

Dr. Bill McNeese
BPI Consulting, LLC
