May 2010

Sometimes a measurement system produces a value that falls into one of a finite number of categories. The simplest of these is a go/no go gage, which only tells you whether a part passes or fails, so there are just two possible outcomes. Other attribute measurement systems can have multiple categories, such as very good, good, poor and very poor. In this newsletter, we will use the simple go/no go gage to understand how an attribute gage R&R study works. This is the first in a series of newsletters on attribute gage R&R studies and focuses on comparing appraisers.

Many folks use the manual Measurement Systems Analysis, 3rd edition, to help them understand their Gage R&R studies. Information on this manual can be found at this website: www.aiag.org. This newsletter follows the procedures there but provides more details about the calculations.

Example Data

Suppose you are in charge of a production process that makes widgets. The process is not capable of meeting specifications. You produce widgets that are out of specification. The process is in control and, as of yet, your Black Belt group has not figured out how to make it capable of meeting specifications. Your only alternative, at this time, is to perform 100% inspection of the parts and separate the parts that are within specifications from those that are out of specifications.

You have selected an attribute go/no go gage to use. This gage will simply tell if the part is within specifications. It does not tell you how "close" the result is to the nominal; only that it is within specifications.

To determine the effectiveness of the go/no go gage, you decide to conduct an attribute gage R&R study. You select three appraisers (Bob, Tom and Sally). You find 30 parts to use in the trial. Each of these parts was measured using a variable gage and rated as passing (within specifications) or failing (out of specifications).

Each appraiser measures each part three times using the go/no go gage and the results are recorded. The parts must be run in random order without the appraiser knowing which parts he/she is measuring. In other words, randomize the 30 parts and have an appraiser measure each part. Then randomize the order again and repeat the measurement.
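The randomization described above can be sketched in a few lines of Python. This is only an illustration: the function name run_orders, the seed and the layout of the result are assumptions, not part of the MSA procedure.

```python
import random

def run_orders(appraisers, n_parts=30, n_trials=3, seed=None):
    """Return a freshly shuffled part order for every appraiser and trial."""
    rng = random.Random(seed)  # the seed is only here to make the sketch repeatable
    parts = list(range(1, n_parts + 1))
    return {
        name: [rng.sample(parts, k=len(parts)) for _ in range(n_trials)]
        for name in appraisers
    }

orders = run_orders(["Bob", "Tom", "Sally"], seed=2010)
for name, trials in orders.items():
    for trial, order in enumerate(trials, start=1):
        print(f"{name}, trial {trial}: {order}")
```

Each list is a new permutation of the 30 part numbers, so the appraiser cannot rely on the order from the previous trial.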

The results from the study are shown below. P indicates the part passed (within specifications), while F indicates that the part failed (out of specifications). The first column is the reference value for the part. It represents the "true" value of the part based on the variable gage measurements.

Table 1: Attribute Gage R&R Study Results

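Reference  Part   Bob        Tom        Sally
           Trial  1  2  3    1  2  3    1  2  3
P          1      P  P  P    P  P  P    P  P  P
P          2      P  P  P    P  P  P    P  P  P
F          3      F  F  F    F  F  F    F  F  F
F          4      F  F  F    F  F  F    F  F  F
F          5      F  F  F    F  F  F    F  F  F
P          6      P  P  F    P  P  F    P  F  F
P          7      P  P  P    P  P  P    P  F  P
P          8      P  P  P    P  P  P    P  P  P
F          9      F  F  F    F  F  F    F  F  F
P          10     P  P  P    P  P  P    P  P  P
P          11     P  P  P    P  P  P    P  P  P
F          12     F  F  F    F  F  F    F  P  F
P          13     P  P  P    P  P  P    P  P  P
P          14     P  P  F    P  P  P    P  F  F
P          15     P  P  P    P  P  P    P  P  P
P          16     P  P  P    P  P  P    P  P  P
P          17     P  P  P    P  P  P    P  P  P
P          18     P  P  P    P  P  P    P  P  P
P          19     P  P  P    P  P  P    P  P  P
P          20     P  P  P    P  P  P    P  P  P
P          21     P  P  F    P  F  P    F  P  F
F          22     F  F  P    F  P  F    P  P  F
P          23     P  P  P    P  P  P    P  P  P
P          24     P  P  P    P  P  P    P  P  P
F          25     F  F  F    F  F  F    F  F  F
F          26     F  P  F    F  F  F    F  F  P
P          27     P  P  P    P  P  P    P  P  P
P          28     P  P  P    P  P  P    P  P  P
P          29     P  P  P    P  P  P    P  P  P
F          30     F  F  F    F  F  P    F  F  F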

Between Appraiser Comparisons

We will use a cross-tabulation table to compare appraisers to each other. There is one cross-tabulation table for each pair of appraisers, so there are three in this case: Bob compared to Tom, Bob compared to Sally, and Tom compared to Sally. We will demonstrate the calculations using Bob and Tom. The first thing to do is to examine how Bob and Tom appraised the parts. This is shown in the table below. As can be seen in the table, Bob and Tom agreed most of the time. They disagreed on only 7 of the 90 measurements. These disagreements are highlighted in the table below.

Table 2: Comparing Bob and Tom

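Part  Bob  Tom   Part  Bob  Tom   Part  Bob  Tom
1     P    P     11    P    P     21    P    P
1     P    P     11    P    P     21    P    F*
1     P    P     11    P    P     21    F    P*
2     P    P     12    F    F     22    F    F
2     P    P     12    F    F     22    F    P*
2     P    P     12    F    F     22    P    F*
3     F    F     13    P    P     23    P    P
3     F    F     13    P    P     23    P    P
3     F    F     13    P    P     23    P    P
4     F    F     14    P    P     24    P    P
4     F    F     14    P    P     24    P    P
4     F    F     14    F    P*    24    P    P
5     F    F     15    P    P     25    F    F
5     F    F     15    P    P     25    F    F
5     F    F     15    P    P     25    F    F
6     P    P     16    P    P     26    F    F
6     P    P     16    P    P     26    P    F*
6     F    F     16    P    P     26    F    F
7     P    P     17    P    P     27    P    P
7     P    P     17    P    P     27    P    P
7     P    P     17    P    P     27    P    P
8     P    P     18    P    P     28    P    P
8     P    P     18    P    P     28    P    P
8     P    P     18    P    P     28    P    P
9     F    F     19    P    P     29    P    P
9     F    F     19    P    P     29    P    P
9     F    F     19    P    P     29    P    P
10    P    P     20    P    P     30    F    F
10    P    P     20    P    P     30    F    F
10    P    P     20    P    P     30    F    P*

Each part has three rows, one per trial. * = Bob and Tom disagreed.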

 

A blank cross-tabulation table is shown below. 

Table 3: Blank Cross-Tabulation Table 

                       Tom
                       Fail    Pass    Total
Bob    Fail  Count
             Expected
       Pass  Count
             Expected
Total        Count
             Expected

 

The first step is to determine how often each of the following occurred in the data:

  • How often did both Bob and Tom pass the same part? - 59
  • How often did both Bob and Tom fail the same part? - 24
  • How often did Bob pass the part and Tom fail the part? - 3
  • How often did Bob fail the part and Tom pass the part? - 4
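The tally above can be reproduced with a short script. The following is a minimal sketch in Python, assuming that trial 1 for Bob is paired with trial 1 for Tom on the same part, and so on, which is the pairing that gives the counts listed above. The names BOB, TOM and cross_tab are illustrative only; the data are Bob's and Tom's columns from Table 1.

```python
from collections import Counter

# Results from Table 1: one three-letter group per part (parts 1-30 in order),
# with the letters in each group being trials 1, 2 and 3.
BOB = ("PPP PPP FFF FFF FFF PPF PPP PPP FFF PPP "
       "PPP FFF PPP PPF PPP PPP PPP PPP PPP PPP "
       "PPF FFP PPP PPP FFF FPF PPP PPP PPP FFF").split()
TOM = ("PPP PPP FFF FFF FFF PPF PPP PPP FFF PPP "
       "PPP FFF PPP PPP PPP PPP PPP PPP PPP PPP "
       "PFP FPF PPP PPP FFF FFF PPP PPP PPP FFP").split()

def cross_tab(first, second):
    """Count (first result, second result) pairs, matching trial k with trial k."""
    counts = Counter()
    for part_first, part_second in zip(first, second):
        counts.update(zip(part_first, part_second))
    return counts

counts = cross_tab(BOB, TOM)
print("Both passed:         ", counts[("P", "P")])
print("Both failed:         ", counts[("F", "F")])
print("Bob passed, Tom failed:", counts[("P", "F")])
print("Bob failed, Tom passed:", counts[("F", "P")])
```

The same function can be applied to any other pair of appraisers by passing in their results in the same format.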

These results are then added into the table and the row and column totals calculated as shown below.

Table 4: Cross-Tabulation Table with Counts Added

                       Tom
                       Fail    Pass    Total
Bob    Fail  Count     24      4       28
             Expected
       Pass  Count     3       59      62
             Expected
Total        Count     27      63      90
             Expected

 

It is sometimes easier to see the differences in appraisers if one uses percentages as shown in the table below. Bob failed a part a total of 28 times. When Bob failed a part, Tom failed that same part 24 times out of 28 or 86% of the time. However, Tom passed that same part 4 times out of 28 or 14%. Bob passed a part a total of 62 times. When Bob passed a part, Tom passed that same part 59 times out of 62 or 95% of the time; Tom failed that part 3 times out of 62 times or 5% of the time.

Table 5: Counts as Percentages

                Tom
                Fail    Pass    Total   % Fail  % Pass
Bob    Fail     24      4       28      86%     14%
       Pass     3       59      62      5%      95%
       Total    27      63      90
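The row percentages can be checked with a few lines of Python. This is a minimal sketch; the nested-dictionary layout and the name row_percentages are assumptions made for the example.

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {col: 100 * n / row_total for col, n in row.items()}
    return result

# Outer key: Bob's result. Inner key: Tom's result on the same measurement.
bob_vs_tom = {
    "Fail": {"Fail": 24, "Pass": 4},
    "Pass": {"Fail": 3, "Pass": 59},
}

for bob_result, pcts in row_percentages(bob_vs_tom).items():
    print(f"Bob {bob_result}: Tom Fail {pcts['Fail']:.0f}%, Tom Pass {pcts['Pass']:.0f}%")
```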

 

You can also look at the columns to help understand the agreement. Tom failed a part a total of 27 times. Bob failed the same part on 24 of those occasions but passed it on the other 3. Tom passed a part 63 times; Bob agreed with him 59 times but failed the part on the other 4.

The next step is to determine the expected counts. This is the count you would expect if there was no difference between the two appraisers. This is done by using the row and column totals. The expected count for any cell above is RC/T where R is the row total and C is the column total. T is the overall total (90 in this example).

This can appear confusing. The expected value is based on the hypothesis of no association, that is, that there is no difference between the appraisers. If this is true, then the proportion of counts in a single column is the same for all rows. Consider the column where Tom failed the part (the Fail column in the table above). Under this hypothesis of no difference, both rows have the same probability that a count falls in this column. The best estimate of this common probability is the column total (27) divided by the overall total (90):

Probability for Tom Fail Column = Column Total/Overall Total = 27/90 = 0.3

Then the expected number of counts in the top cell of that column (Bob failed and Tom failed) is the total number of counts for that row times the probability:

Expected Count = Row Total * Column Probability = 28 * 0.3 = 8.4
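The same RC/T calculation can be applied to every cell at once. The sketch below is one way to do this in Python; the function name expected_counts and the list-of-rows layout are assumptions for illustration.

```python
def expected_counts(observed):
    """Return R*C/T for every cell of a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Rows: Bob Fail, Bob Pass. Columns: Tom Fail, Tom Pass.
observed = [[24, 4],
            [3, 59]]

for label, row in zip(["Bob Fail", "Bob Pass"], expected_counts(observed)):
    print(label, [round(value, 1) for value in row])
```

Note that the expected counts in each row and column add up to the same totals as the actual counts.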

The expected counts for the rest of the cells are shown in the table below.

Table 6: Cross-Tabulation Table with Expected Counts Added

                       Tom
                       Fail    Pass    Total
Bob    Fail  Count     24      4       28
             Expected  8.4     19.6    28
       Pass  Count     3       59      62
             Expected  18.6    43.4    62
Total        Count     27      63      90
             Expected  27      63      90

 

The cross-tabulation tables are designed so you can assess the level of agreement between the appraisers. The cross-tabulation tables for Bob and Sally and then Tom and Sally are shown below.

Table 8: Cross-Tabulation Table for Bob and Sally

                       Sally
                       Fail    Pass    Total
Bob    Fail  Count     24      4       28
             Expected  9.3     18.7    28
       Pass  Count     6       56      62
             Expected  20.7    41.3    62
Total        Count     30      60      90
             Expected  30      60      90

 

Table 9: Cross-Tabulation Table for Tom and Sally

                       Sally
                       Fail    Pass    Total
Tom    Fail  Count     23      4       27
             Expected  9.0     18.0    27
       Pass  Count     7       56      63
             Expected  21.0    42.0    63
Total        Count     30      60      90
             Expected  30      60      90

Kappa Values

A measure of agreement between appraisers can be found by using Cohen's kappa value. This compares two appraisers who are measuring the same parts. Kappa can range from -1 to 1. A kappa value of 1 represents perfect agreement between the two appraisers. A kappa value of -1 represents perfect disagreement between the two appraisers. A kappa value of 0 means that the agreement is no better than what would be expected by chance alone. So, kappa values close to 1 are desired.

Kappa is calculated using the following equation:

kappa = (po - pe)/(1 - pe)

where

po = the sum of the actual counts in the diagonal cells/overall total

pe = the sum of the expected counts in the diagonal cells/overall total

The sum of counts in the diagonal cells is the sum of the counts where the appraisers agreed (both either passed or failed a part). The sum of expected counts is the same thing but you use the expected counts instead of the counts.

Using Bob and Tom's data, the value of kappa is calculated as shown below.

po = (24 + 59)/90 = 0.922

pe = (8.4 + 43.4)/90 = 0.576

kappa = (po - pe)/(1 - pe) = (0.922 - 0.576)/(1 - 0.576) = 0.82

The table below summarizes the calculations of kappa for the three cases.

Table 10: Kappa Values

         Bob     Tom     Sally
Bob      -       0.82    0.75
Tom      0.82    -       0.72
Sally    0.75    0.72    -
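The kappa values can be reproduced directly from the counts in the three cross-tabulation tables. The following Python sketch is one way to do it; the function name cohen_kappa and the dictionary of tables are assumptions for the example, and the expected counts are computed inside the function as RC/T.

```python
def cohen_kappa(observed):
    """Cohen's kappa for a square table of counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # po: share of all counts that fall in the diagonal (agreement) cells.
    p_o = sum(observed[i][i] for i in range(len(observed))) / total
    # pe: sum of the diagonal expected counts (R*C/T) divided by T.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Counts in the order [[fail/fail, fail/pass], [pass/fail, pass/pass]].
tables = {
    ("Bob", "Tom"): [[24, 4], [3, 59]],
    ("Bob", "Sally"): [[24, 4], [6, 56]],
    ("Tom", "Sally"): [[23, 4], [7, 56]],
}

for (first, second), observed in tables.items():
    print(f"{first} vs {second}: kappa = {cohen_kappa(observed):.2f}")
```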

 

The MSA manual referenced above says:

"A general rule of thumb is that values of kappa greater than 0.75 indicate good to excellent agreement (with a maximum kappa = 1); values less than 0.40 indicate poor agreement."

Based on these results, with kappa values ranging from 0.72 to 0.82, the appraisers are at or very near the 0.75 mark that indicates good to excellent agreement.

Another article (Landis, J.R. and Koch, G. G. (1977) "The measurement of observer agreement for categorical data" in Biometrics. Vol. 33, pp. 159-174) is often cited for the following interpretation of kappa:

  • Poor agreement = Less than 0.20
  • Fair agreement = 0.20 to 0.40
  • Moderate agreement = 0.40 to 0.60
  • Good agreement = 0.60 to 0.80
  • Very good agreement = 0.80 to 1.00
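The bands above can be turned into a small lookup. This Python sketch is only an illustration: the list gives 0.20, 0.40, 0.60 and 0.80 as the edge of two bands each, so the sketch assumes a value exactly on an edge belongs to the higher band. The function name interpret_kappa is also an assumption.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands listed above."""
    # Assumption: a value exactly on a band edge is placed in the higher band.
    if kappa < 0.20:
        return "Poor"
    if kappa < 0.40:
        return "Fair"
    if kappa < 0.60:
        return "Moderate"
    if kappa < 0.80:
        return "Good"
    return "Very good"

for pair, kappa in [("Bob/Tom", 0.82), ("Bob/Sally", 0.75), ("Tom/Sally", 0.72)]:
    print(f"{pair}: kappa = {kappa} -> {interpret_kappa(kappa)} agreement")
```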

Next month we will continue the newsletters on attribute gage R&R studies. We will make use of the reference column in the data above - the "true" value of the part and see how each appraiser stacks up against the reference. We will then look at the confidence intervals for each appraiser. Hope to have you back then.

Summary

This month’s publication introduced the attribute Gage R&R analysis. How to run an attribute Gage R&R was covered as well as the calculations of expected counts.  The calculation of Kappa values was presented along with how these values are interpreted.


Thanks so much for reading our publication. We hope you find it informative and useful. Happy charting and may the data always support your position.

Sincerely,

Dr. Bill McNeese
BPI Consulting, LLC
