
Attribute Gage R&R Studies: Comparing Appraisers

May 2010

Sometimes a measurement system has a measurement value that comes from a finite number of categories. The simplest of these is a go/no go gage, which tells you only whether the part passes or fails; there are just two possible outcomes. Other attribute measurement systems can have multiple categories, such as very good, good, poor, and very poor. In this newsletter, we will use the simple go/no go gage to understand how an attribute gage R&R study works. This is the first in a series of newsletters on attribute gage R&R studies and focuses on comparing appraisers.

Many folks use the manual Measurement Systems Analysis, 3rd edition, to help them understand their Gage R&R studies. Information on this manual can be found at www.aiag.org. This newsletter follows the procedures in that manual but provides more detail about the calculations.

Example Data

Suppose you are in charge of a production process that makes widgets. The process is not capable of meeting specifications, so some of the widgets you produce are out of specification. The process is in control, but your Black Belt group has not yet figured out how to make it capable of meeting specifications. Your only alternative, at this time, is to perform 100% inspection of the parts and separate the parts that are within specification from those that are out of specification.

You have selected an attribute go/no go gage to use. This gage simply tells you whether the part is within specifications. It does not tell you how “close” the result is to the nominal, only whether the part is within specifications.

To determine the effectiveness of the go/no go gage, you decide to conduct an attribute gage R&R study. You select three appraisers (Bob, Tom and Sally). You find 30 parts to use in the trial. Each of these parts was measured using a variable gage and rated as passing (within specifications) or failing (out of specification).

Each appraiser measures each part three times using the go/no go gage and the results are recorded. The parts must be run in random order, without the appraiser knowing which part he/she is measuring. In other words, randomize the 30 parts and have an appraiser measure each part. Then randomize the order again and repeat the measurement.
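If you are generating the run order with software, the randomization might look something like the following minimal sketch. The 30 parts and three trials come from the study described above; the code itself is only an illustration of the randomization step.

```python
# A small sketch of generating a randomized run order for each of the three
# trials, so the appraiser measures the 30 parts in a different, unknown
# order each time.
import random

parts = list(range(1, 31))
for trial in range(1, 4):
    run_order = parts.copy()
    random.shuffle(run_order)
    print(f"Trial {trial} run order: {run_order}")
```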

The results from the study are shown below. P indicates the part passed (within specifications), while F indicates that the part failed (out of specifications). The first column is the reference value for the part. It represents the “true” value of the part based on the variable gage measurements.

Table 1: Attribute Gage R&R Study Results

Reference  Part  Bob (Trials 1, 2, 3)  Tom (Trials 1, 2, 3)  Sally (Trials 1, 2, 3)
P 1 P P P P P P P P P
P 2 P P P P P P P P P
F 3 F F F F F F F F F
F 4 F F F F F F F F F
F 5 F F F F F F F F F
P 6 P P F P P F P F F
P 7 P P P P P P P F P
P 8 P P P P P P P P P
F 9 F F F F F F F F F
P 10 P P P P P P P P P
P 11 P P P P P P P P P
F 12 F F F F F F F P F
P 13 P P P P P P P P P
P 14 P P F P P P P F F
P 15 P P P P P P P P P
P 16 P P P P P P P P P
P 17 P P P P P P P P P
P 18 P P P P P P P P P
P 19 P P P P P P P P P
P 20 P P P P P P P P P
P 21 P P F P F P F P F
F 22 F F P F P F P P F
P 23 P P P P P P P P P
P 24 P P P P P P P P P
F 25 F F F F F F F F F
F 26 F P F F F F F F P
P 27 P P P P P P P P P
P 28 P P P P P P P P P
P 29 P P P P P P P P P
F 30 F F F F F P F F F

Between Appraiser Comparisons

We will use a cross-tabulation table to compare appraisers to each other. There is a cross-tabulation table for each pair of appraisers; there are three pairs in this case: Bob compared to Tom, Bob compared to Sally, and Tom compared to Sally. We will demonstrate the calculations using Bob and Tom. The first thing to do is to examine how Bob and Tom appraised the parts. This is shown in the table below. As can be seen in the table, Bob and Tom agreed most of the time. They disagreed on 7 of the 90 readings; these readings are marked with an asterisk in the table below.

Table 2: Comparing Bob and Tom

Part  Bob  Tom    Part  Bob  Tom    Part  Bob  Tom
  1    P    P      11    P    P      21    P    P
  1    P    P      11    P    P      21    P    F *
  1    P    P      11    P    P      21    F    P *
  2    P    P      12    F    F      22    F    F
  2    P    P      12    F    F      22    F    P *
  2    P    P      12    F    F      22    P    F *
  3    F    F      13    P    P      23    P    P
  3    F    F      13    P    P      23    P    P
  3    F    F      13    P    P      23    P    P
  4    F    F      14    P    P      24    P    P
  4    F    F      14    P    P      24    P    P
  4    F    F      14    F    P *    24    P    P
  5    F    F      15    P    P      25    F    F
  5    F    F      15    P    P      25    F    F
  5    F    F      15    P    P      25    F    F
  6    P    P      16    P    P      26    F    F
  6    P    P      16    P    P      26    P    F *
  6    F    F      16    P    P      26    F    F
  7    P    P      17    P    P      27    P    P
  7    P    P      17    P    P      27    P    P
  7    P    P      17    P    P      27    P    P
  8    P    P      18    P    P      28    P    P
  8    P    P      18    P    P      28    P    P
  8    P    P      18    P    P      28    P    P
  9    F    F      19    P    P      29    P    P
  9    F    F      19    P    P      29    P    P
  9    F    F      19    P    P      29    P    P
 10    P    P      20    P    P      30    F    F
 10    P    P      20    P    P      30    F    F
 10    P    P      20    P    P      30    F    P *

Each part is listed three times, once for each trial. An asterisk (*) marks a reading where Bob and Tom disagreed.

A blank cross-tabulation table is shown below.

Table 3: Blank Cross-Tabulation Table

                        Tom
                        Fail    Pass    Total
Bob   Fail   Count
             Expected
      Pass   Count
             Expected
      Total  Count
             Expected

The first step is to determine how often each of the following occurred in the data (a short counting sketch follows the list):

  • How often did both Bob and Tom pass the same part? – 59
  • How often did both Bob and Tom fail the same part? – 24
  • How often did Bob pass the part and Tom fail the part? – 3
  • How often did Bob fail the part and Tom pass the part? – 4
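These four counts can be tallied directly from Table 1. Here is a minimal sketch, assuming the results are transcribed into one string per appraiser (30 parts x 3 trials, in part/trial order):

```python
# A minimal sketch (not the MSA manual's software) of tallying the agreement
# counts. The Table 1 results are transcribed as one string per appraiser:
# 30 parts x 3 trials in part/trial order, "P" = pass, "F" = fail.
from collections import Counter

bob = ("PPP PPP FFF FFF FFF PPF PPP PPP FFF PPP "
       "PPP FFF PPP PPF PPP PPP PPP PPP PPP PPP "
       "PPF FFP PPP PPP FFF FPF PPP PPP PPP FFF").replace(" ", "")
tom = ("PPP PPP FFF FFF FFF PPF PPP PPP FFF PPP "
       "PPP FFF PPP PPP PPP PPP PPP PPP PPP PPP "
       "PFP FPF PPP PPP FFF FFF PPP PPP PPP FFP").replace(" ", "")

# Count each (Bob, Tom) combination across the 90 paired readings.
pairs = Counter(zip(bob, tom))
print(pairs[("P", "P")])  # 59 - both passed
print(pairs[("F", "F")])  # 24 - both failed
print(pairs[("P", "F")])  # 3  - Bob passed, Tom failed
print(pairs[("F", "P")])  # 4  - Bob failed, Tom passed
```

Substituting Sally's results gives the counts for the other two comparisons in the same way.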

These results are then added into the table and the row and column totals calculated as shown below.

Table 4: Cross-Tabulation Table with Counts Added

                        Tom
                        Fail    Pass    Total
Bob   Fail   Count        24       4      28
             Expected
      Pass   Count         3      59      62
             Expected
      Total  Count        27      63      90
             Expected

It is sometimes easier to see the differences between appraisers if one uses percentages, as shown in the table below. Bob failed a part a total of 28 times. When Bob failed a part, Tom failed that same part 24 times out of 28, or 86% of the time; Tom passed that same part 4 times out of 28, or 14% of the time. Bob passed a part a total of 62 times. When Bob passed a part, Tom passed that same part 59 times out of 62, or 95% of the time; Tom failed that part 3 times out of 62, or 5% of the time.

Table 5: Counts as Percentages

                     Tom
                     Fail    Pass    Total    % Fail    % Pass
Bob   Fail             24       4      28       86%       14%
      Pass              3      59      62        5%       95%
      Total            27      63      90

You can also look at the columns to help understand the agreement. Tom failed a part a total of 27 times. Bob failed the same part 24 of those times but passed it 3 times. Tom passed a part 63 times; Bob agreed with him 59 times but failed 4 of the parts Tom passed.
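As a quick check, the row percentages in Table 5 can be reproduced from the counts in Table 4 with a few lines of code. This is a minimal sketch; the dictionary keys are (Bob result, Tom result) pairs.

```python
# A quick check of the row percentages in Table 5, computed from the Table 4
# counts. Keys are (Bob result, Tom result) pairs; values are counts.
counts = {("F", "F"): 24, ("F", "P"): 4,
          ("P", "F"): 3,  ("P", "P"): 59}

for bob_result in ("F", "P"):
    row_total = counts[(bob_result, "F")] + counts[(bob_result, "P")]
    pct_fail = 100 * counts[(bob_result, "F")] / row_total
    pct_pass = 100 * counts[(bob_result, "P")] / row_total
    print(f"Bob {bob_result}: total={row_total}, "
          f"% Fail={pct_fail:.0f}%, % Pass={pct_pass:.0f}%")
# Bob F: total=28, % Fail=86%, % Pass=14%
# Bob P: total=62, % Fail=5%, % Pass=95%
```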

The next step is to determine the expected counts. This is the count you would expect if there was no difference between the two appraisers. This is done by using the row and column totals. The expected count for any cell above is RC/T where R is the row total and C is the column total. T is the overall total (90 in this example).

This can appear confusing. The expected value is based on the hypothesis of no association – that there is no difference between the appraisers. If this is true, then the proportion of counts in a single column is the same for all rows. Consider Tom's Fail column in Table 4 above (the column whose total is 27). Under this hypothesis of no difference, both rows have the same probability that a count falls in this column. The best estimate of this common probability is the column total (27) divided by the overall total (90):

Probability for the Fail Column = Column Total/Overall Total = 27/90 = 0.3

Then the expected count in the top cell of that column (Bob Fail, Tom Fail) is the total number of counts for that row times the probability:

Expected Count = Row Total * Column Probability = 28 * 0.3 = 8.4
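The same RC/T formula applies to every cell. Here is a minimal sketch using the Bob versus Tom row and column totals from Table 4:

```python
# A minimal sketch of the expected-count formula RC/T, using the Bob versus
# Tom row and column totals from Table 4.
row_totals = {"F": 28, "P": 62}   # Bob (rows)
col_totals = {"F": 27, "P": 63}   # Tom (columns)
overall_total = 90

expected = {(r, c): row_totals[r] * col_totals[c] / overall_total
            for r in row_totals for c in col_totals}
print(expected)
# {('F', 'F'): 8.4, ('F', 'P'): 19.6, ('P', 'F'): 18.6, ('P', 'P'): 43.4}
```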

The expected counts for the rest of the cells are shown in the table below.

Table 6: Cross-Tabulation Table with Expected Counts Added

                        Tom
                        Fail    Pass    Total
Bob   Fail   Count        24       4      28
             Expected    8.4    19.6      28
      Pass   Count         3      59      62
             Expected   18.6    43.4      62
      Total  Count        27      63      90
             Expected     27      63      90

The cross-tabulation tables are designed so you can assess the level of agreement between the appraisers. The cross-tabulation tables for Bob and Sally and for Tom and Sally are shown below.

Table 8: Cross-Tabulation Table for Bob and Sally

                        Sally
                        Fail    Pass    Total
Bob   Fail   Count        24       4      28
             Expected    9.3    18.7      28
      Pass   Count         6      56      62
             Expected   20.7    41.3      62
      Total  Count        30      60      90
             Expected     30      60      90

Table 9: Cross-Tabulation Table for Tom and Sally

                        Sally
                        Fail    Pass    Total
Tom   Fail   Count        23       4      27
             Expected    9.0    18.0      27
      Pass   Count         7      56      63
             Expected   21.0    42.0      63
      Total  Count        30      60      90
             Expected     30      60      90

Kappa Values

A measure of agreement between appraisers can be found by using Cohen’s kappa value, which compares two appraisers who are measuring the same parts. Kappa can range from -1 to 1. A kappa value of 1 represents perfect agreement between the two appraisers; a kappa value of -1 represents perfect disagreement. A kappa value of 0 means the agreement is no better than would be expected by chance alone. So, kappa values close to 1 are desired.

Kappa is calculated using the following equation:

kappa = (po - pe)/(1 - pe)

where

po = the sum of the actual counts in the diagonal cells divided by the overall total

pe = the sum of the expected counts in the diagonal cells divided by the overall total

The sum of the actual counts in the diagonal cells is the sum of the counts where the appraisers agreed (both either passed or failed a part). The sum of the expected counts in the diagonal cells is calculated the same way, using the expected counts instead of the actual counts.

Using Bob and Tom’s data, the value of kappa is calculated as shown below.

po = (24 + 59)/90 = 0.922

pe = (8.4 + 43.4)/90 = 0.576

kappa = (po – pe)/(1 – pe) = (0.922 – 0.576)/(1 – 0.576) = 0.82
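Putting the pieces together, the kappa value for Bob and Tom can be reproduced from the observed and expected counts in Table 6. This is a minimal sketch of the calculation:

```python
# A minimal sketch of the kappa calculation for Bob versus Tom, using the
# observed and expected counts from Table 6.
observed = {("F", "F"): 24, ("F", "P"): 4, ("P", "F"): 3, ("P", "P"): 59}
expected = {("F", "F"): 8.4, ("F", "P"): 19.6, ("P", "F"): 18.6, ("P", "P"): 43.4}
total = 90

p_o = (observed[("F", "F")] + observed[("P", "P")]) / total   # 0.922
p_e = (expected[("F", "F")] + expected[("P", "P")]) / total   # 0.576
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))  # 0.82
```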

The table below summarizes the calculations of kappa for the three comparisons.

Table 10: Kappa Values

         Bob     Tom     Sally
Bob              0.82    0.75
Tom      0.82            0.72
Sally    0.75    0.72

The MSA manual referenced above says:

“A general rule of thumb is that values of kappa greater than 0.75 indicate good to excellent agreement (with a maximum kappa = 1); values less than 0.40 indicate poor agreement.”

Based on these results, the appraisers are at or very near the 0.75 mark that indicates good to excellent agreement.

Another article (Landis, J.R. and Koch, G.G. (1977), “The Measurement of Observer Agreement for Categorical Data,” Biometrics, Vol. 33, pp. 159-174) provides the following interpretation of kappa (a small sketch applying this scale follows the list):

  • Poor agreement = Less than 0.20
  • Fair agreement = 0.20 to 0.40
  • Moderate agreement = 0.40 to 0.60
  • Good agreement = 0.60 to 0.80
  • Very good agreement = 0.80 to 1.00
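For illustration only, a small helper could map a kappa value onto the scale quoted above. The handling of the boundary values, which the list leaves ambiguous, is an assumption made here.

```python
# An illustrative helper (not from the MSA manual) that maps a kappa value
# onto the Landis and Koch scale quoted above. Assigning boundary values to
# the higher category is an assumption made for this sketch.
def interpret_kappa(kappa):
    if kappa < 0.20:
        return "Poor agreement"
    elif kappa < 0.40:
        return "Fair agreement"
    elif kappa < 0.60:
        return "Moderate agreement"
    elif kappa < 0.80:
        return "Good agreement"
    return "Very good agreement"

for value in (0.82, 0.75, 0.72):   # kappa values from Table 10
    print(value, interpret_kappa(value))
```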

Next month we will continue the series on attribute gage R&R studies. We will make use of the reference column in the data above (the “true” value of each part) and see how each appraiser stacks up against the reference. We will then look at the confidence intervals for each appraiser. Hope to have you back then.

Summary

This month’s publication introduced the attribute Gage R&R analysis. It covered how to run an attribute Gage R&R study and how to calculate the expected counts. The calculation of kappa values was presented, along with how these values are interpreted.


Thanks so much for reading our SPC Knowledge Base. We hope you find it informative and useful. Happy charting and may the data always support your position.

Sincerely,

Dr. Bill McNeese
BPI Consulting, LLC
