January 2013

This is the first of a multi-part newsletter on rational subgrouping – a very important, yet often forgotten, part of statistical process control (SPC). Far too often, people give too little (or no) thought to how they subgroup their data when constructing an X-R control chart or any other control chart that puts the data into subgroups. Remember that control charts are really a study of the variation in your process. The variation displayed on the control chart depends on how you subgroup your data – and it may or may not be the variation you would like to study. We will use the sport of golf to help us explore rational subgrouping in this newsletter.


Rational Subgrouping

So, what is rational subgrouping? Lloyd Nelson defined a rational subgroup as “a sample in which all of the items are produced under conditions in which only random effects are responsible for the observed variation” (Control Charts: Rational Subgroups and Effective Applications, Journal of Quality Technology, Vol. 20, No. 1, Jan. 1988). This is the premise behind rational subgrouping – the results that are combined into the same subgroup can be logically thought to have been obtained or produced under essentially the same conditions. We will start our discussion of rational subgrouping by considering the sport of golf. This analogy will help us see how rational subgrouping works – and give us more insight into how X-R control charts work.

 

Golf and Rational Subgrouping

Golf is a great game – at least some folks think so. Imagine that you are a pro golfer – getting to fly around the world to beautiful golf courses throughout the year and compete for million-dollar purses in golf tournaments. Golf tournaments consist of four rounds of 18 holes of golf played over four days. So, suppose you are a golf pro – like Phil Mickelson. You want to monitor whether your golf score is getting better. How could you do this?

One method would be to track your average tournament score, i.e., the average of the four rounds. You would like to see this score get lower since lower scores improve your ability to earn money in the pros. You would also probably be interested in your consistency, i.e., how close the four rounds in a tournament are. You won’t like a lot of variation in your results – shooting a 72 one day and a 90 the next. So, you could also track the range in your golf scores for a tournament. The perfect chart to do this is the X-R control chart.

Like all control charts, the X-R control chart examines variation. To use an X-R control chart, you need to determine how to subgroup the data. You should never just take the existing data and put it into subgroups of 4 or 5 simply because you read that a subgroup size of 4 or 5 is what is usually used with the X-R control chart. With golf, it makes sense (one of the tenets of rational subgrouping) to subgroup the data by tournament. The four rounds of golf for a single tournament form a subgroup. This will allow us to explore the variation we are interested in: our average score and our consistency.
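To make the idea of subgrouping by tournament concrete, here is a minimal Python sketch. The flat record layout (tournament number, round number, score) and the function name are assumptions made for illustration; the scores are the first two rows of Table 1.

```python
from collections import defaultdict

# Hypothetical flat records: (tournament number, round number, score).
# The scores are the first two tournaments listed in Table 1.
records = [
    (1, 1, 70), (1, 2, 76), (1, 3, 70), (1, 4, 73),
    (2, 1, 72), (2, 2, 66), (2, 3, 71), (2, 4, 73),
]

def subgroup_by_tournament(records):
    """Group scores so that each tournament forms one subgroup."""
    subgroups = defaultdict(list)
    for tournament, _round_number, score in records:
        subgroups[tournament].append(score)
    return dict(subgroups)

subgroups = subgroup_by_tournament(records)
print(subgroups)
```

The point of grouping on the tournament number, rather than slicing the data into arbitrary chunks of four, is that each subgroup then holds rounds played under essentially the same conditions.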

 

Golf Data

Suppose the first tournament you play for the year has the following results for the rounds: 70, 76, 70, and 73.  Pretty nice four rounds of golf for you!  These four rounds form the first subgroup.  You can calculate the subgroup average (the average of the four rounds of golf):

Average = (70 + 76 + 70 + 73) / 4 = 289 / 4 = 72.25

You can continue this for each tournament you are in and plot the results on an X control chart. This chart shows you how much variation there is in your average tournament score from tournament to tournament – this is the variation between subgroups and is sometimes called the long-term variation.

You can also calculate the range of this first tournament of the year (the first subgroup).  The range is simply the highest score (76) minus the lowest score (70).  So, the range for the first subgroup is:

Range = Highest – Lowest  = 76 – 70  = 6

You are pretty consistent!  You can continue this for each tournament you are in and plot the results on the R chart.  This chart shows you how much variation there is in your scores within a tournament from tournament to tournament – this is the variation within a subgroup and is sometimes called the short-term variation.
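The two calculations for the first tournament can be sketched in a few lines of Python. The variable names are only illustrative.

```python
# Rounds from the first tournament of the year (subgroup 1).
rounds = [70, 76, 70, 73]

# Subgroup average: the point plotted on the X chart.
subgroup_average = sum(rounds) / len(rounds)

# Subgroup range: the point plotted on the R chart.
subgroup_range = max(rounds) - min(rounds)

print(subgroup_average, subgroup_range)
```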

Table 1 shows your scores for your last 18 tournaments.  The tournament (subgroup) averages and ranges have also been calculated.  The overall average for the 18 tournaments and the average range have also been calculated as shown in the table.

Table 1: Tournament Scores

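| Tourn. No. | Round 1 | Round 2 | Round 3 | Round 4 | Average | Range |
|---|---|---|---|---|---|---|
| 1 | 70 | 76 | 70 | 73 | 72.25 | 6 |
| 2 | 72 | 66 | 71 | 73 | 70.5 | 7 |
| 3 | 68 | 67 | 69 | 71 | 68.75 | 4 |
| 4 | 68 | 68 | 72 | 67 | 68.75 | 5 |
| 5 | 71 | 69 | 72 | 68 | 70 | 4 |
| 6 | 71 | 67 | 75 | 77 | 72.5 | 10 |
| 7 | 69 | 76 | 70 | 71 | 71.5 | 7 |
| 8 | 67 | 71 | 67 | 67 | 68 | 4 |
| 9 | 70 | 68 | 71 | 68 | 69.25 | 3 |
| 10 | 70 | 71 | 66 | 74 | 70.25 | 8 |
| 11 | 67 | 71 | 70 | 69 | 69.25 | 4 |
| 12 | 75 | 66 | 73 | 73 | 71.75 | 9 |
| 13 | 73 | 71 | 70 | 75 | 72.25 | 5 |
| 14 | 66 | 68 | 71 | 78 | 70.75 | 12 |
| 15 | 73 | 69 | 73 | 67 | 70.5 | 6 |
| 16 | 69 | 65 | 67 | 76 | 69.25 | 11 |
| 17 | 72 | 71 | 70 | 67 | 70 | 5 |
| 18 | 69 | 72 | 68 | 74 | 70.75 | 6 |
| | | | | Sum | 1266.25 | 116 |
| | | | | Average | 70.34722 | 6.470588 |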

 

One key to understanding  X-R control charts is to understand that the two charts are monitoring different sources of variation.  The  X control chart is examining the variation in subgroup averages over time and will let you know if these subgroup averages are consistent (in control – only common causes of variation present) or if any subgroup averages fall outside the “normal” variation (out of control – special cause of variation present). 

The R chart is examining the variation within the subgroup over time and will let you know if this within subgroup variation is consistent over time.  If you are new to control charts, please review our newsletter on the purpose of control charts.
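The two sources of variation can be computed side by side from the Table 1 rounds. The sketch below is one possible way to do it; the function and variable names are assumptions, not part of any SPC package.

```python
# Table 1: four rounds per tournament (one tuple = one subgroup).
TABLE_1 = [
    (70, 76, 70, 73), (72, 66, 71, 73), (68, 67, 69, 71),
    (68, 68, 72, 67), (71, 69, 72, 68), (71, 67, 75, 77),
    (69, 76, 70, 71), (67, 71, 67, 67), (70, 68, 71, 68),
    (70, 71, 66, 74), (67, 71, 70, 69), (75, 66, 73, 73),
    (73, 71, 70, 75), (66, 68, 71, 78), (73, 69, 73, 67),
    (69, 65, 67, 76), (72, 71, 70, 67), (69, 72, 68, 74),
]

def subgroup_stats(subgroups):
    """Return the subgroup averages (X chart) and ranges (R chart)."""
    averages = [sum(s) / len(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    return averages, ranges

averages, ranges = subgroup_stats(TABLE_1)

# Centerlines: overall average of the subgroup averages, and average range.
grand_average = sum(averages) / len(averages)
average_range = sum(ranges) / len(ranges)

print(round(grand_average, 5), round(average_range, 5))
```

Computed this way from the rows as printed, the average range comes out near 6.44, slightly below the 6.47 quoted in the text, so small differences from the plotted limits are to be expected.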

 

Golf and Control Charts

As mentioned before, the subgroup averages are plotted on the  X control chart and the subgroup ranges are plotted on the R chart.  Figure 1 is the  X run chart for these 18 tournaments.  Figure 2 is the R chart for the tournaments.  The averages are plotted on each chart as well, but the control limits have not yet been added.

Figure 1: X Run Chart for Golf Data


Figure 2: R Run Chart for Golf Data


Whenever you look at a control chart, the first question you should ask yourself is

“What variation is this chart examining?”

If you can’t answer this question, then the control chart is nonsense – it will not tell you anything at all.  Throw it out and start over with a discussion of rational subgrouping.  Figure 1 is measuring the variation in tournament averages from tournament to tournament.  You can see this because you are plotting the tournament averages over time.  In control chart language:

“The  X control chart is monitoring the variation in the subgroup averages from subgroup to subgroup.”

Figure 2 is measuring the variation within the four rounds in a single tournament from tournament to tournament.  You are plotting this range (the maximum minus the minimum value) over time.  In control chart language:

“The R chart is monitoring the variation within the subgroup from subgroup to subgroup.”

The next step is to add the control limits.  The control limits for the  X and R control charts are shown below. 

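X control chart:

UCL = X-double-bar + A2 × R-bar
LCL = X-double-bar – A2 × R-bar

R control chart:

UCL = D4 × R-bar
LCL = D3 × R-bar

where D4, D3, and A2 are control chart constants that depend on subgroup size (see our newsletter on X-R control charts). Our subgroup size in this case is 4, since there are four rounds per tournament. R-bar is the average range (the centerline of the R chart) and X-double-bar is the overall average of the subgroup averages (the centerline of the X control chart).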

One important item to note on the control limits: the average range is used in the calculation of the control limits for the X control chart.  This means that the “short-term” variation from the range chart is used to set the control limits for the “long-term” variation on the X control chart.  We will come back to this point next month.
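As a rough sketch of how the average range feeds both sets of limits, the Python below applies the standard X-R limit formulas for a subgroup size of 4. The constants A2 = 0.729, D3 = 0, and D4 = 2.282 are the commonly tabulated values for n = 4; the function name is an assumption, and the inputs are the rounded centerlines quoted in this newsletter (70.32 and 6.47).

```python
# Commonly tabulated control chart constants for subgroup size n = 4.
A2, D3, D4 = 0.729, 0.0, 2.282

def xbar_r_limits(grand_average, average_range):
    """Return (LCL, centerline, UCL) for the X chart and the R chart."""
    half_width = A2 * average_range  # short-term variation sets the X limits
    return {
        "xbar": (grand_average - half_width, grand_average,
                 grand_average + half_width),
        "r": (D3 * average_range, average_range, D4 * average_range),
    }

# Rounded centerlines as quoted in the text.
limits = xbar_r_limits(70.32, 6.47)
for chart, (lcl, center, ucl) in limits.items():
    print(chart, round(lcl, 2), round(center, 2), round(ucl, 2))
```

With these rounded inputs the results agree with the limits quoted in the text to within about 0.01. Note that `average_range` appears in both entries: the within-subgroup variation sets the width of the limits on the X chart.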

Figure 3 is the X control chart with the control limits added.  Figure 4 is the R chart with the control limits added. 

Figure 3: X Control Chart for Golf Data


Figure 4: R Control Chart for Golf Data


 

Remember that the upper control limit (UCL) represents the largest value we would expect from the process if there are only common causes of variation present. For the X control chart in Figure 3, the UCL = 75.04. This means that the highest tournament average one would expect is about 75 as long as only common causes of variation are present. The lower control limit (LCL) represents the smallest value we would expect from the process if only common causes of variation are present. The LCL = 65.61. This means the smallest tournament average we would expect is about 65.6 as long as only common causes of variation are present.

As long as there are no points beyond the control limits and no patterns, the X control chart is in statistical control. This is the case as seen in Figure 3. This means that the variation in the subgroup averages (tournament averages) is consistent over time. The process is consistent and predictable, so we can predict what will happen in the future. We don’t know what the next tournament average will be, but we can expect it to fall between 65.61 and 75.04, with a long-term average of 70.32, as long as the process stays the same. It also tells us that our golf score is not getting lower over time.

Figure 4 is the range chart with the control limits added. Remember that this chart is monitoring the variation in the range of scores from a single tournament. The average range (6.47) is the centerline on the chart. The upper control limit (14.77) is also plotted. In this example, there is no lower control limit (for a subgroup size of 4, the R chart does not have one). Since there are no points beyond the control limits and no patterns, the R chart is in statistical control. This means that we can predict what will happen in the future. We don’t know the exact range of scores that will occur in the next tournament, but we can expect it to fall between 0 and 14 (ranges are whole numbers, so 14 is the largest value below the UCL), with a long-term average of 6.47, as long as the process stays the same.
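The "no points beyond the control limits" part of this check is easy to sketch in Python. The helper name is an assumption, the averages and ranges are copied from Table 1, and the limits are the ones quoted above. The check for patterns (such as runs) is not covered here.

```python
# Subgroup averages and ranges copied from Table 1.
averages = [72.25, 70.5, 68.75, 68.75, 70, 72.5, 71.5, 68, 69.25,
            70.25, 69.25, 71.75, 72.25, 70.75, 70.5, 69.25, 70, 70.75]
ranges = [6, 7, 4, 5, 4, 10, 7, 4, 3, 8, 4, 9, 5, 12, 6, 11, 5, 6]

def points_beyond_limits(values, lcl, ucl):
    """Return 1-based subgroup numbers whose value falls outside the limits."""
    return [i for i, v in enumerate(values, start=1) if v < lcl or v > ucl]

# Limits as quoted in the text; the R chart has no lower limit, so 0 is used.
x_out = points_beyond_limits(averages, lcl=65.61, ucl=75.04)
r_out = points_beyond_limits(ranges, lcl=0, ucl=14.77)

print(x_out, r_out)
```

Both lists come back empty for the 2010 data, which matches the reading of Figures 3 and 4.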

Once a process is in statistical control, the only way to improve it is to fundamentally change the process.  If we want our average score to decrease or to be more consistent, we have to change the way we do things.  Fundamental changes could include a different golf coach, a new swing, or new clubs.

Phil Mickelson

The tournament scores in Table 1 look like they were made by a pretty good golfer. They were. The data are the PGA tournament scores for Phil Mickelson in 2010. He was pretty consistent in 2010. How has he been since then?

Table 2 shows his PGA tournament rounds for those tournaments where he made the cut from 2010 through 2012. He did not miss the cut on too many tournaments.  The data are from www.espn.com.

Phil turned pro in 1992. He is 42 years old now. He made $4.2 million in 2012 on the tour. Not bad! The control charts based on the data in the table are shown below.

 

Table 2: Phil Mickelson Golf Scores

 

| Date | PGA Tournament (Tour) | Tourn. No. | Round 1 | Round 2 | Round 3 | Round 4 |
|---|---|---|---|---|---|---|
| Jan 28 - 31 (2010) | Farmers Insurance Open | 1 | 70 | 76 | 70 | 73 |
| Feb 4 - 7 | Northern Trust Open | 2 | 72 | 66 | 71 | 73 |
| Feb 11 - 14 | AT&T Pebble Beach National Pro-Am | 3 | 68 | 67 | 69 | 71 |
| Feb 25 - 28 | Waste Management Phoenix Open | 4 | 68 | 68 | 72 | 67 |
| Mar 11 - 14 | WGC-CA Championship | 5 | 71 | 69 | 72 | 68 |
| Mar 25 - 28 | Arnold Palmer Invitational | 6 | 71 | 67 | 75 | 77 |
| Apr 1 - 4 | Shell Houston Open | 7 | 69 | 76 | 70 | 71 |
| Apr 8 - 11 | The Masters | 8 | 67 | 71 | 67 | 67 |
| Apr 29 - May 2 | Quail Hollow Championship | 9 | 70 | 68 | 71 | 68 |
| May 6 - 9 | THE PLAYERS Championship | 10 | 70 | 71 | 66 | 74 |
| Jun 3 - 6 | the Memorial Tournament | 11 | 67 | 71 | 70 | 69 |
| Jun 17 - 20 | U.S. Open Championship | 12 | 75 | 66 | 73 | 73 |
| Jul 15 - 18 | The Open Championship | 13 | 73 | 71 | 70 | 75 |
| Aug 5 - 8 | WGC-Bridgestone Invitational | 14 | 66 | 68 | 71 | 78 |
| Aug 12 - 15 | PGA Championship | 15 | 73 | 69 | 73 | 67 |
| Sep 3 - 6 | Deutsche Bank Championship | 16 | 69 | 65 | 67 | 76 |
| Sep 9 - 12 | BMW Championship | 17 | 72 | 71 | 70 | 67 |
| Sep 23 - 26 | THE TOUR Championship | 18 | 69 | 72 | 68 | 74 |
| Jan 27 - 30 (2011) | Farmers Insurance Open | 19 | 67 | 69 | 68 | 69 |
| Feb 3 - 7 | Waste Management Phoenix Open | 20 | 67 | 65 | 71 | 71 |
| Feb 10 - 13 | AT&T Pebble Beach National Pro-Am | 21 | 71 | 67 | 69 | 71 |
| Feb 17 - 20 | Northern Trust Open | 22 | 71 | 70 | 74 | 68 |
| Mar 10 - 13 | World Golf Championships-Cadillac Championship | 23 | 73 | 71 | 72 | 76 |
| Mar 24 - 27 | Arnold Palmer Invitational presented by Mastercard | 24 | 70 | 75 | 69 | 73 |
| Mar 31 - Apr 3 | Shell Houston Open | 25 | 70 | 70 | 63 | 65 |
| Apr 7 - 10 | The Masters | 26 | 70 | 72 | 71 | 74 |
| May 5 - 8 | Wells Fargo Championship | 27 | 69 | 66 | 74 | 69 |
| May 12 - 15 | THE PLAYERS Championship | 28 | 71 | 71 | 69 | 72 |
| Jun 2 - 5 | The Memorial Tournament | 29 | 72 | 70 | 72 | 67 |
| Jun 16 - 19 | U.S. Open Championship | 30 | 74 | 69 | 77 | 71 |
| Jul 14 - 17 | The Open Championship | 31 | 70 | 69 | 71 | 68 |
| Aug 4 - 7 | World Golf Championships-Bridgestone Invitational | 32 | 67 | 73 | 71 | 72 |
| Aug 11 - 14 | PGA Championship | 33 | 71 | 70 | 69 | 70 |
| Aug 25 - 27 | The Barclays | 34 | 67 | 70 | 68 | 68 |
| Sep 2 - 5 | Deutsche Bank Championship | 35 | 70 | 73 | 63 | 69 |
| Sep 15 - 18 | BMW Championship | 36 | 72 | 73 | 71 | 75 |
| Sep 22 - 25 | TOUR Championship by Coca-Cola | 37 | 68 | 70 | 67 | 71 |
| 1/19 - 1/22 (2012) | Humana Challenge | 38 | 74 | 69 | 66 | 69 |
| 2/2 - 2/5 | Waste Management Phoenix Open | 39 | 68 | 70 | 67 | 73 |
| 2/9 - 2/12 | AT&T Pebble Beach National Pro-Am | 40 | 70 | 65 | 70 | 64 |
| 2/16 - 2/19 | Northern Trust Open | 41 | 66 | 70 | 70 | 71 |
| 3/8 - 3/11 | World Golf Championships-Cadillac Championship | 42 | 72 | 71 | 71 | 71 |
| 3/22 - 3/25 | Arnold Palmer Invitational | 43 | 73 | 71 | 71 | 72 |
| 3/29 - 4/1 | Shell Houston Open | 44 | 65 | 70 | 70 | 71 |
| 4/5 - 4/8 | The Masters Tournament | 45 | 74 | 68 | 66 | 72 |
| 5/3 - 5/6 | Wells Fargo Championship | 46 | 71 | 72 | 68 | 71 |
| 5/10 - 5/13 | THE PLAYERS Championship | 47 | 71 | 71 | 70 | 73 |
| 5/17 - 5/20 | HP Byron Nelson Championship | 48 | 70 | 69 | 69 | 66 |
| 6/14 - 6/17 | U.S. Open Golf Championship | 49 | 76 | 71 | 71 | 78 |
| 8/2 - 8/5 | World Golf Championships-Bridgestone Invitational | 50 | 71 | 69 | 73 | 71 |
| 8/9 - 8/12 | PGA Championship | 51 | 73 | 71 | 73 | 74 |
| 8/23 - 8/26 | The Barclays | 52 | 68 | 74 | 67 | 76 |
| 8/31 - 9/3 | Deutsche Bank Championship | 53 | 68 | 68 | 68 | 66 |
| 9/6 - 9/9 | BMW Championship | 54 | 69 | 67 | 64 | 70 |
| 9/20 - 9/23 | TOUR Championship by Coca-Cola | 55 | 69 | 71 | 72 | 69 |
| 11/1 - 11/4 | World Golf Championships-HSBC Champions | 56 | 66 | 69 | 66 | 68 |

 

Figures 5 and 6 show the X and R control charts with the control limits based on the 2010 data.  Still a pretty consistent golfer!  Note that in Figure 6, there is a run of 8 points below the average.  Something happened to make his game more consistent from that point on.  We should probably re-calculate the control limits from that point on.  But I will leave that to Phil.  Or, if you would like to do it, e-mail me and I will send you the data in an Excel workbook.
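A run of points below the average can also be found programmatically. The sketch below is one possible implementation of that check, not the method used to build the figures. It uses an excerpt of Table 2 (tournaments 13 through 26), the 2010 average range of 6.47 as the centerline, and a hypothetical helper named `runs_below`.

```python
def runs_below(values, center, min_length=8):
    """Return (start_index, length) for each run of at least min_length
    consecutive values that fall below the centerline."""
    runs, start = [], None
    for i, value in enumerate(values):
        if value < center:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - start))
            start = None
    # Handle a run that continues to the end of the data.
    if start is not None and len(values) - start >= min_length:
        runs.append((start, len(values) - start))
    return runs

# Excerpt of Table 2: rounds for tournaments 13 through 26.
FIRST_TOURNAMENT = 13
rounds = [
    (73, 71, 70, 75), (66, 68, 71, 78), (73, 69, 73, 67), (69, 65, 67, 76),
    (72, 71, 70, 67), (69, 72, 68, 74), (67, 69, 68, 69), (67, 65, 71, 71),
    (71, 67, 69, 71), (71, 70, 74, 68), (73, 71, 72, 76), (70, 75, 69, 73),
    (70, 70, 63, 65), (70, 72, 71, 74),
]
ranges = [max(r) - min(r) for r in rounds]

AVERAGE_RANGE_2010 = 6.47  # centerline quoted in the text
found = [(FIRST_TOURNAMENT + start, length)
         for start, length in runs_below(ranges, AVERAGE_RANGE_2010)]
print(found)
```

On this excerpt the check reports a single run of 8 starting at tournament 17, that is, the last two tournaments of 2010 and the first six of 2011.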

Figure 5: Phil Mickelson X Control Chart


Figure 6: Phil Mickelson R Control Chart


 

Summary

Rational subgrouping provides the context you use to interpret a control chart.  The first question you ask yourself when examining a control chart is “What variation is this chart examining?”  This is the key to using control charts effectively.  Without effective rational subgrouping, control charts can be simple nonsense.

We used the X-R control chart in this newsletter. The X-R control chart is a powerful tool for examining sources of variation, but it is critical to set up the chart to explore the variation you are interested in. In this golf example, we were interested in monitoring our average tournament score as well as our consistency. The chart we used was set up to examine that variation. Note that, if we had plotted each individual tournament result, we would have been looking at different sources of variation. This is the principle behind rational subgrouping.

Next month we will take a more in-depth look at rational subgrouping including some rules to follow. 

 


Thanks so much for reading our publication. We hope you find it informative and useful. Happy charting and may the data always support your position.

Sincerely,

Dr. Bill McNeese
BPI Consulting, LLC
