January 2013

This is the first of a multi-part newsletter series on rational subgrouping – a very important, yet often forgotten, part of statistical process control (SPC). Far too often, people do not give enough (or any) thought to how they subgroup their data when constructing an X̄-R control chart or any other control chart that involves putting the data into subgroups. Remember that control charts are really a study of the variation in your process, and the variation displayed on the control chart depends on how you subgroup your data. That may or may not be the variation you would like to study. We will use the sport of golf to help us explore rational subgrouping in this newsletter.

As always, please feel free to leave a comment at the end of the newsletter.

Rational Subgrouping

So, what is rational subgrouping? Lloyd Nelson defined a rational subgroup as “a sample in which all of the items are produced under conditions in which only random effects are responsible for the observed variation” (Control Charts: Rational Subgroups and Effective Applications, Journal of Quality Technology, Vol. 20, No. 1, Jan. 1988). This is the premise behind rational subgrouping: the results that are combined into the same subgroup can logically be thought of as having been obtained or produced under essentially the same conditions. We will start our discussion of rational subgrouping by considering the sport of golf. This analogy will help us see how rational subgrouping works and give us more insight into how X̄-R control charts work.

 

Golf and Rational Subgrouping

Golf is a great game – at least some folks think so. Imagine that you are a pro golfer, getting to fly around the world to beautiful golf courses throughout the year and compete for million-dollar purses in golf tournaments. Golf tournaments consist of four rounds of 18 holes of golf played over four days. So, suppose you are a golf pro like Phil Mickelson. You want to monitor whether your golf scores are getting better. How could you do this?

One method would be to track your average tournament score, i.e., the average of the four rounds. You would like to see this score get lower, since lower scores improve your ability to earn money in the pros. You would also probably be interested in your consistency, i.e., how close together the four rounds in a tournament are. You won’t want a lot of variation in your results, such as shooting a 72 one day and a 90 the next. So, you could also track the range of your golf scores for each tournament. The perfect chart to do this is the X̄-R control chart.

Like all control charts, the X̄-R control chart examines variation. To use an X̄-R control chart, you need to determine how to subgroup the data. You should never just take the existing data and put it into subgroups of 4 or 5 simply because you read that a subgroup size of 4 or 5 is used most of the time with the X̄-R control chart. With golf, it makes sense (one of the tenets of rational subgrouping) to subgroup the data by tournament. The four rounds of golf for a single tournament form a subgroup. This will allow us to explore the variation we are interested in: our average score and our consistency.
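To make the subgrouping decision concrete, here is a minimal Python sketch. It assumes the rounds are stored as one flat list of (tournament, round, score) records and uses the first three tournaments from Table 1; the function names are hypothetical. Grouping by tournament number keeps rounds played under the same conditions together, while chopping the same stream into fixed subgroups of 5 mixes rounds from different tournaments.

```python
from collections import defaultdict

# Each record is (tournament number, round number, score): the first three
# tournaments from Table 1, stored as one flat stream of rounds.
rounds = [
    (1, 1, 70), (1, 2, 76), (1, 3, 70), (1, 4, 73),
    (2, 1, 72), (2, 2, 66), (2, 3, 71), (2, 4, 73),
    (3, 1, 68), (3, 2, 67), (3, 3, 69), (3, 4, 71),
]

def subgroup_by_tournament(records):
    """Rational subgroups: all rounds from the same tournament go together."""
    groups = defaultdict(list)
    for tournament, _round_no, score in records:
        groups[tournament].append(score)
    return [groups[t] for t in sorted(groups)]

def chunk_blindly(records, size):
    """What NOT to do: cut the stream into fixed-size pieces, ignoring context."""
    scores = [score for _t, _r, score in records]
    return [scores[i:i + size] for i in range(0, len(scores), size)]

print(subgroup_by_tournament(rounds))
# [[70, 76, 70, 73], [72, 66, 71, 73], [68, 67, 69, 71]]
print(chunk_blindly(rounds, 5))
# [[70, 76, 70, 73, 72], [66, 71, 73, 68, 67], [69, 71]]
```

In the fixed-size version, the second subgroup combines the last three rounds of tournament 2 with the first two rounds of tournament 3, so its range no longer describes the consistency within any single tournament.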

 

Golf Data

Suppose the first tournament you play for the year has the following results for the rounds: 70, 76, 70, and 73.  Pretty nice four rounds of golf for you!  These four rounds form the first subgroup.  You can calculate the subgroup average (the average of the four rounds of golf):

Average = (70 + 76 + 70 + 73)/4 = 289/4 = 72.25

You can continue this for each tournament you play in and plot the results on an X̄ control chart. This chart shows you how much variation there is in your average tournament score from tournament to tournament – this is the variation between subgroups and is sometimes called the long-term variation.

You can also calculate the range of this first tournament of the year (the first subgroup).  The range is simply the highest score (76) minus the lowest score (70).  So, the range for the first subgroup is:

Range = Highest – Lowest  = 76 – 70  = 6

You are pretty consistent!  You can continue this for each tournament you are in and plot the results on the R chart.  This chart shows you how much variation there is in your scores within a tournament from tournament to tournament – this is the variation within a subgroup and is sometimes called the short-term variation.
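The two calculations above for the first subgroup, as a short Python sketch (subgroup_stats is a hypothetical helper name used only for illustration):

```python
def subgroup_stats(scores):
    """Return (average, range) for one subgroup of round scores."""
    average = sum(scores) / len(scores)
    spread = max(scores) - min(scores)  # range = highest - lowest
    return average, spread

# Tournament 1 (the first subgroup): four rounds.
avg, rng = subgroup_stats([70, 76, 70, 73])
print(avg, rng)  # 72.25 6
```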

Table 1 shows your scores for your last 18 tournaments, together with each tournament (subgroup) average and range. The overall average of the 18 tournament averages and the average range are shown at the bottom of the table.

Table 1: Tournament Scores

Tourn. No.  Round 1  Round 2  Round 3  Round 4  Average   Range
1           70       76       70       73       72.25     6
2           72       66       71       73       70.5      7
3           68       67       69       71       68.75     4
4           68       68       72       67       68.75     5
5           71       69       72       68       70        4
6           71       67       75       77       72.5      10
7           69       76       70       71       71.5      7
8           67       71       67       67       68        4
9           70       68       71       68       69.25     3
10          70       71       66       74       70.25     8
11          67       71       70       69       69.25     4
12          75       66       73       73       71.75     9
13          73       71       70       75       72.25     5
14          66       68       71       78       70.75     12
15          73       69       73       67       70.5      6
16          69       65       67       76       69.25     11
17          72       71       70       67       70        5
18          69       72       68       74       70.75     6
Sum                                             1266.25   116
Average                                         70.34722  6.444444
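The Sum and Average rows of Table 1 can be checked from the raw rounds with a short Python sketch (the names are illustrative). The average range is the sum of the ranges divided by the number of tournaments, 116 / 18.

```python
# Round scores for the 18 tournaments in Table 1 (one tuple per tournament).
TABLE_1 = [
    (70, 76, 70, 73), (72, 66, 71, 73), (68, 67, 69, 71), (68, 68, 72, 67),
    (71, 69, 72, 68), (71, 67, 75, 77), (69, 76, 70, 71), (67, 71, 67, 67),
    (70, 68, 71, 68), (70, 71, 66, 74), (67, 71, 70, 69), (75, 66, 73, 73),
    (73, 71, 70, 75), (66, 68, 71, 78), (73, 69, 73, 67), (69, 65, 67, 76),
    (72, 71, 70, 67), (69, 72, 68, 74),
]

averages = [sum(t) / len(t) for t in TABLE_1]  # subgroup averages
ranges = [max(t) - min(t) for t in TABLE_1]    # subgroup ranges

print("Sum of averages:", sum(averages))                           # 1266.25
print("Sum of ranges:", sum(ranges))                               # 116
print("Overall average: %.5f" % (sum(averages) / len(averages)))   # 70.34722
print("Average range: %.6f" % (sum(ranges) / len(ranges)))        # 6.444444
```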

 

One key to understanding X̄-R control charts is to understand that the two charts are monitoring different sources of variation. The X̄ control chart examines the variation in subgroup averages over time and will let you know whether these subgroup averages are consistent (in control – only common causes of variation present) or whether any subgroup averages fall outside the “normal” variation (out of control – a special cause of variation present).

The R chart examines the variation within the subgroups over time and will let you know whether this within-subgroup variation is consistent over time. If you are new to control charts, please review our newsletter on the purpose of control charts.

 

Golf and Control Charts

As mentioned before, the subgroup averages are plotted on the X̄ control chart and the subgroup ranges are plotted on the R chart. Figure 1 is the X̄ run chart for these 18 tournaments. Figure 2 is the R run chart for the tournaments. The overall average is plotted as the centerline on each chart, but the control limits have not yet been added.

Figure 1: X̄ Run Chart for Golf Data

Figure 2: R Run Chart for Golf Data


Whenever you look at a control chart, the first question you should ask yourself is

“What variation is this chart examining?”

If you can’t answer this question, then the control chart is nonsense – it will not tell you anything at all.  Throw it out and start over with a discussion of rational subgrouping.  Figure 1 is measuring the variation in tournament averages from tournament to tournament.  You can see this because you are plotting the tournament averages over time.  In control chart language:

“The X̄ control chart is monitoring the variation in the subgroup averages from subgroup to subgroup.”

Figure 2 is measuring the variation within the four rounds in a single tournament from tournament to tournament.  You are plotting this range (the maximum minus the minimum value) over time.  In control chart language:

“The R chart is monitoring the variation within the subgroup from subgroup to subgroup.”

The next step is to add the control limits. The control limits for the X̄ and R control charts are shown below.

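$$\mathrm{UCL}_{\bar{X}} = \bar{\bar{X}} + A_2\bar{R} \qquad \mathrm{LCL}_{\bar{X}} = \bar{\bar{X}} - A_2\bar{R}$$

$$\mathrm{UCL}_{R} = D_4\bar{R} \qquad \mathrm{LCL}_{R} = D_3\bar{R}$$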

where D4, D3, and A2 are control chart constants that depend on subgroup size (see our newsletter on X̄-R control charts). Our subgroup size in this case is 4, since there are four rounds per tournament; for n = 4, A2 = 0.729, D3 = 0, and D4 = 2.282. R̄ (R-bar) is the average range, the centerline of the range chart, and X̿ (X double bar) is the overall average, the centerline of the X̄ control chart.

One important item to note about the control limits: the average range is used to calculate the control limits for the X̄ control chart. This means that the “short-term” variation from the range chart is used to set the control limits for the “long-term” variation on the X̄ control chart. We will come back to this point next month.
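A minimal Python sketch of the formulas above, applied to the rounds in Table 1. The function name xbar_r_limits is hypothetical, and the constants are the standard tabulated values for n = 4. The average range enters the X̄ limits through the term A2 × R̄, which is the point made in the previous paragraph.

```python
# Round scores for the 18 tournaments in Table 1 (one tuple per tournament).
TABLE_1 = [
    (70, 76, 70, 73), (72, 66, 71, 73), (68, 67, 69, 71), (68, 68, 72, 67),
    (71, 69, 72, 68), (71, 67, 75, 77), (69, 76, 70, 71), (67, 71, 67, 67),
    (70, 68, 71, 68), (70, 71, 66, 74), (67, 71, 70, 69), (75, 66, 73, 73),
    (73, 71, 70, 75), (66, 68, 71, 78), (73, 69, 73, 67), (69, 65, 67, 76),
    (72, 71, 70, 67), (69, 72, 68, 74),
]

# Standard control chart constants for subgroups of size n = 4.
# (A2 = 3 / (d2 * sqrt(n)) with d2 = 2.059; D3 is 0 for n <= 6.)
A2, D3, D4 = 0.729, 0.0, 2.282

def xbar_r_limits(subgroups):
    """Return (LCL, centerline, UCL) for the X-bar chart and for the R chart."""
    averages = [sum(s) / len(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    x_dbar = sum(averages) / len(averages)  # overall average (X double bar)
    r_bar = sum(ranges) / len(ranges)       # average range (R bar)
    xbar_chart = (x_dbar - A2 * r_bar, x_dbar, x_dbar + A2 * r_bar)
    r_chart = (D3 * r_bar, r_bar, D4 * r_bar)
    return xbar_chart, r_chart

for label, data in [("all 18 tournaments", TABLE_1),
                    ("first 17 tournaments", TABLE_1[:17])]:
    xbar_chart, r_chart = xbar_r_limits(data)
    print(label)
    print("  X-bar chart: LCL %.2f  CL %.2f  UCL %.2f" % xbar_chart)
    print("  R chart:     LCL %.2f  CL %.2f  UCL %.2f" % r_chart)
```

With all 18 rows of Table 1, the sketch gives X̄ limits of about 65.65 and 75.05 (centerline 70.35) and an R chart UCL of about 14.71 (centerline 6.44). The values quoted in the text (65.61, 75.04, 70.32, 6.47 and 14.77) are reproduced exactly when only the first 17 tournaments are passed in, so the figures appear to have been built from those 17 subgroups. Either way, every tournament average and range in Table 1 falls inside the limits.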

Figure 3 is the X̄ control chart with the control limits added. Figure 4 is the R chart with the control limits added.

Figure 3: X̄ Control Chart for Golf Data

Figure 4: R Control Chart for Golf Data


 

Remember that the upper control limit (UCL) represents the largest value we would expect from the process if only common causes of variation are present. In this example, the UCL on the X̄ chart is 75.04. This means that the highest tournament average we would expect is about 75 as long as only common causes of variation are present. The lower control limit (LCL) represents the smallest value we would expect from the process if only common causes of variation are present. Here the LCL is 65.61, so the smallest tournament average we would expect is about 65.6 as long as only common causes of variation are present.

As long as there are no points beyond the control limits and no patterns, the X̄ control chart is in statistical control. This is the case in Figure 3. It means that the variation in the subgroup averages (tournament averages) is consistent over time: the process is consistent and predictable, so we can predict what will happen in the future. We don’t know what the next tournament average will be, but we do know that it will be between 65.61 and 75.04, with a long-term average of 70.32, as long as the process stays the same. The chart also tells us that our golf score is not getting lower over time.

Figure 4 is the range chart with the control limits added. Remember that this chart is monitoring the variation in the range of scores within a single tournament. The average range (6.47) is the centerline on the chart, and the upper control limit (14.77) is also plotted. In this example, there is no lower control limit, because D3 is 0 for a subgroup size of 4. Since there are no points beyond the control limits and no patterns, the R chart is in statistical control. This means that we can predict what will happen in the future. We don’t know the exact range of scores that will occur in the next tournament, but we do know that it will be between 0 and 14 (round scores are whole numbers, so 14 is the largest range below the UCL of 14.77), with a long-term average of 6.47, as long as the process stays the same.
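The “no points beyond the control limits” part of this check is easy to automate. A minimal sketch, assuming the limits quoted above and the Average and Range columns of Table 1; beyond_limits is a hypothetical helper, and it does not test for patterns such as runs.

```python
import math

# Subgroup averages and ranges, copied from the Average and Range columns of Table 1.
averages = [72.25, 70.5, 68.75, 68.75, 70, 72.5, 71.5, 68, 69.25,
            70.25, 69.25, 71.75, 72.25, 70.75, 70.5, 69.25, 70, 70.75]
ranges = [6, 7, 4, 5, 4, 10, 7, 4, 3, 8, 4, 9, 5, 12, 6, 11, 5, 6]

# Control limits as quoted in the text for Figures 3 and 4.
XBAR_LCL, XBAR_UCL = 65.61, 75.04
R_UCL = 14.77  # the R chart has no lower control limit for subgroups of four

def beyond_limits(values, lcl, ucl):
    """Return 1-based tournament numbers whose value falls outside the limits.
    Pass lcl=None when the chart has no lower control limit."""
    return [i for i, v in enumerate(values, start=1)
            if v > ucl or (lcl is not None and v < lcl)]

print(beyond_limits(averages, XBAR_LCL, XBAR_UCL))  # []
print(beyond_limits(ranges, None, R_UCL))           # []
# Whole-number scores give whole-number ranges, so the largest in-control range is:
print(math.floor(R_UCL))                            # 14
```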

Once a process is in statistical control, the only way to improve it is to fundamentally change the process. If we want our average score to decrease or our scores to be more consistent, we have to change the way we do things. Fundamental changes could include a different golf coach, a new swing, or new clubs.

Phil Mickelson

The tournament scores in Table 1 look like they were made by a pretty good golfer. They were. The data are the PGA Tour tournament scores for Phil Mickelson in 2010. He was pretty consistent in 2010. How has he done since then?

Table 2 shows his PGA Tour rounds for the tournaments in which he made the cut from 2010 through 2012. He did not miss many cuts. The data are from www.espn.com.

Phil (shown in the Associated Press photo to the right) turned pro in 1992. He is 42 years old now. He made $4.2 million on the tour in 2012. Not bad! The control charts based on the data in the table are shown below.

 

Table 2: Phil Mickelson Golf Scores

 

Date PGA Tournament (Tour)  Tourn. No. Round 1 Round 2 Round 3 Round 4
Jan 28 - 31  (2010) Farmers Insurance Open  1 70 76 70 73
Feb 4 - 7  Northern Trust Open  2 72 66 71 73
Feb 11 - 14  AT&T Pebble Beach National Pro-Am  3 68 67 69 71
Feb 25 - 28  Waste Management Phoenix Open  4 68 68 72 67
Mar 11 - 14  WGC-CA Championship  5 71 69 72 68
Mar 25 - 28  Arnold Palmer Invitational  6 71 67 75 77
Apr 1 - 4  Shell Houston Open  7 69 76 70 71
Apr 8 - 11  The Masters  8 67 71 67 67
Apr 29 - May 2  Quail Hollow Championship  9 70 68 71 68
May 6 - 9  THE PLAYERS Championship  10 70 71 66 74
Jun 3 - 6  the Memorial Tournament  11 67 71 70 69
Jun 17 - 20  U.S. Open Championship  12 75 66 73 73
Jul 15 - 18  The Open Championship  13 73 71 70 75
Aug 5 - 8  WGC-Bridgestone Invitational  14 66 68 71 78
Aug 12 - 15  PGA Championship  15 73 69 73 67
Sep 3 - 6  Deutsche Bank Championship  16 69 65 67 76
Sep 9 - 12  BMW Championship  17 72 71 70 67
Sep 23 - 26  THE TOUR Championship  18 69 72 68 74
Jan 27 - 30  (2011) Farmers Insurance Open  19 67 69 68 69
Feb 3 - 7  Waste Management Phoenix Open  20 67 65 71 71
Feb 10 - 13  AT&T Pebble Beach National Pro-Am  21 71 67 69 71
Feb 17 - 20  Northern Trust Open  22 71 70 74 68
Mar 10 - 13  World Golf Championships-Cadillac Championship  23 73 71 72 76
Mar 24 - 27  Arnold Palmer Invitational presented by Mastercard  24 70 75 69 73
Mar 31 - Apr 3  Shell Houston Open  25 70 70 63 65
Apr 7 - 10  The Masters  26 70 72 71 74
May 5 - 8  Wells Fargo Championship  27 69 66 74 69
May 12 - 15  THE PLAYERS Championship  28 71 71 69 72
Jun 2 - 5  The Memorial Tournament  29 72 70 72 67
Jun 16 - 19  U.S. Open Championship  30 74 69 77 71
Jul 14 - 17  The Open Championship  31 70 69 71 68
Aug 4 - 7  World Golf Championships-Bridgestone Invitational  32 67 73 71 72
Aug 11 - 14  PGA Championship  33 71 70 69 70
Aug 25 - 27  The Barclays  34 67 70 68 68
Sep 2 - 5  Deutsche Bank Championship  35 70 73 63 69
Sep 15 - 18  BMW Championship  36 72 73 71 75
Sep 22 - 25  TOUR Championship by Coca-Cola  37 68 70 67 71
1/19 - 1/22 (2012) Humana Challenge 38 74 69 66 69
2/2 - 2/5  Waste Management Phoenix Open 39 68 70 67 73
2/9 - 2/12  AT&T Pebble Beach National Pro-Am 40 70 65 70 64
2/16 - 2/19  Northern Trust Open 41 66 70 70 71
3/8 - 3/11  World Golf Championships-Cadillac Championship 42 72 71 71 71
3/22 - 3/25  Arnold Palmer Invitational 43 73 71 71 72
3/29 - 4/1  Shell Houston Open 44 65 70 70 71
4/5 - 4/8  The Masters Tournament 45 74 68 66 72
5/3 - 5/6  Wells Fargo Championship 46 71 72 68 71
5/10 - 5/13  THE PLAYERS Championship 47 71 71 70 73
5/17 - 5/20  HP Byron Nelson Championship 48 70 69 69 66
6/14 - 6/17  U.S. Open Golf Championship 49 76 71 71 78
8/2 - 8/5  World Golf Championships-Bridgestone Invitational 50 71 69 73 71
8/9 - 8/12  PGA Championship 51 73 71 73 74
8/23 - 8/26  The Barclays 52 68 74 67 76
8/31 - 9/3  Deutsche Bank Championship 53 68 68 68 66
9/6 - 9/9  BMW Championship 54 69 67 64 70
9/20 - 9/23  TOUR Championship by Coca-Cola 55 69 71 72 69
11/1 - 11/4  World Golf Championships-HSBC Champions 56 66 69 66 68

 

Figures 5 and 6 show the X̄ and R control charts with the control limits based on the 2010 data. Still a pretty consistent golfer! Note that in Figure 6 there is a run of 8 points below the average. Something happened to make his game more consistent from that point on. We should probably recalculate the control limits from that point on, but I will leave that to Phil. Or, if you would like to do it, e-mail me and I will send you the data in an Excel workbook.
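The run in Figure 6 can also be located mechanically. A minimal sketch, assuming the common run rule of eight or more consecutive points on one side of the centerline (only the “below” side is checked here) and the 2010 average range of 6.47 as the centerline; runs_below is a hypothetical helper. Because the ranges are whole numbers, any centerline between 6 and 7 flags the same points.

```python
# Round scores for all 56 tournaments in Table 2 (2010-2012), in date order.
TABLE_2 = [
    (70, 76, 70, 73), (72, 66, 71, 73), (68, 67, 69, 71), (68, 68, 72, 67),  # 1-4
    (71, 69, 72, 68), (71, 67, 75, 77), (69, 76, 70, 71), (67, 71, 67, 67),  # 5-8
    (70, 68, 71, 68), (70, 71, 66, 74), (67, 71, 70, 69), (75, 66, 73, 73),  # 9-12
    (73, 71, 70, 75), (66, 68, 71, 78), (73, 69, 73, 67), (69, 65, 67, 76),  # 13-16
    (72, 71, 70, 67), (69, 72, 68, 74), (67, 69, 68, 69), (67, 65, 71, 71),  # 17-20
    (71, 67, 69, 71), (71, 70, 74, 68), (73, 71, 72, 76), (70, 75, 69, 73),  # 21-24
    (70, 70, 63, 65), (70, 72, 71, 74), (69, 66, 74, 69), (71, 71, 69, 72),  # 25-28
    (72, 70, 72, 67), (74, 69, 77, 71), (70, 69, 71, 68), (67, 73, 71, 72),  # 29-32
    (71, 70, 69, 70), (67, 70, 68, 68), (70, 73, 63, 69), (72, 73, 71, 75),  # 33-36
    (68, 70, 67, 71), (74, 69, 66, 69), (68, 70, 67, 73), (70, 65, 70, 64),  # 37-40
    (66, 70, 70, 71), (72, 71, 71, 71), (73, 71, 71, 72), (65, 70, 70, 71),  # 41-44
    (74, 68, 66, 72), (71, 72, 68, 71), (71, 71, 70, 73), (70, 69, 69, 66),  # 45-48
    (76, 71, 71, 78), (71, 69, 73, 71), (73, 71, 73, 74), (68, 74, 67, 76),  # 49-52
    (68, 68, 68, 66), (69, 67, 64, 70), (69, 71, 72, 69), (66, 69, 66, 68),  # 53-56
]

R_CENTERLINE = 6.47  # average range from the 2010 data, as quoted in the text

def runs_below(values, centerline, min_length=8):
    """Return (first, last) 1-based positions of every run of at least
    min_length consecutive points strictly below the centerline."""
    runs, start = [], None
    for i, v in enumerate(values, start=1):
        if v < centerline:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that is still open at the last point.
    if start is not None and len(values) - start + 1 >= min_length:
        runs.append((start, len(values)))
    return runs

ranges = [max(t) - min(t) for t in TABLE_2]
print(runs_below(ranges, R_CENTERLINE))  # [(17, 24)]
```

On the Table 2 data this flags tournaments 17 through 24, from the September 2010 BMW Championship through the March 2011 Arnold Palmer Invitational.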

Figure 5: Phil Mickelson X̄ Control Chart

Figure 6: Phil Mickelson R Control Chart


 

Summary

Rational subgrouping provides the context you use to interpret a control chart. The first question you should ask yourself when examining a control chart is “What variation is this chart examining?” This is the key to using control charts effectively. Without effective rational subgrouping, control charts can be simply nonsense.

We used the X̄-R control chart in this newsletter. The X̄-R control chart is a powerful tool for examining sources of variation, but it is critical to set up the chart to explore the variation you are interested in. In this golf example, we were interested in monitoring our average tournament score as well as our consistency, and the chart we used was set up to examine that variation. Note that if we had plotted each individual tournament result, we would have been looking at different sources of variation. This is the principle behind rational subgrouping.

Next month we will take a more in-depth look at rational subgrouping including some rules to follow. 

 


Thanks so much for reading our publication. We hope you find it informative and useful. Happy charting and may the data always support your position.

Sincerely,

Dr. Bill McNeese
BPI Consulting, LLC
