Scatter Diagrams


February 2005

 


 

 

 

Greetings,

Welcome to the SPC for Excel e-zine. Each month you will receive information on a featured SPC topic and other items. We hope you enjoy this issue. Please let us know your ideas for topics to cover as well as any ideas you might have for improving the e-zine.

Suppose you are faced with a problem. You have followed the steps in the problem-solving model. You have defined the problem using Pareto diagrams and pinpointing. In addition, you have analyzed how the process is behaving through the use of process flow diagrams, histograms and control charts. The process is in control, but the results are not acceptable. There is too much variation in the process (or perhaps it is operating at the wrong level or average). You need to find out what is causing the process to behave as it does. You have constructed a cause and effect diagram that lists all the possible causes of the problem. How do you determine which causes are really responsible for the variation? For example, is reaction yield influenced more by run time or by pressure? One way to find out is to use a scatter diagram. The scatter diagram is introduced in this e-zine.

 

Introduction to Scatter Diagrams


A scatter diagram is used to show the relationship between two kinds of data. It could be the relationship between a cause and an effect, between one cause and another, or even between one cause and two others. To understand how scatter diagrams work, consider the following example.

Suppose you have been working on the process of getting to work within a certain time period. The control chart you constructed on the process shows that, on average, it takes you 25 minutes to get to work. The process is in control. You would like to decrease this average to 20 minutes. What causes in the process affect the time it takes you to get to work? There are many possible causes, including traffic, the speed you drive, the time you leave for work, weather conditions, etc. Suppose you have decided that the speed you drive is the most important cause. A scatter diagram can help you determine if this is true.

In this case, the scatter diagram shows the relationship between a "cause" and an "effect." The cause is the speed you drive, and the effect is the time it takes to get to work. You can examine this cause and effect relationship by varying the speed you drive and measuring the time it takes to get to work. For example, on one day you might drive 40 mph and measure the time it takes to get to work. The next day, you might drive 50 mph. After collecting enough data, you can plot the speed you drive versus the time it takes to get to work. Figure 1 is an example of a scatter diagram for this case. The cause (speed) is on the x-axis. The effect (time it takes to get to work) is on the y-axis. Each pair of values (speed, time) is plotted as one point on the scatter diagram.

 



 

Interpreting a Scatter Diagram

 


The next step is to determine if there is a relationship between speed and the time it takes to get to work. The figure below shows the general types of relationships that can exist. Figure A shows a strong positive correlation between x and y. This means that if x increases, then so will y. If x is the speed you drive and y the time it takes to get to work, a strong positive correlation would mean that the faster you drive (increasing x), the longer it takes to get to work (increasing y). Figure B shows a situation where a positive correlation may be present. This means if x increases, y will increase somewhat. However, there are probably other factors that are affecting y.

Figure C shows an example of no relationship or correlation between x and y. In other words, y is affected by causes other than x. For the driving to work example, this would mean that the speed at which you drive has no effect on the time it takes to get to work.

Figure D is an example of a possible negative relationship between x and y. Increasing x (the speed) decreases y (the time) somewhat, but there appear to be other causes that affect y. Figure E shows a strong negative relationship between x and y. This means that as x increases, y decreases. For example, the faster you drive, the more quickly you get to work.

Exact methods of determining whether a correlation exists between x and y are available in many software packages, including our software, SPC for Excel.
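One widely used exact measure is the Pearson correlation coefficient, r. It runs from -1 (strong negative relationship) through 0 (no linear relationship) to +1 (strong positive relationship). The sketch below is a minimal illustration in plain Python and is not meant to represent the calculation inside any particular software package. The function name `pearson_r` and the speed and time values are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the article): speed in mph, time in minutes
speeds = [35, 40, 45, 50, 55]
times = [29, 26, 24, 22, 21]
r = pearson_r(speeds, times)
print(f"r = {r:.3f}")
```

With these made-up numbers, r comes out close to -1, which corresponds to the strong negative pattern of Figure E. A value near 0 would correspond to Figure C.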

There is one item to be cautious about. A scatter plot can show a relationship between x and y even when x is not the cause of y. There may be some other factor that affects both x and y in the same manner. For example, a study during the 1930s showed a relationship between the number of babies born in one part of England and the stork population. As the stork population increased, so did the number of babies. Clearly, the increase in the stork population did not cause the increase in the number of babies born.

Correlations

 

 


 

Steps in Making a Scatter Diagram


A scatter diagram can be used in the "Determine Causes" step of the problem-solving model (see the May 2004 e-zine on the website). In this step, we are trying to determine why the process is behaving as it is.

The steps in constructing a scatter diagram are given below.

1. Gather the data.

a. Collect 25 to 100 paired samples of data (x and y values) for the two variables whose relationship you wish to investigate, and record the data. Fewer pairs can be used if necessary.

2. Plot the data.

a. Select the scales for the x and y axes.

b. Plot each paired value of sample data on the chart.

3. Determine if there is a relationship between x and y visually or by using a software package such as SPC for Excel.
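The steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, drawn as a text chart so that it needs no charting library. The function name `ascii_scatter`, the chart size, and the five data pairs are assumptions made for the example and do not come from the article.

```python
def ascii_scatter(pairs, width=40, height=12):
    """Plot (x, y) pairs on a character grid and return it as a list of rows."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    # Step 2a: choose the scales so every point fits on the chart
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    # Step 2b: plot each paired value
    for x, y in pairs:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the chart
    return ["".join(row) for row in grid]

# Step 1: gather the data (hypothetical pairs; a real study would use 25 to 100)
data = [(35, 29), (40, 26), (45, 24), (50, 22), (55, 21)]
chart = ascii_scatter(data)

# Step 3: look at the chart for a pattern
for line in chart:
    print("|" + line)
print("+" + "-" * 40)
```

In this made-up data set the points run from the upper left to the lower right, which is the visual signature of a negative relationship.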

 

Scatter Diagram Example


In a warehouse, pickers pick line items from a pick ticket. Is there a correlation between lines picked per day and overtime hours? The data for the last 22 days are given in the table. The scatter diagram is also given. Is there a correlation? If so, what type of correlation?

The scatter diagram was generated using the Excel add-in SPC for Excel. The equation in the title shows the relationship between lines picked per day and overtime. The equation is:

Y = 1.3 + (0.0392)X

 

where Y = overtime in hours and X = lines picked per day. You can use this equation to predict overtime based on the number of lines picked per day. For example, if the number of lines picked on a given day was 600, the overtime is predicted to be:

Y = 1.3 + (0.0392)X = 1.3 + (0.0392)(600) = 24.82 hours

 

The key number in the equation is 0.0392. This is the slope of the line. It means that when the number of line items picked per day increases by 1, the overtime is predicted to increase by 0.0392 hours.
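The prediction and the meaning of the slope can be checked with a short Python sketch. Lines of this kind are commonly obtained by least-squares regression; whether the add-in performs exactly this calculation is an assumption here. The function names `fit_line` and `predict_overtime` are hypothetical, and because the 22 days of data are not reproduced in the text, the fit is tested on made-up points that lie exactly on the article's line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must vary to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_overtime(lines_picked, intercept=1.3, slope=0.0392):
    """Predicted overtime hours from the article's equation Y = 1.3 + 0.0392 X."""
    return intercept + slope * lines_picked

print(round(predict_overtime(600), 2))  # 24.82, matching the worked example

# Hypothetical points lying exactly on the article's line, to check fit_line
xs = [400, 500, 600, 700]
ys = [predict_overtime(x) for x in xs]
intercept, slope = fit_line(xs, ys)
print(round(intercept, 4), round(slope, 4))
```

Calling `predict_overtime(601) - predict_overtime(600)` returns approximately 0.0392, which is the slope: one more line picked adds about 0.0392 hours of predicted overtime.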
