You are analyzing accident patterns on a highway. You select two variables, vehicle speed and the number of accidents, and draw a scatter diagram. Once it is complete, you notice that as the speed of vehicles increases, the number of accidents goes up.
This shows the correlation between the two. In most cases, the independent variable is plotted along the horizontal x-axis, and the dependent variable is plotted on the vertical y-axis. The independent variable is also called the control parameter because it influences the behavior of the dependent variable.
It is not necessary to have a controlling parameter to draw a scatter diagram. There can also be two independent variables. In that case, you can use any axis for any variable. Many professionals think that a scatter diagram is like a fishbone diagram because the latter has two parameters: cause and effect.
Please note that these two diagrams are different. The fishbone diagram shows you the effect of a cause but does not show the relationship.
The scatter diagram helps you analyze the correlation between the two variables. However, the fishbone or Ishikawa diagram can help you draw a scatter diagram; for example, you can use it to identify two variables, a cause and an effect, and then use the scatter diagram to analyze their relationship. You can classify scatter diagrams in many ways; I will discuss the two most popular: by strength of correlation and by slope of the trend.
These are the most common in project management. According to the strength of the correlation, a diagram can show no correlation, a moderate correlation, or a strong correlation. With a moderate correlation, the data points are a little closer together and you can see that some kind of relationship exists between the variables. With a strong correlation, the data points are close to each other and you can draw a line by following their pattern.
As discussed earlier, you can also categorize the scatter diagram according to the slope, or trend, of the data points. A strong positive correlation means a visible upward trend from left to right; a strong negative correlation means a visible downward trend from left to right.
A weak correlation means the trend is less clear. A flat line, from left to right, is the weakest correlation, as it is neither positive nor negative. A scatter diagram with no correlation shows that the independent variable does not affect the dependent variable.
In a positive slant, the correlation is positive, i.e., as the value of X increases, the value of Y also increases. You can say that the slope of a straight line drawn along the data points will go up. When the correlation is strong, the pattern resembles a straight line. When it is weak, the value of Y still increases as the value of X increases, but the pattern does not resemble a straight line.
In a negative slant, the correlation is negative, i.e., as the value of X increases, the value of Y decreases. The slope of a straight line drawn along the data points will go down.
With no slant, the diagram might just be a series of points with no visible trend, or it might be a straight, flat row of points. In either case, the independent variable has no effect on the second variable; it is not dependent.
Scatter diagrams are useful in determining the relationship between two variables. This relationship can be between two causes, or a cause and an effect. It can be positive, negative, or have no correlation at all. The first variable is independent, and the second variable depends on the first. To analyze the pattern of the relationship, you change the independent variable and monitor the changes in the dependent one.
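As a rough illustration of the positive, negative, and flat cases described above, the following Python sketch computes the slope of a least-squares line through the points and the Pearson correlation coefficient r. The function names, the speed and accident figures, and the 0.3 and 0.7 cut-offs are assumptions made for this example only; none of them come from the text, and different fields draw the line between weak and strong correlation in different places.

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson r) for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # Strictly, r is undefined for a flat row of points (syy == 0).
    # As a convention for this sketch, report it as 0 (no correlation).
    r = 0.0 if syy == 0 else sxy / sqrt(sxx * syy)
    return slope, r


def describe(r, weak=0.3, strong=0.7):
    """Label r using illustrative (assumed) thresholds."""
    if abs(r) < weak:
        return "no clear correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


if __name__ == "__main__":
    # Hypothetical data: vehicle speed (km/h) and accidents per month.
    speeds = [50, 60, 70, 80, 90]
    accidents = [2, 3, 5, 6, 9]
    slope, r = slope_and_r(speeds, accidents)
    print(f"slope={slope:.3f}, r={r:.3f}, {describe(r)}")
```

A positive slope corresponds to the upward slant and a negative slope to the downward slant; r adds a measure of how closely the points follow a straight line.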
A scatter diagram can have two independent variables. A scatter diagram is an important concept from a PMP exam point of view.
What is a Scatter Diagram?
Quality Glossary Definition: Scatter diagram
Also called: scatter plot, X-Y graph
The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them.
When to Use a Scatter Diagram
- When you have paired numerical data
- When your dependent variable may have multiple values for each value of your independent variable
- When trying to determine whether the two variables are related, such as:
  - When trying to identify potential root causes of problems
  - After brainstorming causes and effects using a fishbone diagram, to determine objectively whether a particular cause and effect are related
  - When determining whether two effects that appear to be related both occur with the same cause
  - When testing for autocorrelation before constructing a control chart
Scatter Diagram Procedure
1. Collect pairs of data where a relationship is suspected.
2. Draw a graph with the independent variable on the horizontal axis and the dependent variable on the vertical axis. For each pair of data, put a dot or a symbol where the x-axis value intersects the y-axis value. If two dots fall together, put them side by side, touching, so that you can see both.
3. Look at the pattern of points to see if a relationship is obvious. If the data clearly form a line or a curve, you may stop because the variables are correlated. You may wish to use regression or correlation analysis now. Otherwise, complete steps 4 through 7.
4. Divide the points on the graph into four quadrants. If the number of points is odd, draw the line through the middle point.
5. Count the points in each quadrant. Do not count points on a line.
A relationship that is visible in a scatter plot does not by itself show that changes in one variable cause changes in the other. This gives rise to the common phrase in statistics that correlation does not imply causation. It is possible that the observed relationship is driven by some third variable that affects both of the plotted variables, that the causal link is reversed, or that the pattern is simply coincidental.
For example, it would be wrong to look at city statistics for the amount of green space and the number of crimes committed and conclude that one causes the other. This would ignore the fact that larger cities with more people tend to have more of both, so the two are simply correlated through that and other factors.
If a causal link needs to be established, then further analysis that controls or accounts for the effects of other potential variables needs to be performed, in order to rule out other possible explanations.
When a scatter plot is used to look at a predictive or correlational relationship between variables, it is common to add a trend line to the plot showing the mathematically best fit to the data. This can provide an additional signal as to how strong the relationship between the two variables is, and if there are any unusual points that are affecting the computation of the trend line. A common modification of the basic scatter plot is the addition of a third variable.
Values of the third variable can be encoded by modifying how the points are plotted. For a third variable that indicates categorical values (like geographical region or gender), the most common encoding is through point color. Giving each group a distinct hue makes it easy to show which group each point belongs to.
One other option that is sometimes seen for third-variable encoding is shape. One potential issue with shape is that different shapes can have different sizes and surface areas, which can affect how groups are perceived. However, in certain cases where color cannot be used (as in print), shape may be the best option for distinguishing between groups.
For third variables that have numeric values, a common encoding comes from changing the point size. A scatter plot with point size based on a third variable actually goes by a distinct name, the bubble chart. Larger points indicate higher values. A more detailed discussion of how bubble charts should be built can be read in its own article. Hue can also be used to depict numeric values as another alternative. Rather than using distinct colors for points like in the categorical case, we want to use a continuous sequence of colors, so that, for example, darker colors indicate higher value.
Note that, for both size and color, a legend is important for interpretation of the third variable, since our eyes are much less able to discern size and color than position. If you want to use a scatter plot to present insights, it can be good to highlight particular points of interest through the use of annotations and color.
Desaturating unimportant points makes the remaining points stand out, and provides a reference to compare the remaining points against. When the two variables in a scatter plot are geographical coordinates (latitude and longitude), we can overlay the points on a map to get a scatter map (also known as a dot map).
This can be convenient when the geographic context is useful for drawing particular insights and can be combined with other third-variable encodings like point size and color.
As noted above, a heatmap can be a good alternative to the scatter plot when there are a lot of data points that need to be plotted and their density causes overplotting issues. However, the heatmap can also be used in a similar fashion to show relationships between variables when one or both variables are not continuous and numeric.
If we try to depict discrete values with a scatter plot, all of the points of a single level will be in a straight line. Heatmaps can overcome this overplotting through their binning of values into boxes of counts. If the third variable we want to add to a scatter plot indicates timestamps, then one chart type we could choose is the connected scatter plot. Rather than modify the form of the points to indicate date, we use line segments to connect observations in order. This can make it easier to see not only how the two main variables relate to one another, but also how that relationship changes over time.
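The binning that a heatmap relies on can be sketched in a few lines of Python. This is only an illustration of the idea of counting points per box, not how any particular charting tool implements it; the function name, the fixed bin widths, and the sample points are assumptions for the example.

```python
from collections import Counter


def bin_counts(points, x_width, y_width):
    """Count how many (x, y) points fall into each rectangular bin.

    Bins are identified by integer (column, row) indices: a point lands in
    column floor(x / x_width) and row floor(y / y_width).
    """
    if x_width <= 0 or y_width <= 0:
        raise ValueError("bin widths must be positive")
    counts = Counter()
    for x, y in points:
        counts[(int(x // x_width), int(y // y_width))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical points; the repeated pair would overplot in a scatter plot.
    sample = [(0.2, 0.1), (0.8, 0.9), (1.5, 0.5), (1.5, 0.5), (2.0, 3.0)]
    for cell, count in sorted(bin_counts(sample, 1.0, 1.0).items()):
        print(cell, count)
```

Each count would then be mapped to a cell color, so points that would sit on top of each other in a scatter plot show up as a darker or lighter box instead.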