Scatter Plots

A scatter plot, also called a scattergram or scatter diagram, is used to investigate the possible association between two variables that both relate to the same event. It distributes a series of points or nodes, each representing a value in the data set, across a grid. Scatter plots are used to explore possible cause-and-effect relationships and to evaluate the nature and degree of association between two attributes.

Scatter plots share many of the characteristics of basic line graphs, and data can be plotted using variable scales on both axes. A scatter plot presents two groups of numbers as one series of XY coordinates, so you can use it to analyze correlations, clusters of points, patterns, and the influence of one variable upon another. Scatter plots can also be used to examine how the value of Y changes as a function of X, including changes over time if the x-axis is a time sequence.
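To make the idea of analyzing correlation concrete, the following Python sketch pairs two groups of numbers into XY coordinates and computes the Pearson correlation coefficient. It is an illustration only and is not part of Visual Discovery. The function name pearson_r and the sample values are assumptions made for this example.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample data: two groups of numbers describing the same events.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

# One series of XY coordinates, as a scatter plot would draw them.
points = list(zip(hours, scores))

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A value of r close to 1 or -1 suggests a strong linear association, and a value near 0 suggests little linear association. A strong correlation alone does not establish cause and effect.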

The following image shows an example of a scatter plot.

Scatter Plots example

With WebFOCUS Visual Discovery, you can easily visualize correlations between two items. Additionally, scatter plots can be enhanced by:

The scatter plot accepts continuous variables. It does not accept string variables. Unselected items are drawn as open circles and selected values are drawn as filled circles of either the default highlight color or the color used to code the data.


Inserting a Trend Line in a Scatter Plot

You can add a trend line to a data series in a scatter plot. Trend lines are commonly used to study problems of prediction (regression analysis) and to smooth fluctuations in data to show the pattern or trend more clearly. The trend option can show or hide trend lines for both the data and the selected subset.

The detail options on the trend menu are used for drawing non-trend type lines through the scatter plot data. Each of the options draws a line connecting dots in the direction of the x-axis. The difference in the options lies in how the line is drawn when dots with the same x-value and different y-values are encountered.
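The way the detail options differ can be pictured with a small Python sketch. It is not the product's implementation. The function name detail_line, the mode names, and the sample points are hypothetical. The order in which the "all" mode visits dots that share an x-value (ascending y) is also an assumption.

```python
from collections import defaultdict

def detail_line(points, mode):
    """Return the (x, y) vertices of a line drawn left to right through the points.

    mode is one of "min", "max", "avg", or "all" (hypothetical names).
    """
    # Group the y-values of all dots that share the same x-value.
    by_x = defaultdict(list)
    for x, y in points:
        by_x[x].append(y)

    line = []
    for x in sorted(by_x):  # move in the direction of the x-axis
        ys = by_x[x]
        if mode == "min":
            line.append((x, min(ys)))            # lowest dot at this x
        elif mode == "max":
            line.append((x, max(ys)))            # highest dot at this x
        elif mode == "avg":
            line.append((x, sum(ys) / len(ys)))  # may not coincide with any dot
        elif mode == "all":
            # Assumption: visit tied dots in ascending y order.
            line.extend((x, y) for y in sorted(ys))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return line

# Hypothetical data with three dots sharing x = 2.
points = [(1, 4), (2, 1), (2, 5), (3, 2), (2, 3)]
for mode in ("min", "max", "avg", "all"):
    print(mode, detail_line(points, mode))
```

The four modes produce different lines only at x = 2, where several dots share the same x-value.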



Procedure: How to Use a Trend Line
  1. In the Visual Discovery Properties dialog box, click the Scatter tab.
  2. In the Trend Line field, select:
    • Local to insert a trend line which curves to fit the data points.
    • Linear to insert a linear regression trend line that is drawn through all loaded points. The line approximates all the points in the overall scatter plot. This trend line uses the linear equation, y=mx+b.
    • None so no trend line is inserted. None is the default setting.
    • Min to draw a line to the dot that has the lowest y-value.
    • Max to draw a line to the dot that has the highest y-value.
    • Ave to draw a line to the average of the y-values (regardless of whether there is a dot at that value or not).
    • All to connect all of the dots.
  3. Use the slider bar in the smoothing section to adjust the graph quality.
  4. Click Apply, and then click OK.
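
As an illustration of the Linear option, the following Python sketch fits y=mx+b to a set of points by ordinary least squares. The documentation does not state which fitting method the product uses, so least squares is an assumption here. The function name fit_line and the sample points are hypothetical.

```python
def fit_line(points):
    """Fit y = m*x + b to (x, y) points by ordinary least squares; return (m, b)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least 2 points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Spread of x and co-variation of x and y around their means.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x-values are identical; the slope is undefined")
    m = sxy / sxx            # slope
    b = mean_y - m * mean_x  # intercept
    return m, b

# Hypothetical points that lie exactly on y = 2x + 1.
points = [(1, 3), (2, 5), (3, 7), (4, 9)]
m, b = fit_line(points)
print(f"y = {m:.2f}x + {b:.2f}")
```

For data that does not fall exactly on a line, the same calculation returns the slope and intercept that minimize the sum of squared vertical distances between the points and the line.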


Reference: Scatter Tab

The following image shows the Scatter tab in the Visual Discovery Properties dialog box.

Scatter Tab

Show Unselected

Shows unselected data in gray. If the Show Unselected check box is not selected, unselected data is omitted from the display.

To add unselected data, either select the Show Unselected check box in the Properties dialog box, or use the right mouse button menu in the view and click Select All.

Highlighted Point Size

Adjusts the size of the selected nodes, thereby enabling them to stand out from the unselected nodes.

Trend Line
None

Turns off the trend line option.

Linear

Creates a trend line using the linear equation, y=mx+b.

Local

Creates a trend line that curves to fit the data points.

For complete details, see Inserting a Trend Line in a Scatter Plot.

Detail Data Points
Min

Draws a trend line through the minimum values in the data set.

Max

Draws a trend line through the maximum values in the data set.

Avg

Draws a trend line through the average of the points.

All

Draws a trend line through every item in the data set.

Glyph Size

Specifies the name of the data field that controls the size of points. It is best to size by a numeric field.

Glyph Style

Specifies the name of the data field that controls the shape that is drawn for a point. It is best to style by a categorical field (string or integer). A fixed set of shapes is automatically assigned to data. These include a filled or open circle, filled or open square, filled or open diamond, and filled or open triangle.

Jitter (Immediate effect)

When points in your data set have exactly the same coordinates, overplotting can cause them to appear as a single plotted point, thus obscuring the display. You can jitter points to put more space between them.

X

Jitters the x-axis points.

Y

Jitters the y-axis points.

Both

Jitters both the x-axis and y-axis points.

None

Turns off the jitter option.
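
The effect of the jitter options can be sketched in Python as adding a small random offset to each point. This is an illustration only. The function name jitter, the amount parameter, and the uniform random offset are assumptions, because the documentation does not describe how the product computes the offset.

```python
import random

def jitter(points, axis="both", amount=0.1, seed=None):
    """Return a copy of points with small random offsets on the chosen axis.

    axis is "x", "y", "both", or "none" (mirroring the options above).
    amount is the maximum offset in data units (hypothetical parameter).
    """
    if axis not in ("x", "y", "both", "none"):
        raise ValueError(f"unknown axis: {axis!r}")
    rng = random.Random(seed)  # a seed makes the result repeatable
    jittered = []
    for x, y in points:
        dx = rng.uniform(-amount, amount) if axis in ("x", "both") else 0.0
        dy = rng.uniform(-amount, amount) if axis in ("y", "both") else 0.0
        jittered.append((x + dx, y + dy))
    return jittered

# Three points with exactly the same coordinates overplot as a single dot.
overlapping = [(1, 1), (1, 1), (1, 1)]
for point in jitter(overlapping, axis="both", seed=0):
    print(point)
```

Jitter changes only where points are drawn. In this sketch the original list is left unchanged, so the underlying values remain available for analysis.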

Point size

Adjusts the size of glyphs. To make the glyphs larger, move the slider to the right. To make the glyphs smaller, move the slider to the left.

Transform Axis
Linear

Plots all data on the axis with their exact values. You can choose to have exact values plotted for data on the x-axis, y-axis, or both axes.

This is the default data presentation.

Root

Displays data along the axis as the square root of its current value. For example, if the current value of a piece of data is 9 and you select Root, the data displays at 3. This is useful when looking at highly skewed distributions, because it compresses large values more than small ones, making the items appear closer together.

You can choose the Root option for data plotted on the x-axis, y-axis, or both axes.

Log

Displays data along the axis as the logarithm of its current value. For example, if the current value of a piece of data is 100 and you select Log, the data displays at 2 (the log of 100 to the base 10 is 2). When the values of an item span a large range (for example, from one to one million), Log is useful because it maintains the relative position of the items while reducing the extremes.

You can use the Log option for data plotted on the x-axis, y-axis, or both axes.
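
The arithmetic behind the Linear, Root, and Log options can be checked with a short Python sketch. The function name transform and the mode names are hypothetical. How the product treats zero or negative values is not documented here, so the sketch simply rejects them.

```python
import math

def transform(value, mode="linear"):
    """Return the axis position of a value under the given transform."""
    if mode == "linear":
        return float(value)       # plotted at its exact value
    if mode == "root":
        if value < 0:
            raise ValueError("square root is undefined for negative values")
        return math.sqrt(value)
    if mode == "log":
        if value <= 0:
            raise ValueError("logarithm is undefined for zero or negative values")
        return math.log10(value)  # base-10 logarithm
    raise ValueError(f"unknown mode: {mode!r}")

print(transform(9, "root"))   # 3.0
print(transform(100, "log"))  # 2.0

# Values from one to one million keep their order but land much closer together.
for value in (1, 1_000, 1_000_000):
    print(value, transform(value, "root"), transform(value, "log"))
```

Both transforms preserve the order of the values. Log compresses large values more strongly than Root.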

Zooming

Links the x-axis and y-axis when zooming.



Reference: Scatter Plot Pop-up Menu

The pop-up menu appears when you right-click a Visual Discovery control in the analytical dashboard. These options do not appear when you are developing in HTML Composer.

Show unselected

Shows unselected data in gray. If the Show Unselected check box is not selected, unselected data is omitted from the display.

To add unselected data, either select the Show Unselected check box in the Properties dialog box, or use the right mouse button menu in the view and click Select All.

Highlighted size

Adjusts the size of selected nodes.

Trend

Adds a trend line. For more information, see Inserting a Trend Line in a Scatter Plot.

Linear

Changes the scale to linear. This is the default.

Root

Displays data along the axis as the square root of its current value. For example, if the current value of a piece of data is 9 and you select Root, the data displays at 3. This is useful when looking at highly skewed distributions, because it compresses large values more than small ones, making the items appear closer together. You can choose the Root option for data plotted on the x-axis, y-axis, or both axes.

Log

Displays data along the axis as the logarithm of its current value. For example, if the current value of a piece of data is 100 and you select Log, the data displays at 2 (the log of 100 to the base 10 is 2). When the values of an item span a large range (for example, from one to one million), Log is useful because it maintains the relative position of the items while reducing the extremes.

You can use the Log option for data plotted on the x-axis, y-axis, or both axes.

Jitter

When points in your data set have exactly the same coordinates, overplotting can cause them to appear as a single plotted point, thus obscuring the display. You can jitter points to put more space between them.

Unjitter

Turns the jitter option off.

Larger/Smaller Points

Increases or decreases the size of the selected glyphs.

Undo

Reverses the previous action. You can undo the actions retained in the history file for your current session one at a time by selecting Undo repeatedly. A description of the previous action appears on the pop-up menu. If you have performed no action, Undo is not available for selection and no action appears to the right of the word Undo.

Redo

Reapplies the most recently undone action. If you have not undone an action, Redo is not available for selection and no action appears to the right of the word Redo.

Select All

Selects all of the items in the graph. When you choose Select All, any previous selections are ignored. Selection state returns to the original setting.

Unselect All

When selected, all of the items become unselected. All items appear in the unselected color (gray, by default) or are hidden in the graph (if hide unselected is active).

Toggle All

Reverses the selection state of items. Selected items become unselected and unselected items become selected.

Exclude Unselected

Excludes (temporarily removes) unselected items from the graph.

Restore Excluded

Restores the items you excluded. If you accidentally excluded the unselected items, this menu option restores them.

Save Image

Saves the graph to a GIF or JPEG file.

Copy Image

Copies the selected component and pastes it to another file.

Properties

Takes you to the collection of tabs available for the respective visualization component. Common tabs include Data, Selecting, and Colors.

