Calculating Trends and Predicting Values With Multivariate REGRESS

How to: - Create a Multivariate Linear Regression Column Using Report Painter
- Create a Multivariate Linear Regression Column Using WF Language
Reference: |

You can calculate trends and predict values with multivariate regression. This method derives a linear equation that best fits a set of numeric data points, and uses this equation to create a new column in the report output. The equation can be based on one to three independent variables.

This method estimates values by assuming that the dependent variable (y, the new calculated values) and the independent variables (x1, x2, x3) are related by the following linear equation:

y = a1*x1 [+ a2*x2 [+ a3*x3]] + b

When there is one independent variable, the equation represents a straight line. This produces the same values as FORECAST using the REGRESS method. When there are two independent variables, the equation represents a plane, and with three independent variables, it represents a hyperplane. You should use this technique when you have reason to believe that the dependent variable can be approximated by a linear combination of the independent variables.

REGRESS uses a technique called Ordinary Least Squares to calculate values for the coefficients (a1, a2, a3, and b) that minimize the sum of the squared differences between the data and the resulting line, plane, or hyperplane.

Procedure: How to Create a Multivariate Linear Regression Column Using Report Painter

- In the Report Painter main window, select a numeric BY field.
- Click the Forecast button.
- From the Step 1: Choose a Method drop-down list, select the Multivariate Regression option.
- From the Step 2: Choose the First Independent Variable drop-down list, select a variable. You have the option to choose additional independent variables in Step 3 and Step 4.
- From the Step 5: Choose the Dependent Variable drop-down list, select a dependent variable.
- Click OK.

Syntax: How to Create a Multivariate Linear Regression Column Using WF Language

ON {sortfield} RECAP y[/fmt] = REGRESS(n, x1, [x2, [x3,]] z);

where:

- sortfield
Is a field in the data source. It cannot be the same field as any of the parameters to REGRESS. A new linear regression equation is derived each time the sort field value changes.

- y
Is the new numeric column calculated by applying the regression equation. You cannot DEFINE or COMPUTE a field with this name.

- fmt
Is the display format for y. If it is omitted, the default format is D12.2.

- n
Is a whole number from 1 to 3 indicating the number of independent variables.

- x1, x2, x3
Are the field names to be used as the independent variables. All of these variables must be numeric and be independent of each other.

- z
Is an existing numeric field that is assumed to be approximately linearly dependent on the independent variables and is used to derive the regression equation.

Reference: Usage Notes for REGRESS

- The (By) sort field used with REGRESS must be in a numeric or date format.
- REGRESS cannot operate on an ACROSS field.
- If any of the independent variables are also sort fields, they cannot be referenced in the request prior to the REGRESS sort field.
- FORECAST and REGRESS cannot be used in the same request, and only one REGRESS is supported in a request. Non-REGRESS RECAP commands are supported.
- The RECAP command used with REGRESS can contain only the REGRESS syntax. REGRESS does not recognize any syntax after the closing semicolon (;).
- Although you pass parameters to REGRESS using an argument list in parentheses, REGRESS is not a function. It can coexist with a user-written subroutine of the same name, as long as the user-written subroutine is not specified in a RECAP command.
- BY TOTAL is not supported.
- MORE, MATCH, FOR, and OVER are not supported.
- The process of generating the REGRESS values creates extra columns that are not printed in the report output. The number and placement of these additional columns varies depending on the specific request. Therefore, use of column notation is not supported in a request that includes REGRESS.
- SUMMARIZE and RECOMPUTE are not supported for the same sort field used for REGRESS.
- REGRESS is not supported for the FOCUS GRAPH facility.
- The left side of a RECAP command used for REGRESS supports the CURR attribute for creating a currency-denominated field.
- Fields with missing values cannot be used in the regression.
- Larger amounts of data produce more useful results.

Example: Creating a Multivariate Linear Regression Column

The following example uses the GGSALES data source to calculate an estimated DOLLARS column. The BUDUNITS, UNITS, and BUDDOLLARS fields are the independent variables. The DOLLARS field provides the actual values to be estimated.

- Create a new procedure and open the ggsales.mas file in Report Painter.
- Click Report at
the top of the Report Painter window, then click Define from
the drop-down list.
The Define dialog box opens.

- Enter MONTH in the Field input area, type M (a date format) in the Format input area, then double-click DATE in the Fields List to add it to the area below PERIOD, then click New.
- Enter PERIOD
in the Field input area, type I2 (Integer format) in the Format
input area, then type MONTH in the area below PERIOD, then click OK.
The Define dialog box closes and you return back to Report Painter.

- Add the PERIOD, BUDUNITS, UNITS, BUDDOLLARS, and DOLLARS fields to the report.
- Select the PERIOD field, and click the By button.
- Click the Where
button from the Where/If drop-down menu.
The Expression Builder opens.

- Select CATEGORY from the Data section.
- Enter CATEGORY EQ 'Coffee' in the Advanced section, and click New.
- Select REGION from the Data section.
- Enter REGION EQ 'West' in the Expression field, and click New.
- Enter UNITS GT 1600 AND UNITS LT 1700 in the Expression field, and click OK.
- Click the Forecast button.
The Forecast window opens.

- In the Field Name field, enter EST_DOLLARS.
- From the Step 1: Choose a Method drop-down list, select Multivariate Regression.
- From the Step 2: Choose the First Independent Variable drop-down list, select BUDUNITS.
- From the Step 3: Choose Another Independent Variable drop-down list, select UNITS.
- From the Step 4: Choose Another Independent Variable drop-down list, select BUDDOLLARS.
- From the Step 5: Choose the Dependent Variable drop-down list, select DOLLARS.
- Click OK.

When the report is run, the multivariate regression values appear in the EST_DOLLARS column. The output is shown in the following image.

TABLE FILE GGSALES PRINT BUDUNITS UNITS BUDDOLLARS DOLLARS ON TABLE RECAP ESTDOLLARS/F8 = REGRESS(3,BUDUNITS, UNITS, BUDDOLLARS, DOLLARS); WHERE CATEGORY EQ 'Coffee' WHERE REGION EQ 'West' WHERE UNITS GT 1600 AND UNITS LT 1700 END

WebFOCUS |