Missing Data in Scoring Routines


Missing data refers to fields that have no data value in the current observation or record. By default, a missing or inapplicable value is represented by a dot (.).

Some modeling algorithms cannot generate a score if any input parameter is missing, and return a missing score instead. Other modeling algorithms can generate a score even when input parameter values are missing.

Regression and clustering techniques will return a missing value for the score if any of the input parameters are missing.

Decision tree techniques will return a score even if there are missing input parameter values. If there is a missing value, the record is assigned to the majority class of the node in which the missing value occurs.


Recognizing Missing Input Parameters

For a scoring routine to recognize missing input parameter values and for the algorithm to derive the score appropriately, you must set the MISSINGTEST parameter to SPECIAL, either in the procedure (fex) or in the report request, and add the MISSING attribute to the individual calculated field.

  1. Set the MISSINGTEST parameter to SPECIAL in one of the following ways:

    Add the following SET command to the procedure (fex):

    SET MISSINGTEST = SPECIAL

    Or, add the following ON TABLE SET command to the report request:

    ON TABLE SET MISSINGTEST SPECIAL
  2. Add the MISSING attribute to the individual calculated field (MISSING ON). For example:
    COMPUTE PREDICTION/D20.8CM MISSING ON = 
    W_REG_LINEAR(WRAIN, DEGREES_IN_C, HRAIN,
    TIME_SINCE_VINTAGE, CHATEAU,PREDICTION);


Example: Calling an RStat Scoring Routine Function

The following report request includes WebFOCUS syntax that calls the w_reg_linear function, a scoring routine built from a linear regression model. It is designed to account for the possibility that some of the input values may be missing.

Note: Scoring routine functions cannot be embedded in other formulas or expressions. The expression on the right side of the command must consist only of the function call.

SET MISSINGTEST=SPECIAL
FILEDEF W_REG_LINEAR DISK C:\IBI\APPS\_rstat\w_reg_linear.CSV
TABLE FILE W_REG_LINEAR
PRINT
  ID
  CHATEAU AS 'Chateau'
  WRAIN/D6.0 AS 'Winter,Rain,(inches)'
  DEGREES_IN_C/D6.0 AS 'AvgTemp,(Celsius)'
  HRAIN/D6.0 AS 'Harvest,Rain,(inches)'
  TIME_SINCE_VINTAGE/I5 AS 'Years,Since,Vintage'
COMPUTE PREDICTION/D20.8CM MISSING ON = 
W_REG_LINEAR(WRAIN, DEGREES_IN_C, HRAIN,
TIME_SINCE_VINTAGE, CHATEAU,PREDICTION);
HEADING
"Regression Linear"
ON TABLE SET PAGE-NUM OFF
ON TABLE NOTOTAL
ON TABLE PCHOLD AS W_REG_LINEAR.PDF FORMAT PDF
ON TABLE SET STYLE *
  UNITS=IN,
  PAGESIZE='Letter',
  SQUEEZE=ON,
  ORIENTATION=LANDSCAPE,
$
TYPE=REPORT,
  FONT='TREBUCHET MS',
  SIZE=9,
  COLOR=RGB(66 70 73),
.
.
.
ENDSTYLE
END
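
Because the scoring function must stand alone on the right side of its COMPUTE, a value derived from the score needs its own calculated field. The fragment below is a minimal sketch of how this might be written inside the PRINT request above. The field name ADJ_PREDICTION and the factor 1.1 are hypothetical, and the assumption that a later COMPUTE can reference PREDICTION in this way is not confirmed by this section.

COMPUTE PREDICTION/D20.8CM MISSING ON =
W_REG_LINEAR(WRAIN, DEGREES_IN_C, HRAIN,
TIME_SINCE_VINTAGE, CHATEAU, PREDICTION);
-* Hypothetical derived field: uses the score, does not embed the function call
COMPUTE ADJ_PREDICTION/D20.8CM MISSING ON = PREDICTION * 1.1;

With MISSING ON specified on the derived field as well, ADJ_PREDICTION is expected to be missing whenever PREDICTION is missing, rather than being calculated from a default value.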

The partial output of the report request is shown in the image below. By default, a missing value is represented by a dot (.) in the report output. You can change this character with the SET NODATA command. For more information on changing the missing data character, see the Handling Records With Missing Field Values chapter in the Creating Reports With WebFOCUS Language manual.

Image: Linear Regression graph

For additional information and syntax on handling records with missing data in a report request, see the Handling Records With Missing Field Values chapter in the Creating Reports With WebFOCUS Language manual.
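
As an illustration of the SET NODATA command mentioned above, the following sketch displays the string NONE in place of the default dot. NONE is only an example value chosen here, not one prescribed by this section. The two forms follow the same SET and ON TABLE SET pattern shown earlier for MISSINGTEST.

In the procedure (fex):

SET NODATA = NONE

Or, in the report request:

ON TABLE SET NODATA NONE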

