Evaluating the Model

The information in this section can be used to evaluate the model.


Exporting Rpart Rules

The Rpart Rule Export functionality allows you to create and store unique versions of scoring data for a selected database on a local drive or directory. The basic functionality involves choosing a database, creating a model, and then scoring the data based on the values generated and stored for that modeling scenario.

Note:

• From the Model tab, you can model the scenario, clicking Execute after you make a selection. You can then export the rules. For more information, see How to Execute the Rpart Rule Export.
• From the Evaluate tab, if the Score option is selected, RStat appends the rules after the data set within the resulting .csv file. The data specific to this functionality is identified with a column heading of Rpart, allowing you to locate the newly generated Rpart data easily. This is illustrated in the following image.

RStat has a unique method for naming the files associated with the Rpart Rule Export functionality. The file name for the output of the scoring routine is derived from the original database file name. RStat appends _train_score_all to the file name. For example, if the originating database is database.csv, the resulting filename is:

database_train_score_all.csv

Alternatively, you can provide your own file name or append information to the default file name assigned. For example, you may run the same scenario on different dates. You might use a date convention (for example, _mmddyyyy) to archive your files. In general, this file naming convention ensures that the original name of the database is preserved while marking the new output file as one that contains scoring data.
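
The naming convention described above can be sketched in a few lines of Python. This is only an illustration, not RStat's own code: the function name score_file_name and its run_date parameter are hypothetical, and the sketch assumes that the optional date suffix is appended after _train_score_all.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def score_file_name(database_file: str, run_date: Optional[date] = None) -> str:
    """Build the scoring output name from the original database file name."""
    source = Path(database_file)
    # Keep the original base name and mark the file as scoring output.
    name = f"{source.stem}_train_score_all"
    if run_date is not None:
        # Optional archive suffix in the _mmddyyyy style (placement is an assumption).
        name += run_date.strftime("_%m%d%Y")
    return name + ".csv"


print(score_file_name("database.csv"))
# database_train_score_all.csv
print(score_file_name("database.csv", date(2024, 3, 15)))
# database_train_score_all_03152024.csv
```

Because the original base name is kept at the front, archived files from different dates still sort together with the database they were scored from.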

You can store these files locally under the default generated name (or a name that you specify) for future analysis. For example, you can use WebFOCUS to create reports and graphs, or to perform other BI-related tasks. Specifically, you can use the following resources:

Note: After scoring and saving the resulting data, RStat displays the path and file name of the file that you saved when the application returns to the Evaluate tab, as shown in the following image.



Procedure: How to Execute the Rpart Rule Export
  1. Open RStat.
  2. Click the folder adjacent to the Filename field and select a database file.

    Note: The Rpart Rule Export does not support a sampling or testing data set. In order to execute Rpart rules, you must clear the Sample check box on the Data tab. This converts the data into training data from which rules can be extracted.

  3. Click Open to confirm the database selection and return to the RStat interface.
  4. Click Execute.
  5. Click the Model tab and for Type, select Tree.
  6. Click Execute.
  7. Click the Evaluate tab and for Type, click Score.

    Note: Clear the check boxes for the Neural Net and SVM models, if necessary.

    1. For Data, select Training.
    2. For Include, select All.
    3. Click Execute to score the Rpart data and save it to a local file, as shown in the following image.

    Note:

    • On the Evaluate tab, the Execute and Export buttons produce the same results.
    • You can save the output file to the default directory or, optionally, create a new folder for archiving purposes. You can also rename the file as required.


Evaluating the Decision Tree

Evaluation techniques in RStat allow you to investigate how well your model will make predictions. The available evaluation techniques are determined by the type of model you have generated.

  1. Select the Evaluate tab.

    Evaluate tab

    Notice that the Model Type is defined as Tree and a variety of evaluation techniques are presented.

  2. Select the evaluation data.

    Evaluation model types

You can use the following data sets to evaluate the current model.



Error Matrix

An error matrix shows the relationship between the actual data and the predicted values.

With Error Matrix selected, click Execute.

Error Matrix window

Two error matrices are displayed. The first matrix shows the count of cases and the second shows the percentage of cases.
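
As a rough illustration of what the two matrices contain, the following Python sketch builds a count matrix and a percentage matrix from lists of actual and predicted classes. The data is hypothetical, the function name error_matrices is an assumption made for this example, and the percentages are assumed to be taken over all cases, which may differ from how RStat computes them.

```python
from collections import Counter


def error_matrices(actual, predicted):
    """Return (counts, percentages), each keyed by (actual, predicted) pairs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    counts = Counter(zip(actual, predicted))
    total = len(actual)
    labels = sorted(set(actual) | set(predicted))
    # Fill every cell so that combinations that never occur show up as 0.
    count_matrix = {(a, p): counts.get((a, p), 0) for a in labels for p in labels}
    # Assumption: each percentage is the share of all cases, not of a row.
    percent_matrix = {cell: 100.0 * n / total for cell, n in count_matrix.items()}
    return count_matrix, percent_matrix


# Hypothetical actual and predicted classes, for illustration only.
actual = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 1, 0, 1, 0]

count_matrix, percent_matrix = error_matrices(actual, predicted)
for cell in sorted(count_matrix):
    print(cell, count_matrix[cell], f"{percent_matrix[cell]:.1f}%")
```

Cells where the actual and predicted classes match are correct predictions. The remaining cells are the errors.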

Looking at the second matrix, you can see that the model predicts the following:



Scoring New Data

In RStat, you can score new data to see how well your model predicts. The Score data option will create a new CSV file with the scored values.

  1. Select Score as the Evaluation type.

    Score Evaluation type

    New Score options appear at the bottom of the tab panel.

    Report Options

    Report options are available only for Binary Trees and Logistic Regressions, where your target is binary (two unique values). For other models, the Report options will be grayed out.

    The Report options define the type of score to be returned.

    • Class. A categorical value derived from the 0-to-1 probability scale, where values from 0 through .5 are returned as 0 and values from .5 through 1 are returned as 1.
    • Probability. A numeric value between 0 and 1 representing the likelihood that the result will be a higher value. For character-based targets, the higher value is determined alphabetically. For example, if your target is Gender with Male and Female as the values, the probability will return the likelihood that the outcome will be Male.

    Include Options

    Include options allow you to define which fields should be included in the scored file.

    • Identifiers. Includes the identifier, the target, and the score value.
    • All. Includes all variables in the data set plus the score value.
  2. Once you have defined the Scoring options, click Execute in the RStat toolbar.

    The Score Files dialog box opens.

    Score Files dialog box

  3. Define the file name and location where the scored data will be saved.

    Note: The file name that you define will be the exact name used, so be sure that the file name contains a .csv extension.
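
The relationship between the Probability and Class report options described in step 1 can be sketched as a simple threshold. This is a minimal illustration, not RStat's implementation: the function to_class is hypothetical, and the treatment of a value exactly at .5 is an assumption, since the description above does not say which class it falls into.

```python
def to_class(probability: float, threshold: float = 0.5) -> int:
    """Map a probability between 0 and 1 to class 0 or class 1."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumption: a value exactly at the threshold is assigned to class 1.
    return 1 if probability >= threshold else 0


# Hypothetical probability scores, for illustration only.
scores = [0.10, 0.49, 0.50, 0.93]
classes = [to_class(p) for p in scores]
print(classes)
# [0, 0, 1, 1]
```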

In the example below, the Include option is set to All instead of Identifiers. The output file contains all fields (variables) plus the scored value (column name=rpart) and the rules column (column name=RpartRules).

Note: Each data line contains its rule details in the rules column. Check the column name and verify that no data line is missing its rules, as shown in the following image.
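
The check described in this note can also be automated. The Python sketch below lists the data lines whose rules cell is empty. Only the rpart and RpartRules column names come from the text above. The function name rows_missing_rules, the other columns, and the rule strings in the sample are hypothetical.

```python
import csv
import io


def rows_missing_rules(csv_text: str, rules_column: str = "RpartRules"):
    """Return the 1-based data line numbers whose rules cell is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if rules_column not in (reader.fieldnames or []):
        raise KeyError(f"column {rules_column!r} not found")
    return [
        line_no
        for line_no, row in enumerate(reader, start=1)
        # A short row yields None for the missing cell, so treat it as empty.
        if not (row[rules_column] or "").strip()
    ]


# Hypothetical scored output, for illustration only.
sample = (
    "id,age,rpart,RpartRules\n"
    "1,34,0,age< 40\n"
    "2,57,1,\n"
    "3,45,1,age>=40\n"
)
print(rows_missing_rules(sample))
# [2]
```

To check a saved score file, read its contents first (for example, with open(path, newline="").read()) and pass the text to the function.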

