The information in this section can be used to evaluate the model.
The Rpart Rule Export functionality allows you to create and store unique versions of scoring data for a selected database on a local drive or directory. The basic functionality involves choosing a database, creating a model, and then scoring the data based on the values generated and stored for that modeling scenario.
Note:
From the Model tab, make a selection and click Execute to model the scenario. You can then export the rules. For more information, see How to Execute the Rpart Rule Export.
From the Evaluate tab, if the Score option is selected, RStat appends the rules after the data set within the resulting .csv file. The data specific to this functionality is identified with a column heading of Rpart, allowing you to locate the newly generated Rpart data easily. This is illustrated in the following image.
RStat has a unique method for naming the files associated with the Rpart Rule Export functionality. The file name for the output of the scoring routine is derived from the original database file name, to which RStat appends _train_score_all. For example, if the originating database is database.csv, the resulting file name is:
database_train_score_all.csv
Alternatively, you can provide your own file name or append information to the default file name assigned. For example, you may run the same scenario on different dates. You might use a date convention (for example, _mmddyyyy) to archive your files. In general, this file naming convention ensures that the original name of the database is preserved while marking the new output file as one that contains scoring data.
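The naming convention described above can be sketched in a few lines of Python. This is only an illustration of the documented pattern, not code taken from RStat. The function name score_file_name and the sample date suffix are hypothetical.

```python
from pathlib import Path
from typing import Optional


def score_file_name(database_file: str, suffix: Optional[str] = None) -> str:
    """Build a scoring output name of the form <name>_train_score_all[<suffix>].csv.

    Hypothetical helper: it mirrors the documented default and lets the
    caller append extra text, such as a date in _mmddyyyy form.
    """
    stem = Path(database_file).stem  # "database.csv" -> "database"
    extra = suffix or ""             # no suffix means the default name
    return f"{stem}_train_score_all{extra}.csv"


print(score_file_name("database.csv"))
# database_train_score_all.csv
print(score_file_name("database.csv", "_03152024"))  # illustrative date only
# database_train_score_all_03152024.csv
```

Keeping the original name at the front of the file name means that archived score files sort next to the database they were generated from.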
You can store these files locally, using either the default generated names or names that you specify, for future analysis. For example, you can use WebFOCUS to create reports and graphs or to perform other BI-related tasks. Specifically, you can use the following resources:
Note: After scoring and saving the resulting data, RStat displays the path and file name of the file that you saved when the application returns to the Evaluate tab, as shown in the following image.
Note: The Rpart Rule Export does not support a sampling or testing data set. In order to execute Rpart rules, you must clear the Sample check box on the Data tab. This converts the data into training data from which rules can be extracted.
Note: Clear the check boxes for the Neural Net and SVM models, if necessary.
Evaluation techniques in RStat allow you to investigate how well your model will make predictions. The available evaluation techniques are determined by the type of model you have generated.
Notice that the Model Type is defined as Tree and a variety of evaluation techniques are presented.
You can use the following data sets to evaluate the current model.
An error matrix shows the relationship between the actual data and the predicted values.
With Error Matrix selected, click Execute.
Two error matrices are displayed. The first matrix shows the count of cases and the second shows the percentage of cases.
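To make the two matrices concrete, the following Python sketch builds a count matrix and a percentage matrix from lists of actual and predicted values. It is a minimal illustration, not RStat's implementation. The sample data is invented, and the sketch assumes that each percentage is relative to the total number of cases.

```python
from collections import Counter


def error_matrices(actual, predicted):
    """Return (labels, counts, percents) for actual versus predicted values.

    Rows are actual values and columns are predicted values.
    Assumption: each percentage is the cell count divided by all cases.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    pairs = Counter(zip(actual, predicted))
    total = len(actual)
    counts = [[pairs[(a, p)] for p in labels] for a in labels]
    percents = [[round(100 * c / total) for c in row] for row in counts]
    return labels, counts, percents


# Invented sample: 10 cases, 6 actual "No" and 4 actual "Yes".
actual = ["No"] * 6 + ["Yes"] * 4
predicted = ["No"] * 5 + ["Yes"] + ["No"] + ["Yes"] * 3

labels, counts, percents = error_matrices(actual, predicted)
print(labels)    # ['No', 'Yes']
print(counts)    # [[5, 1], [1, 3]]
print(percents)  # [[50, 10], [10, 30]]
```

In this sample, the diagonal cells (50 and 30) are the correctly predicted cases, and the off-diagonal cells (10 and 10) are the errors.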
Looking at the second matrix, you can see that the model predicts the following:
In RStat, you can score new data to see how well your model predicts. The Score data option will create a new CSV file with the scored values.
New Score options appear at the bottom of the tab panel.
Report Options
Report options are available only for Binary Trees and Logistic Regressions, where your target is binary (two unique values). For other models, the Report options will be grayed out.
The Report options define the type of score to be returned.
Include Options
Include options allow you to define which fields should be included in the scored file.
The Score Files dialog box opens.
Note: The file name that you define will be the exact name used, so be sure that the file name contains a .csv extension.
In the example below, the Scored option has ALL selected instead of IDENTIFIERS. The output file structure will have all fields (variables) plus the Scored value (column name=rpart) and the Rules column (column name=RpartRules).
Note: The Rules column contains the rule details for each data line. Check the column name and verify that no data line is missing its rules, as shown in the following image.
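The check described in this note can also be automated once the scored file has been saved. The following Python sketch reads a scored CSV and lists the data rows whose RpartRules value is empty. The column name comes from the example above. The function name, the sample fields, and the rule text are hypothetical and do not reflect the exact format that RStat writes.

```python
import csv
import io


def rows_missing_rules(csv_file, rules_column="RpartRules"):
    """Return the 1-based data row numbers that have no rule text.

    csv_file is any open text file or file-like object.
    Raises KeyError if the rules column is not in the header.
    """
    reader = csv.DictReader(csv_file)
    if rules_column not in (reader.fieldnames or []):
        raise KeyError(f"column {rules_column!r} not found in header")
    missing = []
    for row_number, row in enumerate(reader, start=1):
        # A short row yields None for the missing field, so guard with "".
        if not (row[rules_column] or "").strip():
            missing.append(row_number)
    return missing


# Hypothetical scored output; the rule text format is illustrative only.
sample = (
    "age,income,rpart,RpartRules\n"
    "34,52000,1,age>=30 & income>=50000\n"
    "22,18000,0,\n"
)
print(rows_missing_rules(io.StringIO(sample)))  # [2]
```

To check a saved file, pass an open file handle instead of the in-memory sample, for example open("database_train_score_all.csv", newline="").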