Load your data, as detailed in Introducing WebFOCUS RStat.
RStat provides random sampling, which lets you divide your data set into a training data set and a testing data set. The training data set is used to build the model. The testing data set, also called the evaluation data set, is used by the model evaluation techniques to test how well the model predicts.
Define the proportion of data to be included in each data set and the seed to be used to generate the random sample.
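The effect of the proportion and seed settings can be illustrated outside RStat with a minimal Python sketch. This is not RStat's implementation. The function name `split_rows` and the values 0.7 and 42 are assumptions chosen for illustration only.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Randomly partition rows into (training, testing) lists.

    train_fraction and seed are illustrative values, not RStat defaults.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")

    # A private generator seeded with a fixed value makes the split repeatable.
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)

    # The first share of the shuffled indices forms the training sample.
    cut = int(round(len(rows) * train_fraction))
    train_idx = set(indices[:cut])

    # Preserve the original row order inside each data set.
    training = [row for i, row in enumerate(rows) if i in train_idx]
    testing = [row for i, row in enumerate(rows) if i not in train_idx]
    return training, testing
```

Calling `split_rows` twice with the same seed returns the same two data sets, which is why recording the seed makes a model evaluation reproducible. Changing the seed draws a different random sample of the same size.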
Note:
For each of the variables within your data set, you can define the role it should play in the model by clicking the appropriate column within the Variable Grid.
RStat automatically assigns roles to variables based on the following variable prefixes.
Prefix | Role |
---|---|
ID | Identifier |
IGNORE | Ignored |
IMP | Imputed |
RISK | Risk measure |
You can have one Target and one Risk variable.
You can override these default settings by clicking the appropriate role for each of your variables.
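The prefix rules and the one-Target, one-Risk limit can be sketched in Python as follows. This is only an illustration of the logic, not RStat code. The function names, the case-insensitive matching, and the fallback role `"Input"` for unprefixed variables are assumptions.

```python
# Prefix-to-role mapping taken from the table of variable prefixes.
PREFIX_ROLES = {
    "ID": "Identifier",
    "IGNORE": "Ignored",
    "IMP": "Imputed",
    "RISK": "Risk measure",
}


def default_role(variable_name, fallback="Input"):
    """Return the role implied by a variable's prefix.

    Case-insensitive matching and the fallback role are assumptions.
    """
    upper = variable_name.upper()
    for prefix, role in PREFIX_ROLES.items():
        if upper.startswith(prefix):
            return role
    return fallback


def check_roles(roles):
    """Raise ValueError if more than one Target or Risk variable is set.

    roles maps each variable name to its assigned role.
    """
    for limited in ("Target", "Risk measure"):
        count = sum(1 for role in roles.values() if role == limited)
        if count > 1:
            raise ValueError(f"only one {limited} variable is allowed, found {count}")
```

For example, `default_role("ID_CUSTOMER")` yields `"Identifier"`, while a variable with no recognized prefix receives the fallback role and can then be reassigned manually.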
You can set a group of variables to a single role using the Input and Ignore buttons by:
The data type of the target variable determines the type of modeling available and the specific algorithms that will be used within the modeling process. The data type is defined based on the type of data RStat identifies and the quantity of unique values found in the actual data. In RStat, data types are defined as:
Note: The target setting does not change the actual data within the data grid. It will change only the way the target data is used when the model is built.
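The general idea that a target's type depends on both the kind of data and the number of unique values can be illustrated with a short Python sketch. Everything here is hypothetical: the labels `"categoric"` and `"numeric"`, the `max_categories` threshold of 10, and the function name are assumptions for illustration and do not describe RStat's actual rules.

```python
def infer_target_type(values, max_categories=10):
    """Classify a target column from its values.

    Hypothetical rule: numeric data with many distinct values is treated
    as numeric; everything else is treated as categoric.
    """
    # Ignore missing values when counting distinct entries.
    distinct = {v for v in values if v is not None}

    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in distinct
    )

    if all_numeric and len(distinct) > max_categories:
        return "numeric"
    return "categoric"
```

Under this assumed rule, a column of 0 and 1 flags is treated as categoric even though it is stored as numbers, because it has only two unique values. As with the target setting itself, the function only reads the values and never modifies them.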
Once you have set or confirmed the Sampling, Data roles, and Target type, click Execute from the RStat toolbar to pass these settings to RStat.
Notice that the status bar will display the: