Data Transformation

The following options are available on the Transform tab in RStat. You may also access the Transform tab through the Tools menu.

Note: Options may vary depending on which type of data is selected.

Data transformation allows users to derive new variables from existing ones. The transformation process can change the scale of a variable, the grouping of its values, or its type. The Transform tab also allows you to impute missing values, that is, replace the missing values with new values. As with transformations, imputation creates a new variable that contains the imputed values. Transformations and imputation make the data more useful in the modeling process.

When transformations are performed:

Types of Transformations Included With RStat
Rescale

There are two types of rescaling transformations.

Normalize. The terms rescaling, normalization, and standardization are frequently used interchangeably. They denote the conversion of one unit of measurement into another by applying a mathematical formula. For example, converting Celsius to Fahrenheit involves multiplying by a constant and adding a constant. There are many reasons to normalize data. One reason is to bring a skewed distribution closer to normal. For example, income is frequently skewed, and a log transformation often makes it more symmetric. Another reason is to make two measures more comparable in magnitude. For example, age and income differ significantly in magnitude, but using scale, both can be rescaled to the range 0 to 1 and then used together in cluster analysis.
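The rescaling ideas described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code RStat runs; the function names and the sample ages and incomes are hypothetical.

```python
import math


def celsius_to_fahrenheit(celsius):
    """Linear rescaling: multiply by a constant, then add a constant."""
    return celsius * 9 / 5 + 32


def log_transform(values):
    """Natural-log transform, often used to reduce right skew (values must be > 0)."""
    return [math.log(v) for v in values]


def min_max_rescale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread, so map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data: age and income differ greatly in magnitude.
ages = [20, 35, 50]
incomes = [20000, 50000, 120000]

scaled_ages = min_max_rescale(ages)
scaled_incomes = min_max_rescale(incomes)
log_incomes = log_transform(incomes)
```

After rescaling, both variables lie between 0 and 1, so neither dominates a distance calculation such as the one used in cluster analysis.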

The following image displays the Rescale transformations.

Rescale transformations window

Impute
Imputation fills in the missing values in the data. Zero/Missing imputation is a very simple method: any missing numeric value is assigned 0, and any missing categoric value is put into a new category, Missing. Mean, Median, and Mode replace missing values with the mean, median, or mode of the variable's non-missing values.
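The imputation methods described above can be illustrated with a short Python sketch. It assumes missing values are represented as None and uses hypothetical function names and sample data; it is not RStat's implementation. Each function returns a new list, mirroring the way imputation creates a new variable rather than overwriting the original.

```python
from statistics import mean, median, mode


def impute_numeric(values, method="zero"):
    """Return a new list with None replaced according to the chosen method."""
    observed = [v for v in values if v is not None]
    if method == "zero":
        fill = 0
    elif method == "mean":
        fill = mean(observed)
    elif method == "median":
        fill = median(observed)
    elif method == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown imputation method: {method}")
    return [fill if v is None else v for v in values]


def impute_categoric(values, label="Missing"):
    """Return a new list with None placed in a new category."""
    return [label if v is None else v for v in values]


# Hypothetical sample data with missing entries.
scores = [10, None, 20, 20, 70]
regions = ["East", None, "West"]

scores_zero = impute_numeric(scores, "zero")
scores_mean = impute_numeric(scores, "mean")
scores_median = impute_numeric(scores, "median")
scores_mode = impute_numeric(scores, "mode")
regions_filled = impute_categoric(regions)
```

With this sample, the mean (30) is pulled upward by the single large value, while the median and mode (both 20) are not, which is one reason to prefer the median for skewed variables.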

The following is an image of the Impute options in the Transform tab.

Impute options button in the Transform tab

Recode

Recoding is the process of reassigning values to new categories or reassigning a variable to a new type.

Binning. Binning is a process of grouping measured data into classes or categories. There are several types of binning transformations.
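As an illustration of binning, the following Python sketch implements equal-width binning, one common approach. The text above does not say which binning types RStat offers, so the choice of equal-width binning, the function name, and the sample ages and labels are all assumptions made for the example.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so they all fall in the first bin.
        return [0 for _ in values]
    indices = []
    for v in values:
        idx = int((v - lo) / width)
        # The maximum value would land in bin n_bins; clamp it into the last bin.
        indices.append(min(idx, n_bins - 1))
    return indices


# Hypothetical sample data: group ages into three classes.
ages = [18, 25, 40, 58, 78]
labels = ["young", "middle", "older"]

age_bins = equal_width_bins(ages, 3)
age_groups = [labels[i] for i in age_bins]
```

Here the range 18 to 78 is split into three intervals of width 20, and the numeric variable becomes a categoric one, which is also an example of recoding a variable to a new type.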

Cleanup

Cleanup allows users to delete various elements from the loaded data set. This is particularly useful in freeing up memory, especially if the modeler is creating many transformed variables for testing purposes.

The following is an image of the Cleanup options in the Transform tab.

Cleanup option in the Transform tab

