A database in a data warehouse is often organized into
a star schema, consisting of a central fact table with the data
to be analyzed and multiple dimension tables that describes the
data.
Each dimension table has a single surrogate key, an arbitrary
unique identifier for the row. The fact table has multiple keys,
each joining to a different dimension table.
Before you create and run the data flows discussed in this section,
in addition to creating the source tables, you must also create
the target tables. See How to Create Sample Procedures and Data for Star Schema.
This example has detailed instructions for loading one of the
four dimension tables. Refer to the sample flows, DLOADCUST, DSLOADPROD,
DSLOADSALE, and DSLOADTIME, for other examples of loading a dimension
table. Refer to DSLOADFACT for an example of loading a fact table
and DSFLOWP for the complete example of loading a star schema.
Example: Loading a Dimension Table
This
example has instructions for loading the customer dimension table.
- In the DMC, right-click
an application directory in the navigation pane and select New,
then Flow. The data flow opens in the right
hand pane, with the SQL object displayed.
- Drag the data source
object DMCOMP from the ibisamp application directory in the navigation
pane into the workspace, located to the left of the SQL object.
- Right-click the SQL
object and select Column Selection.
The
Column Selection window opens.
- Right-click any column
name and Select All. Then click the arrow
to add them to the selected columns and click OK.
- Drag the target object DSDIMCUST from
the ibisamp application directory into the workspace to the right
of the SQL object.
- Right-click the target
object DSDIMCUST and select Properties.
The Properties page opens.
From the Load Type drop-down
menu, select Slowly Changing Dimensions,
as shown in the following image. Click X to
close the Properties page.
- Right-click the target
object DSDIMCUST and select Target
Transformations
The Transformations window opens.
- Click the Automap
button.
The
identically named source and target column names are added to the Expressions
window. The Transformations window should now look like the following image.
Note: The columns
CKEY and ACTIVE are not mapped because these columns are handled
automatically by the DataMigrator Slowly Changing Dimension processing.
CKEY is the surrogate key, which automatically starts with a value
of 1 and increases by one, for each row added. The Active flag is
set to 1 for currently active rows.
Click OK to
close the Transformations window.
- Click Save from
the Quick Access Toolbar. Enter dsxloadcust as
the file name.
Example: Loading a Fact Table
A
fact table load requires looking up the keys for each row in the
corresponding dimension tables, and obtaining a surrogate key so
that the fact table can be joined to each of the dimensions for
subsequent reporting.
To create the fact table follow these
steps.
- In the DMC, right-click
an application directory in the navigation pane and choose New,
then Flow. The data flow opens in the right
hand pane, with the SQL object displayed.
- Drag the data source
object DMORD from the ibisamp application
directory in the navigation pane into the workspace, to the left
of the SQL object.
- Drag the data source
object DMSALE from the ibisamp application
directory in the navigation pane into the workspace, to the left
of the SQL object.
A JOIN object is automatically added connected
to DMORD[MS] and DMSALE.
Note: If not, delete the object.
Then go to Tools, Options, Data
Flow and select the Add join object if needed check
box. Click OK. Then add dmsale again.
- Drag the data source
object DMINV from the ibisamp application
directory in the navigation pane into the workspace, to the left
of the SQL object.
A second JOIN object is automatically added
connected to the first JOIN object and DMINV.
- Right-click the SQL
object and select Column Selection.
The
Column Selection window opens.
- Select the following
columns and click the arrow to add them to the Selected Columns
list:
- ORDER_DATE from DMORD(T1)
- SHIP_DATE from DMORD(T1)
- PROD_NUM from DMINV(T3)
- STORE_CODE from DMORD(T1)
- EMPID from DMSALE(T2)
- LINEPRICE from DMORD(T1)
- QUANTITY from DMORD(T1)
- Click the Insert
Columns
button.
The SQL Calculator
opens.
- For Alias, type LINCOGS and
for calculation, expand DMPROD. Double-click QUANTITY, type*,
then expand DMINV and INVINFO.
Select COST so that the Expression window
shows T1.QUANTITY * T3.COST.
- Click OK to
close the SQL calculator.
The Column Selection window should appear
as in the following image.
- Click OK to
close the Column Selection window.
- Drag the target object DSFACT from
the ibisamp application directory to the workspace to the right
of the SQL object.
- Right-click DSFACT and
select Target Transformations.
The Transformations
window opens.
- Double-click ODKEY,
adding it to the Expressions window. With the line selected, click
the Edit Transform
button.
The
Transformation Calculator window opens.
- Click the Functions tab,
expand the Data Source and Decoding folder,
and double-click DB_LOOKUP.
The Lookup
function assist opens.
- Click the ellipsis button
after Lookup Synonym. Select the synonym DSDIMTIME from
the ibisamp directory and click Select.
- Under Lookup fields,
click the Add/Delete button. Select ORDER_DATE from
the left hand side and TDATE on the right
hand side. Click OK to close the dialog
box.
- Click the ellipsis button
after Return Field.
The Lookup Field dialog box opens.
- Double-click TKEY to
select it.
The Lookup window opens.
Click OK to
close the window.
- Repeat steps 12-17
above for SDKEY using SHIP_DATE as the source field, TDATE as the
lookup field, and TKEY as the return field.
- Similar transformations
are required for each of the key columns.
Use the following tables:
CKEY
- DSDIMCUST PKEY - DSDMINPROD SKEY - DSDIMSALE
For the rest
of the key columns, the Source Field and Lookup Field are pre-selected since
the names are the same in the Fact and Dimension tables. In addition, transformations
are required for the remaining columns.
When you have created
the remaining transformations, the Transformations window should
appear as shown in the following image.
- Click OK to
close the window.
- Click Save
as from the Quick Access Toolbar. Enter dsloadfact as
the file name.
Example: Loading a Star Schema Using a Parallel Group
In
order to load the fact table in this example, all the dimension
tables have to be loaded first so that the surrogate keys are available.
- In the DMC, right-click
an application directory in the navigation pane, select New and
then Flow. The data flow opens in the right
pane.
- Click the Process
Flow tab. The view switches to the process flow with
a Start icon.
- On the Flow tab,
in the Insert group, click Parallel Group and
drag it into the work area to the right of the Start icon.
A box
appears on the screen. This is an empty parallel group.
- Right-click the Start icon
and drag a line towards the parallel group box.
Release the line
when the arrow is touching the box.
- From the navigation
pane, drag the DSXLOADCUST flow inside the
parallel group box. If you did not create this data flow, drag
the DSLOADCUST flow from the ibisamp directory.
- From the ibisamp
directory, drag the DSLOADPROD, DSLOADSALE,
and DSLOADTIME flows into the parallel group.
- The DSLOADTIME flow
requires a parameter of the first date to load. Double-click the
flow to open the properties for the flow. For Parameters, enter STARTDATE=20040101.
- On the Flow tab,
in the Insert group, click Wait and drag
it into the work area to the right of the parallel group.
- Right-click the parallel
group, drag a line to the Wait icon, and release.
- Drag the DSXLOADFACT flow
to the right of the Wait icon. If you did not create this data
flow, drag the DSLOADCUST flow from the ibisamp
directory.
- Right-click the Wait icon,
drag the line to the DSLOADFACT flow, and release.
When you have
finished, your process flow should appear as shown in the following image.
- Click Save
as from the Quick Access Toolbar. Enter dsxflowp as
the file name.
- On the Flow tab,
in the Run group, click Run and select Submit to
run the flow.
- When the flow completes,
on the Flow tab, in the Reports group, click View Last
Log.
The process flow log opens, as shown in the following
image.
The log shows the four dimension data
flows run and the fact table load, which runs when the data flow
runs are complete. The lines in blue indicate links to the detail
logs for the individual flows.
- Click the
fact table log, which is the last blue line shown. The detail log
opens, as shown in the following image.