Reference: |
Change Data Capture loads data targets using only the rows that have changed since the last load. This approach is useful when dealing with large databases that would take too much time and resources to reload completely.
The source table used in this tutorial is an inventory table called dminv that is created by running Create Sample Procedures and Data. The source table is in DB2 or UDB.
The tutorial has the following steps:
The first step is to create a data flow that creates the target table and loads it with data from the current contents of the source table.
Note: Key column(s) are required for loading a table to uniquely identify each row so that Insert/Update/Delete processing can be performed.
To use Change Data Capture, the source database must be configured to use database logging. In most organizations this must be done by a database administrator. To enable the database for logging, see Enabling the Database for Logging.
This example uses dminv as the source table. To create this table, see How to Create Sample Procedures and Data.
The database table must be enabled for logging and the procedure depends on the database being used.
If you are using:
The table must have CDC enabled. This can be done from Microsoft SQL Server Management Studio. You must be a member of the db_owner database role to run the stored procedure. Issue the following commands:
exec sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'dminv', @role_name = NULL select is_tracked_by_cdc FROM sys.tables WHERE name = 'dminv'
You will see a 1 if the table is enabled, a 0 otherwise.
The table must have capture mode enabled. This can be done from the DB2 Command Center or a stored procedure.
Note: For IBMÂ i users, start the program strsql from the user ID that will run the CDC flows and issue the following commands:
CREATE COLLECTION database CHGPRF CURLIB(database)
The table must have all data columns logged. This can be done from the Oracle SQL Developer or a stored procedure.
Add supplemental log information for an entire database by entering:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
or for individual tables by entering:
ALTER TABLE dminv DATA CAPTURE CHANGES
A synonym for a log table allows reading from the database log records for a selected table. The synonym contains the same columns as the table itself plus three additional columns that identify changes to the table.
This synonym can be used as a source in a data flow with the special load type of IUD Processing. IUD is an abbreviation for Insert/Update/Delete.
Note: A synonym for a log table is not a substitute for a synonym for the table itself. A synonym is still needed for the table to be able to read or write from the table directly.
To create a synonym for a log table, in the DMC:
The Select Adapter window opens.
The Select Synonym Candidates window opens.
Optionally, check Filter by owner/schema check box and for object name enter dminv.
Under the Default synonym name, click DMINV and change the name to DMINV_LOG, as seen in the following image.
The window now looks like this:
Note that the first three columns for the synonym are added automatically and are used for DataMigrator IUD processing. They have the following functions:
In this step, we will create a DataMigrator direct load flow to copy data from the log table to the target table.
Note: Using a direct load flow with the checkpoint option ensures that all rows read from the log are written to the target table before the checkpoint file is updated. This ensures that the flow can be re-run from the original checkpoint if it cannot write to the target table (for example, if the database is not running).
While a data flow can also be used, it does not provide this assurance. The read and write operations are separate so that the checkpoint file is updated when the log is read.
The target properties page opens.
Click the X to close the properties page.
The Transformations window opens.
Click OK to close the window.
To test the Change Data Capture process, make changes to the dminv source table. In this example, three rows are inserted, one row is updated, and one is deleted.
For SQL Server, enter the line:
begin transaction;
insert into dminv values ('2001','Compact Flash',1000,50,25) ; insert into dminv values ('2002','Memory Stick',1000,80,40) ; insert into dminv values ('2003','Memory Stick Pro', 1000,200,100) ; update dminv set QTY_IN_STOCK=500 where prod_num = '2002' ; delete from dminv where prod_num = '2001' ;
commit transaction;
For Oracle or DB2, enter the command:
commit work;
Note that the CDC configuration saves a checkpoint of the last transaction processed. However, when you run a Sample Data report using the synonym for the log table, the checkpoint is not saved.
The following image shows a Sample Data report for the log.
Note that the statistics show the number of rows inserted, updated, and deleted.
Close the report.
Note: To return the dminv table back to its original state, enter the following lines in a stored procedure, where db is the engine used, and run them:
SQL db delete from dminv where prod_num > '2001' ; END SQL db commit; END
The final step in setting up Change Data Capture processing is scheduling the flow so that it runs periodically and processes transactions.
When you request sample data for a log table, a dialog box opens so that you can supply values to override the parameters in the synonym.
For example, you can sample data from a log table without creating a checkpoint by checking the check box in front of &&CDC_CHKPT_SAVE and entering the value NO.
You can also specify where to start reading the log by specifying a value for &&CDC_START. The values are the same as those specified in the synonym.
iWay Software |