Improving Data Loading Performance
You can improve performance when loading data into the
data target. Keep the following suggestions in mind.
Prior to load options. Before
you load data into an existing table (for most RDBMS targets), DataMigrator
gives you the option to Delete all rows from table or Truncate table.
When rows are deleted, the relational database internally logs
each deleted row in case a ROLLBACK is issued, which can be slow
for large tables. Truncating the table does no logging, so it
is much quicker, but a ROLLBACK is then not possible.
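The trade-off can be seen in a short sketch. The snippet below uses Python's sqlite3 module and a hypothetical target table purely for illustration; SQLite has no TRUNCATE statement, so the truncate side is shown only as generic SQL in a comment.

```python
import sqlite3

# Illustrative target table (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(i, "x") for i in range(5)])
conn.commit()

# "Delete all rows from table": every deleted row is logged so the
# operation can be rolled back -- for large tables this logging is slow.
conn.execute("DELETE FROM target")
conn.rollback()  # the delete is undone; all 5 rows are back

# "Truncate table" (generic SQL, not supported by SQLite): no row-level
# logging, so it is much faster, but a ROLLBACK cannot restore the rows.
truncate_sql = "TRUNCATE TABLE target"
```

Because the DELETE was logged, the rollback restores the table to its original five rows; a truncate offers no such safety net.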
Setting the commit interval. For
relational database and FOCUS targets, DataMigrator lets you specify
the number of rows loaded before a COMMIT (or CHECK) is issued. The default value
is 1000. A higher value generally improves throughput, but this
depends on the memory available on the server, the database, the
number of columns loaded, and their sizes. Increasing the commit
value too much can slow performance. Try different values to see
what works best.
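The pattern behind this setting can be sketched in a few lines. This is a generic illustration using Python's sqlite3 module, not DataMigrator's internals; the table and row source are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")

COMMIT_ROWS = 1000  # the default commit interval; tune and re-measure
rows = ((i, "x") for i in range(2500))  # illustrative incoming rows

pending = 0
for row in rows:
    conn.execute("INSERT INTO target VALUES (?, ?)", row)
    pending += 1
    if pending == COMMIT_ROWS:  # issue a COMMIT every COMMIT_ROWS rows
        conn.commit()
        pending = 0
conn.commit()  # commit the final partial batch
```

Fewer commits mean less transaction overhead per row, but each open transaction holds more uncommitted work in memory, which is why a very large interval can slow things down again.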
Using insert/update logic. This
is the default load method. For a relational database target: when
Include duplicates is selected, each incoming row is sent to the
target table without first checking whether the key value already exists.
This generally provides the best loading time. If there is a duplicate
key, the database rejects the row with a unique constraint violation.
When Reject duplicates or Update duplicates is selected, the keys of each incoming
row are matched against the target table. This additional step
slows processing, but allows matched rows in the target table
to be updated. In this case, use as few key
columns as possible to uniquely identify a row. Note that
if no columns are identified as key columns, the entire row is considered
to be the key, which slows processing.
Using insert records from memory. This
load option uses a database vendor bulk or array insert API when
available. This enables inserting a block of rows at a time. The
default block size is 1000 rows, but a higher or lower value may
provide better throughput, depending on the memory available on
the server, the database, the number of columns loaded and their
sizes. Try different values to see what works best.
Note that incoming rows are not matched against the target table as they
are with insert/update logic, so the number of rows inserted or rejected
is not recorded. There are no flow statistics available.
This method loads
faster than the default load logic. For most databases, it is the
fastest method, provided the query can use automatic passthrough (so that
no intermediate flat file is created).
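Block insertion can be sketched generically. The snippet below uses Python's sqlite3 executemany as a stand-in for a vendor bulk or array insert API; the block size and table are illustrative, not DataMigrator's implementation.

```python
import sqlite3
from itertools import islice

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")

BLOCK_SIZE = 1000  # the default block size; tune and re-measure
rows = iter((i, "x") for i in range(2500))  # illustrative incoming rows

while True:
    block = list(islice(rows, BLOCK_SIZE))  # gather one block of rows
    if not block:
        break
    # One array-insert call per block instead of one call per row.
    conn.executemany("INSERT INTO target VALUES (?, ?)", block)
conn.commit()
```

Sending a block of rows in a single call amortizes the per-statement overhead across the whole block, which is where the speedup over row-at-a-time inserts comes from.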
Using the Bulk Load Utility with a disk file. This
load option uses a database vendor bulk load program. As
the name implies, this method always creates an intermediate flat
file, but it is still faster than the default load logic.
The options available depend on the database. For most databases,
it is not quite as fast as Insert
Records from Memory, when that option is available. However,
for some databases designed for data warehousing (such as Sybase
IQ and IBM Red Brick Warehouse), this is by far the fastest option.
Additional tuning options are available depending on the database.
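The two-step shape of this option can be sketched as follows. The vendor loader command lines shown in the comment are illustrative only, and the final load uses SQLite as a runnable stand-in for a real bulk load program; paths and names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile

# Step 1: the extract is written to an intermediate flat file on disk.
path = os.path.join(tempfile.mkdtemp(), "extract.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows((i, "x") for i in range(100))

# Step 2: a database vendor bulk load program reads that file, e.g.
# (illustrative, syntax varies by database):
#   PostgreSQL:   COPY target FROM '/path/extract.csv' CSV
#   SQLite shell: .import extract.csv target
# As a runnable stand-in, load the flat file here with sqlite3:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
with open(path, newline="") as f:
    conn.executemany("INSERT INTO target VALUES (?, ?)", csv.reader(f))
conn.commit()
```

The extra disk write and read is the cost that usually keeps this method behind an in-memory block insert, while loaders in warehouse-oriented databases are optimized for exactly this file-based path.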