Working With the New Data Model

In this section:

The Core Paradigm
Migrating to the New Architecture
Data Lineage
Load, Recalculate, and Copy (LRC) Loads

As of PMF 5.3.2, the model where each individual measure needs to be loaded as a separate, schedulable entity is no longer applicable. Instead, PMF now allows you to group (and to schedule loads for) your loadable data by its source.

You can specify how that data should be harvested from individual physical data sources. You can:

Harvest field values directly, aggregating the values you acquire by dimensionality. You can either aggregate all matching values or filter only particular sets of values.
Count occurrences of particular flags on one or more fields per record in your source. These counts are automatically kept at the dimensional levels, as indicated on the loadable source.
Distinctly count occurrences of particular flags on one or more fields per record in your source. These counts are automatically kept at the dimensional levels, as indicated on the loadable source. Additionally, a distinct count is applied on the field you indicate.
Create custom formulas to describe how data should be harvested, for more complex harvesting techniques.
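The first three harvesting methods above can be pictured with a short Python sketch. It is not PMF code: the record layout, the field names (`plant`, `item`, `cost`, `defect`, `order_id`), and the function names are all hypothetical, chosen only to show how summing, flag counting, and distinct counting differ when values are kept at a dimensional level.

```python
from collections import defaultdict

# Hypothetical source records; every field name here is illustrative only.
records = [
    {"plant": "P1", "item": "A", "cost": 10.0, "defect": "Y", "order_id": 1},
    {"plant": "P1", "item": "A", "cost": 12.0, "defect": "Y", "order_id": 1},
    {"plant": "P1", "item": "B", "cost": 7.0, "defect": "N", "order_id": 2},
    {"plant": "P2", "item": "A", "cost": 9.0, "defect": "Y", "order_id": 3},
]

# Assumed dimensional level of the loadable source.
DIMENSIONS = ("plant", "item")


def dim_key(record):
    """Return the dimensional intersection that a record belongs to."""
    return tuple(record[d] for d in DIMENSIONS)


def harvest_sum(records, field, keep=lambda r: True):
    """Aggregate a field by dimensionality, optionally filtering records."""
    totals = defaultdict(float)
    for r in records:
        if keep(r):
            totals[dim_key(r)] += r[field]
    return dict(totals)


def harvest_count(records, field, flag):
    """Count records whose field carries the given flag, per intersection."""
    counts = defaultdict(int)
    for r in records:
        if r[field] == flag:
            counts[dim_key(r)] += 1
    return dict(counts)


def harvest_distinct_count(records, field, flag, distinct_field):
    """Like harvest_count, but count distinct values of distinct_field."""
    seen = defaultdict(set)
    for r in records:
        if r[field] == flag:
            seen[dim_key(r)].add(r[distinct_field])
    return {key: len(values) for key, values in seen.items()}
```

In this sample, the two flagged records for plant P1 and item A share one `order_id`, so `harvest_count` reports 2 for that intersection while `harvest_distinct_count` on `order_id` reports 1.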

You can also specify that some datapoints are acquired from end users. Typically, you do this if you do not already have a physical source for this data, but want PMF to become the system of record for the data and to track it as it is captured.

PMF provides user-input features that allow it to capture, update, validate, and store the data in datapoints for use downstream.

When setting up user-entered sources, specify the level of dimensionality common to all datapoints in a single source group. At capture time, end users can then enter all datapoints in the group at the same time, which makes data entry faster and more convenient.

Once data has been harvested from physical sources or data has been entered by the user, PMF can regularly recalculate any derived datapoints. Recalculation is performed in lineage order. This means that PMF itself determines which derived datapoints have dependencies, and waits to perform recalculation on any datapoint until all of its precursor dependencies have been resolved.

After all datapoints are reloaded, and derived datapoints are fully recalculated, PMF checks for measure dependencies of these datapoints. It then copies the data, as you designed it, to the measures.

The Core Paradigm

The new metrics model in PMF allows you to think differently about measure loads than in previous versions of PMF:

You can now time a single source load to harvest all values needed from any particular external RDBMS. You no longer have to think about timing multiple separate measure loads to events in external systems. PMF determines the timing for you, harvesting the data at the time you set, and then pushing the data downstream to derivations and then to each dependent measure in the system.
PMF now knows the lineage of each calculation and the order in which each dependency must be calculated, so you do not have to determine dependency order for calculated fields.
PMF can now mix user-input data with loadable data. You do not have to create special interim data tables and user interface schemas to accept user input that needs to be combined on measures or sub-calculations (either as separate measure components or as dependent components in complex calculations).
PMF can now synthesize data at will, using rules that allow you to determine dimensional depth and method of generation. This means that creating demonstrations of new metric models does not require spending time generating data in Excel and figuring out how to load that data into PMF.
Determining dimensional dependencies is now more automatic. Since dimensional dependencies are attributes of sources and datapoints, just connecting datapoints to a measure automatically determines the dimensions and levels that will be available to that measure.

Migrating to the New Architecture

In this section:

How the Migrator Works
Should I Upgrade Legacy Measures?

How to:

Perform a Migration

PMF allows you to automatically migrate legacy measure load scripts that were created prior to PMF Release 5.3.2 to the new architecture.

How the Migrator Works

If you were a PMF user prior to release 5.3.2, you used legacy measures and set up loads directly on them. There were no sources or datapoints. Automated Migration enables PMF to automatically migrate all old measure components to use the new architecture.

The migrator attempts to find common data sources for your measures, examines the dimensional intersections these use, and binds the ones that have common data sources and common dimensional intersections into sources.
When creating sources during migration, the migrator names the created sources sequentially. You can rename these sources at any time after the migration completes.
Naming of datapoints is based on the option you choose:
- You can choose to have PMF name the datapoints tied to these sequential sources based on the source name.
- You can choose to name the datapoints after their linked measures.

Note: You can rename the created datapoints at any time after migration is completed.
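The grouping behavior described above can be sketched in Python as follows. This is only an illustration of the idea, not the migrator's actual logic: the measure records, the `Source_N` and `Source_N_DPn` naming patterns, and the `name_by` parameter are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical legacy measures; names and data sources are illustrative only.
legacy_measures = [
    {"name": "Revenue", "data_source": "sales_mf", "dimensions": ("Time", "Product")},
    {"name": "Units Sold", "data_source": "sales_mf", "dimensions": ("Product", "Time")},
    {"name": "Headcount", "data_source": "hr_mf", "dimensions": ("Time", "Organization")},
    {"name": "Returns", "data_source": "sales_mf", "dimensions": ("Time", "Customer")},
]


def group_into_sources(measures, name_by="measure"):
    """Bind measures that share a data source and dimensional intersection.

    name_by="measure" names each datapoint after its linked measure;
    name_by="source" derives the datapoint name from the source name.
    """
    groups = defaultdict(list)
    for m in measures:
        # The order of dimensions does not matter for the intersection.
        key = (m["data_source"], frozenset(m["dimensions"]))
        groups[key].append(m)

    sources = []
    for n, ((data_source, dims), members) in enumerate(groups.items(), start=1):
        source_name = f"Source_{n}"  # assumed sequential naming pattern
        if name_by == "measure":
            datapoints = [m["name"] for m in members]
        else:
            datapoints = [f"{source_name}_DP{i}" for i in range(1, len(members) + 1)]
        sources.append(
            {
                "name": source_name,
                "data_source": data_source,
                "dimensions": sorted(dims),
                "datapoints": datapoints,
            }
        )
    return sources
```

With the sample data, Revenue and Units Sold land in the same source because they share both a data source and a dimensional intersection, while Returns gets its own source even though it reads from the same data source.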

Should I Upgrade Legacy Measures?

Before upgrading legacy measures, take the following into consideration:

If you plan to use the new architecture features, migrate your data mart to the new architecture immediately. The PMF migrator requires that you take an either/or approach. Once you begin creating new-style measures, PMF cannot both preserve them and migrate your legacy measures for you.
You should always run a snapshot before attempting migration. This enables you to quickly revert to legacy measures and retry the migration with other options if the result is not what you expected.
If you want to continue to use legacy measures unchanged, note that migration to the new architecture is currently an optional step. PMF 8 supports legacy measures, as well as new measures. It is strongly recommended that you upgrade to the new model as the legacy measures are deprecated. The new model will be required in a future release.
Automated migration to the new architecture is not required to upgrade the PMF data mart version. That operation is done when you first log in to PMF as an administrator after upgrading.

Procedure: How to Perform a Migration

Before performing a migration, note the following:

You should always run a snapshot before attempting migration. This enables you to quickly revert to legacy measures and attempt another migration, with other options, if the results of the initial migration are not what you expected.
It is recommended that you run the migration using the default options and then examine the results. If the results are not what you expected, restore your backup snapshot and perform the migration again after adjusting the options.
If you were using PMF prior to release 5.3.2, it is not recommended that you add any new sources, datapoints, or measures to PMF until you complete the legacy migration. If sources, datapoints, or new measures already exist, the migration operation will quit. You can delete these components before performing the migration using the Delete any existing Sources and dependent objects option, but this is not advised if you have already done significant work in the new architecture.

To perform a migration to the new architecture:

From the Manage page, click the Data Mart subtab.
Click Migrate Legacy Measures. The Migrate Legacy Measures to New panel opens.
The following options are available:
Retain the Legacy Load Mode

Enable this option to transfer the deletion option used by the migrated measures into its equivalent default Wipe setting on the sources to be created. This setting is enabled, by default.

Retain Alternate Targets

If you are using Alternate Targets (for example, Benchmark, Stretch, Budget, and Forecast targets), enable this option to make sure that they are migrated.

If you are not using Alternate Targets, it is recommended that you leave this option disabled. This is the default setting.

Delete Measures that do not migrate

If enabled, the migrator will delete any measures that do not migrate. It is recommended that you select this option after doing a test migration and determining that it is safe to do so. This option is disabled, by default.

Note: Generally, legacy measures that are incomplete will not be migrated, since PMF does not have any way to map these into the new architecture. You can always choose not to delete these, manually set them up as new measures, and then delete the unconverted legacy measures.

Delete any existing Sources and dependent objects

If enabled, the migrator will delete any existing sources and datapoints that were created using the new architecture. This will delete any new sources, datapoints, and measures in your system.

If this option is disabled and you have new measures, existing sources, or existing datapoints, the migrator will produce an error. This prevents any new components from being overwritten. This option is disabled by default.

Base Loadable Datapoint names
The following options are available:
- on legacy Measure names. This option names loadable datapoints created in each source using the names of the measures to which the migrator links them.
- on new Loadable Source names. This option names loadable datapoints created in each source using the source name, so that the names contain the Master File information.
Click Migrate.

PMF will perform the migration. This operation can take a few minutes and it is important to wait until it is complete before performing another operation. Once done, a status message will confirm that migration is completed.
Once the migration is done, return to the Manage page and review the results.

Data Lineage

Data lineage refers to the entire path of data through the PMF load architecture.

Step 1

For PMF, data lineage starts at the source. Datapoints that are linked to a source are left-side endpoints in the lineage. This means they are harvested directly from:

Physical database tables or views, or various flat file formats.
User entry of data, in situations where there is no physical representation of the data.
Generated data (in the exception where no data exists, and is being created for various demonstration purposes).

Example 1: In the case of a manufacturing company, there might be sources defined to harvest data from systems in Warehousing, Production Line, Quality Control, Shipping and Logistics, Supply Chain/Purchasing, Prospect Management, and Wholesale Sales. PMF harvests data as follows:

Warehousing. Average cost of storage per item, per warehouse.
Production Line. Average cost of labor per item, per plant.
Quality Control. Average cost of testing, per item, per plant.
Shipping and Logistics. Cost to ship, per item, per plant, per customer.
Supply Chain/Purchasing. Cost of supply, per item, per plant.
Prospect Management. Average Cost of Customer Acquisition, averaged per item, per customer.
Wholesale Sales. Sale Amount, per item, per customer.

Step 2

Lineage then proceeds through each generation of derived datapoints. There is no limit to the number of phases possible.

Example 2: Continuing from Example 1, you can derive the following datapoints from those you loaded:

Total Product Cost = Average cost of storage per item + Average cost of labor per item + Average cost of testing, per item
Total Sale Cost = Average Cost of Customer Acquisition + Shipping Cost
Net COGS = Total Product Cost + Total Sale Cost
Profit = Total Sale – Net COGS
Margin = Profit/Total Sale (as a %)

These datapoints need to be calculated in the following order:

Total Product Cost, Total Sale Cost
Net COGS
Profit
Margin
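The calculation order above can be reproduced with a small Python sketch that uses the standard library `graphlib` module (Python 3.9 or later). It only illustrates the idea of ordering by lineage and is not how PMF is implemented. The shortened datapoint names and the function name `recalculation_phases` are assumptions, and Total Sale is treated here as a loaded datapoint.

```python
from graphlib import TopologicalSorter

# Each derived datapoint maps to the datapoints it depends on.
# Names are shortened from Example 2; "Total Sale" is assumed to be loaded.
dependencies = {
    "Total Product Cost": {"Storage Cost", "Labor Cost", "Testing Cost"},
    "Total Sale Cost": {"Acquisition Cost", "Shipping Cost"},
    "Net COGS": {"Total Product Cost", "Total Sale Cost"},
    "Profit": {"Total Sale", "Net COGS"},
    "Margin": {"Profit", "Total Sale"},
}


def recalculation_phases(dependencies):
    """Group derived datapoints into phases that respect their lineage."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    phases = []
    while sorter.is_active():
        ready = sorter.get_ready()
        # Loaded datapoints have no formula, so only derived ones form a phase.
        derived = sorted(name for name in ready if name in dependencies)
        if derived:
            phases.append(derived)
        sorter.done(*ready)
    return phases


for number, phase in enumerate(recalculation_phases(dependencies), start=1):
    print(number, ", ".join(phase))
```

Each phase contains only datapoints whose precursors were resolved in an earlier phase, so the sketch yields the same four steps listed above.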

Step 3

Lineage then ends at measures.

Load, Recalculate, and Copy (LRC) Loads

Reference:

Checking the Administrative Log Reports

To load measures, the PMF load architecture puts data through three phases:

Load. All sources indicated for Load are loaded, and data is fed into the datapoints for each source. A load can be triggered in any of the following ways:
- Volitional load. The Load button is clicked on any source. In this case, the only source that will be loaded is the one you indicated for load.
- Scheduled direct load. A source load is scheduled to run at a specific time.
- Scheduled optional cascaded load. If any of the dimensions that are linked to the source are reloaded, a source load could be forced (cascaded), depending on how the dimension Cascade Load settings are configured.
- Scheduled forced cascaded load. If any of the dimensions that are linked to the source are reorganized, a source load will always be forced (cascaded), regardless of how the dimension Cascade Load settings are configured.
Note: During schedule processing, if more than one source has to be loaded during the scheduled run, all scheduled sources are loaded before the next step runs. This avoids needlessly repeating the recalculation.
Recalculate. PMF looks at all the sources that were reloaded, and analyzes all derived datapoints with dependencies on the sources that were loaded. PMF then analyzes the lineage of all derived datapoints to determine the correct order to recalculate them, respecting their dependencies. Finally, PMF performs the recalculation step, in phases, with the number of phases determined by the generations of lineage of the derived datapoints.
Copy. PMF analyzes all measures that use the datapoints that were recalculated in the Recalculate phase. It then copies the data for the linked datapoints into the measures cube, making the data ready for reporting and dashboard publication.
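As a rough illustration of how the three phases narrow down the work, the following Python sketch determines which derived datapoints need recalculation and which measures need a copy after a given set of sources is reloaded. The function name, the dictionary layouts, and the sample names are hypothetical. It also assumes that a measure linked directly to a reloaded datapoint is copied as well.

```python
def affected_by_reload(reloaded_sources, source_datapoints, derived_parents, measure_datapoints):
    """Return (derived datapoints to recalculate, measures to copy to)."""
    # Load: datapoints fed directly by the reloaded sources.
    loaded = set()
    for source in reloaded_sources:
        loaded |= set(source_datapoints[source])

    # Recalculate: follow the lineage until no new dependents are found.
    to_recalculate = set()
    grew = True
    while grew:
        grew = False
        for datapoint, parents in derived_parents.items():
            if datapoint not in to_recalculate and parents & (loaded | to_recalculate):
                to_recalculate.add(datapoint)
                grew = True

    # Copy: measures linked to any loaded or recalculated datapoint (assumption).
    touched = loaded | to_recalculate
    measures = {m for m, dps in measure_datapoints.items() if set(dps) & touched}
    return to_recalculate, measures


# Hypothetical configuration based on the manufacturing example.
source_datapoints = {
    "Warehousing": ["Storage Cost"],
    "Shipping and Logistics": ["Shipping Cost"],
    "Wholesale Sales": ["Total Sale"],
}
derived_parents = {
    "Total Product Cost": {"Storage Cost", "Labor Cost", "Testing Cost"},
    "Total Sale Cost": {"Acquisition Cost", "Shipping Cost"},
    "Net COGS": {"Total Product Cost", "Total Sale Cost"},
    "Profit": {"Total Sale", "Net COGS"},
    "Margin": {"Profit", "Total Sale"},
}
measure_datapoints = {
    "Margin %": ["Margin"],
    "Shipping Spend": ["Shipping Cost"],
}

recalc, measures = affected_by_reload(
    ["Warehousing"], source_datapoints, derived_parents, measure_datapoints
)
```

Reloading only the Warehousing source in this sketch leaves Total Sale Cost and the Shipping Spend measure untouched, because nothing in their lineage changed.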

Reference: Checking the Administrative Log Reports

PMF logs all activity that involves load, recalculation, or copy actions. Logged data is stored as peer data in the PMF Data Mart.

PMF captures the following data in its source load logs:

Earliest and latest effective dates for the source data.
What dimensions were processed on the load.
Status of the load:
- Running: The load is currently running.
- Success: The load completed successfully.
- Error: The load operation ended with an error.
How many source records were retrieved to process the load.
How many datapoint records were inserted, updated, and deleted.
How many dimensional linkage mismatches were processed.
Gaps in dimensional intersections.
Datapoint sparsity count (how much of the potential total Cartesian cross product was not loaded).
LRC load type:
- Load a source.
- Load a datapoint.
- Recalculate.
- Copy to measure.
Owner ID under which the load ran.
Start and end time of the load.
Messages that were returned from the server during the load.
Compare mode that was flagged for the load:
- NO_DATA: There is no incoming/parent data ready to process.
- NEW: All ready parent data is new (outside the range of the target data), and all parent data is in the lineage for all incoming datapoints for at least one common time period.
- NEW_W_MISSING: All ready parent data is new (outside the range of the target data), but data is missing for at least one required parent data point in the lineage.
- POSS_UPD: All incoming ready data falls within the range of current data, but there is no ready data for at least one incoming data point. There is no possibility of new rows being generated.
  - For Copy to Measure, there are possible updates.
  - For Recalculation, there are also possible updates if there is available data for all other parents (this has not yet been implemented).
  - It is possible that there is some ready data that is new for some parents, but not for all parents. Because new data is lacking for at least one parent, no new rows can be processed.
- POSS_UPD_NEW: Similar to POSS_UPD, except that there is ready data for all parents in the lineage, so it is possible that there is some new data.
  The most likely scenario for this would be that you source loaded EAST for 2012/01 and then separately source loaded WEST for 2012/01. To process the POSS_UPD_NEW mode, the incoming or parent rows that match the dimensionality of existing/child rows must be separated from those that are new. Rows that are new must then be checked for gaps.
- MIXED: Similar to NEW, except there is also some ready parent data which overlaps existing data. For incoming data that falls within the range of existing data, it is possible that:
  - The data will not be in the current child data, in which case the data will be new, provided that there is already data in all required parents.
    or
  - The child data already exists, in which case Copy to Measure (C2M) update processing or recalculation updates can be performed (if there is available data for all other parents).
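The datapoint sparsity count listed above can be pictured with a short Python sketch. The exact calculation PMF performs is not documented in this section, so this is only one plausible reading: the number of dimensional intersections in the full Cartesian cross product that received no data. The function name and the sample dimension members are hypothetical.

```python
from itertools import product


def sparsity_count(dimension_members, loaded_keys):
    """Count potential dimensional intersections that have no loaded data."""
    # Every combination of one member per dimension, in dimension order.
    potential = set(product(*dimension_members.values()))
    return len(potential - set(loaded_keys))


# Hypothetical dimensions and loaded intersections.
dimension_members = {
    "Region": ["EAST", "WEST"],
    "Period": ["2012/01", "2012/02"],
}
loaded_keys = {("EAST", "2012/01"), ("WEST", "2012/01")}

missing = sparsity_count(dimension_members, loaded_keys)
```

Here, four intersections are possible and two were loaded, so two intersections count toward sparsity.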