Using JOIN to Clean Multiple Addresses
Using JOIN allows cleaning multiple addresses contained
in any data source for which you can create a synonym.
Example: Cleaning a Table of Addresses
This
example shows how to clean a table that contains address data by
using a JOIN to pass parameters to AddressDoctor.
- Create a data flow. Right-click on an application directory
and select New, Flow.
- Add the source table. Right-click on the left hand side of the
Data Flow and select Add Source. In the Select
Synonym window Look in pull-down, select ibisamp.
Scroll down and select persinfo.
- Add the AddressDoctor synonym as a source. Expand the application
directory where you created the AddressDoctor synonym, drag the
synonym to the left side of the data flow.
- A Join object is automatically added. (If not in Tools, Options,
Data Flow, check Add Join Object. Then delete the synonym and add
it again).
- Double click on the Join object. The Join Editor opens.
- Under Right Source, right-click on Name and
select Segment and Field View.
- Under Left Source, click on PIN, and
under Right Source, click on KEY1. Click
on to add the fields to Join Conditions.
Note that this field is not cleansed by AddressDoctor, it is used
for identification.
- Repeat for each of the following pairs of fields:
T1.STREETNO
|
T2.STREET1
|
T1.APT
|
T2.STREET2
|
T1.STATE
|
T2.PROVINCE1
|
T1.POSTALCODE
|
T2.POSTALCODE1
|
T1.CITY
|
T2.LOCALITY1
|
T1.COUNTRY
|
T2.COUNTRY1
|
- In the Join Type pull-down, select Left Outer Join.
When you are complete and window looks like this, click OK.
- Right-click on the SQL object and select WHERE Filter.
- The Where Filter Calculator opens. Enter this condition, then
press Ok.
T1.COUNTRY = 'USA'
- Right-click on the right side of the Data Flow, select Add Target, New.
- A new target object is added to the Data Flow. Right-click on
the object and select Properties and change
the database, synonym name, and other options as desired.
- Click on the run button and wait for the flow to complete.
- Right-click on the target object and select Toggle.
Click on the Sample Data tab to see the results.
Note that some of
the addresses were corrected (PROCESSSTATUS of C4) others could
not be (PROCESSSTATUS of I2 and I4). From this, we can conclude
that this file has a lot of bad addresses.