Using JOIN to Clean Multiple Addresses

Using JOIN allows cleaning multiple addresses contained in any data source for which you can create a synonym.


Top of page

Example: Cleaning a Table of Addresses

This example shows how to clean a table that contains address data by using a JOIN to pass parameters to AddressDoctor.

  1. Create a data flow. Right-click on an application directory and select New, and then click Flow.
  2. Add the source table. Right-click the left hand side of the Data Flow and click Add Source. In the Select Synonym window Look in drop-down menu, select ibisamp. Scroll down and select persinfo.
  3. Add the AddressDoctor synonym as a source. Expand the application directory where you created the AddressDoctor synonym, and drag the synonym to the left side of the data flow.
  4. A Join object is automatically added. If not in, go to Tools, Options, Data Flow, and select Add Join Object. Then delete the synonym and add it again.

    Flow

  5. Double-click the Join object. The Join Editor opens.
  6. Under Right Source, right-click Name and click Segment and Field View.
  7. Under Left Source, click PIN, and under Right Source, click KEY1. Click to add the fields to Join Conditions. Note that this field is not cleansed by AddressDoctor, it is used for identification.
  8. Repeat for each of the following pairs of fields:

    T1.STREETNO

    T2.STREET1

    T1.APT

    T2.STREET2

    T1.STATE

    T2.PROVINCE1

    T1.POSTALCODE

    T2.POSTALCODE1

    T1.CITY

    T2.LOCALITY1

    T1.COUNTRY

    T2.COUNTRY1

  9. In the Join Type drop-down menu, select Left Outer Join. When this is complete and your window resembles the following image, click OK.

  10. Right-click the SQL object and click WHERE Filter.
  11. The Where Filter Calculator opens. Enter this condition, then press Ok.
    T1.COUNTRY = 'USA'
  12. Right-click the right side of the Data Flow, select Add Target, and then click New.
  13. A new target object is added to the Data Flow. Right-click on the object and select Properties and change the database, synonym name, and other options as desired.
  14. Click the Run button and wait for the flow to complete.
  15. Right-click the target object and click Toggle. Click the Sample Data tab to see the results, as shown in the following image.

    Note that while some of the addresses were corrected (PROCESSSTATUS of C4), others could not be (PROCESSSTATUS of I2 and I4). From this, we can conclude that this file has a lot of bad addresses.


iWay Software