Skip to Content

Introduction

This document will explain address cleansing using custom cleansing package which include cleansing of CITY, COUNTRY and PINCODE

Custom Cleansing Package :

In custom cleansing package , developer can parse the data  &  define static standard forms as well as Regular Expressions.

Scenario example

Suppose, User have address list of customers  from  legacy database, and that address list is like –

Street

Location

1 1 113 Germantown Ave

PHILADELPHIAS US 19115

Under Location column we have combination of “CITY COUNTRY and PINCODE”. Now User has a requirement that in target database Location should be broken into 3 parts / columns and data should be stored like:

Street

City

Country

Postal Code

1 1 113 Germantown Ave

PHILADELPHIA

US

19115

Also in target database User want to have standardization of data like:

1. Postal Code: Consider only those postal codes which are 5 digits long.

2. City & Country grouping: There should be one single convention for country & city. For example in legacy data, user has “US “and “United States”. But in target, User just wants to see data in one convention i.e. United States .

Therefore, conversion of non stand forms into standard form.

Cleansing Package Builder: Data cleansing can be done through information steward under Cleansing Package Builder tab, as shown in below screen shot:

/wp-content/uploads/2013/04/is_200840.jpg

Screen shots:

  1. Log in  Information Steward and click Cleansing Package Builder.
  2. Under Cleansing package builder , click on New cleansing Package drop down  and  Custom Cleansing Package as shown below :

/wp-content/uploads/2013/04/is_200840.jpg

3.  Create Custom cleansing Package: Give the desired name to cleansing package. Also a sample file is also required on which will include sample data of source system .

Browse the file as shown below:

/wp-content/uploads/2013/04/is_200840.jpg

4.  Select the sample file from the local system and click Open

Sample file data will be available while creating CP and standardization.

/wp-content/uploads/2013/04/is_200840.jpg

5. Select the column which you want to parse and standardize. In this case we have “LOCATION ” which holds the list of (CITY, COUNTRY and PIN CODE)

/wp-content/uploads/2013/04/is_200840.jpg

6. Select all rows to analyze while building CP and click NEXT

/wp-content/uploads/2013/04/is_200840.jpg


7. Select the way by which you want to parse the data. In our scenario , we want to parse data with white spaces

/wp-content/uploads/2013/04/is_200840.jpg

Click NEXT


8. Create a Category under which you will define attributes like CITY, COUNTRY, etc.

/wp-content/uploads/2013/04/is_200840.jpg

Click FINISH.

9. Custom cleansing package is created as shown. But we still need to standardize the data

/wp-content/uploads/2013/04/is_200840.jpg

10. Once you double click the CP_Cust , you will get below screen :

/wp-content/uploads/2013/04/is_200840.jpg


11. Add Attributes under Location. Click add to create attributes of LOCATION that is CITY COUNTRY and PINCODE


/wp-content/uploads/2013/04/is_200840.jpg

Adding COUNTRY

/wp-content/uploads/2013/04/is_200840.jpg

Adding PINCODE

/wp-content/uploads/2013/04/is_200840.jpg

13. Once you have added attributes you can see those attributes appearing under “INPUT SAMPLE RECORDS” tab.

Below screenshot shows three different columns CITY , COUNTRY and PINCODE:

/wp-content/uploads/2013/04/is_200840.jpg

Now Data needs to be filled under these three columns with the help of “Input Sample Records”.

As shown in below screen shot PHILADELPHIAS should come under CITY column, US under COUNTRY and 19115 under PINCODE

Also, US needs to be replaced with United States (Standard form)


/wp-content/uploads/2013/04/is_200840.jpg


14. Select “CITY” and add Standard forms(which user want to see in target system)

/wp-content/uploads/2013/04/is_200840.jpg

Click add to include PHILADELPHIAS as standard form as shown below:

/wp-content/uploads/2013/04/is_200840.jpg

As and when PHILADELPHIAS added under standard forms same will be reflected under Variations and more Values can be added for one standard form. Suppose in legacy system there are chances that few records are with PHILADELPHIA , So this will also come under variation.

/wp-content/uploads/2013/04/is_200840.jpg

Click add to include PHILADELPHIA under variation panel

/wp-content/uploads/2013/04/is_200840.jpg

Also in above screen shot, it can be seen that now under CITY column PHILADELPHIAS is appearing.

In similar fashion it is required to add all the CITY names (which are expected to be in Source System) under Standard Form and their respective Variations.

Once standard forms and their Variations included for CITY, data will appear under CITY column as shown below :

/wp-content/uploads/2013/04/is_200840.jpg

15. Once standard forms are included for CITY, in similar way standard forms for COUNTRY needs to added.

In the Input Sample data we have US & we want it to appear as UNITED STATES as a standard form .

Select COUNTRY and click add against Standard Forms  as shown below:

/wp-content/uploads/2013/04/is_200840.jpg

16. Include UNITED STATES under standard Forms and enter

/wp-content/uploads/2013/04/is_200840.jpg

17. Include US under Variation and click enter  as shown below:

/wp-content/uploads/2013/04/is_200840.jpg

18. Under COUNTRY tab UNITED STATES appearing as shown below :

/wp-content/uploads/2013/04/is_200840.jpg


19. Once standard forms for COUNTRY are added, we move on to cleansing of PINCODE.

PINCODE cleansing is different from CITY & COUNTRY because PINCODE is a 5 digit no. and there will be many combinations for standard forms.

So to make cleansing simple we will use regular expression.

Select  PINCODE and click on Define Context as shown below :

/wp-content/uploads/2013/04/is_200840.jpg

20. When you click Define Context you will get below screen , click add to define Regular Expression

/wp-content/uploads/2013/04/is_200840.jpg

21. Add context for PINCODE

/wp-content/uploads/2013/04/is_200840.jpg

22. Add Component

/wp-content/uploads/2013/04/is_200840.jpg


22. Give name to component as PINCODE_1 and select pattern as shown below :

/wp-content/uploads/2013/04/is_200840.jpg

Click define pattern button

23. Create regular expression and test and validate it, as shown in below screen shot:

For PIN CODE , we need 5 digit number only , therefore , regular expression will be as shown below:

/wp-content/uploads/2013/04/is_200840.jpg

Click OK

24.  Click OK to save the context

/wp-content/uploads/2013/04/is_200840.jpg

25. Once you save the context , you can see the required 5 digit PINCODE appearing under PINCODE column as shown below :

/wp-content/uploads/2013/04/is_200840.jpg


26.  After adding regular Expression for PINCODE click Cleansing Package Tasks as shown below :

/wp-content/uploads/2013/04/is_200840.jpg

27. Publishing Custom Cleansing Package: Once you have created cleansing package , it will be available to you as a local copy and if you want that it should be visible to others and also at BODS side , you need to Publish it.

Select the Cleansing Package and click Publish as shown below

/wp-content/uploads/2013/04/is_200840.jpg

28. Give description and click Publish :

/wp-content/uploads/2013/04/is_200840.jpg

/wp-content/uploads/2013/04/is_200840.jpg

29. Published Cleansing Package will appear in published panel as shown below :

/wp-content/uploads/2013/04/is_200840.jpg

To report this post you need to login first.

1 Comment

You must be Logged on to comment or reply to a post.

Leave a Reply