Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
cancel
Showing results for 
Search instead for 
Did you mean: 
VijayetaSharma
Active Contributor

Introduction

This document will explain address cleansing using custom cleansing package which include cleansing of CITY, COUNTRY and PINCODE

Custom Cleansing Package :

In custom cleansing package , developer can parse the data  &  define static standard forms as well as Regular Expressions.

Scenario example

Suppose, User have address list of customers  from  legacy database, and that address list is like –

Street

Location

1 1 113 Germantown Ave

PHILADELPHIAS US 19115

Under Location column we have combination of “CITY COUNTRY and PINCODE”. Now User has a requirement that in target database Location should be broken into 3 parts / columns and data should be stored like:

Street

City

Country

Postal Code

1 1 113 Germantown Ave

PHILADELPHIA

US

19115

Also in target database User want to have standardization of data like:

1. Postal Code: Consider only those postal codes which are 5 digits long.

2. City & Country grouping: There should be one single convention for country & city. For example in legacy data, user has “US “and “United States”. But in target, User just wants to see data in one convention i.e. United States .

Therefore, conversion of non stand forms into standard form.

Cleansing Package Builder: Data cleansing can be done through information steward under Cleansing Package Builder tab, as shown in below screen shot:

Screen shots:

  1. Log in  Information Steward and click Cleansing Package Builder.
  2. Under Cleansing package builder , click on New cleansing Package drop down  and  Custom Cleansing Package as shown below :

3.  Create Custom cleansing Package: Give the desired name to cleansing package. Also a sample file is also required on which will include sample data of source system .

Browse the file as shown below:


4.  Select the sample file from the local system and click Open

Sample file data will be available while creating CP and standardization.

5. Select the column which you want to parse and standardize. In this case we have “LOCATION ” which holds the list of (CITY, COUNTRY and PIN CODE)


6. Select all rows to analyze while building CP and click NEXT


7. Select the way by which you want to parse the data. In our scenario , we want to parse data with white spaces

Click NEXT


8. Create a Category under which you will define attributes like CITY, COUNTRY, etc.

Click FINISH.

9. Custom cleansing package is created as shown. But we still need to standardize the data


10. Once you double click the CP_Cust , you will get below screen :


11. Add Attributes under Location. Click add to create attributes of LOCATION that is CITY COUNTRY and PINCODE


Adding COUNTRY

Adding PINCODE

13. Once you have added attributes you can see those attributes appearing under “INPUT SAMPLE RECORDS” tab.

Below screenshot shows three different columns CITY , COUNTRY and PINCODE:

Now Data needs to be filled under these three columns with the help of “Input Sample Records”.

As shown in below screen shot PHILADELPHIAS should come under CITY column, US under COUNTRY and 19115 under PINCODE

Also, US needs to be replaced with United States (Standard form)



14. Select “CITY” and add Standard forms(which user want to see in target system)

Click add to include PHILADELPHIAS as standard form as shown below:


As and when PHILADELPHIAS added under standard forms same will be reflected under Variations and more Values can be added for one standard form. Suppose in legacy system there are chances that few records are with PHILADELPHIA , So this will also come under variation.

Click add to include PHILADELPHIA under variation panel

Also in above screen shot, it can be seen that now under CITY column PHILADELPHIAS is appearing.

In similar fashion it is required to add all the CITY names (which are expected to be in Source System) under Standard Form and their respective Variations.

Once standard forms and their Variations included for CITY, data will appear under CITY column as shown below :

15. Once standard forms are included for CITY, in similar way standard forms for COUNTRY needs to added.

In the Input Sample data we have US & we want it to appear as UNITED STATES as a standard form .

Select COUNTRY and click add against Standard Forms  as shown below:

16. Include UNITED STATES under standard Forms and enter

17. Include US under Variation and click enter  as shown below:

18. Under COUNTRY tab UNITED STATES appearing as shown below :



19. Once standard forms for COUNTRY are added, we move on to cleansing of PINCODE.

PINCODE cleansing is different from CITY & COUNTRY because PINCODE is a 5 digit no. and there will be many combinations for standard forms.

So to make cleansing simple we will use regular expression.

Select  PINCODE and click on Define Context as shown below :

20. When you click Define Context you will get below screen , click add to define Regular Expression

21. Add context for PINCODE

22. Add Component


22. Give name to component as PINCODE_1 and select pattern as shown below :

Click define pattern button

23. Create regular expression and test and validate it, as shown in below screen shot:

For PIN CODE , we need 5 digit number only , therefore , regular expression will be as shown below:

Click OK

24.  Click OK to save the context


25. Once you save the context , you can see the required 5 digit PINCODE appearing under PINCODE column as shown below :


26.  After adding regular Expression for PINCODE click Cleansing Package Tasks as shown below :

27. Publishing Custom Cleansing Package: Once you have created cleansing package , it will be available to you as a local copy and if you want that it should be visible to others and also at BODS side , you need to Publish it.

Select the Cleansing Package and click Publish as shown below

28. Give description and click Publish :

29. Published Cleansing Package will appear in published panel as shown below :


1 Comment
Labels in this area