Kamal Kumar

SEARCH AND REPLACE using Regular Expressions in Data Services 4.x, User-Defined Part 2

In my previous post I explained how to use the User-Defined transform.

Here I am going to explain the code which I used to remove special characters from an input field.

Follow through step 5 (Create INPUT/OUTPUT fields) of the previous post, and then:

*. CODE:-

1. Import package re (regular expressions).

2. Define a local variable.

3. Fetch the input field value into the local variable.

4. Apply re.sub to the local variable. re is the package name; sub is the function used for search_replace.

Ex. re.sub(u"\W", u"", dct[u'SALARY'])

"\W" is a regular expression that matches any character except the alphanumerics and underscore, i.e. [^0-9a-zA-Z_], so in this example we are replacing all non-alphanumeric characters with a BLANK "".

5. Set the output field to the modified value of the local variable.
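The five steps above can be sketched as plain Python. This is a minimal standalone sketch, assuming a simple dict named dct stands in for the transform's input/output record (the actual field-access API inside Data Services may differ):

```python
import re  # step 1: import the regular-expression package

dct = {u'SALARY': u'$1,250.00'}  # sample input record, made up for illustration

# steps 2-3: define a local variable and fetch the input field value into it
local_var = dct[u'SALARY']

# step 4: replace every non-[0-9a-zA-Z_] character with a blank
local_var = re.sub(r'\W', u'', local_var)

# step 5: set the output field to the modified value
dct[u'SALARY_CLEAN'] = local_var

print(dct[u'SALARY_CLEAN'])  # 125000
```

Note the raw-string `r'\W'`: in current Python, `"\W"` in a plain string triggers an invalid-escape warning, so raw strings are the idiomatic way to write regex patterns.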

In this example I replaced special characters (any character except [0-9a-zA-Z_]) with a BLANK, using \W.

Similarly, we can use the following patterns and replace matches with any desired value.

Eg:- To replace whitespace with an underscore ( _ ), the expression will look like this: re.sub(u"\s", u"_", COLUMN_NAME)

And you can use any character range in the expression, e.g. to replace alphabet characters with 0: re.sub(u"[a-zA-Z]", u"0", COLUMN_NAME)
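For a quick check of the three patterns above in plain Python (here `sample` is just an illustrative string, not a real DS column):

```python
import re

sample = u'Net Pay: 1,250'  # made-up column value for illustration

print(re.sub(r'\W', u'', sample))         # \W        -> NetPay1250
print(re.sub(r'\s', u'_', sample))        # \s        -> Net_Pay:_1,250
print(re.sub(u'[a-zA-Z]', u'0', sample))  # [a-zA-Z]  -> 000 000: 1,250
```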


I would like to share more regular expression functions in Python in further posts.

      Former Member

      Kamal thank you for your work, please don't stop.

      DS scripts are very poor and limited.

      Python in DS, as an enhancement platform (or tool), looks very pretty.

      Former Member

      Other than to show a simple example of Python, I don't understand why you would do this in Python, you can accomplish the same thing in DS scripts if desired (regex is supported by match_regex).

      Mikhail, while I laud your passion for Python, Data Services scripting is quite verbose and can accomplish nearly every task I have attempted with it. I have even written a scheduler inside of DS script that had considerably more functionality than the base OS-interfaced schedulers. I have done base conversions, bitwise operations and other complex logical tasks. Given that its primary task is ETL, how do you find it lacking?

      Note that I am not against Python, and I do encourage support of it for set-based operations, the one limitation I find in script functionality for DS.

      Former Member

      Ernie - Data Services scripting is very poor; for example, it does not support foreach loops, and so on.

      Data Services scripting is for simple transformation scenarios. Python adds flexibility.

      Former Member


      I apologize, I am not sure I understand what you mean by foreaches? Are you talking about cursor functionality? If so, then I propose that you are not using the tool correctly; scripts are not meant for set-based operations in Data Services. That is what the dataflow transforms are for. Can you give me an example of a task (not an internal operation like a cursor) that you are able to accomplish in Python that you cannot in Data Services transforms? I am open to changing my mind, I just don't see your use case.

      - Ernie

      kamal kumar (Blog Post Author)


      foreach is a type of loop: it processes an array element by element until the end of the array.
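      To make the terminology concrete, here is a small plain-Python illustration (the data is made up): a foreach-style loop versus the index-managed while loop it replaces, which is roughly what the thread says DS script would require:

```python
values = [u'a-1', u'b 2', u'c,3']  # made-up sample array

# foreach style: visit each element until the end of the array,
# with no index bookkeeping
upper_foreach = []
for v in values:
    upper_foreach.append(v.upper())

# equivalent while-loop style, managing the index by hand
upper_while = []
i = 0
while i < len(values):
    upper_while.append(values[i].upper())
    i += 1

print(upper_foreach == upper_while)  # True
```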


      Former Member

      So, in DS script it would be a while loop? The concept of an array does not exist; that type of processing should be done in a table with a data flow.

      - E

      Former Member

      A simple scenario:

      You have a SQL query script that returns 100 million records. Execution time: 4 days.

      How can you reduce the execution time?

      Former Member

      I wouldn't use scripts. 100 million records is not that much; it should run through normal data flows in 2-3 hours, depending on how much manipulation you need to do. No need for cursors or array processing. Data Services is designed for SET-based processing, not walking an array of data. If you use the tool as it was designed, it works much better. Don't take my word for it: ask Werner Daehn or Manoj or probably anyone on the DS design team.

      If you are insisting that every ETL tool should have cursor capability, we will simply have to agree to disagree on that point.

      - Ernie

      Former Member

      OK, this is off-topic for this blog.

      I will create a separate discussion for this topic.