Skip to Content
Technical Articles

SEARCH AND REPLACE using Regular Expression in Data Service 4.X , User_Defined Part-2

In my Previous post( I explained How to useĀ  User_defined Transform.

Here I am going to explain the Code which i used to Remove special Characters from input field

Follow up to step 5 (Create INPUT/OUTPUT fields.)of previous post and then

*. CODE:-

1. import package re(Regular Expression)

2. define Local variable

3. Fetch Input field value in Local Variable.


re.sub:- re package name, sub function name used for search_replace .

Ex. re.sub(“\W”,””,dct[u’SALARY])

“\W” :- Regular Expression indicates characters except Alpha Numeric [^0-9a-zA-Z_]

so in this example we are replacing all Non alphanumeric character with BLANK””.

5. Set Output field with modified value of local variable.

In this Example I replaced special character (character except [0-9a-zA-Z_]) using \W with BLANK.

Similarly we can use following Pattern and can replaced with our desired value …

Eg:- To Replace White Space with Underscore( _ ) , Expression will like this:- re.sub(“\s”,”_”,COLUMN NAME)

And you can use any range itself in expression like to replace Alphabet with 0 :-re.sub(“[a-zA-Z”,”0″,COLUMN NAME)


I would like to share more function of Regular expression using Python in further Posts.

You must be Logged on to comment or reply to a post.
  • Kamal thank you for your work, please don’t stop.

    DS scripts very poor and limited.

    Python is DS – as enhancement platform (or tool) – looks very pretty.

  • Other than to show a simple example of Python, I don’t understand why you would do this in Python, you can accomplish the same thing in DS scripts if desired (regex is supported by match_regex).

    Mikhail, while I laud your passion for Python, Data Services scripting is quite verbose and can accomplish nearly every task I have attempted with it. I have even written a scheduler inside of DS script that had considerably more functionality than the base OS interfaced schedulers. I have done base conversions, bitwise operations and other complex logical tasks. Given that it’s primary task is for ETL, how do you find it lacking?

    Note that I am not against Python and I do encourage support of it for set based operations, the one limitation I find in script functionality for DS .

    • Ernie – Data Services scripting  very poor, by examply – does not support foreaches and so on.

      Data Services scripting – it’s for simple transformation scenarios. Python – adds flexablity.

      • Mikhail,

        I apologize, I am not sure understand what you mean by foreaches? Are you talking about cursor functionality? If so, then I propose that you are not using the tool correctly, scripts are not meant for set based operations in Data Services. That is what the dataflow transforms are for. Can you give me an example of a task (not an internal operation like a cursor) that you are able to accomplish in Python that you cannot in Data Services transforms? I am open to changing my mind, I just don’t see your use case.

        – Ernie

          • So, in DS script it would be a while loop? The concept of an array does not exist, that type of processing should be done in a table with a data flow.

            – E

          • I don’t use scripts? 100 million records is not that much, should run through normal data flows in 2-3 hours, depending on how much manipulation you need to do. No need for cursors or array processing. Data Services is designed for SET based processing, not walking an array of data. If you use the tool as it was designed, it works much better. Don’t take my word for it. Ask Werner Daehn or Manoj or probably anyone on the DS design team.

            If you are insisting that every ETL should have cursor capability we will simply have to agree to disagree on that point.

            – Ernie