Intro

In my post SAP Data Services 4.2 SP4 New Features I highlighted some of the new functionality introduced in Data Services 4.2 SP04. One of the new features was the ability for Data Services to mask data. This blog focuses on how you can mask data.

The Data Mask transform enables you to protect personally identifiable information in your data. Personal information includes data such as credit card numbers, salary information, birth dates, personal identification numbers, or bank account numbers. You may want to use data masking to support security and privacy policies, and to protect your customer or employee information from possible theft or exploitation.

Data Mask Example

Here is an example of a very basic data flow that uses the Data Mask transform.

DataMask.jpg

Then, on the Input tab, we indicate the fields to mask.

Mask_Input.jpg

Then fill in the Options tab. For every field you want to mask you must duplicate the mask section, and then assign a field to each mask out section.

Mask_Options.jpg

Then, on the Output tab, choose the fields you want to output.

Mask_output.jpg

Here is a sample of the masking, viewed with the View Design-Time Data button.

Mask_Sample.jpg

Conclusion

The masking option is a nice, very welcome addition. It does, however, feel a bit incomplete. In South Africa we often mask credit card numbers, but we typically mask only the middle part of the number. Currently this transform only lets you specify where the masking should start, and it then masks all remaining characters. I would like to see an option to stipulate both where to start and how many characters to mask.
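
To make the wish concrete, here is a minimal Python sketch of both behaviours; the function names are my own illustration, not Data Services syntax:

```python
def mask_from(value, start, mask_char="X"):
    """Current behaviour: mask everything from position `start` onward."""
    return value[:start] + mask_char * (len(value) - start)

def mask_range(value, start, count, mask_char="X"):
    """Wished-for behaviour: mask `count` characters from `start`, leave the rest."""
    return value[:start] + mask_char * count + value[start + count:]

card = "4111222233334444"
print(mask_from(card, 6))      # 411122XXXXXXXXXX
print(mask_range(card, 6, 6))  # 411122XXXXXX4444
```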


I also noticed that when field names contain spaces you get a few errors saying the mapping is not complete. Once I removed the spaces from the field names, the errors went away.


Hope the above helps.


Thanks.




14 Comments


  1. Lynne Lintelman

    Hi Louis,

    First thank you for the great write up on our new Data Mask transform.

    Your wish has been granted regarding the ability to mask out the middle portion of credit card numbers. In the Data Services 4.2 SP5 release we have enhanced the Data Mask transform to allow users to mask data that follows a specific pattern, such as credit card numbers, personal identification numbers, bank accounts and so on. Users will have the ability to mask out the entire pattern or specific portions of it.

    New Pattern Variance Options, available in Data Services 4.2 SP5:

    Pattern_Options.png

    Pattern_Examples.png
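
    Outside the designer, the idea of masking only chosen portions of a pattern can be sketched in Python with a regular expression; this is an illustration only, not the actual SP5 option syntax:

```python
import re

def mask_pattern(text, pattern, mask_groups, mask_char="*"):
    """Mask only the chosen regex groups of each match, leaving the others open."""
    def repl(m):
        parts = []
        for i in range(1, m.re.groups + 1):
            g = m.group(i)
            parts.append(mask_char * len(g) if i in mask_groups else g)
        return "".join(parts)
    return re.sub(pattern, repl, text)

# Mask only the middle eight digits of a formatted card number (groups 2 and 4).
card_pattern = r"(\d{4}-)(\d{4})(-)(\d{4})(-\d{4})"
print(mask_pattern("4111-2222-3333-4444", card_pattern, {2, 4}))
# 4111-****-****-4444
```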

    As you mention, the Data Mask transform is a new transform that was delivered in Data Services 4.2 SP3, and we continue to enhance its functionality. Please feel free to contact me, via this medium, if you have additional enhancements you would like to see.

    Thank you,

    Lynne

  2. Kolli Srinivas

    Hi Louis,

    Thanks for presenting a nice article.

    My concern here is that DS already has the built-in functions encrypt_aes() and decrypt_aes() to mask or safeguard data within the DS staging level.

    How will this new transform be different?

    1. Lynne Lintelman

      Hi Kolli,

      This transform is different in lots of ways. The Data Mask transform allows you to mask sensitive data while keeping the data relevant, so it can be used by other systems, and it allows you to preserve referential integrity. I’ve added some examples below of what the Data Mask transform can do for you:

      Preserving relationships:
      In a normalized relational database we quite frequently rely on the data itself to be a key of the table; for example, a social security number is unique and could be the primary key of its table. In this case references to the SS# in other tables (foreign keys) must match the values of the primary key, otherwise it will be impossible to relate the tables. For example, one table may contain SS# and names, another may contain SS# and purchase history, and the tables need to be joined on SS#.
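
      A minimal Python sketch of this idea, assuming a keyed hash (my own illustration, not the transform's actual algorithm): a deterministic mapping yields the same masked value in every table, so joins still work.

```python
import hashlib

def mask_ssn(ssn, secret="demo-key"):
    """Deterministically map an SS# to a stable 9-digit surrogate.
    The same input always yields the same output, so foreign keys still join;
    `secret` is an illustrative salt, not a Data Services option."""
    digest = hashlib.sha256((secret + ssn).encode()).hexdigest()
    return str(int(digest, 16) % 10**9).zfill(9)

# The names table and the purchase-history table mask to the same key.
print(mask_ssn("123-45-6789") == mask_ssn("123-45-6789"))  # True
```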

      Preserving the shape of the data set:
      Blanking out sensitive data also makes it hard to do analysis on it downstream. For example, changing all birthdates to 1/1/1900 hides the information but makes demographic analysis impossible; in this particular case it may be better to scramble the day and month while retaining the year. Similarly, though we may de-identify zip codes, we want the number of rows mapping to each obfuscated zip code to remain the same, so we know how the data is distributed without knowing the actual data.
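
      A rough Python illustration of scrambling the day and month while keeping the year (my own sketch, not the transform's implementation):

```python
import datetime
import random

def scramble_birthdate(d):
    """Randomize day and month but keep the year, so analysis by birth year survives."""
    rng = random.Random()
    # Day is capped at 28 so the result is valid in every month.
    return datetime.date(d.year, rng.randint(1, 12), rng.randint(1, 28))

masked = scramble_birthdate(datetime.date(1975, 6, 14))
print(masked.year)  # 1975
```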

      Keeping the data sensible:
      People consuming the data downstream will find it hard to deal with nonsensical data. For example, if we convert a name like ‘John W. Duncan’ to ‘@#R%amkGG87%%’ it is jarring visually and also creates problems for other programs operating on the data downstream (for example, programs asserting that a name should be made up of letters). It would be better to convert ‘John W. Duncan’ to ‘Scott T. Smith’ instead. This also holds for address information: ‘TX’ should be scrambled to something like ‘IL’, not ‘ZZ’. Care should also be taken that we don’t inadvertently provide wrong data: it would not do if we change a social security number and the new number identifies a different but valid person! Another point to keep in mind is the range of the de-identified value. For example, if we scramble the salary information of employees, the new value should fall within a reasonable range, either an absolute one (between 10K and 1000K) or a percentage (+/- 50% of the original value).
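
      For the salary case, a sketch of a range-bounded scramble (illustrative only):

```python
import random

def scramble_salary(salary, pct=0.5):
    """Replace a salary with a random value within +/- pct of the original."""
    return round(random.uniform(salary * (1 - pct), salary * (1 + pct)), 2)

masked = scramble_salary(80000)
print(40000 <= masked <= 120000)  # True
```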

      Preserving part of the data:
      Quite often only part of the data is sensitive. For example, in a credit card number the first 6 digits identify the country and issuing bank and can remain open (this is useful information), but the rest of the digits should be hidden. Similarly, in an e-mail address we may want to hide just the username part while leaving the domain open.

      Preserving the format:
      A simpler requirement (than the sensible-data need above) is to at least maintain the format of the data. For example, an e-mail address should map from lynne@sap.com to asdfa@xx_email.com rather than ‘elrjnmkmjnfer##’; the same goes for social security numbers.
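
      A simple format-preserving replacement can be sketched like this (again, my own illustration rather than the transform's method):

```python
import random
import string

def mask_keep_format(value):
    """Swap letters for random letters and digits for random digits,
    leaving punctuation (@, ., -) in place so the shape is preserved."""
    rng = random.Random()
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)

print(mask_keep_format("lynne@sap.com"))  # e.g. qzkpd@wau.rme
```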

      Sufficient de-identification:
      It is important that information can’t be reconstituted by linking the scrambled information with other sources that are not scrambled. For example, if in the Salary table we obfuscate the names of people but leave their city of residence open, and there is another table where the employees and their addresses are listed, we can infer information in the scrambled data set by cross-referencing. Say we have Name: John Smith, Sal: $9,000,000, City: Scotts Valley, and in another table we have Name: John Schwarz, City: Scotts Valley, and there is only one employee in Scotts Valley; ergo, we know how much John makes. So it is important that combinations of multiple fields be considered when creating the anonymization. This also matters when there are a limited number of source values: for example, if you want to hide the gender of a person there is no point mapping M to X and F to Y, as it is easily deducible.
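
      The linkage risk can be checked mechanically by counting how many rows share each combination of open fields; here is a small Python sketch (the threshold k and the data are illustrative):

```python
from collections import Counter

def small_groups(rows, quasi_keys, k=2):
    """Return quasi-identifier combinations shared by fewer than k rows;
    such groups can be re-identified by cross-referencing other tables."""
    counts = Counter(tuple(row[key] for key in quasi_keys) for row in rows)
    return [combo for combo, n in counts.items() if n < k]

rows = [
    {"name": "a9x3", "city": "Scotts Valley", "salary": 9000000},
    {"name": "k2p7", "city": "San Jose", "salary": 85000},
    {"name": "m4q1", "city": "San Jose", "salary": 90000},
]
# The lone Scotts Valley row stays identifiable even with the name masked.
print(small_groups(rows, ["city"]))  # [('Scotts Valley',)]
```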

      I hope this helps.

      Lynne

  3. Salam Abdelkhader

    Hi Louis,

    How can I use the above transform to mask more than 6 numeric fields, given that it only provides 6 numeric fields?

    Is there any way to add new numeric fields to the same transform, or do I need to use more than one transform to do the masking in multiple levels?

    1. Lynne Lintelman

      Hi Salam,

      An option would be to have two Data Mask transforms in your data flow. The first one could mask the first 6 numeric fields and the second transform could mask numeric fields 7 to 12.

      I hope that helps answer your question.

      Thanks,

      Lynne

      1. Salam Abdelkhader

        Thank you lynne,

        It seems this is the only solution; I already use it. I hope SAP enhances the transform so that we can add new fields.

        Thanks

        Salam

