Former Member

How do I know the data has to be cleansed?

Hi All,

I am completely new to this and have a basic question that came up while taking some classes on DS. How do I know what kinds of errors exist in the heterogeneous data sources that I am going to stage? Say a source has 1 million records. I know I can't go through every record, and I don't think I can write SQL code either, because I don't know what kinds of errors it has. Please let me know how the flow or process actually works.

      3 Comments
      Former Member

      Hi Raghu

      You need to define what valid data is, and then use a Validation transform to validate the data.

      For example, if a customer name can be at most 30 characters and cannot contain special characters, create validation logic for those rules. Records passing through a Validation transform are divided into "pass" and "fail" records. You can create such validation rules for all columns of a table in a single Validation transform.
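The rule described above can be sketched outside Data Services to show the idea. This is only a plain Python illustration, not the Validation transform itself; the field name `customer_name`, the helper names, and the exact set of characters treated as "not special" are assumptions made for the example.

```python
import re

# Assumed rule: a valid customer name has 1 to 30 characters and contains
# only letters, digits, spaces, periods, apostrophes, or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 .'-]{1,30}")


def is_valid_name(name):
    """Return True when the name satisfies the assumed rule."""
    return isinstance(name, str) and NAME_PATTERN.fullmatch(name) is not None


def split_records(records):
    """Divide records into 'pass' and 'fail' lists based on the name rule."""
    passed, failed = [], []
    for record in records:
        if is_valid_name(record.get("customer_name")):
            passed.append(record)
        else:
            failed.append(record)
    return passed, failed


# Hypothetical sample rows, invented for illustration.
sample = [
    {"id": 1, "customer_name": "Acme Corp"},
    {"id": 2, "customer_name": "Bad#Name!"},   # contains special characters
    {"id": 3, "customer_name": "X" * 31},      # longer than 30 characters
]

passed, failed = split_records(sample)
print("pass:", [r["id"] for r in passed])
print("fail:", [r["id"] for r in failed])
```

The point is that you do not need to know the errors in advance: you state what valid data looks like, and everything that does not match falls out on the fail side.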

      Former Member

      Thank you for answering my question. As I said, I am just learning from the videos, so expect more questions. 🙂

      Former Member

      Hi Raghu,

      As Debapriya Mandal said, you can use the Validation transform and make use of both of its outputs, "Pass" and "Fail". Attach another table to the Fail output so that all failed records are inserted into that table, where you can view and analyze the errors.
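To illustrate why a separate fail table is useful, here is a rough Python sketch (again, not Data Services code) that routes failing rows to their own list, records which rules each row broke, and then counts failures per rule. The rule labels, the `age` column, and the sample rows are all hypothetical.

```python
from collections import Counter

# Hypothetical rules: each label maps to a check that returns True
# when the record is valid with respect to that rule.
RULES = {
    "name_missing": lambda r: bool(r.get("customer_name")),
    "name_too_long": lambda r: len(r.get("customer_name") or "") <= 30,
    "age_invalid": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}


def failed_rules(record):
    """Return the labels of every rule the record violates."""
    return [label for label, check in RULES.items() if not check(record)]


def route(records):
    """Send clean rows to the pass list and annotated rows to the fail list."""
    pass_rows, fail_rows = [], []
    for record in records:
        errors = failed_rules(record)
        if errors:
            fail_rows.append({**record, "errors": errors})
        else:
            pass_rows.append(record)
    return pass_rows, fail_rows


def summarize(fail_rows):
    """Count how many failed rows violated each rule."""
    return Counter(label for row in fail_rows for label in row["errors"])


# Hypothetical sample rows, invented for illustration.
rows = [
    {"id": 1, "customer_name": "Asha", "age": 34},
    {"id": 2, "customer_name": "", "age": 29},
    {"id": 3, "customer_name": "B" * 40, "age": 200},
    {"id": 4, "customer_name": "Ravi", "age": None},
]

pass_rows, fail_rows = route(rows)
summary = summarize(fail_rows)
for label, count in summary.most_common():
    print(label, count)
```

With a million records, a per-rule summary of the fail table like this tells you which kinds of errors dominate, so you can decide what cleansing is worth doing first.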