Unstructured Data Processing through Data Services:

  • Text Data Processing takes unstructured textual data and turns it into something you can analyze and act on.
  • It helps you deal with information overload by mining very large corpora of text and making sense of them without having to read every sentence.
  • This article covers Text Data Processing using SAP BusinessObjects Data Services, with the intention of doing Text Analytics.
  • The Entity Extraction transform, available as part of Text Data Processing in Data Services, helps to extract entities, entity relationships and facts from unstructured data for downstream analytics.

Case study:

  • Vendor information was held in text and email files that mention European countries and cities.
  • Based on each country name, the country code is identified and matched against the LAND1 field of the Vendor Master table LFA1 (LFA1.LAND1).

/wp-content/uploads/2013/04/unstrc_data1_203842.jpg

File Format:

  • Create a new File Format with the File type set to “Unstructured text”.
  • Enter the File name(s) to process. A wildcard such as *.* can also be used in this field.

/wp-content/uploads/2013/04/unstrc_data2_203824.jpg
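
As a rough illustration of what a wildcard file name selects, here is a minimal Python sketch using shell-style matching. It is not how Data Services resolves file names internally; the function name `select_files` and the sample file names are hypothetical and only show the idea of a pattern picking up several source files at once.

```python
from fnmatch import fnmatch


def select_files(names, pattern="*.*"):
    """Return the names that match a shell-style wildcard pattern."""
    return [n for n in names if fnmatch(n, pattern)]


# Hypothetical directory listing, for illustration only.
names = ["vendors_uk.txt", "vendor_mail.eml", "README"]

selected = select_files(names)           # every name that contains a dot
txt_only = select_files(names, "*.txt")  # only the .txt files
```

With shell-style matching, a pattern of *.* skips names that have no extension, so a narrower pattern such as *.txt is often the safer choice when a folder holds mixed content.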

Data Flow:

  • Next, in the data flow, place a Base_EntityExtraction transform of Data Services after the unstructured file format, and link the file format to the transform.

/wp-content/uploads/2013/04/unstrc_data3_203832.jpg

Base Entity Transform:

  • This transform provides a user-friendly GUI with three tabs: Input, Options and Output. It accepts textual content such as plain text, HTML, or XML.

/wp-content/uploads/2013/04/unstrc_data4_203833.jpg

  • On the Options tab, set the Language value as appropriate (English in my case). Leave the rest of the options as they are.
  • On the Output tab, select the fields of interest. Best practice is to work with only the fields STANDARD_FORM and TYPE. By default the output schema of the transform generates a maximum of 11 fields, and any of the others can be selected as well if needed.

/wp-content/uploads/2013/04/unstrc_data5_203837.jpg
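
To make the role of the two output fields concrete, here is a small Python sketch that treats the transform output as a list of rows and keeps only STANDARD_FORM and TYPE for the entity types of interest. The sample rows, the `EXTRA` field and the type values `COUNTRY`, `LOCALITY` and `PERSON` are assumptions for illustration; check the actual values in your own output table.

```python
def project_entities(rows, keep_types):
    """Keep only STANDARD_FORM and TYPE for rows whose TYPE is wanted."""
    return [
        {"STANDARD_FORM": r["STANDARD_FORM"], "TYPE": r["TYPE"]}
        for r in rows
        if r["TYPE"] in keep_types
    ]


# Hypothetical extraction output; EXTRA stands in for the other fields.
rows = [
    {"STANDARD_FORM": "Germany", "TYPE": "COUNTRY", "EXTRA": 1},
    {"STANDARD_FORM": "Berlin", "TYPE": "LOCALITY", "EXTRA": 2},
    {"STANDARD_FORM": "John Smith", "TYPE": "PERSON", "EXTRA": 3},
]

places = project_entities(rows, {"COUNTRY", "LOCALITY"})
```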

Text Analysis – Results:

  • Next, add a Template Table as the target. Run the job and let's check the meaningful text data extracted by the transform.

/wp-content/uploads/2013/04/unstrc_data6_203838.jpg

  • A little analysis of the extracted data also shows where the vendors are located, according to the text.

/wp-content/uploads/2013/04/unstrc_data7_203839.jpg
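
This kind of analysis can be sketched outside Data Services as a simple frequency count over the extracted rows. The Python example below is only an illustration under assumptions: the rows are made up, and the type value `COUNTRY` is assumed rather than taken from the job output.

```python
from collections import Counter


def count_places(rows, entity_type):
    """Count how often each STANDARD_FORM occurs for one entity TYPE."""
    return Counter(
        r["STANDARD_FORM"] for r in rows if r["TYPE"] == entity_type
    )


# Hypothetical extracted entities, for illustration only.
rows = [
    {"STANDARD_FORM": "France", "TYPE": "COUNTRY"},
    {"STANDARD_FORM": "Paris", "TYPE": "LOCALITY"},
    {"STANDARD_FORM": "France", "TYPE": "COUNTRY"},
    {"STANDARD_FORM": "Germany", "TYPE": "COUNTRY"},
]

country_counts = count_places(rows, "COUNTRY")
top_country = country_counts.most_common(1)
```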

  • To make the result more meaningful, the next data flow identifies the country codes and matches them with the LFA1 table.

/wp-content/uploads/2013/04/unstrc_data8_203840.jpg

/wp-content/uploads/2013/04/unstrc_data9_203841.jpg
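
The lookup and match step can be sketched in Python as follows. This is a minimal illustration, not the data flow from the screenshots: the `COUNTRY_CODES` lookup, the sample entities and the vendor rows are all hypothetical, and only the field names LIFNR and LAND1 follow the LFA1 table.

```python
# Hypothetical lookup from country name to two-letter country code.
COUNTRY_CODES = {"germany": "DE", "france": "FR", "united kingdom": "GB"}


def country_code(name):
    """Return the code for a country name, or None if it is unknown."""
    return COUNTRY_CODES.get(name.strip().lower())


def match_vendors(entities, vendors):
    """Return vendor rows whose LAND1 matches an extracted country."""
    codes = {
        country_code(e["STANDARD_FORM"])
        for e in entities
        if e["TYPE"] == "COUNTRY"  # assumed type value
    }
    codes.discard(None)  # drop countries missing from the lookup
    return [v for v in vendors if v["LAND1"] in codes]


# Made-up sample data, for illustration only.
entities = [
    {"STANDARD_FORM": "Germany", "TYPE": "COUNTRY"},
    {"STANDARD_FORM": "Atlantis", "TYPE": "COUNTRY"},
    {"STANDARD_FORM": "Paris", "TYPE": "LOCALITY"},
]
vendors = [
    {"LIFNR": "0000100001", "LAND1": "DE"},
    {"LIFNR": "0000100002", "LAND1": "FR"},
]

matched = match_vendors(entities, vendors)
```

Normalizing the country name before the lookup (trimming and lower-casing) keeps the match from failing on casing or stray spaces in the extracted text.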

1 Comment

  1. Ankit Kumar

    Hi John,

    Thanks for sharing this wonderful article!

    However, I’m stuck at a point where I need your expert advice.

    I have my unstructured source file (.txt) placed in an SFTP location, but when I try to use it as a source in my job, I get a permission denied error.

    What’s annoying is that the same folder has other files as well (delimited ones), which I’m able to use without any such error.

    Should I use a different folder/location for unstructured files?

    Regards,

    Ankit
