Unstructured Data Processing through Data Services:

  • Text Data Processing takes unstructured textual data and turns it into structured information that you can analyze and act on.
  • It helps you cope with information overload by mining very large text corpora and making sense of them without having to read every sentence.
  • This article covers Text Data Processing with SAP BusinessObjects Data Services, with text analytics as the goal.
  • The Entity Extraction transform, available as part of Text Data Processing in Data Services, extracts entities, entity relationships and facts from unstructured data for downstream analytics.

Case study:

  • Vendor information was held in text and email files that mention European countries and cities.
  • Based on the country names, the country codes are identified and matched against the LAND1 field of the Vendor Master table LFA1 (LFA1.LAND1).

/wp-content/uploads/2013/04/unstrc_data1_203842.jpg

File Format:

  • Create a new File Format with the file type “Unstructured text”.
  • Enter the file name(s) to process. Wildcard characters such as *.* can also be used in this field.

/wp-content/uploads/2013/04/unstrc_data2_203824.jpg
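
To illustrate how a wildcard in the file name field narrows the set of files to process, here is a minimal Python sketch. It is not Data Services code: the file names and the `select_files` helper are hypothetical, and it uses Python's shell-style matching, which may differ in detail from how Data Services interprets *.*.

```python
from fnmatch import fnmatchcase

# Hypothetical file names standing in for the vendor text and email files.
file_names = ["vendors_europe.txt", "vendor_mail_01.eml", "README", "notes.txt"]


def select_files(names, pattern):
    """Return the names that match a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# "*.*" here means: any name that contains a dot.
all_with_extension = select_files(file_names, "*.*")
# A narrower pattern restricts processing to plain text files.
text_only = select_files(file_names, "*.txt")

print(all_with_extension)
print(text_only)
```

In this sketch the name without a dot is skipped by *.*; whether Data Services treats such a file the same way is worth checking against a test directory.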

Data Flow:

  • Next, in the data flow, place a Base_EntityExtraction transform of Data Services after the unstructured file format and link the transform to the file format.

/wp-content/uploads/2013/04/unstrc_data3_203832.jpg

Base Entity Extraction Transform:

  • This transform provides a user-friendly GUI with three tabs: Input, Options and Output. It accepts textual input such as plain text, HTML or XML.

/wp-content/uploads/2013/04/unstrc_data4_203833.jpg

  • On the Options tab, set the Language value as required (English in my case). Leave the rest of the options unchanged.
  • On the Output tab, select the fields of interest. In practice the fields STANDARD_FORM and TYPE are usually sufficient. By default the output schema of the transform can generate up to 11 fields, and any of the others can be selected as well if needed.

/wp-content/uploads/2013/04/unstrc_data5_203837.jpg

Text Analysis – Results:

  • Next, add a Template Table as the target. Run the job and check the meaningful text data extracted by the transform.

/wp-content/uploads/2013/04/unstrc_data6_203838.jpg

  • A little further analysis shows the vendor locations mentioned in the textual data.

/wp-content/uploads/2013/04/unstrc_data7_203839.jpg
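
The kind of analysis described above can be sketched outside Data Services as a simple count over the STANDARD_FORM and TYPE fields. The Python below is only an illustration: the rows are invented sample data, and the type labels COUNTRY and LOCALITY are assumptions about how the transform tags places, so check them against your own output table.

```python
from collections import Counter

# Hypothetical rows shaped like the transform output (invented values).
rows = [
    {"STANDARD_FORM": "Germany", "TYPE": "COUNTRY"},
    {"STANDARD_FORM": "Berlin", "TYPE": "LOCALITY"},
    {"STANDARD_FORM": "France", "TYPE": "COUNTRY"},
    {"STANDARD_FORM": "Germany", "TYPE": "COUNTRY"},
    {"STANDARD_FORM": "ACME GmbH", "TYPE": "ORGANIZATION"},
]

# Assumed entity type labels for places; adjust to the labels in your output.
PLACE_TYPES = {"COUNTRY", "LOCALITY"}


def count_places(entity_rows):
    """Count how often each place entity occurs, keyed by (TYPE, STANDARD_FORM)."""
    return Counter(
        (row["TYPE"], row["STANDARD_FORM"])
        for row in entity_rows
        if row["TYPE"] in PLACE_TYPES
    )


place_counts = count_places(rows)
for (entity_type, name), count in place_counts.most_common():
    print(f"{entity_type:10} {name:10} {count}")
```

Counting on STANDARD_FORM rather than on the raw text is the point of the sketch: spelling variants of the same place would then fall into one bucket.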

  • To make the result more meaningful, the next data flow identifies the country codes and matches them with the LFA1 table.

/wp-content/uploads/2013/04/unstrc_data8_203840.jpg

/wp-content/uploads/2013/04/unstrc_data9_203841.jpg
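
The matching step can be pictured as a lookup from country name to country code, followed by a join on LFA1.LAND1. The Python below is a rough sketch of that logic, not the data flow itself: the lookup table, the extracted names, the vendor rows and the `match_vendors` helper are all hypothetical, and in Data Services the same result would typically come from a lookup or join inside the data flow.

```python
# Hypothetical lookup from country name to the country key stored in LAND1.
COUNTRY_CODES = {"Germany": "DE", "France": "FR", "Italy": "IT"}

# Hypothetical country names as they might come out of the extraction step.
extracted_countries = ["Germany", "France", "Spain"]

# Hypothetical vendor master rows; LIFNR is the vendor number, LAND1 the country key.
lfa1_rows = [
    {"LIFNR": "0000100001", "LAND1": "DE"},
    {"LIFNR": "0000100002", "LAND1": "FR"},
    {"LIFNR": "0000100003", "LAND1": "DE"},
]


def match_vendors(countries, code_lookup, vendors):
    """Map each country name to the vendor numbers whose LAND1 equals its code.

    Names without a known code are returned separately instead of being dropped.
    """
    matches = {}
    unmapped = []
    for country in countries:
        code = code_lookup.get(country)
        if code is None:
            unmapped.append(country)
            continue
        matches[country] = [v["LIFNR"] for v in vendors if v["LAND1"] == code]
    return matches, unmapped


matches, unmapped = match_vendors(extracted_countries, COUNTRY_CODES, lfa1_rows)
print(matches)
print(unmapped)
```

Keeping the unmapped names visible is a deliberate choice in this sketch, since a country that appears in the text but has no code in the lookup usually points to a gap in the reference data.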
