Unstructured Data Processing through Data Services:
- Text Data Processing is all about being able to take unstructured textual data and turn it into something you can analyze and act on.
- It helps you deal with information overload by mining very large text corpora and making sense of them without having to read every sentence!
- This article covers Text Data Processing in SAP BusinessObjects Data Services for the purpose of text analytics.
- The Entity Extraction transform, available as part of Text Data Processing in Data Services, extracts entities, entity relationships and facts from unstructured data for downstream analytics.
- In this scenario, vendor information was held in text and email files that mention European countries and cities.
- Based on the country name, the country codes are identified and matched against the country key field LAND1 of the vendor master table LFA1 (LFA1.LAND1).
- Create a new file format with the file type set to “Unstructured text”.
- Enter the file name(s) to process. Wildcard characters can also be used in this field, for example *.*
- Next, in the data flow, place a Base_EntityExtraction transform after the unstructured file format and connect the file format to the transform.
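Data Services resolves the file-name pattern itself, so nothing here is Data Services code. The Python sketch below only illustrates how a shell-style wildcard such as *.* selects files, using the standard `fnmatch` module on a hypothetical list of file names. Data Services' own matching rules may differ in edge cases (for example, names without an extension), so treat this as an analogy rather than a specification.

```python
from fnmatch import fnmatch

# Hypothetical file names in the source directory (illustration only).
files = ["vendors_2023.txt", "supplier_mail.eml", "notes.html", "README"]


def select_files(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatch(name, pattern)]


# "*.*" matches any name containing a dot; "README" has no extension and is skipped.
with_extension = select_files(files, "*.*")
# A narrower pattern picks up only the plain-text files.
text_only = select_files(files, "*.txt")
```

A narrower pattern such as *.txt is a simple way to keep unrelated files in the same directory out of the job.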
Base_EntityExtraction Transform:
- The transform provides a user-friendly interface with three tabs: Input, Options and Output. It accepts textual input such as plain text, HTML or XML.
- On the Options tab, set the Language value to match the source documents (English in my case). Leave the remaining options at their defaults.
- On the Output tab, select the fields of interest. As a best practice, select only STANDARD_FORM and TYPE. By default, the output schema of the transform can generate up to 11 fields, and any of the others can be selected if needed.
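In Data Services the field selection is made on the Output tab. The Python sketch below only mimics its effect on a few hypothetical output rows, keeping STANDARD_FORM and TYPE and discarding everything else. Apart from those two fields, the column names (STRING, OFFSET) and the TYPE values (COUNTRY, LOCALITY) are assumptions made for illustration, not a confirmed description of the transform's schema.

```python
from typing import Dict, List, Tuple

# Hypothetical rows resembling entity-extraction output. Only STANDARD_FORM
# and TYPE come from the article; STRING, OFFSET and the TYPE values are
# assumptions for illustration.
rows: List[Dict[str, object]] = [
    {"STRING": "Germany", "OFFSET": 12, "STANDARD_FORM": "Germany", "TYPE": "COUNTRY"},
    {"STRING": "Germany", "OFFSET": 80, "STANDARD_FORM": "Germany", "TYPE": "COUNTRY"},
    {"STRING": "Paris", "OFFSET": 140, "STANDARD_FORM": "Paris", "TYPE": "LOCALITY"},
]


def project_fields(records, fields=("STANDARD_FORM", "TYPE")) -> List[Tuple[object, ...]]:
    """Keep only the selected output fields of each row, in the given order."""
    return [tuple(record[field] for field in fields) for record in records]


selected = project_fields(rows)
```

Keeping only these two columns gives a compact result that is easy to aggregate or join in later steps.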
Text Analysis – Results:
- Next, add a template table as the target. Run the job and check the meaningful text data extracted by the transform.
- With a little further analysis, we can see the vendor locations mentioned in the textual data.
- To make the results more meaningful, a subsequent data flow identifies the country codes and matches them against the LFA1 table.
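The article does not show the matching data flow itself. The following Python sketch shows one possible version of the logic on hypothetical data: it counts place mentions, maps extracted country names to codes through a lookup table, and keeps the vendors whose LAND1 value appears among those codes, which amounts to an inner join. The lookup table, the sample extraction results and the LFA1 rows are all invented for illustration. The sketch also assumes that LAND1 holds ISO 3166-1 alpha-2 codes, as in a typical SAP configuration. In Data Services itself, this would more likely be built with a lookup table and a Query transform join.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical country-name -> country-code lookup (ISO 3166-1 alpha-2).
# Assumption: LFA1.LAND1 stores the same codes.
COUNTRY_CODES: Dict[str, str] = {
    "Germany": "DE",
    "France": "FR",
    "Italy": "IT",
    "Spain": "ES",
}

# Hypothetical extraction results as (STANDARD_FORM, TYPE) pairs.
extracted: List[Tuple[str, str]] = [
    ("Germany", "COUNTRY"),
    ("Berlin", "LOCALITY"),
    ("France", "COUNTRY"),
    ("Germany", "COUNTRY"),
    ("Portugal", "COUNTRY"),  # deliberately missing from the lookup
]

# Hypothetical rows from the vendor master table LFA1 (LIFNR, NAME1, LAND1).
lfa1: List[Dict[str, str]] = [
    {"LIFNR": "0000100001", "NAME1": "Vendor A", "LAND1": "DE"},
    {"LIFNR": "0000100002", "NAME1": "Vendor B", "LAND1": "FR"},
    {"LIFNR": "0000100003", "NAME1": "Vendor C", "LAND1": "US"},
]


def to_country_codes(pairs: List[Tuple[str, str]]) -> List[str]:
    """Map COUNTRY entities to codes, skipping unknown names and duplicates."""
    codes: List[str] = []
    for name, entity_type in pairs:
        if entity_type == "COUNTRY" and name in COUNTRY_CODES:
            code = COUNTRY_CODES[name]
            if code not in codes:
                codes.append(code)
    return codes


def match_vendors(codes: List[str], vendors: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep vendors whose LAND1 is one of the extracted codes (inner join on the code)."""
    wanted = set(codes)
    return [vendor for vendor in vendors if vendor["LAND1"] in wanted]


# How often each place is mentioned in the text.
mention_counts = Counter(name for name, _ in extracted)
codes = to_country_codes(extracted)
matched = match_vendors(codes, lfa1)
```

Names missing from the lookup (Portugal here) are skipped silently. A real job would probably route them to a separate table for review instead.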