I am going to explain the process of reading an RSS feed using BODS 4.1.
To begin with, we need some basic knowledge of Text Data Processing. Text Data Processing is primarily used to pull in unstructured data, that is, by reading the URL of a particular feed and importing the data into a structured layout.
A step-by-step approach is given below for reference.
I have taken an RSS XML file as the input to read the data from. Check the link (http://www.rss-tools.com/rss-directories.htm), where you can extract an XML file and convert it to either the DTD or the XSD format; both are supported by the BODS application and can be used as the input source. This blog illustrates only the static version, that is, placing a file on the local system and retrieving the records into a file.
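For the static approach, the feed has to be saved to the local system once. This is not part of BODS; it is only a minimal Python sketch of one way to do that step, assuming the feed is reachable at some URL. The function name save_feed_locally and both of its parameters are made up for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path


def save_feed_locally(feed_url, local_path):
    """Download a feed once, check it is well-formed XML, and save it.

    Returns the tag of the root element (for an RSS 2.0 feed this is "rss").
    """
    with urllib.request.urlopen(feed_url) as response:
        data = response.read()
    # ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML,
    # so a broken download is never written to disk.
    root = ET.fromstring(data)
    Path(local_path).write_bytes(data)
    return root.tag
```

The saved file can then be used as the static input for the conversion described next.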
Converting the XML file to XSD/DTD file
The conversion can be carried out using several techniques. I am using trang.jar to convert the .xml file to an .xsd file. This .jar file needs a Java platform to run. I extracted the jar file and placed it in C:\Program Files\Java\jdk1.7.0_21\jre\lib, since the jar file is compatible with the Java environment and the Java edition is installed on that drive.
At the command prompt, type the command that performs the conversion: java -jar trang.jar <args> <input file> <output file>, where <args> specifies the input and output formats. These are given with -I <input format> and -O <output format>. Here the input format is xml and the required output format is xsd.
So the syntax is java -jar trang.jar -I xml -O xsd <input file> <output file>, executed at the command prompt.
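If the conversion has to be repeated for several files, the same command can be assembled from a script. Below is a minimal Python sketch, assuming java is on the PATH and trang.jar is in the current folder. Trang takes the input file followed by the output file; the file names feed.xml and feed.xsd and both function names are placeholders of mine, not something that comes from BODS or Trang.

```python
import shutil
import subprocess
from pathlib import Path


def build_trang_command(xml_path, xsd_path, trang_jar="trang.jar"):
    """Return the argument list that asks Trang to infer an XSD from an XML file."""
    return [
        "java", "-jar", str(trang_jar),
        "-I", "xml",    # input format: a sample XML document
        "-O", "xsd",    # output format: W3C XML Schema
        str(xml_path),  # input file
        str(xsd_path),  # output file
    ]


def convert_xml_to_xsd(xml_path, xsd_path, trang_jar="trang.jar"):
    """Run Trang if Java and the jar can be found; return True on success."""
    if shutil.which("java") is None or not Path(trang_jar).is_file():
        return False  # nothing to run on this machine
    result = subprocess.run(build_trang_command(xml_path, xsd_path, trang_jar))
    return result.returncode == 0


if __name__ == "__main__":
    # Show the command line that would be executed.
    print(" ".join(build_trang_command("feed.xml", "feed.xsd")))
```

Keeping the command builder separate from the function that runs it makes it easy to check the arguments before anything is executed.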
Placing the XSD file into Data Services
To begin, we need to embed the XSD file in the BODS application. The steps to do so are given below.
To create a new XSD file format, open the Local Object Library, right-click XML Schema, and create a new schema.
Once the file has been converted to XSD, import it into the Designer. The steps to import the required file into the Designer are illustrated.
Fig 2 shows how to import the XSD file. A DTD file can be imported in the same way. The root element name defines the parent node, and the associated child nodes are listed in a nested relational structure. Once the file has been imported, it is placed in the Local Object Library under the XML Schema tab. Drag and drop the newly created schema into the work area to extract the structured entities from it. Add a Query transform to extract the relevant fields from the input (the XML file).
Click “Unnest with sub-schemas” in the Schema Out region to unnest the structure. We use this technique to fragment the unstructured data. Once all the data is fragmented, we use the Entity Extraction transform to parse it.
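To make the idea of unnesting concrete outside of Data Services, here is a small Python sketch. It is not BODS code and not what the Designer does internally; it only shows the same kind of flattening, where nested channel and item elements become one flat row per item. The sample feed, the function name unnest_items and the column names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, invented RSS 2.0 document with two nested items.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
"""


def unnest_items(rss_text):
    """Flatten the channel/item nesting into one flat row per item."""
    root = ET.fromstring(rss_text)
    rows = []
    for channel in root.findall("channel"):
        channel_title = channel.findtext("title", default="")
        for item in channel.findall("item"):
            # The parent value is repeated on every child row.
            rows.append({
                "channel_title": channel_title,
                "item_title": item.findtext("title", default=""),
                "item_link": item.findtext("link", default=""),
            })
    return rows


if __name__ == "__main__":
    for row in unnest_items(SAMPLE_RSS):
        print(row)
```

Each row carries the parent channel title next to the item fields, so the item title ends up as a plain column of the kind that can then be passed on as text for parsing.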
In the Entity Extraction transform, under the Input tab, map the title field from the Schema In region to TEXT. Create a rule each time you add an fsm file. Text analysis deals with the segregation of records, while sentiment analysis deals with the good and bad characteristics of a particular object.
Add the fsm files available under the path SAP BusinessObjects –> Data Services –> Text Analysis –> languages. I have selected three fsm files to parse the records based on the requirements. In the Output tab, select all the fields. The target is a template XML file that is populated with the records. A normal template file can also be used to populate the records in a structured way.
In the Designer, the final workspace would look like this (screenshot attached below). The data can now be previewed at the source as well as at the target end. Validate the job, and if no errors occur, execute it.
The final output can now be previewed in the glass pane.
Here the data has been converted from an unstructured to a structured layout. I have tried this out with just a single file. I will share a few tips if I manage to read multiple files from an RSS feed or Twitter.