Hi

I am going to explain the process of reading an RSS feed using BODS 4.1.

To begin with, we need some basic knowledge of Text Data Processing. Text Data Processing is primarily used to pull information out of unstructured data, in this case by reading the URL of a particular feed and importing its data into a structured layout.

A step-by-step approach is given below for reference.

I have taken an RSS XML file as the input source. From the link (http://www.rss-tools.com/rss-directories.htm) you can obtain an RSS XML file and convert it to either DTD or XSD format, both of which the BODS application supports as input source definitions. This blog illustrates only the static version: the file is placed on the local system and the records are read from it.

Converting the XML file to an XSD/DTD file

The conversion can be done with several tools. I am using trang.jar to convert the .xml file to an .xsd file. Trang is a Java program, so it needs a Java runtime. I extracted the jar file and placed it in C:\Program Files\Java\jdk1.7.0_21\jre\lib, since that is where my Java installation lives; the jar can also be run from any other folder by passing its full path to java -jar.

Using the command prompt, type the command that performs the conversion: java -jar trang.jar <args> <input file> <output file>, where <args> specify the input and output formats. The input format is given with -I <input type> and the output format with -O <output type>. Here the input type is xml and the required output type is xsd.

So the syntax is java -jar trang.jar -I xml -O xsd <input file> <output file>, executed from the command prompt in the folder that contains trang.jar (or with the full path to the jar).
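If you need to convert several feed samples, the same command can be scripted. Below is a minimal Python sketch that builds the trang argument list described above and runs it with subprocess. It assumes java is on the PATH; the function names trang_command and convert are hypothetical, not part of trang or BODS.

```python
import shutil
import subprocess
from pathlib import Path


def trang_command(jar_path, xml_file, xsd_file):
    """Build the trang argument list: XML instance in, W3C XML Schema out."""
    return [
        "java", "-jar", str(jar_path),
        "-I", "xml",      # input format: an XML instance document
        "-O", "xsd",      # output format: XSD
        str(xml_file),    # sample feed the schema is inferred from
        str(xsd_file),    # schema file to write
    ]


def convert(jar_path, xml_file, xsd_file):
    """Run trang; raises subprocess.CalledProcessError if trang reports an error."""
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on the PATH")
    subprocess.run(trang_command(jar_path, xml_file, xsd_file), check=True)
    return Path(xsd_file)
```

For example, convert(r"C:\Program Files\Java\jdk1.7.0_21\jre\lib\trang.jar", "feed.xml", "feed.xsd") would write feed.xsd (the feed file names are made up). Keep in mind that trang infers the schema only from the sample it is given, so an element that does not appear in that sample will not be declared in the generated XSD.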

Importing the XSD file into Data Services

Next, we need to import the XSD file into the BODS application. The steps are as follows.

To create a new XML schema, go to the Local Object Library, right-click XML Schema and create a new schema.

                

FIG 1

Once the file has been converted to XSD, import it into the Designer. The import steps are illustrated below.

FIG 2

Fig 2 shows how to import the XSD file; a DTD file can be imported the same way. The root element name defines the parent node, and its child elements are listed as a nested relational structure. Once imported, the schema appears in the Local Object Library under the XML Schema tab. Drag and drop the newly created schema into the workspace to use it as the source, then add a Query transform to extract the relevant fields from the input (XML file).
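To see what the Designer's nested view corresponds to, the short Python sketch below walks an RSS document outside BODS and prints each distinct element path, indented by depth, starting from the root element. It is only an illustration: the SAMPLE feed is made up and the function name element_paths is hypothetical.

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text):
    """Return distinct element paths in document order, e.g. 'rss/channel/item/title'.

    Namespaced tags (for example dc:creator) appear in '{uri}local' form.
    """
    root = ET.fromstring(xml_text)
    seen = {}  # dict keeps insertion order and gives O(1) duplicate checks

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        seen.setdefault(path, None)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return list(seen)


SAMPLE = """<rss version="2.0"><channel><title>Demo</title>
<item><title>A</title><link>http://example.com/a</link></item>
<item><title>B</title><link>http://example.com/b</link></item>
</channel></rss>"""

# Print the hierarchy: one line per distinct element, indented by nesting depth.
for p in element_paths(SAMPLE):
    print("  " * p.count("/") + p.rsplit("/", 1)[-1])
```

Repeated item elements collapse into a single path, which mirrors how the schema lists a repeating child once rather than once per record.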

FIG 3          

Click “Unnest with sub-schemas” in the Schema Out region to unnest the nested structure. This technique breaks the hierarchical data into flat records, so all the data is now fragmented into rows. To parse the text itself, we use the Entity Extraction transform.
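Outside the Designer, the effect of unnesting can be sketched in a few lines of Python: each item becomes one flat row, and the parent (channel) values are repeated on every row. This is an approximation for illustration, not BODS code; the SAMPLE feed, the unnest_items name and the chosen columns are hypothetical.

```python
import xml.etree.ElementTree as ET


def unnest_items(xml_text):
    """Flatten rss/channel/item into one row per item, repeating channel-level fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for channel in root.findall("channel"):
        channel_title = channel.findtext("title", default="")
        for item in channel.findall("item"):
            rows.append({
                "channel_title": channel_title,  # parent value copied onto each child row
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "pubDate": item.findtext("pubDate", default=""),  # "" when the item has none
            })
    return rows


SAMPLE = """<rss version="2.0"><channel><title>Demo feed</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link>
<pubDate>Mon, 02 Jun 2014 10:00:00 GMT</pubDate></item>
</channel></rss>"""

for row in unnest_items(SAMPLE):
    print(row)
```

The title column produced here is the kind of field that is then passed on as the text input for entity extraction.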

FIG 4    

In the Entity Extraction transform, on the Input tab, map the title field from the Schema In region to the TEXT input field. Add a rule for each .fsm file you want to use. Text analysis separates the records into entities, while sentiment analysis identifies the good and bad characteristics expressed about a particular object.
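The extraction itself runs inside the SAP Text Analysis engine, driven by the compiled .fsm files. Purely to illustrate the kind of output it produces (a matched string, a type, and a position), here is a toy Python sketch that tags words using a small hand-written dictionary and two keyword lists. It is a plain dictionary lookup, not the SAP algorithm; the dictionary contents, the type labels and the extract function are all hypothetical.

```python
import re

# Hypothetical mini-dictionary standing in for what a compiled .fsm file provides.
ENTITY_DICTIONARY = {
    "sap": "ORGANIZATION",
    "walldorf": "LOCALITY",
}
# Hypothetical sentiment keyword lists.
POSITIVE = {"great", "good", "improved"}
NEGATIVE = {"bad", "slow", "broken"}


def extract(text):
    """Return (matched_text, type, character_offset) tuples in text order."""
    rows = []
    for match in re.finditer(r"\w+", text):
        token = match.group(0)
        key = token.lower()  # case-insensitive lookup
        if key in ENTITY_DICTIONARY:
            rows.append((token, ENTITY_DICTIONARY[key], match.start()))
        elif key in POSITIVE:
            rows.append((token, "POSITIVE_SENTIMENT", match.start()))
        elif key in NEGATIVE:
            rows.append((token, "NEGATIVE_SENTIMENT", match.start()))
    return rows


print(extract("SAP opens a great new office in Walldorf"))
```

Each input title can yield zero or many output rows, which is why the Output tab of the real transform produces more rows than it receives.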

FIG 5    

Add the .fsm files available under the path SAP BusinessObjects -> Data Services -> Text Analysis -> languages. I selected three .fsm files to parse the records according to my requirements. On the Output tab, select all the fields. The target is a template XML file that receives the records; a normal template file can also be used to populate the records in a structured way.

FIG 6       

In the Designer, the final workspace appears as shown in the screenshot. The data can now be previewed at both the source and the target. Validate the job; if no errors are reported, execute it.

FIG 7

The final output can now be previewed in the glass pane.

Here the data has been brought from an unstructured into a structured layout. I have tried this with just a single file. I will share a few tips if I manage to read multiple files from an RSS feed or Twitter.

Best

Sanjay                                  

Comments

  1. Former Member

    Thanks for sharing. Can I check whether BODS supports dynamic RSS? That is, does it allow creating a batch job that pulls RSS information, converts it automatically, and then parses the text?

    Thanks.

    Quang.

    1. Former Member Post author

      Hi

      You can read dynamic RSS text. Once you have read the dynamic RSS feed, place it in a staging area; from there you can collect it and process it with batch jobs.
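      A minimal Python sketch of the staging step, assuming the batch job picks up whatever .xml files land in a staging folder. The feed URL, folder, file naming scheme and function names are hypothetical; scheduling and the BODS side of the process are not shown.

```python
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def fetch_feed(url, timeout=30):
    """Download the raw bytes of a feed; no parsing is done here."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def staged_filename(feed_name, now):
    """One file per snapshot, named with a UTC timestamp so runs never overwrite each other."""
    return f"{feed_name}_{now:%Y%m%dT%H%M%SZ}.xml"


def stage_feed(url, staging_dir, feed_name):
    """Fetch the feed and write it into the staging folder; returns the new file path."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    target = staging / staged_filename(feed_name, datetime.now(timezone.utc))
    target.write_bytes(fetch_feed(url))
    return target
```

      For example, stage_feed("http://example.com/feed.xml", r"C:\staging\rss", "news") would write a file such as news_20151108T160645Z.xml, which the batch job can then read and process.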

      Best

      Sanjay

  2. Venkata Ramana Paidi

    Thanks, Sanjay, for sharing a nice document.

    Could you please share an RSS XML file to use as the source? I was unable to find one. I used one, but it contains multiple root elements: rss, category and new Dataset. I selected rss as the root element, but while running the job I get this error:

    924    5340    XML-240108    11/8/2015 4:06:45 PM    An element named <copyright> present in the XML data input does not exist in the XML format used to set up this XML source in

    Thanks & Regards,

    Ramana.
