A requirement I was recently tasked with solving was how take an Excel (.xls) file as input and map it to an IDoc for integration with SAP. Now the mapping to an IDoc is the straight forward part so I will not go into those details here. However the part that wasn’t so simple was which was the best approach to converting this Binary .xls document to a readable XML message that could be used within the Pipeline.
I researched a few different options for doing this but eventually settled on finding a solution using Conversion Agent by Informatica. The reason for this decision was because the company that I was implementing this for already had Conversion Agent installed and it is a graphical tool which made maintenance and evolutions easier as you do not need a hot shot java programmer to be able to do this.
Conversion Agent is a third party tool provided by Informatica and has many powerful Parsing (Structured/Unstructured Formats à XML) and Serialisation (XML à Structured/Unstructured Formats) processors that can be utilized with little effort or prior knowledge. You can use Conversion Agent for conversions of PDF, Word, Excel, EDI, HL7 Documents and many more.
For some background and initial tutorials on how to use Conversion Agent please see the following link and in particular the Getting Started Section.
OK so now that you are acquainted with Conversion Agent lets get to the meat of how we actually transform the binary .xls to our XML.
First and foremost you must create the XSD that will be used in the Design Repository as your source Message Type. You do this my going through the normal motions in the Design Repository and then once you have created your Message Type Exporting as an XSD on to your desktop for later import into the Conversion Agent Studio.
Next you need to ensure you have a sample of your Excel file that needs to be Parsed. Once you have the XSD and the Example Source file we can move to the Conversion Agent Studio for the real fun.
So first create a new Parser Project
On the next screen give it a name
Now name your Parser
Name your script
Now specify your XSD
Define how you will provide you sample data
Specify your sample data
Define the encoding here we will choose Microsoft Excel
Now we must decide how we want the Excel file to be processed, so from this Excel what kind of message do we want to work with when parsing. I have chosen XML as this is the only one that doesn’t require Microsoft Excel to be installed on the Server.
Now define your source data format, so almost the same thing as above.
Now we are ready to get into the actual parsing, just click finish and we will go.
You now are presented with your environment for developing your parser. Now with my specific source document my excel consisted of three worksheets within the one workbook.
The sheet I was interested in was the Material List. Fortunately there was an XML tag in the now processed Excel document called Material List. So I have simply defined this as my first Marker. This means that when in PI we receive the Excel Document and it starts to Parse it the first thing it will look for is the Material List sheet and hence will not process the Rows and Cells in other Worksheets.
So as mentioned above I specify this as my first Marker which is also a TextSearch Marker
My next problem was the fact that the information I need within this sheet is actually located from the 4th Row and downwards, so the information in the above rows are not needed and hence shouldn’t be parsed. To solve this I then define another marker which is a place where I identified as being able to confirm the end of the 3rd row. That is to say that everything after this point can be taken into account for parsing.
Now I have done this I come to my rows of data which I need to get into my source xml for processing in PI. These are repeating groups of information so I need to use the repeating group function. Here I define my repeatinggroups as being separated by the XML tag row number.
Now this is done we are ready to add our content. The Content Anchors require three pieces of information: The opening marker to define where the content is located, the example value with the function learn by example, and finally we need to define exactly which field of our XML will hold this data.
You need to repeat this content step for as many times as required to fill each field of the XSD that is needed. You should then end up with something that looks like this:
Finally when you are done with the repeating group content you need to define where it will end, otherwise it will continue to the next worksheet and perform the same conversions. To define an end point again we utilize the Marker. This time I use the closing XML tag for Sheet.
Once you have finished marking your example you can use the function Mark Entire Example. This is used to mark the rest of the source document according to the rules that you defined for the first row.
You can now look through your example source pane and ensure that all of the fields that you would expect to be marker are in fact marked.
If this is to your satisfaction the final test is to run the script on the source file and look at your output XML. To do this first right click the root of your script and select the Set as Startup Component.
Now Run the Script.
You should then see your result appear in your left hand project pane
Simply double click this and you will see the XML that can be expected in your PI pipeline.
This script that you have just created can then be deployed onto your PI server and called via one of your Communication Channels via the Modules Tab.
For information on how to utilize these scripts in one of your Communication Channels please see the following link:
So there you have it, a way of converting an Excel Document into an XML message that can be used in PI.