OpenXML in word processing – Introduction and how to use it in ABAP
In the first part of this blog I give an introduction to OpenXML in word processing. In the second part I provide ABAP code that reads and changes a Word file.
Starting with Microsoft Word 2007, when you create a new document in Word and save it, a new file is created with the extension “*.docx”. This file is a zip archive of XML files which together describe the whole Word document: texts, tables, font sizes, colors, comments, margin settings, section settings and everything else the user placed and maintained in the document. In short, it is a set of XML files, linked to each other through relationships in a specific structure and zipped into one file.
To explore this structure, create a test document with something in it and save it. Change the extension from “*.docx” to “*.zip” and unzip the file. After unzipping you see all the XML files in the structure described below. If you need to look at these XML files often, I recommend a more convenient way: install OOXML Tool, which is an add-on for the Chrome browser. With simple drag and drop you can see the whole Word document.
For example, I created Test.docx with the text “Hello world.” in it. Note that the file has a size of 0 until you enter some content in Word and save it. I drag the Word file into Chrome using the add-on mentioned above to see the XML structure of the Word document. I look for /word/document.xml to see the text tag which holds the value “Hello world.”
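To make the "text tag" more concrete, here is a minimal sketch that collects the content of all <w:t> tags from a piece of document XML. The XML literal is a hand-written stand-in, not the output of Word, and the report name is hypothetical. The sketch assumes release 7.40 or higher because of the inline declarations.

```abap
REPORT zdocx_text_runs.

* Hand-written stand-in for the content of /word/document.xml;
* a real file carries namespaces and many more properties.
DATA(lv_xml) =
  `<w:body><w:p>` &&
  `<w:r><w:t xml:space="preserve">Hello </w:t></w:r>` &&
  `<w:r><w:tab/><w:t>world.</w:t></w:r>` &&
  `</w:p></w:body>`.

DATA lv_text TYPE string.

* Match <w:t> or <w:t ...attributes>, but not <w:tab/> or <w:tbl>
FIND ALL OCCURRENCES OF REGEX `<w:t( [^>]*)?>([^<]*)</w:t>`
  IN lv_xml RESULTS DATA(lt_matches).

LOOP AT lt_matches INTO DATA(ls_match).
  " Submatch 2 is the text between the opening and the closing tag
  READ TABLE ls_match-submatches INDEX 2 INTO DATA(ls_text).
  IF sy-subrc = 0 AND ls_text-length > 0.
    lv_text = lv_text && substring( val = lv_xml
                                    off = ls_text-offset
                                    len = ls_text-length ).
  ENDIF.
ENDLOOP.

WRITE / lv_text.
```

The stand-in deliberately splits the sentence into two runs. Word can do the same with text that looks continuous on screen, so a search for a whole sentence in the raw XML is not guaranteed to find it. The collected text is still XML-escaped (for example &amp;amp; for &amp;).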
Each XML file describes properties of a document part or the relationships between parts. For example:
- [Content_Types].xml describes the type of content used in each part of the whole document (package)
- the _rels parts describe the relationships between parts
- the docProps part describes general properties of the document in the app.xml and core.xml files (application, author, version…)
- the customXml part can hold customer-specific data; this will be described in more detail in another blog
- the content of the document is in the /word/document.xml file
- fontTable.xml contains information about the font types used
- styles.xml describes the styles used
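Because the package is an ordinary zip archive, its parts can also be listed from ABAP with class CL_ABAP_ZIP. The following is only a sketch: the report name is hypothetical, and the two-file package is built in memory as a stand-in so that the code runs without a frontend. The XSTRING of a real .docx file can be passed to LOAD in the same way.

```abap
REPORT zdocx_list_parts.

* Build a tiny stand-in package in memory (not a valid Word document)
DATA(lo_writer) = NEW cl_abap_zip( ).
lo_writer->add( name    = '[Content_Types].xml'
                content = cl_abap_codepage=>convert_to( source = `<Types/>` ) ).
lo_writer->add( name    = 'word/document.xml'
                content = cl_abap_codepage=>convert_to( source = `<w:document/>` ) ).
DATA(lv_package) = lo_writer->save( ).

* Load the package again and list the parts it contains
DATA(lo_reader) = NEW cl_abap_zip( ).
lo_reader->load( EXPORTING  zip             = lv_package
                 EXCEPTIONS zip_parse_error = 1
                            OTHERS          = 2 ).
IF sy-subrc <> 0.
  WRITE / 'Not a valid zip package'.
  RETURN.
ENDIF.

LOOP AT lo_reader->files INTO DATA(ls_file).
  WRITE: / ls_file-name, ls_file-size.
ENDLOOP.
```

For a real document the list should show the parts named above, such as word/document.xml and word/styles.xml.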
SAP provides class CL_DOCX_DOCUMENT, which helps us read and modify a Word document and go through its structure. Here is simple code which does the job.
*&---------------------------------------------------------------------*
*& Report ZDOCX_DOCUMENT
*&
*&---------------------------------------------------------------------*
*& Report demonstrates using CL_DOCX_DOCUMENT class to read and maintain
*& word document.
*& Pavol Olejar 23.4.2017
*&---------------------------------------------------------------------*
REPORT zdocx_document.
DATA: lv_length   TYPE i,
      lt_data_tab TYPE STANDARD TABLE OF x255,
      lv_docx     TYPE xstring,
      lv_string   TYPE string,
      lv_xml      TYPE xstring,
      lr_docx     TYPE REF TO cl_docx_document,
      lr_main     TYPE REF TO cl_docx_maindocumentpart.

* Upload file
CALL METHOD cl_gui_frontend_services=>gui_upload
  EXPORTING
    filename   = 'C:\Test.docx'
    filetype   = 'BIN'
  IMPORTING
    filelength = lv_length
  CHANGING
    data_tab   = lt_data_tab.

* Get XSTRING format from BIN table
CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
  EXPORTING
    input_length = lv_length
  IMPORTING
    buffer       = lv_docx
  TABLES
    binary_tab   = lt_data_tab.

* Instantiate word document in ABAP class CL_DOCX_DOCUMENT
CALL METHOD cl_docx_document=>load_document
  EXPORTING
    iv_data = lv_docx
  RECEIVING
    rr_doc  = lr_docx.

* Get main part where content of word document is stored
lr_main = lr_docx->get_maindocumentpart( ).

* Get data (XSTRING) of main part
lv_xml = lr_main->get_data( ).

* Convert to string so that the text is easy to change
CALL FUNCTION 'CRM_IC_XML_XSTRING2STRING'
  EXPORTING
    inxstring = lv_xml
  IMPORTING
    outstring = lv_string.

* Change text
REPLACE FIRST OCCURRENCE OF 'Hello world.' IN lv_string
  WITH 'Hello world. This is my Test_new.docx document.'.

* Convert back to XSTRING
CALL FUNCTION 'SCMS_STRING_TO_XSTRING'
  EXPORTING
    text   = lv_string
  IMPORTING
    buffer = lv_xml.

* Replace main part with new data and save it
lr_main->feed_data( iv_data = lv_xml ).
lv_docx = lr_docx->get_package_data( ).

* Save new word document locally
lv_length = xstrlen( lv_docx ).
CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer     = lv_docx
  TABLES
    binary_tab = lt_data_tab.

CALL METHOD cl_gui_frontend_services=>gui_download
  EXPORTING
    bin_filesize      = lv_length
    filename          = 'C:\Test_new.docx'
    filetype          = 'BIN'
    confirm_overwrite = 'X'
  CHANGING
    data_tab          = lt_data_tab.
The get*part methods of class CL_DOCX_DOCUMENT return the different parts of the document. Here we were interested in the main part.
Method get_data( ) gives you back the XML file of a part, and method feed_data( ) stores XML back into that part. Every class which represents a document part has these methods. In our case it is CL_DOCX_MAINDOCUMENTPART.
Method get_package_data( ) of class CL_DOCX_DOCUMENT saves all current parts and packs them into a zip file, returned as XSTRING.
You can check this in the debugger by looking at variables lv_xml and lv_docx with the XML browser view. For variable lv_xml you see the XML file of the main part.
For lv_docx you are prompted with a pop-up asking whether you want to save the zip file which is the result of the get_package_data( ) method.
In my next blog I will describe the custom XML part of a Word document and how an ABAP developer can use it.
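Since load_document( ) takes an XSTRING and get_package_data( ) returns one, the steps from the report can be wrapped in a routine that does not touch the frontend at all. The following is a minimal sketch, not a definitive implementation. The class and method names are hypothetical, the main part is assumed to be UTF-8 encoded, and CL_ABAP_CODEPAGE is swapped in for the two conversion function modules in case they are not available in your system. It assumes release 7.40 or higher.

```abap
REPORT zdocx_replace_sketch.

CLASS lcl_docx_text DEFINITION FINAL.
  PUBLIC SECTION.
    " Replaces the first occurrence of IV_OLD in the main document part
    CLASS-METHODS replace_first
      IMPORTING iv_docx        TYPE xstring
                iv_old         TYPE string
                iv_new         TYPE string
      RETURNING VALUE(rv_docx) TYPE xstring
      RAISING   cx_static_check.
ENDCLASS.

CLASS lcl_docx_text IMPLEMENTATION.
  METHOD replace_first.
    DATA(lo_docx) = cl_docx_document=>load_document( iv_data = iv_docx ).
    DATA(lo_main) = lo_docx->get_maindocumentpart( ).

    " Assumption: the main part is UTF-8 encoded
    DATA(lv_xml) = cl_abap_codepage=>convert_from(
                     source   = lo_main->get_data( )
                     codepage = `UTF-8` ).

    " Text inside the XML is escaped, so escape both search and new text
    DATA(lv_old) = escape( val = iv_old format = cl_abap_format=>e_xml_text ).
    DATA(lv_new) = escape( val = iv_new format = cl_abap_format=>e_xml_text ).

    REPLACE FIRST OCCURRENCE OF lv_old IN lv_xml WITH lv_new.

    lo_main->feed_data( iv_data = cl_abap_codepage=>convert_to(
                                    source   = lv_xml
                                    codepage = `UTF-8` ) ).

    rv_docx = lo_docx->get_package_data( ).
  ENDMETHOD.
ENDCLASS.
```

A caller would pass in the XSTRING it already has, for example from an upload or a service, and wrap the call in TRY ... CATCH. Keeping the GUI upload and download outside the routine is a design choice which lets the same logic run in the background.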
Excellent blog. A good ABAP tool for listing the package is available: check program ROPENXML_LISTER. Also, I have enhanced CL_DOCX_ALTERNATIVEFORMATPART to allow DOCX and MHT files as altChunks. I have written a few transformations to update chart data and to replace content controls. I hope to write a blog about it sometime soon.
Thanks very much. A blog about writing transformations related to this topic would be really nice.
Hello colleagues,
as far as I understand, this is only possible if the GUI frontend services are available, right?
Are there also solutions if we work with UI5, OData and HANA, that is, without any SAP GUI services?
Regards
André
Hi Andre,
I can imagine that it might be possible if you implement this as a web service which your application can consume. But take this only as an option to explore; I do not have much experience with it. Actually Java and C offer many more possibilities, and more convenient ways, to work with altChunk. To be able to process this in ABAP, I first had to explore sample code in C and Java. You can find plenty of it on the web.
Regards
Pavol
Hello Pavol,
thanks for your answer, but I do not understand how Java and C come in here. I am searching for an ABAP solution without SAP GUI installed. We use NetWeaver 7.50 with OData (Gateway) and therefore have no SAP GUI.
Regards
André
Hi Andre,
I was probably not clear. I am not so familiar with SAP UI5, but as far as I know JavaScript is a big part of it. Which is also not my cup of tea - yet :). So it came to my mind that instead of relying only on ABAP, you could try to implement some kind of "service" which does the job and call it from your UI5 application. This service does not need to be limited to ABAP; you could maybe use Java or C to code it. Then make this service available. But I repeat, take this only as a suggestion - it is not something I have tried before.
Regards
Pavol
Hello Pavol,
you are right, JavaScript is used in UI5, but we can't convert there. We are searching for a solution where we can convert an xstring, which is an Excel file, into ABAP internal tables in ABAP.
We get the xstring using OData in our ABAP stack and need to convert it there into ABAP internal tables. We have no Java or C in our stack, so we need an ABAP solution. That is why I ask whether your solution is usable in ABAP, but without SAP GUI.
We can't use cl_gui_frontend_services, but I'm not sure if we can use cl_docx_document. Is it possible to use class cl_docx_document without having cl_gui_frontend_services in place?
Regards
André
Hi Andre,
Once you have the XSTRING, it is all you need as input for class CL_DOCX_DOCUMENT. If you have this class (standard or a Z one) in your ABAP stack, I think you can use it without SAP GUI. Just try it.
Regards
Pavol
Thanks, I will try it
You can do that. Use the xstring with class cl_docx_document to parse through the package, get the sheet part (CL_XLSX_WORKSHEETPART) and get the sheet data in XML format. You can then use a custom ST or XSLT to transform the XML into an internal table.
Thanks
Hi Pavol,
Let's say I have an Excel file with extension "*.xlsx" which has multiple sheets in it. Can I read each of those Excel sheets using OLE functionality and then convert each sheet's BIN string into a zip XML using class CL_XLSX_DOCUMENT? My requirement is to read an Excel file which has multiple sheets, update some values in those sheets, convert it back to zip XML and upload the whole file to a front-end Web Dynpro application.
Thanks,
Aditya.
Hi Aditya,
As already mentioned somewhere, I have not used this technique with Excel, only with Word. So I cannot help you here.
Pavol.
Hi Pavol.
Excellent blog. I am trying to follow it, but I have a problem: when I download the document, some paragraphs or sentences that I defined in the template disappear, and in some cases paragraphs, and even some data from the custom XML, are shown incomplete.
Do you know anything about it?
Thanks.
Hello,
I think it should work without issues if you only upload the file, read the content and download it again without changing it. Maybe you are making a mistake when changing the content. First check whether it works without changing the content. It is really hard to tell with so little information.
Pavol
Hello Pavol,
Thank you for the blog. I've found it very informative. However, class CL_DOCX_FORM is included in the DEPRECATED package, meaning the class itself is deprecated. What would you suggest in that case?
Regards,
Dima
Hello Dima,
I have checked these classes in my system and they are placed in a package which is not deprecated. I am running on an S/4HANA instance.
Regards
Pavol