OpenXML in word processing – how to merge multiple Word documents into one using altChunk
In my previous blogs Custom XML part – mapping flat data and Custom XML part – mapping structured data, I discussed custom XML parts and how to use them to bind custom data in a document. Please read these blogs before going through this one. If you are new to this topic, I recommend starting with introduction to OpenXML in ABAP. Here I will show how to merge multiple Word documents into one final document. By merging I mean putting separately stored Word files into one big Word file, one after another.
There are two ways to do this. The first is to write a program that reads each document tag by tag and copies its content into the other document. Along the way you have to deal with plenty of issues such as comments, styles, margins, headers, footers and many other things. This is similar to what you do when you manually copy the content of one Word document into another with Ctrl+C and Ctrl+V. The second, much easier way is to use a tag called altChunk.
AltChunk stands for alternative chunk. It does most of the work for you, which makes it a very useful and powerful technique compared to the first option. The altChunk tag tells the application to import the content stored in an alternative part of the package into the main document part, at the position where the tag is placed. Microsoft Word can import several content types through altChunk: HTML, RTF, XHTML, XML, plain text, macro-enabled documents, Word templates or another Word document. We are interested in the last option.
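To make this concrete, here is a minimal, hand-written sketch of a main document part (word/document.xml) that uses altChunk. The altChunk tag is a block-level element placed directly in <w:body> next to ordinary paragraphs. Its r:id attribute names a relationship of the main document part. The relationship ID AltChunkId1 and the paragraph text are placeholders chosen for illustration. They are not values produced by the report in this blog.

```xml
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Ordinary paragraph stored in the main document part.</w:t>
      </w:r>
    </w:p>
    <!-- Placeholder ID: Word imports the content of the part that
         relationship "AltChunkId1" points to at this position -->
    <w:altChunk r:id="AltChunkId1"/>
  </w:body>
</w:document>
```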
I investigated whether SAP offers any standard functionality that can be used to “altChunk” several documents into one final document, and I concluded that there is none. The standard ABAP class for working with Word documents in the OpenXML approach is CL_DOCX_DOCUMENT, but it does not offer the options needed for the altChunk technique.
To altChunk one document into another, you have to:
- create a new alternative part with a unique ID for the document being merged
- store the content of the document you want to merge/import in the newly created alternative part
- create a new relationship in the final document between the main document part (final merged document) and the alternative part (merging document)
- add an altChunk tag to the main document part
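The following sketch shows what these four steps leave behind in the final package for a single merged file. The part name /word/afchunk1.docx and the ID AltChunkId1 are hypothetical, and the z-classes from the preparation blog may generate different names and IDs. The relationship type URI is the one the OpenXML standard defines for altChunk. To my knowledge, the content type shown is the one commonly used for an imported .docx, so it is worth comparing it with the value of cl_docx_alternativeformatpart=>co_content_type_word in your system.

```xml
<!-- Step 1: [Content_Types].xml declares the new alternative part
     (hypothetical part name) -->
<Override PartName="/word/afchunk1.docx"
          ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>

<!-- Step 2: the part /word/afchunk1.docx holds the unchanged bytes
     of the document being merged -->

<!-- Step 3: word/_rels/document.xml.rels links the main document part
     to the alternative part; Target is resolved relative to /word/ -->
<Relationship Id="AltChunkId1"
              Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"
              Target="afchunk1.docx"/>

<!-- Step 4: word/document.xml references the relationship ID -->
<w:altChunk r:id="AltChunkId1"/>
```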
To do this, we will enhance, copy and change a couple of standard classes. Before using any code from this blog, go through the steps in altChunk – preparation of code, where I describe all the preparation needed to use altChunk.
Once you have finished the code preparation, let's prepare some test documents for merging. I created three documents to be merged and one empty document:
- merge1.docx with the simple text “Hello World.”
- merge2.docx with simple text and a section break (next page).
- merge3.docx with simple text.
- final.docx as an empty document; make sure it is not a 0-byte file.
With the documents ready, we can test the following code.
*&---------------------------------------------------------------------*
*& Report ZALTCHUNK
*&---------------------------------------------------------------------*
*& Report demonstrates altChunk usage in ABAP
*& Pavol Olejar 23.4.2017
*&---------------------------------------------------------------------*
REPORT zaltchunk.

DATA: lr_merge1        TYPE REF TO cl_docx_document,
      lr_merge2        TYPE REF TO cl_docx_document,
      lr_merge3        TYPE REF TO cl_docx_document,
      lr_final         TYPE REF TO zcl_docx_document,
      lr_main          TYPE REF TO zcl_docx_maindocumentpart,
      lr_altpart1      TYPE REF TO cl_docx_alternativeformatpart,
      lr_altpart2      TYPE REF TO cl_docx_alternativeformatpart,
      lr_altpart3      TYPE REF TO cl_docx_alternativeformatpart,
      docx             TYPE xstring,
      mainx            TYPE xstring,
      lv_id            TYPE string,
      s                TYPE string,
      lv_current_chunk TYPE string,
      lv_replace       TYPE string,
      lv_length        TYPE i,
      lt_data_tab      TYPE STANDARD TABLE OF x255.

* READ final document. Note we are using z-class.
PERFORM load_file USING 'C:\final.docx' CHANGING docx.
lr_final = zcl_docx_document=>load_document( iv_data = docx ).
lr_main = lr_final->get_maindocumentpart( ).

* ADD alternative parts
lr_altpart1 = lr_main->add_alternativeformatpart(
  iv_content_type = cl_docx_alternativeformatpart=>co_content_type_word ).
lr_altpart2 = lr_main->add_alternativeformatpart(
  iv_content_type = cl_docx_alternativeformatpart=>co_content_type_word ).
lr_altpart3 = lr_main->add_alternativeformatpart(
  iv_content_type = cl_docx_alternativeformatpart=>co_content_type_word ).

* Read document to be merged/inserted
PERFORM load_file USING 'C:\merge1.docx' CHANGING docx.
* Provide data to store in alternative part
lr_altpart1->feed_data( iv_data = docx ).

* REPEAT for 2nd and 3rd file
PERFORM load_file USING 'C:\merge2.docx' CHANGING docx.
lr_altpart2->feed_data( iv_data = docx ).
PERFORM load_file USING 'C:\merge3.docx' CHANGING docx.
lr_altpart3->feed_data( iv_data = docx ).
* Get XML of main part to insert altChunk tags using string operations
mainx = lr_main->get_data( ).
CALL FUNCTION 'CRM_IC_XML_XSTRING2STRING'
  EXPORTING
    inxstring = mainx
  IMPORTING
    outstring = s.

* Build one altChunk tag per alternative part, in merge order
lv_id = lr_main->get_id_for_part( lr_altpart1 ).
CONCATENATE '<w:altChunk r:id="' lv_id '" />' INTO lv_current_chunk.
lv_id = lr_main->get_id_for_part( lr_altpart2 ).
CONCATENATE lv_current_chunk '<w:altChunk r:id="' lv_id '" />' INTO lv_current_chunk.
lv_id = lr_main->get_id_for_part( lr_altpart3 ).
CONCATENATE lv_current_chunk '<w:altChunk r:id="' lv_id '" />' INTO lv_current_chunk.

* Prepare alt chunk tags
CONCATENATE '<w:body>' lv_current_chunk '</w:body>' INTO lv_replace.
* Replace the whole content of the body tag with the altChunk tags
REPLACE FIRST OCCURRENCE OF REGEX '<w:body>.*</w:body>' IN s WITH lv_replace.

CALL FUNCTION 'CRM_IC_XML_STRING2XSTRING'
  EXPORTING
    instring   = s
  IMPORTING
    outxstring = mainx.

* Provide new main part with alt chunk tags and save document
lr_main->feed_data( iv_data = mainx ).
docx = lr_final->get_package_data( ).
lv_length = xstrlen( docx ).

CLEAR lt_data_tab.  " drop rows left over from the last upload
CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer     = docx
  TABLES
    binary_tab = lt_data_tab.

CALL METHOD cl_gui_frontend_services=>gui_download
  EXPORTING
    bin_filesize      = lv_length
    filename          = 'C:\final_new.docx'
    filetype          = 'BIN'
    confirm_overwrite = 'X'
  CHANGING
    data_tab          = lt_data_tab.

FORM load_file USING path TYPE string
               CHANGING docx TYPE xstring.
  CLEAR lt_data_tab.  " start from an empty table for every file
  CALL METHOD cl_gui_frontend_services=>gui_upload
    EXPORTING
      filename   = path
      filetype   = 'BIN'
    IMPORTING
      filelength = lv_length
    CHANGING
      data_tab   = lt_data_tab
    EXCEPTIONS
      OTHERS     = 19.
  IF sy-subrc <> 0.
    MESSAGE 'File upload failed' TYPE 'E'.
  ENDIF.
  CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
    EXPORTING
      input_length = lv_length
    IMPORTING
      buffer       = docx
    TABLES
      binary_tab   = lt_data_tab
    EXCEPTIONS
      OTHERS       = 2.
ENDFORM.
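To show what the string operations in the report do, here is an illustrative before/after of the <w:body> element. The "before" is a simplified typical body of an empty document saved by Word, with attributes and section details omitted. Word writes it on a single line, and it is indented here for readability. The relationship IDs in the "after" are hypothetical, because the real ones are whatever get_id_for_part returns. Note that the regex replaces everything between <w:body> and </w:body>. As a result, the original empty paragraph and the body-level <w:sectPr> (page size, margins and so on) of final.docx are not kept.

```xml
<!-- Before: body of the empty final.docx (simplified) -->
<w:body>
  <w:p/>
  <w:sectPr>
    <!-- page size, margins, columns, ... -->
  </w:sectPr>
</w:body>

<!-- After REPLACE FIRST OCCURRENCE OF REGEX '<w:body>.*</w:body>':
     one altChunk tag per alternative part, in the order they were concatenated
     (IDs are hypothetical) -->
<w:body><w:altChunk r:id="rId8" /><w:altChunk r:id="rId9" /><w:altChunk r:id="rId10" /></w:body>
```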
The resulting final_new.docx document should look like this:
In this example you can see that altChunk imports the content of the alternative parts one after another. The content of the 1st and 2nd documents simply follows one another. The section break (next page) from the 2nd document then moves the content of the 3rd document to a new page.
Take this as a very simple example of how to use altChunk. There is a lot of room to experiment: you can play with page orientation, headers and footers, page numbering or different section breaks to achieve what you need.
Also note that if you open the final document (after creating it with ABAP) in Word and save it again, the altChunk tags will be gone. All imported content is then saved under standard tags, and the relationships to the alternative parts are removed as well.