OpenXML in word processing – how to merge multiple word documents into one using altChunk
In my previous blogs, Custom XML part – mapping flat data and Custom XML part – mapping structured data, I discussed custom XML parts and how to use them to bind custom data in a document. Please read those blogs before going through this one, and if you are new to this topic I recommend starting with Introduction to OpenXML in ABAP. Here I will show how to merge multiple Word documents into one final document. By merging I mean putting separately stored Word document files one after another into one big Word file.
There are two ways to do this. You can write a program which reads whole documents tag by tag and puts the content into another document. During this process you will have to deal with plenty of issues like comments, styling, margins, footers, headers and many other things, much as you would when manually copying the content of one Word document into another using Ctrl+C and Ctrl+V. The second, much easier way is to use a tag called altChunk.
AltChunk stands for alternative chunk. It does most of the job for you and is a very useful and powerful technique in comparison to the 1st option. The altChunk tag tells the application to import content stored in an alternative part of the document into the main document part, at the place where the tag is used. Microsoft Word is able to import several content types with the altChunk tag: html, rtf, xhtml, xml, plain text, macro, Word template or another Word document. We are interested in the last option.
I investigated whether SAP offers any standard functionality that can be used to “altChunk” several documents into one final document, and I came to the conclusion that there is none. The standard ABAP class for working with Word documents in the OpenXML approach is CL_DOCX_DOCUMENT, but it does not offer the options needed to use the altChunk technique.
In order to altChunk one document into another you have to:
- create a new alternative part with a unique ID for the document to be merged
- store the content of the document you want to merge/import in the newly created alternative part
- create a new relation in the final document between the main document part (final merged document) and the alternative part (document to be merged)
- add the altChunk tag to the main document part
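As a rough illustration of what these four steps produce inside the .docx package, the fragments below show the three places that change. The part name afchunk1.docx and the ID rId100 are made-up placeholders; the real values depend on the code that generates them, and the content type could equally be declared with an Override entry for the specific part instead of a Default entry for the extension.

```xml
<!-- [Content_Types].xml: the type of the embedded file must be declared,
     for example through its file extension -->
<Default Extension="docx"
         ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>

<!-- word/_rels/document.xml.rels: relation from the main document part
     to the alternative part (stored here as word/afchunk1.docx) -->
<Relationship Id="rId100"
              Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"
              Target="afchunk1.docx"/>

<!-- word/document.xml: the altChunk tag points to the relation by its ID -->
<w:body>
  <w:altChunk r:id="rId100"/>
</w:body>
```

The r:id value is the only link between the tag and the stored document, which is why the ID has to be unique within the main document part.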
To be able to do this we will enhance, copy and change a couple of standard classes. Before you use any code from this blog you have to go through the steps in altChunk – preparation of code, where I describe all preparation steps needed to use altChunk.
Once the code preparation is finished, let’s prepare some test documents for merging. I created 3 documents for merging and 1 empty document to merge them into.
merge1.docx with simple text “Hello World.”
merge2.docx with simple text and section break – next page.
merge3.docx with simple text.
final.docx as an empty document – make sure its file size is not 0.
The documents are ready, so we can test the following code.
*&---------------------------------------------------------------------*
*& Report ZALTCHUNK
*&---------------------------------------------------------------------*
*& Report demonstrates altChunk usage in ABAP
*& Pavol Olejar 23.4.2017
*&---------------------------------------------------------------------*
REPORT zaltchunk.

DATA: lr_merge1        TYPE REF TO cl_docx_document,
      lr_merge2        TYPE REF TO cl_docx_document,
      lr_merge3        TYPE REF TO cl_docx_document,
      lr_final         TYPE REF TO zcl_docx_document,
      lr_main          TYPE REF TO zcl_docx_maindocumentpart,
      lr_altpart1      TYPE REF TO cl_docx_alternativeformatpart,
      lr_altpart2      TYPE REF TO cl_docx_alternativeformatpart,
      lr_altpart3      TYPE REF TO cl_docx_alternativeformatpart,
      docx             TYPE xstring,
      mainx            TYPE xstring,
      lv_id            TYPE string,
      s                TYPE string,
      lv_current_chunk TYPE string,
      lv_replace       TYPE string,
      lv_length        TYPE i,
      lt_data_tab      TYPE STANDARD TABLE OF x255.

* READ final document. Note we are using z-class.
PERFORM load_file USING 'C:\final.docx' CHANGING docx.
lr_final = zcl_docx_document=>load_document( iv_data = docx ).
lr_main = lr_final->get_maindocumentpart( ).

* ADD alternative parts
lr_altpart1 = lr_main->add_alternativeformatpart(
  iv_content_type = cl_docx_alternativeformatpart=>co_content_type_word ).
lr_altpart2 = lr_main->add_alternativeformatpart(
  iv_content_type = cl_docx_alternativeformatpart=>co_content_type_word ).
lr_altpart3 = lr_main->add_alternativeformatpart(
  iv_content_type = cl_docx_alternativeformatpart=>co_content_type_word ).

* Read document to be merged/inserted
PERFORM load_file USING 'C:\merge1.docx' CHANGING docx.
* Provide data to store in alternative part
lr_altpart1->feed_data( iv_data = docx ).

* REPEAT for 2nd and 3rd file
PERFORM load_file USING 'C:\merge2.docx' CHANGING docx.
lr_altpart2->feed_data( iv_data = docx ).
PERFORM load_file USING 'C:\merge3.docx' CHANGING docx.
lr_altpart3->feed_data( iv_data = docx ).

* Get xml of main part to insert altChunk tags using string operations
mainx = lr_main->get_data( ).
CALL FUNCTION 'CRM_IC_XML_XSTRING2STRING'
  EXPORTING
    inxstring = mainx
  IMPORTING
    outstring = s.

lv_id = lr_main->get_id_for_part( lr_altpart1 ).
CONCATENATE '<w:altChunk r:id="' lv_id '" />' INTO lv_current_chunk.
lv_id = lr_main->get_id_for_part( lr_altpart2 ).
CONCATENATE lv_current_chunk '<w:altChunk r:id="' lv_id '" />' INTO lv_current_chunk.
lv_id = lr_main->get_id_for_part( lr_altpart3 ).
CONCATENATE lv_current_chunk '<w:altChunk r:id="' lv_id '" />' INTO lv_current_chunk.

* Prepare alt chunk tags
CONCATENATE '<w:body>' lv_current_chunk '</w:body>' INTO lv_replace.

* Replace body tag
REPLACE FIRST OCCURRENCE OF REGEX '<w:body>.*</w:body>' IN s WITH lv_replace.

CALL FUNCTION 'CRM_IC_XML_STRING2XSTRING'
  EXPORTING
    instring   = s
  IMPORTING
    outxstring = mainx.

* Provide new main part with alt chunk tags and save document
lr_main->feed_data( iv_data = mainx ).
docx = lr_final->get_package_data( ).
lv_length = xstrlen( docx ).

CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer     = docx
  TABLES
    binary_tab = lt_data_tab.

CALL METHOD cl_gui_frontend_services=>gui_download
  EXPORTING
    bin_filesize      = lv_length
    filename          = 'C:\final_new.docx'
    filetype          = 'BIN'
    confirm_overwrite = 'X'
  CHANGING
    data_tab          = lt_data_tab.

* Upload a file from the frontend and convert it to xstring
FORM load_file USING path TYPE string CHANGING docx TYPE xstring.
  CALL METHOD cl_gui_frontend_services=>gui_upload
    EXPORTING
      filename   = path
      filetype   = 'BIN'
    IMPORTING
      filelength = lv_length
    CHANGING
      data_tab   = lt_data_tab
    EXCEPTIONS
      OTHERS     = 19.

  CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
    EXPORTING
      input_length = lv_length
    IMPORTING
      buffer       = docx
    TABLES
      binary_tab   = lt_data_tab
    EXCEPTIONS
      OTHERS       = 2.
ENDFORM.
The resulting final_new.docx document should look like this:
In this example you can see that altChunk imports the content of the alternative parts one by one. The content of the 1st and 2nd documents follows directly one after the other. The content of the 3rd document starts on a new page because of the section break – next page that I used in the test documents (see merge2.docx above).
There is a lot of room to play with this; take it as a very simple example of how to use altChunk. You can play with page orientation, footers and headers, page numbering or different section breaks to achieve what you need.
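One detail worth knowing when experimenting: in WordprocessingML the page size, orientation and header/footer references of the last section are stored in a w:sectPr element at the very end of w:body. The REPLACE statement in the report swaps out the whole body, so any w:sectPr that final.docx contained is dropped along with it. A body that keeps such an element might look roughly like the sketch below. All IDs and the A4 landscape values are illustrative assumptions, and how Word combines these settings with those of the imported documents is not verified here.

```xml
<w:body>
  <w:altChunk r:id="rId100"/>
  <w:altChunk r:id="rId101"/>
  <!-- last child of w:body: properties of the final section -->
  <w:sectPr>
    <!-- rId7 is a placeholder for an existing footer relation -->
    <w:footerReference w:type="default" r:id="rId7"/>
    <!-- A4 in landscape: width and height in twentieths of a point -->
    <w:pgSz w:w="16838" w:h="11906" w:orient="landscape"/>
    <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"
             w:header="708" w:footer="708" w:gutter="0"/>
  </w:sectPr>
</w:body>
```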
Also note that if you open the final document (after creating it using ABAP) in the Word application and save it again, the altChunk tags will be gone. All imported content will be saved under standard tags, and the relations to the alternative parts will be gone as well.
Hello Mr. Olejar,
awesome blog, very helpful, enjoyed reading it!
However, I still have a question: Is there any method using this type of code (or the cl_docx_document class) to include images or create tables in a single Word document?
I have not tried this. But I think you can store any image in the document as a part of it (an alternative part, for example) and then place the correct image tag within the document. For a table you can also store your own data in an alternative part and then use table tags to populate the table. As I have not tried this, it may be challenging. With tables I have used only the technique described in this blog.
thanks for the tutorial. I managed to merge two docx files with your solution.
Unfortunately the merged file does not show page numbers anymore. Do you know how to fix that?
Try to set the numbering you need (and other formatting) in your final doc. Numbering is not taken over from the alternative part but should be taken from the file into which you insert these parts.
thanks for the quick reply. Adding the page number (footer) to my final.docx didn't help. Any other suggestions?
My merged docx contains three footer.xml's now, footer2.xml contains my original footer.
The merged documents are both created with SAP Document Builder, maybe that's also an issue?
Not sure what might be the issue here. Did you manage to fix it?
I didn't manage to fix it. We just left the page numbers out, which is OK for the moment. If I find the time one day, I'll look further into it :-).
Thanks for your help.
I think I found out what was missing. In document.xml.rels my two to-be-merged Word documents were using two different reference IDs to include the numbering/footers/etc. (rId13 and rId6). As a result, the reference in my second Word file pointed to nothing and no page number was shown.
After fixing it manually everything is working fine now.
I did not find a solution to use the standard classes for this, so I've created my own methods.
Great job, it's really working.
I have some issues: for example, if final.docx has landscape page orientation, the layout of the final document is no longer the same after merging all documents.
Please help with coding to get layout, page numbers, and most importantly page headers and footers.
I am not sure how you want to influence page numbers; they should be consistent throughout the whole merged document. Regarding page orientation, I suggest using page breaks where the orientation changes. Headers and footers should behave like page numbers, that is, unified throughout the whole document, I guess.
Thank you so much for your reply. I did not fully understand your inputs: how can we add page headers and footers dynamically, and how can we overcome the page orientation issues? Could you please ping me at email@example.com? If you have sample code for layout, headers, and footers, please do send it. I would really appreciate your help; we have a roadblock in one of my projects.
The page orientation issue can be solved by using page breaks at the end of each document which will be merged. I do not know how to work with headers and footers dynamically.
I would appreciate it if you could provide ABAP code that inserts page breaks at the end of each document while merging in your code.
I put the page breaks in manually using a word processor, not dynamically in code, so I cannot help you here. Try studying the page break tags in the document.
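For readers following this thread, the tags in question look roughly like the fragments below. This is only a sketch of the standard WordprocessingML markup, with A4 portrait values chosen as an example; whether adding such a paragraph at the end of each document to be merged, or between the w:altChunk tags in the main part, resolves the orientation issue has not been tested here.

```xml
<!-- Page break: a paragraph whose run contains a break of type "page" -->
<w:p>
  <w:r>
    <w:br w:type="page"/>
  </w:r>
</w:p>

<!-- Section break (next page): w:sectPr inside the paragraph properties of the
     last paragraph of a section; it also carries that section's page size
     and orientation -->
<w:p>
  <w:pPr>
    <w:sectPr>
      <w:type w:val="nextPage"/>
      <w:pgSz w:w="11906" w:h="16838"/>
    </w:sectPr>
  </w:pPr>
</w:p>
```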