Usually we exchange data using a modern generic syntax like XML, which carries encoding information in the XML declaration. Such text documents can even contain binary data, for example in Base64-encoded parts. But this technology is still young, and most solutions in legacy environments use different techniques. They work quite well, but integrating them is no fun.
Some time ago I had to solve the following task: an SAP system had to create a file to be transferred to an EBCDIC dataset with variable line length on an MVS mainframe. This sounds simple, but the file had to contain packed Cobol numbers, and we wanted to avoid heavy post-processing; in particular, we did not want to write a Cobol program that knows the record structure of the dataset. The latter aspect made it difficult.
Packed Cobol Integers and the Joy of FTP
Let me briefly sketch how packed Cobol numbers work. A packed integer is stored as hexadecimal digits: if you “pack” the number 123 you get the value 0x123C, where the final nibble “C” denotes a positive sign. In other words, 123 is packed into the two bytes 0x12 and 0x3C.
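To make the packing rule concrete, here is a small sketch in Python (not the ABAP used in the project) that packs an integer into a given number of bytes:

```python
def pack_comp3(value: int, length: int) -> bytes:
    """Pack an integer into COBOL packed-decimal (COMP-3) format:
    one decimal digit per nibble, the last nibble being the sign
    (0xC = positive, 0xD = negative)."""
    sign = 0xC if value >= 0 else 0xD
    # a field of `length` bytes holds 2*length - 1 digits plus the sign nibble
    digits = str(abs(value)).zfill(2 * length - 1)
    if len(digits) > 2 * length - 1:
        raise ValueError("value does not fit into the given length")
    nibbles = [int(d) for d in digits] + [sign]
    out = bytearray()
    for i in range(0, len(nibbles), 2):
        out.append((nibbles[i] << 4) | nibbles[i + 1])
    return bytes(out)

pack_comp3(123, 2)  # b'\x12\x3c' – the two bytes 0x12 and 0x3C
```
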
Before I describe the ABAP solution in detail, I have to mention some pesky details of FTP. You probably know that there are two modes for file transfer: ASCII mode (= text mode) and binary mode. In ASCII mode you can use the QUOTE SITE command to define the codepages of the source and target systems, so an explicit character conversion takes place.
If you want to create a dataset with variable line length on MVS, this is the easiest approach, because CR/LF triggers the beginning of a new line in the target dataset. But if we chose this FTP mode to solve the task described above, the file would very likely be corrupted after the transfer, because the packed Cobol integers would be translated by the codepage conversion as well.
Even if we decide to create an EBCDIC file in SAP to avoid translation problems, it is important to choose the right codepage, because there are many EBCDIC codepages. Transaction SCP helps here.
But there is a second problem: nearly all FTP servers I know support Unicode in ASCII (= text) mode, and they interpret the byte sequence 0x0D 0x00 as the beginning of a Unicode sequence: they delete the 0x00 following the 0x0D, so our file gets corrupted whenever two packed Cobol integers happen to produce this sequence next to each other. And there is a third problem: even if we found an FTP program that supports the EBCDIC NEL (= new line) character (0x15, or U+0085 in Unicode) and triggers a new line in the target file, this would collide with our packed Cobol numbers, triggering a new line every time the byte 0x15 occurs inside a packed integer.
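Both collisions are easy to reproduce. The byte values below are simply the packed forms of the example numbers -100, 5 and 150 (chosen here for illustration):

```python
# -100 packs to 0x10 0x0D (trailing nibble D = negative sign),
# and 5 packed into two bytes is 0x00 0x5C
data = b'\x10\x0d' + b'\x00\x5c'
# the adjacent fields produce 0x0D 0x00, which such FTP servers
# "repair" by dropping the 0x00 after the 0x0D
assert b'\x0d\x00' in data
# 150 packs to 0x15 0x0C; 0x15 is the EBCDIC NEL control character
assert b'\x15' in b'\x15\x0c'
```
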
This approach does not work, so we have to create the file in the same codepage as the target system and use FTP in binary mode. On the one hand this has the advantage that no bytes are translated and none are lost; on the other hand we have no way to tell MVS when a new line should start. The consequence is that the result on MVS is a dataset with a constant number of characters per line (in fact it will be the maximum possible number of bytes of the allocated dataset). It gets even worse: we have 0x0D 0x0A bytes in the target file (the line-break characters of the source file) which do not have the meaning of CR/LF on MVS – so they corrupt our file.
But this problem can be solved with very simple post-processing on MVS: we copy the dataset with fixed line length into one with variable length and start a new line every time we detect CR/LF. In fact, this is what we use REXX as a scripting language for.
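The actual job was done in REXX on the mainframe; the logic of that copy step, sketched here in Python purely for illustration, is just a split on the marker bytes:

```python
def split_records(raw: bytes) -> list[bytes]:
    """Recover variable-length records by splitting the binary
    transfer on the CR/LF marker bytes 0x0D 0x0A."""
    parts = raw.split(b'\x0d\x0a')
    if parts and parts[-1] == b'':
        parts.pop()  # drop the empty tail after a trailing marker
    return parts

# two EBCDIC records, the second ending with the packed number 123
split_records(b'\xc1\xc2\x0d\x0a\xc3\x12\x3c\x0d\x0a')
```
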
ABAP and Files with Legacy Codepage
As I mentioned above, the file transfer is the most difficult part of this problem. Creating files in a legacy codepage is simple in ABAP: OPEN DATASET dsn FOR OUTPUT IN LEGACY TEXT MODE CODE PAGE cp.
For the solution above we have to create a binary dataset. Therefore we translate the text content using the (nowadays obsolete) ABAP statement TRANSLATE c TO CODE PAGE cp – or better, we use the conversion class CL_ABAP_CONV_OUT_CE, which converts from the system codepage to an external one.
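Outside ABAP the same conversion is a one-liner; Python, for instance, ships several EBCDIC codecs. Here cp500 (EBCDIC International) stands in for whatever codepage transaction SCP identifies for the target host:

```python
text = "ABC"
ebcdic = text.encode("cp500")   # EBCDIC International codec
print(ebcdic)                   # b'\xc1\xc2\xc3'
assert ebcdic.decode("cp500") == text  # the conversion round-trips
```
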
The creation of packed Cobol integers is very easy; under SAP release 4.6C I solved it with a short ABAP routine.
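The ABAP listing itself is not reproduced here; as a rough illustration of what one output record looks like (in Python, with a hypothetical field layout and cp500 as a stand-in codepage):

```python
# hypothetical record: an 8-character EBCDIC text field, the amount
# 123 as a two-byte packed field (0x12 0x3C), and CR/LF as the marker
# that the post-processing step on MVS turns into a record break
name = "MATERIAL".encode("cp500")   # EBCDIC text field
amount = b'\x12\x3c'                # 123 as a packed Cobol integer
record = name + amount + b'\x0d\x0a'
```
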
By the way, you can verify that the same problems occur in the UNIX world, where only LF (0x0A) serves as the newline. Fortunately, the above solution works in this context, too.
What can we learn?
The most important thing is that you should try to avoid ancient and proprietary technology. Packed Cobol numbers in data exchange are a really bad idea in my opinion. Nearly every data exchange format defines standards to code numbers in a text format, think of EDIFACT or XML-based standards for example. Files that contain text characters as well as binary parts will cause trouble.
If you write an SAP application, I suggest implementing a dedicated class for output generation, so that you stay flexible about the encoding and endianness of the output: if this information is used in many parts of your programs, you will have to work hard to find all occurrences whenever the output mode changes.
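In Python terms (an analogue of the suggested ABAP class, with invented names), such a class could look like this – the codepage and record marker live in exactly one place:

```python
class OutputWriter:
    """Owns the encoding and record-marker decisions, so callers
    never hard-code a codepage or line-break convention."""

    def __init__(self, encoding: str = "cp500", newline: bytes = b'\x0d\x0a'):
        self.encoding = encoding
        self.newline = newline
        self.buffer = bytearray()

    def write_text(self, s: str) -> None:
        self.buffer += s.encode(self.encoding)

    def end_record(self) -> None:
        self.buffer += self.newline

    def get_bytes(self) -> bytes:
        return bytes(self.buffer)
```

Switching the whole output from, say, cp500 to UTF-8 then means changing one constructor argument instead of hunting through every write statement.
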
I suggest writing those file-read and file-write classes so that they are Unicode-enabled; if they are not, they will probably contain errors. But this rule applies to any kind of ABAP programming.
If you start an application integration project that uses file transfer, you should take care of codepage issues. In fact, SAP supports more than 390 different codepages. As long as there is no homogeneous standard like Unicode in your IT landscape, codepage conversions will cause a lot of trouble. And if you exchange data with external partners, things will get even worse…
Does XML solve our Problems?
There are people who believe that XML is a silver bullet. Of course XML is an important technology, especially in data exchange. But in legacy integration other problems may occur: first, parsing XML with Enterprise Cobol on z/OS is no fun, and there are some restrictions. And unfortunately, the problems with newline characters are not completely solved in XML 1.0. These are some of the reasons for the creation of XML 1.1.