Tobias Trapp

Sometimes Legacy Integration is bizarre….

Usually we exchange data using a modern generic syntax like XML, which carries information about the encoding in the XML declaration. Such text documents can even contain binary data, for example in Base64-encoded (MIME) parts. But this technology is still young, and most solutions in legacy environments use different techniques. They work quite well, but integrating them is no fun.

Some time ago I had to solve the following task: an SAP system had to create a file that had to be transferred to an EBCDIC dataset with variable line length on an MVS mainframe. This sounds simple, but the file had to contain packed Cobol numbers, and we wanted to avoid heavy post-processing. That means we did not want to write a Cobol program that knew the record structure of the dataset. The latter aspect made it difficult.

Packed Cobol Integers and the Joy of FTP

Let me briefly sketch how packed Cobol numbers work. A packed integer stores each decimal digit in one half-byte (nibble), followed by a sign nibble, so its hexadecimal representation reads like the decimal number itself. If you “pack” the number 123 you get the value 0x123c, where “c” means a positive sign. In other words, 123 is packed into the two bytes 0x12 and 0x3c.
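To make the packing rule concrete, here is a minimal Python sketch (not the ABAP code used in the project). The function name `pack_decimal` is made up for illustration, and the sign nibble 0xd for negative numbers is the usual packed-decimal convention rather than something stated above.

```python
def pack_decimal(value: int) -> bytes:
    """Encode an integer as packed decimal: one nibble per digit, sign nibble last."""
    sign = "d" if value < 0 else "c"     # 0xc = positive, 0xd = negative (assumed convention)
    nibbles = str(abs(value)) + sign     # the decimal digits double as hex nibbles
    if len(nibbles) % 2:                 # pad with a leading zero to fill whole bytes
        nibbles = "0" + nibbles
    return bytes.fromhex(nibbles)


print(pack_decimal(123).hex())    # 123c
print(pack_decimal(-4567).hex())  # 04567d
```

A real Cobol field has a fixed length given by its picture clause, so a production version would pad to that length instead of emitting the shortest possible result.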

Before I describe the ABAP solution in detail I have to mention some pesky details of FTP. Of course you know that there are two modes for file transfer: ASCII mode (= text mode) and binary mode. In ASCII mode you can use the QUOTE SITE command to define the codepages of the source and target systems, and you get an explicit character conversion.

If you want to create a dataset of variable line length on MVS, this is the easiest approach because CR/LF triggers the beginning of a new line in the target dataset. But if we chose this FTP mode for the task described above, the file would very likely be corrupted by the transfer, because the packed Cobol integers would be translated by the code page conversion, too.
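The effect can be imitated in a few lines of Python. This is only a sketch: the helper `text_mode_transfer` is hypothetical, and the code pages Latin-1 and cp037 are assumptions standing in for whatever the two systems really use.

```python
def text_mode_transfer(data: bytes, source: str = "latin-1", target: str = "cp037") -> bytes:
    """Imitate a text-mode transfer: every byte is read as a character and re-encoded."""
    return data.decode(source).encode(target)


packed = bytes.fromhex("123c")               # the packed number 123 from above
print(text_mode_transfer(b"<").hex())        # 4c: '<' is 0x3c in Latin-1 but 0x4c in cp037
print(text_mode_transfer(packed) == packed)  # False: the byte 0x3c was rewritten
```

The byte 0x3c, which carries the last digit and the sign, is read as the character "<" and comes out as 0x4c, so the receiving program no longer sees a valid packed number.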

Even if we decide to create an EBCDIC file in SAP to avoid translation problems, it is important that we choose the right codepage because there are many EBCDIC codepages. Therefore we use transaction SCP.
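Why the choice matters can be seen by encoding the same text with several EBCDIC code pages. The sketch below uses three code pages that ship with Python (cp037, cp500, cp273) purely as examples; they are not necessarily the ones relevant in an SAP landscape, and the helper name `encode_with` is made up.

```python
def encode_with(text: str, codepages=("cp037", "cp500", "cp273")) -> dict:
    """Return the hex encoding of the same text under several EBCDIC code pages."""
    return {cp: text.encode(cp).hex() for cp in codepages}


# Letters usually agree between EBCDIC variants, punctuation often does not.
for codepage, encoded in encode_with("A!").items():
    print(codepage, encoded)
```

If the file is written with one variant and read with another, plain letters survive but characters such as the exclamation mark or brackets silently turn into something else.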

But there is a second problem: nearly all FTP servers I know support unicode in ASCII (= text) mode. They interpret the bytes 0x0d00… as the beginning of a unicode sequence and delete the 0x00 that follows 0x0d, so that our file is corrupted if two packed Cobol integers follow each other. And there is a third problem: even if we found an FTP program that supports the EBCDIC NEL (= new line) character (that is 0x15, or U+0085 in unicode) and triggers a new line in the target file, this would collide with our packed Cobol numbers and trigger a new line every time the byte 0x15 occurs in a packed Cobol integer.
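The NEL collision is easy to reproduce. In this sketch the record layout (a text field followed by one packed number) and the helper name `split_on_nel` are invented for illustration, and cp037 is an assumed code page.

```python
NEL = b"\x15"  # EBCDIC new line byte, as named in the text


def split_on_nel(record: bytes) -> list:
    """Split a record the way a NEL-aware transfer program would."""
    return record.split(NEL)


# One logical record: the text "AMOUNT" in EBCDIC followed by packed 15123 (0x15123c).
record = "AMOUNT".encode("cp037") + bytes.fromhex("15123c")
pieces = split_on_nel(record)
print(len(pieces))       # 2: the single record was cut in two
print(pieces[1].hex())   # 123c: the rest now looks like the packed number 123
```

The first byte of the packed number is taken for a line break, and what remains even happens to be a valid packed number with a different value.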

This approach does not work, so we have to create the file in the codepage of the target system and use FTP in binary mode. On the one hand we have the advantage that no bytes are translated and no bytes are lost; on the other hand we have no chance to tell MVS when a new line should start. The consequence is that the result on MVS is a dataset with a constant number of characters in each line (in fact it will be the maximum possible number of bytes of the allocated dataset). But it is even worse: we have 0x0d0a bytes in the target file (the line break characters of the source file), which do not have the meaning of CR/LF on MVS, so they corrupt our file.

But this problem can be solved with very simple post-processing on MVS: we copy the dataset with fixed line length into one with variable line length and start a new line every time we detect CR/LF. This is what we use REXX as a scripting language for.
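The logic of this copy step can be sketched in Python (this is an illustration of the idea, not the REXX script). The function name `fixed_to_variable` is hypothetical, and the sketch assumes that every line ends with CR/LF and that whatever follows the last CR/LF is padding.

```python
def fixed_to_variable(fixed_records, delimiter: bytes = b"\r\n") -> list:
    """Rebuild variable-length lines from fixed-length records of a binary transfer."""
    stream = b"".join(fixed_records)   # the fixed record boundaries carry no meaning
    lines = stream.split(delimiter)    # every CR/LF marks the end of a logical line
    return lines[:-1]                  # drop the padding after the last delimiter


# Two 8-byte records as they might arrive; the line breaks ignore the record boundary.
records = [b"AB\x12\x3c\r\nCD", b"\x01\x5c\r\n\x00\x00\x00\x00"]
for line in fixed_to_variable(records):
    print(line.hex())
```

Note the assumption hidden in this approach: the byte pair 0x0d0a must never occur inside the payload itself, for example across two adjacent packed numbers.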

ABAP and Files with Legacy Codepage

As I mentioned above, file transfer is the most difficult part of this problem. Creating files in a legacy codepage is simple in ABAP: OPEN DATASET dsn FOR OUTPUT IN LEGACY TEXT MODE CODE PAGE cp.

For the solution above we have to create a binary dataset. Therefore we translate the text content using the (nowadays obsolete) ABAP command TRANSLATE c TO CODE PAGE cp or, better, the conversion class for output, CL_ABAP_CONV_OUT_CE.
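The structure of such a binary record can be shown with a Python stand-in for the ABAP logic. Everything here is an assumption made for illustration: the helper name `build_record`, the code page cp037, and a record layout of one text field followed by one packed number and CR/LF.

```python
def build_record(text: str, packed: bytes, codepage: str = "cp037") -> bytes:
    """Build one line: converted text, untouched packed bytes, raw CR/LF terminator."""
    # Only the text is converted; the packed number and the line break stay as they are.
    return text.encode(codepage) + packed + b"\r\n"


record = build_record("HELLO", bytes.fromhex("123c"))
print(record.hex())  # c8c5d3d3d6123c0d0a
```

Such records would then be written to a file opened in binary mode, so that nothing converts them a second time on the way out.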

The creation of packed Cobol integers is very easy. I show how I solved it under SAP release 4.6C:

By the way, you can verify that the same problems occur in the UNIX world, where only LF (0x0a) is used as newline. Fortunately the solution above works in this context, too.

What can we learn?

The most important lesson is that you should try to avoid ancient and proprietary technology. Packed Cobol numbers in data exchange are a really bad idea in my opinion. Nearly every data exchange format defines standards for encoding numbers in a text format; think of EDIFACT or XML-based standards, for example. Files that contain text characters as well as binary parts will cause trouble.

If you write an SAP application, I suggest that you implement a special class for output generation so that you stay flexible to change the encoding and endian information of the output. If this information is used in many parts of your programs, you will have to work hard to find all occurrences when the output mode changes.

I suggest writing those classes for file read and write operations so that they are unicode-enabled. If they are not, they will probably contain errors. But this rule applies to any kind of ABAP programming.

If you start an application integration project that uses file transfer, you should take care of codepage issues. In fact SAP supports more than 390 different codepages. As long as there is no homogeneous standard like unicode in your IT landscape, codepage conversions will cause a lot of trouble. And if you exchange data with external partners this will get even worse…
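The idea of such a class, shown here in Python rather than ABAP, might look like the following sketch. The class name `RecordWriter` and its methods are invented for illustration; the point is only that code page and byte order are set in exactly one place.

```python
class RecordWriter:
    """Keeps code page and byte order in one place (hypothetical helper)."""

    def __init__(self, codepage: str = "cp037", byteorder: str = "big") -> None:
        self.codepage = codepage
        self.byteorder = byteorder

    def text(self, value: str) -> bytes:
        """Convert text with the configured code page."""
        return value.encode(self.codepage)

    def integer(self, value: int, length: int = 4) -> bytes:
        """Convert a binary integer with the configured byte order."""
        return value.to_bytes(length, self.byteorder, signed=True)


mainframe = RecordWriter()
pc = RecordWriter(codepage="ascii", byteorder="little")
print(mainframe.text("A").hex(), mainframe.integer(1, 2).hex())  # c1 0001
print(pc.text("A").hex(), pc.integer(1, 2).hex())                # 41 0100
```

Switching the target platform then means changing the constructor arguments once instead of hunting for conversions scattered through the program.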

Does XML solve our Problems?

There are people who believe that XML is a silver bullet. Of course XML is an important technology, especially in data exchange. But in legacy integration other problems might occur: first, parsing XML with Enterprise Cobol on z/OS is no fun and there are some restrictions. And unfortunately the problem with new line characters is not solved completely in XML 1.0. These are some of the reasons for the creation of XML 1.1.

      Former Member
      Hello Tobias,

      reading about your experiences certainly makes one sensitive about the pesky problems that arise when one tries to make systems interoperable as an afterthought. Considering that today’s cutting edge technologies will cause tomorrow’s legacy integration problems, do you think it’s possible to identify “stable” technologies that work well now and will probably continue to work in a few years as well?


      Tobias Trapp
      Blog Post Author
      Hi Achim,

      that's a good question... I think with XML we have a generic syntax and with unicode there are enough characters even if we join a united federation of planets 😉 So if we are compliant with those standards we have a chance to do better - unfortunately z/OS has some problems.

      In my example the problems occurred because we had mixed text and binary content. With XML we can solve this problem (if the target platform supports this standard, of course), but at the moment we replace the file transfer by very complicated technologies and link the documents to generic transfer and message protocols. And now it's getting complicated: the "Collaboration-Protocol Profile and Agreement Specification Version 1.0" of ebXML consists of 90 pages, the "ebXML Business Process Specification Schema" consists of 136 pages, the "Message Service Specification" of ebXML is 90 pages long, the "ebXML Registry Information Model v2.0" has 60 pages, the "ebXML Registry Services Specification v2.0" 128 pages and so on.

      In fact we add semantics to messages by defining envelopes that contain information for routing, encryption, signatures and integration into business processes. And we hope that our integration technology is smart enough to handle everything. Of course this is better than the opposite: there are lots of eBusiness standards that don't adopt this "layer concept" but integrate some (but not all) of the aspects mentioned above within their business-specific "vocabulary".

      At the moment we are starting to ask what a message or a document is, or at least what it should be. There is a very interesting blog (About Electronic Documents, Business Documents and Messages) that gives answers. One is that an electronic document is a set of characters and is self-contained, so that it can be understood on its own and does not change its meaning within a different context. This implies that the "document" character does not depend on the message and transportation layer.

      This is right, of course, but it's ironic if you see it from the point of view of my weblog: in legacy file transfer a business document doesn't change its semantics in relation to a certain transport layer (say TCP/IP for instance). In fact this was never in question in my scenario (it was only difficult not to scramble the file).

      To come to the end: if you use new technologies then new problems will occur, even some problems that were thought to have been eliminated in the past.