Technology Blogs by Members
akirwsky

Dear Reader,


This blog describes how to implement fixed-length data conversion for data that contains full-width characters, such as Chinese and Japanese. In character codes such as Shift_JIS, a full-width character takes two bytes and occupies a full-width square, while a half-width character takes one byte. There are some resources explaining how to convert data to a fixed length, and most of them use XSLT mapping. That approach did not work for my scenario with a mix of half-width and full-width characters, and I realized that it works fine only when the data contains ONLY half-width characters. Therefore, I built my own solution, which works well with BOTH half-width and full-width characters. If you have any questions or suggestions to improve my solution, please write them in the comments. Thank you in advance for reading.

Scenario:


In my scenario, data must be sent to the target server in a fixed-length format, and the data contains both half-width and full-width characters (Japanese Hiragana and Kanji). To explain why I needed my own solution, let’s compare the output generated by the XSLT mapping solution with the output generated by my solution.

Field names and their fixed lengths:

| No. | Field Name | Fixed Length (in bytes) |
| --- | --- | --- |
| 1 | Number | 8 |
| 2 | Type | 2 |
| 3 | Date | 8 |
| 4 | Text | 30 |
| 5 | PIC | 10 |
| 6 | Location | 6 |

Sample data


<Table>
  <Record>
    <Number>00000001</Number>
    <Type>A</Type>
    <Date>20230701</Date>
    <Text>あいうえお</Text>
    <PIC>Tim</PIC>
    <Location>123456</Location>
  </Record>
  <Record>
    <Number>00000002</Number>
    <Type>B</Type>
    <Date>20230701</Date>
    <Text>漢字-FullWidth</Text>
    <PIC>Lisa</PIC>
    <Location>123456</Location>
  </Record>
  <Record>
    <Number>00000003</Number>
    <Type>C</Type>
    <Date>20230701</Date>
    <Text>FullWidthひらがな</Text>
    <PIC>Mike</PIC>
    <Location>123456</Location>
  </Record>
  :
</Table>

 

Failure Case: XSLT Mapping



 

1. Input sample data


Use a content modifier to input the sample data.


 

2. XSLT mapping


For how to create the XSLT mapping file, you can refer to this resource:

(Fixed Length File Generation Scenario Through SAP Cloud Platform Integration | SAP Blogs)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output omit-xml-declaration="yes" indent="yes" method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- End each record with a line break -->
  <xsl:template match="Record">
    <xsl:apply-templates />
    <xsl:text>
</xsl:text>
  </xsl:template>

  <!-- Pad each field with spaces, then cut it to its fixed length.
       Note that substring() counts characters, not bytes. -->
  <xsl:template match="Number">
    <xsl:value-of select="substring(concat(., '        '), 1, 8)"/>
  </xsl:template>
  <xsl:template match="Type">
    <xsl:value-of select="substring(concat(., '  '), 1, 2)"/>
  </xsl:template>
  <xsl:template match="Date">
    <xsl:value-of select="substring(concat(., '        '), 1, 8)"/>
  </xsl:template>
  <xsl:template match="Text">
    <xsl:value-of select="substring(concat(., '                              '), 1, 30)"/>
  </xsl:template>
  <xsl:template match="PIC">
    <xsl:value-of select="substring(concat(., '          '), 1, 10)"/>
  </xsl:template>
  <xsl:template match="Location">
    <xsl:value-of select="substring(concat(., '      '), 1, 6)"/>
  </xsl:template>
</xsl:stylesheet>

I also tried other character codes, such as Shift-JIS and MS932, by changing the encoding in the XML declaration.

Shift-JIS



<?xml version="1.0" encoding="Shift-JIS"?> 

MS932



<?xml version="1.0" encoding="MS932"?> 

 

3. Check the output data


As you can see, the first characters of the PIC values are not aligned. I used Sakura Editor to make it easy to count the bytes of characters. -> Releases · sakura-editor/sakura (github.com)

(Screenshots of the output: UTF-8, Shift-JIS, MS932)
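
A likely reason for the misalignment is that the XPath functions concat() and substring() work on characters, not bytes, so changing the declared encoding does not change how the padding is counted. The following Groovy sketch reproduces the effect outside of the integration flow. The helper name padByChars is hypothetical; it only mimics the substring(concat(...)) idiom and is not part of the XSLT solution.

```groovy
// Hypothetical helper that mimics substring(concat(value, spaces), 1, width):
// it pads and cuts by character count, as the XSLT mapping does.
String padByChars(String value, int width) {
    return (value + (' ' * width)).substring(0, width)
}

// Text values from the sample data, padded to 30 characters
def texts = ['あいうえお', '漢字-FullWidth', 'FullWidthひらがな']
texts.each { text ->
    String padded = padByChars(text, 30)
    int byteCount = padded.getBytes('Shift_JIS').length
    println "${padded.length()} characters, ${byteCount} bytes"
}
```

All three values come out as 30 characters, but as 35, 32, and 34 bytes in Shift_JIS, so the field that follows starts at a different byte position in each record.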


 

Successful Case:


This solution uses three content modifiers and a Groovy script. The three content modifiers pass parameters to the Groovy script. If you would rather manage the parameters in the script, you can remove them. I implemented it with three content modifiers so that anyone who doesn't know how to code can simply copy and paste my code.
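
If you prefer to manage the parameters in the script, one possible layout is a single list that keeps each field name next to its byte length. This is only a sketch: the variable names recordName and fieldSpecs are hypothetical, and the values are the ones from the field table in the Scenario section.

```groovy
// Hypothetical in-script replacement for the three content modifiers.
// The names and lengths are taken from the field table in the Scenario section.
String recordName = 'Record'
def fieldSpecs = [
    [name: 'Number',   length: 8],
    [name: 'Type',     length: 2],
    [name: 'Date',     length: 8],
    [name: 'Text',     length: 30],
    [name: 'PIC',      length: 10],
    [name: 'Location', length: 6]
]

// The number of fields and the record size follow from the list itself
int numberOfFields = fieldSpecs.size()
int recordBytes = fieldSpecs.sum { it.length }
println "${numberOfFields} fields, ${recordBytes} bytes per record"
```

Keeping the name and the length together avoids having to keep two separately numbered lists in the same order.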


 

1. Input sample data.


(Same as the previous way)

2. Set the field names in the properties, using a content modifier.


Use the same field names, in the same sequence, as in the input data. The script reads them from the properties 'field1', 'field2', and so on.


 

3. Set the fixed lengths of the fields in the properties, using a content modifier.


For each field, enter its fixed length, using the same sequence of fields as in the previous step. For example, 'length1' should be the length of the field named in the 'field1' property of the previous content modifier. Make sure to specify the fixed length in bytes, not in characters. Even though a half-width character and a full-width character each count as one character, they differ in byte length: in Shift_JIS, one half-width character is one byte, but one full-width character is two bytes. Therefore, the conversion to a fixed length must always be done in units of bytes.
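
To check how many bytes a value occupies before choosing a fixed length, the byte count can be compared with the character count. This is a minimal sketch, assuming Shift_JIS as the target character code (the same one the script in step 5 uses); the helper name byteLength is hypothetical.

```groovy
// Hypothetical helper: number of bytes a value occupies in the given character code
int byteLength(String value, String charsetName = 'Shift_JIS') {
    return value.getBytes(charsetName).length
}

// Compare the character count with the byte count for some sample values
['Tim', 'あいうえお', '漢字-FullWidth'].each { value ->
    println "${value}: ${value.length()} characters, ${byteLength(value)} bytes"
}
```

The result depends on the character code: 'あいうえお' is 10 bytes in Shift_JIS but 15 bytes in UTF-8, so the lengths should be counted in the character code that the target system expects.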


 

4. Set other parameters such as the record name and the number of fields.


Set the element name of the record and the number of fields. In this demonstration, the records in the sample data are enclosed in '<Record></Record>', so type 'Record' in the source value of 'RecordName'. And because we have 6 fields, type 6 in the source value of 'NumberOfFields'. The script in the next step reads both values from the message headers.


 

5. Write a Groovy script to convert the field values to a fixed length.


Read the headers and properties prepared in the previous steps and use them to iterate over the records and fields. The code pads each field value with spaces so that it has the fixed length in bytes, and trims values that are too long.
import com.sap.gateway.ip.core.customdev.util.Message;

def Message processData(Message message) {
    //take a message body, properties, headers
    def body = message.getBody(java.lang.String) as String;
    def properties = message.getProperties();
    def headers = message.getHeaders();

    //character code used to count bytes
    //you need to choose a right character code for your language. Ex) Japanese -> Shift_JIS
    String charset = "Shift_JIS";

    //split the body at each opening record tag
    //(element 0 is the text before the first record, so the records start at index 1)
    String record_name = headers.get("RecordName");
    String[] lines_record = body.split("<" + record_name + ">", -1);
    int num_records = lines_record.length;

    //get the number of fields
    int num_fields = Integer.parseInt(headers.get("NumberOfFields") as String);

    //you will send this new message body
    def new_body = new StringBuilder();

    //iterate over records
    for (int r = 1; r < num_records; r++) {
        //iterate over fields
        for (int f = 1; f <= num_fields; f++) {

            //take the field value
            String field_name = properties.get("field" + f);
            String[] lines1 = lines_record[r].split("<" + field_name + ">", -1);
            String[] lines2 = lines1[1].split("</" + field_name + ">", 2);
            String field_value = lines2[0];

            //take the fixed length (in bytes) for the field
            int fixed_length = Integer.parseInt(properties.get("length" + f) as String);

            //you will keep the padded field value here
            String padded;

            //calculate the number of spaces to pad
            int bytes = field_value.getBytes(charset).length;
            int num_to_pad = fixed_length - bytes;

            //if padding is necessary for the fixed length
            if (num_to_pad > 0) {
                //pad the field value with spaces
                padded = field_value + (' ' * num_to_pad);
            }

            //case when bytes of the field value reach or exceed the fixed length
            else {
                int cur_bytes = bytes;

                //trim the last exceeding characters one by one
                while (cur_bytes > fixed_length) {
                    field_value = field_value.substring(0, field_value.length() - 1);
                    cur_bytes = field_value.getBytes(charset).length;
                }

                //if the last trimmed character was full-width, you need to pad it with one more space
                if (cur_bytes != fixed_length) {
                    field_value = field_value + ' ';
                }

                //set in the padded
                padded = field_value;
            }

            //append in the new_body
            new_body.append(padded);
        }

        //append a line break (CR/LF)
        new_body.append('\r\n');
    }

    //set the new body back to the message body as a String
    message.setBody(new_body.toString());
    return message;
}
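
To try the padding and trimming rules without deploying an integration flow, the per-field logic can be pulled into a small function. The sketch below is an assumption about how that might look, not part of the script above: the name fitToBytes is hypothetical, and it is meant to follow the same rules (pad with spaces, trim from the end, and fill the gap with a space when a trimmed full-width character leaves one byte free).

```groovy
// Hypothetical standalone version of the per-field logic in the script above
String fitToBytes(String value, int fixedLength, String charsetName = 'Shift_JIS') {
    // Trim trailing characters one by one until the value fits into the fixed length
    while (value.getBytes(charsetName).length > fixedLength) {
        value = value.substring(0, value.length() - 1)
    }
    // Fill the remaining bytes with half-width spaces
    int numToPad = fixedLength - value.getBytes(charsetName).length
    return value + (' ' * numToPad)
}

// A short half-width value is padded to 10 bytes
println '[' + fitToBytes('Tim', 10) + ']'
// A full-width value is cut to 6 bytes (three full-width characters)
println '[' + fitToBytes('あいうえお', 6) + ']'
// Cutting at an odd length leaves one byte, which is filled with a space
println '[' + fitToBytes('あいうえお', 5) + ']'
```

The brackets in the output only make the trailing spaces visible; they are not part of the padded value.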

 

6. Check the output file.


As you can see, the first characters of the PIC fields and of the Location fields are now aligned.
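
The alignment can also be checked numerically: with the field lengths used here, every output line should be exactly 8 + 2 + 8 + 30 + 10 + 6 = 64 bytes in Shift_JIS. The following sketch assumes the output body is available as a string. The helper name allLinesHaveLength is hypothetical, and the sample line is built by hand from the first sample record rather than taken from a real run.

```groovy
// Hypothetical check: true if every CR/LF-terminated line has the expected byte length
boolean allLinesHaveLength(String output, int expectedBytes, String charsetName = 'Shift_JIS') {
    // Split at CR/LF and ignore empty pieces
    def lines = output.split('\r\n').findAll { it.length() > 0 }
    return lines.every { it.getBytes(charsetName).length == expectedBytes }
}

// First sample record, padded by hand:
// Number(8) Type(2) Date(8) Text(30) PIC(10) Location(6)
String line = '00000001' + 'A ' + '20230701' +
        'あいうえお' + (' ' * 20) + 'Tim' + (' ' * 7) + '123456'
println allLinesHaveLength(line + '\r\n', 64)
```

A check like this catches byte-length problems that are hard to see in an editor with a proportional font.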
