Removing invalid and non-printable characters in HANA based BW transformation
Most customers know the problem that invalid characters can be found in data records. Data load or DSO activation for char type InfoObjects can fail due to permitted or non-printable characters in BW.
There exist several solutions for this problem. Some customer uses function module calls in ABAP transformations for example RSKC_CHAVL_OF_IOBJ_CHECK or RSKC_ALLOWED_CHAR_GET. Other implement field routines with ABAP like
REPLACE ALL OCCURRENCES OF REGEX '[^[:print:]]' IN RESULT WITH 'replace_char' IGNORING CASE.
or…
IF RESULT(1) = '!' OR return = '#'. RESULT(1) = 'replace_char'. ENDIF.
Now BW/4HANA has the option to push down transformation logic to HANA database, to reduce runtime of data loads. For custom transformations to be pushed down to HANA database, start, end and expert routine type need to be created as AMDP script (ABAP Managed Database Procedure script) instead of an ABAP based routine.
For more details on creation of AMDP Script based transformation you can refer to:
https://blogs.sap.com/2016/05/24/hana-based-bw-transformation/ https://blogs.sap.com/2016/06/17/hana-based-transformation-deep-dive/
For the example we created a sample file with invalid characters ‘!’ and ‘#’ in record 2, 4, 5 and 6, and a non-printable character Tapstop in record 7.
We load the data in a field based write optimized ADSO and want it to be loaded and activated in a standard ADSO with InfoObjects and changelog.
To remove invalid and non-printable characters with an AMDP Script in a field routine, you can follow these steps.
Step 1: Select rule type routine for the transformation rule, see (1). A popup dialog asks for processing type, see (2). Choose AMDP script to create an AMDP script based field routine.
Step 2: an AMDP class will be generated with default method – PROCEDURE and default interface – IF_AMDP_MARKER_HDB. The BW Transformation framework opens the Eclipse based ABAP class editor.
Step 3: Enter your code for the field routine in the body of the method PROCEDURE.
You can use the following template. The code replaces the regular expression ([^[:print:]]|[#|!]) in source field “CHARACTERS” with replacement char ” and transfer the result in the target InfoObject “/BIC/ZSPECHAR”.
If you want to remove invalid characters at the first position only, you can use this template.
If necessary this can be extended with upper case conversion or similar.
After removing the characters in transformation our example data will be loaded without errors and the data activation in ADSO works well.
Here you can find some more information about SQL String function syntax and regular expression:
https://help.sap.com/http.svc/rc/abapdocu_751_index_htm/7.51/en-US/abenregular_expressions.htm
Brilliant, thanks!
Nice work! 🙂
Thanks for the post Antje Rogotzki.
For some reason the below statement shown in second screen print is working in HANA SQl console but not in my in AMDP .It is just removing the # but not replacing with space in my AMDP in BW4HANA .I checked in debugging also.Below screenprint for reference which I am getting from my source .This is causing one record missing in BW4HANA as below columns are keyfileds to one of my infoobject.
Also I see the below warning in AMDP when I use my entries from RSKC in the regular expression range.
Only ASCII 7 bit characters are allowed in AMDP procedures. Control characters are not allowed.
Thanks,
Dharma
If you want to allow accents ….
The range 00C0-017F is the unicode character range ( https://unicode-table.com/en/ ) – the characters you want to allow in addition.
Dear Antje, thanks for your Blogs
I want to delete de character ' " ' into the start routine
The source is:
And the Start Routine is:
METHOD GLOBAL_START BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT OPTIONS READ-ONLY.
-- outTab = SELECT * FROM :inTab;
outTab = SELECT REPLACE_REGEXPR('([^[:print:]]|["])' IN "VERSIONVERSION" WITH '' OCCURRENCE ALL) AS "VERSIONVERSION",
REPLACE_REGEXPR('([^[:print:]]|["])' IN "TIEMPOYEAR" WITH '' OCCURRENCE ALL) AS "TIEMPOYEAR",
REPLACE_REGEXPR('([^[:print:]]|["])' IN "TIEMPOMONTH" WITH '' OCCURRENCE ALL) AS "TIEMPOMONTH",
REPLACE_REGEXPR('([^[:print:]]|["])' IN "CENTRO_DE_COSTOID" WITH '' OCCURRENCE ALL) AS "CENTRO_DE_COSTOID",
REPLACE_REGEXPR('([^[:print:]]|["])' IN "CUENTA_DE_MAYORID" WITH '' OCCURRENCE ALL) AS "CUENTA_DE_MAYORID",
REPLACE_REGEXPR('([^[:print:]]|["])' IN "NUMERO_DE_ORDENID" WITH '' OCCURRENCE ALL) AS "NUMERO_DE_ORDENID",
REPLACE_REGEXPR('([^[:print:]]|["])' IN "VALUE" WITH '' OCCURRENCE ALL) AS "VALUE",
REPLACE_REGEXPR('([^[:print:]]|["])' IN "VALUEUNIT" WITH '' OCCURRENCE ALL) AS "VALUEUNIT",
REPLACE_REGEXPR('([^[:print:]]|["])' IN "RECORD" WITH '' OCCURRENCE ALL) AS "RECORD",
REPLACE_REGEXPR('([^[:print:]]|["])' IN "SQL__PROCEDURE__SOURCE__RECORD" WITH '' OCCURRENCE ALL) AS "SQL__PROCEDURE__SOURCE__RECORD"
FROM :intab;
I have the follow Error Message:
Error en clase AMDP: /BIC/SS3S0Q78RZ0VVW5DL4KO_M=>GLOBAL_START: OUTTAB[
Do you have any Idea, what could be wrong?
Dear Antje, thanks for your Blogs
I want to delete de character ‘ ” ‘ into the start routine
The source is:
And the Start Routine is:
METHOD GLOBAL_START BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT OPTIONS READ-ONLY.
— outTab = SELECT * FROM :inTab;
outTab = SELECT REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “VERSIONVERSION” WITH ” OCCURRENCE ALL) AS “VERSIONVERSION”,
REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “TIEMPOYEAR” WITH ” OCCURRENCE ALL) AS “TIEMPOYEAR”,
REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “TIEMPOMONTH” WITH ” OCCURRENCE ALL) AS “TIEMPOMONTH”,
REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “CENTRO_DE_COSTOID” WITH ” OCCURRENCE ALL) AS “CENTRO_DE_COSTOID”,
REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “CUENTA_DE_MAYORID” WITH ” OCCURRENCE ALL) AS “CUENTA_DE_MAYORID”,
REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “NUMERO_DE_ORDENID” WITH ” OCCURRENCE ALL) AS “NUMERO_DE_ORDENID”,
REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “VALUE” WITH ” OCCURRENCE ALL) AS “VALUE”,
REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “VALUEUNIT” WITH ” OCCURRENCE ALL) AS “VALUEUNIT”,
REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “RECORD” WITH ” OCCURRENCE ALL) AS “RECORD”,
REPLACE_REGEXPR(‘([^[:print:]]|[“])’ IN “SQL__PROCEDURE__SOURCE__RECORD” WITH ” OCCURRENCE ALL) AS “SQL__PROCEDURE__SOURCE__RECORD”
FROM :intab;
I have the follow Error Message:
Error en clase AMDP: /BIC/SS3S0Q78RZ0VVW5DL4KO_M=>GLOBAL_START: OUTTAB[
Do you have any Idea, what could be wrong?
In BW4, the easiest way......add a formula in Transformation to the field that you need:
REMOVE_UNPRINTABLE_CHARACTERS(UCASE(name_of_source_object))
Example: REMOVE_UNPRINTABLE_CHARACTERS(UCASE(CLR_DOC_NO))
Solved!
Great job Sebastian!
Solved with the formula.
Thank you very much.
Thanks for the post Antje Rogotzki.
I followed your example, but i get an error when i try to activated it.
AMDP
Error:
Error en clase AMDP: SQLSCRIPT message: some out table variable is notassigned: ERRORTAB, OUTTAB &A0&A1&A2&A3&A4&A5&A6&A7&A8&A9
Nº mensaje: RSDHA374
Could you help me to understand where is the error??
Here is the code: