Invalid Characters in Characteristic Values of Infoobjects
It is well known that most errors during data loads are caused by invalid characters in the incoming data.
My previous blog was dedicated to invalid characters in the texts of infoobjects. Everything said below applies to invalid characters in characteristic values only. Although the information provided here is valid for non-Unicode systems, I believe that a large part of it applies to Unicode systems as well.
It’s amazing that even some very experienced people share misconceptions about invalid characters. So, here is a list of myths related to invalid characters in characteristic values.
Myth 3: The SPACE symbol is a special one and has to be added to allowed characters in RSKC (if we want to have the SPACE in characteristic values)
Not all characters are allowed in characteristic values of infoobjects. There is a predefined set of allowed characters. All notes, How Tos and help documents state that this set includes the following characters (see, for example, OSS Note 173241, “Allowed characters in the BW System”):
!"%&'()*+,-./:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
As you can see, there is no SPACE symbol. I remember that there were even threads in the BI forums about inserting the SPACE in RSKC by pressing ALT-0160 or ALT-255.
Actually, the default set of allowed characters DOES INCLUDE the SPACE symbol. You can check this yourself:
Run SE37 and enter RSKC_ALLOWED_CHAR_GET.
Display the source code and, in the statement
E_DEFAULT_CHAR = G_C_ALLOWED_CHAR
double-click the right-hand side.
You’ll be taken to the include LRSKCTOP.
There you can see the default allowed characters, and the SPACE is the first one:
There is also a list of explicitly forbidden characters: those with hex codes 00 to 1F.
You can also run a small experiment: enter some characters with spaces as a characteristic value of an infoobject of type CHAR:
You’ll see that the SPACE symbol is allowed without any special settings in RSKC.
Myth 4: Presence of # sign in RSKC allows all invalid characters
Since invalid characters are often displayed in BW as the # sign, many beginners believe in this myth.
It’s a misconception. The system uses the # character for every hexadecimal value for which the code page has no matching character. This is the case for invalid characters, and hence this sign may stand for ANY invalid character in BW messages.
The presence of this sign in RSKC will not prevent a data load with invalid characters from failing. It only permits the # sign itself (hexadecimal code 23), and only if it is not the sole character of the value.
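Because the # shown in a message can hide any code, it sometimes helps to look at the real hexadecimal values behind a suspicious string. The report below is only a minimal sketch of that idea: the report name and the sample value are made up for illustration, and in practice you would fill l_value from the failing record.

```abap
REPORT z_show_hex_codes.
* Hypothetical report name and sample value, for illustration only.
DATA: l_value(20) TYPE c,
      l_char(1)   TYPE c,
      l_len       TYPE i,
      l_off       TYPE i.
FIELD-SYMBOLS <hex> TYPE x.

l_value = 'AB#C'.
l_len = STRLEN( l_value ).

* Print offset, character and its byte representation, one per line
WHILE l_off < l_len.
  l_char = l_value+l_off(1).
  ASSIGN l_char TO <hex> CASTING.
  WRITE: / l_off, l_char, <hex>.
  l_off = l_off + 1.
ENDWHILE.
```

In a non-Unicode system each character is shown as one byte, so a genuine # appears as 23 while a TAB, for example, appears as 09. In a Unicode system each character occupies more than one byte, so the output is correspondingly longer.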
Myth 5: ALL_CAPITAL setting in RSKC along with the other special characters solves problems with invalid characters
It’s wrong. Enter NOTHING other than “ALL_CAPITAL” in transaction RSKC; otherwise the system won’t recognize the option. Only the additional special characters will be recognized, and ALL_CAPITAL will be treated simply as the characters A, L, _, C, P, I and T to be allowed, although these are already permitted by default.
Myth 6: ALL_CAPITAL setting in RSKC allows all (special) characters
As you can see in the picture for Myth 3, characters with hex codes from 00 to 1F are forbidden. These very characters cause most of the problems during data loads.
Remember that pressing keys such as TAB, ENTER, the arrow keys or BACKSPACE creates exactly these forbidden codes. Hence, data coming into BW must be validated if it is created in an application where a user types in information and the application doesn’t check it.
If you call ABAP function modules that return l_userdef_char (the user-defined allowed characters), you will not get a set of permitted symbols when ALL_CAPITAL is set. You’ll simply get the string ‘ALL_CAPITAL’ itself.
‘ALL_CAPITAL’ is interpreted by other modules that check the languages installed in the system.
The ALL_CAPITAL option, as stated in the How To “Permitted Characters in Characteristic Values”, “allows you to use all the characters that are capitals in the local language (the language in which the batch processes run). The result is that the loading process becomes language dependent.”
Two questions arise immediately:
1. Is it possible to use lowercase letters with the ALL_CAPITAL option?
There is a common belief that with this option only uppercase letters are allowed. That’s not true! If you check ‘Lowercase Letters’ for an infoobject, you can load data with lowercase letters too, even for foreign languages installed in the system.
2. Since you cannot enter any other characters in RSKC besides the ‘ALL_CAPITAL’ string, how do you allow other special characters like #, ~ etc. (not the hex 00-1F codes)?
You do not need to. The ALL_CAPITAL option takes care of such symbols too!
So, as we see, ALL_CAPITAL is a very powerful option and allows the maximum number of characters in BW.
Only a small set of exceptions is beyond it. For completeness, here is a quotation from the How To mentioned above:
The following characteristic values are not permitted and lead to errors in the system:
- Values that consist only of the character #. This is because blank entries in variables are marked with # (no entry = space).
- Values that begin with the character !. The system deletes these values.
- Control characters with the hex-display 00 to 1F (valid as of BW 3.0A if small letters were previously allowed).
- Characters are not allowed if they are represented by a hex code that has a small letter in one of the installed languages. You could, for example, only allow the capital Ö if no language is installed where the corresponding hex code is a small letter. In this example, it would not be possible to have Russian installed, because the hex code for capital Ö means a small sch in this example.
Special currency indicators
The symbols for the US Dollar ($), British Pound (£), Japanese Yen (¥) and the Euro are not allowed by default.
The symbol for the US Dollar ($) is allowed in BW customizing. The other special currency indicators lead to errors in the system.
In the InfoObject maintenance screens or in the transfer rules, you need to create a transfer routine for any currency fields and other characteristic values that have special currency indicators. This transfer routine converts the invalid characters into valid characters or character sets. This conversion is mandatory for currency fields. The currency codes, into which you need to change the currencies, are stored in BW customizing under General Settings -> Currencies -> Check Currency Codes (or ISO4217). The key must agree with the keys in the currency tables.
Most of the special currency indicators can be assigned to three-character currency codes. For example, the $ dollar symbol is converted into USD for US dollars or AUD for Australian dollars.
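The conversion described in the quotation could look roughly like the following transfer routine body. This is only a sketch: ZZZ is a placeholder for the field in the transfer structure, and the symbol-to-code mapping is an assumption that has to be decided per source system (as the quotation notes, $ may mean USD or AUD). In a non-Unicode system, whether the £, ¥ and € literals can be entered at all depends on the code page.

```abap
* Sketch only; ZZZ is a placeholder field name.
* The mapping below is an assumption, adjust it to your source system.
CASE TRAN_STRUCTURE-ZZZ.
  WHEN '$'.
    RESULT = 'USD'.
  WHEN '£'.
    RESULT = 'GBP'.
  WHEN '¥'.
    RESULT = 'JPY'.
  WHEN '€'.
    RESULT = 'EUR'.
  WHEN OTHERS.
* Anything else is passed through unchanged
    RESULT = TRAN_STRUCTURE-ZZZ.
ENDCASE.

* returncode <> 0 means skip this record
RETURNCODE = 0.
* abort <> 0 means skip whole data package
ABORT = 0.
```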
How to catch and eliminate invalid characters?
As we can see, most errors during data loads belong to the first three categories listed in the quotation.
How to deal with these invalid characters? The best option is to write ABAP code and place it in a routine in the transfer rules. I have seen many examples of such code here in the forum.
I somewhat like the one posted in this thread:
However, I don’t agree with the statement that “for characteristics to be used for navigation – lower case is not permitted”.
I ran an experiment:
- created 3 infoobjects (type CHAR) with the lowercase letters flag checked,
- made one of the infoobjects a navigational attribute and another one a compounding attribute,
- loaded master data with lowercase letters for the basis infoobject,
- created an ODS (BEx relevant) and a cube with this infoobject,
- loaded transaction data into the infoproviders,
- created 2 queries, one for the ODS and one for the cube.
The queries showed results without problems. Drilldown by the navigational attribute also worked fine. I had the ‘ALL_CAPITAL’ setting in RSKC.
I think that the main prerequisite for a successful data load is loading master data before transaction data.
So, when loading data into infoobjects with the lowercase letters flag checked, there is no need to translate everything to upper case.
Unfortunately, as far as I remember, all the code examples for removing invalid characters posted in the forums work with any setting in RSKC EXCEPT ALL_CAPITAL.
One More Code
The code below works with any setting in RSKC. It doesn’t compare the incoming text with the permitted characters. It simply replaces the following invalid characters with a space:
- forbidden symbols with hex codes 00 – 1F;
- a ! sign at the start of a string;
- a # sign that is the only character of a string.
The code assumes that the field for which this routine is created is of type CHAR. Moreover, if lowercase letters may come from the source system, the ‘Lowercase Letters’ flag should be checked for the infoobject.
Replace ZZZ with the name of the infoobject in the transfer structure.
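FIELD-SYMBOLS: <ic> TYPE x, <tc> TYPE c.
* Translation masks: pairs of (forbidden hex code, hex 20 = SPACE)
DATA: ch1(32) TYPE x VALUE
  '00200120022003200420052006200720082009200A200B200C200D200E200F20',
  ch2(32) TYPE x VALUE
  '10201120122013201420152016201720182019201A201B201C201D201E201F20'.
RESULT = TRAN_STRUCTURE-ZZZ.
* The only # sign is not permitted
IF STRLEN( RESULT ) = 1 AND RESULT(1) = '#'.
  RESULT(1) = ' '.
ENDIF.
* Exclamation mark is not permitted as a first symbol of the field
IF RESULT(1) = '!'.
  RESULT(1) = ' '.
ENDIF.
* Replace Invalid Characters by SPACE
* (the statements after "ASSIGN ch1 TO" were lost in this copy; the
* ASSIGN/TRANSLATE lines below are a reconstruction, not the original)
ASSIGN ch1 TO <ic>.
ASSIGN <ic> TO <tc> CASTING.
TRANSLATE RESULT USING <tc>.
ASSIGN ch2 TO <ic>.
ASSIGN <ic> TO <tc> CASTING.
TRANSLATE RESULT USING <tc>.
* returncode <> 0 means skip this record
RETURNCODE = 0.
* abort <> 0 means skip whole data package !!!
ABORT = 0.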
Unfortunately, the code will not work in a Unicode system, because a character there occupies more than one byte and the byte masks no longer line up as pairs of characters.
A limited solution might be built with the method CL_ABAP_CHAR_UTILITIES=>GET_SIMPLE_SPACES_FOR_CUR_CP, which can be used to get the Unicode representation of TAB, LF and CR.
Hopefully, some ABAPers will come up with a solution for Unicode systems as well.
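As one possible direction, here is an untested sketch that swaps the byte masks for a regular expression, which does not depend on how many bytes a character occupies. It assumes a release in which REPLACE supports the REGEX addition (NetWeaver 7.0 or later) and, as before, ZZZ is a placeholder for the field in the transfer structure. Note that the class [[:cntrl:]] may also match a few control codes outside 00 to 1F, such as 7F.

```abap
* Untested sketch for a Unicode system; ZZZ is a placeholder field name.
* Assumes REPLACE ... REGEX is available (NetWeaver 7.0 or later).
RESULT = TRAN_STRUCTURE-ZZZ.

* Replace control characters by a space. The backquoted string literal
* keeps the blank; a text field literal ' ' would lose it.
REPLACE ALL OCCURRENCES OF REGEX '[[:cntrl:]]' IN RESULT WITH ` `.

* The only # sign is not permitted
IF RESULT = '#'.
  CLEAR RESULT.
ENDIF.

* Exclamation mark is not permitted as a first symbol of the field
IF RESULT(1) = '!'.
  RESULT(1) = ' '.
ENDIF.

* returncode <> 0 means skip this record
RETURNCODE = 0.
* abort <> 0 means skip whole data package
ABORT = 0.
```

The checks for # and ! are character comparisons, so they behave the same in Unicode and non-Unicode systems; only the handling of the control characters had to change.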
The golden rules of data loading in BW:
- Use the ‘ALL_CAPITAL’ option in RSKC.
- Load master data first.
If the master data SIDs were created successfully, problems with ODS activation and loads into cubes may still arise due to:
- Lowercase letters. Solution: check the ‘Lowercase Letters’ flag for the appropriate infoobject, or apply the statement
TRANSLATE RESULT TO UPPER CASE.
in a routine in the transfer rules (it can be done in a formula too) if you don’t mind that all texts will be in upper case only.
- Forbidden characters listed earlier. Solution: apply a routine either posted in the forums or provided here.