Multiple attribute values for one line of data: how to avoid infosets in BW without HANA
I’d like to share my experience and to analyze possible solutions for a BW task which I’ve met on several retail projects.
I think this problem is quite interesting because it’s one of the examples of the general idea: how can we avoid using joins (infosets) on BW systems without HANA.
Let’s imagine that we have POS data (receipts), and every position of every receipt can be related to many(!) promo-actions or discounts. It means that several actions can be applied to the same receipt position. For example, one of them was season discount, the second one – loyalty action from CRM, the third one was bonus-buy action and so on. This data is loaded to BW from POS system (the concrete method is not important in this case, it may be POSDM or abap-proxy or something else).
So, we have the following data:
Users want to filter sales data in the reports by any action (one or many). And of course in the report we have to show full amount from the receipt position, regardless of filter by promo. For example, if user filter report by action P001 than the amount in the report should be 100, but not 33,33.
So, the question is: how to organize such data in BW, which data structure should we make?
There are several different variants:
Variant A. In one infoprovider (infocube or dso) we store receipt positions with quantity, amount and so on, in the second infoprovider we store combination [Receipt Number, Position Number, Promo ID]. Than we combine them by an InfoSet and make a Bex-query on it. This is standard BW method and the easiest data structure, but it has one problem – it doesn’t work on large data volume.
If we have a BW on HANA, then we can make two ADSO and a composite provider which realize join between them. If we make this structure correctly, then this join will work on HANA level and it will be very fast.
But if we work with BW on classis DB, than using infosets will lead to join between two very large tables (because both of them consists position of receipt), and this join will work extremely slow. The size of every table can be tens of millions records and even more. So Bex-query in such case will work for hours or will not work at all.
Variant B. The second variant is to use one infocube and to multiply every line for receipt position during loading data to the cube as many times as the number of actions for this line. In my example it will be:
In such variant we need to use special aggregation for receipt key figures (such as quantity or amount), otherwise they will be multiplied in reports (300 instead of 100).
It is rather bad variant too, because using special aggregation by the dimensions “receipt” and “receipt position” will not allow us to use aggregates. In such case aggregated reports will use the original infocube instead of it’s aggregates, special aggregation will be calculated at application server and the report’s speed will be very bad. Moreover, in such variant we need to multiply number of lines in the infocube, which is already very large. This will increase the load time and disk space used.
Variant С. The variant which I finally chose. This variant is based on the following idea: we make an additional service dimension which stores a combination of promo IDs for every receipt’s position. The code of this dimension is the concatenation of promo IDs separated by some special symbol, for example, by “&”. This dimension is filled during data load and is used for filtering data by any combination of promo-actions.
So, in this variant we should make the following steps:
- Make an organizational restriction about maximum number of promo-actions allowable for receipt position, for example, five.
- Make a new set of characteristics: ZPROMO1, …, ZPROMO5 type CHAR10 like 0RT_PROMO character.
- Make an additional characteristic ZPROMOALL type CHAR60.
- To the end-routine of transformation of POS data (to the infocube or to DSO) we add the code which fill this characteristic.
As a result we get the cube of the following structure:
On the BEX query level:
- On the ZPROMO1 dimension we make input-ready variable ZPROMO_V1 (several single values).
- Make some dummy hidden key figure, inside it we restrict ZPROMO1 dimension by this variable.
- Restrict ZPROMOALL dimension by the exit-variable ZVAR_PROMOALL (below is the full abap-code for it). In this exit we find all values of dimension ZPROMOALL which consist at least one of promo IDs selected by the user in the ZPROMO_V1 variable. This set of ZPROMOALL values is used for filtering.
As a result, user can set filter by any number of promo-actions and the query result will contain all receipt positions, which are related to one or more action from the users’ filter. Then, is necessary, user can add to report separate dimensions “Promo1”, …, “Promo5”. This approach does not require any special aggregation, allows us to use aggregates and therefore provides high reports performance.
So, the best variant is to use BW on HANA and to forget about such problem with joins:) But there are still BW implementations on classic DB, so this information may be useful. I used the last variant (var. C) on two such projects and it really worked.
ABAP-code for user-exit:
if i_step = '2'. Types: begin of ty_promo_all, code type /bic/oizpromoall, end of ty_promo_all. Data: t_promo_all type sorted table of ty_promo_all with UNIQUE key code, s_promo_all type ty_promo_all, s_str type string, l_promo_all type /bic/oizpromoall. clear t_promo_all. loop at i_t_var_range into loc_var_range where vnam = 'ZPROMO_V1'. if loc_var_range-low is not initial. concatenate '%' loc_var_range-low '%' into s_str. select /BIC/ZPROMOALL into l_promo_all from /BIC/SZPROMOALL where /BIC/ZPROMOALL like s_str. clear s_promo_all. s_promo_all-code = l_promo_all. collect s_promo_all into t_promo_all. endselect. endif. endloop. loop at t_promo_all into s_promo_all. clear l_s_range. l_s_range-sign = 'I'. l_s_range-opt = 'EQ'. l_s_range-low = s_promo_all-code. append l_s_range to e_t_range. endloop. endif.