Dimension design: A different perspective
Objective: The objective of this post is to simplify the understanding on dimension designs of an infocube and to decide upon the dimensions based on the repetition of the data held in the dimension tables.
Pre-requisites: An infocube is already created and active, and filled will data, which will be used for analysis of dimension tables.
Dimension to Fact Ratio Computation: This ratio is a percentage figure of the number of records that exists in the dimension table to the number of records in fact table or what percentage of fact table size is a dimension table. Mathematically putting it down, the equation would be as below:
Ratio = No of rows in Dimension table X 100 / No of rows in Fact Table
Dimension Table Design Concept: We have been reading and hearing over and over again that the characteristics should be added into a dimension if there exists a 1:1 or 1:M relation and they should be in separate dimension if there exists a M:M relation. What is this 1:1 or 1: M? This is the relation which the characteristics share among each other.
For instance if one Plant can have only one Storage Location and one storage location can belong to only one plant at any given point of time, then the relation shared among them is 1:1.
If 1 Functional Location can have many equipment but one equipment can belong to only one functional location then the relation shared between the functional location and Equipment is 1:M.
If 1 sales order can have many materials and one material can exist in different sales orders then there absolutely is no dependence among these two and the relation between these two is many to many or M: M.
Challenges in understanding the relationship: Often we SAP BI consultants depend on the Functional consultants to help us out with the relationship shared between these characteristics / fields. Due to time constraint we generally cannot dedicate time to educate the functional consultants on the purpose of this exercise, and it takes a lot of time to understand this relationship thoroughly.
Scenario: An infocube ZPFANALYSIS had few dimensions which were way larger than the preferred 20% ratio. This had to be redesigned such that the performance was under 20% ratio.
This ratio could be either manually derived by checking the number of entries in the desired dimension table (/BIC/D<infocube name><dimension number>) to the fact table (/BIC/F<Infocube Name> or /BIC/E<Infocube name>) or a program SAP_INFOCUBE_DESIGNS can be executed in SE38 which reports this ratio for all the dimensions, for all the infocubes in the system.
SAP_INFOCUBE_DESIGNS:
We can find from the report that the total number of rows in the fact table is 643850. Dimension 2 (/BIC/DZPFANLSYS2) has around 640430 rows, which is 99% (99.49%)of the fact table rows and Dimension 4(/BIC/DZPFANLSYS4) has around 196250 rows, which is 30% (30.48%)of the fact table rows.
Infocube ZPFANLSYS:
Approach:
Step 1: Analysis of the dimension table /BIC/DZPFANALSYS2 to plan on reducing the number of records.
/BIC/DZPFANLSYS2
Fact table:
Dimension table holds 1 record more than the fact table.
View the data in the table /BIC/DZPFANLSYS2 (Table related to Dimension 2) in SE12 and sort all the fields EXCEPT DIMID. This sorting will help us spot the rows which have repeated values for many columns, which will eventually lead to understanding the relationship between the characteristics (columns in dimension table).
Identifying the relationships:
Once the sorting is done we need to look out for the number of values that repeat across the columns. All the records which repeat could have been displayed in a single row with one dimension id assigned if all the columns had same data. The repetition is a result of one or more columns which contribute a unique value to each row. Such columns if removed from the table then the number of rows in the table will come down.
In the below screenshot I’ve highlighted the rows in green that were repeating themselves with new dimension IDs, as only 2 columns SID_ZABNUM and SID_0NPLDA have new values for every row. These two columns having new values for every row have resulted in rest of the columns repeating themselves and in turn increasing the data size in the dimension table. Hence it can be easily said that these two columns do not belong in this dimension tables, so the related characteristics (ZABNUM and 0NPLDA) need to be removed out of this dimension.
Few rows could be found which repeat themselves for most of the columns, but have a new value once in a while for some columns, as highlighted in yellow in the below screenshot. This indicates that these columns share a 1:M relation with the rest of the columns with repeated rows and these could be left in the same dimension.
Conclusion: The columns marked in green belong to this dimension tables and the columns marked in red needs to be in other dimension tables.
Step 2: Create a copy infocube C_ZPFAN and create new dimensions to accommodate ZABNUM and 0NPLDA.
ZABNUM was added to dimension C_ZPFAN8 and 0NPLDA was added to C_ZPFAN7. These were marked as line item dimensions as they have only one characteristic under them.
Analysed the issue with dimension 4 in the similar way and changed other dimensions to help the situation.
Post changes, loaded the data into the copy infocube C_ZPFAN and found the number of records in the dimension table /BIC/DC_ZPFAN2 to be 40286.
Ratio: 40286 / 657400 * 100 = 6.12 %
SAP_INFOCUBE_DESIGNS:
Dimension2 of the copy infocube: /BIC/DC_ZPFAN2
Even now there a few repeated rows and columns, but the ratio is within 20%. We can create up to 13 dimensions, but it is always better to keep a dimension or two free for future enhancements.
Hope this was helpful.
I have read lot of documents on the same topic but the way you articulated it with the help of real time example makes it unique.
Good job..Thanks for sharing with us.
Regards,
AL
Hi Anshu,
Glad that it helped, and true I had searched a lot of documents myself when I had faced the issue, but this method, even though not much polished, worked like a charm.
Thank you for the encouragement π
Manohar. D
Yah i agree with Anshu, Really well presented and Thanks Manohar for sharing.
Gr8 job.
Regards
KP
Hi Manohar,
Clear explaination. Good one !!!
Best Regards,
Chandrasekhar Kalluru
Thanks π
Hello Manohar,
Very good representation of the document ! And honestly for me you have provided new way to look into the subject/prospect . Thanks for Sharing . Keep it up !
Regards
YN
Hi Yogesh,
I'm happy that I could figure out a new perspective on dimension tables and share it. Such encouragements make me feel that it was worth spending time on elaborating and documenting it. I request you to click on LIKE or RATE the document whenever you find any document useful, it awards the author with few points, and even though I do what I do for the joy of sharing a few points along with your kind words wouldn't hurt π
Cheers,
Manohar.D
Hi Manohar bhai,
Finally one of the SAP BW expert I have seen in my career has started sharing knowledge in SCN. Super explanation.
Regards,
Venkat
Hi Venkat,
Thank you for all those good words. I still maintain that I'm not an expert, but learning as much as I can and as best as I can π Hope you can use this method of dimension design when required.
Regards,
Manohar. D
Hi Venkat,
Even we are awaiting for your BW blogs since long time. Hopefully you are busy in HANA world and HANA community π
Best Regards
Venkat...
Hi Manohar,
Very nice document about performance effective cube design with detailed steps.
As metioned above by venky, superb explanation by SAP Expert.
Regards,
Venkata Naresh
Hi Venkat,
Thanks for those good words. Hope this helps in custom infocube designs. If you sort the dimension tables of a standard infocube then you can see a similar effect. And again I'm no expert.. π
Regards,
Manohar.D
Hi Manohar,
Very nice job.
ST13 can also help us identify the characteristics.
BR,
A
Very interesting document Manohar Delampady. Thanks for sharing!
Regards,
Carlos
Hi Manohar,
This is the one of the best documents I have ever read in this context.
A big thanks for sharing such a nice document. Great job. Keep it up π .
Regards,
Nitesh
Clear and Eloquent. This doc deserves the attention.
thanks Manohar
Nice one..
Hi Manohar,
Good work appreciated always... Tnx for sharing..
And the appreciation motivates one to look for an opportunity to work BETTER.. Its a vicious cycle, I say. π Thank you..
Hi Manohar,
Really a crystal clear explaination for Dimension design.. It is very much helpful for the people who are in learning Phase of SAP BW...
Great job... π π
Regards,
Prasad Mantri
I believe the post data load analysis approach is better reserved for those who have been developing dimension tables / infocubes for some time. Learners should stick to understanding the entity relationship and then developing the infocubes.
Cheers,
Manohar. D
Hi Manohar,
Excellent information on dimension design optimization, Thanks for sharing
this and hope many in future π .
Regards,
Venkat
nice one π
Hi Manphar,
Thanks a lot for the document. Got a crystal clear idea π .
Regards,
Malathi
Happy to know Malathi, please like / rate any document that you thought was good. It awards the author a few points and encourages to come up with more documents.
Thank you,
Manohar.D
Sure manohar.
Regards,
Malathi
This is great job! Thanks for sharing!
Regards,
Sheen
Thanks π !
Best Regards,
Mir
Really Helpful.
Very Nicely Put.
Regards,
Gaurav
Good one...
Thanks,
Kavitha
Nice Document!
Good info..!
Hi Manohar,
Really worth Information about Dimension Design. I learned alot π π
Keep posting like these blogs π
Regards,
Gokulkumar RD
Excellent !!!!!
Clear and crystal expalnation.....
Thanks,
Shyamala.V
Hi MD,
Great Job, explained too technically Bravo!!!
--SumanT
Good Explanation.Clean and Crystal clear
Hi Manohar,
Really this document has all clear information about Infocube design.
Thanks for sharing.
Somesh.
Hi Manohar,
First of all thank you so much for sharing this KT.
I am faced with a similar issue and following your document guidelines.
Would you like to suggest what can be done if two of the dimensions reaches 99% of fact table and 13 customer dimensions already created.
I thought of creating two new dimension and include that as a line item but I am unable to do that as 13 dimensions have already been reached.
Cheers Aparajit
Hi Aparajit,
Thank you for the encouragement. I must say you are stuck between a rock and a hard place here as you've already consumed all the dimension tables. The only way out of such scenarios is to evaluate if some of the characteristics are included in the infocube unnecessarily (not being used in the report,not being referred to in any other look up etc) and reducing one such dimension to accomodate the high dimension ratio characteristic. If that is not going to work, or if you have already checked that option out then you could split this one big cube into two cubes with fewer dimensions but one key characteristic for the data to make sense when you finally unite the two cubes data using a multiprovider. Please explore on creating two new cubes to replace this one big cube.
Hope you find a way, and share with us what helped you π
Good Luck,
Manohar. D