Modelling of cube to avoid data load and Query performance Issues.
There are many simple things which we should take care while designing a cube to avoid performance issues.
1) Size of dimension table should not be greater than 20 % of fact table.
We would have read these at many places but we never think of it while designing a cube and even a small mistake can lead to dimension table size exceeding 20% of fact table.
Solution to this problem.
We should be aware of all the fields that have to be included in a particular dimension and their relationships. Many times we try to group the same type of characteristics together for example sales document and line item number.
Size of dimension table would be very large as different value of sales doc number and item would every time create a new entry in dimension table (size would exceed 20% easily).
A better approach would be to have sales doc number as line item dimension in one dimension and item in other dimension.If we don’t take this feature into account we would face performance issues during data load time. Conversion of characteristic value into SID would take a lot of time.
If we have large amount of data coming in daily to the cube, then the load would continue for hours if we don’t design dimesnsion tables properly.
We can use program SAP_INFOCUBE_DESIGN to find out which dimensions have size more than 20%.Dimension tables which have size more than 20% of fact table would be marked in red and we shall then re-design these red marked dimension.
Many times we face scenario when we are not sure about the relationship between characteristics. The best way in this case is to logically arrange the characteristics in dimension and then load about a minimum of 200,000 records and then check the size of each dimension tables. If the size of any imension table is large(more than 20%) then we shall try to identify the relationship between the characteristic from the data in the cube and then re-design the cube.
We can load fewer amounts of records also but then we would not get a clear picture of dimension table sizes.
2) Use of line item dimension.
In case of any characteristic which we think would have a large number of distinct value and would make the dimension table large, one should always go for line item dimension (ex.sales doc number).
Whenever we make any dimension as line item, dimension tables is not generated, only SID table is generated for that dimension which improves the performance during data load time.
3) Use of high Cardinality
We have a dimension which has many characteristics and the size of this dimension table is more than 20% then we go for high cardinality.
We should be always careful before making any dimension high cardinality. We should first of all look whether we can re-design the cube and arrange the characteristics in such a way that size of dimension table decreases.
When we think we don’t have any other option left and cube can’t be re-designed effectively then we shall go for high cardinality.
Query performance decreases when we go for high cardinality and if we are doing a drill down on any of the characteristics involved in high cardinality dimension, it would take a bit of time.
Performance of data load would improve as dimension table would not be generated but it would degrade query performance. It’s a kind of catch 22 situation where we would loose some and gain some.
Making a dimension high cardinality should always be the last option.