To Compress or not to compress
Many a time , we come across a lot of cubes which have not been compressed for a long time and have data in them which is quite aged…
We also have queries on them which perform badly. One immediate response is to compress the cube in the hope that it reduces query response times.
When should you compress :
1. Is your cube partitioned
If your cube is partitioned then it makes a lot of sense compressing the cube – this way the data is moved into the E Fact table which is partitioned.
2. Are there a lot of negative records that add up to zero.
Lets say you had a sales order with the following structure :
SO| Cust | Amount | Qty
1234 | ABC | 100 | 10
this then got cancelled leading to another record in the cube
1234 | ABC | -100 | 10
you now have two records in the cube which do not contribute anything but data volumes for your query – in this case a compression of the cube with zero elimination will remove this record – making your cube free from any records which net up to zero
3. If you have too many requests in the F Fact table –
It might be worthwhile to compress to reduce load times. This is because the F Fact table is partitioned by request – which means that each request is a separate partition. If you have more than 60 to 70 requests – then it is about 60 – 70 partitions in the cube making loading take more time because each request is a new partition. When you compress the cube ( even if the cube is not partitioned ) – all the data gets merged into one single table – this will make your indexes better – leading to better load performance and possibly better query performance.
what you should take care when partitioning :
1. Anticipate database logs – in a particular case – we compressed a cube with about 50 million records without any advance warning to the BASIS team and almost brought down the disaster recovery system because of the log volumes we generated.
Compression executes a merge statement and it is better advised to watch out for logs.
2. Do not assume that compression is going to improve query performance – it is one of the tools available to you to do so but does not necessarily increase performance or is it a substitute for bad design.
3. If required – compress the cube even if the cube is not partitioned and then repartition the cube if required.