Using a Bath Analogy to Understand OLAP System Growth for BW, BWoH, Native HANA
The following applies to BW, BWoH, Native HANA and any OLAP system that uses an LSA or LSA++ design to stage data at least once and then report on it.
A useful analogy for understanding system size and the Information Lifecycle Management (ILM) process as it impacts OLAP systems is to think of the OLAP system as a bath. There is some water in it: that's your data, and it represents your current system size. The system has data flowing in from various source systems, so the taps of the bath are on. Perhaps the taps are just trickling, perhaps they're at full blast, but either way, if you do nothing, the bath will eventually overflow (there is no overflow pipe here). So you take out the plug of the bath to represent how much data you're able to dispose of.
This analogy is useful because it becomes clear that it is not the amount of water in the bath (current system size) or the bath size (maximum system size) that is really important. A bigger bath might buy time (and cost more in licensing and maintenance in the meantime), but what really matters is how the inflow and outflow amounts compare, now and in the future.
If the outflow exceeds the inflow, eventually the bath will empty. If the inflow exceeds the outflow, eventually it will overflow. For sustainable system usage, you need inflow and outflow to be about the same. If someone sticks a fire hose over the edge of the bath and turns it on (perhaps there is a new rollout or a new source system and lots of new transactions are arriving), then you're going to need a bigger plughole.
Now, in reality, our OLAP system doesn't just take data in from outside, of course. We also generate data inside the system itself. This comes from writing logs and from enhancing and replicating data as it moves through a typical LSA design. There isn't really a good analogy for this with the bath; it is as if water were being generated inside the bath itself from the existing water. Perhaps as the water lands in the bath from the taps, it roils and bubbles and grows in volume.
In reality, the outflow is a bit more complicated too. We can delete data entirely (e.g. delete logs using housekeeping), and we can relocate data to another storage medium where we can still access it in some way. Depending on your OLAP system, this external storage medium could use NLS, Dynamic Tiering or the Non-Active Data concept. Perhaps we can think of two plugholes: one going to the drain (deletion) and one draining off to another bath (another storage medium).
Pushing the analogy this far, we can sketch out the inflows and outflows like this:
Green arrows are data inflows, red arrows are data outflows. Now, we know we want inflow and outflow to be about the same, which is another way of saying that inflow divided by outflow should be approximately equal to one. Ideally, inflow divided by outflow will be less than one as often as possible, because the times when this is true are when the system is shrinking. From the above diagram we can produce the following equation, where (a) is the inbound data, (b) is the data created internally, (c) is the data relocated to another storage medium and (d) is the data deleted. We'd like this equation to be true as often as possible:

(a + b) / (c + d) ≤ 1
This equation shows what you need to focus on to manage system size. Some observations can be made:
Minimising the data created internally (b) is a good idea. In typical LSA designs, the data created internally (b) will greatly exceed the inbound data (a), because you're likely to be replicating it at least two or three times. This is an important point: it means that less staging and more virtualisation is desirable, which is of course one of the tenets of LSA++. Implementing LSA++ therefore makes managing system size easier.
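To see how much the replication factor matters, here is a back-of-the-envelope sketch in Python of the outflow needed just to keep a system flat. The volumes and layer counts are illustrative assumptions, not measurements from a real BW system:

```python
# Rough sketch of how staging depth drives the outflow needed to keep the
# system a constant size. All figures are made-up, illustrative volumes.

def required_outflow(a_gb, staging_layers):
    """Outflow (c + d) needed to balance inflow when each persisted
    staging layer stores roughly another full copy of the inbound data."""
    b = a_gb * staging_layers  # data created internally by replication
    return a_gb + b            # total inflow (a + b) that must drain away

a = 100  # GB per month of inbound data (assumed)

# Classic LSA with three persisted layers: b is 3x a, so 400 GB must be
# deleted or relocated every month just to stand still.
print(required_outflow(a, staging_layers=3))  # 400

# A leaner LSA++ design persisting the data only once cuts that to 200 GB.
print(required_outflow(a, staging_layers=1))  # 200
```

Halving the number of persisted copies halves the housekeeping burden, which is the multiplier effect described above.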
Maximising the volume of data deleted (d) is also a good idea. This means implementing good housekeeping, of course (to address the logs written), but also making sure that transactional data is really being used in reports. Digging into this far enough leads to the following questions:
- Is transactional data being used at all in reports? This should be easy enough to ascertain by examining report usage.
- Is transactional data really needed at the level of granularity that it is stored at?
- Are all data subsets used? For example, are all currency types really in use?
- Is there any spurious data being stored that could never be reported on? You may be surprised how many data stores contain records with blank fiscal periods or blank company codes that have built up over time. These may never appear in any report, and could also be deleted.
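Pulling the whole model together, a small simulation (with purely illustrative figures) shows how the balance between total inflow (a + b) and total outflow (c + d) drives system size over time:

```python
# Illustrative bath model of OLAP system size: two inflows and two outflows
# per period. All volumes (GB per month) are made-up assumptions.

def simulate(size, months, a, b, c, d):
    """Track system size given inbound data (a), internally generated
    data (b), data relocated to other storage (c) and data deleted (d)."""
    history = [size]
    for _ in range(months):
        size = max(size + (a + b) - (c + d), 0)  # net change per month
        history.append(size)
    return history

# Balanced system: (a + b) / (c + d) == 1, so the size stays flat.
flat = simulate(size=1000, months=12, a=50, b=150, c=100, d=100)

# Unmanaged system: heavy staging (b = 3 * a) and little housekeeping,
# so the bath fills by 150 GB every month.
growing = simulate(size=1000, months=12, a=50, b=150, c=20, d=30)

print(flat[-1])     # 1000: unchanged after a year
print(growing[-1])  # 2800: nearly tripled after a year
```

The point of the sketch is that the trajectory is entirely determined by the ratio, not by the starting size: a bigger bath only delays the overflow.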