Welcome to the problem
As usual understanding the meaning and the scope of the question is key for the answer to this question, recently filed by a customer.
So, what is the customer complaining about? What does he mean with “partition pruning“?
It’s a feature of the query optimizer to figure out, which partitions of a table or an index it can be safely left aside, when running the query.
It reduces work and thereby makes queries faster.
Easy example for this
Think of a table containing sales information together with the date of the recorded sales transactions.
Now this table is partitioned by the month (say, column “0CALMONTH”) and you want to see the sales data of the the first quarter of 2011.
You would probably come up with a SQL statement like this:
SELECT 0CALMONTH as MONTH, SUM(0AMOUNT) as SUM_AMOUNT
With the information given, the optimizer knows that all relevant data can be found in the partitions that contain the data for 201101, 201102 and 201103 – all other partitions can be ignored.
Nice, isn’t it?
So, how can we know that such stuff is happening in our database?
The explain plan contains this information:
SQL> explain plan for
SQL> select * from table(dbms_xplan.display);
15 rows selected.
The Pstart/Pstop columns tell us precisely, which partitions are used for the single steps – and we see, that only 3 of them (8,9,10) are actually used.
Back to the customer
Ok, so my customer was telling me, that exactly THIS was NOT happening on his system.
Since we use partitioning in BW on Oracle for a long time very successfully, I asked for a reproduction example, which the customer swiftly provided.
Based on what the customer told me I should run a BW query in RSRT and activate the checkbox to display the execution plans for the SQL statements (interestingly this one is called “Display Runschedule” in BW 7.0 onwards…).
A first mistake (or “to know your checkboxes and save everybody’s time”)
When I first ran the query, this was the SQL against the E-fact table which I assumed to be the problematic one (I removed some uninteresting stuff to make reading a bit easier):
====> QUERY 1
As I saw (and as you see now) partition pruning was fully operational on this query (check the bold line and the PSTART/PSTOP columns!) – so what was the customer talking about?
Well, the tricky part with this BW query started via RSRT is, that the OLAP layer does run multiple SQL statements in parallel to reduce the total runtim of queries.
That was the reason why the customer initially told me to check the “No parallel processing” box in the Execute+Debug dialog of RSRT.
A hint that I simply overlooked at first.
Et voilá, the problem!
Once I re-ran the query with parallel execution disabled, I was able to see the execution plan the customer was complaining about:
===> QUERY 2
Here we can see what the customer was complaining about: Oracle decides to read through ALL partitions, although we’ve seen that the BW query clearly contained the information about the time dimension.
So why is this happening?
Oracle does nothing wrong here!
If you look closely, then you see that the second statement didn’t contain a selection criteria on the partitioning column.
Do we have a bug in the BW OLAP code here?
Would these be BW queries on standard infocubes we would have a problem here.
Especially since we have TWO queries against the SAME table!!!
Fact is: this is not a query on a standard infocube.
This is a query against a so called inventory or non-cummulative cube.
Inventory Infocubes are a special data design implemented in SAP BW that allow to analyze stock information. Since you cannot just add up the amount of stock over time, such information required special handling by the OLAP processor.
So, what is done in this case is to store reference stock information (“markers”) and in addition to these the relative stock movements. That way OLAP can take the marker numbers and calculate backwards the amount of stock that was present for the given reference point in time.
This is of course far too short as an explanation, so make sure to check out sap note #1548125 for a far better one.
The consequence of this data design is that the OLAP engine needs to read the data in two steps and it does this indeed.
That’s why wee see four SQL queries in this case (instead of the usual two SELECT statements against F- and E-facttable)
1. Read of movement data from F- and E-facttable
2. Read of “marker” data from F- and E-facttable
Due to the nature of the “marker” data, OLAP needs to catch up ALL of them, regardless of the time restriction.
Therefore, no WHERE condition for the time dimension is created in the second kind of statements.
Since the E-facttable in this case was partitioned by the time dimension, partition pruning would require to have such a restricting WHERE condition.
But as this is missing, no partition pruning can be done – which is correct in this case.
Let’s have a closer look to the statements and the execution plans.
Review query 1 above. This is one of the queries issued by OLAP for gathering all the movement data.
Please note the restrictions on the time dimensions, especially the explicit condition on “F”.”SID_0CALMONTH” which allows for the partition pruning to take place.
Also see, that OLAP here is reading the movement data, denoted by the
condition “DP”.”SID_0RECORDTP” = 0.
As we see, in the execution plan above, ORACLE uses the restriction
on the partition key and only reads a single partition.
Now, let’s check the query 2 again.
With this query OLAP reads the “marker” data (or “reference points”) and we can see this by the predicate “DP”.”SID_0RECORDTP” = 1.
Important here is to see, that there aren’t any restrictions on the
time dimension left (except the fiscal variant, for rather formal
reasons). So, there is no way for Oracle to avoid any partition of the
Room for improvement!?
Ok, so far we’ve already learned a lot about the way OLAP gets data from the database when dealing with NCUM cubes.
But, as you may have already guessed, there’s still more to it.
The marker information always represent the most current state of data. No movement is done after the marker information. So basically, if a query should return the stock amount from a day in the past, OLAP takes the marker and calculates backwards.
Another way to put this is to say: there cannot be any information more recent or with a ‘higher‘ value in the time dimension. And in fact, marker information are always stored with the maximum/infinitive date 31.12.9999.
Now, since the E-fact table can only be partitioned by the time dimension, it’s assured that the marker data will always end up being stored in the MAXVALUE partition.
So, in theory it would be possible to create a statement that delivers the very same data to OLAP and still allow for the partition pruning to take place. For that it would ‘just’ be necessary to add the somewhat artificical selection predicate “F”.”SID_0CALMONTH” = 99991231 to the query.
I write “in theory” because this feature is not present in the OLAP-DBIF layer right now. Of course we’re discussing this internally, but whether or not this is going to be realized is completely open (for many good and valid reasons).
Once again, a rather lengthly blog post about some rather not so simple topics, but I still hope that you’ve enjoyed reading it and that you’ve learned a bit or byte.