While Oracle DBAs are used to be exposed to features like segment fragmentation (that is, although data is removed from tables/indexes these segments still allocate the same amount of space which won't be automatically released to the freespace) MaxDB users don't have to think about this.
In fact, MaxDB users cannot do a single thing to change the way the data is stored. Or could they?
They could. Just by sorting the data before inserting them into the table, quite some space can be saved.
I've created two sets of identical data. One of them will get loaded with data in a sorted order (SORTLOADED) the other one will get loaded in a random order (UNSORTLOADED). Anyhow, both tables contain the very same data afterwards.
The table structure is simple and the same for both tables:
|
|
Next I generate and load the test data.
Now this is actually a part that is not that easy with MaxDB. In Oracle you can do things like "CREATE TABLE AS SELECT ... ORDER BY ..." and thereby create arbitrarily sorted data sets.
MaxDB does not support sorts for subselects (as they are usually meaningless in relational contexts). Therefore I needed to create the data sets somewhere else and I went for MS EXCEL. Since creating the test data is not that interesting in itself, I'll skip this for now - if you're interested in this, let me know.
So, now each table contains the exact very same rows, right?
Of course we can doublecheck this:
|
Now let's check how equal the two tables really are in terms of storage. Let's use what we've learned in "MaxDB space oddities" and run:
|
Hmm... the unsorted load takes eight pages more in this example, that's 16% more space usage.
Let's check the storage of these tables a bit more detailed.
I use a systemtable called ‘tablestatistics' here, but I've to give a warning on it: whenever you select data from this table, the data is gathered from the tables you look up. For big tables this can run very very long and may block other accesses meanwhile.
Therefore it's better to use the ‘tablestoragedetails' resp. ‘indexstoragedetails' instead as these both tables deliver their data based on samples.
For the easier comparison I reformatted the result a bit:
| SORTLOADED | UNSORTLOADED | Difference |
Rows | 10000 | 10000 | 0 |
Index levels | 1 | 1 | 0 |
Index pages | 1 | 1 | 0 |
Leaf pages | 40 | 48 | 8 |
Min separator length | 3 | 3 | 0 |
Max separator length | 4 | 4 | 0 |
Avg separator length | 3 | 3 | 0 |
Min key length | 3 | 3 | 0 |
Max key length | 4 | 4 | 0 |
Avg key length | 3 | 3 | 0 |
Min row length | 26 | 26 | 0 |
Max row length | 27 | 27 | 0 |
Avg row length | 26 | 26 | 0 |
Min rows per page | 123 | 135 | 12 |
Max rows per page | 260 | 253 | -7 |
Avg rows per page | 250 | 208 | -42 |
Space used in all pages (%) | 95 | 80 | -15 |
Space used in index pages (%) | 9 | 11 | 2 |
Space used in index pages (%) max | 9 | 11 | 2 |
Space used in index pages (%) min | 9 | 11 | 2 |
Space used in leaf pages (%) | 97 | 81 | -16 |
Space used in leaf pages (%) max | 99 | 99 | 0 |
Space used in leaf pages (%) min | 48 | 53 | 5 |
Space used in root page (%) | 9 | 11 | 2 |
Used pages | 41 | 49 | 8 |
Ok, so what do we see here?
Obviously the difference in the total number of pages is due to the difference in the number of 'Leaf pages' - those pages that contain the records completely.
Looking to the 'Space used in leaf pages (%)' row we immediately spot that for UNSORTLOADED there is 16% less space usage per block compared to the SORTLOADED table.
Of course if you don't fill up the pages completely you will need more of them to save the same amount of data. Therefore we find 42 rows less on average in the pages of UNSORTLOADED than we do in the pages of SORTLOADED.
To answer this question, we've to remember how B*-Trees are build up.
I'm not going to reinterate on B*-Trees in general, but just like to point out the important facts about them, that are necessary here to explain the effect:
To fullfill these requirements, an insert to a B*-Tree usually is done like this:
Due to the mentioned features of B*-Trees there is one case where no rebalancing is done, because it's never necessary: inserts in ascending key order.
When the key value increases with each insert, than a rebalancing with the right neighbour page is not possible - there simply is no further right neighbour. And since the data is correctly sorted anyhow, all what's left to do is to update the index pages.
That way a lot of time can be saved and the pages get filled up much denser than it would be the case with random inserts.
Therefore MaxDB developers decided to implement a kind of 'shortcut' - whenever you insert a row with a 'higher' or 'more right' key than the currently 'highest' or 'rightmost' key in the table, simple add the row and don't do balancing.
This feature is also present in other DBMS, e.g. Oracle. For Oracle's B*-Tree-Indexes there are two types of page/block-splits: the ‘usual' 50:50 block split, which is just the standard index balancing and the 90:10 block split.
By the way: inserting the sorted data in descending order also leads to the 'bigger' table, since there is no special handling of sequential descending inserts.
If you ever did a system copy with R3Load and checked the table sizes in the target system against that in the source system, then you might have asked yourself: did I loose any data? The tables are smaller than in the source system! If you then are aware of the fact that R3load reads and writes it's data sorted by the primary key, then you now know the answer: all data is there, just saved in a denser way.
It's also worth noting that such dense tables need more balancing operations when you start inserting data into the middle of the table. Since there is not much freespace in the pages that could be used for the rebalancing, new pages need be added to the B*-Tree and that causes the rebalancing of whole subtrees. For that reason I would not say, that having such dense B*-Trees is actually something to wish for in a OLTP system.
Hope you liked this blog as the first "Space oddities" blog from myself.
Lars
p.s. While researching for this I came across another blog entry called "A space oddity": http://richardfoote.wordpress.com/2008/01/18/introduction-to-reverse-key-indexes-part-iii-a-space-od... by Richard Foote, who seems to be really deep into Oracle index internals.
So don't leave his blog unvisited.
Seems that this title is somehow popular for B*-Tree space management blogs.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
36 | |
25 | |
17 | |
13 | |
8 | |
7 | |
7 | |
6 | |
6 | |
6 |