MaxDB - Space oddities, take II

lbreddemann · ‎02-03-2009

After the tremendous (and very unlikely) success of the first MaxDB: Space oddities blog, I immediately decided: this must be the voice of the market asking for more.

So here you get it.

While Oracle DBAs are used to be exposed to features like segment fragmentation (that is, although data is removed from tables/indexes these segments still allocate the same amount of space which won't be automatically released to the freespace) MaxDB users don't have to think about this.

In fact, MaxDB users cannot do a single thing to change the way the data is stored. Or could they?

They could. Just by sorting the data before inserting them into the table, quite some space can be saved.

Don't believe me? Behold and believe!

I've created two sets of identical data. One of them will get loaded with data in a sorted order (SORTLOADED) the other one will get loaded in a random order (UNSORTLOADED). Anyhow, both tables contain the very same data afterwards.

The table structure is simple and the same for both tables:

create table "LARS"."SORTLOADED"(
               "KCOL1" INTEGER not null,
               "COL2" VARCHAR (10) ASCII,
               "COL3" VARCHAR (30) ASCII,
primary key ("KCOL1"))

create table "LARS"."UNSORTLOADED"(
               "KCOL1" INTEGER not null,
               "COL2" VARCHAR (10) ASCII,
               "COL3" VARCHAR (30) ASCII,
primary key ("KCOL1"))

Next I generate and load the test data.

Now this is actually a part that is not that easy with MaxDB. In Oracle you can do things like "CREATE TABLE AS SELECT ... ORDER BY ..." and thereby create arbitrarily sorted data sets.
MaxDB does not support sorts for subselects (as they are usually meaningless in relational contexts). Therefore I needed to create the data sets somewhere else and I went for MS EXCEL. Since creating the test data is not that interesting in itself, I'll skip this for now - if you're interested in this, let me know.

The unequal twin tables

So, now each table contains the exact very same rows, right?

Of course we can doublecheck this:

select *
from
      sortloaded sl
where
       not exists (select 'X'
                   from
                          unsortloaded usl
                   where 
                          sl.kcol1= usl.kcol1)
Statement 'select * from sortloaded...' successfully executed in 1.242 seconds. 
  - No result

Now let's check how equal the two tables really are in terms of storage. Let's use what we've learned in "MaxDB space oddities" and run:

select
      tablename,
      entrycount,
      trunc(treeindexsize/8) TREE_PAGES,
      trunc(treeleavessize/8) LEAF_PAGES
from 
      tables , files 
where 
      schemaname=user 
and fileid=tableid
and tablename in   ('SORTLOADED', 'UNSORTLOADED')

TABLENAME      ENTRYCOUNT      TREE_PAGES      LEAF_PAGES
----------------------------------------------------------
SORTLOADED          10000               1              40
UNSORTLOADED        10000               1              48

Hmm... the unsorted load takes eight pages more in this example, that's 16% more space usage.

What is going on here?

Let's check the storage of these tables a bit more detailed.
I use a systemtable called ‘tablestatistics' here, but I've to give a warning on it: whenever you select data from this table, the data is gathered from the tables you look up. For big tables this can run very very long and may block other accesses meanwhile.

Therefore it's better to use the ‘tablestoragedetails' resp. ‘indexstoragedetails' instead as these both tables deliver their data based on samples.

For the easier comparison I reformatted the result a bit:

	SORTLOADED	UNSORTLOADED	Difference
Rows	10000	10000	0
Index levels	1	1	0
Index pages	1	1	0
Leaf pages	40	48	8
Min separator length	3	3	0
Max separator length	4	4	0
Avg separator length	3	3	0
Min key length	3	3	0
Max key length	4	4	0
Avg key length	3	3	0
Min row length	26	26	0
Max row length	27	27	0
Avg row length	26	26	0
Min rows per page	123	135	12
Max rows per page	260	253	-7
Avg rows per page	250	208	-42
Space used in all pages (%)	95	80	-15
Space used in index pages (%)	9	11	2
Space used in index pages (%) max	9	11	2
Space used in index pages (%) min	9	11	2
Space used in leaf pages (%)	97	81	-16
Space used in leaf pages (%) max	99	99	0
Space used in leaf pages (%) min	48	53	5
Space used in root page (%)	9	11	2
Used pages	41	49	8

Ok, so what do we see here?

Obviously the difference in the total number of pages is due to the difference in the number of 'Leaf pages' - those pages that contain the records completely.
Looking to the 'Space used in leaf pages (%)' row we immediately spot that for UNSORTLOADED there is 16% less space usage per block compared to the SORTLOADED table.
Of course if you don't fill up the pages completely you will need more of them to save the same amount of data. Therefore we find 42 rows less on average in the pages of UNSORTLOADED than we do in the pages of SORTLOADED.

But why is it like this?

To answer this question, we've to remember how B*-Trees are build up.

I'm not going to reinterate on B*-Trees in general, but just like to point out the important facts about them, that are necessary here to explain the effect:

B*Trees are always balanced, that is: all leaf pages have the same distance to the root page.
The filling level of all pages is approximately equal
On leaf level, the pages are connected in ascending key order, e.g. the lowest value is the far leftmost, while the highest value is the far rightmost.

To fullfill these requirements, an insert to a B*-Tree usually is done like this:

Find the page to insert the record to
Check whether the filling of the current page will fit the filling of its neigbours after the INSERT has been done.
If not, rebalance this part of the B*-Tree so that the equal filling condition is met.
This rebalancing works by shifting rows from one page to the other and adapting index page above accordingly.
Obviously this can only work, if there are any rows that could be moved, e.g. because the new value is ‘in-between' the two pages.

Due to the mentioned features of B*-Trees there is one case where no rebalancing is done, because it's never necessary: inserts in ascending key order.

When the key value increases with each insert, than a rebalancing with the right neighbour page is not possible - there simply is no further right neighbour. And since the data is correctly sorted anyhow, all what's left to do is to update the index pages.

That way a lot of time can be saved and the pages get filled up much denser than it would be the case with random inserts.
Therefore MaxDB developers decided to implement a kind of 'shortcut' - whenever you insert a row with a 'higher' or 'more right' key than the currently 'highest' or 'rightmost' key in the table, simple add the row and don't do balancing.

This feature is also present in other DBMS, e.g. Oracle. For Oracle's B*-Tree-Indexes there are two types of page/block-splits: the ‘usual' 50:50 block split, which is just the standard index balancing and the 90:10 block split.

By the way: inserting the sorted data in descending order also leads to the 'bigger' table, since there is no special handling of sequential descending inserts.

What's this knowledge useful for?

If you ever did a system copy with R3Load and checked the table sizes in the target system against that in the source system, then you might have asked yourself: did I loose any data? The tables are smaller than in the source system! If you then are aware of the fact that R3load reads and writes it's data sorted by the primary key, then you now know the answer: all data is there, just saved in a denser way.

It's also worth noting that such dense tables need more balancing operations when you start inserting data into the middle of the table. Since there is not much freespace in the pages that could be used for the rebalancing, new pages need be added to the B*-Tree and that causes the rebalancing of whole subtrees. For that reason I would not say, that having such dense B*-Trees is actually something to wish for in a OLTP system.

Hope you liked this blog as the first "Space oddities" blog from myself.

Lars

p.s. While researching for this I came across another blog entry called "A space oddity": http://richardfoote.wordpress.com/2008/01/18/introduction-to-reverse-key-indexes-part-iii-a-space-od... by Richard Foote, who seems to be really deep into Oracle index internals.
So don't leave his blog unvisited.
Seems that this title is somehow popular for B*-Tree space management blogs.