Skip to Content
   
   

After the tremendous (and very unlikely) success of the first MaxDB: Space oddities blog, I immediately decided: this must be the voice of the market asking for more.

So here you get it.

While Oracle DBAs are used to be exposed to features like segment fragmentation (that is, although data is removed from tables/indexes these segments still allocate the same amount of space which won’t be automatically released to the freespace) MaxDB users don’t have to think about this.

In fact, MaxDB users cannot do a single thing to change the way the data is stored. Or could they?

They could. Just by sorting the data before inserting them into the table, quite some space can be saved.

Don’t believe me? Behold and believe!

I’ve created two sets of identical data. One of them will get loaded with data in a sorted order (SORTLOADED) the other one will get loaded in a random order (UNSORTLOADED). Anyhow, both tables contain the very same data  afterwards.

The table structure is simple and the same for both tables:

create table "LARS"."SORTLOADED"(
               "KCOL1" INTEGER not null,
               "COL2" VARCHAR (10) ASCII,
               "COL3" VARCHAR (30) ASCII,
primary key ("KCOL1"))
create table "LARS"."UNSORTLOADED"(
               "KCOL1" INTEGER not null,
               "COL2" VARCHAR (10) ASCII,
               "COL3" VARCHAR (30) ASCII,
primary key ("KCOL1"))

Next I generate and load the test data.

Now this is actually a part that is not that easy with MaxDB. In Oracle you can do things like “CREATE TABLE AS SELECT … ORDER BY …” and thereby create arbitrarily sorted data sets.
MaxDB does not support sorts for subselects (as they are usually meaningless in relational contexts). Therefore I needed to create the data sets somewhere else and I went for MS EXCEL. Since creating the test data is not that interesting in itself, I’ll skip this for now – if you’re interested in this, let me know.

The unequal twin tables

So, now each table contains the exact very same rows, right?

Of course we can doublecheck this:

select *
from
      sortloaded sl
where
       not exists (select 'X'
                   from
                          unsortloaded usl
                   where
                          sl.kcol1= usl.kcol1)
Statement 'select * from sortloaded...' successfully executed in 1.242 seconds. 
- No result

Now let’s check how equal the two tables really are in terms of storage. Let’s use what we’ve learned in “MaxDB space oddities” and run:

select
      tablename,
      entrycount,
      trunc(treeindexsize/8) TREE_PAGES,
      trunc(treeleavessize/8) LEAF_PAGES
from
      tables , files
where
      schemaname=user
and fileid=tableid
and tablename in   ('SORTLOADED', 'UNSORTLOADED')
TABLENAME      ENTRYCOUNT      TREE_PAGES      LEAF_PAGES
----------------------------------------------------------
SORTLOADED          10000               1              40
UNSORTLOADED        10000               1              48

Hmm… the unsorted load takes eight pages more in this example, that’s 16% more space usage.

What is going on here?

Let’s check the storage of these tables a bit more detailed.
I use a systemtable called ‘tablestatistics’ here, but I’ve to give a warning on it: whenever you select data from this table, the data is gathered from the tables you look up. For big tables this can run very very long and may block other accesses meanwhile.

Therefore it’s better to use the ‘tablestoragedetails’ resp. ‘indexstoragedetails’ instead as these both tables deliver their data based on samples.

For the easier comparison I reformatted the result a bit:

 

SORTLOADED

UNSORTLOADED

Difference

Rows

10000

10000

0

Index levels

1

1

0

Index pages

1

1

0

Leaf  pages

40

48

8

Min separator length

3

3

0

Max separator length

4

4

0

Avg separator length

3

3

0

Min key length

3

3

0

Max key length

4

4

0

Avg key length

3

3

0

Min row length

26

26

0

Max row length

27

27

0

Avg row length

26

26

0

Min rows per page

123

135

12

Max rows per page

260

253

-7

Avg rows per page

250

208

-42

Space used in all   pages (%)

95

80

-15

Space used in index pages (%)

9

11

2

Space used in index pages (%) max

9

11

2

Space used in index pages (%) min

9

11

2

Space used in leaf  pages (%)

97

81

-16

Space used in leaf  pages (%) max

99

99

0

Space used in leaf  pages (%) min

48

53

5

Space used in root  page  (%)

9

11

2

Used  pages

41

49

8

Ok, so what do we see here?

Obviously the difference in the total number of pages is due to the difference in the number of ‘Leaf pages’ – those pages that contain the records completely.
Looking to the ‘Space used in leaf pages (%)’ row we immediately spot that for UNSORTLOADED there is 16% less space usage per block compared to the SORTLOADED table.
Of course if you don’t fill up the pages completely you will need more of them to save the same amount of data. Therefore we find 42 rows less on average in the pages of UNSORTLOADED than we do in the pages of SORTLOADED.

But why is it like this?

To answer this question, we’ve to remember how B*-Trees are build up.

I’m not going to reinterate on B*-Trees in general, but just like to point out the important facts about them, that are necessary here to explain the effect:

  • B*Trees are always balanced, that is: all leaf pages have the same distance to the root page.
  • The filling level of all pages is approximately equal
  • On leaf level, the pages are connected in ascending key order, e.g. the lowest value is the far leftmost, while the highest value is the far rightmost.

To fullfill these requirements, an insert to a B*-Tree usually is done like this:

  1. Find the page to insert the record to
  2. Check whether the filling of the current page will fit the filling of its neigbours after the INSERT has been done.
    If not, rebalance this part of the B*-Tree so that the equal filling condition is met.
    This rebalancing works by shifting rows from one page to the other and adapting index page above accordingly.
    Obviously this can only work, if there are any rows that could be moved, e.g. because the new value is ‘in-between’ the two pages.

Due to the mentioned features of  B*-Trees there is one case where no rebalancing is done, because it’s never necessary: inserts in ascending key order.

When the key value increases with each insert, than a rebalancing with the right neighbour page is not possible – there simply is no further right neighbour. And since the data is correctly sorted anyhow, all what’s left to do is to update the index pages.

That way a lot of time can be saved and the pages get filled up much denser than it would be the case with random inserts.
Therefore MaxDB developers decided to implement a kind of ‘shortcut’ – whenever you insert a row with a ‘higher’ or ‘more right’ key than the currently ‘highest’ or ‘rightmost’ key in the table, simple add the row and don’t do balancing.

This feature is also present in other DBMS, e.g. Oracle. For Oracle’s B*-Tree-Indexes there are two types of page/block-splits: the ‘usual’ 50:50 block split, which is just the standard index balancing and the 90:10 block split.

By the way: inserting the sorted data in descending order also leads to the ‘bigger’ table, since there is no special handling of sequential descending inserts.

What’s this knowledge useful for?

If you ever did a system copy with R3Load and checked the table sizes in the target system against that in the source system, then you might have asked yourself: did I loose any data? The tables are smaller than in the source system! If you then are aware of the fact that R3load reads and writes it’s data sorted by the primary key, then you now know the answer: all data is there, just saved in a denser way.

It’s also worth noting that such dense tables need more balancing operations when you start inserting data into the middle of the table. Since there is not much freespace in the pages that could be used for the rebalancing, new pages need be added to the B*-Tree and that causes the rebalancing of whole subtrees. For that reason I would not say, that having such dense B*-Trees is actually something to wish for in a OLTP system.

Hope you liked this blog as the first “Space oddities” blog from myself.

Lars

p.s. While researching for this I came across another blog entry called “A space oddity”: http://richardfoote.wordpress.com/2008/01/18/introduction-to-reverse-key-indexes-part-iii-a-space-oddity/ by Richard Foote, who seems to be really deep into Oracle index internals.
So don’t leave his blog unvisited.
Seems that this title is somehow popular for B*-Tree space management blogs.

To report this post you need to login first.

3 Comments

You must be Logged on to comment or reply to a post.

  1. Stefan Koehler
    Hello Lars,
    a really nice blog post again. I really like to read something about MaxDB internals. As you maybe know i am an oracle guy so looking beyond one’s own nose is always good.

    Everytime i read some internals about other RDBMs i try compare the content with oracle to get a better understand – so in your blog i am a little bit suprised about one part.

    > Due to the mentioned features of  B*-Trees there is one case where no rebalancing is done, because it’s never necessary: inserts in ascending key order.

    Oracle has PCTFREE on creating indexes to keep some free space within the leaf blocks for further data. If i understand you correct MaxDB does not have something like that, if you insert data sorted you have a “full” leaf block. The mentioned block splitting features on oracle (50:50 or 90:10) occurs on full leaf blocks and the kind of splitting the block (90:10 is happening on the most right end of the index).

    So i can imagine that you get in trouble if you want to insert data into that full leaf blocks after the load procedure. In this case you get a massive block splitting or does MaxDB have some features to avoid this?

    Keep on going 🙂

    Regards
    Stefan

    (0) 
    1. Stefan Koehler
      I read my post again and find one part that can lead to confusion:
      > If i understand you correct MaxDB does not have something like that, if you insert data sorted you have a “full” leaf block

      Please replace my words “if you insert data sorted” with “if you create an index”.

      All i want to know is ,if the R3load procedure on MaxDB is the same way as on oracle (creating table – inserting data – creating indexes) and if MaxDB have something like PCTFREE for creating indexes to avoid a massive block splitting.

      If the procedure is inserting the data sorted after creating the table and the indexes then i can imagine that you will get the same problem as you have no PCTFREE on creating indexes.

      Regards
      Stefan

      (0) 
      1. Lars Breddemann Post author
        Hi Stefan,
        thanks for the nice comment. Of course I know that you’re in Oracle – I guess we already had the pleasure of working together on some OSS messages, from different sides though.

        Concerning your remark: sure there will be one rather heavy rebalancing activity as soon as you start to insert new values into the ‘middle’ of your data.
        To really get ‘in trouble’ here it would be necessary to have multiple parallel inserts of several ‘middle’ values that would require parallel rebalancing of several subtrees.
        Actually it would be possible to simulate such a usage pattern at will, but afaik there aren’t much scenarios where such things happen.

        Besides: when trying to speed R3Loads up e.g. by running parallel load-Processes you don’t even get the dense B*-Trees – just like when you run parallel inserts to Oracle indexes.

        So – looking forward to read something from you again as well.

        Best regards,
        Lars

        (0) 

Leave a Reply