Skip to Content
From time to time I start to believe that there are certain problems that would not come up anymore, once the topic is documented well enough.

My colleagues from the development know that too well as I’m asking for documentation and notes all the time for everything.
However, written documentation does help nothing if it’s not read…

BACK IN TIME

Last week I got this message and I believe it will be interesting for many MaxDB users “out there”.

Customer:

“Hello SAP,

The Server XYZ has been rebooted yesterday morning.
After the reboot I started SAP, and everything seemed ok.
Unfortunately, the system is now in a state that it was two weeks ago, when a client copy was done.
[…]
The system administrator has looked on OS-level and can’t find anything wrong.
We need to know what could have happened and why there is a SAP system with a database which is two weeks old?
[…]

Best regards, …”

Ok, what’s that about?

The server got rebooted, and now the database is back at a previous state?

When I read the description of the customer I already suspected what turned out to be the actual reason for the loss of data.

To verify my assumption I only needed to check the DBM.PRT file.

This is what I found:

[...]
command db_admin -f
command auto_extend show
command auto_update_statistics show
command util_execute GET RESTART VERSION
command auto_extend show
command auto_update_statistics show
command util_execute GET RESTART VERSION
command util_execute SET LOG WRITER OFF                  <<<<<< !!!!
command util_execute GET RESTART VERSION
command auto_extend show
command auto_update_statistics show
[...]

 

There is was: the log writer of the database had been turned off.
So why is this something to note?

The short answer is:
As soon as you disable the log writer, no automatic savepoints are written anymore.

If you now ask yourself “Wait a minute – what have savepoints to do with the log writer and why does this cause a huge loss of data?” then you may want to read on.

THE CONVERTER – KEEPER OF THE DATABASE STATES… A LITTLE EXCURSE…

Ok, the first thing to understand here is the way MaxDB stores and retrieves data to and from the disks.

Unlike most other databases, MaxDB does NOT store data at s specific physical location.
Instead a row is stored on a logical page and that logical page changes its physical location in the data volumes every time the page is changed and written to disk.
This concept is called “COPY-ON-WRITE”.

One advantage of this is that by writing out data to the disks MaxDB automatically can even out the filling of the data volumes and thereby guarantees an equal distribution of I/O over all data volumes.

Of course something needs to keep track of this so that the database can actually find the correct location of each page. This something is the so called converter.

You put a logical page number into it and get a physical block location out (or vice versa).
Basically you could say when a page is in the converter, the database knows it – otherwise it does not know it.

Seen from that angle, the converter holds the state of the database at all times.

“But, wait a minute, where is the information of the converter stored?”

Good question!
In fact it’s the very key point of the customer’s problem here, so pay attention! 😉

Since the converter is so utterly important to the database (without it all the pages in the data volumes are just nonsense), MaxDB needs to save it regularly, so that it’s there the next time the database is started.
Therefore, at the end of each savepoint MaxDB writes down the converter into the data files to places that are currently not allocated by the actual user data or the old converter.
These places in turn are again spread dynamically over all data volumes.

STILL AWAKE?

Did you notice something?

We have data volumes full of pages that can only be brought into context and meaning by the converter. And this converter itself is also stored in the same cryptic way in the data volumes.

“How the heck can MaxDB figure out which pages belong to the converter when it starts up?”

Again, good question!
The trick here is the data structure used by the converter.
It’s a B-tree, so all we need to have is the root-page location to find all other pages that belong to the converter.

Unlike usual data pages, converter pages on disk do have pointers to physical locations to the other converter pages, enabling MaxDB to find and read all converter pages before having the converter available for the mapping of logical to physical pages.

Now, the last missing link, the one to the root page of the converter comes in:

The restart record.

This is a ‘magic’ data structure that is always located in the first block of the first data volume.
In that restart record, the link to the converter root page is stored as a combined volume/offset-information.

(Just for those super-techies among you readers: it’s the crConvRootBA value you see in x_diagnose.
It contains the information about the volume in the lower-byte and the offset in the upper bytes. E.g. crConvRootBA: 1164033 = 0x11C301 => Offset 0x11C3, Volume 0x01.)

Fine, to summarize:

  • We have a data structure that brings meaning the randomly scattered pages in the data volumes called CONVERTER.
  • MaxDB does only know the pages that are in the CONVERTER, since for user data only logical page numbers are used to reference between pages.
  • We find this converter during the restart by following the link from the restart record and the B-Tree links of the CONVERTER itself.

So whenever the converter is written down to the data volumes, this link to the converter root needs to be updated. The restart records have to point to the new converter root.
In fact this is done as the very last action during the savepoint.
If this fails, the old converter will be found the next time the database starts up.

By the way: this is precisely the way that SNAPSHOTS work in MaxDB.
We simply store multiple converters and protect the pages that belong to it from being overwritten.
When we then want to ‘restore’ one of the snapshots all we’ve to do is to change the restart record and load the converter by restarting the database.

UPPS!

Yes, here we have it.

During the month of work where the database was running, there seem to have been no SAVEPOINT.
How can that be?

Isn’t a SAVEPOINT expected to happen AT LEAST every 600 seconds (or in whatever interval is configured by the _RESTART_TIME parameter)?
Indeed it should, but for the SAVEPOINT to be actually triggered based on this timing, another condition has to be met:
There have to be some (5000) LOG-IO-Actions taken place since the last savepoint.

DOUBLE UUPS!

By disabling the log writer, the customer effectively turned off the automatic writing of savepoints!

Of course he could have triggered savepoints manually or by taking backups, creating indexes or by just stopping the database via a DB_OFFLINE.
All these actions would have written a savepoint and thereby saving the precious converter information in the data area, available for the next restart.

Unfortunately the customer decided, that the server needed a reboot and just ran “shutdown -r now“.
This command of course does know nothing about MaxDB or DB_OFFLINE but just KILLS all still running processes.
As a consequence of this, the restart record of course still pointed to the converter root of the converter that was written to disk a month ago.
It’s no wonder, that the database occurred as “of a month ago” when it was restarted afterwards.

The outcome of this was that the customer simply lost a month worth of work and there was no technical option to rescue his data.

WHAT A BAD MALICIOUS FUNCTION IS THAT?

One could of course argue that such a function should have no place in a DBMS, because one of the main functions of DBMS is to keep the users data safe.
Well this feature is in fact not meant for normal operations.

It’s a benchmark feature that allows certain performance tests without the effect of log writing. Also, there are specific situations where you can greatly benefit from this function.
One example is the loading of data via R3Trans during the installation of a system.
Of course the R3Trans application is aware of the effects that switching off the log writer has and triggers savepoints manually.

All in all it’s just another feature that can be used correctly or the wrong way.
And unfortunately the user this time used it the extremely wrong way.

The only morale here is: understand what a command or feature does before using it!
Read the documentation and try things out beforehand!

Links

Important concepts like the CONVERTER are of course explained and documented, so here is a list of links to the relevant resources (for those of you that still cannot get enough of tech-talking):

No-Reorganization Principle: Data Storage Without I/O Bottlenecks 

Documentation: Savepoint

Documentation: Converter

SAP support note #869267 – FAQ: MaxDB LOG area, 35 + 36

 

A LITTLE ADVERTISING

I was asked to blog about the TechEd Session I’m going to deliver, so here it is: If you’re not already booked for October 27-29 this year and you happen to stay in Vienna/Austria and you keep asking yourself what to do … then get your ticket for SAP TechED 2009 and make sure to attend my MaxDB Session

To report this post you need to login first.

8 Comments

You must be Logged on to comment or reply to a post.

  1. Stefan Koehler
    Hello Lars,
    nice blog (even if i have no idea of MaxDB :-P).

    But please let me just ask one question:
    If i understand you correct MaxDB writes out the pages in a “rotating” way (round robin??) so that there is an equal distribution of I/O over all data volumes.
    If this happens and no save point is performed – there is always the old converter list on disk .. right?

    Now if i shutdown the database without performing a save point .. the link list on disk is still the old one and points the way that it was by the last startup.
    But the pages in the volumes have changed, because of MaxDB writes out the data in a “rotating” way. So i don’t have the old data .. i have corrputed data.

    I think i am totally wrong, because of i think that this can’t happen in a database ;-))

    Regards
    Stefan

    (0) 
    1. Lars Breddemann Post author
      Hi Stefan,

      nice to know that you’re still peaking over the Oracle wall …

      The answer to your question is:
      MaxDB always writes the pages only to positions that are available.
      Since changed pages will only be put to the freepage list, AFTER the savepoint occured, no data will be overwritten.

      Think of it as if we freeze all pages on the data volumes from savepoint to savepoint.
      And, yes, by that you can artificially construct situations where there are many pages released and many new pages requested and the database runs into a DBFULL situation by that.
      However this happens rather seldom and luckily we trigger savepoints more frequently when the data area fills up…

      best regards,
      Lars

      (0) 
      1. Stefan Koehler
        Hello Lars,
        thanks for the quick response.
        The Oracle wall is very high, but from time to time i can jump high enough to look over it 😉

        > Think of it as if we freeze all pages on the data volumes from savepoint to savepoint.
        Ok in this case it is clear 🙂

        *ThumbsUp*

        Regards
        Stefan

        (0) 
  2. Bernd Eckenfels
    Good text, explains most of the details of MaxDB 🙂

    One thing: shutdown -r now is supposed to go through the normal sequence of shutting down the OS which includes calling the stop scripts of the runlevel. So if the DB is setup properly it would savepoint. (now means no delay after notifying the user, but it does not mean to skip the scripts – at least not on Linux. “shutdown -n” or “reboot -f” might be a killer as well as a corrupt utmp file.)

    Of course a crash or an forced reboot would have caused the situation you described.

    (0) 
    1. Lars Breddemann Post author
      Hello Bernd,

      ok, you got me on that detail 🙂
      In fact the customer just told me that they rebooted the server.
      From the knldiag.err file it was clear that no clean shutdown happened, since no shutdown entry was present – just two consecutive startup messages with a timestamp gap of over one month…

      regards,
      Lars

      (0) 
  3. Bala Prabahar
    Hello Lars,

    Thanks for this blog. One question:

    Other databases use “DB recovery” step when it suspects that the DB didn’t go down gracefully last time. And it uses online redo logs (sorry I use oracle example) to recover the DB to a last known consistent state even when DB was operating in NOARCHIVELOG mode.
    From your blog, it appears that MaxDB stores all information necessary to do a savepoint is stored in the Memory and not in “online logs”. This is the only reason I can think of as to why MaxDB can’t do a savepoint after restarting from a abnormal shutdown.

    Thanks,
    Bala Prabahar

    (0) 
    1. Lars Breddemann Post author
      Hello Bala,

      you missed the point here.

      NOARCHIVELOG mode is like the logmode AUTOOVERWRITE.
      Still log data is written out that could be used for recovery.
      The situation I described here is a special setting where MaxDB is not allowed to write log data at all!

      If you run your Oracle database in NOARCHIVELOG mode, crash it and delete your online redolog files then you won’t even be able to start your database anymore.

      Hope that helps to make it clearer to you.
      regards,
      Lars

      (0) 

Leave a Reply