Skip to Content
When I first got in contact to the possibility to bring back a database to the state it had, when a backup was taken I was not too much impressed.
I though “Well, it’s like any other backup – copy the saved file back to the place I found it.”

Getting deeper into the tasks of a database developer (who had to take care of the database instance as well), I soon figured out that there is a much cooler feature available: point-in-time-recovery (PITR).

This feature allows to re-execute every change made to the database after the backup has been taken, so that it is possible to bring the database right back to any state it had between the time the backup has been taken and the moment where most recent change had been done.

One major opportunity of this feature that is broadly advertised by all database vendors: the ability to undo human mistakes – even if they are recognized hours or days after the mistakes happened.
Just recover the latest backup and ‘roll-forward’ just before the disastrous “DROP TABLE” was issued (for some reason this must be really the most frightening thing to the developers of the database documentation …).

So if this is that great – what is wrong with it?

The feature itself is a very nice thing that – applied correctly – brings many advantages.

However, the problem is that it is most often not applied correctly. Although the PITR guarantees to result in a transaction-consistent database, the use of it may lead to severe inconsistencies in your “data world”.

Usually database developers expect a database to be the only storage of information. When this assumption hold true, the PITR is completely nice. Unfortunately this assumption is always never true.

Synchronize with the real world

The information stored in the database represent facts we know about the reality. That could be orders, invoices, customer names and the like.
If we set back just the database to a state it had in previous point in time the facts in the real world are not sat back as well.

Even worse, the real world has no way to know that the database from which your company takes all information for its business processes is not “up-to-date” anymore.
If your real world supplier gets a second order for material he usually will not suspect that you made a mistake – he will fulfil the order and invoice you.

It is even worse – in today’s companies there is not a single database installation, there are tens, hundreds or thousands of production database running. Many of them are connected via interfaces to synchronize data and trigger the execution of processes based on this data. Now assume one central database to be set back to a point in time, let us say yesterday morning. All processes triggered and eventually finished since then, will be re-triggered.

Ask yourself, what would happen in your company if such a thing occurs.

Root cause

So what is the reason for all these problems?
Basically, the problems occur because there is no general synchronization between systems.
Systems cannot be synchronized well by timestamps as it is impossible to bring the exact same time to every system – at least the deviations will not be small enough.
Therefore, database systems rely on time-independent mechanisms to bring order to the sequence of changes made to the data.

In Oracle, this is the System-Change-Number, in MaxDB, it is called log-IO-number and there are similar concepts in the other DBMS as well. DBMS just keep counting the number of changes that had been done to the data. That way it’s possible to say: “Ok, I restore a database backup where the last change was change number 1000 and now, by using the redo-information, I roll-forward to change number 1200”.

Obviously, this change counting approach can only work within one system.
Other systems (databases, web-services, real world…) all have their own change counting in place (for the real world this would be the real time) – all of them completely separated from the other.

One way out

One way to come around this is to implement the change counting on a lower level of the IT-structure of your company: at the storage level.

If you keep all data ever stored in your company on a single storage facility (SAN, NAS, … ) than this storage-“thing” can do the change-counting to keep track of all changes made to the whole data-“world” of your company.
With a setup like this, it is of course possible to perform a PITR of your whole data-“world”.
Anyhow, even if now all data in your company is consistent again: the real world is still not set back in time. Your supplier still would deliver the order that has been sent a second time.

So, what can we do about it?

The easy answer is: Do not do point-in-time recoveries!
Do not take them as a valid option to get your data back quick and easy!

In most cases where a PITR has been done, the major effort had to be invested into making the database consistent again with other systems and with the real world afterwards.

Even the often taken approach to restore “just” one table from a database backup in a different instance up to a certain point in time, then copy the table to the target database needs huge effort to result in a consistent database again. For SAP systems, it is yet worse as referential constraints are not implemented at database level but only by the SAP work processes.
Therefore after such a partly PITR of the database one even has to run check reports (if such ones exist for the table in question) to be sure the database is consistent again.

In my view, PITR is not a feature for production databases or databases that are connected to any other system or that have any real relevance to the real world. So which one could that be?

Test systems, Training-Systems and  Q/A-Systems are the only systems that could bear a PITR.

All other systems, especially production- and development-systems should never be exposed to a PITR.
Obvious to see – these are also the systems where nobody should ever be able to manually drop a table or delete all data of a table.

But we need this feature…

Now – what should you do if a user accidentally deletes important data?
What if the user does the wrong change to the wrong data entry (e.g. to invoicing the wrong customer)?

The answer is “It depends on what you would do in the real world.” If you send an invoice to the wrong customer, you usually would try to cancel it in your system and try to prevent it from delivery.
If the wrong invoice is delivered, one would send a mail to say “Sorry, this was a mistake, please ignore the invoice. Thanks!”
Therefore, your application should give the same options to you.

If it is possible that a user makes a mistake that could be undone like any action in WORD – the application should implement UNDO functionality for that (for example keep versions of specific data).

The punch line is: complex UNDO-functions have to be implemented at the level where the changes happened – not on any low-level like the database.

So better forget or banish the PITR option from your DBA toolset.
It is much more dangerous than useful.

To report this post you need to login first.

5 Comments

You must be Logged on to comment or reply to a post.

  1. Siddhesh Ghag
    Hello,

    Good blog, however I am afraid , I do not agree that it should be banished.

    Most of the customers that interface with the outside world using SAP Database have records kept about the invoices / orders sent through their systems, so notification can be sent to appropriate third parties. All this is still better then having thousands of people redo the data entry they have done after the online backup completed.

    Also, generally production servers are setup with standby databases and other kinds of failovers , so PITR is used only for creating system copies of production and it surely is a useful feature for testing..

    Regards,
    Siddhesh

    (0) 
    1. Lars Breddemann Post author
      Hi Siddesh,

      thanks for your reply.
      For many installations your assumption about the failover and standby facilities employed at production sites are correct.
      However there are many production sites that are not protected like this. There are also many production sites that run without a correct backup.
      It’s sad and it’s true.

      Of course the usual argument would now be that this is the responsibility of the DBAs to ensure that – and yes it is their responsibility.
      But shouldn’t it be the DEFAULT that the database only works if the data is correctly secured?
      The dbms currently around do not enforce backups – they just make them possible.
      To me that’s a wide range of improvement options…

      KR Lars

      (0) 
  2. Satyabrata Basu
    Hi Lars,

    Thanks for bringing this interesting point about consistency with other systems when you perform a DB restore. But, be careful, it is not exactly fault of Point-in-time-recovery (PITR).

    However, as you say in “real world” the Production database restore (PITR or not) are always an option generally followed as a last resort when all other possibilities (like UNDO a particular change etc) are exhausted. And when you have to do the database restore (i.e. Disaster recovery Scenario), a clear post-process/adjustment procedure should be (and is generally) there to make/check complete environment consistent.

    I think PITR is still a nice feature and when used safely it saves lots of manual duplicate work.

    Cheers !!

    Satya.

    (0) 
    1. Lars Breddemann Post author
      Hi Satya,

      as I wrote, PITR is nice when applied correctly.
      Believe it or not, I’ve seen far too many systems without a good backup strategy that has been tested.
      These databases belonged to “small” SAP customers, medium sized companies as well as to multi-national-mega corps.

      And far too often I had to work on messages where the customer already tried to fix an issue by a restore – only to realize that a) the problem wouldn’t be fixed by a restore and b) that now his databases is really messed up because the restore/recovery did not work as expected.

      In real life situations the restore strategy has to be rock solid. Idiot proof if you like it.
      PITR is nothing like that – it’s a complex feature with a very specific range of applications.

      KR Lars

      (0) 
  3. Markus Doehr
    …many years ago – at that time still under “ADABAS”.

    We had big hardware problems (under Windows NT 3.5 Advanced Server), because of handling mistakes our last backup was unusable and we did PITR.

    We lost only about 15 minutes but since an automatic stock was connected which was triggered by ALE/IDOC, at that time the daily invoice printout was running, production orders were not confirmed back and… you can imagine. It took us more than half a year to get the system “clean” again and to a state where the status was reliable again.

    I would *NEVER EVER* do that again.

    Markus

    (0) 

Leave a Reply