Klaus Liu

HANA Savepoint Analysis

Hi,

 

I would like to share some knowledge about savepoints in SAP HANA. The reference is SAP Note 2100009, “FAQ: SAP HANA Savepoints”.

 

1. What are savepoints?

  • Savepoints are required to synchronize changes in memory with the persistency on disk level. All modified pages of row and column store are written to disk during a savepoint.
  • Each SAP HANA host and service has its own savepoints.
  • The data belonging to a savepoint represents a consistent state of the data on disk and remains untouched until the next savepoint operation has been completed.

 

2. When is a savepoint triggered?

  • Savepoint interval (automatic)

    During normal operations, savepoints are triggered automatically when a predefined time has passed since the last savepoint. The length of the interval between two consecutive savepoints is controlled by the following parameter:

    global.ini -> [persistence] -> savepoint_interval_s

    Its default value is 300, so savepoints are taken at intervals of 300 seconds (5 minutes).

  • System command (manual)

    The following command can be used to execute a savepoint manually:

    ALTER SYSTEM SAVEPOINT

  • Soft shutdown

    A soft shutdown invokes a savepoint before the services are stopped.

    A hard shutdown doesn’t trigger a savepoint. This can increase the subsequent restart time.

  • Backup

    A global savepoint is performed before a data backup is started.

    A savepoint is written after the backup of a specific service is finished.

  • Startup

    After a consistent database state is reached during startup, a savepoint is performed.

  • Snapshots

    Snapshots are savepoints that are preserved for longer use and so they are not overwritten by the next savepoint.
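As a rough illustration of the interval parameter described above, the following sketch reads savepoint_interval_s from the text of an ini file and falls back to the default of 300 seconds when the parameter is not set. The sample file content, the value 600, and the function name are assumptions made for this example. It looks at a single file only and does not reflect how SAP HANA itself combines its configuration layers.

```python
import configparser

# Default stated in the text: 300 seconds (5 minutes).
DEFAULT_SAVEPOINT_INTERVAL_S = 300

# Hypothetical excerpt of a global.ini file; the value 600 is made up.
SAMPLE_GLOBAL_INI = """
[persistence]
savepoint_interval_s = 600
"""


def savepoint_interval_seconds(ini_text: str) -> int:
    """Return savepoint_interval_s from [persistence], or the default if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint(
        "persistence",
        "savepoint_interval_s",
        fallback=DEFAULT_SAVEPOINT_INTERVAL_S,
    )


# Parameter set explicitly in the file.
print(savepoint_interval_seconds(SAMPLE_GLOBAL_INI))
# Parameter missing: the default applies.
print(savepoint_interval_seconds(""))
```

The fallback mirrors the behavior described in the text: if nobody has changed the parameter, savepoints run every 300 seconds.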

 

3. Helpful Views

  • M_SAVEPOINT_STATISTICS: Global savepoint information per host and service

  • M_SAVEPOINTS: Detailed information for individual savepoints

  • M_SERVICE_THREADS, M_SERVICE_THREAD_SAMPLES, HOST_SERVICE_THREAD_SAMPLES: As of SAP HANA SPS 10, savepoint details are logged for THREAD_TYPE = 'PeriodicSavepoint' (see SAP Note 2114710).

 

4. Helpful SQL scripts

SAP Note 1969700 (SQL statement collection for SAP HANA) contains the following statements:

  • SQL: “HANA_IO_Savepoints”: Detailed information for individual savepoints

  • SQL: “HANA_IO_Snapshots”: Snapshot information

 

5. Blocking Phase 

The majority of the savepoint is performed online without holding a lock, but the finalization of the savepoint requires a lock. This step is called the blocking phase of the savepoint. It consists of two major subphases:

  • WaitForLock (thread detail: enterCriticalPhase(waitForLock))

    Before the critical phase is entered, the savepoint needs to acquire a ConsistentChangeLock. If this lock is held by other threads or transactions, the duration of this phase increases. Meanwhile, all other modifications on the underlying table, such as INSERT, UPDATE or DELETE, are blocked by the savepoint with ConsistentChangeLock.

  • Critical (thread detail: processCriticalPhase)

    Once the ConsistentChangeLock is acquired, the actual critical phase is entered and the remaining I/O writes are performed to guarantee a consistent set of data on disk level. During this time, other transactions aren’t allowed to perform changes on the underlying table and are blocked with ConsistentChangeLock.

 

6. Typical savepoint issues analysis

  • Long waitForLock phase (thread detail: enterCriticalPhase(waitForLock))

    Long durations of the blocking phase (outside of the critical phase) are typically caused by SAP HANA internal lock contention. The following known scenarios exist: ConsistentChangeLock.

    Starting with Rev. 102, you can configure the following parameter to trigger a runtime dump (SAP Note 2400007) when waiting to enter the critical phase takes longer than <seconds> seconds:

    indexserver.ini -> [persistence] -> runtimedump_for_blocked_savepoint_timeout = '<seconds>'

    This parameter is not set by default, so it has to be added manually.

  • Long critical phase (thread detail: processCriticalPhase)

    Delays during the critical phase are often caused by problems in the disk I/O area.
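The two symptoms above can be told apart by comparing the time a savepoint spent waiting for the lock with the time it spent in the critical phase. The sketch below is only an illustration: the class and function names, the sample values, and the 10-second threshold are assumptions (the threshold follows the "over 10s" example mentioned in this post), and both durations are assumed to be already converted to seconds.

```python
from dataclasses import dataclass
from typing import List

# Assumed threshold in seconds; not an official SAP limit.
SLOW_THRESHOLD_S = 10.0


@dataclass
class SavepointSample:
    """One savepoint, with both durations in seconds (assumption)."""
    start_time: str
    critical_phase_wait_s: float      # time in enterCriticalPhase(waitForLock)
    critical_phase_duration_s: float  # time in processCriticalPhase


def likely_causes(sample: SavepointSample,
                  threshold_s: float = SLOW_THRESHOLD_S) -> List[str]:
    """Map long subphases to the typical causes listed in the text."""
    causes = []
    if sample.critical_phase_wait_s > threshold_s:
        causes.append("lock contention (long waitForLock phase)")
    if sample.critical_phase_duration_s > threshold_s:
        causes.append("disk I/O (long critical phase)")
    return causes


# Made-up sample values for illustration only.
samples = [
    SavepointSample("2017-10-10 11:05:00", 0.1, 0.3),
    SavepointSample("2017-10-10 11:10:00", 12.5, 0.2),
    SavepointSample("2017-10-10 11:15:00", 0.2, 30.0),
]

# Show the savepoints with the longest lock wait first.
for s in sorted(samples, key=lambda x: x.critical_phase_wait_s, reverse=True):
    print(s.start_time, likely_causes(s) or ["nothing flagged"])
```

A savepoint can be flagged for both causes at once, which is why the function returns a list rather than a single label.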

 

7. Analyze the runtime dump

The runtime dump indexserver_<hostname>.30003.rtedump.<timestamp>.savepoint_blocked.trc is triggered by the parameter runtimedump_for_blocked_savepoint_timeout.
You can check the runtime dump for the following aspects.

 

  • Find the savepoint thread. Its call stack contains “DataAccess::SavepointLock::lockExclusive”.

  • Find other threads (SQL threads) waiting for the lock. Their call stacks contain “DataAccess::SavepointSPI::lockSavepoint”.

  • Runtime dump: section [SAVEPOINT_SHAREDLOCK_OWNERS]

    Most of the time, the savepoint hangs because it cannot acquire the exclusive lock while another thread still holds a shared lock. This section helps to find which thread holds the lock.

    The section lists the owners of shared ConsistentChangeLock locks. If a savepoint is blocked in the waitForLock phase (SAP Note 2100009), the blocking activities can be found in this section.

     

    Example: In the following section, thread ID 298995 holds the shared lock, which blocks the exclusive lock request and causes the savepoint to hang.

     

    [SAVEPOINT_SHAREDLOCK_OWNERS] Owners of shared SavepointLocks: (2017-10-10 11:18:13 112 Local)
    96034[thr=298995]: JobWrk0145, TID: 4856, UTID: 1588661641, CID: -1, LCID: 0, parent: 299143, SQLUserName: “”, AppUserName: “”, AppName: “”, ConnCtx: —, StmtCtx: —, type: “JobWorker”, method: “”, detail: “”, command: “” at 0x00007efe63342e88 in ltt::string_base<char, ltt::char_traits<char> >::trim_(unsigned long)+0xb8 at string.hpp:683 (libhdbcs.so)
    [OK]


    Once you have the thread ID of the shared lock owner, search the dump for that thread ID and look up its parent thread ID. In this example, the parent thread (299143) is the following:

    107423[thr=299143]: MergedogMerger, TID: 4856, UTID: 1588661641, CID: -1, LCID: 0, parent: 299445, SQLUserName: “”, AppUserName: “”, AppName: “”, ConnCtx: —, StmtCtx: —, type: “MergedogMerger“, method: “”, detail: “3 of 3 table(s): SAPERP:/1LT/VF00094506“, command: “” at 0x00007efe4e645f59 in syscall+0x19 (libc.so.6)

    We can conclude that the merge of table /1LT/VF00094506 holds the shared lock. The next step is to check whether there is an issue with the merge of this table.

  • Runtime dump: section [STATISTICS] M_SAVEPOINTS_

    Import the data of this view into Excel and sort it by the columns “CRITICAL_PHASE_WAIT_TIME” and “CRITICAL_PHASE_DURATION”.

    In this example, CRITICAL_PHASE_WAIT_TIME is over 10 s, which is quite slow. This indicates that the savepoint was delayed while waiting for the exclusive lock.

    If “CRITICAL_PHASE_DURATION” is long instead, this points to an I/O issue.
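The manual lookup described above (find the shared lock owner, then follow its parent thread) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the excerpt below is shortened from the example in this post, it assumes the real dump file uses straight double quotes, and the names DumpThread, parse_threads, sharedlock_owner_ids and blocking_activities are invented for this sketch. The exact dump layout may differ between revisions.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple


class DumpThread(NamedTuple):
    thr: int
    name: str
    parent: int
    detail: str


# Matches lines such as: 96034[thr=298995]: JobWrk0145, ..., parent: 299143, ..., detail: "..."
THREAD_LINE = re.compile(
    r'^\s*\d+\[thr=(?P<thr>\d+)\]:\s*(?P<name>[^,]+),'
    r'.*?parent:\s*(?P<parent>\d+),'
    r'.*?detail:\s*"(?P<detail>[^"]*)"'
)

# Shortened, hypothetical excerpt based on the example in this post.
DUMP_EXCERPT = '''
[SAVEPOINT_SHAREDLOCK_OWNERS] Owners of shared SavepointLocks: (2017-10-10 11:18:13 112 Local)
96034[thr=298995]: JobWrk0145, TID: 4856, parent: 299143, type: "JobWorker", method: "", detail: "", command: ""
[OK]
107423[thr=299143]: MergedogMerger, TID: 4856, parent: 299445, type: "MergedogMerger", method: "", detail: "3 of 3 table(s): SAPERP:/1LT/VF00094506", command: ""
'''


def parse_threads(dump_text: str) -> Dict[int, DumpThread]:
    """Index every thread line in the dump by its thread ID."""
    threads = {}
    for line in dump_text.splitlines():
        m = THREAD_LINE.match(line)
        if m:
            threads[int(m["thr"])] = DumpThread(
                int(m["thr"]), m["name"].strip(), int(m["parent"]), m["detail"])
    return threads


def sharedlock_owner_ids(dump_text: str) -> List[int]:
    """Return thread IDs listed between the section header and its [OK] marker."""
    owners, in_section = [], False
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[SAVEPOINT_SHAREDLOCK_OWNERS]"):
            in_section = True
        elif in_section and stripped == "[OK]":
            break
        elif in_section:
            m = THREAD_LINE.match(line)
            if m:
                owners.append(int(m["thr"]))
    return owners


def blocking_activities(dump_text: str) -> List[Tuple[DumpThread, Optional[DumpThread]]]:
    """Pair each shared lock owner with its parent thread, if the parent is in the dump."""
    threads = parse_threads(dump_text)
    return [(threads[thr], threads.get(threads[thr].parent))
            for thr in sharedlock_owner_ids(dump_text)]


for owner, parent in blocking_activities(DUMP_EXCERPT):
    print("owner:", owner.thr, owner.name)
    if parent is not None:
        print("parent:", parent.thr, parent.name, "-", parent.detail)
```

The parent's detail field is what reveals the blocking activity, here the merge of a specific table.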

 

I hope this helps you understand savepoints and find the root cause of a hanging savepoint.

 

Best regards,
Klaus

      8 Comments
      Denys van Kempen

      Excellent blog, Klaus. Thanks for posting.

      Denys / SAP HANA Academy

       

      Klaus Liu
      Blog Post Author

      Thanks for your comment, Denys!

      Lars Breddemann

      Nice blog post! Hope to see more interesting posts from you.

      One way to make them even more interesting could be if you focus a bit more on how you use your knowledge and what problems you solve with it. What triggered you to investigate the savepoint here?
      Is the runtimedump_for_blocked_savepoint_timeout parameter set by default or is that something the DBA has to do?
      When one finds out that bad performing I/O could be the cause of the long savepoint wait time, what do you usually do to fix that? And how would one go about to avoid such issues in the first place?

      These are just a few questions I had when reading your post and I guess that others would find answers to those interesting and helpful, too.

      Anyway, it's a nice start into blogging and I am waiting for your next post now.

       

      Klaus Liu
      Blog Post Author

      Thanks, Lars! Good suggestions and questions. I will try to improve it.
      I am preparing my next blog post and look forward to your suggestions and comments.

      Adam Morva

      Great blog post. Very informative. Thank you.

      Mohammed Azher Ul Haque

      Great blog, very informative. I highly appreciate your efforts, brother Klaus Liu. Keep it up, and thanks for sharing your valuable experience.

      Vivekanand Pandey

      Excellent blog, Klaus. Thanks for posting.

      Do we have a plan for savepoint optimization in case of I/O issues?

       

      Regards,

      Vivekanand Pandey

      Raj Madhavan

      Good information, we had this issue today where the SAVEPOINT hung at CRITICAL sub-phase. The link to note 2100009 in Section 7 no longer works. Interesting to see the details on how to identify other threads with exclusive lock that caused SAVEPOINT to hang.