
One of the most boring topics in the DBMS area is surely locking.
It’s also one of the topics that most DB users try to avoid as much as possible.

Still, it's of utmost importance to correctly understand how locking works in the database of your choice, to be able to work with it successfully and without bad surprises.

Here are two cases of MaxDB locking behaviour that are not obvious at all (and also not yet well documented).
Note: the first part "grew" while I was researching it, so I split this blog into two parts. The second part will be released within the next few days.

1) When not to trust count(*)

With the advent of the filedirectory counters, a neat query optimization for the popular 'select count(*) …' was implemented.
Whenever the optimizer realizes that the user just wants to know the total number of rows in a table, this number is retrieved from the filedirectory counter of the table and not from the table itself.

The performance gain is immense: instead of reading potentially thousands of pages and counting the rows found, MaxDB just looks up the counter and returns the number to the user.
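
For reference: the LOCKTEST table used in the examples below is just a small test table with a single character column as its primary key. The original DDL isn't shown in this blog, so the following setup is only an assumed sketch that reproduces the demo data:

create table locktest (the_row varchar(20) primary key)
insert into locktest values ('1')
insert into locktest values ('2')
-- ... and so on for '3' to '9' ...
insert into locktest values ('10')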

But what happens when there is a transaction open during this query?
Let’s see:

####Session 1:
select * from locktest
THE_ROW
-------
1
10
2
3
4
5
6
7
8
9

select count(*) as cnt from locktest
CNT
---
10

explain select count(*) as cnt from locktest
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
------------------------------------------------------------------------------------------
LARS        LOCKTEST                    TABLE SCAN                                       1
                                        COUNT OPTIMIZATION                     
                                             RESULT IS COPIED   , COSTVALUE IS           2
                                        QUERYREWRITE - APPLIED RULES:          
                                           DistinctPullUp                                1
####Session 2:
insert into locktest (select the_row+20 from locktest)
More than one row updated or deleted. Affected Rows:  10
####Session 1:
select count(*) as cnt from locktest
CNT
---
20 
####Session 2:
rollback
Statement successfully executed. No Result
####Session 1:
select count(*) as cnt from locktest
CNT
---
10

As we can see, select count(*) returned a count that includes the additional 10 rows, although they were never committed!
Luckily the explain tells us about the optimization by printing out the “COUNT OPTIMIZATION” line.

A way out?

But how can we force MaxDB to actually deliver the result for committed rows only?
Until writing this blog, I thought the solution would simply be to trick the optimizer out of the COUNT OPTIMIZATION, e.g. by adding a dummy predicate that evaluates to TRUE for all rows, say 1=1.

Let’s see how this works:

####Session 1:
explain select count(*) as cnt from locktest where 1=1
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
------------------------------------------------------------------------------------------
LARS        LOCKTEST                    TABLE SCAN                                       1
                                             RESULT IS COPIED   , COSTVALUE IS           3
                                        QUERYREWRITE - APPLIED RULES:          
                                           DistinctPullUp                                1

Ok, the “COUNT OPTIMIZATION” is gone, so we should expect the ‘correct’ counting of data…

####Session 2:
insert into locktest (select the_row+20 from locktest)
More than one row updated or deleted. Affected Rows:  10
####Session 1:
select count(*) as cnt from locktest where 1=1
CNT
---
20

WOW – wait a minute!
Still the same wrong result?
YES!
So I’ve written nonsense above?

Nope, what I wrote is actually true.
So how come we still see the uncommitted data?

The point here is that there are some more optimizations built in that are anything but obvious.

Clever tricks 

When the dummy predicate is applied, MaxDB actually does walk through the whole table.
So far the explain is completely correct.

But what happens with the rows read is what makes up the odd behaviour.
For each row found, the kernel now checks whether the row qualifies against the predicates.
Part of this is recognizing that only a count is wanted and that no actual data from the table needs to be returned.
So the kernel takes a shortcut and reports back that the row did not fit, but increments a counter instead. This is then done for all rows in the table.

The key point here is that the kernel only checks the locks of a row once it has been recognized as a "qualified row".
Therefore this (obviously old) count(*) optimization takes its shortcut before the lock handling, thereby always delivering uncommitted read results.

Phew… weird, isn't it? But things get creepier!

Another way to fool the optimizer out of this set of optimizations is to use "count(column_name)" instead of "count(*)".
The problem with that is that "count(column_name)" only counts those rows for which the column is not NULL.
This might not be a problem for most SAP tables, but it surely can be an issue for others.

Basically, only two "safe" ways exist to force the committed read-count.
In both of the following cases the kernel really reads all rows and really has to check locking by looking for entries in the lock lists.
The two ways are:

1) using “count(pk_column_name)”

In our example column “THE_ROW” forms the primary key, so let’s check this counting approach:

####Session 2:
insert into locktest (select the_row+20 from locktest)
More than one row updated or deleted. Affected Rows:  10
####Session 1:
select  count(the_row) as cnt from locktest
CNT
---
10

explain select count(the_row) as cnt from locktest
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
------------------------------------------------------------------------------------------
LARS        LOCKTEST                    TABLE SCAN                                       1
                                             RESULT IS COPIED   , COSTVALUE IS           3
                                        QUERYREWRITE - APPLIED RULES:          
                                           DistinctPullUp                                1

If we change the table and add a new column which might contain NULL values, we can observe how fragile this approach can become:

####Session 2:
rollback
alter table locktest add (the_row2 varchar(20))
commit

Now set some values…

update locktest set the_row2='x' where the_row >='5'
More than one row updated or deleted. Affected Rows:  5
commit
select * from locktest
THE_ROW  THE_ROW2
-------  --------
1        ?
10       ?
2        ?
3        ?
4        ?
5        x
6        x
7        x
8        x
9        x
 

If we now just use THE_ROW2 for counting, we get the number of rows where THE_ROW2 is NOT NULL, regardless of the locking state of the rows:

select count(the_row2) as cnt from locktest
CNT
---
5
And, of course, locking would be adhered to in this case.
That means that without the commit after the update, a second session doing the count(the_row2) would simply be blocked until the commit has been issued (assuming we work at least in isolation level 1 – committed read).
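
To make this concrete, here is a sketch of that blocking scenario (statements only, no output shown; it assumes both sessions run with at least isolation level 1):

####Session 2:
update locktest set the_row2='y' where the_row='1'
-- no commit yet, so the changed row stays locked
####Session 1:
select count(the_row2) as cnt from locktest
-- this statement now has to check locks and therefore waits ...
####Session 2:
commit
-- ... only now does session 1 return its count of committed rows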

2) using multiple aggregate functions at once, e.g. "count(*), count(*)"

####Session 2:
insert into locktest (select the_row+20, the_row2 from locktest)
More than one row updated or deleted. Affected Rows:  10
####Session 1:
select count(*) as cnt_1, count(*) as cnt_2 from locktest
CNT_1   CNT_2
-----   -----
   10      10
  
explain select count(*) as cnt_1, count(*) as cnt_2 from locktest
SCHEMANAME  TABLENAME  COLUMN_OR_INDEX  STRATEGY                                PAGECOUNT
------------------------------------------------------------------------------------------
LARS        LOCKTEST                    TABLE SCAN                                       1
                                             RESULT IS COPIED   , COSTVALUE IS           3
                                        QUERYREWRITE - APPLIED RULES:          
                                           DistinctPullUp                                1

 

NOTE: Here, session 1 also hangs until session 2 commits or rolls back!

That’s it (for now)

That’s it about the count(*) locking effects that may occur with MaxDB.
Quite strange stuff going on here, isn’t it?

So why aren't there problems with MaxDB and the SAP products all the time?
Simply because the NetWeaver application servers use uncommitted read (a.k.a. isolation level 0) all the time!
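
For completeness: in MaxDB the isolation level is usually chosen when the session connects. If I recall the syntax correctly, that looks roughly like the following sketch (user and password are placeholders; please check the CONNECT statement documentation of your version):

-- assumed syntax sketch: 0 = uncommitted read, 1 = committed read
connect someuser identified by somepassword isolation level 0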

The data access layer is designed to know that rows found in the database can already be outdated by the time they arrive. Since there is a whole business transaction handling engine implemented in the NetWeaver stack ("above" the DBMS), it really doesn't matter at all for the work processes.
Whenever a user needs to 'really' put his hands on a row, the application needs to get an enqueue anyhow, and that ensures that the correct values are retrieved and no other users can change them while the user is working on the data.

And oh, yes, this is just another big argument against direct database access …

Continue reading about locking behaviour in part II of this blog, coming up this week.


5 Comments


  1. Thorsten Franz
    Lars,
    Yes, you did win my interest. I'm anxiously waiting for the next part and hoping that it will show how to bypass all these problems. In most cases I expect we should be on the safe side using SAP JPA and pessimistic locking via the locking service. I expect that with a @version field optimistic locking should work as well – but what about those cases where single record access is not sufficient and selects or updates over entire tables are needed? I hope the next part will come soon (and contains some Java code).
    Cheers,
    Thorsten
    1. Lars Breddemann Post author
      Hello Thorsten,

      I'm happy to have such a prominent audience, but pretty sad, on the other hand, to disappoint you right up front.
      The next part will “just” cover deadlock detection – specifically when deadlocks are not detected…

      So, I guess, no Java code then.

      However, yes, with the transaction model employed by the current ABAP/J2EE stack, you are pretty safe.
      When you select data from multiple tables in a statement you can be sure to get a consistent view of the data as of the beginning of the statement – that's the same as with e.g. Oracle from the user side.

      The problems really only come up when you, as a developer, are not aware of the locking behaviour and assume characteristics of your data that cannot be guaranteed.

      And, let's be clear about it: locks are performance/scalability killers. That's one of the reasons the ABAP stack uses uncommitted reads. Otherwise everybody would spend MUCH time waiting to get a share lock for reading some data.

      Before I forget it: if you’ve got a specific question (say, a case where you need help with), just let me know and I’ll see whether I can do something about it.

      cheers,
      Lars

      1. Thorsten Franz
        Lars,
        Thanks for your quick response. I’m sure the next part will be as interesting as this one, so no problem if it goes into a different direction. 🙂
        I know pretty well how OpenSQL behaves in the ABAP stack, but I have (naturally) much less experience with how well the AS Java handles database abstraction.
        Maybe some day you could write a blog on how to stay on the safe side and know that you always work with transaction-consistent data even if you go cross-platform.
        Say you have an application that contains some JPA entities, a JPA query, and even some JDBC (using OpenSQL). What do you need to do in order to always keep a consistent picture?
        Another hot topic, I assume, is moving database applications from the AS Java’s system database to a non-system database connected through a custom datasource, say when the database gets so big that you need a differently parametrized DB schema. What do you need to do in order for the application to behave consistently? As far as I know, you can’t run OpenSQL against a non-system database connection, so you would have to use Vendor SQL or Native SQL. How would SAP JPA behave differently?
        I fear that all these questions must appear quite stupid to someone with your database experience. In my defence, the ABAP stack's DB isolation layer works so perfectly well that in the previous 10 years I never had to worry much about avoiding DB-specific problems. 🙂
        Cheers,
        Thorsten
        1. Lars Breddemann Post author
          Hi Franz,

          the questions are really not stupid at all.
          Unfortunately you put the finger on a spot that is a rather bad one.
          “landscape-consistent” data access is something that basically does not exist in the form of a consistent single query-statement.
          Whenever you cross the borders of the transaction management you're using (DBMS, application server, transaction engine, choose one…), you are forced to add synchronization mechanisms yourself. And getting these implemented correctly is in fact really difficult.

          At least there is one thing I can reassure you with: with secondary database connections you can have consistent database transactions even with non-system databases. With the current versions of NetWeaver this is heavily used, e.g. by BI or SCM (sometimes even to the same database… don't ask… ;-)).

          Well, I see this topic would be worth a chat in person. Let's see whether we can meet @TechED Vienna!

          cu Lars

