Skip to Content

Depends on whom you ask, I am not often wrong.   This time I was ….and foolishly put my $$$ where my mouth was.   At ISUG this year, I was presented several sessions – one of which focused on data structures and segued into data compression and data encryption – with the idea of showing why compression and encryption could have the impacts that they do and when to best use (or not use them).   A bit of a heady topic to jam into 2 hours.   However, during my development of the materials, I decided to take on the challenge that my friend Cory Sane had issued last year at the enhancements panel:  why doesn’t expected row size behave the way we would like it to?

For those of you who have no idea of expected row size, shame!!!   You really must like running reorgs…..and exclusive table locks.

Back nearly 20 years ago when ASE added datarows locking, a number of features were added to alleviate common locking contention points.   For example, the concept of using non-transactional latches vs. locks for indexes.   One of those features was the notion of an “expected row size”.   The problem with row level locking was that if you updated the row and the row expanded, often we were forced to forward the row to a new page as the new size impinged on the next row on the page.   While we could have simply locked the rest of the rows on the page, for common situations such as heap tables, this would lead us right smack back into the problem we were trying to solve – hence row forwarding.   To prevent row forwarding as much as possible, we added a table attribute “expected row size” that can be set via “create table” or via sp_chgattribute.   When set, this preserves enough space on the page so that full rows could be written….but didn’t reserve the space within the row.   As a result, there were some situations in which some row forwarding still occurs.   Cory’s challenge was we should fix it.   Engineering’s response was that it would be extremely difficult to near-impossible……

So, partly due to curiosity and partly to answer the challenge, I decided to figure out why.   The answer was in the row format.   To make the expected row size attribute work as desired, we would have to add another 2 bytes to the row format – which presents a nasty issue during upgrades as this would involve re-writing every data page with datarows locking….and due to the row expansion, likely would involve page overflows/new allocations – which would in turn lead to index RID updates, etc.   Ugh.  No one wants an upgrade that takes 30 days…..

In the session, I did point out that it is clearly documented that if you have forwarded rows and run reorg forwarded_rows – or any other reorg that purportedly works in “small transactions” to avoid contention, that as soon as the reorg hit a forwarded row, it escalated to an exclusive table lock for the remainder of that “small transaction”….doesn’t sound so bad….until you realize that those 3000 forwarded rows becomes 3000 exclusive table locks – just one of the many reasons that using exp_row_size to dramatically reduce forwarded rows is much better than running reorgs to begin with….or at least upgrade to ASE 15.7 sp100+ and use one of the online reorgs.

However, as part of the discussion, I demonstrated how expected row size was multiple orders of magnitude better than tables without the attribute set – showing that for my example, it reduced row forwarding from nearly every row to just low thousands (or high hundreds) instead of 100,000.   I also pointed out that this is where I was typically lazy – rather than trying to compute the row size as documented in the manuals (x bytes of overhead, plus x bytes for each variable length column, etc.), I simply tried to set the exp_row_size attribute to something stupidly large (e.g. 4000 bytes) and ASE would whine at me about the size being larger than the maximum of ### bytes…..and so I would then simply reset it to the maximum as computed by ASE.

At that point, things went a bit tricky.   Kevin Sherlock challenged me with whether this was the logical or physical rowsize being returned in the error from ASE.  I had assumed the physical (to include all row/column overhead) vs. the logical (just the datatype lengths – no row overhead) and said so.   He pointed out that if it was the logical vs. the physical, it might explain the little row forwarding that I ended up with……meh…..maybe…..maybe not (actually, not as we still have to deal with contiguous vs. non-contiguous free space and the role of the housekeeper – which we didn’t get into)…..and knowing that, I challenged him right back.   Bad move….Jeffy….bad move.    He accepted with the loser paying for lunch sometime as the penalty (we are big spenders at ISUG).   The problem is that this is so easily disproven – even in my example, it was obvious that the answer was logical – I just wasn’t paying attention.

Of course, now I have to figure out if engineering takes the ‘logical’ row size and computes a physical one that is used internally……I have an idea how I can prove that….and if so….then maybe he will owe me lunch after all.

…next time I am in Omaha…one of us will be eating steak and the other crow…..I just hope it tastes like chicken.

Now, about that “magic go faster” button.   During the keynote, it was my pleasure to demonstrate the combined power of SNAP (Simplified Native Access Plans – aka “compiled queries”); Latch-free B-trees (see my blog at http://scn.sap.com/community/ase-custom-applications/blog/2015/03/13/latch-free-b-trees-coming-soon-to-a-server-near-you for explanation); and Lock-less Buffer Manager.    To set the stage, the SPEED team and QA in engineering had set up two identically configured HW boxes – both with 80 cores and 512GB of memory.   On one, we were running ASE 16 sp01 – hot off the presses with all the advantages for contention in 16GA over 15.7….where most of you are still running.   On the other, we were running the latest engineering build of ASE 16 sp02.    Both were configured identically with 80 engines and gobs of memory – and identical data cache configurations, etc.   The only difference, of course was that the ASE 16 sp02 system had all three of the XOLTP features we were discussing enabled.   The benchmark we ran was an internal one from one of our internal systems used in real life every day (our case management system) – we simply scaled it from 1000 to 5000 users and plotted the results in both response time and throughput.    For those of you who wimped out on ISUG and didn’t get to see the demo, here is a screen snapshot of the results from one of the many practice runs:

keynote_demo 2.jpg

Okay, so it might be a tad small….but it shows that sp02 had 7x( !!! …that’s 700%) better throughput and 90x improved response times!!!   Whoa!!!!   When is the last time you saw anything greater than 2-3x – without any code changes?????   Of course, the issue is contention – which isn’t always consistent – for example, one of the other practice runs had a slightly different result with:

/wp-content/uploads/2015/04/keynote_demo_678477.jpg

Still – nearly 4x better with sp02 vs. sp01.    I wanted to show this last chart for a reason.   Contention is often a matter of timing – and as beautifully illustrated by the red sp01 line in the bottom chart (response time), contention can be rather imprecise and unpredictable – sometimes worse than others – which is why it is such a nuisance as you may think you have fixed it…and presto – it comes right back at you.

How realistic was the test???   Very – I have frequently observed high cache contention in a number of customer cases – especially in FSI customers where one of the most common queries is for recent stock pricing.   Last June for example, I was proving to one FSI customer in London that the threaded kernel was indeed faster than the process kernel – and in their case what we quickly learned was that the single biggest scalability problem in their application was cache contention.   I could only say “sorry…..but ASE 16 will fix that”….and it does as shown above.   Even without user data entering into the picture, for one customer, we determined that the biggest impediment to scaling the application was cache contention on sysusers – even though bound to a separate named cache (just that one system table)…all due to the fact the app connected via a single login (even if it used several, due to page size, it still would have been an issue).   For those of you who insist on not listening to my rants about using a dedicated named cache for system tables (and indexes) and still have system tables in default data cache….welllll…..let’s just say that if you really want a fast way to reduce your default data cache spinlock contention, create that named cache as I have said many times over…..

….and then in sp02, tweak it by enabling the lockless buffer manager to eliminate all cache contention.

In the next months, myself and my colleagues will be taking to this space to bring you more details about the Corona features in sp02.   Topics will include things such as Transactional Memory, SNAP, lockless buffer manager, HADR, capture replay (I got to demo that too…..definitely cool!!)……so stay tuned.

To report this post you need to login first.

11 Comments

You must be Logged on to comment or reply to a post.

  1. Kevin Sherlock

    Jeff, if anyone is in deep water here, it’s me.  Almost 3 years ago I started considering presenting a session on storage management for that year’s conference.  I discussed that with Cory and mentioned the exp_row_size anomaly, and he took that issue on for me.  So, the idea of stepping into that deep water with you was running through my mind when I sat there with my mouth opened ,hesitating to “challenge” you. 

    I need to go look at my notes from that exercise before I say anything more.  And, as usual, my understanding is probably outdated by now.  But I seem to remember the relationship between exp_row_size and a couple of counters on the page header wrt non-contiguous free space, and space at the end of the page.   And most certainly that the maximum value allowed for exp_row_size setting is indeed the “minimum” max row size possible for a variable length row.

    Last week, when you told an especially inquisitive attendee that DBA’s need not concern themselves necessarily with page layout details – I bristled.  Something in my past told me that I needed to pursue that further, and then … the “bet” happened.   I’m sure I’ll regret it. And yes, if you find yourself in Omaha anytime soon, there’s a particularly fine restaurant “The Drover” that I’d like to introduce you to! 😉

    (0) 
    1. Jeff Tallman Post author

      Careful – the question being asked by the inquisitive individual was more about the specifics of the rowformat overhead (status bits, etc. and where they were within the row) and not about the page format.   His question (and he was more an academic) was more about the make-up of the rowformat elements and their position within the row.   Quite bluntly, it simply doesn’t matter whether we store the length of variable length columns in the row header or after the fixed column data – it won’t impact your query speed.   Whether the length of variable length cols is stored a byte offset 3 of the row or byte offset 60 (due to fixed length cols), we KNOW where it is and can simply read that byte to find out where to find the variable length column offset table.   No extra effort – we don’t need to read the fixed length data just to find the length of variable length cols.    Secondly, my point was that DBMS’s overall – Oracle, MSSQL, DB2, ASE – all share similar row formats – largely driven by data space efficiency.   While there are variations – there are no significant differences.   Worrying about how efficient the row format is (in my mind) an even more trite argument than the age old battle over the char(20) not null vs. varchar(20) null wrt to performance.  Fretting about char(20) not null just for a few nanoseconds difference better access as a fixed length col over varchar(20) is sooo backwards considering the factoring in of blankpadding and potential extra IO costs those blanks cause as well as the string comparison aspect of ignoring trailing blanks (and the ANSI vs. non-ANSI behavior of same).    So my point to him was not to worry about the INTERNALS (e.g. where we place the status bits, the variable length col size, etc.) of the rowformat – but focus instead on the bigger issues – the same individuals who fret about this sort of stuff are almost always the same individuals who 1) don’t set the exp_row_size at all; 2) don’t pay attention to which index should be clustered vs. non clustered; 3) ignore good indexing based on query considerations and as a result often refuse to add indexes where needed and won’t drop unused indexes; 4) never consider the index sort order when creating the index….I could go on – but essentially, they rarely see the forest due to focusing so hard on the bark on the tree in front of them…..when the fact they are lost in the forest and aren’t using the compass in their hand to guide them out.    People have a lot bigger issues to solve than how/where we store rowformat overhead elements such as the status bits, etc.   So, I standby my comment – people should NOT worry about the rowformat storage – instead they should worry about row forwarding, page splits (APL tables), impact of compression/encryption, proper indexing, etc. – things they CAN affect vs. something which (unless there is a bug) has no affect on them and they can’t really change.

      (0) 
      1. Jeff Tallman Post author

        I should also slightly correct the above to point out the person also asked questions about the makeup of the page header – e.g the page id, the partition id (vs. object id), etc. – my comment to him was to NOT worry about the page header internals – as with the rowformat internals not a lot to concern.  The guy certainly asked a lot of questions – a very inquisitive individual, you might say.   Unfortunately, it was a 2 hour session and not a week long class, so in that context we didn’t have time to address academic discussions not germane to the focus.   But I sure wish I had a nickel for every interruption…..

        However, one point that may have been lost due to the interruptions was that the index row format – which normally is similar to the datarow format had to change in order for prefix compression to work.   So the entire discussion about rowformats really was to address the major factor in that by changing the rowformat to achieve better/faster compression for indexes, we ended up resulting in wider row formats – which might (in some cases) negate the index compression – and that *IS* a valid concern to look out for.   It also (as a segue) addressed why we used latches for indexes vs. row locking indexes as nonunique indexes don’t have rows for every entry.

        So….the purpose of the internals discussion was aimed at addressing some FAQ questions:

        1) why index compression sometimes doesn’t work so well

        2) why we use page latching for indexes vs. rll (which drives the focus around the latch-free b-tree we demo’d as the solution vs. even attempting rll for indexing).

        3) how the in-memory DRC will differ from IMDB

        ….but my point stands – where in the rowformat or pageformat we store bits and metadata is irrelevant and of no concern – it is more of an academic discussion – and it is taught in classes, but largely irrelevant.   What is relevant is how differences in row formats can impact things you do have control over (e.g. DRC vs. IMDB ….or index compression vs. uncompressed indexes)…or why encrypting column data can result in more storage usage

        (0) 
      2. Kevin Sherlock

        Ok, ok, ok.   I’ve poked the bear here.   Typo on my part.  I meant to say that he was asking about row format, not page format.   And I didn’t understand it to be a question about where each component was, but whether or not we should necessarily be concerned with becoming familiar with it.  This indeed was off-topic for what you were discussing … it just fired off a few beat-up synapses left in my brain about exp_row_size and forwarding issues…

        Mea culpa –

        (0) 
  2. Kevin Sherlock

    I’ll also mention that this research was part of the motivation for my enhancement request #3708 on the ISUG site requesting a built-in function to retrieve the physical size of a row.  The text of that request is below.  I have no idea what status the ISUG-TECH enhancement process carries anymore, but it is still there.

    “Provide a new, or enhance the existing datalength() builtin to retrieve the size of a row in a table.  Currently, the datalength() builtin requires an expression or a column name as an argument.  With this enhancement, an asterisk “*” could be allowed to return the total stored length of a row.  This would help DBA’s automate procedures to manage storage attributes for tables such as “rowsperpage”, or “exp_row_size” in an effort to reduce the likelyhood of page splits or forwarded rows. Currently, one has to construct complex query to determine rowsize by using datalength() builtin on every column and summing their sizes together. Row size is already available on the page via the offset table, so no additional work is needed other than providing either a new builtin, or modifying the existing datalength() builtin.  Permissions would be the same consideration as column level permissions today, ie, user would have to have permission to read all columns, etc.”

    (0) 
    1. Bret Halford

      There is at least one open feature request CR for a rowlength() function in ASE engineering, CR 721995.  It is currently assigned to a developer, but no ETA on when it will actually be developed.

      -bret

      (0) 
        1. Jeff Tallman Post author

          It is only the *average* size of the row – which is computed based on the number of rows/ (number of pages * page size)…. used more for IO estimates than for page layouts.  

          (0) 
          1. Siddhartha Sarkar

            Hi Jeff.

            I just ran this test in my 15.0.3 test server and getting datarowsize = 6 instead of a fraction. What am I doing wrong. I do estimates based on this column.

            create table mytest

            (

                abc int not null

            )

            go

            insert mytest

            select 1

            go

            update statistics mytest

            go

            select datarowsize from systabstats where id = object_id(‘mytest’)

            go

            drop table mytest

            go

            Thanks,

            Sid.

            (0) 
            1. Bret Halford

              Hi Sid,

              Your table is fixed length, with 4 bytes for the INT and 2 bytes of overhead.  You will only see datarowlength with fractional component if you have variable-length rows with some actual variation in the length in some of the rows.  Even then you may occasionally get an average length that is a whole number.

              -bret

              (0) 
      1. Kevin Sherlock

        Thanks for the info Bret.  My enhancement request for this was submitted Sept 2012.  Would be nice to see this and a corollary change to allow setting the exp_row_size to the maximum possible physical row size (which gets interesting for larger page sizes, encryption, compression, etc).

        (0) 

Leave a Reply