Depends on whom you ask, I am not often wrong. This time I was ….and foolishly put my $$$ where my mouth was. At ISUG this year, I was presented several sessions – one of which focused on data structures and segued into data compression and data encryption – with the idea of showing why compression and encryption could have the impacts that they do and when to best use (or not use them). A bit of a heady topic to jam into 2 hours. However, during my development of the materials, I decided to take on the challenge that my friend Cory Sane had issued last year at the enhancements panel: why doesn’t expected row size behave the way we would like it to?
For those of you who have no idea of expected row size, shame!!! You really must like running reorgs…..and exclusive table locks.
Back nearly 20 years ago when ASE added datarows locking, a number of features were added to alleviate common locking contention points. For example, the concept of using non-transactional latches vs. locks for indexes. One of those features was the notion of an “expected row size”. The problem with row level locking was that if you updated the row and the row expanded, often we were forced to forward the row to a new page as the new size impinged on the next row on the page. While we could have simply locked the rest of the rows on the page, for common situations such as heap tables, this would lead us right smack back into the problem we were trying to solve – hence row forwarding. To prevent row forwarding as much as possible, we added a table attribute “expected row size” that can be set via “create table” or via sp_chgattribute. When set, this preserves enough space on the page so that full rows could be written….but didn’t reserve the space within the row. As a result, there were some situations in which some row forwarding still occurs. Cory’s challenge was we should fix it. Engineering’s response was that it would be extremely difficult to near-impossible……
So, partly due to curiosity and partly to answer the challenge, I decided to figure out why. The answer was in the row format. To make the expected row size attribute work as desired, we would have to add another 2 bytes to the row format – which presents a nasty issue during upgrades as this would involve re-writing every data page with datarows locking….and due to the row expansion, likely would involve page overflows/new allocations – which would in turn lead to index RID updates, etc. Ugh. No one wants an upgrade that takes 30 days…..
In the session, I did point out that it is clearly documented that if you have forwarded rows and run reorg forwarded_rows – or any other reorg that purportedly works in “small transactions” to avoid contention, that as soon as the reorg hit a forwarded row, it escalated to an exclusive table lock for the remainder of that “small transaction”….doesn’t sound so bad….until you realize that those 3000 forwarded rows becomes 3000 exclusive table locks – just one of the many reasons that using exp_row_size to dramatically reduce forwarded rows is much better than running reorgs to begin with….or at least upgrade to ASE 15.7 sp100+ and use one of the online reorgs.
However, as part of the discussion, I demonstrated how expected row size was multiple orders of magnitude better than tables without the attribute set – showing that for my example, it reduced row forwarding from nearly every row to just low thousands (or high hundreds) instead of 100,000. I also pointed out that this is where I was typically lazy – rather than trying to compute the row size as documented in the manuals (x bytes of overhead, plus x bytes for each variable length column, etc.), I simply tried to set the exp_row_size attribute to something stupidly large (e.g. 4000 bytes) and ASE would whine at me about the size being larger than the maximum of ### bytes…..and so I would then simply reset it to the maximum as computed by ASE.
At that point, things went a bit tricky. Kevin Sherlock challenged me with whether this was the logical or physical rowsize being returned in the error from ASE. I had assumed the physical (to include all row/column overhead) vs. the logical (just the datatype lengths – no row overhead) and said so. He pointed out that if it was the logical vs. the physical, it might explain the little row forwarding that I ended up with……meh…..maybe…..maybe not (actually, not as we still have to deal with contiguous vs. non-contiguous free space and the role of the housekeeper – which we didn’t get into)…..and knowing that, I challenged him right back. Bad move….Jeffy….bad move. He accepted with the loser paying for lunch sometime as the penalty (we are big spenders at ISUG). The problem is that this is so easily disproven – even in my example, it was obvious that the answer was logical – I just wasn’t paying attention.
Of course, now I have to figure out if engineering takes the ‘logical’ row size and computes a physical one that is used internally……I have an idea how I can prove that….and if so….then maybe he will owe me lunch after all.
…next time I am in Omaha…one of us will be eating steak and the other crow…..I just hope it tastes like chicken.
Now, about that “magic go faster” button. During the keynote, it was my pleasure to demonstrate the combined power of SNAP (Simplified Native Access Plans – aka “compiled queries”); Latch-free B-trees (see my blog at http://scn.sap.com/community/ase-custom-applications/blog/2015/03/13/latch-free-b-trees-coming-soon-to-a-server-near-you for explanation); and Lock-less Buffer Manager. To set the stage, the SPEED team and QA in engineering had set up two identically configured HW boxes – both with 80 cores and 512GB of memory. On one, we were running ASE 16 sp01 – hot off the presses with all the advantages for contention in 16GA over 15.7….where most of you are still running. On the other, we were running the latest engineering build of ASE 16 sp02. Both were configured identically with 80 engines and gobs of memory – and identical data cache configurations, etc. The only difference, of course was that the ASE 16 sp02 system had all three of the XOLTP features we were discussing enabled. The benchmark we ran was an internal one from one of our internal systems used in real life every day (our case management system) – we simply scaled it from 1000 to 5000 users and plotted the results in both response time and throughput. For those of you who wimped out on ISUG and didn’t get to see the demo, here is a screen snapshot of the results from one of the many practice runs:
Okay, so it might be a tad small….but it shows that sp02 had 7x( !!! …that’s 700%) better throughput and 90x improved response times!!! Whoa!!!! When is the last time you saw anything greater than 2-3x – without any code changes????? Of course, the issue is contention – which isn’t always consistent – for example, one of the other practice runs had a slightly different result with:
Still – nearly 4x better with sp02 vs. sp01. I wanted to show this last chart for a reason. Contention is often a matter of timing – and as beautifully illustrated by the red sp01 line in the bottom chart (response time), contention can be rather imprecise and unpredictable – sometimes worse than others – which is why it is such a nuisance as you may think you have fixed it…and presto – it comes right back at you.
How realistic was the test??? Very – I have frequently observed high cache contention in a number of customer cases – especially in FSI customers where one of the most common queries is for recent stock pricing. Last June for example, I was proving to one FSI customer in London that the threaded kernel was indeed faster than the process kernel – and in their case what we quickly learned was that the single biggest scalability problem in their application was cache contention. I could only say “sorry…..but ASE 16 will fix that”….and it does as shown above. Even without user data entering into the picture, for one customer, we determined that the biggest impediment to scaling the application was cache contention on sysusers – even though bound to a separate named cache (just that one system table)…all due to the fact the app connected via a single login (even if it used several, due to page size, it still would have been an issue). For those of you who insist on not listening to my rants about using a dedicated named cache for system tables (and indexes) and still have system tables in default data cache….welllll…..let’s just say that if you really want a fast way to reduce your default data cache spinlock contention, create that named cache as I have said many times over…..
….and then in sp02, tweak it by enabling the lockless buffer manager to eliminate all cache contention.
In the next months, myself and my colleagues will be taking to this space to bring you more details about the Corona features in sp02. Topics will include things such as Transactional Memory, SNAP, lockless buffer manager, HADR, capture replay (I got to demo that too…..definitely cool!!)……so stay tuned.