Note: opinions and statements made in this blog are mine and not necessarily the official position of SAP.
Recently a colleague of mine (okay, my boss’s boss) was asking me to clarify some of my SRS internals in some PowerPoint diagrams he was planning on using to discuss SRS with a technical audience. He knows SRS, but I think he finds my drawings a little confusing as I tend to add a bit more detail than the docs include. I gladly complied, because I feel that “push-button” DBA’s who don’t understand what they are working with are more of a hazard than a help to any organization. It really doesn’t take much effort to learn stuff…just the time to dedicate….oh, yeah, “training time”….hmmm…that is so 1990’s.
Anyhow, it came out over several discussions that the customer he was presenting SRS to was planning on implementing replication with Murex™. Those of you who are familiar with trading systems have likely heard of it – it is a fairly common financial trading package from Paris….if you have to go somewhere for training, and can take the spouse along….. Oooppss, I digressed. Anyhow, I mentioned to him that while the internals are nice, what he should be discussing with them is the features we added to SRS years ago that were designed with Murex™ in mind….although they work extremely well with other applications as well. Now this is the point about differentiation that is important. Yep, there are other data movement products on the market. And after three previous attempts, Oracle finally got a viable solution in its acquisition of GoldenGate™ – and I do readily admit that GoldenGate™ is the closest data movement product on the market to SRS capabilities. I also have some 100+ PowerPoint slides that show in gross detail how lacking it is compared to SRS – and the features we added for Murex™ are clearly differentiators that are examples of this.
The features were added thanks to the experience of one customer about a decade ago. Now, that customer wasn’t the first customer to implement SRS for Murex™ – they were just the first customer whose account executive was on the ball enough to engage our team when the product didn’t meet the requirements out-of-the-box. The situation was that they wanted to have a standby system that could be online in less than 30 minutes…and a cold standby wasn’t a solution as the DBA’s (wisely) wouldn’t certify that bringing ASE up after a hard crash was possible in 30 minutes. Just after a clean shutdown, it was taking 15 minutes just to allocate all the memory….and then a few more to clear the whopping huge tempdb’s (this was in the years before SSD’s)…and so on. In investigating the latency, we quickly established that part of the problem was that many of the tables in the Murex™ schema did not have any indexes at all….and others had only some medium cardinality nonunique indexes. As a result, when replicating updates and deletes, the replicate ASE was forced to do tablescans. Note that I said the replicate ASE. This was not (yet) an SRS problem. I often characterize SRS performance as “pushing rope”…..you can only push things out as fast as the guy on the other end reels it in. Otherwise it spills to disk…errr the floor.
Happily, we learned that Murex™ put an identity column (M_IDENTITY) on practically every table. This made life easy – all we did is simply added a unique index on that column and declared that column as the primary key for the repdefs. Okay, it wasn’t all that simple – yep, we had to add to the failover process the consideration to drop these indexes as no one had tested how much these “additional” indexes would impact insert speed on the primary. I had an insert slammer test courtesy of another customer’s trading system, but since it didn’t exactly mimic their transactions, it was considered insufficient of a test to use. Quite frankly, adding one index is rarely the problem (and the DBA agreed with me)…it is adding 5 or 6…..but in an effort to satisfy those doomsayers that naysay without first testing….we added the drop logic. Took seconds.
That single change dropped the latency from 20+ hours to ~2-3 hours. Yayyyy!!!!! We can go home. NOT! The customer had a goal of 30 minutes, remember? Ouch. Sooooooo……. that is when we had to look a little closer at what was happening. That took a bit of fun thanks to the fact we had to use MDA tables to try to capture what tables were being manipulated as the volume through SRS made using the normal SRS traceflags such as DSI, DSI_BUF_DUMP impractical – the volume would have filled the filesystem in mere minutes. Thankfully, MDA was possible…and easily workable….but a bit more difficult than today’s AOBJ counters in SRS Monitor Counters (and now you know where they came from…feature #1 in the list). Oh, by the way, you will only get these counters if you enable DSI, DSI_WORKLOAD trace in SRS 15.6 and 15.7 (do NOT use that trace in earlier versions!)….which is why those of you who have dealt with me will see that trace in the capture script – and will hear from me if you leave it out. I hate flying blind.
It took a bit of ingenuity, but eventually we learned that Murex™ had a consolidation process for consolidating positions (and other activities) that had a SQL transaction similar to:
delete from tablename — no where clause
insert into tablename (select * from othertable)
Now, this was a trading system for a major bank. The above regularly whacked 10’s of millions of rows and reinserted 10’s millions of rows….multiple times for multiple tables…and multiple times a day. Okay, I must admit, my first question was “A delete without a where clause??? What goofy person did that without considering ‘truncate table’ instead??” Then I learned why. It was simple….you can’t put ‘truncate table’ in a transaction….and like all really good programmers, Murex™ had made the above process recoverable – e.g. if the system crashed in the middle, you would be restored to your previous state. Grumble. Yes, these were the same deletes that were doing 20 million plus table scans….. Grrrrrrr…. Thankfully, the customer determined that recoverability for this transaction would be managed manually in the event of a failover and they simply added a ‘truncate table’ prior to the transaction. But this brought on feature #2 – SQLDML. Yepper, the ability to replicate high impact SQL statements as SQL instead of the gazillions of rows impacted. And like always, engineering took it one step further and determined not only when it was safe to do so, but also to send to SRS both the replicated SQL command as well as the rows affected and then based on the type of target (standby or reporting, etc.) whether to apply just the SQL command or the to send the replicated row images as SQL. And to make it more controllable, not only could you specify a configurable threshold before it would replicate the SQMDML, engineering also added an ASE session setting to enable/disable it within any session and SRS engineering provided the ability to exclude it on specific tables via the repdef. In other words….lots of control. Oh yeah. I love control.
If you are keeping score, that is SRS: 2; competition: 0. Nosirreee. None of the competitors offer the ability to track insert/update/deletes/execs per table/stored proc over time (so simple) or the ability to flexibly replicate the SQL statement or the logical rows – certainly not when replicating from Sybase ASE – and most often not even from their own native DBMS. See, in order to do so, you have to convince the DBMS engineering staff to add the flexibility to log SQL statements as well as row images….and all too often, DBMS engineers look in askance at data movement products and thumb their noses at such requests. Which is why the competition can’t replicate stored procedure executions natively – but that is a different topic. For SAP Sybase ASE, it was simple – we had a mechanism to log proc executions and we simply borrowed that – which meant we also knew what rows were modified by the statement as we had both the statement begin and end points……versus a subsequent modification in the log.
That took care of the delete. In our case, the next problem was the inserts. Now here is where we ran into a few…shall we say…snags? SRS was sending the inserts just as fast as it could. ASE was just as promptly parsing, compiling, optimizing and then executing every one of them. Hmmmmm….did you know that for simple SQL statements, it takes longer to parse, compile and optimize than to execute??? You did???!!! You are ssooooo on top of the game. Thanks to SRS’s flexibility and since this was ASE 12.5.x, we simply wrote a proc to wrapper the insert statement and then changed the SRS apply logic to invoke the proc instead of the SQL insert and to do so as an RPC call instead of SQL command. Now, this is again some SRS differentiation the Other guys don’t have. Yes, you can have a few stored procs called in pre-statement or post-statement, but often you cannot execute something instead of that operation. That statement will be executed – perhaps with different values as a result of the return params from those procs, but you cannot avoid sending that statement. Kinda like death and taxes. You must (in their world) apply the replicated operation….or write some C code as the only way to truly customize the delivery is to call out to some DLL you write. Even the simple aspect such as to have the option of using RPC over language is a bit…..unthought-of for them. Their intent in the up to 20(!!!…unimpressive) procs you can use to pre/post processing of the replicated operation is to look up decoded values or to log the failure as a form of conflict detection (vs. conflict resolution) in bidirectional implementations and not to allow the replicated operation to be bypassed altogether. A far cry from the nearly unlimited capabilities of SRS to customize delivery.
It turns out that going back to language vs. RPC worked best as then SRS was able to send huge chunks of inserts in one glob vs. one at a time – as it is the optimization step that is so beastly and not the parsing. This brought up SRS new features #3 and #4 – first, we added the ability to use bulk inserts (yes, within a transaction…what do you think bcp -b does?) for large inserts and also added the ability to use fully prepared statements (aka dynamic SQL) instead of waiting for ASE to parse, compile and optimize. Remember, this was in the days before ASE 15.0, so we didn’t have that nice little statement cache literal auto parameterization thingy to help us out – we actually had to change SRS to do what was right. Now, I will admit that some of our competition do support bulk inserts….but ONLY during materialization. Some even support using fully prepared statements as well….but hey, the real slick trick was leveraging the array inserts in blklib to do transactional bulk inserts even with text/image data…and convincing ASE engineering to add fast bulkcopy ability to tables with indexes (yeeehaawww!!!).
Now this is where the competition does something that it just a bit thoughtless. When faced with such situations, they simply use a hash on a column (e.g. the identity) and use parallel processes to replicate that data and show comparisons to SRS without using any of the above features and nothing in parallel. Kinda like a snake oil salesman. They don’t mention, of course, that to do so violates transaction integrity as well as transactional serialization – two very critical aspects if database consistency is of paramount concern….and in which standby system would database consistency not be? But given SQLDML and dsi_bulk_copy, we can flatten them performance wise without requiring 40 cores to do so….and our Multi-Path Replication (MPR) features are soooo better than theirs.
Since we didn’t have these two neat features yet (they were added in SRS 15.0.1), we were forced to rely on our stored proc trick to bypass the optimizer as much as we could. Using this trick we got things down to 30 minutes…..barely. Remember, now, SRS only replicates committed transactions – so one thing we were fighting against was the fact that we had to wait for the entire transaction to get inside SRS before SRS would even start to apply it. Now, some of you are saying…. “but wait a second…in a WS, because the transaction is larger than dsi_large_xact_size, SRS will start applying ahead of the commit and just roll it back if it has to”. And, you would be so right. We did exploit the large transaction handling at the replicate, but, you see, we were not using WS. We were using MSA. Why? Two very big reasons. First, the standby system was a tad bit of a distance away…but even that was workable. The second and most pressing reason was the worktables.
Like any real application, Murex™ has its share of temporary work tables. Now, you could put them in tempdb, but then if a system crash happened, you have no option but to restart whatever process it was from the beginning. With a semi-persistent worktable, you simply have to see where you left off and restart from that point. Otherwise, you end up with the situation that some batch processes find themselves in where 7 hours into the 9 hour nightly run, there is a failure….and they have to restart and the system isn’t ready by market open the next morning. Bad, bad, bad. Ideally, any high availability solution isn’t so much as 5 9’s on the individual DBMS server, but rather the ability to take failures without the business even being aware of it…..and repeating batch jobs is not conducive to doing that.
However, these nasty batch jobs were clogging the inbound queue with billions and billions of rows that simply were not needed. Unfortunately, because Murex™ creates new tables at the whim of the end-user, you need to mark the entire database for replication using sp_reptostandby. At the time, this replicated everything – all DDL, all system procedures, all table modifications, blah blah blah. Including the work tables. Using MSA gave us this neat little feature that we could specify in the database repdef not to replicate a list of tables. In our case….about 70 tables. This brought along feature #5 – the ability to exclude tables from the RepAgent via a new setting of sp_setreptable <tablename>, ‘NEVER’. Not sure if this was back ported to the ASE 12.5.x servers, but it was added to ASE 15.0.x and has been available ever since. Now the competition may point out that you can simply not put the tables you don’t want to replicate in the EXTRACT config file….but what they don’t point out is to not put that table in, you have to first enter the other 3000 tables…using vi. Yeah…no object definition language for them – just a crude text editor that makes alterations a bit of like finding a needle in the haystack.
At this point, things were going well….until we noticed we were getting flooded with a ton of updates at one specific point. We traced it down to one custom stored procedure that was written for FX trading based on that customers unique requirements. Simply put, the procedure had bad logic. As soon as I saw the procedure code and noticed what should have been a simple update within a cursor loop, I knew immediately that the developer was not familiar with relational theory and definitely not familiar with cursors. First, hearing that cursors should be used vs. a single SQL statement on a lot of rows (to avoid table lock escalation) and then second hearing that cursors should be opened ‘read-only’ for performance reasons, the developer did so – but then proceeded to use the primary keys in a subsequent update statement for every row. Hmmmm…..update where current of cursor would have been far better. Why??? Because the table really didn’t have a primary key. Instead, it turns out the two columns being used had an average of 500 rows per value pair. Why did the developer choose to use just those two columns? Because they were the only two in the only index (see my earlier discussion on lack of indexing). So for every update, they were re-updating 499 rows they had already updated once already and not just the current row. A simple update of 26,000 row table exploded into 13,000,000 updates.
Initially, the customer wanted to simply replicate the procedure instead of the 13 million updates. Yes, we had that option – the competition doesn’t. Nothing new here. However, the procedure – thanks to the cursor and overkill updates – ran for hours. One thing I do not replicate is long running stored procedures ….at least until Multi-Path Replication came out. The reason is simple. If it runs for 5 hours at the primary, it will likely run for 5 hours at the replicate. And while it is running, everything else simply waits inside SRS – thanks to striving to maintain transaction serialization. Thankfully, the updates were not in a single transaction, so we only had to contend with ~500 or so per transaction….which would have tablescanned without our nice little index on the M_IDENTITY column (now why the developer didn’t use that, not sure). Anyhow, what we did instead was to simply rewrite the procedure to only update each row one time vs. overwriting 499 times – result was 26,000 updates to replicate – at which point replicating the procedure was a more viable option. We ended up (I think…been too long to remember…memory – the first thing to go with age) simply not replicating the proc as the row updates were so few it didn’t make a difference.
For those of you keeping score:
1. DML activity statistics per source object (insert/update/delete/writetext/exec proc)
2. Ability to flexibly replicate the affected rows or the SQL command (SQLDML)
3. Ability to use fully prepared statements to minimize replicate optimization overhead
4. Ability to use bulk inserts within a replicated transaction based on configurable thresholds of contiguous inserts
5. Ability to exclude tables easily despite replicating the entire database otherwise
These, of course are just the new features added. If we include some of the other things we have mentioned, the list becomes:
6. Ability to replicate stored procedures
7. Flexibility to invoke procedures via database RPC or as language commands
8. Flexibility in replacing the default replicated operation with whatever SQL we feel like vs. a limited number of pre/post processing routines
9. Ability to customize the delivery without resorting to C code.
10. Early application of large transactions without voiding transaction serialization.
We don’t even need to get into the plethora of other problems they have (e.g. routing). The point is that while other data movement products may be able to replicate data, SRS has some clear distinct product features that differentiate it from the pack. And while GoldenGate™ may be the closest thing to SRS that Oracle has to offer, it is like saying Mars is the closest planet to Earth….it is still 50 million to 400 million (orbital variances) kilometers away. I am just giving it the due respect that it should have compare to the previous solutions….and the others out there.
…and if we had to do that customer over again….sure, no problem…..30 minutes would have been 29 or more too many.