Direct load materialization for HANA in SRS 15.7.1 SP100
SRS has long been capable of both atomic and bulk materialization of data when initializing database replication. However, neither is without its issues. For example, consider what happens with atomic materialization at a high level:
1) SRS puts a marker in the primary log – data prior to this marker from the table being subscribed to can be skipped by SRS
2) SRS does a select from the primary database using either the rs_select or rs_select_with_holdlock functions (more on this in a minute)
3) SRS suspends the DSI
4) The data from step #2 is placed into a materialization queue
5) The contents of the materialization queue are inserted into the replicate database, applying any denormalization functions or leveraging bulk methods as appropriate
6) When finished with the select, SRS puts a second marker in the log – data after this point is replicated as-is; rows between the two markers are treated as if autocorrection were enabled (or autocorrection can be set manually, of course)
7) SRS marks the subscription as valid once the second marker is seen
….this is sort of the simplified “CliffsNotes” version of what happens, but fairly accurate. Now then, there is a problem with this approach. If the table being subscribed to is very large, the DSI could be suspended for quite some time. As a result, backlog can build up in the outbound queue. This poses two really nasty issues. First, if the backlog persists too long, the primary log could be impacted – which would affect the subscription validation. Ugh. To add insult to injury, if creating multiple subscriptions, you may have to wait for the queue to drain before starting the second. Obviously, this technique works best with smaller tables – but with dsi_bulk_copy, “small” could still easily mean rowcounts in the low millions or higher.
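For comparison, the atomic request itself is trivially simple – it is everything above that happens under the covers. A minimal sketch (server, database, and repdef names here are placeholders):

create subscription my_tab_sub
for my_tab_rd
with replicate at RDS.rdb
without holdlock
go

The without holdlock clause is what drives SRS to use rs_select instead of rs_select_with_holdlock for the select in step #2 – handy on a busy primary, since holdlock keeps shared locks on the table for the duration of the select.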
To avoid this, most DBAs simply choose to use bulk materialization. In bulk materialization, the subscription is created without materialization and then the data is bulk exported from the primary and bulk inserted into the replicate. During the process, the DBA simply sets autocorrection (or now dsi_command_convert) on for the desired table’s repdef – and there is no need to stop the primary application. There is a bit of a gotcha, however, that few DBAs have experienced, so many are unprepared for it.

Due to the DSI suspension issue above, the DSI is still active in this case with no backlog…..yet. But what happens when a DML statement affecting the table being subscribed to is hit? The answer is that it depends on a number of factors, such as the table’s locking scheme (allpages, datarows) and which utility is used. If we are using bcp or some other bulk load utility without a batch size, either we will block on a conflicting lock or we will hit a duplicate key error, and the DSI will suspend anyhow. If we expect minimal activity, we could simply zap that command and resume the DSI (after logging the transaction so we have a record of the data values involved)….but this would be impractical for any large system with a lot of activity. The other approach would be to use the -b option with bcp to commit in smaller batches (e.g. every 1,000 rows). However, this might allow the DSI to apply a row that the bulk load hasn’t gotten to yet – a very real possibility, since DML statements are likely to affect recent data while the bulk load likely starts with historical data. This could cause the bulk load to fail. To prevent this, you would need to use -e table.err and -m 99999999999 to skip and log the errors during the bulk load, and then use the pkeys from the error file to do any consistency checks via rs_subcmp or Data Assurance.
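For concreteness, here is a hedged sketch of the bulk route (all server, database, table, and login names are placeholders; the bcp flags are the ones discussed above):

set autocorrection on
for my_tab_rd
with replicate at RDS.rdb
go
create subscription my_tab_sub
for my_tab_rd
with replicate at RDS.rdb
without materialization
go

Then, from the shell, something like:

bcp pdb..my_tab out my_tab.dat -c -Usa -SPDS
bcp rdb..my_tab in my_tab.dat -c -b 1000 -e my_tab.err -m 99999999999 -Usa -SRDS

…and once any rows logged in my_tab.err have been reconciled (via rs_subcmp or Data Assurance), turn autocorrection back off:

set autocorrection off
for my_tab_rd
with replicate at RDS.rdb
go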
Of course, with a bit of ingenuity, the two methods can be combined to get the ease of #1 with the speed of #2. For example, you could modify rs_select to only retrieve data *after* a certain date, create a view of the data up to that date, use bulk load to copy the data out of the view to move the bulk of the data, and then create the subscription with atomic materialization to get the rest. Because the atomic subscription would only be materializing data *after* that date/key sequence (or whatever), it should be fairly fast. A second alternative uses function strings and a shadow table to record the changes applied by the DSI; when the bulk load finishes, you suspend the DSI, perform your own merge via insert/select, update with join, and delete with join as appropriate, and then remove the function strings and resume the DSI.
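A hedged sketch of the view-based variant (names and the cutoff value are placeholders): at the primary, snapshot the historical slice behind a view, e.g.

create view my_tab_hist
as
select * from my_tab
where created_dt < '20130101'
go

…then bulk copy out of my_tab_hist and into the replicate as above, and finally create the subscription with atomic materialization against an rs_select that has been restricted to created_dt >= '20130101', so the select in step #2 only touches the recent slice. The exact rs_select customization is deliberately left vague here – it depends on your function-string class setup.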
All of this is a bit unwieldy. A new approach is coming in SRS 15.7.1 SP100 that I really like – it is called direct load. Initially, it will be supported from any source as long as HANA is the target – although nothing prevents it from being used with any database supported by an ExpressConnect, or with ASE….except, of course, that it hasn’t been certified with targets other than HANA. The direct load approach is similar to what many of us tried to do manually and found a bit fun to manage:
1) SRS puts a marker in the primary log as before
2) SRS does a select from the primary and then uses parallel threads to apply it at the replicate using bulk commands
3) The normal DSI continues unsuspended
4) Any DML affecting the table for which the subscription is being created is shunted off to a “catch-up” queue
5) After the select is finished, SRS starts applying the changes from the catch-up queue
6) When the catch-up queue is nearly finished, SRS puts the second marker in the log
7) When SRS sees the second marker, it finishes the catch-up queue and removes it – data now flows into the normal DSI
As with both the atomic and bulk materialization methods, absolutely zero downtime is required. Recently, I had the experience of using this to replicate the 500+ tables of the SAP schema from ASE to HANA to support the HANA Analytics Framework. Needless to say, I was very impressed, as all of the subscriptions materialized in just a few minutes – and some of the tables had quite a few rows in them.
The syntax looks like the following:
create subscription IDS1_2_HAN_ADR6_sub
for IDS1_SAPSR3_ADR6_rd
with replicate at HAN_DB.SAP_ECC
without holdlock direct_load user sapsa password Sybase1234
subscribe to truncate table
go
There are a few restrictions and gotchas, as I found out:
1) When specifying the login name/password to the direct load command, it is best to use the login that owns the schema (e.g. SAPSR3).
2) If you can’t do #1, then things can get interesting if the table is marked with owner_off. SRS uses the repdef to find the specified table, and if the owner is off and the table owner is not the direct load login or dbo, SRS won’t find the table. As a result, one consideration would be to simply mark the table with owner_on and then specify the owner in the primary table clause of the repdef (see the sketch after this list)
3) Primary key changes are not allowed during subscription materialization
4) You need to be careful about how many concurrent subscriptions you create – each subscription adds roughly 10 threads to the SRS requirement as well as a queue….depending on the number of threads/queues SRS is configured for, you might hit that limit rather quickly.
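For gotcha #2, the owner_on route might look like the following sketch (table, owner, and server names are placeholders, and the column list is abbreviated). At the primary ASE:

sp_setreptable 'SAPSR3.my_tab', 'true', owner_on
go

…and then in SRS, qualify the owner in the repdef’s table clauses:

create replication definition my_tab_rd
with primary at PDS.pdb
with primary table named SAPSR3.my_tab
with replicate table named SAPSR3.my_tab
(col1 char(10), col2 int)
primary key (col1)
go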
….now if we could only get this certified for ASE as a target, we would have a slam dunk, completely zero downtime migration method from ORA or MSSQL to ASE for SAP applications – no wacky migmon or r3load nightmares necessary….and certainly no system cloning, no temporary freezes, no freezing of tables, no high impact from triggers, etc. Yeeehaw!!! It should work perfectly if migrating to HANA though….hmmmmm……