I always state that if the SAP SRS (SAP Replication Server) is up and running with no connections down, then I am assured that my data matches on the Primary and Standby sides. However, due to a “resume connection ….. skip transaction” command or a mis-located process deleting or modifying records on the Standby side; the records on the Standby do not match the records on the Primary side. Once discovered we need to have a re-syncing strategy depending upon our appetite for outages on both the Primary and Standby sides. For example we might have a system that is 24 X 7 on both the Primary and Standby where we run our reports. Or perhaps the database itself is terabytes in size making a dump and load scenario somewhat time-consuming. We are even restricted by the under-lying replication configuration architecture. It’s up to you to judge the complexity verses time scenario. Lets walk through some scenarios and the ways we can re-sync our replicated data.
For some of these options I am introducing the autocorrection concept. Those of you who know the SAP SRS platform historically know that autocorrection was originally designed for table by table replication working with the Table Replication Definition declaration. In later versions of the SAP SRS product this autocorrection concept can be applied on the database or connection level. Check out the ‘alter connection …. set dsi_convert…’ syntax. For the sake of argument we can have any Inserts and Updates on the primary side being transposed on the standby side as a Delete followed by an Insert. (Note: In some cases even this autocorrection has its faults, but let keep this discussion very simple). I am presenting a limited number of options to open up a discussion on this topic and to give some of you further design insights on how to approach a replication architecture. There are more options available and check out my previous blog topic for the newer option of MSA re-syncing: Consultant Field Note – Re syncing methods for MSA-aware databases.
Option 1: Rebuild the MSA/warm-standby database pair connection using a database dump and load. This option certainly will solve any underlying replication issues that led to the databases being out of sync (except for errant Standby data modification connections). Simply put: a drop connection command is done followed by a create connection using a dump marker. The dump of the primary database and a load into the standby database and lastly a resume connection from the SRS to the Standby database is done. The Primary side does not need to be down during this operation and you can understand the Standby is non-operational until we reach a steady state where the load is completed and all replicated records contained in the outbound queue have been applied to the Standby database. This is good for environments where we can afford the Standby side to be out of commission for this time period. This depends upon the size of the database and the volume of transactions being replicated compared to the time the business can afford until the Standby side is operational.
Option 2: Re-sync the MSA/warm-standby database pair connection using rs_subcmp. Every installation of the SAP SRS binaries comes with the rs_subcmp client program. The rs_subcmp program operates on a table basis and issues select statements on the Primary ASE Server, gathers and resolves the out of sync records in the Standby tempdb and issues an I/U/D statement on the Standby database side for the table in question. There are lots of reverse-engineering scripts that create runnable rs_subcmp programs by reading the database system tables. Depending upon the failure we can read a subset of the data (select * from table where field inbetween (range) order by <primary key>) or we can consider all the rows in the table (select * from table order by <primary key>). This program is fast and efficient. Personally its my top ‘go-to’ strategy to try first. I would suspend the connection to the Standby side when I execute the command. Autocorrection is a must here. This is good for environments where we can isolate the missing records to a table or a small number of tables. Its also good for low volume replication instances.
Option 3: Stealth re-sync the MSA/warm-standby database pair connection using update statements. This option really takes advantage of the autocorrection we have set on the SRS to Standby connection. It only works with a certain scenario where we have a datetiime field to work with and we can afford to change the time of the datetime field by 1 second. The reason as to why we choose datetime as a likely candidate as we can easily change it back in the next transaction to the end result is the same record being on the Primary and Standby side. If the issues an update statement with setting the field equal to the original value; that record would not be replicated, we must effect a change no matter how small an increment. For example:
- Step 1. update primary record set datetime field = datetime + 1 second
- Result: update gets replicated and applied as a delete followed by an insert
- Step 2. update primary record set datetime field = original datetime
- Result: update gets replicated and applied as a delete followed by an insert of the original row.
The difficulty in using this option is finding the right environment. This might be used where we have an unlimited time for a re-sync and the select statements cannot be done on the Primary side. We would involve a cursor functionality that could walk down the Primary table row-by-row and effect the changes. Its good for low volume replication instances and the obvious fact is we need to do some prep work to have this be a viable strategy. As you can imagine, not very common but it certainly can be used in a controlled scenario.
Option 4. Stealth re-sync the MSA/warm-standby database pair connection using update statements with bit manipulation. This option really is a variation on option 3. Same concept but we have added a bit field to the table. This field is used just for the sole purpose of allowing an update to occur with in turn forces the record to be replicated from Primary to Standby. This is not a re-sync strategy but perhaps business-driven to choose certain records to be replicated at a certain point in time. Such an option would have been applied at the modeling stage of the replication environment architecture.
Option 1 and 2 will be the most common of your re-syncing strategies. Option 3 and 4 are special case strategies. Some might even say 3 and 4 are a bit of a design stretch.
There is one obvious architecture we need to mention that makes options 1 to 4 unnecessary: the SRS Data Assurance option. SAP Replication Server Data Assurance Option is licensed through SySAM license manager and is available on multiple platforms.
Replication Server Data Assurance Option compares row data and schema between two or more Adaptive Server® Enterprise databases, and reports discrepancies. Replication Server Data Assurance Option is a scalable, high-volume, and configurable data comparison product. It allows you to run comparison jobs even during replication thereby eliminating any down time.SAP Replication Server with the Data Assurance option:
- Compares rows and schemas
- Creates script for reconciliation
- Creates Data Manipulation Language (DML) commands for automatic reconciliation
- Checks job status and generates reports
I plan to discuss the Data Assurance option in a later blog.
This Field Notes Series is dedicated to observations from the field taken from personal consulting experiences. The examples used have been created for this blog and do not reflect any existing SAP Customer configuration. Some material featured here was copied from: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc01636.1571/doc/html/nro1290742490319.html