In the A Deeper Look At Sybase(i.e. the non-mobile products), in this episode we’ll take a look at Sybase Replication Server.
Sybase Replication Server -or just RepServer for its friends- is one of the hidden gems of Sybase.
The reason you may not have heard of RepServer is that this is one of those pieces of infrastructural software you tend to forget about because it is always running just fine. Database Administrators may have many issues to deal with every day, but Sybase RepServer tends not to to be among the things they worry about.
What doesn’t help its name recognition is that RepServer does not perform a particularly exciting job in the eyes of most people. Sybase Replication Server is a ‘data movement’ product, which, well, moves data from one place to another. More specifically, it captures database transactions in one database and then applies these to another database. Even to many IT folks, that may sound as interesting as watching paint dry. Myself, I do actually find replication quite an exciting topic (but maybe that just says a lot about me).
Reasons for using replication
This sort of requirement was in fact the inspiration for RepServer in the first place: A Deeper Look At Sybase: History of ASE approached Sybase in the early 1990’s to see how their requirements could be met, and this led to the product known as Sybase Replication Server (I believe the first RepServer version appeared in 1992).
Because RepServer captures only data changes, there is no impact on the applications already running on that database. This means that RepServer can be used in pretty much any database system. Some common uses of RepServer are to facilitate migrations of databases or applications that require significant downtime; by setting up a replica with Repserver, this downtime can be almost fully eliminated.
Because of its application transparency, Replication may also be seen as a mechanism to integrate different applications, by synchronising the data changes.
An increasingly popular use case for replication is for enabling real-time Business Intelligence. Business systems are typically OLTP-oriented with primary business events in the form of database transactions. Analytics typically does not co-exist well with OLTP workload in the same database, so a separate analytics database is used for that purpose; A Deeper Look At Sybase: Sybase IQ, Sybase’s column-oriented database optimized for analytics, is a classic example.
Architecturally, the challenge is to move that transactional data into the analytics database — preferably in (near) real-time, since the days when it was acceptable to run analytics on last week’s data are long gone.
In this context, RepServer has traditionally been used by many customers to replicate transactional data from Sybase ASE (Sybase’s OLTP database, which, BTW,
) to a data mart or data warehouse powered by A Deeper Look At Sybase: Sybase IQ. The problem here has been that IQ is optimized for bulk operations and does not particularly excel at inserts/updates of individual rows, which is precisely the type of workload that RepServer sends to the replicate database. The common solution for this problem was to replicate (with RepServer) those primary transactions into an intermediate ASE database that accumulates the incoming transactions; periodically, (e.g. every N minutes) those changes were then bulk-loaded into IQ from this ASE staging database (the latter step does not involve RepServer).
This has worked very well but things have become decidely more elegant with the latest and greatest RepServer version 15.6: here, the accumulation of replicated transactions is done by RepServer itself, and RepServer also initiates the periodic bulk-uploading of those changes into IQ. This removes the need for a separate ASE staging database as well as for an external process to regularly import the data into IQ: these tasks are now all transparently handled by the RepServer process itself, making real-time Business Intelligence a lot easier to achieve.
(NB: should you wish to discuss this topic with a Sybase/SAP representative, ask about Replication Server’s Real-Time Loading option (RTL). Incidentally, this RTL option is also available for Oracle).
How RepServer works
RepServer replicates committed database transactions. In Sybase ASE, a built-in replication agent continuously monitors the database transaction log for newly committed transactions that have been marked for replication. When it finds one, the replication agent sends it to the RepServer process that runs somewhere outside ASE. RepServer will then forward the transaction to the designated replicate database and apply it.
It should be noted that that replicate database does not have to be Sybase ASE. In fact, neither does the primary. One of RepServer’s design strengths is that the replication agent sends the transaction to RepServer in a database-independent internal format. When a replicated transaction reaches the replicate database, the internal format is converted to the SQL dialect of that specific database. This way, you can replicate from, say, a primary DB2 database to a replicate Oracle database as well as to a replicate Microsoft SQL Server database — and vice versa, since biderectional replication is also supported.
Supported primary database types include DB2, Oracle and MS SQL Server, where the replication agent will be a separate process running externally to these databases. These same databases can also be the target of replication, as can be SQL Anywhere, Sybase IQ, and some others. Perhaps not suprisingly, for best overall performance and minimal latency, here at Sybase we recommend, well, Sybase ASE as the preferred primary database, and ASE or IQ as the replicate database.
RepServer uses a publish-subscribe model. On the primary side, the entity to be replicated is published, and this can be subscribed to by one or more replicate databases. Such an entity can be an individual database table, or an entire database. The latter is often used for ‘warm standby’ configurations where all changes in a database are replicated to a standby. In addition, a stored procedure can be marked for replication, causing the execution of that procedure to be replicated with its actual parameter values.
As you would expect, by default an insert on the primary side is replicated into exactly the same insert on the replicate side. Perhaps less expected is that RepServer allows you to map a primary event to any custom action on the replicate side. For example, that primary insert could be replicated as the invocation of a custom stored procedure on the replicate side, which receives the values specified in the original insert as its parameter values. Or a delete on the primary side can be converted into the same delete plus an insert into a history table on the replicate side. RepServer just gives you total freedom on what replicate actions can be defined — as long as it can be expressed in SQL.
RepServer performs asynchronous replication. This means that when the database transaction in the primary database commits, it will be picked up and applied to the replicate database shortly afterwards, decoupled from the primary transaction. Consequently, when using asynchronous replication, there is always somelatency: some time will pass before a primary transaction gets been applied to the replicate side The challenge is to keep this latency as short as possible. In a well-designed and well-tuned replication system, latencies of no more than single-digit seconds are typical, even when the replicate database is on a WAN.
An unwillingness to accept latency is sometimes used as an argument in favor of a synchronous replication approach. Here, the idea is that the primary transaction does not commit until the transaction has committed in the replicate database as well. The big downside is that the replicate site must always be available: if it is not, transactions cannot commit at the primary site. In practice, this makes the overall system more vulnerable, since there are now two points of failure. RepServer’s asynchronous replication offers an optimal compromise between latency and availability.
Another aspect worth discussing briefly is that a typical replication system operates in a networked environment. For example, let’s assume the primary database runs on an HP-UX server in New York; the RepServer runs on a Linux box in the same data center; one replicate database run on an AIX server in Houston, Texas; and another replicate database runs on Linux in London, UK.
RepServer is designed under the assumption that it is normal for network connections to temporarily fail, and then come back into service. Once a connection fails, RepServer will automatically keep retrying until the connection is back, and then proceed replication where it left off.
RepServer vs. storage replication
It is important to mention here that RepServer performs transactional replication, which captures database transactions and applies these to the replicate or standby database. This should be distinguished from storage-level replication solutions (like EMC’s SRDF) which replicate changes to the disk devices. While both are ‘replication’ and, from a high level, kinda do the same thing, it is important to realise that these are two fundamentally different approaches towards replication.
Storage replication can potentially be a bit simpler to set up and operate than transactional replication with RepServer, but there are two main implications you need to be aware of. First, with storage replication, the replicate side is essentially a cold standby: those replicated storage devices cannot be used for any purpose, including running a database server, until you decide to switch over to the replicate side. In contrast, with RepServer your replicate side is an actively running database server that you can use for other purposes while it is being replicated to. For example, it is not uncommon to see an ASE database replicated with RepServer to a standby database for high-availability purposes, and use that standby database to also run some reporting that would have been too much load to run on the primary database.
A second difference is that, should any device corruption occur on the primary side, this corruption will be literally replicated to the replicate storage as well. This may not be what you want, since now your standby database is corrupted too (the sort of thing that can happen thenis described here). With transactional replication by RepServer, this problem will not occur because the replicate database is updated through SQL statements so any device-level corruptions will not be duplicated.
After singing RepServer’s many praises you may wonder if there may be any downsides?
Well, frankly, there are: I think pretty much everyone agrees that RepServer is not the easiest product to configure and manage. While that can also be an attraction -personally, I like RepServer ‘cos it forces me to think– I think everyone also agrees that RepServer needs to be made easier to use. The good news is that this is already happening, with the past few RepServer releases introducing new features for ease of management. It is Sybase’s intention to follow through on this direction in future RepServer releases.
If you are like me, and you feel all fired up now about this great piece of software technology, you could check out the Replication Server Design Guide which discusses a number of different possible architectures for a replication system. Or just check out some RepServer whitepapers and success stories . Or just do as many people do: forget about Replication Server’s existence in the confidence that it sits there at the foundations of many IT systems, working away perfectly all the time — as long as you tell your boss about RepServer first.
Next episode: A Deeper Look At Sybase: PowerBuilder and PowerDesigner