This blog post is the first part of a two-blog series on SAP Replication Server Data Assurance (DA). Here, I’ll focus on partitions and boundary keys in Data Assurance, and provide an example of creating DA partitions using multiple SELECT statements. You’ll want to read up on key data assurance concepts, such as jobs, comparesets, comparison objects, connections, and connection profiles before you really get into partitions and boundary keys.
In the second part of this series, you’ll learn how to use partitions to compare large tables.
What are Partitions in Data Assurance?
A Data Assurance partition is essentially a subset of contiguous table rows ordered by the DA compareset key columns. The DA compareset key columns are typically mapped to the table primary key, or to uniquely indexed columns.
Note – Don’t confuse DA partitions with database partitions. DA neither depends on nor leverages database partitions.
When dividing a table into two or more partitions, DA attempts to create each partition of equal size. That is, each partition will contain approximately the same number of rows. A table is logically partitioned at query-time using a SELECT statement with a WHERE clause. Here, an ORDER BY clause is optional as DA can sort the rows using its external sort.
What’s a Boundary Key?
A partition is defined by 1 or 2 boundary keys. A boundary key is a compareset key (table row key), which identifies a position in a table where a partition might begin or end. DA stores a small set of boundary keys for each compareset for this purpose.
The boundary keys are used as the arguments to the WHERE clauses. The first and last partitions require a single boundary key (unbounded at the start and end of the table) and all other partitions require two boundary keys. The boundary keys are always taken from the source table. The boundary key values are sampled for a compareset when a comparison that uses the compareset is run for the first time.
These boundary samples are stored in the Data Assurance System Database (DASD). Each time a comparison is run, DA collects new boundary samples for its compareset, and updates the DASD with these samples. Alternatively, you can initialize or replace boundary keys explicitly using the CREATE BOUNDARY command.
Example (Creating Partitions using Multiple SELECT Statements)
Let’s assume that a table, t, has a key column, k, and three non-key columns, x, y and z. This table contains 100 rows and the key values are the integers from 1 to 100.
DA has a compareset for this table and has stored 7 boundary keys (recall, each compareset stores a small set of boundary keys).
[13, 26, 39, 51, 64, 76, 88]
This example shows how DA selects 3 (here, 26, 51, and 76) of the 7 boundary keys to divide the table into four partitions:
SELECT k, x, y, z FROM t WHERE k < 26 ORDER BY k
SELECT k, x, y, z FROM t WHERE k >= 26 AND k < 51 ORDER BY k
SELECT k, x, y, z FROM t WHERE k >= 51 AND k < 76 ORDER BY k
SELECT k, x, y, z FROM t WHERE k >= 76 ORDER BY k
In the example above, each partition contains 25 rows.
Partitions are Not Static
The contents of a table aren’t static, so partitions can’t be static either. A partition exists for the duration of each query. Using the previous example, let’s assume the table is now modified as follows:
- Rows 11 to 20 are deleted (10 rows are deleted)
- Rows 31 to 35 are deleted (5 rows are deleted)
- Rows 61 and 67 are updated (0 rows are inserted/deleted)
- Rows 101 to 115 are inserted (15 rows are inserted)
When the four partition queries are executed again (remember, each partition previously contained 25 rows), the first partition will now contain 15 rows, the second partition will contain 20 rows, the third partition will still contain 25 rows, and the last partition will contain 40 rows.
Automatic Boundary Adjustment
As DA uses partitions of equal size, each time it reads keys from the four queries, it automatically adjusts these boundaries. After the second run, the new boundary keys are as follows:
[23, 41, 53, 66, 78, 91, 103]
When you run the JOB/COMPARISON commands for third time, the queries are run against the source and target databases, and the new adjusted boundary values are used:
SELECT k, x, y, z FROM t WHERE k < 41 ORDER BY k
SELECT k, x, y, z FROM t WHERE k >= 41 AND k < 66 ORDER BY k
SELECT k, x, y, z FROM t WHERE k >= 66 AND k < 91 ORDER BY k
SELECT k, x, y, z FROM t WHERE k >= 91 ORDER BY k
Now each partition will again contain 25 rows again until the table is modified further.