Replication Server Performance Tuning: Part 2 — Counter Data Analysis
This blog continues our blog on Identifying Bottlenecks and is the second of three blogs in the Replication Server Performance Tuning series.
Counter Data Analysis
After Identifying Bottlenecks, the next step in the Replication Server performance tuning process is to analyze the collected counter data and draw up an effective tuning plan. Counter values that are unusually high or unusually low indicate that some Replication Server settings or configurations may require tuning.
During normal operation, Replication Server collects counter data as monitoring data. This monitoring data shows which configurations or settings in Replication Server require tuning.
The primary function of counters is to show how each component's time is distributed. For instance, counters tell you whether a component is pending, busy, or waiting, and whether it is waiting for its upstream component, for its downstream component, or for a memory resource. The tuning plan is built around this information.
Analyzing Counter Data
An effective performance tuning plan is built around the time that the bottleneck component spends busy or blocked. Common ways to improve performance are to increase parallelism or to increase the apply rate. Also watch for flow control: if the bottleneck component is frequently in a flow control state, check its flow control threshold and increase it if it is set too low. For more information, see the Replication Server Administration Guide on SAP Help.
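Finding the bottleneck component is the first step of any tuning plan. A heuristic that follows from the counter categories in this blog: components downstream of a bottleneck mostly wait upstream, while components upstream of it mostly sit in flow control. The minimal sketch below walks from the replicate end toward the source and stops at the first component that is not mainly waiting upstream. The category labels, the simplified path order (SQM omitted), and the millisecond values are all hypothetical, chosen for illustration only; how you export counter values from Replication Server is outside this sketch.

```python
from typing import Dict, List

# Hypothetical labels for the four time duration categories; these are not
# Replication Server counter names.
CATEGORIES = ("busy", "wait_upstream", "flow_control", "memory_control")


def dominant_state(times: Dict[str, float]) -> str:
    """Return the category in which a component spent the most time."""
    return max(CATEGORIES, key=lambda category: times.get(category, 0.0))


def find_bottleneck(path: List[str], samples: Dict[str, Dict[str, float]]) -> str:
    """Walk from the replicate end of the path toward the source.

    A component that mostly waits upstream is starved by something further
    upstream, so step one component closer to the source and check again.
    """
    index = len(path) - 1
    while index > 0 and dominant_state(samples[path[index]]) == "wait_upstream":
        index -= 1
    return path[index]


# Simplified path (SQM omitted) and made-up millisecond totals for illustration.
path = ["EXEC", "SQT", "DIST", "DSI/S"]
samples = {
    "EXEC": {"busy": 200, "wait_upstream": 100, "flow_control": 9000},
    "SQT": {"busy": 8500, "wait_upstream": 300, "flow_control": 400},
    "DIST": {"busy": 500, "wait_upstream": 8000},
    "DSI/S": {"busy": 400, "wait_upstream": 8800},
}
print(find_bottleneck(path, samples))  # SQT
```

In this made-up sample, EXEC upstream of SQT is mostly in flow control and DIST and DSI/S downstream are mostly waiting upstream, which is the pattern you would expect around a busy bottleneck. Real counter data is rarely this clean, so treat the result as a starting point for inspection rather than a verdict.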
Different components have different counter data, and therefore, different configuration sets. From the performance tuning perspective, this implies that each component requires a specific tuning plan.
What is a component?
The replication path starts at the source (or primary) database and ends at the target (or replicate) database. Along this path are several internal Replication Server modules. Each of these modules has a different set of functions and takes on different responsibilities to move data from the primary to the replicate database. These modules are called components and are referred to by names such as Capture, Distributor, SQT, SQM, DSI, and so on.
Counters and Time Duration Settings
Replication Server groups time counters into the following time duration categories:
- Busy time: The time a component spends actively processing data. To reduce busy time, improve overall process efficiency, for example through parallelism or by reading from cache.
- Wait upstream time: The time a component spends waiting for its upstream component to deliver data. If this value is high, check the time distribution of the upstream component: whether it is busy, in a flow control state, or waiting for its own upstream.
- Flow control time (wait downstream): Components along the replication path are either upstream or downstream of one another. A component stops processing new data when its downstream component is slow and the data cached in memory reaches a flow control threshold. The time spent in this flow control state is the flow control time. To reduce it, increase the flow control threshold.
- Memory control time: The time a component spends waiting for memory usage to fall below the memory control threshold. To reduce it, increase memory_limit or reduce the flow control threshold so that memory control is not triggered.
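The categories above pair each kind of time with a next step. As a minimal sketch, the code below converts a component's per-category totals into percentages and returns the next step for the largest share. The category keys and the wording of the hints are hypothetical paraphrases of the list above, not Replication Server output; the input values are made up.

```python
from typing import Dict

# Next steps paraphrased from the four categories above; the keys are
# hypothetical labels, not Replication Server counter names.
NEXT_STEP = {
    "busy": "Improve process efficiency, for example through parallelism or reading from cache.",
    "wait_upstream": "Check the time distribution of the upstream component.",
    "flow_control": "Check the flow control threshold and increase it if it is too low.",
    "memory_control": "Increase memory_limit or reduce the flow control threshold.",
}


def time_distribution(times: Dict[str, float]) -> Dict[str, float]:
    """Convert per-category milliseconds into percentages of the total."""
    total = sum(times.values())
    if total == 0:
        return {category: 0.0 for category in times}
    return {category: 100.0 * ms / total for category, ms in times.items()}


def suggest(times: Dict[str, float]) -> str:
    """Return the next step for the category with the largest share of time."""
    shares = time_distribution(times)
    top = max(shares, key=shares.get)
    return f"{top} ({shares[top]:.0f}%): {NEXT_STEP[top]}"


# Made-up totals for one component over one sample interval.
print(suggest({"busy": 1200, "wait_upstream": 300, "flow_control": 6000, "memory_control": 500}))
# flow_control (75%): Check the flow control threshold and increase it if it is too low.
```

Working with percentages rather than raw milliseconds keeps components comparable when their sample intervals or activity levels differ.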
Using Collected Counter Data to Tune Replication Server
The following sections list the most important time counters for each component, grouped by time duration category.
EXEC Module
The following table describes important time counters in the Executor category:
Counter category | Counter | Description
Busy time | RepAgentRecvTime | The amount of time, in milliseconds, spent receiving network packets or language commands.
Busy time | RepAgentParseTime | The amount of time, in milliseconds, spent parsing commands.
Busy time | RepAgentNrmTime | The amount of time, in milliseconds, spent normalizing commands.
Busy time | RepAgentPackTime | The amount of time, in milliseconds, spent packing commands.
Busy time | PRSNRMParseTime | The amount of time, in milliseconds, PRS threads spent parsing commands.
Busy time | PRSNRMNrmTime | The amount of time, in milliseconds, Normalization (NRM) threads spent normalizing commands.
Busy time | PRSNRMPackTime | The amount of time, in milliseconds, NRM threads spent packing commands.
Memory control time | RAWaitMemTime | The amount of time, in milliseconds, the RepAgent thread spent waiting for memory usage to fall below the memory control threshold.
Flow control time (wait downstream) | RAWriteWaitsTime | The amount of time, in milliseconds, the RepAgent spent waiting for the SQM Writer thread to drain outstanding write requests until the number of outstanding bytes to be written fell below the threshold.
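One way to use this table is as a lookup that rolls individual EXEC counters up into the time duration categories. The sketch below assumes you have already exported counter values (in milliseconds) into a Python dictionary keyed by counter name; the category labels are hypothetical, while the counter names and their categories come from the table above. Because the RepAgent, PRS, and NRM threads may run concurrently, the summed busy total is a rough indicator rather than wall-clock time.

```python
from collections import defaultdict
from typing import Dict

# Category of each Executor (EXEC) time counter, taken from the table above.
EXEC_COUNTER_CATEGORY = {
    "RepAgentRecvTime": "busy",
    "RepAgentParseTime": "busy",
    "RepAgentNrmTime": "busy",
    "RepAgentPackTime": "busy",
    "PRSNRMParseTime": "busy",
    "PRSNRMNrmTime": "busy",
    "PRSNRMPackTime": "busy",
    "RAWaitMemTime": "memory_control",
    "RAWriteWaitsTime": "flow_control",
}


def summarize(counters: Dict[str, float]) -> Dict[str, float]:
    """Sum counter values (milliseconds) per time category; skip unknown counters."""
    totals: Dict[str, float] = defaultdict(float)
    for name, ms in counters.items():
        category = EXEC_COUNTER_CATEGORY.get(name)
        if category is not None:
            totals[category] += ms
    return dict(totals)


# Made-up values for illustration.
print(summarize({"RepAgentRecvTime": 1200, "RepAgentParseTime": 300,
                 "RAWaitMemTime": 50, "RAWriteWaitsTime": 4000}))
# {'busy': 1500.0, 'memory_control': 50.0, 'flow_control': 4000.0}
```

Skipping unknown counter names keeps the summary usable when the exported data also contains counters that this table does not list.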
Capture Module
The following table lists important time counters in the Capture (CI) category:
Counter category | Counter | Description
Busy time | PrsTotalTime | The amount of time, in milliseconds, a parser thread spent processing packages.
Memory control time | CAPMemWaitTime | The amount of time, in milliseconds, Capture spent waiting for memory usage to fall below the memory control threshold.
Flow control time (wait downstream) | WriteWaitsTime | The amount of time, in milliseconds, the Message Delivery (MD) Module of the Distributor Thread (DIST) spent waiting for SQM writes.
Wait upstream time | RecvTime | The amount of time, in milliseconds, spent receiving packages from the Client Interface (CI) stream.
SQT
The following table lists important time counters in the Stable Queue Transaction Manager (SQT) category:
Counter category | Counter | Description
Busy time | SQTAddCacheTime | The time, in milliseconds, an SQT thread (or the thread running the SQT library functions) spent adding messages to the SQT cache.
Busy time | SQTDelCacheTime | The time, in milliseconds, an SQT thread (or the thread running the SQT library functions) spent deleting messages from the SQT cache.
Busy time | SQTParseTime | The time, in milliseconds, SQT spent parsing commands.
Busy time | SQTXactHashSearchTime | The time, in milliseconds, an SQT thread spent searching for transaction IDs in the hash table.
Busy time | SQTXactProfileTime | The time, in milliseconds, an SQT thread spent profiling transactions.
Wait upstream time | SQTReadSQMTime | The time, in milliseconds, an SQT thread (or the thread running the SQT library functions) spent reading messages from SQM. This includes both the time spent waiting for upstream (when no data is available) and the time spent reading the data.
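Because SQTReadSQMTime covers both waiting for SQM data and actually reading it, it overstates pure upstream waiting. The minimal sketch below, which assumes counter values have been exported into a dictionary keyed by counter name, sums the SQT busy counters from the table and reports SQTReadSQMTime separately with its share of the combined total. The output keys (`busy`, `read_sqm`, `read_sqm_share`) are hypothetical names chosen for this sketch.

```python
from typing import Dict

# SQT busy-time counters from the table above.
SQT_BUSY_COUNTERS = (
    "SQTAddCacheTime",
    "SQTDelCacheTime",
    "SQTParseTime",
    "SQTXactHashSearchTime",
    "SQTXactProfileTime",
)


def sqt_breakdown(counters: Dict[str, float]) -> Dict[str, float]:
    """Split SQT time into busy time and time spent in SQTReadSQMTime.

    SQTReadSQMTime includes both waiting for upstream and reading data, so
    treat it as an upper bound on wait upstream time, not an exact value.
    """
    busy = sum(counters.get(name, 0.0) for name in SQT_BUSY_COUNTERS)
    read_sqm = counters.get("SQTReadSQMTime", 0.0)
    total = busy + read_sqm
    share = read_sqm / total if total else 0.0
    return {"busy": busy, "read_sqm": read_sqm, "read_sqm_share": share}
```

A high `read_sqm_share` suggests looking upstream at SQM, but since the counter also includes read time, confirm with the upstream component's own counters before concluding that SQT is starved.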
DIST
The following table lists important time counters in the Distributor (DIST) category:
Counter category | Counter | Description
Wait upstream time | DISTReadTime (parallel_dist off) | The time, in milliseconds, the Distributor Thread spent reading commands from the SQT cache.
Wait upstream time | DISTSQTTranWaitsTime (parallel_dist on) | The time, in milliseconds, the poll task of the Distributor Thread (DIST) spent waiting for an SQT transaction to become ready.
Flow control time (wait downstream) | DISTMDWriteWaitsTime | The time, in milliseconds, the Message Delivery (MD) Module of the Distributor Thread (DIST) spent waiting for SQM writes.
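Which DIST counter measures wait upstream time depends on the parallel_dist setting. A minimal sketch of that selection, assuming you already know whether parallel_dist is on and have exported counter values into a dictionary keyed by counter name; the function names and output keys are hypothetical.

```python
from typing import Dict


def dist_wait_upstream(counters: Dict[str, float], parallel_dist: bool) -> float:
    """Return DIST wait upstream time (ms) for the given parallel_dist setting.

    With parallel_dist off, DISTReadTime applies; with it on,
    DISTSQTTranWaitsTime applies.
    """
    name = "DISTSQTTranWaitsTime" if parallel_dist else "DISTReadTime"
    return counters.get(name, 0.0)


def dist_summary(counters: Dict[str, float], parallel_dist: bool) -> Dict[str, float]:
    """Collect the DIST wait upstream and flow control times in one place."""
    return {
        "wait_upstream": dist_wait_upstream(counters, parallel_dist),
        "flow_control": counters.get("DISTMDWriteWaitsTime", 0.0),
    }
```

Reading the wrong counter for the current setting would misreport DIST as idle, so making the setting an explicit argument avoids a silent mistake.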
DSI/S
The following table lists important time counters in the DSI – Scheduler (DSI/S) category:
Counter category | Counter | Description
Busy time | DSILoadCacheTime | Time, in milliseconds, the DSI/S spent loading the SQT cache.
Busy time | DSIThrdCmmtMsgTime | Time, in milliseconds, the DSI/S spent handling a "Thread Commit" message from its associated DSI/S threads.
Busy time | DSIThrdSRlbkMsgTime | Time, in milliseconds, the DSI/S spent handling a "Thread Single Rollback" message from its associated DSI/S threads.
Busy time | DSIThrdRlbkMsgTime | Time, in milliseconds, the DSI/S spent handling a "Thread Rollback" message from its associated DSI/S threads.
Wait upstream time | DSISqmMsgQWait | Time, in milliseconds, the DSI/S spent handling an "SQM notify Message Read Wait Time" message from its associated DSI/S threads.
Flow control time (wait downstream) | DSIDSIeMsgQWait | Time, in milliseconds, the DSI/S spent handling a "DSIe Message Read Wait Time" message from its associated DSI/S threads.
Memory control time | DSIWaitMemTime | Time, in milliseconds, the DSI/S spent waiting for memory usage to fall below the memory control threshold specified for DSI.
Up next!
The third and final blog in the Replication Server Performance Tuning series is: Modifying Memory-Related Configurations & Settings based on Counter Data Analysis.