Understanding Project Recovery, Part 2
To help you prepare for unexpected project failures, here is the second blog of a two-part series about recovery. The previous blog post covered the basics: log stores, memory stores, checkpointed data, and how to use log stores to restore the state of your windows. Now it’s time to tackle how data is replayed through a project, and how memory store windows are recovered. This blog will also cover how CCLScript data structures are restored in the replay process.
The full recovery process happens in three steps:
- Windows with log stores are restored to their previous state using the checkpointed data in the store. This was covered in Understanding Project Recovery, Part 1.
- Checkpointed data is replayed through the project to restore windows with memory stores.
- Uncheckpointed data is replayed through the project to restore the project's state.
Anything not stored in a log store, other than uncheckpointed data, is lost when the project stops. This includes state that is held only in memory, such as in-progress pattern matches.
Replaying Checkpointed Data
Checkpointed data replays through the project until it hits streams, which stop the flow. Any rows received by a memory store window will be inserts, not updates or deletes.
Memory Store Window Recovery Scenarios
Memory store windows fed by log store windows are recovered with the data that the log store windows replay.
Streams can’t recover windows, and they block the replay of checkpointed data. If a stream is the only source feeding a window, that window won’t be recovered by replayed data even if there is a log store window further upstream. If you need a stream-fed window to be recovered, consider assigning it a log store.
Windows that are fed by multiple windows – and that don’t have a log store – receive data from each source window one at a time, in a nondeterministic order.
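The scenarios above can be sketched as a small reachability check over a project's dataflow. This is a toy Python model, not ESP's recovery code: the node names, the `kinds`/`edges` representation, and the function `recovered_by_checkpoint_replay` are all hypothetical and exist only for illustration. It assumes replay starts at every log store window and flows downstream until a stream blocks it.

```python
from collections import deque

# Hypothetical node kinds for a toy dataflow model (not an ESP API).
LOG, MEMORY, STREAM = "log", "memory", "stream"

def recovered_by_checkpoint_replay(kinds, edges):
    """Return the memory store windows reached by checkpointed replay.

    kinds: dict mapping node name -> LOG, MEMORY, or STREAM
    edges: list of (upstream, downstream) pairs
    """
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)

    # Replay starts at every log store window, restored from its own store.
    queue = deque(name for name, kind in kinds.items() if kind == LOG)
    visited = set(queue)
    recovered = set()
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            if kinds[nxt] == STREAM:
                continue  # streams block the replay of checkpointed data
            if kinds[nxt] == MEMORY:
                recovered.add(nxt)
            queue.append(nxt)
    return recovered

kinds = {
    "in": LOG,            # input window with a log store
    "agg": MEMORY,        # memory store window fed by the log store window
    "s": STREAM,          # stream that blocks replay
    "after_s": MEMORY,    # fed only by the stream: not recovered
    "after_s_log": LOG,   # stream-fed window given its own log store
    "tail": MEMORY,       # recovered from after_s_log's replay
}
edges = [
    ("in", "agg"),
    ("in", "s"),
    ("s", "after_s"),
    ("s", "after_s_log"),
    ("after_s_log", "tail"),
]

print(sorted(recovered_by_checkpoint_replay(kinds, edges)))  # ['agg', 'tail']
```

In this sketch, `after_s` stays empty because its only source is a stream, while `tail` is recovered because the window feeding it has its own log store.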
Replaying Uncheckpointed Data
Once the checkpointed data replay is complete, uncheckpointed data replays through the project. The previous post covered checkpointed data, which is data committed to the log store. Uncheckpointed data, by contrast, is data that entered an input window with a log store but wasn’t committed to the log store before shutdown. When this data is replayed through the project, it’s treated as brand new: unlike checkpointed data, it isn’t stopped at streams. It’s replayed in its original order and retains its original opcode.
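To make the difference between the two replay phases concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not how ESP is implemented: the `apply_events` helper and the `(opcode, key, value)` tuples are hypothetical, and the model ignores error handling such as an insert on an existing key.

```python
def apply_events(state, events):
    """Apply (opcode, key, value) events to a keyed window state, in order."""
    for opcode, key, value in events:
        if opcode in ("insert", "update"):
            state[key] = value  # simplified: no duplicate-key or missing-key errors
        elif opcode == "delete":
            state.pop(key, None)
    return state

# Checkpointed data arrives as inserts only.
checkpointed = [("insert", "A", 10), ("insert", "B", 20)]

# Uncheckpointed data keeps its original order and original opcodes.
uncheckpointed = [("update", "A", 11), ("delete", "B", None), ("insert", "C", 30)]

window = apply_events({}, checkpointed)
window = apply_events(window, uncheckpointed)
print(window)  # {'A': 11, 'C': 30}
```

Because the uncheckpointed events are applied after the checkpointed inserts and in their original order, the window ends up in the state it would have reached had the project not stopped.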
CCLScript Data Structure Recovery
CCLScript data structures can be recovered fully or partially, depending on the project. Data structures that are defined in DECLARE blocks (variables, dictionaries, vectors, and eventCaches) are held in memory and are not directly recovered after a restart. Instead, they are rebuilt as replayed data passes through the Flex operator, so you should assign a log store to the input window that feeds your Flex operator.
CCLScript Data Structure Scenarios
Here are some examples. In all of the following scenarios, there is an input window with a log store that feeds a Flex operator. The window has a retention policy of one hour, and the Flex operator has data structures.
- If there is a dictionary in the Flex operator that holds data for longer than an hour, the dictionary will not be fully restored after the restart. It may be partially restored as it receives and processes the rows held in the window during the replay process.
- If there is a dictionary that holds the last value received for every key value, and values are received more frequently than once per hour for every key value, the dictionary will be fully rebuilt during recovery.
- If there is a 5-minute eventCache in the Flex operator that only holds records (not events), after recovery it will hold whatever data was replayed during the recovery process within the last 5 minutes.
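The first two dictionary scenarios can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: time is measured in minutes, `retained` stands in for the one-hour retention policy on the input window, and `last_value_dictionary` stands in for Flex operator logic that stores the last value per key. None of these names come from CCL or CCLScript.

```python
RETENTION_MINUTES = 60  # the input window's retention policy in the scenarios

def last_value_dictionary(events):
    """Build a dictionary holding the last value seen for each key."""
    result = {}
    for _timestamp, key, value in events:
        result[key] = value
    return result

def retained(events, now, retention=RETENTION_MINUTES):
    """Rows still held in the window, and so available for replay, at time `now`."""
    return [e for e in events if now - e[0] <= retention]

now = 120

# Scenario 1: key "Y" was last seen 90 minutes ago, outside the retention period.
sparse = [(0, "X", 1), (30, "Y", 2), (100, "X", 3), (110, "Z", 4)]
before_restart = last_value_dictionary(sparse)
after_restart = last_value_dictionary(retained(sparse, now))
print(before_restart)  # {'X': 3, 'Y': 2, 'Z': 4}
print(after_restart)   # {'X': 3, 'Z': 4}

# Scenario 2: every key received a value within the last hour.
frequent = [(0, "X", 1), (10, "Y", 2), (70, "X", 3), (100, "Y", 4)]
print(last_value_dictionary(frequent) == last_value_dictionary(retained(frequent, now)))  # True
```

In the first case the dictionary is only partially restored because the row for "Y" has aged out of the window. In the second case the replayed rows are enough to rebuild it completely.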