Understanding Project Recovery, Part 1

aaron_patkau · 06-16-2017

This blog is the first in a series about recovery, intended to help you better prepare for unexpected failures in streaming projects. Recovery consists of two main processes: restoring data and replaying data. In this blog, we’ll go over the basics of data recovery and the elements involved in restoring data. We’ll start with concepts such as memory stores, log stores, and checkpointed data. Then we’ll go over how restoring works and provide some best practices.

What are log stores and memory stores?

When working with streaming projects, you can use two types of stores to retain data: memory stores and log stores.

Memory Stores	Log Stores
Hold all data in memory, but data is lost when the project stops	Hold all data in memory, and log all data to the disk
Are accessed more quickly	Are accessed more slowly
Are assigned to windows by default	Can be assigned to windows, but not to streams, because streams are stateless. Exception: guaranteed delivery (GD) output streams can be assigned to a log store, which holds all output events until subscribers confirm receipt of them.
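
The difference in the table can be illustrated with a toy Python model. This is only a sketch of the concept, not how streaming implements its stores: the class names `MemoryStore` and `LogStore`, the JSON-lines file format, and the `demo` helper are all assumptions made up for this illustration.

```python
import json
import os
import tempfile


class MemoryStore:
    """Toy model (hypothetical): keeps rows in memory only."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, value):
        self.rows[key] = value


class LogStore(MemoryStore):
    """Toy model (hypothetical): keeps rows in memory and logs them to disk."""

    def __init__(self, path):
        super().__init__()
        self.path = path
        # On startup, rebuild the in-memory rows from whatever was logged.
        if os.path.exists(path):
            with open(path) as log:
                for line in log:
                    record = json.loads(line)
                    self.rows[record["key"]] = record["value"]

    def upsert(self, key, value):
        super().upsert(key, value)
        # The extra disk write is why this store is slower to access.
        with open(self.path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")


def demo():
    """Write one row to each store, simulate a restart, return what survives."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "window.log")
        memory, logged = MemoryStore(), LogStore(log_path)
        memory.upsert("k1", 10)
        logged.upsert("k1", 10)
        # Simulate a project restart by constructing fresh store objects.
        memory, logged = MemoryStore(), LogStore(log_path)
        return memory.rows, logged.rows
```

Calling `demo()` returns the rows held by each store after the simulated restart: the memory store comes back empty, while the log store still holds the row it wrote to disk.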

What is checkpointed data?

Data that has been committed to a log store is considered checkpointed data. You can configure checkpointing options for your projects, such as consistent recovery and auto checkpoint. Consistent recovery is recommended for projects with multiple log stores because it coordinates all the log stores to checkpoint in sync, which prevents windows from being restored to different points in time. Auto checkpoint is ideal for projects with windows that don’t change very often because it lets you customize the checkpoint interval. For example, you can lower the number of input transactions that trigger a checkpoint. These options help ensure that you save as much data as possible in the event of a restart.

When a project is restarted, windows with log stores are restored to their previous state using the data preserved in the store.
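
To make the effect of the checkpoint interval concrete, here is a minimal Python sketch. It is a simplified model, not the product's behavior: the `CheckpointedWindow` class, its `checkpoint_every` setting, and the `rows_after_restart` helper are hypothetical names, and the model assumes the restore step brings back only data as of the last checkpoint.

```python
class CheckpointedWindow:
    """Toy model (hypothetical) of a window whose log store checkpoints
    after every `checkpoint_every` input transactions."""

    def __init__(self, checkpoint_every):
        self.checkpoint_every = checkpoint_every  # assumed interval setting
        self.state = {}         # current in-memory contents
        self.checkpointed = {}  # contents as of the last checkpoint
        self.since_checkpoint = 0

    def apply(self, key, value):
        self.state[key] = value
        self.since_checkpoint += 1
        if self.since_checkpoint >= self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self):
        # Commit a copy of the current contents to the "log store".
        self.checkpointed = dict(self.state)
        self.since_checkpoint = 0

    def restart(self):
        """Simulate a restart: restore the window from checkpointed data."""
        self.state = dict(self.checkpointed)
        self.since_checkpoint = 0
        return self.state


def rows_after_restart(checkpoint_every, transactions):
    """Apply the transactions, restart, and return the restored rows."""
    window = CheckpointedWindow(checkpoint_every)
    for key, value in transactions:
        window.apply(key, value)
    return window.restart()
```

With five single-row transactions, an interval of 3 restores only the first three rows in this model, while an interval of 1 restores all five. That is the trade-off a lower interval buys: more frequent checkpoints, and less data outside the last checkpoint when a restart happens.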

Using multiple log stores

When using multiple log stores, either assign log stores only to source windows, or assign all windows in the same stream path to the same store. This ensures that you don’t create a log store loop. In the image below, for example, Window1 in Logstore1 feeds Window2 in Logstore2, which feeds Window3 in Logstore1. This creates a loop, where the same log store is used to restore data at two separate points in a data path. Consecutive windows on a path may share a log store, but once a new log store is used on the path, you cannot reuse a log store from earlier in the path. Here, Window3 returns to Logstore1 after Window2 has already switched to Logstore2. To correct the example below, Window2 should be assigned to Logstore1, or Window3 should be assigned to Logstore2.
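
The rule above can be checked mechanically. The following Python sketch is an illustration only: the function name `find_log_store_loop` is hypothetical, and it assumes a single linear path in which every window is assigned to a log store (a real project may branch and may mix in memory stores).

```python
def find_log_store_loop(path):
    """Return the first window that reuses a log store from earlier in the
    path after a different store has been used in between, or None.

    `path` is a list of (window_name, log_store_name) pairs in stream order.
    """
    left_behind = set()  # stores the path has already moved away from
    current = None       # store used by the previous window
    for window, store in path:
        if store != current:
            if store in left_behind:
                return window  # this window closes a log store loop
            if current is not None:
                left_behind.add(current)
            current = store
    return None


# The example from the text, followed by its two suggested corrections.
looped = [("Window1", "Logstore1"), ("Window2", "Logstore2"), ("Window3", "Logstore1")]
fix_a = [("Window1", "Logstore1"), ("Window2", "Logstore1"), ("Window3", "Logstore1")]
fix_b = [("Window1", "Logstore1"), ("Window2", "Logstore2"), ("Window3", "Logstore2")]
```

For `looped`, the function reports Window3 as the window that closes the loop. For `fix_a` and `fix_b`, it returns `None`, matching the two corrections described above.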

To learn about recovery of windows not assigned to log stores, uncheckpointed data, and CCLScript data structures, see Understanding Project Recovery, Part 2.