Log Store Guidelines

Special considerations for using log stores.

General Guidelines

  • Locate log stores on a shared drive accessible to all the machines in the cluster.
  • Keep streams and windows that change at substantially different rates in different log stores. If a log store contains a large but nearly static stream and a small but rapidly changing stream, each cleaning cycle must process large amounts of data from the static stream. Keeping such streams in separate log stores makes cleaning cycles more efficient.
  • Put into a log store any window fed by stateless elements (streams and delta streams).
  • Put into a log store any window fed by more than one upstream source in the project data flow. This is necessary for recovery because the order in which rows arrive from the different sources is not preserved, so replaying them may not rebuild the same window contents.
  • Put into a log store any window that, after a disruptive event such as a server crash, cannot produce the same result from the data replayed during recovery as it did before the event.
  • Log stores identify windows internally by name. If you rename a window that is attached to a log store, start a new file for that log store.
  • Variables and SPLASH data structures (dictionaries, vectors, and event caches) do not persist in log stores and thus cannot be recovered after a failure. Use these structures with log stores only when:
    • You can provide logic to reconstruct the structures on restart, or
    • Processing will not be affected if the structures are missing after a restart.
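The first three placement rules above can be expressed as a simple check over a project's data flow. The sketch below is illustrative only and is not an ESP or CCL API: the Element class, its fields, and the window names are hypothetical, "more than one upstream source" is simplified to "more than one direct input", and whether replay reproduces a window's state is supplied by hand because it depends on the window's logic.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical description of a project element (not an ESP API)."""
    name: str
    kind: str  # "stream", "delta_stream", or "window"
    inputs: list = field(default_factory=list)  # names of direct upstream elements
    replay_reproduces_state: bool = True  # can replayed data rebuild the same result?


STATELESS_KINDS = {"stream", "delta_stream"}


def windows_needing_log_store(elements):
    """Apply the placement guidelines to a dict of name -> Element and
    return the sorted names of windows that should go into a log store."""
    needs_log = set()
    for el in elements.values():
        if el.kind != "window":
            continue
        # Rule: window fed by a stateless element (stream or delta stream).
        fed_by_stateless = any(elements[src].kind in STATELESS_KINDS for src in el.inputs)
        # Rule: window fed by more than one upstream source.
        multiple_sources = len(el.inputs) > 1
        # Rule: window whose result cannot be rebuilt from replayed data.
        if fed_by_stateless or multiple_sources or not el.replay_reproduces_state:
            needs_log.add(el.name)
    return sorted(needs_log)


# Hypothetical project used only for illustration.
project = {
    "InTrades": Element("InTrades", "window"),
    "InQuotes": Element("InQuotes", "window"),
    "TradeStream": Element("TradeStream", "stream", inputs=["InTrades"]),
    "FromStream": Element("FromStream", "window", inputs=["TradeStream"]),
    "Joined": Element("Joined", "window", inputs=["InTrades", "InQuotes"]),
    "Agg": Element("Agg", "window", inputs=["InTrades"], replay_reproduces_state=False),
    "Copy": Element("Copy", "window", inputs=["InTrades"]),
}
print(windows_needing_log_store(project))  # ['Agg', 'FromStream', 'Joined']
```

The remaining guidelines (shared-drive location, separating streams by change rate, renaming, and SPLASH structures) are design decisions rather than graph properties, so the sketch does not try to check them.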
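Why a window fed by more than one upstream source must be logged can be shown with a toy model. This is not ESP code: it assumes a hypothetical keyed window that keeps only the latest value per key, fed by two sources whose rows interleave in one order before a crash and in another order during replay.

```python
def apply_rows(rows):
    """Toy keyed window: each (key, value) row upserts the key, so the
    last row for a key wins. Stands in for a real window's state."""
    state = {}
    for key, value in rows:
        state[key] = value
    return state


# Hypothetical rows from two upstream sources that update the same key.
source_a = [("sym1", 10)]
source_b = [("sym1", 20)]

# Before the crash, rows from source A happened to arrive first...
before_crash = apply_rows(source_a + source_b)
# ...but replay during recovery may merge the sources in the other order.
after_replay = apply_rows(source_b + source_a)

print(before_crash)  # {'sym1': 20}
print(after_replay)  # {'sym1': 10}
```

Because the two states differ, replayed input alone cannot restore the window, so its contents have to be recovered from a log store instead.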

Guidelines for Guaranteed Delivery

All the general guidelines above apply to log stores for windows with guaranteed delivery. In addition:
  • Because copies of events are kept in the same log store the window is assigned to, the log store for a guaranteed delivery window must be significantly larger than the log store for a similar window without guaranteed delivery. Ensure that the log store for every guaranteed delivery window is large enough to accommodate the required events. If the log store runs out of room, the project server shuts down.

  • Put into a log store every window on which guaranteed delivery (GD) is enabled, as well as all input windows that feed GD windows. You can put windows located between the input and GD windows in a memory store if, upon restart, they can be reconstructed to exactly the same state they were in before the server went down. If an intermediate window cannot be reconstructed to its previous state, put it in a log store.
    • If consistent recovery is not enabled, put the GD windows and all their feeder windows into the same log store. Note, however, that placing many windows in the same log store adversely affects performance.
    • If consistent recovery is enabled, you can employ as many log stores for your GD and feeder windows as necessary.
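The GD placement rule amounts to walking upstream from each GD window. The sketch below is a hypothetical illustration, not an ESP API: the project is described by a plain dictionary of window names to their direct upstream windows, and the caller decides which intermediate windows can be reconstructed exactly on restart.

```python
def gd_log_store_members(inputs_of, gd_windows, input_windows, reconstructible):
    """Return the set of windows the GD guideline says to put in a log store.

    inputs_of: dict mapping each window name to its direct upstream windows
    gd_windows: names of windows with GD enabled
    input_windows: names of the project's input windows
    reconstructible: intermediate windows that restart rebuilds exactly
        (these may stay in a memory store)
    All names and structures are hypothetical, not an ESP API.
    """
    logged = set(gd_windows)  # GD windows always go in a log store
    stack = list(gd_windows)
    seen = set()
    # Depth-first walk upstream from every GD window to find all feeders.
    while stack:
        name = stack.pop()
        for upstream in inputs_of.get(name, []):
            if upstream in seen:
                continue
            seen.add(upstream)
            stack.append(upstream)
            # Input windows are always logged; intermediates only if they
            # cannot be reconstructed to their previous state.
            if upstream in input_windows or upstream not in reconstructible:
                logged.add(upstream)
    return logged


inputs_of = {"GDOut": ["Filter"], "Filter": ["InOrders"], "InOrders": []}
print(sorted(gd_log_store_members(inputs_of, {"GDOut"}, {"InOrders"}, {"Filter"})))
# ['GDOut', 'InOrders']
print(sorted(gd_log_store_members(inputs_of, {"GDOut"}, {"InOrders"}, set())))
# ['Filter', 'GDOut', 'InOrders']
```

Without consistent recovery, the returned set is the group that should share a single log store; with consistent recovery enabled, it can be split across several log stores.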
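Because GD keeps event copies in the window's own log store, a rough sizing estimate has to add those copies to the window's normal footprint. The function below is a back-of-envelope sketch under loudly stated assumptions: the parameter names, the idea that copies are retained for a fixed period, and the headroom multiplier are all illustrative and do not come from ESP documentation.

```python
def estimate_gd_log_store_bytes(window_bytes, events_per_sec, avg_event_bytes,
                                retention_secs, headroom=2.0):
    """Back-of-envelope log store size for a GD window.

    Assumptions (illustrative, not from ESP documentation):
    - window_bytes: space the window's own data needs without GD
    - GD keeps a copy of each event for up to retention_secs (for example,
      until subscribers no longer need it)
    - headroom: safety multiplier, because a full log store shuts the
      project server down
    """
    gd_copies = events_per_sec * avg_event_bytes * retention_secs
    return int((window_bytes + gd_copies) * headroom)


# 100 MB of window data, 1,000 events/s of 200 bytes kept for 10 minutes.
print(estimate_gd_log_store_bytes(100_000_000, 1000, 200, 600))  # 440000000
```

Measured event rates and retention times from the actual project should replace these placeholder figures.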