The run log
Every flow run has a run log: a single compressed checkpoint file holding everything needed to resume the run on a fresh worker.
What is in it:
- One entry per completed step, keyed by step name: input (secrets censored), output, status, duration, and error message for failed steps.
- Loop iterations and router branches are recorded with the same shape, nested under their parent step.
- Run-level tags.
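The contents listed above can be pictured as a small data structure. This is a minimal sketch, not the engine's actual schema: the names `StepEntry`, `RunLog`, `children`, and `duration_ms` are hypothetical, and gzip-compressed JSON is only an assumed stand-in for the "compressed checkpoint file".

```python
import gzip
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class StepEntry:
    """One completed step (hypothetical shape)."""
    input: Any                            # secrets are assumed to be censored already
    output: Any
    status: str                           # e.g. "SUCCEEDED", "FAILED", "PAUSED"
    duration_ms: int
    error_message: Optional[str] = None   # set only for failed steps
    # Loop iterations and router branches reuse the same shape,
    # nested under their parent step.
    children: dict[str, "StepEntry"] = field(default_factory=dict)


@dataclass
class RunLog:
    steps: dict[str, StepEntry] = field(default_factory=dict)  # keyed by step name
    tags: list[str] = field(default_factory=list)              # run-level tags


def serialize(log: RunLog) -> bytes:
    # Assumed format: JSON, gzip-compressed into a single blob.
    return gzip.compress(json.dumps(asdict(log)).encode("utf-8"))


def _entry_from_dict(raw: dict) -> StepEntry:
    # Rebuild nested children first, then the entry itself.
    fields = {k: v for k, v in raw.items() if k != "children"}
    children = {name: _entry_from_dict(child)
                for name, child in raw.get("children", {}).items()}
    return StepEntry(**fields, children=children)


def deserialize(blob: bytes) -> RunLog:
    raw = json.loads(gzip.decompress(blob).decode("utf-8"))
    steps = {name: _entry_from_dict(entry) for name, entry in raw["steps"].items()}
    return RunLog(steps=steps, tags=raw["tags"])
```

A fresh worker would only need `deserialize(blob)` to recover every completed step's output, which is what makes the log sufficient for a resume.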
When it is written:
- Once at the start of the run, before the first step executes.
- Every 15 seconds during execution, from a background loop that snapshots whatever has completed since the last write.
- Once on the final state (success, failure, or pause).
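The three write points above could be arranged roughly as follows. This is a sketch under stated assumptions: the `Checkpointer` class, its method names, and the `"RUNNING"` status are hypothetical, and skipping a periodic write when nothing new has completed is an assumed reading of "whatever has completed since the last write".

```python
import threading
from typing import Any, Callable


class Checkpointer:
    """Writes the run log at start, periodically, and on the final state."""

    def __init__(self, write: Callable[[dict], None], interval_s: float = 15.0):
        self._write = write              # hypothetical sink, e.g. an upload function
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._steps: dict[str, Any] = {}
        self._dirty = False
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> None:
        # Write once before the first step executes, then start the timer loop.
        self._flush(force=True)
        self._thread.start()

    def record(self, step_name: str, entry: Any) -> None:
        # Called by the engine whenever a step completes.
        with self._lock:
            self._steps[step_name] = entry
            self._dirty = True

    def finish(self, final_status: str) -> None:
        # Stop the background loop, then write the final state exactly once.
        self._stop.set()
        self._thread.join()
        self._flush(force=True, status=final_status)

    def _loop(self) -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not self._stop.wait(self._interval_s):
            self._flush()

    def _flush(self, force: bool = False, status: str = "RUNNING") -> None:
        with self._lock:
            if not (force or self._dirty):
                return  # nothing completed since the last write
            snapshot = {"status": status, "steps": dict(self._steps)}
            self._dirty = False
        self._write(snapshot)
```

Taking the snapshot under the lock and writing outside it keeps a slow write from blocking steps that are trying to record their output.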
Replay and skip
Resume is not a special path. Every time a worker starts executing a run, it walks the flow graph from the trigger and at every step asks: is the output of this step already in the log?
- If yes, and the step completed (SUCCEEDED or PAUSED), the engine returns the cached output and moves on.
- If no, the engine executes the step, records its output, and continues.
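The replay-and-skip rule above can be sketched in a few lines. The sketch simplifies the flow graph to an ordered list of steps, and the `replay` function, the dictionary log layout, and the step callables are hypothetical. It also assumes that a logged entry with any other status (for example a failed step) is executed again, and it omits error handling.

```python
from typing import Any, Callable

# Statuses the text treats as "completed" for the purpose of skipping.
COMPLETED = {"SUCCEEDED", "PAUSED"}

Step = tuple[str, Callable[[dict], Any]]


def replay(steps: list[Step], log: dict) -> dict:
    """Walk the steps from the start, reusing logged outputs where possible."""
    outputs: dict[str, Any] = {}
    for name, run_step in steps:
        entry = log.get(name)
        if entry is not None and entry["status"] in COMPLETED:
            # Already in the log: return the cached output and move on.
            outputs[name] = entry["output"]
            continue
        # Not in the log (or, by assumption, not completed): execute and record.
        output = run_step(outputs)
        log[name] = {"status": "SUCCEEDED", "output": output}
        outputs[name] = output
    return outputs


# Usage: "fetch" is already logged, so only "double" actually runs.
flow = [
    ("fetch", lambda prev: 2),
    ("double", lambda prev: prev["fetch"] * 2),
]
log = {"fetch": {"status": "SUCCEEDED", "output": 21}}
result = replay(flow, log)
```

Because the walk always starts at the trigger, a first execution and a resume are the same code path; the only difference is how many entries the log already holds.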
What triggers a resume
Every kind of interruption resolves through the same replay path; only the trigger differs.
- Worker crash or deployment. The queue reassigns the run to another worker, which loads the log and replays.
- Paused step. The piece creates a waitpoint. When the waitpoint fires, a resume job is enqueued and a worker replays the run.
- Retry from failed step. The same log is reused; the run is re-queued and a worker replays from the failure point.
- Normal progression within one worker. Same replay model, without leaving the process.
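One way to picture "only the trigger differs" is a queue whose jobs all end in the same call. This is an illustrative sketch only: `ResumeQueue`, the reason strings, and the `load_log` and `replay` callables are hypothetical names, not the engine's API.

```python
from collections import deque
from typing import Callable


class ResumeQueue:
    """Every interruption becomes the same kind of job: replay this run."""

    def __init__(self) -> None:
        self._jobs = deque()

    def enqueue(self, run_id: str, reason: str) -> None:
        # The reason is kept for observability only; it does not select a code path.
        self._jobs.append({"run_id": run_id, "reason": reason})

    def drain(self,
              load_log: Callable[[str], dict],
              replay: Callable[[dict], None]) -> list[str]:
        handled = []
        while self._jobs:
            job = self._jobs.popleft()
            # Same entry point regardless of why the run was interrupted.
            replay(load_log(job["run_id"]))
            handled.append(job["reason"])
        return handled


# Usage: three different triggers, one replay path.
logs = {"run-1": {"fetch": {"status": "SUCCEEDED", "output": 21}}}
replayed = []
queue = ResumeQueue()
queue.enqueue("run-1", "worker_lost")
queue.enqueue("run-1", "waitpoint_fired")
queue.enqueue("run-1", "retry_requested")
handled = queue.drain(logs.__getitem__, replayed.append)
```

Normal progression within one worker would call the same replay function directly, without going through the queue.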