Marathon: long-running agent tasks (CLI)

Marathon is the Runtype CLI’s harness for long-running, multi-session agent tasks. It runs a saved agent across many sessions in your terminal with real-time streaming, local file and shell tools, automatic context compaction, checkpoints between sessions, and resumable state on disk.

Starting a marathon

$ runtype marathon <agent-id-or-name> -g "Refactor the auth module and add tests"

Useful flags:

  • --max-sessions <n> and --max-cost <usd>: budget caps (--max-sessions defaults to 50)
  • --model <modelId>: override the agent’s configured model
  • --name <task>: task name used for the state files (defaults to the agent name)
  • --resume [message] / --fresh: continue from saved state, or start over
  • --sandbox <provider>: enable sandboxed code execution
  • --session-search / --no-session-search: session context indexing and the search_session_history local tool are enabled by default; pass --no-session-search to opt out

State persists under ~/.runtype/projects/<hash>/marathons/: a <task>.json snapshot, a <task>.tree.jsonl session tree (see below), a <task>.events/ directory of raw stream events, and a <task>/outputs/ artifact store for large offloaded tool outputs referenced by the context ledger. Keep the outputs directory with the .json and .tree.jsonl files when moving or backing up a marathon task.

Steering while the agent works

Press Enter at any time while the agent is streaming to open the steering composer:

  • Enter queues your message. The in-flight session wraps up at the next tool call and your message is delivered at the start of the next session, usually within seconds.
  • Tab toggles delivery timing between “next turn” and “after all work” (a follow-up that fires when the task would otherwise finish).
  • Esc closes the composer and keeps your draft.

A counter above the status bar shows how many messages are queued. Note that early in a run the agent may be in a planning phase where file writes are restricted to the plan; a steer that asks for immediate writes takes effect once the plan is updated.

To interrupt the agent entirely, press Esc twice (outside the composer) while it works. The in-flight session aborts immediately — cost and progress observed so far are preserved — and the marathon lands at a “stopped” checkpoint. Any queued steering messages and composer draft are restored into the checkpoint input, so you can edit them and press Enter to continue with new instructions, or press Enter on an empty input to exit. A stopped task resumes later with --resume.

Checkpoints

After each session (and at the end of the task), marathon pauses at a checkpoint. Press Enter to continue, type a message to steer the next session, or use a slash command:

Command               What it does
/model                Change the model for subsequent sessions
/tools                Toggle local tools on or off
/sandbox <provider>   Switch the sandbox provider
/plan, /status        Show the task plan and subtask completion
/revert <file>        Restore a file from its checkpoint
/reflect              Open the reflection editor to reassess the approach
/tree                 Browse session branches and switch between them
/fork                 Branch the conversation from an earlier message
/copy, /copy-trimmed  Copy the session JSON to the clipboard
/open                 Open the state file in your editor
/stop                 Stop the marathon and save state
/help                 Show all commands and shortcuts

Session branches: /tree and /fork

Marathon records the conversation as a tree in <task>.tree.jsonl, so steering in a new direction never loses the original timeline.

  • /fork lists the user messages on the current branch. Pick one, then type your new instruction: the conversation branches from that point and the next session runs with the truncated history plus your new steer. The original branch stays in the tree.
  • /tree shows the session tree with branches indented, the current head marked, and ledger metadata rows such as artifacts or compaction summaries labeled as metadata. Select a checkpoint row to switch the conversation head: the next session continues from that checkpoint with that branch’s history.

A typical flow: the agent went down the wrong path two sessions ago. At the next checkpoint, type /fork, pick the message where things were still on track, describe the better approach, and continue. If the new branch turns out worse, /tree switches you back.
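
The branching behavior can be pictured as a tree of messages with parent links and a movable head. The sketch below is only an illustration under that assumption: the SessionTree class, its method names, and the node fields are hypothetical, and the real <task>.tree.jsonl schema is not described here. It also assumes the truncated history includes the message you pick in /fork.

```typescript
// Hypothetical in-memory model of a session tree (not the on-disk format).
interface TreeNode {
  id: number
  parentId: number | null
  role: 'user' | 'assistant'
  text: string
}

class SessionTree {
  private nodes: TreeNode[] = []
  head: number | null = null

  // Append a message under the current head and move the head to it.
  append(role: TreeNode['role'], text: string): number {
    const id = this.nodes.length
    this.nodes.push({ id, parentId: this.head, role, text })
    this.head = id
    return id
  }

  // Walk parent links from a node back to the root, oldest message first.
  history(from: number | null = this.head): TreeNode[] {
    const path: TreeNode[] = []
    let cursor = from
    while (cursor !== null) {
      const node = this.nodes[cursor]
      if (!node) throw new Error(`unknown node ${cursor}`)
      path.unshift(node)
      cursor = node.parentId
    }
    return path
  }

  // Like /fork: branch from an earlier message with a new instruction.
  // The old branch is untouched because nodes are never deleted.
  fork(fromId: number, newSteer: string): number {
    this.switchTo(fromId)
    return this.append('user', newSteer)
  }

  // Like /tree: move the head to any existing node.
  switchTo(id: number): void {
    if (!this.nodes[id]) throw new Error(`unknown node ${id}`)
    this.head = id
  }
}

const tree = new SessionTree()
const goal = tree.append('user', 'Refactor the auth module')
tree.append('assistant', 'Plan drafted')
tree.append('user', 'Replace the session middleware')
const wrongTurn = tree.append('assistant', 'Rewrote the middleware')

tree.fork(goal, 'Keep the middleware, only add tests')
const forkedHistory = tree.history().map((node) => node.text)
```

After the fork, the head's history holds only the original goal and the new steer, while switchTo(wrongTurn) would bring back the full four-message branch.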

Resuming

runtype marathon <agent> --name <task> --resume continues a saved task with its full history. Marathon shows a resume checkpoint first so you can steer, switch branches, or change settings before the next session starts.

Playbooks

A playbook replaces marathon’s default workflow (research, planning, execution) with milestones you define: their instructions, models, completion rules, and runtime guardrails. Pass one with --playbook:

$ runtype marathon <agent> -g "Write release notes for the last sprint" --playbook release-notes

Marathon resolves the name as an exact file path first, then .runtype/marathons/playbooks/<name>.yaml|yml|json|ts|mts in the current project, then the same paths under ~/.runtype/.
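
To make the lookup order concrete, the sketch below lists the locations a --playbook value could resolve to. It is not Marathon's resolver: the helper candidatePlaybookPaths is hypothetical, trying the extensions in the order listed is an assumption, and "the same paths under ~/.runtype/" is read here as ~/.runtype/marathons/playbooks/.

```typescript
// Extensions as listed in the text; the order within a directory is an assumption.
const PLAYBOOK_EXTENSIONS = ['yaml', 'yml', 'json', 'ts', 'mts']

// Hypothetical helper: returns candidate locations in lookup order.
function candidatePlaybookPaths(nameOrPath: string, projectDir: string, homeDir: string): string[] {
  // 1. The value itself, treated as an exact file path.
  const candidates: string[] = [nameOrPath]
  // 2. The current project's .runtype directory, then 3. the one in the home directory.
  for (const root of [`${projectDir}/.runtype`, `${homeDir}/.runtype`]) {
    for (const ext of PLAYBOOK_EXTENSIONS) {
      candidates.push(`${root}/marathons/playbooks/${nameOrPath}.${ext}`)
    }
  }
  return candidates
}

// Example directories are placeholders.
const candidates = candidatePlaybookPaths('release-notes', '/work/app', '/home/me')
```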

Milestones

Each milestone is a phase the agent works through in order. A minimal playbook:

name: release-notes
policy:
  allowedWriteGlobs: ['docs/releases/**']
  requirePlanBeforeWrite: true
milestones:
  - name: research
    model: claude-haiku-4-5
    instructions: |
      Survey the recent changes in this repo.
      Then write a short outline to the plan file: {{planPath}}
    completionCriteria:
      type: evidence
      minReadFiles: 2
    transitionSummary: 'Outline done. Moving to {{nextPhase}}.'
  - name: write
    model: claude-sonnet-4-6
    fallbackModels: [claude-opus-4-8]
    instructions: Write the release notes under docs/releases/ following the outline.
    recovery:
      afterEmptySessions: 1
      message: You stopped calling tools. Write a file under docs/releases/ now.
    canAcceptCompletion: true

Milestone fields:

Field                  What it does
instructions           Prompt for the milestone. Supports {{key}} interpolation from run state, for example {{planPath}}.
toolGuidance           Extra guidance lines added to the tools section of the prompt.
model, fallbackModels  Model for this milestone, plus the fallback chain used on errors and stall escalation.
completionCriteria     When the milestone auto-advances (see below).
recovery               Message injected into the next session after afterEmptySessions consecutive sessions with no tool calls (default 2).
transitionSummary      Message emitted when the milestone completes. Supports {{nextPhase}}.
canAcceptCompletion    Whether the agent's TASK_COMPLETE signal is accepted here. Set it on the final milestone, or the run can never finish normally (the loader warns about this).
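
The {{key}} interpolation mentioned for instructions and transitionSummary can be approximated in a few lines. This is a sketch, not Marathon's template code: the interpolate function is hypothetical, the plan path value is a placeholder, and leaving unknown keys untouched is an assumption.

```typescript
// Hypothetical stand-in for the template step: replaces {{key}} with values from run state.
// Unknown keys are left as-is here; the real behavior is not documented in this page.
function interpolate(template: string, state: Record<string, string | undefined>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => state[key] ?? match)
}

const prompt = interpolate('Then write a short outline to the plan file: {{planPath}}', {
  planPath: 'plans/release-notes.md', // placeholder value
})

const untouched = interpolate('Outline done. Moving to {{nextPhase}}.', {})
```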

Top-level fields beyond milestones: policy (below), stallPolicy (below), verification (require a passing check command before completion), rules (free-form standards applied to all milestones), and plugins (below).

Completion criteria

Type         Advances when
evidence     The agent has read at least minReadFiles files (default 1)
sessions     At least minSessions sessions ran in this milestone (default 1)
planWritten  The agent wrote its plan artifact
never        Only the agent's TASK_COMPLETE signal can finish (requires canAcceptCompletion: true)

The type can also be a hook reference (see below) for fully custom logic.
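
Read as code, the four built-in criteria amount to a small decision function. The sketch below models only the documented types and defaults. The MilestoneProgress counters and the shouldAutoAdvance name are hypothetical, and hook references are left out.

```typescript
type CompletionCriteria =
  | { type: 'evidence'; minReadFiles?: number }
  | { type: 'sessions'; minSessions?: number }
  | { type: 'planWritten' }
  | { type: 'never' }

// Hypothetical per-milestone counters; field names are illustrative only.
interface MilestoneProgress {
  filesRead: number
  sessionsRun: number
  planWritten: boolean
}

function shouldAutoAdvance(criteria: CompletionCriteria, progress: MilestoneProgress): boolean {
  switch (criteria.type) {
    case 'evidence':
      return progress.filesRead >= (criteria.minReadFiles ?? 1) // documented default: 1
    case 'sessions':
      return progress.sessionsRun >= (criteria.minSessions ?? 1) // documented default: 1
    case 'planWritten':
      return progress.planWritten
    case 'never':
      return false // only TASK_COMPLETE can finish this milestone
  }
}

const oneFileRead: MilestoneProgress = { filesRead: 1, sessionsRun: 1, planWritten: false }
const researchDone = shouldAutoAdvance({ type: 'evidence', minReadFiles: 2 }, oneFileRead)
```

With minReadFiles: 2 and one file read so far, researchDone is false, and the milestone would advance once a second file has been read.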

Policies

The policy block narrows what the agent can do at runtime. Policies only restrict: they never override global safety denies (for example, .env files and private keys stay blocked). The matching guidance is added to the agent’s prompt automatically, so the model is told the rules instead of discovering them through blocked calls.

Field                   What it does
allowedReadGlobs        If set, file reads outside these globs are blocked
allowedWriteGlobs       If set, file writes outside these globs are blocked (the plan file is always writable)
blockedTools            Tool names to block entirely
blockDiscoveryTools     Block broad repo discovery tools
requirePlanBeforeWrite  Block product-file writes until the plan exists. Once the plan is written, writes unlock in the same session
requireVerification     Require a passing verification command before TASK_COMPLETE
outputRoot              For creation tasks: confine new files to this directory
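
The way allowedWriteGlobs and requirePlanBeforeWrite combine can be sketched as a single write check. Everything here is an assumption made for illustration: the isWriteAllowed and globToRegExp helpers, the plan argument, and the simplified glob syntax (** crosses directories, * stays inside one path segment). Global safety denies and the other policy fields are not modeled.

```typescript
interface WritePolicy {
  allowedWriteGlobs?: string[]
  requirePlanBeforeWrite?: boolean
}

// Minimal glob matcher for this sketch only; real glob libraries support more syntax.
function globToRegExp(glob: string): RegExp {
  let pattern = ''
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i)
    if (ch === '*' && glob.charAt(i + 1) === '*') {
      pattern += '.*'
      i++ // consume the second *
    } else if (ch === '*') {
      pattern += '[^/]*'
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    }
  }
  return new RegExp(`^${pattern}$`)
}

function isWriteAllowed(
  path: string,
  policy: WritePolicy,
  plan: { path: string; written: boolean },
): boolean {
  if (path === plan.path) return true // the plan file is always writable
  if (policy.requirePlanBeforeWrite && !plan.written) return false
  if (policy.allowedWriteGlobs === undefined) return true
  return policy.allowedWriteGlobs.some((glob) => globToRegExp(glob).test(path))
}

const policy: WritePolicy = { allowedWriteGlobs: ['docs/releases/**'], requirePlanBeforeWrite: true }
const beforePlan = isWriteAllowed('docs/releases/v2.md', policy, { path: 'PLAN.md', written: false })
const afterPlan = isWriteAllowed('docs/releases/v2.md', policy, { path: 'PLAN.md', written: true })
```

Because the check reads plan.written on every call, a write is accepted as soon as the plan exists, which matches the "unlock in the same session" behavior described in the table.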

Stall policy

A session counts as empty when it makes no tool calls, even if the model produced text. The stallPolicy block controls what happens as empty sessions accumulate:

Threshold           Effect when reached
nudgeAfter          Inject a corrective "you must call a tool" message into the next session
escalateModelAfter  Switch the milestone to its next fallbackModels entry and restart
stopAfter           End the run as stalled (default 3)

Milestone recovery messages trigger on the same counter, so a model that narrates intent without acting still gets corrected.
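
The counter and thresholds can be modeled as two small functions. This is a sketch of the documented rules, not Marathon's implementation: the function names are hypothetical, and two behaviors are assumptions, namely that an unset nudgeAfter or escalateModelAfter is treated as disabled and that the most severe reached threshold wins.

```typescript
interface StallPolicy {
  nudgeAfter?: number
  escalateModelAfter?: number
  stopAfter?: number
}

type StallEffect = 'none' | 'nudge' | 'escalateModel' | 'stop'

// A session is empty when it made no tool calls, even if the model produced text.
function nextEmptyCount(previous: number, toolCallsInSession: number): number {
  return toolCallsInSession === 0 ? previous + 1 : 0
}

// Assumption: the most severe threshold that has been reached decides the effect.
function stallEffect(consecutiveEmptySessions: number, policy: StallPolicy): StallEffect {
  const stopAfter = policy.stopAfter ?? 3 // documented default
  if (consecutiveEmptySessions >= stopAfter) return 'stop'
  if (policy.escalateModelAfter !== undefined && consecutiveEmptySessions >= policy.escalateModelAfter) {
    return 'escalateModel'
  }
  if (policy.nudgeAfter !== undefined && consecutiveEmptySessions >= policy.nudgeAfter) {
    return 'nudge'
  }
  return 'none'
}

// Three sessions in a row without tool calls, under { nudgeAfter: 1, stopAfter: 4 }.
const policyExample: StallPolicy = { nudgeAfter: 1, stopAfter: 4 }
let emptyCount = 0
const effects: StallEffect[] = []
for (const toolCalls of [0, 0, 0]) {
  emptyCount = nextEmptyCount(emptyCount, toolCalls)
  effects.push(stallEffect(emptyCount, policyExample))
}
```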

Hook references

Any behavior slot accepts a registered hook name instead of inline content. Names under builtin: expose the default workflow’s behaviors, so a playbook can reuse slices of the default instead of rebuilding them:

milestones:
  - name: research
    instructions: builtin:research-instructions
    completionCriteria: { type: builtin:research-complete }
    intercept: builtin:research-guard

Referencing an unknown hook, or wiring a hook into the wrong slot, fails when the playbook loads rather than mid-run.
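
A load-time check of this kind could look roughly like the sketch below. It is hypothetical throughout: the registry map, the checkHookRef function, and the error messages are invented for illustration, and of the kind names only 'recovery' appears in this page. The others are assumed to mirror their slot names.

```typescript
// Assumed slot kinds; only 'recovery' is confirmed by the plugin example in this page.
type SlotKind = 'instructions' | 'completionCriteria' | 'intercept' | 'recovery'

// Hypothetical registry: hook name -> the kind it was registered as.
const registry = new Map<string, SlotKind>([
  ['builtin:research-instructions', 'instructions'],
  ['builtin:research-complete', 'completionCriteria'],
  ['builtin:research-guard', 'intercept'],
])

// Throws while the playbook is loading, so a bad reference never reaches a running session.
function checkHookRef(slot: SlotKind, ref: string): void {
  const kind = registry.get(ref)
  if (kind === undefined) {
    throw new Error(`Unknown hook "${ref}"`)
  }
  if (kind !== slot) {
    throw new Error(`Hook "${ref}" is registered as ${kind} and cannot fill the ${slot} slot`)
  }
}

// Passes: the hook kind matches the slot it is wired into.
checkHookRef('intercept', 'builtin:research-guard')
```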

Plugins

YAML playbooks load custom hooks from JavaScript modules listed under plugins. Paths are relative to the playbook file and must stay inside its directory:

plugins:
  - ./marathon-hooks.mjs

// marathon-hooks.mjs
export default function register({ registerWorkflowHook }) {
  registerWorkflowHook('acme:strict-recovery', {
    kind: 'recovery',
    fn: () => 'Stop narrating. Your next response must contain a tool call.',
  })
}

Hooks register under your own namespace (acme: here) and can be referenced from any slot that matches the hook's kind, for example recovery: acme:strict-recovery. Plugins run with your user privileges when the playbook loads, the same trust level as the verification commands a marathon already runs.

TypeScript playbooks

Playbooks can be TypeScript modules (.ts or .mts). They load at runtime with no build step, and every behavior slot accepts a plain function, so custom logic needs no plugin or hook registration:

// .runtype/marathons/playbooks/my-task.ts
import { definePlaybook, type RunTaskStateSlice } from '@runtypelabs/sdk'

export default definePlaybook({
  name: 'my-task',
  stallPolicy: { nudgeAfter: 1, stopAfter: 4 },
  milestones: [
    {
      name: 'build',
      instructions: (state: RunTaskStateSlice) => `Build it. Plan: ${state.planPath}`,
      recovery: (state) =>
        `You went ${state.consecutiveEmptySessions ?? 0} sessions without a tool call. Write a file now.`,
      canAcceptCompletion: true,
    },
  ],
})

definePlaybook comes from @runtypelabs/sdk (install it as a devDependency for editor types). It is optional: a plain object export with the same shape works without the package installed. To register named hooks instead, export a factory: export default ({ registerWorkflowHook }) => ({ ... }).
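
A factory export might look like the sketch below, which registers the same acme:strict-recovery hook as the plugin example and references it by name. The local type declarations are assumptions written so the file stands alone without @runtypelabs/sdk. The SDK's real types may differ, and the playbook body is a minimal placeholder.

```typescript
// Locally declared types (assumptions); they mirror the shape used in the plugin example.
type HookSpec = { kind: string; fn: (...args: unknown[]) => unknown }
type RegisterWorkflowHook = (name: string, hook: HookSpec) => void

const playbookFactory = ({ registerWorkflowHook }: { registerWorkflowHook: RegisterWorkflowHook }) => {
  // Register the hook first, then refer to it by name in the playbook that is returned.
  registerWorkflowHook('acme:strict-recovery', {
    kind: 'recovery',
    fn: () => 'Stop narrating. Your next response must contain a tool call.',
  })

  return {
    name: 'my-task',
    milestones: [
      {
        name: 'build',
        instructions: 'Build it.',
        recovery: 'acme:strict-recovery',
        canAcceptCompletion: true,
      },
    ],
  }
}

export default playbookFactory
```

Keeping the factory in a named constant before exporting it also makes it easy to call with a stub registerWorkflowHook in a unit test.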