Marathon: long-running agent tasks (CLI)

Marathon is the Runtype CLI’s harness for long-running, multi-session agent tasks. It runs a saved agent across many sessions in your terminal with real-time streaming, local file and shell tools, automatic context compaction, checkpoints between sessions, and resumable state on disk.

Starting a marathon

$ runtype marathon <agent-id-or-name> -g "Refactor the auth module and add tests"

Useful flags:

--max-sessions <n> and --max-cost <usd>: budget caps on session count and spend (--max-sessions defaults to 50)
--model <modelId>: override the agent’s configured model
--max-tokens <n> and --temperature <n>: override the agent’s per-session output-token budget and sampling temperature (0-2). Applied to every phase; playbook milestones can override either per phase (see below).
--name <task>: task name used for the state files (defaults to the agent name)
--resume [message] / --fresh: continue from saved state, or start over
--sandbox <provider>: enable sandboxed code execution
--session-search / --no-session-search: session context indexing and the search_session_history local tool are enabled by default; pass --no-session-search to opt out

State persists under ~/.runtype/projects/<hash>/marathons/: a <task>.json snapshot, a <task>.tree.jsonl session tree (see below), a <task>.events/ directory of raw stream events, and a <task>/outputs/ artifact store for large offloaded tool outputs referenced by the context ledger. Keep the outputs directory with the .json and .tree.jsonl files when moving or backing up a marathon task.
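When you script a backup or move, the file set above can be enumerated with a small helper. This is only a sketch: marathonStatePaths and its parameters are hypothetical names, and the marathons directory (including the project hash) is passed in as-is because the hashing scheme is not described here.

```typescript
// Hypothetical helper, not part of the Runtype CLI.
// `marathonsDir` is the ~/.runtype/projects/<hash>/marathons directory, supplied by the caller.
function marathonStatePaths(marathonsDir: string, task: string): string[] {
  const base = marathonsDir.replace(/\/+$/, ""); // drop trailing slashes
  return [
    `${base}/${task}.json`,       // snapshot
    `${base}/${task}.tree.jsonl`, // session tree
    `${base}/${task}.events/`,    // raw stream events
    `${base}/${task}/outputs/`,   // offloaded tool outputs referenced by the context ledger
  ];
}
```

Copying every path in the returned list keeps the snapshot, tree, and offloaded outputs together.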

Steering while the agent works

Press Enter at any time while the agent is streaming to open the steering composer:

Enter queues your message. The in-flight session wraps up at the next tool call and your message is delivered at the start of the next session, usually within seconds.
Tab toggles delivery timing between “next turn” and “after all work” (a follow-up that fires when the task would otherwise finish).
Esc closes the composer and keeps your draft.

A counter above the status bar shows how many messages are queued. Note that early in a run the agent may be in a planning phase where file writes are restricted to the plan; a steer that asks for immediate writes takes effect once the plan is updated.

To interrupt the agent entirely, press Esc twice (outside the composer) while it works. The in-flight session aborts immediately — cost and progress observed so far are preserved — and the marathon lands at a “stopped” checkpoint. Any queued steering messages and composer draft are restored into the checkpoint input, so you can edit them and press Enter to continue with new instructions, or press Enter on an empty input to exit. A stopped task resumes later with --resume.

Checkpoints

After each session (and at the end of the task), marathon pauses at a checkpoint. Press Enter to continue, type a message to steer the next session, or use a slash command:

Command	What it does
`/model`	Change the model for subsequent sessions
`/tools`	Toggle local tools on or off
`/sandbox <provider>`	Switch the sandbox provider
`/plan`, `/status`	Show the task plan and subtask completion
`/revert <file>`	Restore a file from its checkpoint
`/reflect`	Open the reflection editor to reassess the approach
`/tree`	Browse session branches and switch between them
`/fork`	Branch the conversation from an earlier message
`/copy`, `/copy-trimmed`	Copy the session JSON to the clipboard
`/open`	Open the state file in your editor
`/stop`	Stop the marathon and save state
`/help`	Show all commands and shortcuts

Session branches: /tree and /fork

Marathon records the conversation as a tree in <task>.tree.jsonl, so steering in a new direction never loses the original timeline.

/fork lists the user messages on the current branch. Pick one, then type your new instruction: the conversation branches from that point and the next session runs with the truncated history plus your new steer. The original branch stays in the tree.
/tree shows the session tree with branches indented, the current head marked, and ledger metadata rows such as artifacts or compaction summaries labeled as metadata. Select a checkpoint row to switch the conversation head: the next session continues from that checkpoint with that branch’s history.

A typical flow: the agent went down the wrong path two sessions ago. At the next checkpoint, type /fork, pick the message where things were still on track, describe the better approach, and continue. If the new branch turns out worse, /tree switches you back.
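The branching behavior can be pictured as a tree of messages with parent pointers, where the head selects the active branch. The sketch below is a hypothetical in-memory model, not the actual <task>.tree.jsonl schema (which is not shown here). It also assumes that a fork keeps history up to and including the selected user message.

```typescript
// Hypothetical model of a session tree; names and fields are illustrative only.
interface TreeNode {
  id: string;
  parentId: string | null;
  role: "user" | "assistant";
  text: string;
}

class SessionTree {
  private nodes = new Map<string, TreeNode>();
  private nextId = 1;
  head: string | null = null;

  // Append a message under the current head and move the head to it.
  append(role: TreeNode["role"], text: string): string {
    const id = `n${this.nextId++}`;
    this.nodes.set(id, { id, parentId: this.head, role, text });
    this.head = id;
    return id;
  }

  // History of the current branch, oldest message first.
  history(): TreeNode[] {
    const out: TreeNode[] = [];
    let cur: string | null = this.head;
    while (cur !== null) {
      const node = this.nodes.get(cur);
      if (!node) throw new Error(`unknown node ${cur}`);
      out.unshift(node);
      cur = node.parentId;
    }
    return out;
  }

  // Like /fork: branch from an earlier user message and add a new steer.
  // Assumption: the selected message itself stays in the truncated history.
  fork(fromUserMessageId: string, steer: string): string {
    const origin = this.nodes.get(fromUserMessageId);
    if (!origin || origin.role !== "user") {
      throw new Error("fork target must be a user message");
    }
    this.head = fromUserMessageId; // the old branch stays in the map
    return this.append("user", steer);
  }

  // Like /tree: move the head to any recorded node.
  switchTo(id: string): void {
    if (!this.nodes.has(id)) throw new Error(`unknown node ${id}`);
    this.head = id;
  }
}
```

Because nodes are never deleted, forking only moves the head, which is why switching back to the original branch restores its full history.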

Resuming

runtype marathon <agent> --name <task> --resume continues a saved task with its full history. Marathon shows a resume checkpoint first so you can steer, switch branches, or change settings before the next session starts.

Playbooks

A playbook replaces marathon’s default workflow (research, planning, execution) with milestones you define: their instructions, models, completion rules, and runtime guardrails. Pass one with --playbook:

$ runtype marathon <agent> -g "Write release notes for the last sprint" --playbook release-notes

Marathon resolves the name as an exact file path first, then .runtype/marathons/playbooks/<name>.yaml|yml|json|ts|mts in the current project, then the same paths under ~/.runtype/.
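The lookup order above can be written out as a candidate list. This is a sketch under stated assumptions: playbookCandidates is a hypothetical helper, the order in which extensions are tried within one directory is assumed to follow the order listed, and "the same paths under ~/.runtype/" is read as the same .runtype/marathons/playbooks/ layout rooted at your home directory.

```typescript
// Extensions named in the resolution rule; trying them in this order is an assumption.
const PLAYBOOK_EXTENSIONS = ["yaml", "yml", "json", "ts", "mts"] as const;

// Hypothetical helper: returns lookup candidates, highest priority first.
function playbookCandidates(name: string, projectDir: string, homeDir: string): string[] {
  const inRoot = (root: string): string[] =>
    PLAYBOOK_EXTENSIONS.map((ext) => `${root}/.runtype/marathons/playbooks/${name}.${ext}`);
  return [
    name,                  // 1. exact file path
    ...inRoot(projectDir), // 2. current project
    ...inRoot(homeDir),    // 3. user-level directory
  ];
}
```

A project-level playbook therefore shadows a user-level playbook with the same name.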

Milestones

Each milestone is a phase the agent works through in order. A minimal playbook:

name: release-notes
policy:
  allowedWriteGlobs: ['docs/releases/**']
  requirePlanBeforeWrite: true
milestones:
  - name: research
    model: claude-haiku-4-5
    instructions: |
      Survey the recent changes in this repo.
      Then write a short outline to the plan file: {{planPath}}
    completionCriteria:
      type: evidence
      minReadFiles: 2
    transitionSummary: 'Outline done. Moving to {{nextPhase}}.'
  - name: write
    model: claude-sonnet-4-6
    fallbackModels: [claude-opus-4-8]
    instructions: Write the release notes under docs/releases/ following the outline.
    recovery:
      afterEmptySessions: 1
      message: You stopped calling tools. Write a file under docs/releases/ now.
    canAcceptCompletion: true

Milestone fields:

Field	What it does
`instructions`	Prompt for the milestone. Supports `{{key}}` interpolation from run state, for example `{{planPath}}`.
`toolGuidance`	Extra guidance lines added to the tools section of the prompt.
`model`, `fallbackModels`	Model for this milestone, plus the fallback chain used on errors and stall escalation.
`maxTokens`, `temperature`	Output-token budget and sampling temperature (0-2) for this milestone’s model. Takes precedence over the `--max-tokens` / `--temperature` flags and the top-level defaults.
`fallbackOnEmpty`	Opt-in (default off). When `true`, the `fallbackModels` chain also fires when the model finishes successfully but returns no visible text (e.g. a “thinking” model that spends its budget reasoning), not just on errors. Needs `fallbackModels`.
`completionCriteria`	When the milestone auto-advances (see below).
`recovery`	Message injected into the next session after `afterEmptySessions` consecutive sessions with no tool calls (default 2).
`transitionSummary`	Message emitted when the milestone completes. Supports `{{nextPhase}}`.
`canAcceptCompletion`	Whether the agent’s `TASK_COMPLETE` signal is accepted here. Set it on the final milestone, or the run can never finish normally (the loader warns about this).

Top-level fields beyond milestones: policy (below), stallPolicy (below), verification (require a passing check command before completion), rules (free-form standards applied to all milestones), maxTokens / temperature (defaults applied to every milestone that doesn’t set its own — lower precedence than the --max-tokens / --temperature flags), fallbackOnEmpty (default empty-output recovery for every milestone that doesn’t set its own), and plugins (below).
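The precedence for maxTokens and temperature reads as: milestone value, then CLI flag, then top-level playbook default. The sketch below models that chain with a hypothetical resolveSetting function. Placing the agent's own configuration last is an assumption drawn from the flags being described as overrides of the agent's settings.

```typescript
interface SamplingSettings {
  maxTokens?: number;
  temperature?: number;
}

// Hypothetical layers, highest precedence first when resolved.
interface SettingLayers {
  milestone?: SamplingSettings; // per-milestone fields in the playbook
  flags?: SamplingSettings;     // --max-tokens / --temperature
  playbook?: SamplingSettings;  // top-level playbook defaults
  agent?: SamplingSettings;     // agent configuration (assumed lowest)
}

// Returns the first defined value in precedence order.
function resolveSetting(key: keyof SamplingSettings, layers: SettingLayers): number | undefined {
  const order = [layers.milestone, layers.flags, layers.playbook, layers.agent];
  for (const layer of order) {
    const value = layer?.[key];
    if (value !== undefined) return value;
  }
  return undefined;
}
```

For example, a milestone with temperature 0.2 keeps that value even when the run was started with --temperature 1.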

Completion criteria

Type	Advances when
`evidence`	The agent has read at least `minReadFiles` files (default 1)
`sessions`	At least `minSessions` sessions ran in this milestone (default 1)
`planWritten`	The agent wrote its plan artifact
`never`	Never automatically; only the agent’s `TASK_COMPLETE` signal can end the milestone (requires `canAcceptCompletion: true`)

The type can also be a hook reference (see below) for fully custom logic.
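The four built-in types amount to a simple check against run progress. The following sketch models them as a discriminated union. MilestoneProgress and shouldAdvance are hypothetical names, and hook-reference types are left out.

```typescript
type CompletionCriteria =
  | { type: "evidence"; minReadFiles?: number }
  | { type: "sessions"; minSessions?: number }
  | { type: "planWritten" }
  | { type: "never" };

// Hypothetical slice of run state used only for this illustration.
interface MilestoneProgress {
  filesRead: number;
  sessionsInMilestone: number;
  planWritten: boolean;
}

// Returns true when the milestone would auto-advance.
function shouldAdvance(criteria: CompletionCriteria, progress: MilestoneProgress): boolean {
  switch (criteria.type) {
    case "evidence":
      return progress.filesRead >= (criteria.minReadFiles ?? 1); // default 1
    case "sessions":
      return progress.sessionsInMilestone >= (criteria.minSessions ?? 1); // default 1
    case "planWritten":
      return progress.planWritten;
    case "never":
      return false; // only TASK_COMPLETE can finish
  }
}
```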

Policies

The policy block narrows what the agent can do at runtime. Policies only restrict: they never override global safety denies (for example, .env files and private keys stay blocked). The matching guidance is added to the agent’s prompt automatically, so the model is told the rules instead of discovering them through blocked calls.

Field	What it does
`allowedReadGlobs`	If set, file reads outside these globs are blocked
`allowedWriteGlobs`	If set, file writes outside these globs are blocked (the plan file is always writable)
`blockedTools`	Tool names to block entirely
`blockDiscoveryTools`	Block broad repo discovery tools
`requirePlanBeforeWrite`	Block product-file writes until the plan exists. Once the plan is written, writes unlock in the same session
`requireVerification`	Require a passing verification command before `TASK_COMPLETE`
`outputRoot`	For creation tasks: confine new files to this directory
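To make the write rules concrete, here is a sketch of how allowedWriteGlobs and requirePlanBeforeWrite could combine for a single write. It is an illustration, not marathon's implementation: canWrite is a hypothetical function, the glob matcher handles only `*` and `**`, and global safety denies are not modeled.

```typescript
// Minimal glob matcher: `**` crosses directories, `*` stays within one segment.
// This is an illustration, not a full glob implementation.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        re += ".*";
        i++; // consume the second asterisk
      } else {
        re += "[^/]*";
      }
    } else {
      re += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

interface WritePolicy {
  allowedWriteGlobs?: string[];
  requirePlanBeforeWrite?: boolean;
}

// Hypothetical decision function for one file write.
function canWrite(path: string, policy: WritePolicy, planPath: string, planExists: boolean): boolean {
  if (path === planPath) return true; // the plan file is always writable
  if (policy.requirePlanBeforeWrite && !planExists) return false;
  if (policy.allowedWriteGlobs === undefined) return true; // no glob restriction set
  return policy.allowedWriteGlobs.some((glob) => globToRegExp(glob).test(path));
}
```

With the release-notes policy shown earlier, a write to docs/releases/v1.md is blocked until the plan exists and allowed afterwards, while a write to src/index.ts stays blocked.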

Stall policy

A session counts as empty when it makes no tool calls, even if the model produced text. The stallPolicy block controls what happens as empty sessions accumulate:

Threshold	Effect when reached
`nudgeAfter`	Inject a corrective “you must call a tool” message into the next session
`escalateModelAfter`	Switch the milestone to its next `fallbackModels` entry and restart
`stopAfter`	End the run as stalled (default 3)

Milestone recovery messages trigger on the same counter, so a model that narrates intent without acting still gets corrected.
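The thresholds can be read as a function from the consecutive-empty-session count to an action. The sketch below is hypothetical: it treats an unset nudgeAfter or escalateModelAfter as disabled and lets the most severe reached threshold win, neither of which is stated above. Only the stopAfter default of 3 comes from the table.

```typescript
interface StallPolicy {
  nudgeAfter?: number;
  escalateModelAfter?: number;
  stopAfter?: number;
}

type StallAction = "continue" | "nudge" | "escalate-model" | "stop";

// Hypothetical mapping from the empty-session counter to the next action.
function stallAction(consecutiveEmptySessions: number, policy: StallPolicy = {}): StallAction {
  const stopAfter = policy.stopAfter ?? 3; // documented default
  if (consecutiveEmptySessions >= stopAfter) return "stop";
  if (policy.escalateModelAfter !== undefined && consecutiveEmptySessions >= policy.escalateModelAfter) {
    return "escalate-model";
  }
  if (policy.nudgeAfter !== undefined && consecutiveEmptySessions >= policy.nudgeAfter) {
    return "nudge";
  }
  return "continue";
}
```

With { nudgeAfter: 1, stopAfter: 4 }, the first empty session triggers a nudge and the fourth ends the run as stalled.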

Hook references

Any behavior slot accepts a registered hook name instead of inline content. Names under builtin: expose the default workflow’s behaviors, so a playbook can reuse slices of the default instead of rebuilding them:

milestones:
  - name: research
    instructions: builtin:research-instructions
    completionCriteria: { type: builtin:research-complete }
    intercept: builtin:research-guard

Referencing an unknown hook, or wiring a hook into the wrong slot, fails when the playbook loads rather than mid-run.
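Load-time validation of this kind can be sketched as a registry that records each hook's kind and checks every reference before any session starts. HookRegistry, validateReferences, and the list of slot kinds are hypothetical; they illustrate the fail-fast behavior rather than marathon's internals.

```typescript
// Slot kinds are illustrative assumptions, named after the slots used in this document.
type HookKind = "instructions" | "completionCriteria" | "intercept" | "recovery";

class HookRegistry {
  private kinds = new Map<string, HookKind>();

  register(name: string, kind: HookKind): void {
    this.kinds.set(name, kind);
  }

  // Throws for unknown names and for hooks wired into the wrong slot.
  check(name: string, slot: HookKind): void {
    const kind = this.kinds.get(name);
    if (kind === undefined) throw new Error(`Unknown hook: ${name}`);
    if (kind !== slot) {
      throw new Error(`Hook ${name} is a ${kind} hook and cannot fill the ${slot} slot`);
    }
  }
}

// Validate every reference up front so errors surface at load time, not mid-run.
function validateReferences(
  registry: HookRegistry,
  refs: Array<{ name: string; slot: HookKind }>,
): void {
  for (const ref of refs) registry.check(ref.name, ref.slot);
}
```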

Plugins

YAML playbooks load custom hooks from JavaScript modules listed under plugins. Paths are relative to the playbook file and must stay inside its directory:

plugins:
  - ./marathon-hooks.mjs

marathon-hooks.mjs

export default function register({ registerWorkflowHook }) {
  registerWorkflowHook('acme:strict-recovery', {
    kind: 'recovery',
    fn: () => 'Stop narrating. Your next response must contain a tool call.',
  })
}

Hooks register under your own namespace (acme: here) and are referenced from any slot, for example recovery: acme:strict-recovery. Plugins run with your user privileges when the playbook loads, the same trust level as the verification commands a marathon already runs.

TypeScript playbooks

Playbooks can be TypeScript modules (.ts or .mts). They load at runtime with no build step, and every behavior slot accepts a plain function, so custom logic needs no plugin or hook registration:

.runtype/marathons/playbooks/my-task.ts

import { definePlaybook, type RunTaskStateSlice } from '@runtypelabs/sdk'

export default definePlaybook({
  name: 'my-task',
  stallPolicy: { nudgeAfter: 1, stopAfter: 4 },
  milestones: [
    {
      name: 'build',
      instructions: (state: RunTaskStateSlice) => `Build it. Plan: ${state.planPath}`,
      recovery: (state) =>
        `You went ${state.consecutiveEmptySessions ?? 0} sessions without a tool call. Write a file now.`,
      canAcceptCompletion: true,
    },
  ],
})

definePlaybook comes from @runtypelabs/sdk (install it as a devDependency for editor types). It is optional: a plain object export with the same shape works without the package installed. To register named hooks instead, export a factory: export default ({ registerWorkflowHook }) => ({ ... }).

Next steps

What are Agents? — the Agents marathon runs across sessions
Creating and configuring agents — build the Agent you pass to runtype marathon
Agent tools — the tools an Agent uses while it works
Manage agents as code — version-control Agent definitions