# Ralph - Autonomous Development Loop for Claude Code
Open-source autonomous loop with smart end-of-task detection, rate limiting, and session management.
Open-source contribution
## Brief
Claude Code is excellent at ad-hoc tasks: open a file, fix a bug, write a component, close. But when you need an agent to run for hours on a single objective - refactor 50 files, mass-migrate a scraper, generate 109 articles - you hit four problems: the agent occasionally trips a rate limit, occasionally claims "done" when it isn't, occasionally gets stuck in a loop, and I'd have to babysit it the whole time.
Ralph is the loop that babysits it for me. Open source, repo at github.com/ondrejknedla/ralph-claude-code. Bash watchdog + Node session manager + on-disk manifest. Result: I run ralph start --task=docs.md before bed and find the work done in the morning. I've used it in production on an 8-hour DokladBot batch (109 SEO articles), on the Krtek scraper migration, and on a mass type migration across a monorepo.
## Architecture
```
ralph/
├── bin/ralph          ← bash entrypoint, watchdog
├── src/
│   ├── manager.ts     ← Node session manager, spawns claude-code
│   ├── detector.ts    ← end-of-task heuristics
│   ├── rate-limit.ts  ← exponential backoff, retry state
│   └── manifest.ts    ← on-disk state JSON
└── runs/
    └── <session-id>/
        ├── manifest.json   ← session state
        ├── transcript.log  ← live tail from claude-code
        └── checkpoints/    ← progress snapshots
```
The bash watchdog is thin - it spawns the Node manager, listens for SIGINT, and logs. All the logic lives in Node, where it makes sense to handle JSON state, AbortControllers, and exponential backoff timers.
## End-of-task detection
This is the hardest part, and where Ralph earns its keep over `while true; do claude; sleep 60; done`. Claude occasionally says "Done. Task completed." in the middle of a multi-step task because it just finished a subtask. If the watchdog believes it, the session ends prematurely and the rest of the work never happens.
Ralph uses a 3-step heuristic:
```typescript
// src/detector.ts
const COMPLETION_PATTERNS = [
  /\bdone\.?\s*$/im,
  /\btask completed\.?\s*$/im,
  /\ball (?:done|finished)\.?\s*$/im,
  /\bnothing (?:more|left) to do\b/im,
];

export async function isReallyDone(state: SessionState): Promise<boolean> {
  // 1. text pattern
  const lastChunk = state.transcript.slice(-2000);
  const matchesPattern = COMPLETION_PATTERNS.some((re) => re.test(lastChunk));
  if (!matchesPattern) return false;

  // 2. idle timeout - no new output for 90 seconds
  const idleMs = Date.now() - state.lastOutputAt;
  if (idleMs < 90_000) return false;

  // 3. recursive self-check - ask the agent again
  const verification = await sendToClaude({
    sessionId: state.sessionId,
    prompt: `Are you actually finished with the original task? Re-read your initial brief and answer YES or NO with one sentence why.`,
  });
  return /^yes\b/i.test(verification.trim());
}
```

Step 3 is the one that earns its cost. The LLM cheerfully says "done" mid-task, but if you explicitly force it to revisit the original brief and verify, it often replies "no, I still need to handle X". That extra round trip costs ~5 seconds and $0.01 - versus losing 6 hours of work because the loop bailed early.
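Because step 3 depends on a live session, one handy refactor (my sketch, not the repo's actual signature) is to inject the verifier, so the whole three-step heuristic can be unit-tested with canned answers instead of a real claude-code call:

```typescript
type Verifier = (prompt: string) => Promise<string>;

interface DetectorState {
  transcript: string;   // full captured output so far
  lastOutputAt: number; // epoch ms of the last transcript append
}

const COMPLETION = [/\bdone\.?\s*$/im, /\btask completed\.?\s*$/im];

export async function isReallyDoneWith(
  state: DetectorState,
  verify: Verifier,            // real sendToClaude in prod, a stub in tests
  idleThresholdMs = 90_000,
): Promise<boolean> {
  const tail = state.transcript.slice(-2000);
  if (!COMPLETION.some((re) => re.test(tail))) return false;            // 1. pattern
  if (Date.now() - state.lastOutputAt < idleThresholdMs) return false;  // 2. idle
  const answer = await verify('Re-read your brief. Finished? YES or NO.'); // 3. self-check
  return /^yes\b/i.test(answer.trim());
}
```

With the verifier stubbed, you can assert that a "NO, 62 articles left" answer keeps the loop alive even when the transcript ends in "Done."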
## Rate limit handling
The Anthropic API returns standardized rate-limit headers:
```
anthropic-ratelimit-requests-remaining: 0
anthropic-ratelimit-tokens-reset: 2026-01-25T22:43:11Z
retry-after: 47
```
Ralph reads them and, instead of retrying blindly, persists its state and waits exactly as long as needed:
```typescript
// src/rate-limit.ts
export async function handleRateLimit(
  err: AnthropicAPIError,
  manifest: Manifest,
): Promise<void> {
  const retryAfterHeader = err.headers['retry-after'];
  const resetHeader = err.headers['anthropic-ratelimit-tokens-reset'];

  // wait as long as the server asks for, whichever bound is later
  const retryMs = retryAfterHeader ? Number(retryAfterHeader) * 1000 : 0;
  const resetMs = resetHeader ? new Date(resetHeader).getTime() - Date.now() : 0;

  // pure exponential backoff only if both headers are missing
  const backoffMs = retryAfterHeader || resetHeader
    ? Math.max(retryMs, resetMs)
    : Math.min(2 ** manifest.retryCount * 1000, 5 * 60_000);

  manifest.retryCount += 1;
  manifest.nextRetryAt = Date.now() + backoffMs;
  manifest.lastError = err.message;
  await writeManifest(manifest);

  log.info(`rate-limited, sleeping ${(backoffMs / 1000).toFixed(0)}s (retry ${manifest.retryCount})`);
  await sleep(backoffMs);
}
```

The key detail: after 5 retries Ralph doesn't keep doubling - it stops and alerts. Better to wake me at 03:00 than to try four more times in a row.
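The five-retry cutoff can live next to the backoff math as two tiny pure functions; a minimal sketch (`MAX_RETRIES` and both function names are mine, not necessarily the repo's):

```typescript
const MAX_RETRIES = 5;

// after MAX_RETRIES consecutive 429s, stop looping and page the operator
export function nextAction(retryCount: number): 'retry' | 'alert' {
  return retryCount < MAX_RETRIES ? 'retry' : 'alert';
}

// fallback schedule when no rate-limit headers arrive: 1s, 2s, 4s, ... capped at 5 min
export function fallbackBackoffMs(retryCount: number): number {
  return Math.min(2 ** retryCount * 1000, 5 * 60_000);
}
```

Keeping both decisions pure makes the retry policy trivially testable without mocking timers or the API client.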
## Session continuity
The manifest is the single source of truth for session state. When Ralph crashes (host reboot, OOM kill, manual Ctrl+C), ralph resume <session-id> reads the manifest and continues:
```json
{
  "sessionId": "ralph-2026-01-25-dokladbot-batch",
  "task": "Generate 109 SEO articles from articles-batch.csv",
  "createdAt": "2026-01-25T22:00:00Z",
  "status": "in-progress",
  "currentStep": 47,
  "totalSteps": 109,
  "lastCheckpointAt": "2026-01-26T03:18:42Z",
  "retryCount": 2,
  "claudeSessionId": "abc123-resume-token",
  "outputDir": "/home/george/dokladbot/articles/batch-2026-01-25"
}
```

`claudeSessionId` is critical - Claude Code has a `--resume <token>` flag, so Ralph reattaches to the same session instead of starting a fresh one with empty context.
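Resume logic then mostly reduces to choosing the right argv. A sketch of how the manager might build it (field names match the manifest above; `--resume` is Claude Code's flag per the text, while the fresh-start `-p` flag here is my assumption):

```typescript
interface ResumeInfo {
  task: string;
  claudeSessionId: string | null; // resume token from a previous run, if any
}

export function buildClaudeArgs(m: ResumeInfo): string[] {
  if (m.claudeSessionId) {
    // reattach to the existing session: same context window, no cold start
    return ['--resume', m.claudeSessionId];
  }
  // no token yet: start a fresh session seeded with the task brief
  return ['-p', m.task];
}
```

Keeping this as a pure function means `ralph resume` and `ralph start` share one code path and only differ in whether the manifest carries a token.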
Checkpointing runs after each completed sub-task (article, file, migration step). The manifest is flushed to disk with O_DSYNC, so a reboot mid-write doesn't corrupt state.
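A sketch of what such a durable write can look like in Node (the names are mine; the repo's implementation may differ). O_DSYNC forces the data to disk before `write` returns, and a rename makes the swap atomic, so a crash leaves either the old or the new manifest on disk, never a torn one:

```typescript
import { openSync, writeSync, closeSync, renameSync, readFileSync, constants } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';

export function writeManifestSync(path: string, manifest: object): void {
  const tmp = `${path}.tmp`;
  // O_DSYNC may be undefined on some platforms; fall back to O_SYNC there
  const syncFlag = constants.O_DSYNC ?? constants.O_SYNC;
  const fd = openSync(tmp, constants.O_WRONLY | constants.O_CREAT | constants.O_TRUNC | syncFlag);
  try {
    writeSync(fd, JSON.stringify(manifest, null, 2));
  } finally {
    closeSync(fd);
  }
  renameSync(tmp, path); // atomic replace on POSIX filesystems
}

// round-trip demo against a throwaway path
const demoPath = join(tmpdir(), 'ralph-demo-manifest.json');
writeManifestSync(demoPath, { currentStep: 47, totalSteps: 109 });
export const roundTrip = JSON.parse(readFileSync(demoPath, 'utf8'));
```

The write-to-temp-then-rename pattern is what actually protects against a reboot mid-write; O_DSYNC alone only guarantees the bytes reach the disk before the call returns.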
## Concrete use cases
| Use case | Duration | Steps | Result |
|---|---|---|---|
| DokladBot batch | 8h 12min | 109 articles | 107 published, 2 manual review |
| Krtek scraper migration | 3h 40min | 28 files refactored | 0 type errors |
| Monorepo type migration | 5h 20min | 51 files touched | 1 manual fix |
| prace content generation | 6h 5min | 18 case studies + 12 blog posts | shipped |
The DokladBot batch is the best example. I started it at 22:00, went to bed. By 06:30 it was done. Overnight, Ralph weathered 1 rate-limit pause (~12 min) and 2 self-check verifications where Claude said "done" and Ralph replied "no, you have 62 articles left."
## Open source
Repo: github.com/ondrejknedla/ralph-claude-code. MIT license, ~2,000 LoC, dependencies: node:^20, claude-code CLI. The README has a quickstart for the first 5 minutes plus concrete recipes (batch generation, monorepo refactor, scraper migration).
Pull requests welcome. Top wishlist: Slack/Discord notifications on end-of-task, multi-agent fan-out (Ralph spawning 5 parallel claude sessions), a web dashboard instead of bash logs.
## Lessons
- End-of-task detection is the hardest problem. The naive version (pattern matching) has a 35% false-positive rate on real-world tasks. The self-check round trip pulled false positives below 3%.
- Deterministic checkpoints beat LLM-judged completion. Ralph never "knows" whether the task is fully done - it only has heuristics. But `currentStep / totalSteps` in the manifest is deterministic, and it is the only thing that actually decides whether the loop continues or stops.
- You have to listen to rate-limit headers. The first version used a flat 60-second sleep after a 429. Sometimes that wasn't enough (token reset 4 minutes away), sometimes it wasted time. Reading `retry-after` and `anthropic-ratelimit-tokens-reset` cut wasted wait time to zero.
- Bash + Node is the right combo. The bash side is only ~30 lines (process spawn, signal handling, log file); all the real logic is in TypeScript. Two languages in one project sounds odd, but each does what it's good at - bash for shell-level lifecycle, TS for state and heuristics.
- Open source from day one. Ralph is public because other people's issues are the best test suite. Someone reported a bug in the rate-limit handler I'd never have hit on my own batches (Anthropic updated the header format and I missed it).
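The deterministic-checkpoint lesson fits in a few lines of code - a sketch (the function name is mine) of the continue/stop decision driven purely by the manifest counters:

```typescript
interface Progress {
  currentStep: number;
  totalSteps: number;
}

// heuristics may *suggest* the task is done; this counter is what decides
export function shouldContinue(p: Progress): boolean {
  return p.currentStep < p.totalSteps;
}
```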