# Ralph - Autonomous Development Loop for Claude Code
Open-source autonomous loop with smart end-of-task detection, rate limiting, and session management.
Open-source contribution
## Brief
Claude Code is excellent at ad-hoc tasks: open a file, fix a bug, write a component, close. But when you need an agent to run for hours on a single objective - refactor 50 files, mass-migrate a scraper, generate 109 articles - you hit four problems: the agent occasionally trips a rate limit, occasionally claims "done" when it isn't, occasionally gets stuck in a loop, and I'd have to babysit it the whole time.
Ralph is the loop that babysits it for me. Open source, repo at github.com/ondrejknedla/ralph-claude-code. Bash watchdog + Node session manager + on-disk manifest. Result: I run ralph start --task=docs.md before bed and find the work done in the morning. I've used it in production on an 8-hour DokladBot batch (109 SEO articles), on the Krtek scraper migration, and on a mass type migration across a monorepo.
## Architecture
```
ralph/
├── bin/ralph          ← bash entrypoint, watchdog
├── src/
│   ├── manager.ts     ← Node session manager, spawns claude-code
│   ├── detector.ts    ← end-of-task heuristics
│   ├── rate-limit.ts  ← exponential backoff, retry state
│   └── manifest.ts    ← on-disk state JSON
└── runs/
    └── <session-id>/
        ├── manifest.json   ← session state
        ├── transcript.log  ← live tail from claude-code
        └── checkpoints/    ← progress snapshots
```
The bash watchdog is thin - it spawns the Node manager, listens for SIGINT, and logs. All the logic lives in Node, where it makes sense to handle JSON state, AbortControllers, and exponential backoff timers.
## End-of-task detection
This is the hardest part, and where Ralph earns its keep over `while true; do claude; sleep 60; done`. Claude occasionally says "Done. Task completed." in the middle of a multi-step task because it just finished a subtask. If the watchdog believes it, the session ends prematurely and the rest of the work never happens.
Ralph uses a 3-step heuristic:
```typescript
// src/detector.ts
const COMPLETION_PATTERNS = [
  /\bdone\.?\s*$/im,
  /\btask completed\.?\s*$/im,
  /\ball (?:done|finished)\.?\s*$/im,
  /\bnothing (?:more|left) to do\b/im,
];

export async function isReallyDone(state: SessionState): Promise<boolean> {
  // 1. text pattern
  const lastChunk = state.transcript.slice(-2000);
  const matchesPattern = COMPLETION_PATTERNS.some((re) => re.test(lastChunk));
  if (!matchesPattern) return false;

  // 2. idle timeout - no new output for 90 seconds
  const idleMs = Date.now() - state.lastOutputAt;
  if (idleMs < 90_000) return false;

  // 3. recursive self-check - ask the agent again
  const verification = await sendToClaude({
    sessionId: state.sessionId,
    prompt: `Are you actually finished with the original task? Re-read your initial brief and answer YES or NO with one sentence why.`,
  });
  return /^yes\b/i.test(verification.trim());
}
```

Step 3 is the one that earns its cost. The LLM cheerfully says "done" mid-task, but if you explicitly force it to revisit the original brief and verify, it often replies "no, I still need to handle X". That extra round trip costs ~5 seconds and $0.01 - versus losing 6 hours of work because the loop bailed early.
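Because step 3 depends on a live session, one handy refactor (my sketch, not the repo's actual signature) is to inject the verifier, so the whole three-step heuristic can be unit-tested with canned answers instead of a real claude-code call:

```typescript
type Verifier = (prompt: string) => Promise<string>;

interface DetectorState {
  transcript: string;   // full captured output so far
  lastOutputAt: number; // epoch ms of the last transcript append
}

const COMPLETION = [/\bdone\.?\s*$/im, /\btask completed\.?\s*$/im];

export async function isReallyDoneWith(
  state: DetectorState,
  verify: Verifier,            // real sendToClaude in prod, a stub in tests
  idleThresholdMs = 90_000,
): Promise<boolean> {
  const tail = state.transcript.slice(-2000);
  if (!COMPLETION.some((re) => re.test(tail))) return false;            // 1. pattern
  if (Date.now() - state.lastOutputAt < idleThresholdMs) return false;  // 2. idle
  const answer = await verify('Re-read your brief. Finished? YES or NO.'); // 3. self-check
  return /^yes\b/i.test(answer.trim());
}
```

With the verifier stubbed, you can assert that a "NO, 62 articles left" answer keeps the loop alive even when the transcript ends in "Done."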
## Rate limit handling
The Anthropic API returns standardized rate-limit headers:
```
anthropic-ratelimit-requests-remaining: 0
anthropic-ratelimit-tokens-reset: 2026-01-25T22:43:11Z
retry-after: 47
```
Ralph reads them and, instead of retrying blindly, persists its state and waits exactly as long as needed:
```typescript
// src/rate-limit.ts
export async function handleRateLimit(
  err: AnthropicAPIError,
  manifest: Manifest,
): Promise<void> {
  const retryAfterHeader = err.headers['retry-after'];
  const resetHeader = err.headers['anthropic-ratelimit-tokens-reset'];

  // wait as long as the server asks for, whichever bound is later
  const retryMs = retryAfterHeader ? Number(retryAfterHeader) * 1000 : 0;
  const resetMs = resetHeader ? new Date(resetHeader).getTime() - Date.now() : 0;

  // pure exponential backoff only if both headers are missing
  const backoffMs = retryAfterHeader || resetHeader
    ? Math.max(retryMs, resetMs)
    : Math.min(2 ** manifest.retryCount * 1000, 5 * 60_000);

  manifest.retryCount += 1;
  manifest.nextRetryAt = Date.now() + backoffMs;
  manifest.lastError = err.message;
  await writeManifest(manifest);

  log.info(`rate-limited, sleeping ${(backoffMs / 1000).toFixed(0)}s (retry ${manifest.retryCount})`);
  await sleep(backoffMs);
}
```

The key detail: after 5 retries Ralph doesn't keep doubling - it stops and alerts. Better to wake me at 03:00 than to try four more times in a row.
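The five-retry cutoff can live next to the backoff math as two tiny pure functions; a minimal sketch (`MAX_RETRIES` and both function names are mine, not necessarily the repo's):

```typescript
const MAX_RETRIES = 5;

// after MAX_RETRIES consecutive 429s, stop looping and page the operator
export function nextAction(retryCount: number): 'retry' | 'alert' {
  return retryCount < MAX_RETRIES ? 'retry' : 'alert';
}

// fallback schedule when no rate-limit headers arrive: 1s, 2s, 4s, ... capped at 5 min
export function fallbackBackoffMs(retryCount: number): number {
  return Math.min(2 ** retryCount * 1000, 5 * 60_000);
}
```

Keeping both decisions pure makes the retry policy trivially testable without mocking timers or the API client.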
## Session continuity
The manifest is the single source of truth for session state. When Ralph crashes (host reboot, OOM kill, manual Ctrl+C), ralph resume <session-id> reads the manifest and continues:
```json
{
  "sessionId": "ralph-2026-01-25-dokladbot-batch",
  "task": "Generate 109 SEO articles from articles-batch.csv",
  "createdAt": "2026-01-25T22:00:00Z",
  "status": "in-progress",
  "currentStep": 47,
  "totalSteps": 109,
  "lastCheckpointAt": "2026-01-26T03:18:42Z",
  "retryCount": 2,
  "claudeSessionId": "abc123-resume-token",
  "outputDir": "/home/george/dokladbot/articles/batch-2026-01-25"
}
```

`claudeSessionId` is critical - Claude Code has a `--resume <token>` flag, so Ralph reattaches to the same session instead of starting a fresh one with empty context.
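Resume logic then mostly reduces to choosing the right argv. A sketch of how the manager might build it (field names match the manifest above; `--resume` is Claude Code's flag per the text, while the fresh-start `-p` flag here is my assumption):

```typescript
interface ResumeInfo {
  task: string;
  claudeSessionId: string | null; // resume token from a previous run, if any
}

export function buildClaudeArgs(m: ResumeInfo): string[] {
  if (m.claudeSessionId) {
    // reattach to the existing session: same context window, no cold start
    return ['--resume', m.claudeSessionId];
  }
  // no token yet: start a fresh session seeded with the task brief
  return ['-p', m.task];
}
```

Keeping this as a pure function means `ralph resume` and `ralph start` share one code path and only differ in whether the manifest carries a token.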
Checkpointing runs after each completed sub-task (article, file, migration step). The manifest is flushed to disk with O_DSYNC, so a reboot mid-write doesn't corrupt state.
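A sketch of what such a durable write can look like in Node (the names are mine; the repo's implementation may differ). O_DSYNC forces the data to disk before `write` returns, and a rename makes the swap atomic, so a crash leaves either the old or the new manifest on disk, never a torn one:

```typescript
import { openSync, writeSync, closeSync, renameSync, readFileSync, constants } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';

export function writeManifestSync(path: string, manifest: object): void {
  const tmp = `${path}.tmp`;
  // O_DSYNC may be undefined on some platforms; fall back to O_SYNC there
  const syncFlag = constants.O_DSYNC ?? constants.O_SYNC;
  const fd = openSync(tmp, constants.O_WRONLY | constants.O_CREAT | constants.O_TRUNC | syncFlag);
  try {
    writeSync(fd, JSON.stringify(manifest, null, 2));
  } finally {
    closeSync(fd);
  }
  renameSync(tmp, path); // atomic replace on POSIX filesystems
}

// round-trip demo against a throwaway path
const demoPath = join(tmpdir(), 'ralph-demo-manifest.json');
writeManifestSync(demoPath, { currentStep: 47, totalSteps: 109 });
export const roundTrip = JSON.parse(readFileSync(demoPath, 'utf8'));
```

The write-to-temp-then-rename pattern is what actually protects against a reboot mid-write; O_DSYNC alone only guarantees the bytes reach the disk before the call returns.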
## Concrete use cases
| Use case | Duration | Steps | Result |
|---|---|---|---|
| DokladBot batch | 8h 12min | 109 articles | 107 published, 2 manual review |
| Krtek scraper migration | 3h 40min | 28 files refactored | 0 type errors |
| Monorepo type migration | 5h 20min | 51 files touched | 1 manual fix |
| prace content generation | 6h 5min | 18 case studies + 12 blog posts | shipped |
The DokladBot batch is the best example. I started it at 22:00, went to bed. By 06:30 it was done. Overnight, Ralph weathered 1 rate-limit pause (~12 min) and 2 self-check verifications where Claude said "done" and Ralph replied "no, you have 62 articles left."
## Open source
Repo: github.com/ondrejknedla/ralph-claude-code. MIT license, ~2,000 LoC, dependencies: node:^20, claude-code CLI. The README has a quickstart for the first 5 minutes plus concrete recipes (batch generation, monorepo refactor, scraper migration).
Pull requests welcome. Top wishlist: Slack/Discord notifications on end-of-task, multi-agent fan-out (Ralph spawning 5 parallel claude sessions), a web dashboard instead of bash logs.
## Lessons
- End-of-task detection is the hardest problem. The naive version (pattern matching) has a 35% false-positive rate on real-world tasks. The self-check round trip pulled false positives below 3%.
- Deterministic checkpoints beat LLM-judged completion. Ralph never "knows" whether the task is fully done - it only has heuristics. But `currentStep / totalSteps` in the manifest is deterministic, and it is the only thing that actually decides whether the loop continues or stops.
- You have to listen to rate-limit headers. The first version used a flat 60-second sleep after a 429. Sometimes that wasn't enough (token reset 4 minutes away), sometimes it wasted time. Reading `retry-after` and `anthropic-ratelimit-tokens-reset` cut wasted wait time to zero.
- Bash + Node is the right combo. The bash side is only ~30 lines (process spawn, signal handling, log file); all the real logic is in TypeScript. Two languages in one project sounds odd, but each does what it's good at - bash for shell-level lifecycle, TS for state and heuristics.
- Open source from day one. Ralph is public because other people's issues are the best test suite. Someone reported a bug in the rate-limit handler I'd never have hit on my own batches (Anthropic updated the header format and I missed it).
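The deterministic-checkpoint lesson fits in a few lines of code - a sketch (the function name is mine) of the continue/stop decision driven purely by the manifest counters:

```typescript
interface Progress {
  currentStep: number;
  totalSteps: number;
}

// heuristics may *suggest* the task is done; this counter is what decides
export function shouldContinue(p: Progress): boolean {
  return p.currentStep < p.totalSteps;
}
```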