Long-Running Research Agents
Configure an agent, start execution, and wait for results. Everruns handles infrastructure reliability.
Note: Everruns is under active development and not yet publicly available.
The Problem
AI agents can now work autonomously for hours or days. Research from METR shows that the length of tasks agents can complete has been doubling roughly every 7 months. If that trend holds, week-long autonomous tasks are expected within 2-4 years.
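As a back-of-the-envelope illustration of what a 7-month doubling time implies (a sketch, not METR's own model; H_0 is a placeholder for today's task horizon and H_target for the horizon you care about, both in the same units):

```latex
H(t) = H_0 \cdot 2^{t/7}, \quad t \text{ in months}
\qquad\Longrightarrow\qquad
t = 7 \, \log_2\!\left(\frac{H_{\text{target}}}{H_0}\right)
```

For example, a 32-fold increase in task length takes 5 doublings, or 35 months, which is just under 3 years.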
But long-running execution introduces infrastructure challenges:
- Host machines crash or restart
- Network connections drop
- External APIs hit rate limits or have outages
- Memory limits get exceeded
- LLM providers experience downtime
When a 6-hour task fails at hour 5, you lose all of that work. Current agent frameworks assume reliable infrastructure that doesn’t exist in practice.
How Everruns Helps
Everruns uses a managed event loop composed from atoms for durable execution. Every step is persisted, so agents resume from where they left off after any failure.
- Configure — Define your agent: model, system prompt, tools, constraints.
- Start — Fire off the execution. Everruns handles the rest.
- Monitor — Real-time streaming shows progress. Check back anytime.
- Survive failures — After a crash, restart, timeout, or API outage, execution continues from the last checkpoint.
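The persist-and-resume idea above can be sketched in a few lines of Python. This is not the Everruns API or its implementation. `StepJournal`, `run_step`, and `run_research` are hypothetical names, and a local JSON file stands in for whatever durable store a real system would use. The sketch only shows the general technique: record each step's result before moving on, and skip recorded steps after a restart.

```python
import json
import os
import tempfile
from typing import Callable


class StepJournal:
    """Hypothetical file-backed record of completed step results."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.results: dict[str, object] = {}
        # On startup, reload whatever an earlier (possibly crashed) run saved.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.results = json.load(f)

    def run_step(self, name: str, fn: Callable[[], object]) -> object:
        # A step that already finished is not executed again after a restart.
        if name in self.results:
            return self.results[name]
        result = fn()  # must be JSON-serializable in this sketch
        self.results[name] = result
        self._save()
        return result

    def _save(self) -> None:
        # Write to a temp file, then rename it into place, so a crash
        # mid-write cannot leave a half-written journal behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.results, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def run_research(journal: StepJournal, calls: list[str]) -> str:
    """Hypothetical three-step task; `calls` records which steps actually ran."""

    def search() -> list[str]:
        calls.append("search")
        return ["paper-a", "paper-b"]

    papers = journal.run_step("search", search)

    def read() -> list[str]:
        calls.append("read")
        return [f"notes on {p}" for p in papers]

    notes = journal.run_step("read", read)

    def synthesize() -> str:
        calls.append("synthesize")
        return "; ".join(notes)

    return journal.run_step("synthesize", synthesize)
```

If the process dies after the search step, creating a new `StepJournal` on the same path and calling `run_research` again re-executes only the read and synthesize steps. A production system would also need to handle steps with side effects that are not safe to repeat, which this sketch ignores.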
Technical Context
Current solutions focus on making agents smarter within sessions. Anthropic’s multi-session harness addresses context window limitations with initializer and coding agents that maintain progress across sessions. OpenAI’s Deep Research handles multi-step web research. These solve the intelligence and memory problems.
Infrastructure reliability is different. When the underlying compute fails, session-level solutions don’t help. Durable execution guarantees require workflow orchestration at the infrastructure level.
Use Case Examples
- Literature review — Agent searches, reads, and synthesizes papers over several hours
- Competitive analysis — Agent monitors and compiles data from multiple sources over days
- Code migration — Agent refactors a large codebase incrementally, surviving machine restarts
- Data processing — Agent processes large datasets with external API calls that may rate-limit or fail
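For the data processing case, rate limits and short outages are commonly handled by retrying with exponential backoff. The sketch below is a generic illustration, not Everruns behavior. `TransientError` and `call_with_backoff` are hypothetical names, and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limit, brief outage)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TransientError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Upper bound doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            # The actual wait is drawn at random below that bound so that many
            # clients do not retry in lockstep.
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the wait can be replaced in tests. Retries like this cover brief failures inside a running process; they do not help when the process itself dies, which is the case durable execution targets.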
Further Reading
- Measuring AI Ability to Complete Long Tasks — METR research on task duration trends
- Effective Harnesses for Long-Running Agents — Anthropic’s multi-session approach
- Introducing Deep Research — OpenAI’s long-running research agent