# Long-Running Research Agents

Configure an agent, start execution, wait for results. Infrastructure reliability is handled.

[← Back to use cases](/use-cases/)

## The Problem

AI agents can now work autonomously for hours or days. Research from [METR](https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/) shows the length of tasks AI agents can complete doubling roughly every seven months; on that trend, week-long autonomous tasks are expected within 2-4 years.

But long-running execution introduces infrastructure challenges:

- Host machines crash or restart
- Network connections drop
- External APIs hit rate limits or have outages
- Memory limits get exceeded
- LLM providers experience downtime

When a task runs for 6 hours and fails at hour 5, you lose all that work. Current agent frameworks assume reliable infrastructure that doesn't exist in practice.

## How Everruns Helps

Everruns uses a managed event loop composed from atoms for durable execution. Every step is persisted, so agents resume from where they left off after any failure.

1. **Configure** — Define your agent: model, system prompt, tools, constraints.
2. **Start** — Fire off the execution. Everruns handles the rest.
3. **Monitor** — Real-time streaming shows progress. Check back anytime.
4. **Survive failures** — Crashes, restarts, timeouts, API outages: execution continues from the last checkpoint.
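The resume-from-checkpoint behavior in step 4 can be sketched in a few lines of plain Python. This is an illustrative model of the idea, not the Everruns API: each completed step's result is persisted before the next step starts, so a restarted run skips work that already finished.

```python
import json
import os
import tempfile

def run_agent(steps, checkpoint_path):
    """Run named steps in order, persisting each result so a restart resumes."""
    # Load results of steps that completed before a crash or restart.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    else:
        done = {}
    for name, step in steps:
        if name in done:           # already completed in a prior run
            continue
        done[name] = step()        # execute the step
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)     # checkpoint before moving on
    return done

# Usage: simulate a crash after the first step, then resume.
path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
calls = []

def search():
    calls.append("search")
    return "10 papers"

def synthesize():
    calls.append("synthesize")
    return "summary"

try:
    run_agent([("search", search), ("boom", lambda: 1 / 0)], path)
except ZeroDivisionError:
    pass                           # the host "crashed" mid-run

result = run_agent([("search", search), ("synthesize", synthesize)], path)
```

On the second run, `search` is never re-executed; only the step that had not yet been checkpointed runs.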

## Technical Context

Current solutions focus on making agents smarter within sessions. Anthropic's [multi-session harness](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents) addresses context window limitations with initializer and coding agents that maintain progress across sessions. OpenAI's Deep Research handles multi-step web research. These solve the intelligence and memory problems.

Infrastructure reliability is different. When the underlying compute fails, session-level solutions don't help. Durable execution guarantees require workflow orchestration at the infrastructure level.
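One common mechanism behind infrastructure-level durability guarantees is event-sourced replay, as used by workflow engines such as Temporal. The sketch below illustrates that general idea and is not a description of Everruns internals: every side effect is recorded in a persisted event log, and after a restart the workflow re-runs from the top, consuming recorded events instead of re-executing their side effects.

```python
class DurableContext:
    """Records side-effect results in an event log and replays them on restart."""

    def __init__(self, history):
        self.history = history     # persisted event log
        self.cursor = 0

    def execute(self, fn, *args):
        if self.cursor < len(self.history):
            result = self.history[self.cursor]  # replay: no side effect
        else:
            result = fn(*args)                  # first execution
            self.history.append(result)         # record the event
        self.cursor += 1
        return result

effects = []

def fetch(url):
    effects.append(url)            # stand-in for an expensive side effect
    return f"body of {url}"

def workflow(ctx):
    a = ctx.execute(fetch, "https://example.com/a")
    b = ctx.execute(fetch, "https://example.com/b")
    return [a, b]

history = []
workflow(DurableContext(history))  # first run records two events
ctx2 = DurableContext(history)     # "restart": same log, fresh cursor
out = workflow(ctx2)               # replays from the log, fetches nothing
```

Because results come from the log during replay, the workflow reaches the same state after a restart without repeating any completed network calls.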

## Use Case Examples

- **Literature review** — Agent searches, reads, and synthesizes papers over several hours
- **Competitive analysis** — Agent monitors and compiles data from multiple sources over days
- **Code migration** — Agent refactors a large codebase incrementally, surviving machine restarts
- **Data processing** — Agent processes large datasets with external API calls that may rate-limit or fail
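For the rate-limiting case in the last example, the standard building block is retry with exponential backoff and jitter. The sketch below is illustrative (a durable runtime would additionally persist the retry state so backoff survives a host restart); the `RateLimitError` and `flaky_api` names are made up for the demo.

```python
import random
import time

class RateLimitError(Exception):
    """Raised when an external API returns a 429-style response."""

def call_with_backoff(fn, max_attempts=5, base=0.5, sleep=time.sleep):
    """Retry fn on rate limits, waiting an exponentially growing, jittered delay."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # exponential backoff with full jitter
            sleep(random.uniform(0, base * 2 ** attempt))

# Usage: an API that rejects the first two calls, then succeeds.
attempts = {"n": 0}

def flaky_api():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "data"

waits = []  # capture the sleeps instead of actually waiting
result = call_with_backoff(flaky_api, sleep=waits.append)
```

The injected `sleep` parameter keeps the example fast and testable; in production you would use the default `time.sleep`.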

## Further Reading

- [Measuring AI Ability to Complete Long Tasks](https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/) — METR research on task duration trends
- [Effective Harnesses for Long-Running Agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents) — Anthropic's multi-session approach
- [Introducing Deep Research](https://openai.com/index/introducing-deep-research/) — OpenAI's long-running research agent