6 Background AI Agents for Async Development

DEV Community

Jules scored 51.8% on SWE-bench Verified while Claude Code reached 80.8% on the same benchmark, per Morph's tracking, so async convenience still trades against raw resolve rate (verified 2026-06-07).

Every agent here runs with your repository credentials inside a cloud VM, which makes the agent a non-human identity your security team has to govern.

About this article:

Researched and written: June 2026. Last fact-checked: 2026-06-07.
Author hands-on experience with these tools: partial. We reviewed each tool's documentation and ran several of them against sample GitHub issues; benchmark numbers are quoted from vendor pages and public trackers, not re-run by us.
AI assistance: used for drafting, reviewed and edited by the named author.
Conflicts of interest: none. SSOJet (the publisher's sponsor) is not one of the agents ranked here.
All pricing and benchmark claims verified on 2026-06-07 against vendor documentation and public leaderboards.

How Did We Evaluate the 8 Cloud Coding Agents?

We ranked these agents on six criteria, weighted toward the things that decide whether you can actually trust an agent to ship a PR unattended:

True async cloud execution (mandatory). The tool must run in a hosted environment and return a pull request, not just suggest code in your editor. This is the entry ticket.
PR reviewability. Whether the output is a clean PR with commit messages, a plan, and test runs you can read in 60 seconds, versus an opaque dump.
Pricing predictability. Flat or bundled pricing scores higher than per-ACU or per-token meters that make a long run a budget surprise.
Verifiable performance. A published SWE-bench or Terminal-Bench number, or an honest "no separable score," beats a marketing claim.
Model flexibility. Whether you are locked to one provider or can route tasks to different frontier models.
Governance surface. How the agent authenticates to your repo, secrets, and CI, since an autonomous PR-opener is a non-human identity. See AI agent security risks for the enterprise.

The evaluation was performed by GrackerAI Editorial in June 2026 and fact-checked on 2026-06-07 (see the disclosure block above). Two caveats we will not hide: cloud-agent pricing is changing fast (GitHub, OpenAI, and Cursor all repriced in the first half of 2026), and several vendors quote benchmark numbers on different model versions, so a clean apples-to-apples score is rarely possible.

We considered but did not include:

Sweep: pivoted from autonomous GitHub PR generation to a JetBrains IDE assistant, so it no longer fits the async cloud-PR category.
Claude Code: primarily a local terminal agent; its cloud and background runs are a feature rather than a standalone hosted async-PR product. We cover it in our Claude Code alternatives piece.
Lovable: an app builder that generates whole projects from a prompt rather than an agent that opens PRs against an existing repo.

Which Are the 8 Best Cloud Coding Agents, Ranked?

1. Devin

The agent that defined the category, and still the closest thing to delegating a ticket to a junior engineer who works in the cloud.

Best for: teams that want to assign a whole task and review a finished PR, not babysit a session.

Starting price: Core at $20/month plus pay-as-you-go at $2.25 per ACU; Team at $500/month with 250 ACUs at $2.00 each (verified 2026-06-07).

Key differentiator: a true fire-and-forget agent with its own cloud workspace, browser, and shell.

Devin from Cognition runs in a hosted environment, plans a task, writes and tests code, and opens a PR, per the Devin pricing page. The honest performance picture is mixed: Devin 2.0 posts around 45.8% on SWE-bench Verified in unassisted evaluation, well below the top terminal agents, and independent testing puts real-world success in the 15% to 30% range. In our own trial against a sample bug-fix issue, Devin's session log read like a real engineer narrating work:

> cloning repo into workspace... done
> plan: reproduce failing test, patch parser, add regression test
> running: pytest tests/test_parser.py -> 1 failed
> editing src/parser.py (+9 -2)
> running: pytest -> 12 passed; opening PR #441

The honest limitation: ACU billing makes cost unpredictable. One ACU is roughly 15 minutes of active work, so a task that loops on a hard bug can quietly burn several dollars before it gives up or succeeds.
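To make the ACU arithmetic concrete, here is a minimal sketch that turns active minutes into an approximate dollar figure. It uses only the numbers quoted above (about 15 minutes per ACU, $2.25 per ACU on Core). The function name `estimate_acu_cost` and the round-up-to-whole-ACUs rule are our assumptions, not Cognition's published billing logic, so read the output as a rough upper bound.

```python
import math

# Figures quoted in this article; a snapshot, not a live rate card.
ACU_MINUTES = 15        # one ACU is roughly 15 minutes of active work
CORE_RATE_USD = 2.25    # pay-as-you-go price per ACU on the Core plan


def estimate_acu_cost(active_minutes: float, rate_usd: float = CORE_RATE_USD) -> float:
    """Return an approximate dollar cost for a run of the given active duration.

    Assumption: usage is rounded up to whole ACUs. Real billing may be
    fractional, so treat the result as an upper-bound estimate.
    """
    if active_minutes < 0:
        raise ValueError("active_minutes must be non-negative")
    acus = math.ceil(active_minutes / ACU_MINUTES)
    return round(acus * rate_usd, 2)


if __name__ == "__main__":
    for minutes in (15, 45, 120):
        print(f"{minutes:>4} min -> ${estimate_acu_cost(minutes):.2f}")
```

Under these assumptions, a run that stays active for two hours costs about $18 on Core, which is close to a full month of the base subscription.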

2. Google Jules

The most GitHub-native async agent on the list, and the one whose output is always a clean, reviewable pull request.

When to choose Google Jules:

Your workflow is GitHub-first and you want every result as a PR against a branch in your own repo, with a plan, commit messages, and a plain-language summary.
You want to start free: the Standard tier gives 15 daily tasks and 3 concurrent tasks at no cost (verified 2026-06-07).
You run batches of small, parallelizable changes and want to come back to a queue of PRs rather than sit in a session.

When to avoid Google Jules:

You need the highest raw resolve rate: Jules scored 51.8% on SWE-bench Verified versus Claude Code's 80.8%, per Morph's Jules breakdown.
You want IDE integration; Jules is GitHub-only and does not plug into your editor.

Best for: GitHub-native teams that want async PRs with a generous free tier.

Starting price: Free (Standard); Google AI Pro at $19.99/month for about 75 daily tasks; Ultra at $124.99/month for 300 daily tasks (verified 2026-06-07).

Key differentiator: every task ends as a pull request you can review, with a plan you can edit before execution.

Jules clones your repo into a secure cloud VM, writes a plan, executes multi-file changes, and opens a PR, per the official Jules site. It now runs Gemini 3 Pro on paid tiers. The honest limitation is scope: it is built for batch async work, so it is excellent at queued tickets and weak as a real-time pair programmer.
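Since the Jules tiers are defined by daily task allowances, picking a tier is a simple lookup. The sketch below encodes the three tiers as quoted in this article and returns the cheapest one that covers a given daily workload. The names `JULES_TIERS` and `cheapest_tier` are hypothetical, and the Pro allowance is stated as "about 75", so the boundary there is approximate.

```python
from typing import Optional

# (tier name, monthly price in USD, daily task allowance) as quoted in this article.
# The Pro allowance is described as "about 75", so treat that boundary as approximate.
JULES_TIERS = [
    ("Standard", 0.00, 15),
    ("Google AI Pro", 19.99, 75),
    ("Ultra", 124.99, 300),
]


def cheapest_tier(daily_tasks: int) -> Optional[str]:
    """Return the lowest-priced tier whose daily allowance covers the workload."""
    for name, _price, allowance in sorted(JULES_TIERS, key=lambda tier: tier[1]):
        if daily_tasks <= allowance:
            return name
    return None  # workload exceeds every tier listed here


if __name__ == "__main__":
    for tasks in (10, 40, 200, 500):
        print(tasks, "->", cheapest_tier(tasks))
```

A `None` result means the workload does not fit any tier listed here, which is a signal to split the batch or look at a different agent.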

3. GitHub Copilot Coding Agent

The path of least resistance if your team already lives in GitHub, since the agent works inside the platform you already pay for.

Best for: organizations standardized on GitHub that want an agent inside Issues, PRs, and Actions.

Starting price: Copilot Pro at $10/month; Business at $19/user/month; Pro+ at $39/month (verified 2026-06-07).

Key differentiator: it lives where your code review already happens, assigning issues and opening PRs natively.

The Copilot coding agent picks up an assigned issue, works in a GitHub Actions environment, and opens a draft PR for review, per the GitHub Copilot plans page. The big 2026 change is billing: per the GitHub blog, Copilot moves to usage-based billing with GitHub AI Credits on June 1, 2026, where 1 AI Credit equals $0.01, code review carries a model multiplier of 13, and code review workflows consume GitHub Actions minutes. The honest limitation: the new metered model makes a heavy month harder to forecast than the old flat seat, and autonomous runs lean on your Actions budget. For teams weighing other options, see our GitHub Copilot alternatives roundup. SSOJet covered the agent-mode launch in its Copilot agent-mode report.

4. Cursor Cloud Agent

The cloud counterpart to the Cursor editor, best when your team already runs Cursor and wants long tasks off the local machine.

Best for: existing Cursor users who want to offload long agentic runs to the cloud.

Starting price: Pro at $20/month (cloud agents included on Pro and above), with cloud runs billed separately (verified 2026-06-07).

Key differentiator: the same models and context conventions as the Cursor IDE, running async in a separate cloud environment.

Cursor's background agent, now officially the Cloud Agent, runs agentic tasks asynchronously in a separate cloud environment rather than on your machine, per the Cursor pricing page. In our test, a 50-step refactor task ran to completion and produced a branch, and the cost matched Cursor's published estimate: roughly $0.30 to $0.60 at Claude Sonnet rates for a 50-step task, with complex tasks reaching $4 to $5. Three things stood out:

Cloud Agents bill separately from subscription credits and require MAX mode, which adds a 20% surcharge on every run.
Pro includes $20 of model usage, Pro+ $70, and Ultra $400, so heavy cloud use pushes you up the tier ladder fast.
Because it shares your Cursor account, context handoff between local and cloud work is the smoothest of the IDE-linked agents.

The honest limitation: the surcharge plus separate metering means a few large cloud runs can cost more than a flat Devin Team seat, so price out your real task mix first.
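One way to price out a task mix is to apply the 20% MAX-mode surcharge to each run and sum the result. The sketch below does that. It assumes the surcharge is applied multiplicatively to the underlying model cost of each run, and that the per-task figures quoted above are pre-surcharge. Neither assumption is confirmed by Cursor's documentation as cited here. `cloud_run_cost` and `monthly_cloud_spend` are hypothetical helper names.

```python
MAX_MODE_SURCHARGE = 0.20  # the 20% surcharge quoted in this article


def cloud_run_cost(base_model_cost_usd: float,
                   surcharge: float = MAX_MODE_SURCHARGE) -> float:
    """Cost of one cloud run, assuming the surcharge scales the model cost."""
    if base_model_cost_usd < 0:
        raise ValueError("base_model_cost_usd must be non-negative")
    return round(base_model_cost_usd * (1 + surcharge), 2)


def monthly_cloud_spend(run_costs_usd, surcharge: float = MAX_MODE_SURCHARGE) -> float:
    """Total for a list of per-run base costs, each with the surcharge applied."""
    return round(sum(cloud_run_cost(cost, surcharge) for cost in run_costs_usd), 2)


if __name__ == "__main__":
    # Hypothetical month: 40 small runs at $0.50 and 10 complex runs at $5.00.
    mix = [0.50] * 40 + [5.00] * 10
    print(f"estimated cloud spend: ${monthly_cloud_spend(mix):.2f}")
```

Swapping in your own list of per-run costs gives a figure you can compare directly against a flat-priced seat.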

5. OpenAI Codex Cloud

The strongest option for running many tasks in parallel, since Codex spins up isolated sandboxes and works on several projects at once.

Best for: developers who want to fan out background tasks across multiple repos in parallel.

Starting price: included with ChatGPT Plus at $20/month; Pro at $200/month; usage is token-metered (verified 2026-06-07).

Key differentiator: native parallel background execution in OpenAI-managed cloud sandboxes.

Codex cloud is a cloud-based autonomous agent powered by GPT-5.5-family models that runs multi-step tasks in isolated sandboxes and can work in parallel, per the OpenAI Codex cloud docs. It creates PRs from its work and can run reviews when you tag it on a pull request. The pricing shift matters: on April 2, 2026, Codex moved to token-based pricing aligned with API usage, and OpenAI's rate card estimates $100 to $200 per developer per month for active use. The honest limitation is the same one that hits every token-metered agent: a chatty model on a large codebase burns tokens quickly, so parallelism is a double-edged sword for the bill. Our AI coding agents compared for 2026 guide puts Codex next to the local-first tools.

6. Factory Droid

The enterprise-leaning autonomous agent built around the idea of "droids" that own end-to-end engineering work.

Best for: organizations that want agent-native software development with dedicated compute.

Starting price: free BYOK tier; Pro at $20/month with dedicated compute and frontier models; Enterprise custom (verified 2026-06-07).

Key differentiator: an agent-native platform design rather than an assistant bolted onto an editor.

Factory's droids plan and execute engineering tasks and open PRs, with paid plans starting at $20/month plus a prepaid credit system for extra usage (a $10 minimum), per the Factory pricing page. On performance, Factory's Droid running GPT-5.3-Codex posts 77.3% on Terminal-Bench 2.0, a strong number for a hosted agent, per Morph's SWE-bench Pro tracking. Factory raised a $150M Series C in 2026, a sign of enterprise traction. The honest limitation: the credit-on-top-of-subscription model means real autonomous workloads will exceed the $20 base quickly, so treat the Pro price as a floor, not a ceiling.

7. Codegen

The agent that meets your team where the tickets already are: Slack, Linear, Jira, and GitHub.

Best for: teams that want to trigger an agent from a Linear ticket or a Slack message and get a PR back.

Starting price: usage-based with custom team pricing; the agent integrates via MCP across your tools (verified 2026-06-07).

Key differentiator: deep workflow integration, so the agent reports progress and asks for clarification in the channels you already use.

Codegen describes itself as "the SWE that never sleeps," and its MCP support extends the agent to GitHub, Slack, Linear, Jira, and custom tools so it can update statuses, link PRs to issues, and ask for clarification in Slack, per the Codegen overview docs. The differentiator is workflow placement: instead of a separate console, you tag the agent in the tracker your team lives in. The honest limitation is that this convenience makes governance murkier, because the agent now acts across multiple SaaS tools on your behalf, which is exactly the non-human-identity sprawl that JWTs for AI agents and scoped tokens exist to contain.

8. Tembo

The orchestrator, not the agent: Tembo runs other coding agents in the background and coordinates pull requests across repos.

How does Tembo differ from the others on this list? It does not ship its own model. You tag @tembo in Slack, Linear, or GitHub, and it executes the task using your choice of Claude Code, Codex, Cursor, OpenCode, or Sourcegraph Amp, per Tembo's coding-agent roundup.

Best for: platform teams that need one task to open coordinated PRs across multiple repositories.

Starting price: usage-based; you bring the underlying agent and pay its costs (verified 2026-06-07).

Key differentiator: multi-repo coordination, where a single task opens coordinated PRs across several repos at once.

Tembo's edge is the multi-repo case: when a shared contract changes or a dependency has to roll across service boundaries, one Tembo task can open coordinated PRs in every affected repo. The honest limitation is that Tembo is only as good as the agent you point it at, and routing across multiple agents and repos multiplies the credentials and secrets in play. For parallel-execution patterns more broadly, see our parallel sub-agent coding tools guide.

How Do You Choose the Right Cloud Coding Agent?

Start from how you pay and how your team files work, then let the benchmark break ties.

If you want true fire-and-forget delegation and can stomach metered cost, choose Devin, and budget for ACU burn on hard tasks. If your workflow is GitHub-first and you want free, reviewable async PRs, start with Google Jules and its 15 free daily tasks, upgrading to Pro at $19.99/month only when you hit the limit. If your team already pays for GitHub or Cursor or ChatGPT, the cheapest move is the agent you already own: GitHub Copilot coding agent at $10/month, Cursor Cloud Agent on Pro, or Codex cloud on Plus.

If you need parallel background tasks across many repos, Codex cloud and Tembo are the two answers, with Codex best for raw parallelism and Tembo best for coordinated multi-repo PRs. For enterprise autonomous workloads with dedicated compute, Factory Droid is the strongest fit, and for teams that run on Linear and Slack, Codegen meets you there. For a model-first rather than tool-first view, our best AI models for coding comparison ranks the engines underneath all of these.
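The guidance above amounts to a lookup from team needs to candidate agents, which can be sketched as a small table. The need labels (for example `fire_and_forget`) are hypothetical names we made up for illustration. The agent mapped to each label follows the recommendations in this section and is a starting shortlist, not a verdict.

```python
# Hypothetical need labels mapped to the agent this section suggests for each.
RECOMMENDATIONS = {
    "fire_and_forget": "Devin",
    "github_first_free": "Google Jules",
    "already_pay_github": "GitHub Copilot coding agent",
    "already_pay_cursor": "Cursor Cloud Agent",
    "already_pay_chatgpt": "OpenAI Codex cloud",
    "raw_parallelism": "OpenAI Codex cloud",
    "coordinated_multi_repo": "Tembo",
    "enterprise_dedicated_compute": "Factory Droid",
    "linear_slack_workflow": "Codegen",
}


def shortlist(needs):
    """Return suggested agents for the given needs, in order, without duplicates."""
    picked = []
    for need in needs:
        if need not in RECOMMENDATIONS:
            raise KeyError(f"unknown need label: {need}")
        agent = RECOMMENDATIONS[need]
        if agent not in picked:
            picked.append(agent)
    return picked


if __name__ == "__main__":
    print(shortlist(["already_pay_chatgpt", "raw_parallelism", "coordinated_multi_repo"]))
```

Ordering the needs by priority keeps the first entry of the shortlist aligned with whatever matters most to your team.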

One security point worth stating plainly: every agent here opens PRs using credentials and runs inside a VM with access to your code, secrets, and CI. That makes each agent a non-human identity. The GitHub Actions compromise that exposed CI/CD secrets in over 23,000 repositories is the cautionary tale here: an automated identity with broad repo access is a high-value target. As more of your commits come from agents, the question shifts from "which agent codes best" to "how is single sign-on evolving for autonomous AI agents," which SSOJet covers in its piece on securing non-human identities in the age of agentic automation.

Frequently Asked Questions

What is the best cloud coding agent in 2026?

It depends on your priority. Devin is the most autonomous fire-and-forget option but bills per ACU at $2.25 each, while Google Jules is the most GitHub-native and starts free with 15 daily tasks (verified 2026-06-07). If you already pay for GitHub, Cursor, or ChatGPT, the cheapest strong option is the cloud agent bundled with the plan you already own.

Which cloud coding agent opens pull requests automatically?

Devin, Google Jules, GitHub Copilot coding agent, OpenAI Codex cloud, Factory Droid, Codegen, and Tembo all return work as pull requests. Google Jules and the GitHub Copilot coding agent are the most GitHub-native: every task ends as a reviewable PR against a branch in your own repo, with a plan and commit messages.

How much do cloud coding agents cost?

Pricing ranges from free to enterprise custom. Google Jules starts free (15 daily tasks) and Copilot Pro is $10/month, while Devin meters at $2.25 per ACU on top of a $20 base and OpenAI's rate card estimates $100 to $200 per developer per month for active Codex use (verified 2026-06-07). Usage-based agents like Devin and Codex are harder to forecast than flat or bundled plans.

Are async cloud agents as good as a human engineer?

Not yet on hard tasks. Devin 2.0 resolves about 45.8% of SWE-bench Verified issues unassisted and Google Jules scored 51.8%, while the strongest agents reach the 80s, per Morph's tracking (verified 2026-06-07). They are most reliable on well-scoped, testable tickets and least reliable on ambiguous, cross-cutting changes, so review every PR before merge.

Do cloud coding agents need access to my private repositories?

Yes. Every agent on this list clones your repository into a cloud VM and acts with credentials, which makes it a non-human identity with access to your code, secrets, and CI. Scope its permissions tightly, use short-lived scoped tokens, and audit its actions the same way you would any automated service account.
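As a rough illustration of "short-lived scoped tokens", the sketch below models a credential that carries an explicit scope set and an expiry, and rejects any action outside either. It is an in-memory stand-in, not a real token format or any vendor's API. `AgentToken`, `issue_token`, `is_allowed`, and the scope strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AgentToken:
    """Hypothetical in-memory stand-in for a scoped, short-lived credential."""
    agent: str
    scopes: frozenset
    expires_at: datetime


def issue_token(agent: str, scopes, now: datetime, ttl_minutes: int = 30) -> AgentToken:
    """Create a token limited to the given scopes that expires after ttl_minutes."""
    return AgentToken(agent, frozenset(scopes), now + timedelta(minutes=ttl_minutes))


def is_allowed(token: AgentToken, scope: str, now: datetime) -> bool:
    """Allow an action only if the token is unexpired and carries the scope."""
    return now < token.expires_at and scope in token.scopes


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    token = issue_token("example-agent", {"contents:write", "pull_requests:write"}, now)
    print(is_allowed(token, "pull_requests:write", now))                      # in scope
    print(is_allowed(token, "secrets:read", now))                             # out of scope
    print(is_allowed(token, "contents:write", now + timedelta(minutes=31)))   # expired
```

The point of the design is that both checks fail closed: an agent that needs a new permission or more time has to request a fresh token, which leaves an audit trail.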

What is the difference between a cloud coding agent and an IDE assistant?

An IDE assistant suggests code inline as you type and runs on your machine, while a cloud coding agent runs async in a hosted environment, executes a whole task unattended, and returns a pull request. The cloud agent is for delegating work you come back to later; the IDE assistant is for real-time pairing.

Final Thoughts

The honest summary is that cloud coding agents have crossed from demo to daily driver for well-scoped tickets, but resolve rates in the 45% to 52% range for the async leaders mean a human still reviews every PR before merge. Pick the agent that fits your billing model and where your team files work, then watch the pricing pages, because GitHub, OpenAI, and Cursor all repriced in the first half of 2026.

If you're shipping these agents into a product and need to gate them and their non-human identities behind enterprise-grade authentication, start a 30-day free trial of SSOJet and go live in days.