AI Tools

Browser-use + Vercel agent-browser: Browser automation compared.

Two approaches to giving AI agents web-browsing capabilities. We compare features, architectures, and ideal use cases.

Decision

Pilot

Choose one repeated workflow with a visible owner and enough weekly volume to prove the saving.

Risk to watch

Faster mistakes

Keep a review queue and scoped credentials until the workflow has survived real production runs.

Proof to collect

Time baseline

Measure the manual run time, exception rate, approval time, and weekly hours returned.

Briefing

Give an AI agent a job that lives on the web (book a flight, pull a report out of a portal, fill in a supplier form) and it hits the same wall a new employee does on day one: it needs a browser, and it needs to know how to drive one. That single requirement has turned into a small arms race among open-source projects, and two of them have pulled ahead.

The first is Browser-use, which drives a full copy of Chromium and has built a large following on GitHub. The second is Vercel's agent-browser, backed by the company behind Next.js and the Vercel hosting platform. Both let an AI agent see and operate a web page. They go about it in very different ways.

For an Australian business team weighing one against the other, the practical question is simple: where does the browser run, and who has to babysit it? The answer shapes your costs, your scaling, and how much infrastructure your team ends up owning. Here's how the two compare, and one place where the marketing around them has run ahead of reality.

A note before we dig in: the original version of this comparison described agent-browser as a lightweight serverless tool that runs on Vercel's edge network. That framing turns out to be wrong. Per the project's own README, agent-browser is a native Rust command-line tool that launches a full local Chrome through a client-daemon setup, and it explicitly does not run natively on Vercel Edge Functions. We've corrected the relevant sections below rather than repeat the error.

Browser-use: The Full-Browser Approach

[Browser-use](https://github.com/browser-use/browser-use) drives a real Chromium browser through Playwright. It's built for accuracy and flexibility, and it runs on your own machine or a dedicated server.

Architecture

  • Browser: Full Chromium via Playwright with JavaScript rendering
  • Perception: DOM parsing + screenshot analysis for visual understanding
  • Planning: LLM-based action planning (click, type, scroll, wait)
  • Execution: Direct browser control with action confirmation
  • Environment: Local process or Docker container
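The perception, planning, and execution layers listed above form a loop. The sketch below shows only the shape of that loop under stated assumptions: `FakeBrowser`, `scripted_planner`, and `run_agent` are hypothetical names invented for illustration, not part of Browser-use's API, and the planner is a fixed script standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Observation:
    """What the agent perceives: a DOM summary plus a screenshot reference."""
    dom_summary: str
    screenshot: bytes = b""


@dataclass
class Action:
    """One planned step: click, type, scroll, wait, or done."""
    kind: str
    target: str = ""
    text: str = ""


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser session; it only records actions."""
    log: list[Action] = field(default_factory=list)

    def observe(self) -> Observation:
        return Observation(dom_summary=f"{len(self.log)} actions applied so far")

    def execute(self, action: Action) -> bool:
        self.log.append(action)
        return True  # a real executor would confirm the action took effect


def run_agent(browser: FakeBrowser,
              plan: Callable[[str, Observation], Action],
              task: str,
              max_steps: int = 10) -> int:
    """Loop perceive -> plan -> execute until the planner returns 'done'."""
    for step in range(max_steps):
        observation = browser.observe()      # perception
        action = plan(task, observation)     # planning (an LLM call in practice)
        if action.kind == "done":
            return step
        if not browser.execute(action):      # execution with confirmation
            raise RuntimeError(f"action failed: {action}")
    return max_steps


def scripted_planner(task: str, observation: Observation) -> Action:
    """Toy planner standing in for the LLM: two fixed steps, then done."""
    steps_taken = int(observation.dom_summary.split()[0])
    script = [Action("click", "#search"), Action("type", "#search", task)]
    return script[steps_taken] if steps_taken < len(script) else Action("done")
```

Calling `run_agent(FakeBrowser(), scripted_planner, "quarterly report")` walks the two scripted steps and stops. The `max_steps` cap is a design choice worth keeping in any real loop, since an LLM planner can otherwise wander indefinitely.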

Strengths

Visual Understanding: Browser-use reads the DOM and looks at screenshots, so it understands where things sit on the page, not just how the markup is structured. On modern web apps, where position on screen often carries meaning the HTML doesn't spell out, that helps.

Full Browser Capability: It runs a real browser, so JavaScript-heavy sites, single-page apps, and fiddly interactions work without workarounds.

Session Persistence: Cookies, local storage, and login state carry over between actions. Sign in once and the agent stays signed in.

File Downloads: It can download files, process them, and fold the results back into a workflow.

Extensibility: A plugin system lets you add custom actions and perception modules.

Ideal For

  • Complex multi-step web workflows
  • Data extraction from JavaScript-heavy sites
  • Applications requiring authentication persistence
  • Local or dedicated server deployments
  • Research and analysis tasks

Vercel Agent-Browser: The CLI Approach

Vercel's [agent-browser](https://github.com/vercel-labs/agent-browser) is a command-line tool built for AI agents such as Claude Code, Codex, and Cursor to drive. According to the project README, it's written mostly in Rust and runs a full local Chrome (Chrome for Testing) through a client-daemon architecture. It is not a lightweight headless browser running on an edge network; the README is explicit that it doesn't natively run on Vercel Edge Functions, because it needs a real browser.

Architecture

  • Language: Native Rust CLI (~86% Rust per the repo)
  • Browser: Full local Chrome / Chromium (Chrome for Testing)
  • Model: Client-daemon; a background daemon holds the browser and the CLI talks to it
  • Used by: AI coding agents that call it as a tool
  • Deployment: Local by default; can also run alongside Chrome in an ephemeral Vercel Sandbox microVM
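Because the CLI is the whole interface, an agent (or a script) drives agent-browser by issuing one short command per step while the daemon keeps the browser alive between calls. The sketch below only assembles those command lines. It assumes the binary is named `agent-browser`; `tab new` and `--annotate` come from this article, while the `open <url>` and `screenshot <path>` forms and the placement of `--annotate` are assumptions to check against the README before use.

```python
import shlex
import subprocess
from typing import Sequence

CLI = "agent-browser"  # binary name as used in this article


def build_command(*args: str) -> list[str]:
    """Return the argv list for one CLI call."""
    return [CLI, *args]


def call(args: Sequence[str], dry_run: bool = True) -> str:
    """Run one CLI call, or with dry_run=True just return the shell-quoted line."""
    if dry_run:
        return shlex.join(args)
    done = subprocess.run(list(args), capture_output=True, text=True, check=True)
    return done.stdout


# `tab new` is cited in the article; `open` and `screenshot` are assumed subcommands.
session = [
    build_command("open", "https://example.com"),
    build_command("tab", "new"),
    build_command("screenshot", "page.png", "--annotate"),
]
lines = [call(cmd) for cmd in session]  # dry run: nothing is executed
```

Keeping `dry_run=True` as the default means the sequence can be reviewed as plain text before anything touches a browser, which fits the review-queue advice earlier in the piece.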

Strengths

Agent-Native Design: It's built to be driven by an AI agent from the command line, which makes it a natural fit for coding agents that already work in a terminal.

Annotated Screenshots: It can capture screenshots, including annotated ones with numbered labels on elements (--annotate), which gives a multimodal model something concrete to reason about visually.

Multi-Tab and Downloads: It supports multiple tabs (agent-browser tab new) and file downloads to a chosen path (--download-path).

Sandbox Option: Per the README, you can run agent-browser plus Chrome inside an ephemeral Vercel Sandbox microVM via @vercel/sandbox. That is a VM pattern, not an edge-function one.

Optional AI Chat: It can optionally route AI chat through the Vercel AI Gateway, a separate service. Worth flagging: it is not built on the Vercel AI SDK, despite earlier claims to that effect.

Ideal For

  • AI coding agents that operate from a terminal
  • Teams already comfortable in the Vercel ecosystem
  • Local workflows, and sandboxed VM runs when you need isolation
  • Cases where annotated visual reasoning helps the model

Feature Comparison

A caution on the numbers below. The star counts are taken from the original article and look outdated against the live GitHub pages as of June 2026: Browser-use sits closer to ~99.5k than the 86,000 listed, and agent-browser closer to ~36.4k than 27,000. Treat the figures as rough scale, not precise tallies. We've also corrected several rows in the agent-browser column that the original got wrong.

| Feature | Browser-use | Vercel agent-browser |
| --- | --- | --- |
| GitHub Stars (as stated; outdated) | 86,000 (~99.5k live) | 27,000 (~36.4k live) |
| Language | ~98% Python | ~86% Rust |
| Browser Type | Full Chromium | Full local Chrome (Chrome for Testing) |
| JavaScript Rendering | Full | Full (real browser) |
| Visual Understanding | Yes (DOM + screenshots) | Yes (annotated screenshots) |
| Hosting | Self-hosted / Docker | Local CLI / Sandbox microVM |
| Authentication | Session persistence | Session via real browser |
| File Downloads | Yes | Yes (--download-path) |
| Vercel Integration | Via MCP | Optional Vercel AI Gateway |
| Local Deployment | Yes | Yes (by design) |
| Multi-tab Support | Yes | Yes (tab new) |
| Custom Actions | Plugin system | CLI commands |

When to Choose Which

Choose Browser-use when:

  • You need full browser capability (JavaScript apps, complex interactions)
  • Visual understanding of page layout is important
  • You're self-hosting or using dedicated servers
  • Session persistence and authentication matter
  • You need file downloads and uploads
  • You're building research or analysis tools

Choose Vercel Agent-Browser when:

  • You're running an AI coding agent that works from the terminal
  • You want a Rust CLI backed by a daemon that keeps the browser warm in the background
  • You're comfortable in the Vercel ecosystem and may want the Sandbox option
  • You want annotated screenshots for the model to reason over
  • Local execution with the option of an isolated VM run suits your setup

Hybrid Approaches

Some teams reach for both: agent-browser as a tool their coding agent calls in the terminal, and Browser-use for longer, scripted Python workflows. Since both are built to be driven by AI agents, you're not locked into one.

The choice isn't strictly either/or. The MCP standard keeps making it easier to swap browser tools or run more than one. As the space settles, expect interfaces that hide more of the plumbing underneath.
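One way to picture running both tools is a thin shared interface with a router in front. This is a minimal sketch, not either project's API: `BrowserTool`, `StubTool`, and `pick_tool` are hypothetical names, and the stub adapters only echo the task where a real adapter would call Browser-use or shell out to agent-browser.

```python
from typing import Protocol


class BrowserTool(Protocol):
    """The one method every adapter must offer."""
    name: str

    def run(self, task: str) -> str: ...


class StubTool:
    """Hypothetical adapter; a real one would call Browser-use or agent-browser."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, task: str) -> str:
        return f"[{self.name}] {task}"


def pick_tool(tools: dict[str, BrowserTool], from_terminal_agent: bool) -> BrowserTool:
    """Send terminal coding-agent calls to the CLI tool, scripted workflows to Python."""
    return tools["agent-browser" if from_terminal_agent else "browser-use"]


tools: dict[str, BrowserTool] = {
    "browser-use": StubTool("browser-use"),
    "agent-browser": StubTool("agent-browser"),
}
```

With this shape, swapping a tool means replacing one adapter rather than rewriting the workflows that call it.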

The Future

Both projects are moving fast. Browser-use has introduced a layered design (a Python API on top of a Rust core, which sits on top of the browser harness), with v0.13 shipping a beta agent powered by that Rust core (see the project repo). Agent-browser, for its part, keeps building out its CLI and sandbox options.

For anyone building agents, two strong choices beat one. Browser-use leans into a Python-first, full-browser workflow; agent-browser gives terminal-based coding agents a fast, Rust-built way to drive Chrome. Pick the one that matches where your agents already live.

What to do next

  1. Pick one repeated workflow with a clear owner and weekly volume.
  2. Automate the preparation step first, then keep human approval for important actions.
  3. Measure time saved, errors reduced, and response speed for four weeks.
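The measurement in step 3 comes down to simple arithmetic. The sketch below is one hedged way to turn a time baseline into weekly hours returned; the function name, its parameters, and the rule that every exception costs the automated attempt plus a full manual redo are assumptions made for illustration, not a method taken from either project.

```python
def weekly_hours_returned(runs_per_week: int,
                          manual_minutes: float,
                          automated_minutes: float,
                          exception_rate: float) -> float:
    """Hours saved per week, assuming each exception falls back to a full manual run."""
    if not 0.0 <= exception_rate <= 1.0:
        raise ValueError("exception_rate must be between 0 and 1")
    clean_runs = runs_per_week * (1.0 - exception_rate)
    saved_minutes = clean_runs * (manual_minutes - automated_minutes)
    # Exception runs save nothing and also waste the automated attempt.
    wasted_minutes = runs_per_week * exception_rate * automated_minutes
    return (saved_minutes - wasted_minutes) / 60.0


# Illustrative numbers only: 40 runs a week, 15 minutes by hand,
# 3 minutes automated (including approval), 25% of runs needing a manual redo.
hours = weekly_hours_returned(40, 15.0, 3.0, 0.25)  # 5.5
```

In that example 30 clean runs save 12 minutes each (360 minutes), 10 exception runs waste 3 minutes each (30 minutes), and the net is 330 minutes, or 5.5 hours a week. Tracking the exception rate matters because a high enough rate can push the result below zero.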

Want help applying this? Explore our AI automation services.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
