Browser-use + Vercel agent-browser: Browser automation compared.

Two approaches to giving AI agents web-browsing capabilities. We compare features, architectures, and ideal use cases.

Daniel Fleuren | 2026-05-27 | 10 min read | Founders and operators | Updated 2026-06-19

Written by Daniel Fleuren, Founder, AI Kick Start. 20+ years enterprise IT.

Decision: Pilot. Choose one repeated workflow with a visible owner and enough weekly volume to prove the saving.

Risk to watch: Faster mistakes. Keep a review queue and scoped credentials until the workflow has survived real production runs.

Proof to collect: Time baseline. Measure the manual run time, exception rate, approval time, and weekly hours returned.

Key takeaways

Briefing: An AI agent given a job that lives on the web (booking a flight, pulling a report out of a portal, filling in a supplier form) needs a browser, and it needs to know how to drive one.
Browser-use: The Full-Browser Approach: [Browser-use](https://github.com/browser-use/browser-use) drives a **real Chromium browser** through Playwright.
Vercel Agent-Browser: The CLI Approach: Vercel's [agent-browser](https://github.com/vercel-labs/agent-browser) is a command-line tool built for AI agents such as Claude Code, Codex, and Cursor.
Feature Comparison: The star counts in the comparison table are outdated against the live GitHub pages, so treat them as rough scale, not precise tallies.
When to Choose Which: Choose Browser-use for full-browser workflows where layout understanding, session persistence, and file handling matter, and where you self-host. Choose Vercel agent-browser when a terminal-based coding agent needs a Rust CLI with annotated screenshots, run locally or in an isolated Vercel Sandbox VM.

Briefing

Give an AI agent a job that lives on the web (book a flight, pull a report out of a portal, fill in a supplier form) and it hits the same wall a new employee does on day one: it needs a browser, and it needs to know how to drive one. That single requirement has turned into a small arms race among open-source projects, and two of them have pulled ahead.

The first is Browser-use, which drives a full copy of Chromium and has built a large following on GitHub. The second is Vercel's agent-browser, backed by the company behind Next.js and the Vercel hosting platform. Both let an AI agent see and operate a web page. They go about it in very different ways.

For an Australian business team weighing one against the other, the practical question is simple: where does the browser run, and who has to babysit it? The answer shapes your costs, your scaling, and how much infrastructure your team ends up owning. Here's how the two compare, and one place where the marketing around them has run ahead of reality.

A note before we dig in: the original version of this comparison described agent-browser as a lightweight serverless tool that runs on Vercel's edge network. That framing turns out to be wrong. Per the project's own README, agent-browser is a native Rust command-line tool that launches a full local Chrome through a client-daemon setup, and it explicitly does not run natively on Vercel Edge Functions. We've corrected the relevant sections below rather than repeat the error.

Browser-use: The Full-Browser Approach

Browser-use drives a real Chromium browser through Playwright. It's built for accuracy and flexibility, and it runs on your own machine or a dedicated server.

Architecture

Browser: Full Chromium via Playwright with JavaScript rendering
Perception: DOM parsing + screenshot analysis for visual understanding
Planning: LLM-based action planning (click, type, scroll, wait)
Execution: Direct browser control with action confirmation
Environment: Local process or Docker container
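The perception, planning, and execution steps listed above form a loop. The sketch below shows that loop in plain Python under loud assumptions: `FakePage`, `scripted_planner`, and `run_agent` are hypothetical names invented for illustration, the planner is a hard-coded stand-in for an LLM call, and nothing here is Browser-use's actual API or touches a real browser.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Observation:
    """What the agent perceives: page text plus (optionally) screenshot bytes."""
    dom_text: str
    screenshot: bytes = b""


@dataclass
class Action:
    kind: str        # "click", "type", "scroll", "wait", or "done"
    target: str = ""
    text: str = ""


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page; no Chromium is involved."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def observe(self) -> Observation:
        text = "submitted" if self.submitted else f"form fields={self.fields}"
        return Observation(dom_text=text)

    def execute(self, action: Action) -> bool:
        """Apply an action; the return value is the action confirmation."""
        if action.kind == "type":
            self.fields[action.target] = action.text
            return True
        if action.kind == "click" and action.target == "submit":
            self.submitted = True
            return True
        return False


def scripted_planner(obs: Observation) -> Action:
    """Placeholder for the LLM: choose the next action from the observation."""
    if obs.dom_text == "submitted":
        return Action(kind="done")
    if "supplier" not in obs.dom_text:
        return Action(kind="type", target="supplier", text="Acme")
    return Action(kind="click", target="submit")


def run_agent(page: FakePage,
              planner: Callable[[Observation], Action],
              max_steps: int = 10) -> list:
    """Perceive, plan, execute, and confirm until the planner says done."""
    history = []
    for _ in range(max_steps):
        action = planner(page.observe())
        if action.kind == "done":
            break
        if not page.execute(action):
            raise RuntimeError(f"action was not confirmed: {action}")
        history.append(action)
    return history
```

Calling `run_agent(FakePage(), scripted_planner)` fills the one field, clicks submit, and stops. The `max_steps` cap is a design choice worth keeping in any real loop, since a confused planner can otherwise run indefinitely.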

Strengths

Visual Understanding: Browser-use reads the DOM and looks at screenshots, so it understands where things sit on the page, not just how the markup is structured. On modern web apps, where position on screen often carries meaning the HTML doesn't spell out, that helps.

Full Browser Capability: It runs a real browser, so JavaScript-heavy sites, single-page apps, and fiddly interactions work without workarounds.

Session Persistence: Cookies, local storage, and login state carry over between actions. Sign in once and the agent stays signed in.

File Downloads: It can download files, process them, and fold the results back into a workflow.

Extensibility: A plugin system lets you add custom actions and perception modules.

Ideal For

Complex multi-step web workflows
Data extraction from JavaScript-heavy sites
Applications requiring authentication persistence
Local or dedicated server deployments
Research and analysis tasks

Vercel Agent-Browser: The CLI Approach

Vercel's agent-browser is a command-line tool built for AI agents such as Claude Code, Codex, and Cursor. According to the project README, it's written mostly in Rust and runs a full local Chrome (Chrome for Testing) through a client-daemon architecture. It is not a lightweight headless browser running on an edge network; the README is explicit that it doesn't natively run on Vercel Edge Functions, because it needs a real browser.

Architecture

Language: Native Rust CLI (~86% Rust per the repo)
Browser: Full local Chrome / Chromium (Chrome for Testing)
Model: Client-daemon (a background daemon holds the browser; the CLI talks to it)
Used by: AI coding agents that call it as a tool
Deployment: Local by default; can also run alongside Chrome in an ephemeral Vercel Sandbox microVM
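To make the client-daemon idea concrete, here is a minimal sketch of the pattern in Python, using a thread and a queue in place of a real process and socket. Everything in it is an assumption for illustration: `FakeBrowser`, `start_daemon`, and `send` are hypothetical names, and this is not how agent-browser itself is implemented (the source describes it as a Rust CLI). The point is only that the expensive object is created once and reused across many short-lived client calls.

```python
import queue
import threading


class FakeBrowser:
    """Hypothetical stand-in for a browser; counts how often it is launched."""
    launches = 0

    def __init__(self):
        FakeBrowser.launches += 1
        self.url = "about:blank"

    def handle(self, command, arg):
        if command == "open":
            self.url = arg
            return f"opened {arg}"
        if command == "url":
            return self.url
        return f"unknown command: {command}"


def daemon_loop(requests):
    """Long-lived side: launch the browser once, then serve commands."""
    browser = FakeBrowser()
    while True:
        command, arg, reply = requests.get()
        if command == "shutdown":
            reply.put("bye")
            return
        reply.put(browser.handle(command, arg))


def start_daemon():
    """Start the background worker and return its request queue and thread."""
    requests = queue.Queue()
    worker = threading.Thread(target=daemon_loop, args=(requests,), daemon=True)
    worker.start()
    return requests, worker


def send(requests, command, arg=""):
    """Short-lived client side: send one command and wait for the reply."""
    reply = queue.Queue(maxsize=1)
    requests.put((command, arg, reply))
    return reply.get(timeout=5)
```

A typical session would call `start_daemon()` once, then `send(requests, "open", "https://example.com")` and `send(requests, "url")` as separate calls. Both hit the same browser instance, which is what keeps each individual CLI invocation fast.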

Strengths

Agent-Native Design: It's built to be driven by an AI agent from the command line, which makes it a natural fit for coding agents that already work in a terminal.

Annotated Screenshots: It can capture screenshots, including annotated ones with numbered labels on elements (--annotate), which gives a multimodal model something concrete to reason about visually.

Multi-Tab and Downloads: It supports multiple tabs (agent-browser tab new) and file downloads to a chosen path (--download-path).

Sandbox Option: Per the README, you can run agent-browser plus Chrome inside an ephemeral Vercel Sandbox microVM via @vercel/sandbox. That is a VM pattern, not an edge-function one.

Optional AI Chat: It can optionally route AI chat through the Vercel AI Gateway, a separate service. Worth flagging: it is not built on the Vercel AI SDK, despite earlier claims to that effect.
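For teams wiring the CLI into a Python harness, the sketch below assembles two of the invocations named in the strengths above. `agent-browser tab new` and the `--annotate` flag come from this article; the `screenshot` subcommand name, its output-path argument, and the argument order are assumptions that should be checked against the README before use. The helper names are hypothetical, and `run` deliberately skips execution when the binary is not on the PATH.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional

CLI = "agent-browser"


def new_tab_command() -> List[str]:
    """Command for opening a new tab, as named in the article."""
    return [CLI, "tab", "new"]


def annotated_screenshot_command(output_path: str) -> List[str]:
    """Annotated screenshot command.

    Assumption: the subcommand is `screenshot` and it accepts an output path;
    only the --annotate flag itself is confirmed by the article.
    """
    return [CLI, "screenshot", output_path, "--annotate"]


def run(argv: List[str]) -> Optional[subprocess.CompletedProcess]:
    """Run a command if its binary is installed; otherwise report and skip."""
    if shutil.which(argv[0]) is None:
        print("skipped (not installed):", shlex.join(argv))
        return None
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

Passing arguments as a list rather than a shell string avoids quoting problems when an agent supplies paths or URLs containing spaces.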

Ideal For

AI coding agents that operate from a terminal
Teams already comfortable in the Vercel ecosystem
Local workflows, and sandboxed VM runs when you need isolation
Cases where annotated visual reasoning helps the model

Feature Comparison

A caution on the numbers below. The star counts are taken from the original article and look outdated against the live GitHub pages as of June 2026: Browser-use sits closer to ~99.5k than the 86,000 listed, and agent-browser closer to ~36.4k than 27,000. Treat the figures as rough scale, not precise tallies. We've also corrected several rows in the agent-browser column that the original got wrong.
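To put a size on that gap, the short calculation below expresses each stated figure as a shortfall against the approximate live count quoted above. The function name is hypothetical, and the live counts are themselves rounded, so read the output as rough percentages only.

```python
def understatement_pct(stated: float, live: float) -> float:
    """Percentage by which the stated figure falls short of the live figure."""
    return round((live - stated) / live * 100, 1)


# Figures as quoted in this article (live counts are approximate).
browser_use_gap = understatement_pct(stated=86_000, live=99_500)
agent_browser_gap = understatement_pct(stated=27_000, live=36_400)

print(f"Browser-use stated count is about {browser_use_gap}% below live")
print(f"agent-browser stated count is about {agent_browser_gap}% below live")
```

On these numbers the Browser-use figure is about 13.6% low and the agent-browser figure about 25.8% low, so the stale counts understate agent-browser's following by a wider margin.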

| Feature | Browser-use | Vercel agent-browser |
| --- | --- | --- |
| GitHub Stars (as stated; outdated) | 86,000 (~99.5k live) | 27,000 (~36.4k live) |
| Language | ~98% Python | ~86% Rust |
| Browser Type | Full Chromium | Full local Chrome (Chrome for Testing) |
| JavaScript Rendering | Full | Full (real browser) |
| Visual Understanding | Yes (DOM + screenshots) | Yes (annotated screenshots) |
| Hosting | Self-hosted / Docker | Local CLI / Sandbox microVM |
| Authentication | Session persistence | Session via real browser |
| File Downloads | Yes | Yes (`--download-path`) |
| Vercel Integration | Via MCP | Optional Vercel AI Gateway |
| Local Deployment | Yes | Yes (by design) |
| Multi-tab Support | Yes | Yes (`tab new`) |
| Custom Actions | Plugin system | CLI commands |

When to Choose Which

Choose Browser-use when:

You need full browser capability (JavaScript apps, complex interactions)
Visual understanding of page layout is important
You're self-hosting or using dedicated servers
Session persistence and authentication matter
You need file downloads and uploads
You're building research or analysis tools

Choose Vercel Agent-Browser when:

You're running an AI coding agent that works from the terminal
You want a Rust CLI whose background daemon keeps the browser warm between commands
You're comfortable in the Vercel ecosystem and may want the Sandbox option
You want annotated screenshots for the model to reason over
Local execution with the option of an isolated VM run suits your setup

Hybrid Approaches

Some teams reach for both: agent-browser as a tool their coding agent calls in the terminal, and Browser-use for longer, scripted Python workflows. Since both speak to AI agents, you're not locked into one.

The choice isn't strictly either/or. The MCP standard keeps making it easier to swap browser tools or run more than one. As the space settles, expect interfaces that hide more of the plumbing underneath.

The Future

Both projects are moving fast. Browser-use has introduced a layered design (a Python API on top of a Rust core on top of the browser harness), with v0.13 shipping a beta agent powered by that Rust core (see the project repo). Agent-browser, for its part, keeps building out its CLI and sandbox options.

For anyone building agents, having two strong choices beats having one. Browser-use leans into a Python-first, full-browser workflow; agent-browser gives terminal-based coding agents a fast, Rust-built way to drive Chrome. Pick the one that matches where your agents already live.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

browser-use documentation

What to do next

Pick one repeated workflow with a clear owner and weekly volume.
Automate the preparation step first, then keep human approval for important actions.
Measure time saved, errors reduced, and response speed for four weeks.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.

