Browser-use: Let agents browse the web (86k stars).

Browser-use gives AI agents the ability to navigate websites, fill forms, and extract data, all through a natural-language interface. The project had drawn roughly 86,000 GitHub stars by April 2026.

Daniel Fleuren | 2026-06-05 | 9 min read | Founders and operators | Updated 2026-06-19

Written by

Daniel Fleuren

Founder, AI Kick Start. 20+ years enterprise IT

Decision: Pilot. Choose one repeated workflow with a visible owner and enough weekly volume to prove the saving.

Risk to watch: Faster mistakes. Keep a review queue and scoped credentials until the workflow has survived real production runs.

Proof to collect: Time baseline. Measure the manual run time, exception rate, approval time, and weekly hours returned.

Key takeaways

Briefing: The web holds most of what your business needs to know, yet AI agents still struggle to actually use it.
Natural Language Browser Control: The thing that sets browser-use apart is the interface.
How It Works: Browser-use drives a real Chromium browser and runs each page through a few stages.
Key Features: Visual understanding pairs DOM parsing with screenshot analysis, so the agent reads page layout, not just structure (browser-use/browser-use GitHub repository, https://github.com/browser-use/browser-use).
By The Numbers: ~99,500 GitHub stars as of June 2026, up from the ~86,000 snapshot in April 2026; Chromium under the hood, originally driven via Playwright and now a Rust core and browser harness from v0.13 onward; multi-modal perception, DOM plus visual understanding; active development with frequent releases, including the v0.13 architecture change (browser-use releases page, https://github.com/browser-use/browser-use/releases).

Briefing

The web holds most of what your business needs to know, yet AI agents still struggle to actually use it. Reading a model's answer is one thing; getting software to log in, click through a booking flow, and pull the right number off a page is another problem entirely.

That gap is what browser-use set out to close. It hands an agent a real browser and lets it work a website the way a person would: open the page, read it, click, type, move on. You tell it what you want in plain English and it figures out the steps.

The project has caught on. Earlier coverage cited roughly 86,000 GitHub stars, a snapshot from around April 2026; the live repo has since climbed to about 99,500 stars as of June 2026 (browser-use/browser-use GitHub repository). Either way, it sits among the most popular browser-automation tools built for AI agents. For Australian teams weighing whether agents can do real web work yet, it's worth understanding how this one operates.

Natural Language Browser Control

The thing that sets browser-use apart is the interface. You don't write Selenium-style scripts. You describe the job (browser-use/browser-use GitHub repository):

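import asyncio
from browser_use import Agent, ChatOpenAI  # model wrapper import varies by version; check the docs

async def main():
    # The task goes to the constructor; run() is a coroutine. The model name is an assumption.
    task = "Find the cheapest flight from London to Tokyo on Skyscanner for next week"
    return await Agent(task=task, llm=ChatOpenAI(model="gpt-4.1-mini")).run()

result = asyncio.run(main())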

From there the agent handles the navigation, fills the form, picks the dates, and pulls the result on its own. It reads the page through a mix of DOM parsing and visual understanding, then decides what to click, where to scroll, and what to actually read.

How It Works

Browser-use drives a real Chromium browser and runs each page through a few stages. (Historically it leaned on Playwright for this; as of version 0.13 the project moved to a Rust core and browser harness, so the older "via Playwright" description only half holds now.)

Perception: The page gets turned into a structured representation. Interactive elements are identified, text is pulled out, and the layout is read.

Planning: Given the goal, the agent works out a sequence of actions, things like click, type, scroll, and wait, to move forward.

Action: The chosen action runs in the browser. Screenshots and DOM updates confirm whether it landed.

Reflection: The agent checks whether the action did what it expected and adjusts if it didn't.
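To make the four stages concrete, here is a minimal sketch of the loop in plain Python. It does not use browser-use at all: `FakePage`, `perceive`, `plan`, `act`, `reflect`, and `run_agent` are hypothetical names invented for illustration, and the planner is a hard-coded rule where a real agent would consult a language model.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Stand-in for a browser page; a real agent would drive Chromium."""
    fields: dict = field(default_factory=lambda: {"origin": "", "destination": ""})
    submitted: bool = False


def perceive(page):
    # Perception: reduce the page to a structured view of what can be acted on.
    return {
        "empty_fields": [name for name, value in page.fields.items() if not value],
        "submitted": page.submitted,
    }


def plan(state, goal):
    # Planning: choose the next action. A real agent would ask an LLM here.
    if state["empty_fields"]:
        name = state["empty_fields"][0]
        return ("type", name, goal[name])
    if not state["submitted"]:
        return ("click", "submit", None)
    return ("done", None, None)


def act(page, action):
    # Action: apply the chosen step to the page.
    kind, target, value = action
    if kind == "type":
        page.fields[target] = value
    elif kind == "click" and target == "submit":
        page.submitted = True


def reflect(before, after):
    # Reflection: did the observable state change after the action?
    return before != after


def run_agent(page, goal, max_steps=10):
    history = []
    for _ in range(max_steps):
        state = perceive(page)
        action = plan(state, goal)
        if action[0] == "done":
            break
        act(page, action)
        ok = reflect(state, perceive(page))
        history.append((action[0], action[1], ok))
        if not ok:
            break  # a real agent would re-plan here instead of stopping
    return history


page = FakePage()
history = run_agent(page, {"origin": "London", "destination": "Tokyo"})
```

The point of the sketch is the shape of the loop: each pass re-reads the page before deciding, so a step that did not land shows up as unchanged state rather than being silently assumed to have worked.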

Key Features

Visual Understanding: It pairs DOM parsing with screenshot analysis, so it reads page layout, not just structure (browser-use/browser-use GitHub repository).

Multi-tab Support: Agents can open tabs, switch between them, and close them as a workflow demands.

Authentication Handling: Login flows and session persistence are documented features. CAPTCHA solving is possible through external integration services rather than a guaranteed built-in.

Data Extraction: Structured extraction with schema validation, useful for pulling product listings, article content, or form data.

Error Recovery: When an action fails or a page changes unexpectedly, it retries with an adjusted approach.
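As a rough illustration of what schema validation buys you on extracted data, the sketch below checks raw rows against a small record type using only the standard library. It is independent of browser-use's own extraction API; `Product`, `validate_products`, and the field names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    price_aud: float


def validate_products(rows):
    """Split raw extracted rows into valid Product records and rejected rows."""
    valid, rejected = [], []
    for row in rows:
        try:
            name = str(row["name"]).strip()
            price = float(row["price_aud"])
            if not name or price < 0:
                raise ValueError("empty name or negative price")
            valid.append(Product(name=name, price_aud=price))
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the bad row and the reason so a person can review it later.
            rejected.append((row, repr(exc)))
    return valid, rejected


raw = [
    {"name": "Kettle", "price_aud": "49.95"},
    {"name": "", "price_aud": "10"},   # empty name
    {"name": "Toaster"},               # price missing
]
valid, rejected = validate_products(raw)
```

Rejected rows are kept rather than dropped, which gives the review queue something concrete to work from when a page layout changes.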

By The Numbers

~99,500 GitHub stars as of June 2026, up from the ~86,000 snapshot in April 2026; among the leading browser-automation tools for agents (browser-use/browser-use GitHub repository)
Chromium under the hood, originally driven via Playwright, now a Rust core and browser harness from v0.13 onward
Multi-modal perception, DOM plus visual understanding
Active development, frequent releases, including the v0.13 architecture change (browser-use releases page)

Comparison with Vercel Agent Browser

Vercel's agent-browser takes a different tack. Earlier coverage put its star count at 27,000; the actual repo (vercel-labs/agent-browser) shows around 36,400 as of June 2026. The framing needs correcting too: agent-browser isn't built specifically for Vercel's AI SDK. It's a standalone native Rust CLI for AI agents that you can run locally or on any server, with optional Vercel AI Gateway integration and support for serverless or ephemeral environments like Vercel Sandbox and AWS Lambda.

So the choice isn't local-versus-serverless so much as two general-purpose tools with different homes. Browser-use runs anywhere with a full browser environment and gives you a lot of control over multi-step tasks. Agent-browser is a lean CLI that slots neatly into Vercel's stack when that's where your deployments already live.

Use Cases

Data Collection: Scraping structured data from sites that change often or need interaction to reach.

Form Automation: Working through complex multi-page forms for applications, registrations, or orders.

Research: Systematic web research across several sources, with the results pulled together.

Testing: End-to-end testing of web apps written as plain-language test descriptions.

Monitoring: Watching sites for changes, price drops, or stock coming back.
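For the monitoring case, one simple way to decide whether a page has changed is to compare a fingerprint of the extracted text between runs. The sketch below assumes the agent has already pulled the relevant text off the page; `fingerprint` and `has_changed` are hypothetical helpers, not part of browser-use.

```python
import hashlib


def fingerprint(text):
    # Normalise whitespace so cosmetic reflows do not count as changes.
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, current_text):
    return fingerprint(current_text) != previous_fingerprint


baseline = fingerprint("Price: $120   In stock")
same = has_changed(baseline, "Price: $120 In stock")      # spacing differs only
different = has_changed(baseline, "Price: $99 In stock")  # the price moved
```

A hash only tells you that something changed, not what. For price drops or stock alerts you would still parse the specific value and compare it directly.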

The Future

The team has reportedly been working on sharper visual understanding, lower latency through browser pool management, and mobile web automation. These roadmap items aren't confirmed on the repo or official docs, so treat them as direction rather than commitments. The broader point holds regardless: as more agents need to reach the web, tools like browser-use matter more.

If you've got an agent that needs to use a website, browser-use is a sensible default, and the star count suggests plenty of other teams have reached the same conclusion. You can read the open-source docs to see how it fits your own setup.

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

browser-use documentation

What to do next

Pick one repeated workflow with a clear owner and weekly volume.
Automate the preparation step first, then keep human approval for important actions.
Measure time saved, errors reduced, and response speed for four weeks.
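The measurements above combine into a rough weekly figure. This is a back-of-envelope sketch under stated assumptions, not a costing model: every automated run still takes some human review time, and any run that hits an exception is redone by hand at full cost. The function name and the input numbers are illustrative only.

```python
def weekly_hours_returned(runs_per_week, manual_minutes, review_minutes, exception_rate):
    """Estimate hours returned per week by automating one workflow.

    Assumes each automated run still costs `review_minutes` of human review,
    and that a run hitting an exception is redone manually at full cost.
    """
    automated_minutes = review_minutes + exception_rate * manual_minutes
    saved_per_run = manual_minutes - automated_minutes
    return runs_per_week * saved_per_run / 60


# Illustrative inputs only: 50 runs a week, 12 minutes by hand,
# 2 minutes of review per automated run, 10% of runs needing a manual redo.
hours = weekly_hours_returned(50, 12, 2, 0.10)
print(f"{hours:.1f} hours returned per week")
```

With those made-up inputs the estimate comes to a little over seven hours a week. The useful part is seeing how quickly a higher exception rate or a longer review step eats into the saving.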

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
