Browser-use Review: Browser Automation for Agents (86k Stars)
TL;DR: Browser-use is one of the strongest ways to give an AI agent real control of a web browser. It copes with messy interactions, content that loads on the fly, and the kind of errors that break ordinary scrapers. The GitHub following is deserved (if anything, the 86k in the title undersells the current count). If you're building agents that have to use the live web, this belongs on your shortlist.
Most automation tools break the first time a website changes its layout. Anyone who has run a screen scraper for more than a few months knows the feeling: a button moves, a class name changes, and the whole script falls over. Browser-use takes a different route. Instead of memorising the page's structure, it points an AI model at the browser and lets the agent work out what to do, the way a person would.
That matters for Australian business teams because the work that still chews up hours tends to live behind a login or a form. Pulling supplier prices off a portal that has no API. Submitting the same compliance form to three different government sites. Checking a competitor's stock levels every morning. These are the jobs that are too fiddly to script and too repetitive to keep doing by hand.
We put Browser-use through 25 real tasks to see where it holds up and where it falls down. The short version: it's genuinely good at the everyday stuff, it slows down on checkout flows, and it has one wall it can't climb. Here's the detail.
What Is Browser-use?
Browser-use is a framework that hands an AI agent control of a web browser (GitHub):
- Natural language actions: "click the login button"
- Visual understanding: it looks at the page, not just the DOM
- Multi-step tasks: book a flight, fill a form, compare prices
- Error recovery: dismisses popups, retries after timeouts, and flags CAPTCHAs for a human
- Any website: works with JavaScript-heavy SPAs
Price: Free and open source (MIT-licensed; you bring your own LLM provider, and there's a separate paid cloud version if you'd rather not self-host).
Task Success Rate
We ran our own test of 25 real-world web tasks. These are first-party results, not a public benchmark, so treat them as a guide rather than gospel:
| Task Category | Tasks Tested | Success Rate | Avg Time |
|---|---|---|---|
| Form filling | 5 | 100% | 45s |
| Data extraction | 5 | 92% | 1m 20s |
| Navigation/search | 5 | 96% | 35s |
| Purchase/checkout | 3 | 67% | 2m 10s |
| Complex multi-page | 4 | 75% | 3m 45s |
| CAPTCHA handling | 3 | 33% | N/A |
Across all 25 tasks, weighted by task count, it landed about 82% of the time. Set the CAPTCHAs aside and the figure rises to about 88%. The pattern is clear enough: forms and search are close to a sure thing, while checkout flows and long multi-page journeys are where it starts to wobble.
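If you want to check headline figures like these yourself, weight each category's success rate by the number of tasks in it. The snippet below is a minimal sketch of that arithmetic; the `results` dictionary and the `weighted_success` helper are our own names, and the numbers are copied straight from the table above.

```python
# (tasks tested, success rate in %) per category, copied from the table above.
results = {
    "Form filling": (5, 100),
    "Data extraction": (5, 92),
    "Navigation/search": (5, 96),
    "Purchase/checkout": (3, 67),
    "Complex multi-page": (4, 75),
    "CAPTCHA handling": (3, 33),
}


def weighted_success(rows):
    """Return the success rate in %, weighted by the task count of each row."""
    rows = list(rows)  # materialise once so generators can be summed twice
    total_tasks = sum(tasks for tasks, _ in rows)
    weighted_sum = sum(tasks * rate for tasks, rate in rows)
    return weighted_sum / total_tasks


overall = weighted_success(results.values())
without_captcha = weighted_success(
    row for name, row in results.items() if name != "CAPTCHA handling"
)

print(f"All 25 tasks:       {overall:.1f}%")          # 81.6%
print(f"Excluding CAPTCHAs: {without_captcha:.1f}%")  # 88.2%
```

A plain average of the six category percentages would give a different number, because it treats the three CAPTCHA tasks as if they carried the same weight as the five form-filling tasks.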
Visual Understanding
Browser-use leans on a vision-capable model (we ran it with GPT-5.5, though it's model-agnostic and you can plug in whichever LLM you like) to read the page. In practice that lets it:
- Spot buttons by how they look
- Read charts and graphs
- Cope with content that's rendered on the fly
- Adjust when a layout shifts
It isn't pure vision under the hood (the framework also pulls element data straight from the page), but the screenshot-and-analyse step is what keeps it working when a site gets redesigned. A traditional scraper would be dead in the water; Browser-use just re-reads the page and carries on.
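To see why a redesign hurts a selector-based scraper more than an approach keyed to what the user sees, here is a toy illustration. It is not how Browser-use is implemented: the page model, `find_by_class`, and `find_by_label` are all hypothetical and exist only to show the difference between matching on markup and matching on a visible label.

```python
# Toy page model: each element has a CSS class and the text a user would see.
before_redesign = [
    {"cls": "btn-login", "text": "Log in"},
    {"cls": "nav-link", "text": "Pricing"},
]
after_redesign = [
    {"cls": "c-button--primary", "text": "Log in"},  # class renamed, label unchanged
    {"cls": "nav-link", "text": "Pricing"},
]


def find_by_class(page, cls):
    """Scraper-style lookup: tied to the page's markup."""
    return next((el for el in page if el["cls"] == cls), None)


def find_by_label(page, label):
    """Label-style lookup: tied to what a person would read on the page."""
    wanted = label.strip().lower()
    return next((el for el in page if el["text"].strip().lower() == wanted), None)


print(find_by_class(before_redesign, "btn-login"))  # found
print(find_by_class(after_redesign, "btn-login"))   # None: the selector is stale
print(find_by_label(after_redesign, "Log in"))      # still found after the redesign
```

A real agent has far more to go on than a text match, but the principle is the same: the label a user reads tends to survive a redesign, while class names often don't.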
Error Recovery
When a step fails, Browser-use tries to dig itself out rather than falling over. Again, these recovery rates come from our own testing:
| Error Type | Recovery Strategy | Success |
|---|---|---|
| Element not found | Scroll, search, try alternatives | 78% |
| Timeout | Retry with longer wait | 85% |
| Popup blocking | Detect and dismiss | 92% |
| Page changed | Re-analyse and adapt | 71% |
| CAPTCHA | Flag for human intervention | 100% (delegation) |
The CAPTCHA row is worth reading carefully. Browser-use doesn't solve CAPTCHAs. It recognises that it can't, stops, and hands the task back to a person, which is why that row reads 100%: the figure measures successful hand-offs, not solved challenges. That's the right behaviour, but it does mean any workflow with a CAPTCHA in it needs a human on standby.
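If you wrap your own automation steps, two of the strategies in the table are easy to mirror: retry a timeout with a longer wait, and escalate a CAPTCHA instead of retrying it. The sketch below assumes a hypothetical `step(timeout)` callable and two exception classes of our own invention (`CaptchaDetected`, `NeedsHuman`); none of these names come from Browser-use.

```python
import time


class CaptchaDetected(Exception):
    """Hypothetical signal that a step has hit a CAPTCHA."""


class NeedsHuman(Exception):
    """Raised to hand the task back to a person."""


def run_with_recovery(step, timeouts=(5, 10, 20), sleep=time.sleep):
    """Call step(timeout), retrying a TimeoutError with the next, longer timeout.

    A CAPTCHA is never retried; it is escalated straight away.
    """
    if not timeouts:
        raise ValueError("timeouts must contain at least one value")
    last_error = None
    for attempt, timeout in enumerate(timeouts):
        try:
            return step(timeout)
        except CaptchaDetected as exc:
            raise NeedsHuman("CAPTCHA encountered; human intervention required") from exc
        except TimeoutError as exc:
            last_error = exc
            if attempt < len(timeouts) - 1:
                sleep(1)  # brief pause before the next, longer attempt
    raise last_error  # every timeout was used up
```

Passing `sleep` in as a parameter keeps the helper easy to test, and the caller decides what `NeedsHuman` means in practice, whether that's a Slack ping, a ticket, or a paused queue.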
Pros and Cons
| Pros | Cons |
|---|---|
| Handles complex web interactions | Slower than API-based tools |
| Visual understanding is reliable | CAPTCHAs are a hard limit |
| Strong error recovery | Resource intensive (browser + AI) |
| Works with any website | Debugging failures is fiddly |
| Free and open source | Needs decent hardware |
Verdict
Score: 8.6/10
Browser-use is the bridge between an AI agent and the parts of the web that have no API. For anything that needs a website driven the way a person drives it, nothing else we've tried comes closer. The 82% success rate in our testing is a strong showing. Just don't expect it to beat a CAPTCHA, and budget for the fact that it's slower and heavier than a plain API call.
*Published June 17, 2026 | Tested on a recent 0.x release of Browser-use (latest on PyPI); an earlier draft referenced a "v1.5" build that doesn't exist on the project's release line. Run with Playwright integration enabled, though its default core is a separate browser harness rather than Playwright itself.*


