Ollama Review: Run Any Model Locally
TL;DR: Ollama is the simplest way to run language models on your own machine. It's free and open source. If you handle private code, client records, or anything that can't leave the building, it earns its place fast. Just don't expect it to match a cloud model on a laptop.
Most teams using AI today are sending their data somewhere else to get it. You type a prompt, it goes to a server in another country, an answer comes back. For a lot of work that's fine. For a law firm reviewing a contract, a clinic summarising patient notes, or a developer with a codebase under NDA, it's a problem nobody wants to think about.
Ollama is the tool that lets you stop thinking about it. It runs the model on your own computer, so the data never leaves. You install it, type one command, and a capable language model is answering questions on your hardware with nothing going out over the wire.
The catch is the one you'd expect. A model running on your laptop won't keep pace with the latest cloud system, and the bigger, sharper models want serious hardware. The honest question for an Australian business team isn't "is local as good as the cloud", it's "which of my jobs are sensitive enough that local is worth the trade." For more of them than you'd guess, the answer is yes.
A note before the spec tables below: this review leans on some model names and version numbers that didn't check out against the vendors' own documentation, so we've corrected or flagged those inline. The case for Ollama itself holds up.
What Is Ollama?
Ollama is a free, open-source tool for running large language models on your own hardware, released under the MIT licence. The easiest way to picture it is Docker for LLMs:
ollama run llama4:8bThat's the whole setup. No Python environment to build, no CUDA versions to wrangle, no dependency mess. Ollama downloads the model, sorts out the hardware acceleration, runs a local server on port 11434, and exposes an OpenAI-compatible API. You run a model with one command.
Price: Free (open source, MIT licence)
Model Library
Ollama hosts a large catalogue of models, well over 100, each installable with a single command. A few worth knowing about:
| Model | Size | Hardware Required | Performance |
|---|---|---|---|
| Llama 4 8B* | 4.9 GB | 8 GB RAM | Good for most tasks |
| Llama 4 70B* | 40 GB | 64 GB RAM / 2x GPU | Strong general quality |
| Mistral 3 7B* | 4.1 GB | 8 GB RAM | Fast, efficient |
| Qwen 3 72B* | 43 GB | 64 GB RAM | Strong coding |
| CodeLlama 70B | 40 GB | 64 GB RAM | Solid local code model |
| Gemma 3 27B | 16 GB | 32 GB RAM | Google's flagship open model |
A correction on the names in that table, because the model landscape moved faster than a lot of write-ups:
- There is no "Llama 4 8B" or dense "Llama 4 70B." Meta's Llama 4 family is Mixture-of-Experts: Scout (17B active / 109B total) and Maverick (17B active / 400B total), with Behemoth in preview. The 8B and 70B sizes belong to the older Llama 3 line. Whoever benchmarked an "8B" was almost certainly running Llama 3.
- "Qwen 3 72B" isn't a real model either. The Qwen3 lineup tops out at 32B for dense models, with MoE variants at 30B-A3B and 235B-A22B. The 72B was a Qwen2.5 model. The coding strength is real; the label is wrong.
- "Mistral 3 7B" is close but off. Mistral 3 ships dense models at 3B, 8B, and 14B. The famous 7B was the original Mistral 7B, a different generation. A small, fast Mistral on Ollama is real, just not that exact label.
- CodeLlama 70B is a genuine Meta model and runs fine on Ollama, but the "best local code model" crown has moved on. By 2026 most people reach for Qwen2.5-Coder (the 32B in particular) for local coding.
- Gemma 3 27B checks out. It's the flagship of the Gemma 3 generation, multimodal, with a 128K context window. Calling it Google's best open model of that generation is fair.
*Names marked with an asterisk above were inaccurate in the source figures and are corrected in this list.
Performance Benchmarks
The original review tested "Llama 4 8B" on a MacBook Pro M3 (36 GB RAM). Worth reading with the caveat from above in mind: these are self-reported, first-party numbers, and the model under test was almost certainly Llama 3 8B rather than anything from the Llama 4 herd. The GPT-5.5 baseline it's compared against is real (OpenAI shipped it in April 2026), but the figures themselves haven't been independently checked.
| Task | Tokens/Sec | Quality vs GPT-5.5 |
|---|---|---|
| Code completion | 34 t/s | 75% as good |
| Summarisation | 28 t/s | 80% as good |
| Translation | 31 t/s | 85% as good |
| Reasoning | 22 t/s | 70% as good |
| Creative writing | 25 t/s | 65% as good |
The shape of the numbers is the useful part, even if the labels aren't. A small local model gives up some speed and some smarts in exchange for keeping your data on your own machine. For a sensitive codebase, medical data, or legal documents, that's a trade most teams should take without much hand-wringing.
Privacy: The Real Selling Point
Use ChatGPT or Claude and your data travels to someone else's servers. With Ollama, nothing leaves your machine, the model runs on your hardware, fully offline if you want it.
That's why people reach for it on:
- Proprietary codebase analysis
- Medical record summarisation
- Legal document review
- Air-gapped environments
- Offline development (planes, remote sites)
For an Australian business sitting under the Privacy Act and client confidentiality obligations, "the data physically never left our office" is a sentence worth a lot.
Pros and Cons
| Pros | Cons |
|---|---|
| Completely free and open source | Needs decent hardware for the bigger models |
| Dead-simple setup | Slower than cloud APIs |
| Full privacy, data never leaves | Large models want expensive GPUs |
| 100+ models available | No built-in RAG or agent framework |
| Active community adding models | You manage updates and model choices yourself |
One con from the original review needs scrapping: it claimed Ollama has "no multi-modal (vision/audio) yet." That isn't true. Ollama has supported vision models for some time, Llama 3.2 Vision, Gemma 3, Qwen2.5-VL, LLaVA, and ships a dedicated engine for multimodal work. If you need a model that reads images, Ollama already does it.
Verdict
Score: 8.9/10
Ollama is the default for running language models locally, and the score is deserved. It's free, the setup is genuinely a single command, and it keeps your data where it belongs. If you write code, handle anything confidential, or just don't want to pay per-token API fees, install it.
For the hardest tasks you'll still want a cloud model, that gap is real. But for a large share of everyday work, Ollama handles it on your own machine, and that's the whole point.
*Published June 13, 2026. The original review cited "Ollama version 0.48," which doesn't exist; as of June 2026 the latest releases are in the 0.30.x series (v0.30.8 shipped 12 June 2026).*




