Back to news

AI Tools

Ollama Review: Run Any Model Locally.

Ollama makes running LLMs locally as easy as `docker run`. We tested 15 models across Mac, Linux, and Windows to see if local AI is production-ready.

AI Kick Start editorial image for Ollama Review: Run Any Model Locally.

Decision

Design boundary

Classify the data first, then decide what can use cloud AI, what must be redacted, and what stays local.

Risk to watch

Data leakage

A useful answer is not worth losing control of personal, financial, or contractual information.

Proof to collect

Audit trail

Capture upload, redaction, access, review, export, and rollback evidence before expanding access.

TL;DR

TL;DR: Ollama makes running LLMs locally as easy as `docker run`. We tested 15 models across Mac, Linux, and Windows to see if local AI is production-ready.

Key takeaways

  • Ollama Review: Run Any Model Locally: **TL;DR:** Ollama is the simplest way to run language models on your own machine.
  • What Is Ollama?: Ollama is a [free, open-source tool](https://github.com/ollama/ollama) for running large language models on your own hardware, released under the MIT licence.
  • Model Library: Ollama hosts a [large catalogue of models](https://ollama.com/library), well over 100, each installable with a single command.
  • Performance Benchmarks: The original review tested "Llama 4 8B" on a MacBook Pro M3 (36 GB RAM).
  • Privacy: The Real Selling Point: Use ChatGPT or Claude and your data travels to someone else's servers.

Ollama Review: Run Any Model Locally

TL;DR: Ollama is the simplest way to run language models on your own machine. It's free and open source. If you handle private code, client records, or anything that can't leave the building, it earns its place fast. Just don't expect it to match a cloud model on a laptop.

Most teams using AI today are sending their data somewhere else to get it. You type a prompt, it goes to a server in another country, an answer comes back. For a lot of work that's fine. For a law firm reviewing a contract, a clinic summarising patient notes, or a developer with a codebase under NDA, it's a problem nobody wants to think about.

Ollama is the tool that lets you stop thinking about it. It runs the model on your own computer, so the data never leaves. You install it, type one command, and a capable language model is answering questions on your hardware with nothing going out over the wire.

The catch is the one you'd expect. A model running on your laptop won't keep pace with the latest cloud system, and the bigger, sharper models want serious hardware. The honest question for an Australian business team isn't "is local as good as the cloud", it's "which of my jobs are sensitive enough that local is worth the trade." For more of them than you'd guess, the answer is yes.

A note before the spec tables below: this review leans on some model names and version numbers that didn't check out against the vendors' own documentation, so we've corrected or flagged those inline. The case for Ollama itself holds up.

What Is Ollama?

Ollama is a free, open-source tool for running large language models on your own hardware, released under the MIT licence. The easiest way to picture it is Docker for LLMs:

ollama run llama4:8b

That's the whole setup. No Python environment to build, no CUDA versions to wrangle, no dependency mess. Ollama downloads the model, sorts out the hardware acceleration, runs a local server on port 11434, and exposes an OpenAI-compatible API. You run a model with one command.

Price: Free (open source, MIT licence)

Model Library

Ollama hosts a large catalogue of models, well over 100, each installable with a single command. A few worth knowing about:

ModelSizeHardware RequiredPerformance
Llama 4 8B*4.9 GB8 GB RAMGood for most tasks
Llama 4 70B*40 GB64 GB RAM / 2x GPUStrong general quality
Mistral 3 7B*4.1 GB8 GB RAMFast, efficient
Qwen 3 72B*43 GB64 GB RAMStrong coding
CodeLlama 70B40 GB64 GB RAMSolid local code model
Gemma 3 27B16 GB32 GB RAMGoogle's flagship open model

A correction on the names in that table, because the model landscape moved faster than a lot of write-ups:

  • There is no "Llama 4 8B" or dense "Llama 4 70B." Meta's Llama 4 family is Mixture-of-Experts: Scout (17B active / 109B total) and Maverick (17B active / 400B total), with Behemoth in preview. The 8B and 70B sizes belong to the older Llama 3 line. Whoever benchmarked an "8B" was almost certainly running Llama 3.
  • "Qwen 3 72B" isn't a real model either. The Qwen3 lineup tops out at 32B for dense models, with MoE variants at 30B-A3B and 235B-A22B. The 72B was a Qwen2.5 model. The coding strength is real; the label is wrong.
  • "Mistral 3 7B" is close but off. Mistral 3 ships dense models at 3B, 8B, and 14B. The famous 7B was the original Mistral 7B, a different generation. A small, fast Mistral on Ollama is real, just not that exact label.
  • CodeLlama 70B is a genuine Meta model and runs fine on Ollama, but the "best local code model" crown has moved on. By 2026 most people reach for Qwen2.5-Coder (the 32B in particular) for local coding.
  • Gemma 3 27B checks out. It's the flagship of the Gemma 3 generation, multimodal, with a 128K context window. Calling it Google's best open model of that generation is fair.

*Names marked with an asterisk above were inaccurate in the source figures and are corrected in this list.

Performance Benchmarks

The original review tested "Llama 4 8B" on a MacBook Pro M3 (36 GB RAM). Worth reading with the caveat from above in mind: these are self-reported, first-party numbers, and the model under test was almost certainly Llama 3 8B rather than anything from the Llama 4 herd. The GPT-5.5 baseline it's compared against is real (OpenAI shipped it in April 2026), but the figures themselves haven't been independently checked.

TaskTokens/SecQuality vs GPT-5.5
Code completion34 t/s75% as good
Summarisation28 t/s80% as good
Translation31 t/s85% as good
Reasoning22 t/s70% as good
Creative writing25 t/s65% as good

The shape of the numbers is the useful part, even if the labels aren't. A small local model gives up some speed and some smarts in exchange for keeping your data on your own machine. For a sensitive codebase, medical data, or legal documents, that's a trade most teams should take without much hand-wringing.

Privacy: The Real Selling Point

Use ChatGPT or Claude and your data travels to someone else's servers. With Ollama, nothing leaves your machine, the model runs on your hardware, fully offline if you want it.

That's why people reach for it on:

  • Proprietary codebase analysis
  • Medical record summarisation
  • Legal document review
  • Air-gapped environments
  • Offline development (planes, remote sites)

For an Australian business sitting under the Privacy Act and client confidentiality obligations, "the data physically never left our office" is a sentence worth a lot.

Pros and Cons

ProsCons
Completely free and open sourceNeeds decent hardware for the bigger models
Dead-simple setupSlower than cloud APIs
Full privacy, data never leavesLarge models want expensive GPUs
100+ models availableNo built-in RAG or agent framework
Active community adding modelsYou manage updates and model choices yourself

One con from the original review needs scrapping: it claimed Ollama has "no multi-modal (vision/audio) yet." That isn't true. Ollama has supported vision models for some time, Llama 3.2 Vision, Gemma 3, Qwen2.5-VL, LLaVA, and ships a dedicated engine for multimodal work. If you need a model that reads images, Ollama already does it.

Verdict

Score: 8.9/10

Ollama is the default for running language models locally, and the score is deserved. It's free, the setup is genuinely a single command, and it keeps your data where it belongs. If you write code, handle anything confidential, or just don't want to pay per-token API fees, install it.

For the hardest tasks you'll still want a cloud model, that gap is real. But for a large share of everyday work, Ollama handles it on your own machine, and that's the whole point.

*Published June 13, 2026. The original review cited "Ollama version 0.48," which doesn't exist; as of June 2026 the latest releases are in the 0.30.x series (v0.30.8 shipped 12 June 2026).*

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

What to do next

  1. Classify the data before choosing a tool or model.
  2. Define what can leave the environment, what must be redacted, and who approves output.
  3. Keep logs, access controls, and a rollback path visible from day one.

Want help applying this? Explore secure document AI.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.

Explore with AI

Use the article as a decision prompt

Summarise this AI Kick Start article for an Australian business owner. Focus on the useful decision, the risks, and the first practical next step: Ollama Review: Run Any Model Locally

Turn this into a practical roadmap.

Use the guide as a starting point, then map the first workflow worth building.

Book an AI strategy call