Llama 4 review: Meta's MoE open model.

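Meta's Llama 4 is reported to have launched on 20 April 2026 with 50.2% on SWE-bench Pro, 84.8% on MMLU, and a 256K context window, though Meta's own announcement dates the launch to April 2025 and several of these figures are unconfirmed. The weights are free to download; running them, whether on your own GPUs or through a hosted API, is not. That distinction decides whether it suits a cost-sensitive deployment.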

Daniel Fleuren | 2026-06-15 | 12 min read | Founders and operators | Updated 2026-06-19

Written by

Daniel Fleuren

Founder, AI Kick Start. 20+ years enterprise IT

Decision: Shortlist. Score tools by workflow fit, data handling, owner readiness, and cost at scale before buying seats.
Risk to watch: Shelfware. A capable tool still fails if nobody owns the workflow or checks whether it is used weekly.
Proof to collect: Pilot score. Run one real task through each shortlisted tool and record quality, time saved, and support burden.

Key takeaways

Dates and figures: Meta's announcement dates Llama 4 (Scout and Maverick) to April 2025, not April 2026, and several headline numbers in circulation are unconfirmed.
Benchmarks: MMLU of around 85% is solid for an open model; the 50.2% SWE-bench Pro figure is unconfirmed.
Cost: the weights are free to download, but running them means buying GPUs or paying a hosted API per token.
Architecture: a sparse Mixture-of-Experts design; Maverick has roughly 400 billion total parameters and activates about 17 billion per token.
Licence: open weights under the Llama 4 Community License, self-hostable with conditions.

Release date: reportedly April 2026 | Status: Active | Licence: Open weights (Llama 4 Community License)

Note on dates and figures: this review carries several numbers we could not confirm against Meta's own documentation. Meta's official announcement puts the Llama 4 launch (Scout and Maverick) at April 2025, not 2026, and the published specs differ from some figures below. Where a claim is unconfirmed, we say so plainly and keep the number visible so you can judge it yourself.

Benchmarks at a glance

Metric	Score	Context
SWE-bench Pro	50.2% (unconfirmed)	See note below
MMLU	~85%	Solid for an open model
Context window	256K tokens (claimed)	Official specs are larger
Price (input)	Weights free; hosted API paid	n/a
Price (output)	Weights free; hosted API paid	n/a
Licence	Open weights	Self-hostable, with conditions

Meta's pitch with Llama 4 is simple enough that any business owner can follow it: download the model, run it on your own hardware, and stop paying a vendor per question. That is a genuinely different deal from the metered API world most teams live in, and it is the reason Llama matters even when it doesn't top the leaderboards.

The catch is that "open" and "free" aren't the same thing, and the marketing around this release blurs the two. The model weights are free to download. Running them is not: you either buy GPUs or rent a hosted API that charges per token. Some of the eye-catching numbers floating around about Llama 4, including its release date and a few headline benchmarks, also don't line up with Meta's own published figures. We flag those as we go.

So the honest framing is this. Llama 4 is a capable, broadly useful open model that can save a team with its own GPUs a lot of money. It is not a magic "free forever" button, and it is not the best model at any single task. For Australian teams weighing self-hosting against a paid API, that distinction is the whole decision.

The MoE architecture

Llama 4 uses a sparse Mixture-of-Experts design. Per Meta's Maverick model card, the Maverick variant has roughly 400 billion total parameters but only activates about 17 billion of them per token. That is a real break from Llama 3, which used a dense architecture where every parameter fires on every token, and it brings Meta into line with how most frontier labs now build models (Meta's Llama 4 blog calls these its first native MoE models).

The practical upshot of a sparse design: you get the knowledge capacity of a very large model without paying the full inference cost on every request, because only a slice of the network runs at a time.
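To make "only a slice of the network runs at a time" concrete, here is a minimal sketch of generic top-k expert routing. It is a toy illustration of the idea, not Meta's implementation: the four stand-in experts, the router scores, and the function names are all made up for the example. The final lines work out the active share from the 400 billion and 17 billion figures quoted above.

```python
import math

def softmax(logits):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k=1):
    """Pick the top_k highest-scoring experts and renormalise their weights."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

def moe_forward(x, experts, router_logits, top_k=1):
    """Run only the chosen experts and blend their outputs by gate weight."""
    gates = route_token(router_logits, top_k)
    output = sum(weight * experts[i](x) for i, weight in gates.items())
    return output, sorted(gates)

# Four stand-in "experts" (hypothetical); in a real model each is a feed-forward network.
toy_experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 1, lambda x: x * x]
toy_logits = [0.1, 2.0, -1.0, 0.5]  # made-up router scores for one token

output, used = moe_forward(3.0, toy_experts, toy_logits, top_k=1)
print(f"experts used: {used} of {len(toy_experts)}, output: {output}")

# Share of Maverick's parameters active per token, using the figures quoted above.
active_fraction = 17e9 / 400e9
print(f"active share per token: {active_fraction:.2%}")
```

The three experts the router skips are never called, which is where the compute saving comes from. They still exist and still have to be stored, which matters when sizing hardware.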

Performance assessment

On coding, the picture is murky. The headline claim of 50.2% on SWE-bench Pro, framed as a 6.8-point jump over Llama 3.1's final release, is one we couldn't verify. Llama 4 doesn't appear on the SWE-bench Pro leaderboard we checked, and an independent SWE-bench Lite run put Maverick far lower, around 8%. Treat the 50.2% figure as unconfirmed. What we can say with more confidence: Llama 4 handles routine engineering work (boilerplate, simple debugging, code review) better than it handles complex multi-file changes or novel algorithmic problems. It is a useful assistant, not a senior engineer.

On general knowledge it holds up well. Independent trackers like llm-stats put Maverick's MMLU around 85%, which is strong for an open model. The specific comparison numbers quoted alongside it (GPT-5.5 Instant at 84.2% and Qwen 3 at 84.6%) are ones we couldn't confirm against any source, so read them as unverified. The broad point still stands: for Q&A, summarisation, and content generation, Llama 4 is more than adequate.

The self-hosting proposition

Because the weights are free to download, the cost of running Llama 4 yourself is infrastructure. The suggestion in circulation is that a single H100 can serve the Q4 quantised version with acceptable latency for internal tools, and a pair of H100s for production. We couldn't find an authoritative source for those hardware recommendations, and for the 400B-total Maverick they look optimistic: at 4 bits per weight the weights alone come to roughly 200 GB, more than two 80 GB H100s hold, and an MoE normally keeps every expert in memory even though only about 17B parameters fire per token. Treat the figures as a rough starting point to test rather than a spec.
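A quick way to sanity-check hardware estimates like these is to work out how much memory the weights alone need. The sketch below is back-of-envelope arithmetic, not a sizing guide. It assumes exactly 4 bits per weight and the 80 GB H100 variant, and it ignores KV cache, activations, and runtime overhead, all of which add to the total. It uses the total parameter count because an MoE server normally holds every expert in memory, even though only some fire per token.

```python
import math

def weight_memory_gb(total_params, bits_per_weight):
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 400e9    # Maverick's total parameters, per the figure quoted above
H100_MEMORY_GB = 80     # assumption: the 80 GB H100 variant

q4_gb = weight_memory_gb(TOTAL_PARAMS, 4)     # 4-bit quantised
fp16_gb = weight_memory_gb(TOTAL_PARAMS, 16)  # unquantised 16-bit, for comparison

# Minimum card count to hold the weights, before any working memory.
min_gpus_q4 = math.ceil(q4_gb / H100_MEMORY_GB)

print(f"Q4 weights:   {q4_gb:.0f} GB")
print(f"FP16 weights: {fp16_gb:.0f} GB")
print(f"80 GB cards needed for Q4 weights alone: {min_gpus_q4}")
```

On these assumptions the Q4 weights come to about 200 GB, so they would not fit on one or two 80 GB cards without offloading to system memory, heavier quantisation, or a smaller variant. That is worth testing on real hardware before committing budget.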

The economics are still the draw. Once the hardware is paid off, each additional request costs you electricity rather than per-token API fees. One thing to keep in mind: the Llama 4 Community License isn't unconditional. It restricts some EU access to the multimodal models and requires a separate commercial licence for companies above 700 million monthly active users. That is unlikely to bite most Australian businesses, but the licence is worth reading before you build on it.
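The "paid off" point can be estimated with a simple break-even calculation. The sketch below is illustrative only: every dollar figure is a hypothetical placeholder rather than a quote from any vendor, and the function name is our own. It leaves out staff time, cooling, and hardware depreciation, so swap in your own numbers before drawing conclusions.

```python
def breakeven_million_tokens(hardware_cost, self_host_per_mtok, api_per_mtok):
    """Millions of tokens at which self-hosting becomes cheaper than an API.

    Returns None when the API is no dearer per token, because the
    hardware spend is then never recovered.
    """
    saving_per_mtok = api_per_mtok - self_host_per_mtok
    if saving_per_mtok <= 0:
        return None
    return hardware_cost / saving_per_mtok

# Hypothetical placeholder figures, all in the same currency.
HARDWARE_COST = 60_000.0    # upfront GPU server spend
SELF_HOST_PER_MTOK = 0.25   # electricity and upkeep per million tokens
API_PER_MTOK = 1.00         # hosted API fee per million tokens

mtok = breakeven_million_tokens(HARDWARE_COST, SELF_HOST_PER_MTOK, API_PER_MTOK)
if mtok is None:
    print("Self-hosting never pays back at these prices.")
else:
    print(f"Break-even after about {mtok:,.0f} million tokens")
```

Divide the result by your monthly token volume to get a payback period in months. A low-volume internal tool can take years to reach it, which is the case where a hosted API stays the cheaper option.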

Verdict

Llama 4 isn't the best model at any one thing, but it's good enough at most things, and you can run it on your own gear. For startups, researchers, and teams that already own GPUs, it's a sensible default to start with. Move to a paid model when you hit a specific capability wall, and treat the "completely free" framing with caution, because the free part is the weights, not the running of them.

Score: 7.8/10 (capability), 9.5/10 (value)

Source trail

Primary references to keep this briefing grounded

AI and automation information changes quickly. Use these official or primary references to verify the claims, pricing, product behaviour, and compliance details before committing budget or production data.

Meta Llama documentation

What to do next

Write the job-to-be-done before looking at another product.
Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
Run one small pilot and remove anything the team does not use weekly.
