The Death of Fine-Tuning: Why Context Is Replacing Retraining

Fine-tuning was once the standard approach to adapting models for specific tasks. In 2026, it is being rapidly displaced by in-context learning and long-context retrieval. We explain why.

TL;DR

Fine-tuning, the process of retraining a model on task-specific data, is being pushed aside by in-context learning with long-context retrieval. As context windows stretch to 1 million tokens and models get better at using what you hand them, the cost and effort of fine-tuning increasingly fail to justify themselves for most jobs.

Key takeaways

  • 1-million-token context windows let entire knowledge bases sit inside a prompt ([MiniMax M3](https://www.minimax.io/models/text/m3); [Gemini 3.5 Flash](https://ai.google.dev/gemini-api/docs/interactions/whats-new-gemini-3.5))
  • Modern models retrieve information reliably across long contexts; the often-quoted "93-97% accuracy" range is attributed to 2026 testing but has no citation, so treat it as unconfirmed
  • A fine-tuning run can cost tens of thousands of dollars, while in-context learning costs cents per request (both are uncited order-of-magnitude estimates, 2026)
  • Fine-tuned Kimi K2.7-Code is reportedly far more accurate on proprietary codebases than the base model, but the specific 25-40% figure is unsourced ([benchmark scrutiny, VentureBeat](https://venturebeat.com/technology/kimi-k2-7-code-cuts-thinking-tokens-30-practitioners-say-benchmarks-dont-check-out))

Analysis

For two years, if you wanted an AI model to do your specific job well, you retrained it. You gathered examples, you ran an expensive training job, and you ended up with a model tuned to your task. A whole cottage industry grew up around that work: tools, consultants, services, all selling the same promise.

That promise is fading. The reason is almost embarrassingly simple. Models can now read far more in one sitting than they could even a year ago, and they actually remember what they read. So instead of baking your knowledge into a model's weights over a week of training, you can paste your documentation straight into the prompt and get answers that are just as good, often within seconds, for a fraction of the cost.

For an Australian business team, the practical upshot is this: the slow, costly path to a "custom" AI is no longer the obvious one. The faster path (feeding the model your manuals, your policies and your product docs at the moment you ask) has caught up and, in a lot of cases, overtaken it. Fine-tuning isn't gone. But it's stopped being the first thing you reach for.

Here's what's driving the shift, and where retraining still earns its keep.

The Context Window Revolution

The most direct pressure on fine-tuning comes from how much a model can read at once. When GPT-3 launched, its context window was about 2,000 tokens (GPT-3, Wikipedia). You couldn't fit any real task documentation into a prompt that small, so retraining the weights was the only way to make the model "know" your domain. Today, models like MiniMax M3 and Gemini 3.5 Flash offer 1-million-token contexts. That is enough room to drop an entire codebase, a product documentation library, or a full customer-support knowledge base straight into the prompt. (Some coverage also lists a "DeepSeek V3.5" in this group, but that model name appears to be unconfirmed; DeepSeek's 1M-context model at this point is V4.)

That changes the maths of adapting a model. Rather than spending weeks preparing training data, running a costly fine-tuning job, and grading the output, you include the relevant documentation in the prompt and get comparable or better results. It's faster and cheaper, and it bends easily: change the underlying documents and the next prompt reflects the change, with no retraining required.
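
As a rough illustration of "include the relevant documentation in the prompt", here is a minimal Python sketch. It assumes a crude four-characters-per-token estimate rather than a real tokenizer, and the `Doc`, `estimate_tokens` and `build_prompt` names are hypothetical helpers invented for this example, not part of any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    text: str


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. An assumption for illustration, not a real tokenizer."""
    return int(len(text) / chars_per_token) + 1


def build_prompt(question: str, docs: list[Doc], budget_tokens: int = 1_000_000) -> str:
    """Concatenate documents into one prompt, stopping before the budget is exceeded."""
    header = "Answer using only the reference material below.\n\n"
    footer = f"\nQuestion: {question}\n"
    used = estimate_tokens(header) + estimate_tokens(footer)
    parts = [header]
    for doc in docs:
        block = f"## {doc.title}\n{doc.text}\n\n"
        cost = estimate_tokens(block)
        if used + cost > budget_tokens:
            break  # out of room; a real system would rank or retrieve instead
        parts.append(block)
        used += cost
    parts.append(footer)
    return "".join(parts)


docs = [
    Doc("Returns policy", "Refunds within 30 days."),
    Doc("Shipping", "x" * 4000),  # stand-in for a long document
]
prompt = build_prompt("How long do refunds take?", docs, budget_tokens=100)
print(prompt)
```

With the deliberately tiny 100-token budget, only the first document fits. Updating the knowledge the model sees is then a matter of editing the `docs` list, which is the flexibility the paragraph above describes.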

Improved In-Context Learning

The second shift is that models have got much better at actually using the context you give them. Early models treated a long prompt a bit like packing material. They handled information near the start and end well, then lost the thread in the middle. This "lost in the middle" pattern, documented by Liu and colleagues in "Lost in the Middle: How Language Models Use Long Contexts", made long-context approaches hard to trust.

Newer models have largely worked past it. MiniMax reportedly cites very high needle-in-a-haystack retrieval at 1M tokens for M3, though that specific figure isn't published on its official blog or model page, so treat it as unconfirmed rather than a benchmarked fact. Google's Gemini models show similar reach, and even models with "only" 128K-256K windows tend to perform reliably across their whole range.
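
For readers who have not seen one, a needle-in-a-haystack check can be sketched roughly as follows. This is a toy harness under stated assumptions: `ask_model` is a placeholder for whatever model call you use, and `fake_model` is a deliberately crude stand-in that only "sees" the start and end of the prompt, mimicking the lost-in-the-middle failure. It says nothing about how any real model performs.

```python
def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end) among filler sentences."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def retrieval_accuracy(ask_model, needle: str, answer: str, question: str,
                       depths: list[float], n_sentences: int = 1000) -> float:
    """Fraction of depths at which the model's reply contains the expected answer."""
    filler = "The sky was grey over the harbour that morning."
    hits = 0
    for depth in depths:
        context = build_haystack(filler, needle, n_sentences, depth)
        reply = ask_model(f"{context}\n\n{question}")
        if answer.lower() in reply.lower():
            hits += 1
    return hits / len(depths)


def fake_model(prompt: str) -> str:
    """Toy stand-in for a model: it only 'sees' the first and last 2,000 characters."""
    visible = prompt[:2000] + prompt[-2000:]
    return "7421" if "7421" in visible else "I don't know"


accuracy = retrieval_accuracy(
    fake_model,
    needle="The vault code is 7421.",
    answer="7421",
    question="What is the vault code?",
    depths=[0.0, 0.5, 1.0],
)
print(f"retrieval accuracy: {accuracy:.2f}")
```

The toy model finds the needle at the start and end but misses it in the middle. Swapping `fake_model` for a real API call and sweeping more depths and context lengths is how you would check a vendor's retrieval claim on your own material.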

What this means in practice: putting your task documentation in the prompt is now a real alternative to fine-tuning for most work. Give a model a well-built prompt with the right examples and reference material, and on many tasks it matches what a fine-tuned model would do.

The Cost Calculation

Fine-tuning has never been cheap, and it has got dearer as models have grown. Retraining something the size of Llama 4 (400B parameters) or GLM-5.2 (753B parameters) needs serious GPU time: by most reasonable estimates, tens of thousands of dollars for a single run, on top of the engineering hours to prepare data, babysit the training, and grade the results. Those dollar figures are uncited order-of-magnitude estimates rather than published prices, so read them as ballpark.

In-context learning, by contrast, costs nothing extra to develop and adds only the inference cost of the longer prompt in production. Estimates put 100,000 tokens of context at roughly $0.01-0.03 per request on the cheaper providers, though premium models run higher (Gemini 3.5 Flash sits closer to $0.15 per 100K input tokens, per OpenRouter pricing). Either way, it's a rounding error next to a fine-tuning run.
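
The per-request arithmetic is simple enough to sketch. The prices below are the estimates quoted above, not a published price sheet, and the monthly request volume is a hypothetical workload chosen only to make the numbers concrete.

```python
def context_cost(context_tokens: int, price_per_100k_tokens: float) -> float:
    """Dollar cost of the input context for a single request."""
    return context_tokens / 100_000 * price_per_100k_tokens


# Per-100K-token prices quoted in the article (estimates, not a price sheet).
PRICES = {
    "cheaper provider, low": 0.01,
    "cheaper provider, high": 0.03,
    "Gemini 3.5 Flash (per OpenRouter)": 0.15,
}

REQUESTS_PER_MONTH = 10_000  # hypothetical workload, chosen for illustration

for label, price in PRICES.items():
    per_request = context_cost(100_000, price)
    monthly = per_request * REQUESTS_PER_MONTH
    print(f"{label}: ${per_request:.2f} per request, ${monthly:,.0f} per month")
```

At that assumed volume, 100,000 tokens of context per request works out to roughly $100 to $1,500 a month across the quoted price range.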

The gap widens once you account for upkeep. A fine-tuned model is frozen in time. When your product documentation changes, your knowledge base updates, or your task shifts, you retrain. An in-context setup picks up those changes the moment you edit the prompt content.
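
To see how upkeep shifts the comparison, here is a small break-even sketch. Every input is an assumption: the $20,000 run cost is a placeholder within the "tens of thousands" ballpark, the retrain count is hypothetical, and the calculation ignores the fine-tuned model's own inference and hosting costs.

```python
import math


def break_even_requests(fine_tune_cost: float, retrains: int,
                        extra_cost_per_request: float) -> int:
    """Requests at which cumulative extra context cost reaches total fine-tuning spend."""
    if extra_cost_per_request <= 0:
        raise ValueError("extra_cost_per_request must be positive")
    if retrains < 0:
        raise ValueError("retrains must be zero or more")
    total_fine_tune = fine_tune_cost * (1 + retrains)  # initial run plus each retrain
    return math.ceil(total_fine_tune / extra_cost_per_request)


# Hypothetical inputs: $20,000 per run, $0.15 of extra context per request.
one_off = break_even_requests(20_000, retrains=0, extra_cost_per_request=0.15)
with_upkeep = break_even_requests(20_000, retrains=3, extra_cost_per_request=0.15)
print(f"no retraining: {one_off:,} requests")
print(f"three retrains: {with_upkeep:,} requests")
```

Under these assumptions, the long-prompt approach stays cheaper until well past 100,000 requests, and each retrain pushes the break-even point further out.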

When Fine-Tuning Still Makes Sense

None of this kills fine-tuning. There are jobs where it still beats in-context learning outright. Work that demands very low latency benefits, because shorter prompts process faster. Work with rigid output formats that have to come out identical every time benefits from weight-level adaptation. And work where the training data carries subtle patterns that are hard to spell out in a prompt benefits from the model absorbing those patterns through retraining.

Kimi K2.7-Code is the usual example here. Fine-tuned on a company's own codebase, it reportedly beats the base-model-plus-context setup by a wide margin on internal coding tasks. The specific improvement figure has no supporting source and looks like an illustrative estimate, so don't bank on the exact number. The general point holds: for organisations whose core business is writing code, that kind of gain can justify the fine-tuning bill.

What to do next

  1. Write the job-to-be-done before looking at another product.
  2. Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
  3. Run one small pilot and remove anything the team does not use weekly.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
