nanochat: Andrej Karpathy's minimal LLM training stack (55k stars).

Andrej Karpathy's nanochat shows you can train a GPT-2-class model for about $48 of compute. Here's why roughly 55,000 developers have starred this educational masterpiece.

Daniel Fleuren · 2026-06-12 · 11 min read · Founders and operators · Updated 2026-06-19

Written by Daniel Fleuren, Founder, AI Kick Start. 20+ years enterprise IT


TL;DR

nanochat is Karpathy's minimal, from-scratch training and inference stack for a small ChatGPT-style model, with roughly 55,000 GitHub stars. The README puts a GPT-2-class model at about $48 of compute; the full chat clone is closer to $100. It covers the whole loop: data, tokenisation, the training run, and inference and serving. It's built to teach: clean Python, heavy comments, and a codebase that stays small on purpose.

Key takeaways

nanochat strips LLM training down to a readable, end-to-end codebase, with about 55,000 GitHub stars behind it.
Budget the small GPT-2-class model at roughly $48 of compute and the fuller chat clone at around $100.
The documented run targets an 8xH100 node in about two to four hours; the "single RTX 4090 overnight" framing is unconfirmed.
The real win for most teams isn't training a model. It's understanding one well enough to buy and use AI tools with confidence.

Briefing

There's something almost rebellious about nanochat. Training a frontier large language model is widely reported to cost millions of dollars, yet Andrej Karpathy's minimal training stack shows that understanding how these models actually work is within reach of anyone with a modest budget and a bit of curiosity. With around 55,000 GitHub stars, it has become a go-to resource for learning what goes on inside an LLM.

Analysis

For a few years now, the standard story about building AI has gone like this: it's the domain of a handful of labs with budgets most companies will never see. nanochat pokes a hole in that. It's a single, readable codebase that takes you from raw text all the way to a working chatbot you can talk to, and the compute bill for the small version is roughly what you'd spend on a team lunch.

The author matters here. Andrej Karpathy ran AI at Tesla and was a founding member of OpenAI, and over the past few years he's spent a lot of his time teaching rather than building products. nanochat is the latest in that line of work, and the star count suggests a lot of people were waiting for exactly this.

So what's the "so what" for a business reader? You don't need to train your own model to benefit. The value is clarity. If your team can read through a project like this, the AI tools you're buying stop being a black box. You start to understand what a token is, why context windows have limits, and where the real costs sit. That makes you a sharper buyer.

The $48 Claim

The headline number is the hook. The README says you can train a GPT-2-class model for about $48 of compute, and the repo backs it up with everything you need: data preparation, tokenisation, the training loop, and inference code, all in clean Python with comments explaining the choices behind each step.

One thing worth keeping straight: the $48 figure is the GPT-2 tier. The fuller chat clone that Karpathy is best known for promoting lands closer to $100. Same project, two different rungs on the ladder, and the price you'll quote depends on which one you build.
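As a rough sanity check on those two figures, compute cost is just the node's hourly rental rate multiplied by wall-clock hours. The sketch below assumes an 8xH100 node rented at about $24 per hour. That rate is an illustrative assumption, not a price quoted in this article, and real cloud prices vary by provider and by day.

```python
# Back-of-envelope training cost: hourly node rate x wall-clock hours.
# ASSUMPTION: $24/hour for an 8xH100 node is illustrative, not a quoted price.
ASSUMED_NODE_RATE_USD_PER_HOUR = 24.0


def training_cost_usd(hours, node_rate=ASSUMED_NODE_RATE_USD_PER_HOUR):
    """Return the rental cost of one training run in US dollars."""
    if hours < 0 or node_rate < 0:
        raise ValueError("hours and node_rate must be non-negative")
    return hours * node_rate


def per_gpu_hour_rate(node_rate, gpus_per_node=8):
    """Split the node rate evenly across its GPUs."""
    return node_rate / gpus_per_node


if __name__ == "__main__":
    for hours in (2, 4):
        print(f"{hours} h on the node: ${training_cost_usd(hours):.0f}")
    print(f"Per GPU-hour: ${per_gpu_hour_rate(ASSUMED_NODE_RATE_USD_PER_HOUR):.2f}")
```

At that assumed rate, a two-hour run comes to $48 and a four-hour run to $96, which is in the same ballpark as the two price points above. Swap in your own provider's rate before quoting a number to anyone.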

The original article framed the run as a single GPU, an RTX 4090, finishing in roughly 24 hours. That doesn't match the documented setup. nanochat is designed to run on an 8xH100 node and finish in about two to four hours. It can be coaxed onto a single GPU using gradient accumulation, but it'll be a lot slower, and the repo never mentions a 4090. Treat the "one consumer card overnight" version as unconfirmed.
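Gradient accumulation is the trick that makes the single-GPU route possible: instead of processing one big batch at once, you process several small micro-batches, add up their gradients, and only then update the weights. The sketch below is a toy illustration in plain Python, not nanochat's code. It uses a hypothetical one-parameter model (y = w * x) to show that the accumulated gradient matches the full-batch gradient.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean squared error with respect to w for y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, each scaled by 1 / num_micro_batches.

    The scaling keeps the result equal to the gradient of the full-batch mean.
    """
    if len(xs) % micro_batch_size != 0:
        raise ValueError("batch must split evenly into micro-batches")
    num_micro = len(xs) // micro_batch_size
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        end = start + micro_batch_size
        total += mse_grad(w, xs[start:end], ys[start:end]) / num_micro
    return total


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [3.0 * x for x in xs]  # toy data generated with a "true" weight of 3
    w = 0.5
    print("full batch :", mse_grad(w, xs, ys))
    print("accumulated:", accumulated_grad(w, xs, ys, micro_batch_size=2))
```

The maths comes out the same either way. What changes is memory use and wall-clock time: smaller micro-batches fit on a smaller card, but you run more of them in sequence, which is why the single-GPU route is slower.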

Whichever way you run it, the resulting model won't go toe to toe with GPT-4. What it will do is generate coherent text, handle basic questions, and teach you how transformers work from the ground up. That last part is the point.

What's In the Box

nanochat is a full LLM training stack, not just a demo:

Data Pipeline: Scripts for downloading and preprocessing training data from multiple sources. Includes deduplication, filtering, and quality scoring.
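To make the deduplication and filtering idea concrete, here is a minimal sketch using only the Python standard library. It isn't taken from the nanochat repo; the function names and the five-word threshold are hypothetical, and it only catches exact duplicates after normalising case and whitespace.

```python
import hashlib


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def dedupe_and_filter(docs, min_words=5):
    """Drop very short documents and exact duplicates (after normalisation).

    min_words is a hypothetical quality threshold chosen for illustration.
    """
    seen = set()
    kept = []
    for doc in docs:
        norm = normalise(doc)
        if len(norm.split()) < min_words:
            continue  # too short to be useful training text
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # already kept an identical document
        seen.add(digest)
        kept.append(doc)
    return kept


if __name__ == "__main__":
    docs = [
        "The quick brown fox jumps over the lazy dog.",
        "the quick  brown fox jumps over the lazy dog.",
        "Too short.",
        "A second, different document with enough words.",
    ]
    for doc in dedupe_and_filter(docs):
        print(doc)
```

Production pipelines usually go further, with near-duplicate detection and learned quality scores, but the shape is the same: normalise, score, and keep only what passes.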

Tokenisation: A byte-pair encoding implementation with vocabulary building, training, and encoding/decoding. It targets GPT-2-grade capability; the exact "GPT-2 tokeniser format compatibility" isn't something the repo spells out, so read that as the intent rather than a guarantee.
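Byte-pair encoding sounds exotic, but the core loop is short: count adjacent symbol pairs, merge the most frequent pair into a new symbol, and repeat. The sketch below is a character-level toy, assuming a tiny word list as the corpus. Real tokenisers, including GPT-2's, work on bytes and add pre-tokenisation rules, so treat this as an illustration of the idea rather than nanochat's implementation.

```python
from collections import Counter


def get_pair_counts(sequences):
    """Count every adjacent symbol pair across all sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def merge_pair(seq, pair):
    """Replace each occurrence of `pair` in `seq` with one merged symbol."""
    merged = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_bpe(words, num_merges):
    """Learn an ordered list of merges from a list of words."""
    sequences = [list(word) for word in words]
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(sequences)
        if not counts:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # so the result is deterministic.
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        sequences = [merge_pair(seq, best) for seq in sequences]
    return merges


def encode(word, merges):
    """Apply the learned merges, in order, to a new word."""
    seq = list(word)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


if __name__ == "__main__":
    merges = train_bpe(["low", "lower", "lowest", "low"], num_merges=2)
    print(merges)
    print(encode("lowly", merges))
```

After two merges on this toy corpus the tokeniser has learned "low" as a single symbol, so "lowly" splits into three tokens instead of five characters. That is the whole point of a token: frequent chunks of text get their own entry in the vocabulary.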

Model Architecture: A clean PyTorch implementation of the GPT architecture with configurable depth, width, and attention patterns. Every layer is commented with references back to the original "Attention Is All You Need" paper.
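The heart of that architecture is scaled dot-product attention from "Attention Is All You Need": softmax(QK^T / sqrt(d_k))V, with a causal mask so each position only looks at earlier ones. The sketch below is a single-head version in plain Python, written for readability rather than speed. It assumes the query, key, and value vectors are handed in directly and leaves out the learned projections, multiple heads, and everything else a real PyTorch model would include.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of T vectors (lists of floats).
    """
    d_k = len(keys[0])
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: position t may only attend to positions 0..t.
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d_k)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out = [
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    for row in causal_attention(q, k, v):
        print(row)
```

The first position can only see itself, so its output is just its own value vector. The second position blends both, weighted by how well its query matches each key. This loop is also where context-window limits come from: every new token has to be scored against every earlier one.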

Training Loop: Distributed training support, gradient checkpointing, mixed precision, and learning rate scheduling, plus Weights & Biases integration for experiment tracking.
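Learning rate scheduling is the easiest of those pieces to show on its own. A common pattern, sketched below, is a linear warmup followed by cosine decay. The specific numbers (peak rate, warmup length, total steps) are hypothetical defaults picked for illustration, and nanochat's actual schedule may differ.

```python
import math


def lr_at_step(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, total_steps=1000):
    """Linear warmup to max_lr, then cosine decay down to min_lr.

    All default values are illustrative assumptions, not nanochat's settings.
    """
    if step < warmup_steps:
        # Ramp up linearly so early updates don't destabilise training.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return min_lr + cosine * (max_lr - min_lr)


if __name__ == "__main__":
    for step in (0, 50, 100, 550, 1000):
        print(f"step {step:>4}: lr = {lr_at_step(step):.6f}")
```

In a training loop you would call a function like this once per step and hand the result to the optimiser. The warmup avoids large, noisy updates at the start, and the decay lets the model settle toward the end of the run.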

Inference Engine: Text generation with temperature sampling, top-k, top-p, and repetition penalty. Includes a simple chat interface.
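Those sampling knobs are simpler than they sound. Temperature rescales the model's scores, top-k keeps only the k most likely tokens, and top-p keeps the smallest set of tokens whose probabilities add up to at least p. The sketch below shows one way to combine them in plain Python. The function name and defaults are hypothetical, and repetition penalty is left out to keep it short.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick a token id from raw logits using temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Temperature: lower values sharpen the distribution, higher values flatten it.
    scaled = [logit / temperature for logit in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_k is not None:
        order = order[:top_k]

    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept

    # random.choices renormalises the surviving weights for us.
    weights = [probs[i] for i in order]
    return rng.choices(order, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    print(sample_next_token(logits, temperature=0.8, top_k=2))
```

With top_k=1 the function always returns the single most likely token, which is greedy decoding. Loosening the settings trades predictability for variety, which is the same dial you turn when a vendor tool offers a "creativity" slider.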

Why 55,000 Stars?

A lot of it comes down to who built it. Karpathy's "Neural Networks: Zero to Hero" series and earlier projects like nanoGPT and llm.c made him the person people turn to for the fundamentals. nanochat extends that work into a complete, end-to-end system.

The code reads like it was written to be read. Functions carry docstrings, the tricky sections have inline comments, and the README walks through the concepts before it drops you into the implementation. It's set up for learning, not just for running.

The Educational Vision

Karpathy has been open about the goal: make AI less of a mystery by putting the fundamentals where people can reach them. nanochat sits alongside his video lectures, blog posts, and his back-and-forth with the community. The issue tracker reads more like a classroom than a bug queue, with beginners asking questions and more experienced people answering.

Contributions are welcome, but they're curated with a firm hand. Clarity wins over features. Pull requests that pile on complexity without teaching anything tend to get a polite no, which is how the codebase stays approachable.

Getting Started

The README includes a quickstart. Note that the commands below are an illustrative example rather than a copy-paste of the current repo. The actual project has shifted to a uv-based setup with a speedrun script, so check the README for the live instructions before you run anything:

git clone https://github.com/karpathy/nanochat.git
cd nanochat
pip install -r requirements.txt
python data/prepare.py
python train.py --config configs/gpt2_small.yaml

If you've ever wondered how LLMs actually work under the hood, nanochat is a straight answer. For business teams, that understanding pays off in better tool decisions, sharper questions for vendors, and a more honest read on what AI can and can't do for you yet.

What to do next

Write the job-to-be-done before looking at another product.
Score each shortlisted tool for workflow fit, data handling, cost, and owner readiness.
Run one small pilot and remove anything the team does not use weekly.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.
