Briefing
There's something almost rebellious about nanochat. Training a frontier large language model can cost millions, yet Andrej Karpathy's minimal training stack shows that understanding how these models actually work is within reach of anyone with a modest budget and a bit of curiosity. With around 55,000 GitHub stars, it has become a go-to resource for learning what goes on inside an LLM.
Analysis
For a few years now, the standard story about building AI has gone like this: it's the domain of a handful of labs with budgets most companies will never see. nanochat pokes a hole in that. It's a single, readable codebase that takes you from raw text all the way to a working chatbot you can talk to, and the compute bill for the small version is roughly what you'd spend on a team lunch.
The author matters here. Andrej Karpathy ran AI at Tesla and was a founding member of OpenAI, and over the past few years he's spent a lot of his time teaching rather than building products. nanochat is the latest in that line of work, and the star count suggests a lot of people were waiting for exactly this.
So what's the "so what" for a business reader? You don't need to train your own model to benefit. The value is clarity. If your team can read through a project like this, the AI tools you're buying stop being a black box. You start to understand what a token is, why context windows have limits, and where the real costs sit. That makes you a sharper buyer.
The $48 Claim
The headline number is the hook. The README says you can train a GPT-2-class model for about $48 of compute, and the repo backs it up with everything you need: data preparation, tokenisation, the training loop, and inference code, all in clean Python with comments explaining the choices behind each step.
One thing worth keeping straight: the $48 figure applies to the GPT-2 tier. The fuller ChatGPT-style model, the version Karpathy has promoted most heavily, lands closer to $100. Same project, two different rungs on the ladder, and the price you quote depends on which one you build.
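To see where the two price points come from, treat compute cost as simple arithmetic: GPUs × wall-clock hours × price per GPU-hour. The sketch below assumes an 8-GPU node rented at about $3 per GPU-hour. That rate is an assumption for illustration only, not a figure from the repo, and real cloud prices vary widely; the function names are hypothetical.

```python
# Rough compute-cost arithmetic for a rented GPU node.
# ASSUMPTION: ~$3 per GPU-hour; real cloud prices vary widely.
GPUS_PER_NODE = 8
PRICE_PER_GPU_HOUR = 3.00  # hypothetical rental rate in USD


def run_cost(hours: float, gpus: int = GPUS_PER_NODE,
             rate: float = PRICE_PER_GPU_HOUR) -> float:
    """Total cost = GPUs x wall-clock hours x price per GPU-hour."""
    return gpus * hours * rate


def hours_for_budget(budget: float, gpus: int = GPUS_PER_NODE,
                     rate: float = PRICE_PER_GPU_HOUR) -> float:
    """Invert the formula: how long a given budget keeps the node running."""
    return budget / (gpus * rate)


if __name__ == "__main__":
    for budget in (48, 100):
        print(f"${budget}: about {hours_for_budget(budget):.1f} h on an 8-GPU node")
```

At that assumed rate the node costs about $24 an hour, so $48 buys roughly two hours and $100 a little over four, which is consistent with a two-to-four-hour run on an 8xH100 node.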
The original article framed the run as a single GPU, an RTX 4090, finishing in roughly 24 hours. That doesn't match the documented setup. nanochat is designed to run on an 8xH100 node and finish in about two to four hours. It can be coaxed onto a single GPU using gradient accumulation, but it'll be a lot slower, and the repo never mentions a 4090. Treat the "one consumer card overnight" version as unconfirmed.
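Gradient accumulation is what makes the single-GPU fallback possible: instead of eight GPUs each processing a slice of the batch at once, one GPU processes the slices one after another and sums the gradients before updating the weights. The toy sketch below uses plain Python and a one-parameter model, not anything from nanochat, to show why the result matches a full-batch step; all names are hypothetical.

```python
# Gradient accumulation, illustrated on a toy 1-parameter model (pure Python).
# Averaging gradients over several small micro-batches gives the same update
# as one big batch, so a single GPU can emulate a larger multi-GPU batch by
# running more forward/backward passes before each optimizer step.


def grad_mse(w, xs, ys):
    """d/dw of mean((w*x - y)^2) over one batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def full_batch_grad(w, xs, ys):
    """Gradient computed over the whole batch at once."""
    return grad_mse(w, xs, ys)


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Sum per-micro-batch gradients, each scaled by 1 / number of micro-batches."""
    assert len(xs) % micro_batch_size == 0, "batch must split evenly"
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Same idea as dividing each micro-batch loss by the number of
        # accumulation steps in a typical PyTorch training loop.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
    w = 0.5
    print(full_batch_grad(w, xs, ys))
    print(accumulated_grad(w, xs, ys, micro_batch_size=2))
```

The two printed gradients agree up to floating-point rounding. The cost is time: the math is the same, but on the same kind of GPU the passes run sequentially, so expect roughly eight times the wall-clock time of an eight-GPU node.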
Whichever way you run it, the resulting model won't go toe to toe with GPT-4. What it will do is generate coherent text, handle basic questions, and teach you how transformers work from the ground up. That last part is the point.
What's In the Box
nanochat is a full LLM training stack, not just a demo:
Data Pipeline: Scripts for downloading and preprocessing training data from multiple sources. Includes deduplication, filtering, and quality scoring.
Tokenisation: A byte-pair encoding implementation with vocabulary building, training, and encoding/decoding. The project as a whole targets GPT-2-grade model capability, but the repo doesn't spell out compatibility with the GPT-2 tokeniser format, so treat any such claim as unconfirmed rather than guaranteed.
Model Architecture: A clean PyTorch implementation of the GPT architecture with configurable depth, width, and attention patterns. Every layer is commented with references back to the original "Attention Is All You Need" paper.
Training Loop: Distributed training support, gradient checkpointing, mixed precision, and learning rate scheduling, plus Weights & Biases integration for experiment tracking.
Inference Engine: Text generation with temperature sampling, top-k, top-p, and repetition penalty. Includes a simple chat interface.
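Byte-pair encoding is the step most readers find least intuitive, so here is a minimal from-scratch sketch of the idea: start from raw UTF-8 bytes, repeatedly merge the most frequent adjacent pair into a new token, and record each merge. This is a generic illustration, not nanochat's tokeniser, which works at far larger scale; every function name here is hypothetical.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count adjacent token pairs in a sequence of token ids."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from raw bytes (ids 0-255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Apply learned merges in the order they were learned (dicts keep insertion order)."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand merged ids back to bytes, then to text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Trained on the tiny string "low lower lowest" with three merges, this learns tokens for "lo", "low" and " low", so "lowest" encodes to four ids instead of six bytes. Production tokenisers apply the same idea over far more text and tens of thousands of merges.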
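The sampling knobs listed under Inference Engine are easier to grasp in code. Below is a minimal sketch of temperature plus top-k sampling over one step's raw scores (logits), in plain Python. It is a generic illustration rather than nanochat's engine, the function name is hypothetical, and top-p and repetition penalty are left out to keep it short.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick one token id from raw scores using temperature and top-k."""
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Keep only the k highest-scoring candidates (all of them if top_k is None).
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    candidates = order[:top_k] if top_k else order
    # Divide by temperature: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = [logits[i] / temperature for i in candidates]
    # Softmax, subtracting the max score for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(candidates, weights=probs, k=1)[0]
```

With `temperature=0` the function always returns the highest-scoring id, and with `top_k=2` it can only ever return one of the two best candidates. Lower temperatures make output more predictable; higher ones make it more varied.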
Why 55,000 Stars?
A lot of it comes down to who built it. Karpathy's "Neural Networks: Zero to Hero" series and earlier projects like nanoGPT and llm.c made him the person people turn to for the fundamentals. nanochat extends that work into a complete, end-to-end system.
The code reads like it was written to be read. Functions carry docstrings, the tricky sections have inline comments, and the README walks through the concepts before it drops you into the implementation. It's set up for learning, not just for running.
The Educational Vision
Karpathy has been open about the goal: make AI less of a mystery by putting the fundamentals where people can reach them. nanochat sits alongside his video lectures, blog posts, and his back-and-forth with the community. The issue tracker reads more like a classroom than a bug queue, with beginners asking questions and more experienced people answering.
Contributions are welcome, but they're curated with a firm hand. Clarity wins over features. Pull requests that pile on complexity without teaching anything tend to get a polite no, which is how the codebase stays approachable.
Getting Started
The README includes a quickstart. The commands below are illustrative rather than copied from the current repo: the project has since moved to a uv-based setup driven by a speedrun script, so check the README for the live instructions before you run anything:
git clone https://github.com/karpathy/nanochat.git
cd nanochat
pip install -r requirements.txt
python data/prepare.py
python train.py --config configs/gpt2_small.yaml

If you've ever wondered how LLMs actually work under the hood, nanochat is a straight answer. For business teams, that understanding pays off in better tool decisions, sharper questions for vendors, and a more honest read on what AI can and can't do for you yet.


