GLM-5.2 vs Kimi K2.7-Code: Chinese models compared
Analysis
If you run a business in Australia and you have been keeping half an eye on AI tooling, here is the short version. The best coding models no longer all come from San Francisco. Two Chinese labs, Zhipu AI and Moonshot AI, now ship open-weights models that go toe to toe with the closed-source names you already know, and they do it at a fraction of the price.
That matters for a practical reason. Open weights mean you, or a vendor you trust, can run the model yourself instead of renting it through a foreign API. For a finance team handling client data or a dev shop nervous about where its code goes, that is not a small thing.
The trouble starts when you try to pick a winner. Comparison tables for GLM-5.2 and Kimi K2.7-Code are floating around the internet, and many of them, including the one this article was built from, get the headline numbers wrong. Some are off by a few points. At least one has the result backwards. So treat any clean-looking "X beats Y by 5.4 points" table with suspicion, including ours, and check the figures against the labs.
What follows keeps every number from the original comparison so you can see what was claimed, then sets it against what the sources actually report.
Head-to-head benchmarks
| Metric | GLM-5.2 | Kimi K2.7-Code | Delta |
|---|---|---|---|
| SWE-bench Pro | 51.4% | 56.8% | +5.4 pts (Kimi) |
| MMLU | 85.2% | 85.7% | +0.5 pts (Kimi) |
| Context window | 256K | 256K | - |
| Price (input) | $0.80 / 1M | $0.50 / 1M | Kimi cheaper |
| Price (output) | $2.40 / 1M | $2.00 / 1M | Kimi cheaper |
| Parameters | 753B (MoE) | Not disclosed | - |
A warning before you act on this table: most of it does not hold up. We have kept the original figures so you can see what was circulating, but here is what the sources actually say, row by row.
- SWE-bench Pro. The 51.4% / 56.8% split, and the idea that Kimi leads by 5.4 points, is not supported. Published reporting puts GLM-5.2 at 62.1% on SWE-bench Pro, the top open-source result on that benchmark, while Moonshot's own number for Kimi K2.7-Code is 58.6% (VentureBeat). In other words, the direction is reversed: on sourced figures GLM-5.2 is ahead by about 3.5 points, not behind. And Moonshot's 58.6% was vendor-reported, with practitioners flagging that the benchmarks did not fully check out (VentureBeat).
- MMLU. The 85.2% / 85.7% figures appear to be invented. No reporting we could find gives these MMLU numbers for either model (LLM-Stats). Treat them as unconfirmed.
- Context window. This row is wrong for GLM-5.2. Kimi K2.7-Code does land around 256K. But GLM-5.2's headline feature is a 1 million token context window, not 256K (Pandaily). So this is not a tie; GLM-5.2 holds a large advantage on context.
- Price. Neither price row matches any provider rate we could verify. First-party Z.ai pricing for GLM-5.2 runs closer to $1.40 input / $4.40 output per 1M tokens (WaveSpeed), and OpenRouter lists Kimi K2.7-Code at $0.74 input / $3.50 output (OpenRouter). The $0.80/$2.40 and $0.50/$2.00 figures above are unconfirmed.
- Parameters. GLM-5.2's 753B (MoE) checks out (ForkLog). Kimi K2.7-Code's parameter count is public, though, not undisclosed: roughly 1 trillion total MoE parameters with 32B active (Hugging Face). That makes Kimi the larger model by total parameter count, not the smaller one.
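The corrected gaps can be recomputed from the sourced figures quoted in the list above. The following is a minimal Python sketch, not anything published by either lab. It assumes "1 million" means 1,000,000 tokens and "256K" means 256,000 (vendors sometimes count in powers of two), and the variable and function names are purely illustrative.

```python
# Sourced figures as quoted in this article; names are illustrative only.
SWE_BENCH_PRO = {"GLM-5.2": 62.1, "Kimi K2.7-Code": 58.6}  # percent
# Assumption: decimal units (1M = 1,000,000 and 256K = 256,000 tokens).
CONTEXT_TOKENS = {"GLM-5.2": 1_000_000, "Kimi K2.7-Code": 256_000}


def swe_bench_lead(scores):
    """Return (leading model, lead in percentage points, rounded to 0.1)."""
    leader = max(scores, key=scores.get)
    trailing_score = min(scores.values())
    return leader, round(scores[leader] - trailing_score, 1)


def context_ratio(windows):
    """Return how many times larger GLM-5.2's window is than Kimi's."""
    return round(windows["GLM-5.2"] / windows["Kimi K2.7-Code"], 1)


leader, lead = swe_bench_lead(SWE_BENCH_PRO)
print(f"{leader} leads SWE-bench Pro by {lead} pts")
print(f"GLM-5.2 context window is about {context_ratio(CONTEXT_TOKENS)}x Kimi's")
```

If either lab revises a figure, changing the two dictionaries is enough to see whether the direction of the comparison still holds.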
Where Kimi K2.7-Code wins
Software engineering. The name is honest about the focus. Moonshot built K2.7-Code as a coding-first model for end-to-end programming and agentic work, and it reports a +21.8% gain on Kimi Code Bench v2 over the previous K2.6 (MarkTechPost). So it is genuinely a strong coder. What we cannot stand behind is the original claim that it beats GLM-5.2 on SWE-bench Pro by 5.4 points. On the sourced figures, GLM-5.2 scores higher there. If coding is your priority, both are contenders, and you should test them on your own codebase rather than trust a single benchmark line.
Price. The original framing had Kimi as the cheaper option at $0.50/$2.00. The verified rates tell a less tidy story: Kimi sits around $0.74 input / $3.50 output (OpenRouter) versus GLM-5.2's roughly $1.40 / $4.40 (WaveSpeed). So Kimi does come out cheaper on real provider pricing, just not at the numbers first stated. At volume, that gap is worth modelling against your actual token usage.
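To model that gap against your own usage, a short calculation is enough. The sketch below uses the per-1M-token rates quoted above; the monthly token volumes are a made-up workload, and the class and function names are hypothetical. Rates vary by provider and change often, so treat the constants as placeholders to overwrite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price per 1M tokens, in dollars as quoted in this article."""
    input_per_million: float
    output_per_million: float


# Rates quoted above (WaveSpeed for GLM-5.2, OpenRouter for Kimi K2.7-Code).
RATES = {
    "GLM-5.2": Rate(1.40, 4.40),
    "Kimi K2.7-Code": Rate(0.74, 3.50),
}


def monthly_cost(rate, input_tokens, output_tokens):
    """Total cost for a month's worth of input and output tokens."""
    input_cost = input_tokens / 1_000_000 * rate.input_per_million
    output_cost = output_tokens / 1_000_000 * rate.output_per_million
    return input_cost + output_cost


# Hypothetical workload: replace with figures from your own usage logs.
INPUT_TOKENS = 100_000_000
OUTPUT_TOKENS = 20_000_000

for name, rate in RATES.items():
    cost = monthly_cost(rate, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{name}: ${cost:,.2f} per month")
```

On that invented workload the totals come to roughly $228 for GLM-5.2 and $144 for Kimi K2.7-Code. The split between input and output tokens matters, so it is worth plugging in your real ratio rather than a guess.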
Where GLM-5.2 wins
Knowledge capacity and context. GLM-5.2 carries 753 billion total parameters (ForkLog). The original article leaned on that as a representational-capacity edge, but the comparison is muddier than it looked, because Kimi K2.7-Code is the larger model on paper at about 1T total / 32B active (Hugging Face). The clearer GLM-5.2 advantage is its 1 million token context window (Pandaily), roughly four times Kimi's. If your work involves feeding large documents, long codebases, or whole knowledge bases into a single prompt, that is a real, verifiable point in GLM-5.2's favour.
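A rough way to see whether the context difference matters for you is to estimate how many tokens your material takes up. The sketch below is only a back-of-envelope check: it assumes about 4 characters per token, which is a common rule of thumb for English text and not either model's real tokenizer, and it reserves a hypothetical 8,000 tokens for the model's reply. The window sizes are the ones quoted above, taken as decimal units.

```python
import math

# Assumption: ~4 characters per token, a rough heuristic for English text.
# Real counts depend on each model's tokenizer and on the content (code,
# Chinese text, and tables can tokenize very differently).
CHARS_PER_TOKEN = 4

# Context windows as quoted in this article, assumed to be decimal units.
CONTEXT_LIMITS = {"GLM-5.2": 1_000_000, "Kimi K2.7-Code": 256_000}

# Hypothetical headroom kept free for the model's answer.
RESERVED_FOR_OUTPUT = 8_000


def estimate_tokens(text):
    """Estimate the token count of a string from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def models_that_fit(token_count):
    """List the models whose window holds the prompt plus reserved output."""
    needed = token_count + RESERVED_FOR_OUTPUT
    return [name for name, limit in CONTEXT_LIMITS.items() if needed <= limit]


# Example: a 2.4-million-character document set (about 600,000 tokens).
sample = "x" * 2_400_000
tokens = estimate_tokens(sample)
print(f"Estimated tokens: {tokens:,}")
print("Fits in:", models_that_fit(tokens))
```

For a real decision, count tokens with the tokenizer each provider documents, since the 4-characters heuristic can be off by a wide margin on source code or non-English text.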
Chinese language depth. Both models are strong in Mandarin. The original claim that GLM-5.2 has a marginal edge on classical Chinese, Chinese legal terminology, and Chinese-specific knowledge benchmarks is unconfirmed; we found no sourced benchmark data comparing the two on those tasks (Artificial Analysis). Take it as an unverified editorial impression, not a measured result.
Verdict
Here is the honest answer. The original take crowned Kimi K2.7-Code as the better all-rounder, and built that case on cheaper pricing, a coding win, and near-equal knowledge. But that case rested on numbers that do not survive contact with the sources. On verified figures, GLM-5.2 leads on SWE-bench Pro and on context length, and Kimi is the larger model rather than the smaller one. Kimi is still cheaper, but not at the stated prices: on the rates we could verify, the gap is about $0.66 per 1M input tokens and $0.90 per 1M output tokens, which is wider than the $0.30 and $0.40 the original table implied.
So we are not declaring a winner. Both are credible open-weights models from serious labs, and the right choice depends on what you actually need: GLM-5.2 if long context and a top SWE-bench Pro result matter most, Kimi K2.7-Code if you want a coding-focused model at the lower price. The benchmark wars between these two are noisy and, in places, disputed even by practitioners (VentureBeat). Run both against your own work before you commit.


