Model Drop: Kimi K2.7 Code

Moonshot updates the world's best open source coding model

Jun 12, 2026

Kimi K2.7 (also branded "K2.7 Code") is Moonshot AI’s next iteration on the popular open-weight coding model. This is the first time Moonshot has put "Code" in the name, and the whole release is pitched as the budget answer to Fable 5's $10 / $50 rate card.

Model: Kimi K2.7 (kimi-k2.7-code on the Moonshot API, moonshotai/Kimi-K2.7-Code on Hugging Face). Often written “Kimi 2.7” or “K2.7 Code.”

Model type: Text in, text out, tuned for agentic code generation and long-horizon software engineering

Ship date: June 12, 2026

Maker: Moonshot AI (Beijing)

Pricing: $0.95 per million input tokens, $4.00 per million output tokens, and $0.19 per million on cache hits, on the Moonshot API. Free weights on Hugging Face for self-hosting.

Available on: The Moonshot API (OpenAI- and Anthropic-SDK compatible, one-line base URL swap), Kimi Code (the terminal and IDE coding agent the K2 family ships through), and Hugging Face for open weights under the Modified MIT license.

Headline benchmarks: Moonshot published K2.7’s gains as deltas over K2.6 rather than clean head-to-head numbers against the frontier: +21.8% on Kimi Code Bench v2 (Moonshot’s in-house coding eval), +11% on Program Bench, and +31.5% on MLS Bench Lite (multi-language support across Python, Rust, and Go). The other headline is efficiency: a claimed ~30% reduction in reasoning-token usage versus K2.6 on the same tasks.

Other info: Mixture-of-experts, 1 trillion total parameters with 32B active per token, the same architecture family as K2.5 and K2.6 (so existing deployments swap weights without reconfiguring the inference stack). 262,144-token (262K) context window, carried over from K2.6, with automatic context compression for sustained long-horizon sessions. License: Modified MIT (free commercial use; visible “Kimi K2.7” credit required for products above ~100M monthly users or ~$20M/month revenue). No system card published at launch. The “30% fewer reasoning tokens” claim is positioned as a fix for “overthinking.”

More details: Kimi platform

What shipped

Moonshot AI dropped Kimi K2.7 this morning as another open-weight, coding-specialist iteration on the K2 family. It’s the same 1T / 32B-active mixture-of-experts architecture as K2.5 and K2.6, the same 262K context window, the same Modified MIT license, and the same one-line base-URL swap to point an existing OpenAI- or Anthropic-SDK client at the Moonshot endpoint. The pitch is narrow and explicit: get most of the frontier’s coding capability at roughly a twelfth of the token price, with a model specifically tuned to stop burning reasoning tokens.

The evidence Moonshot put forward is unusual, because it’s almost entirely relative to its own predecessor rather than to the frontier. The launch numbers are framed as gains over K2.6: +21.8% on Kimi Code Bench v2, +11% on Program Bench, +31.5% on the multi-language MLS Bench Lite, and a ~30% cut in reasoning-token usage on equivalent tasks. Moonshot didn’t publish K2.7’s SWE-Bench Verified or SWE-Bench Pro scores against Fable 5, GPT-5.5, or Opus 4.7 at launch, and “Kimi Code Bench v2” is a benchmark only Moonshot reports. But the efficiency story is real and falsifiable.

What’s new

K2.7 isn’t a new base model. It’s a coding-tuned post-train on the K2 MoE family.

Reasoning-token efficiency as the headline feature. The ~30% reduction in reasoning-token usage over K2.6 is the first time a Moonshot release led with efficiency instead of capability. “Overthinking” (models spending thousands of tokens deliberating on problems that need a few) is a real cost (monetarily and environmentally) and latency tax in production agent loops, and a model that solves the same task with a third fewer thinking tokens is directly cheaper to run (on top of already being cheap per token).
Multi-language coding gains. The +31.5% on MLS Bench Lite is the largest single delta Moonshot published, and it targets the exact weakness independent reviewers flagged on K2.6: solid Python, shakier Rust and Go.
A named coding SKU. Moonshot has always pointed the K2 family at coding, but “K2.7 Code” is the first time the coding specialization is in the model name rather than a console preview flag. It signals Moonshot is going to keep a coding-optimized line distinct from a general-agent line.

How and where to use it

Where it’s available

The Moonshot API via OpenAI- and Anthropic-compatible endpoints (swap the base URL, set model to kimi-k2.7-code)
Kimi Code for the terminal and IDE coding agent
Hugging Face for open weights under the Modified MIT license, served via vLLM or SGLang; Multi-provider routers (OpenRouter, Fireworks, Together, DeepInfra) are not all live day-one but should follow within days given the license, the same way they did for K2.6.

What it’s good at

High-volume, cost-sensitive agentic coding where the reasoning-token cut compounds with the already-low per-token price
Multi-language refactors across Python, Rust, and Go, where the MLS Bench Lite gain is supposed to land. Long-horizon sessions on mid-sized codebases that fit inside 262K tokens
Routine, templated, or boilerplate-heavy work where cache hits at $0.19 per million tokens drop the effective cost into near-free territory
Anywhere open weights, a Modified MIT license, and air-gapped or self-hosted deployment beat “hosted by the best lab”

What it’s bad at / shouldn’t be used for

High-stakes architectural decisions and gnarly merge conflicts, where Claude Fable 5’s 90%-plus coding and analytics scores still hold a real lead and the cost difference is worth eating
Knowledge-heavy or deep-reasoning work
Workloads that need a 1M-token window
Regulated or data-sovereignty-sensitive work where sending prompts to a Beijing-hosted API is a non-starter (self-host the MIT weights, which is the point)
Anything where you need a published system card before deployment

First impressions

Independent, hands-on evaluations don’t exist yet, so the positives lean on launch-day coverage and on carried-over sentiment from the K2 family’s daily drivers, and the negatives lean on the structural questions the launch materials leave open.

The positives

CryptoBriefing framed the release as a pricing event before a capability one:

“The AI coding wars just got a new price leader... The pitch is straightforward: get close to the same performance for a fraction of the cost.”

At $0.95 / $4.00, K2.7 is not a discount on the frontier, it’s a different budget tier, and the $0.19 cache-hit price makes templated and repetitive agent work effectively free.

Value The Markets zeroed in on the efficiency angle the rest of the coverage mostly skipped:

“The 30% decrease in reasoning tokens addresses a prevalent issue in automated coding, known as “overthinking.” Excessive token usage during problem-solving leads to increased latency and elevated API expenses.”

A model that solves the same task with a third fewer reasoning tokens is cheaper and faster on every single call, independent of whatever the capability leaderboards eventually say.

The carried-over signal from the K2 family is the strongest “real-world” data point available on launch day. The Hacker News thread on running K2 as a daily coding driver is full of developers who already switched off Opus:

“Was spending crazy amounts on Claude and it was sporadic at best... Switched to Kimi K2.5 and honestly didn’t think it would do anything other than destroy my code. Crazy enough, it solved the problem I had in less than 60 seconds and I was hooked.”

That sentiment predates 2.7, but the people most likely to adopt K2.7 on day one are the ones already running K2.5 or K2.6 in Kimi Code or OpenCode.

The negatives

The benchmark framing is the structural problem, and it’s the same disease in a different strain from the DeepSeek V4 release. Moonshot published K2.7’s gains as percentage deltas over K2.6 on its own benchmarks (Kimi Code Bench v2, Program Bench, MLS Bench Lite) and did not publish standard SWE-Bench Verified or SWE-Bench Pro numbers against Fable 5, Opus 4.7, or GPT-5.5. “21.8% better than our last model on our own coding eval” is a real improvement and also unfalsifiable by anyone outside Moonshot. Until a third party runs 2.7 on the public benchmarks, the only honest statement is that it’s meaningfully better than K2.6 at coding and cheaper to run.

On the head-to-head that does exist, the predecessor sets the expectation. BenchLM’s provisional comparison of Fable 5 against K2.6 is not close:

“Claude Fable 5 is clearly ahead on the provisional aggregate, 96 to 84... The biggest single separator in this matchup is HLE, where the scores are 64.5% and 34.7%.”

K2.7 is a coding-focused post-train, so it should narrow the coding gap. Nothing in the launch suggests it touches the 30-point HLE gap on hard reasoning and knowledge.

Jake’s take

For high-volume, low-stakes code generation where a wrong answer costs an hour instead of a client, the price-per-capability math is lopsided in Moonshot’s favor (and it isn’t close). K2.7 sharpens exactly the edge I care about: the 30% reasoning-token cut. The thing that makes a coding agent expensive isn’t the per-token price, it’s how many tokens it burns thinking out loud before it even does anything. The multi-language gain matters too, since the one consistent complaint I’ve seen with K2.6 is that it was a Python model wearing a Rust costume.

Unfortunately Moonshot shipped a coding model and declined to show us how it codes against the models we’d actually switch from. “21.8% better than K2.6 on a benchmark only we run” isn’t helpful; the absence of a single SWE-Bench Verified line against it reads as a choice, not an oversight.

Kimi 2.7 is going to save real money and write a lot of boilerplate, and I still won’t trust a benchmark card that grades itself.

Discussion about this post

Ready for more?