Composer 2.5: the model that reveals Cursor’s real ambitions

Synthetic summary

Cursor has released Composer 2.5, a coding model that signals a bigger ambition: controlling its own AI stack, from product to training.

Composer 2.5 is Cursor's new coding model, built directly into the IDE. At first glance, it looks like a regular model update: better at long-running tasks, more reliable with complex instructions, and easier to work with.

The bigger story is Cursor’s direction. Cursor wants to build its own models, optimized for Cursor, its agents, and real development workflows.

The clearest signal came from Dan Perks at Cursor. He said that an internal test redirected almost all of the company's Cursor chats to Composer 2.5 for around two days, and he did not even notice the switch.

For a coding model, that matters. Developers do not need a model that only looks good in a demo. They need a model that understands the repo, edits the right files, follows constraints, and does not break their workflow.

Composer 2.5 is built on Kimi K2.5, Moonshot’s open-source checkpoint, the same base used for Composer 2. Cursor then pushed it further with more training, harder RL environments, targeted textual feedback, and 25 times more synthetic tasks than Composer 2.

The most strategic part is Colossus 2. Elon Musk said Composer 2.5 was “partially trained on Colossus 2”. Cursor also says it is working with SpaceXAI on a much larger model, trained from scratch with 10 times more total compute, using Colossus 2 and its “million H100-equivalents”.

That is what makes Composer 2.5 more interesting than a simple model update. Cursor is starting to connect its IDE, agents, usage data, post-training pipeline, and large-scale training infrastructure.

Benchmarks should still be treated carefully. They do not fully capture what matters inside an IDE: choosing the right files, respecting the project style, avoiding bad commands, and knowing when to stop. Still, they give a useful signal.

Benchmark	Composer 2.5	Opus 4.7	GPT-5.5	Composer 2
Terminal-Bench 2.0	69.3%	69.4%	82.7%	61.7%
SWE-Bench Multilingual	79.8%	80.5%	77.8%	73.7%
CursorBench v3.1	63.2%	64.8% max / 61.6% default	64.3% xhigh / 59.2% default	52.2%