
October 2025 AI Engineering Roundup

I’ve been using Claude Sonnet 4.5 extensively, and I love it. It replaced GPT-5-Thinking-High as my daily driver in Cursor. It’s a little less smart, but it’s significantly faster, which matters a lot. I still turn to GPT-5-Pro for my thorniest problems in coding and research.

I’ve used Cursor’s new Composer model, and it’s extremely fast—noticeably snappier than Sonnet for simple tasks. It does seem a bit less intelligent than Sonnet on complex problems, but when you want something done quickly and the task is straightforward, the speed is impressive. My current model hierarchy for coding is something like:

GPT-5-Pro → Sonnet 4.5 → Cursor Composer

The left side is slower and smarter; the right side is faster and less smart. I usually start with Sonnet, switching to Composer for easy tasks and GPT-5-Pro for hard ones.

Claude Skills are also new this month, and I love them. It’s a really powerful framework for writing simple extensions. I’d love to see it adopted by more tools, like Cursor: it’s elegant and efficient.

OpenAI Company Knowledge looks like a great idea for better agentic search across more connections, but I haven’t tried it. Gemini 2.5 Computer Use, gpt-oss-safeguard, and Claude Haiku 4.5 are all new this month, but I haven’t used them and don’t expect to.

TypeScript overtaking Python as the #1 language on GitHub seems like a big deal.


Model releases

Claude Haiku 4.5 (Oct 15) — $1/$5 per million tokens, one-third Sonnet cost with 2x speed. 200K context, 64K output. Fun in parallel with Cursor 2.0’s agents tab. Run 3 of them for the cost of one Sonnet.

Cursor Composer (Oct 29) — 4x faster than frontier models (250 tokens/sec). MoE architecture with custom MXFP8 quantization. Optimizes for speed over maximum intelligence once models cross a capability threshold.

Windsurf SWE-1.5 Fast (Oct 30) — 40.08% on SWE-Bench Pro at 950 tokens/sec, 13x faster than Sonnet 4.5. Not sure I’ll ever use it.

Sora 2 (Oct 6) — Synchronized audio and visual generation. I don’t use video models.

Gemini 2.5 Computer Use (API preview) — UI automation via AI Studio and Vertex.

gpt-oss-safeguard (Oct 29) — Apache 2.0 licensed safety models (120B/20B params) that interpret custom policies at inference time.

Enterprise products

Cursor 2.0 (Oct 29) — Agent-centric development interface. Runs up to 8 AI agents simultaneously, unified diff views, native browser tool, sandboxed terminals. Parallel agents are fun but expensive. Parallel cheap agents can work surprisingly well, as you only need one of them to stumble into the correct solution.

OpenAI Company Knowledge — One-click connectors to Slack, SharePoint, Google Drive, GitHub. GPT-5 variant with inline citations and permission-aware responses.

Claude Skills (Oct 16) — Markdown files with YAML metadata, more efficient than MCP. MCP is a pain to work with, so I love simpler frameworks for extending model capabilities.
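To make "Markdown files with YAML metadata" concrete: a skill is a folder containing a SKILL.md whose YAML frontmatter (name, description) tells the model when to load it, with instructions in the body. The skill below is invented for illustration; only the frontmatter field names come from Anthropic's documented format:

```markdown
---
name: changelog-writer
description: Drafts a changelog entry from a git diff. Use when the user asks for release notes.
---

# Changelog writer

1. Run `git diff main...HEAD` to collect the changes.
2. Group the changes into Added / Changed / Fixed.
3. Write one bullet per change, in imperative mood, citing file paths.
```

The efficiency win over MCP is that only the short description sits in context until the skill is actually invoked.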

Claude Code for Web (Oct 20) — Asynchronous coding in browsers with CLI sync. I still like to be able to view the diffs of code the AI writes, and Cursor is still the best for this.

OpenAI agent stack — Apps SDK, AgentKit, Evals, RFT for Agents, Codex GA. Maybe I’ll try building an agent with it next month.

Open source

DeepSeek-OCR (Oct 20) — Impressive optical compression: 7-20x text compression through visual representation. 3B params, 97% accuracy at 10x compression, 200K+ pages/day on single A100. Endorsed by Andrej Karpathy. This paper made a splash and I’m still digesting the conclusions. It’s a cool idea.

MiniMax M2 (Oct 27) — 230B-param MoE (10B active), 204K context. Ranks first among open source models, surpasses Claude 3 Opus in math/coding, costs 8% of Claude Sonnet, 2x faster. Apache 2.0. I haven’t tried it.

NVIDIA (Oct 28) — Nemotron family (coding, documents, safety), Cosmos (30-sec world simulations), Isaac GR00T N1.6 (humanoid robotics), Clara (biomedical). 650+ models, 250+ datasets on Hugging Face.

Chinese models — Qwen VL (4B-235B), GLM-4.6 (near Sonnet 4.5), Ling-1T (trillion-param). I like the Qwen models but haven’t tried the new ones. GLM-4.6 has strong vibes based on online chatter but I haven’t tried it.

Emu3.5 (Oct 30) — Multimodal world model, 10T+ tokens. Discrete Diffusion Adaptation: 20x faster inference.

Research

KV cache optimization — Expected Attention (training-free compression), Elastic-Cache (8.7x speedup on GSM8K, 45.1x on longer sequences). Critical: 70B models need ~320GB for a 1M-token cache. KV caches are essential for high performance inference.
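The ~320GB figure is easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes commonly published Llama-style 70B settings (80 layers, 8 KV heads via grouped-query attention, head_dim 128, fp16); these config values are assumptions, not from any specific paper above:

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 80,     # assumed 70B-class config
                   n_kv_heads: int = 8,    # GQA: 8 KV heads, not 64
                   head_dim: int = 128,
                   dtype_bytes: int = 2) -> int:
    """Bytes of KV cache for seq_len tokens: 2x for K and V,
    stored per layer, per KV head, per head dimension."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

print(kv_cache_bytes(1))                    # ~320 KB per token
print(kv_cache_bytes(1_000_000) / 1e9)      # ~328 GB for a 1M-token context
```

Note how much GQA is already doing here: with full multi-head attention (64 KV heads) the same context would need roughly 2.6TB, which is why cache compression schemes like Expected Attention still matter on top of it.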

RAG sufficient context (Google, Oct 29) — RAG increases hallucinations with insufficient context (Gemma: 10.2% → 66.1%). LLM autorater deployed in Vertex AI RAG Re-Ranker, 10% accuracy improvement.

Meta inference optimization (Oct 17) — Direct Data Access (O(N) → O(1) latency), Ring Attention (1M tokens <1 min on H100). I’ll start believing in Meta AI if Llama 5 is any good.

Developer tools

Google Jules — Asynchronous coding agents for backlog bugs, GitHub integration, auto PRs. I probably won’t use it over Cursor, but it’s probably nice for people in the Google ecosystem.

Infrastructure

Model Context Protocol — Security alert: 6 CVEs, 43% of public MCP servers have command injection vulnerabilities (558K+ installations affected). MCP sometimes seems like more trouble than it’s worth to me.
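The command-injection class behind most of those findings is the old one: a tool handler interpolating model-controlled arguments into a shell string. This is a generic sketch, not code from any audited server; `run_grep_*` are hypothetical tool handlers:

```python
import subprocess

def run_grep_unsafe(pattern: str, path: str) -> str:
    # BAD: a pattern like '"; rm -rf ~; "' escapes into the shell,
    # because the whole string is parsed by /bin/sh
    cmd = f"grep {pattern} {path}"
    return subprocess.run(cmd, shell=True,
                          capture_output=True, text=True).stdout

def run_grep_safe(pattern: str, path: str) -> str:
    # GOOD: argv list with no shell; arguments reach grep verbatim,
    # and "--" stops patterns from being parsed as grep flags
    return subprocess.run(["grep", "--", pattern, path],
                          capture_output=True, text=True).stdout
```

In an MCP server the stakes are worse than usual, since the "user input" is whatever a model decided to pass, possibly under prompt injection from a document it read.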

TypeScript overtook Python as the #1 GitHub language (Aug 2025 data), 1M+ contributors, +66% YoY. 4.3M+ AI repos.

Financing

Poolside — $2B round, Nvidia $1B commitment at $12B pre-money (Oct 30).

Mercor — $350M Series C at $10B valuation, a 5x increase from $2B earlier in 2025.

Google-Anthropic (Oct 23) — Tens of billions for up to 1M TPUs, 1+ gigawatt compute by 2026.

Microsoft-OpenAI (Oct 28) — $250B Azure commitment from OpenAI, IP rights to 2032, path for open-weight models.

Amazon Project Rainier — Trainium2 cluster, Anthropic using >1M chips by year-end.

Notable rounds: Fireworks AI ($250M at $4B), Harvey ($150M at $8B), Legora ($150M at $1.8B).