
November 2025 AI Engineering Roundup

Christmas came early! Cursor Composer, GPT 5.1, Gemini 3 Pro, Opus 4.5.

  • Use Opus 4.5 by default
  • Cursor Composer is remarkably fast, which I appreciate

Opus 4.5 is the standout, and you should be using it. I love its token efficiency: they finally trained a model to minimize its output! AI models can generate far more tokens than I can read, so I get a lot of value out of models that minimize output tokens. There’s also a big per-token price cut, which compounds with its efficiency. Fewer tokens = lower cost.
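A quick back-of-the-envelope sketch of why the price cut and token efficiency compound multiplicatively. The 66% price cut is from the release below; the $75/MTok baseline and the 40% output-token reduction are assumed illustrative figures, not measured values:

```python
# Hypothetical illustration: a per-token price cut compounds with
# token efficiency, because cost = tokens * price.
old_price_per_mtok = 75.0                              # assumed old output price, $/MTok
new_price_per_mtok = old_price_per_mtok * (1 - 0.66)   # 66% price cut

old_tokens = 1_000_000                                 # tokens a verbose model might emit
new_tokens = old_tokens * (1 - 0.40)                   # assumed 40% fewer output tokens

old_cost = old_tokens / 1e6 * old_price_per_mtok
new_cost = new_tokens / 1e6 * new_price_per_mtok
print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}")
print(f"combined savings: {1 - new_cost / old_cost:.0%}")
```

Under these assumptions, the combined savings come to about 80% rather than the headline 66%, because the two discounts multiply rather than add.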

GPT 5.1 doesn’t impress me. I don’t like the increased sycophancy. GPT 5.1 Pro is still probably the smartest overall model for really difficult problems, but Opus 4.5 and Gemini 3 give it a real run for its money.

I didn’t actually get a chance to use Gemini 3 Pro before Opus 4.5 launched and sucked all the oxygen out of the room. Gemini 3 Pro seems very smart, and well integrated with Google search, but Gemini models were always a bit too loquacious for me. Maybe Gemini 4 will be trained for succinct responses.

Finally, I love Cursor Composer. It is fast! I spend a lot of time in Cursor waiting for the model to write code, and Composer really cuts down that time. It feels at least as smart as Sonnet 3.7 or 4.0, but it’s also been trained for succinct responses. Shorter responses are faster responses! Composer has sped up a lot of my daily work in Cursor, but I still reach for Opus 4.5 (or GPT 5 Pro) for difficult tasks.

Nano Banana Pro is amazing. I haven’t used it myself, but I’ve seen its outputs, and they are impressive. It’s by far the best image model out there.

Grok 4.1 worships Elon. I don’t trust xAI models and don’t use Grok for anything.

Model releases

Claude Opus 4.5 (Nov 24) — 66% price cut (still expensive). Best coding model. Concise.

Nano Banana Pro (Nov 20) — Google’s image model on Gemini 3 Pro. Much lower text error rate. Up to 4K resolution.

GPT-5.1-Codex-Max (Nov 19) — Scores well on benchmarks, but I don’t like it in practice; I prefer GPT-5.1 Thinking or Pro. Native compaction for multi-hour sessions. First Codex model trained to operate natively in Windows environments.

Gemini 3 Pro (Nov 18) — “A vast intelligence with no spine.” Per Zvi, it hallucinates at a high rate when it doesn’t know the answer.

Grok 4.1 (Nov 17) — Silent rollout Nov 1-14. Fewer hallucinations than Grok 4. Severe sycophancy toward Musk.

GPT-5.1 (Nov 12) — Adaptive reasoning. Warmer tone. Sycophancy returned.

Kimi K2 Thinking (Nov 6) — 1T MoE open weights. Cheap: $0.60/$2.50 pricing. 200-300 sequential tool calls without human intervention. Weak safety guardrails—will provide chemical/biological weapons synthesis instructions when asked.

Cursor Composer (Oct 29) — Cursor’s in-house MoE model. 4x faster than comparable frontier models. Trained with RL on codebase-wide semantic search.

Enterprise products

Google Antigravity — Agent-first IDE using Gemini 3 Pro. Free preview. Worse than Cursor.

Claude for Excel — Spreadsheet agent. Max/Team/Enterprise. I love this.

Claude for Chrome — Max plan. I don’t have a personal use case for browser agents.

Perplexity Comet — I don’t have a personal use case for browser agents.

ChatGPT for Teachers — I’d be a lot more excited about Claude for Education.

Open source

Olmo 3 (Nov 19) — First fully open 32B reasoning model. Apache-2.0.

SAM 3 (Nov 19) — Segment anything: unified detection/tracking/3D reconstruction in 30ms.

FLUX.2 (Nov 24) — Multi-reference image generation. Runs on 24GB VRAM.

Research

Epoch AI — A single “general capability” dimension explains 91% of the variance across 39 benchmarks. Second component: “Claudiness” (strong at agentic tasks, weaker at multimodal/math).

“The Attacker Moves Second” (Nov 2) — OpenAI/Anthropic/DeepMind joint paper: >90% attack success against 12 published prompt injection defenses.

LLMs are Injective (Oct 17) — Transformers proven mathematically invertible. SipIt algorithm reconstructs exact prompts from hidden states in linear time.

BadThink (Nov 13) — First backdoor attack on CoT efficiency. Triggers 17x longer reasoning traces while preserving correct outputs.

Stop Overthinking Survey — Reasoning models generate 7-10x more tokens than needed for simple tasks. System 1/System 2 mismatch.

Ilya Sutskever: Age of Scaling Over (Nov) — Pre-training scaling exhausted. RL and value functions are the new frontier.

First documented AI-executed cyberattack — Anthropic disrupted Chinese state-sponsored espionage campaign. Claude Code performed 80-90% of the campaign with only 10-20% human involvement. Targeted tech, finance, chemicals, government.

Developer tools

Claude Code Desktop — Plan Mode, multi-session support.

Cursor 2.0/2.1 — In-editor reviews, instant grep, multi-agent interface with git worktrees.

vLLM + Unsloth — FP8 GRPO for 1.4× faster RL inference, 60% less VRAM.

MCP spec freeze (Nov 14) — 2025-11-25 release with 17 SEPs.

Gradio 6 — Full-app builds with iOS/Android support.

Infrastructure

Anthropic — $50B US datacenter commitment (Texas, New York).

Microsoft/Nvidia → Anthropic — $10B (Nvidia) + $5B (Microsoft). Claude now on all three major clouds.

Google Texas — $40B AI infrastructure.

Stargate Michigan — Multi-billion dollar, >1 GW cluster.

Brookfield — $100B AI infrastructure program.

Financing

Cursor (Nov 13) — $2.3B Series D at $29.3B valuation. >$1B ARR.

Sakana AI — $135M Series B at $2.63B.

Luma AI — $900M Series C for 2GW supercluster.