December 2025 AI Engineering Roundup
The big story this month is that there isn’t one. Opus 4.5 dropped in November and nothing from December comes close. Opus is my daily driver for coding, writing, and chat. I barely use any other models. If you’re not using Opus, you’re missing out.
GPT-5.2 disappointed me. The benchmarks looked great, but in practice it’s slow and the responses feel constrained. The 40% price increase doesn’t help. When I compare models in parallel in Cursor, Opus has usually solved the problem before GPT-5.2 thinking has even started to output. GPT-5.2 and Gemini 3 Pro are both very smart models, but the vibes are getting bad. Gemini 3 Flash is very, very cheap: I could see myself using it for high-volume work.
Open weight models had a great month. DeepSeek V3.2 and GLM-4.7 are genuinely competitive at very low prices, but I honestly don’t use them over Opus 4.5. It’s worth paying the extra money for the best intelligence, especially for coding. OpenRouter’s State of AI Report confirms this.
The Linux Foundation launched a neutral home for agent standards, with Anthropic donating MCP and OpenAI donating AGENTS.md. I hope Claude Skills are donated next.
I am very excited for Astral’s typechecker. ty makes a nice addition to uv and ruff.
Selling H200s to China is a bad idea, but we’re doing it anyway.
Model Releases
GPT-5.2-Codex (2025-12-18) — OpenAI’s agentic coding model. Context compaction for long-horizon work, large refactors/migrations, Windows support, cybersecurity capabilities. 64% Terminal-Bench 2.0, 0.8% over GPT-5.2 on SWE-Bench Pro. In practice, it doesn’t seem to me like a large lift over GPT-5.2.
Gemini 3 Flash (2025-12-17) — Google’s fast model. $0.50/$3.00 per 1M tokens (1/4 Pro price). 1M context, 65K output, Jan 2025 cutoff. ~150 tok/s. 1T+ tokens/day. Gemini Flash is probably the best fast/cheap model out there.
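A quick back-of-the-envelope sketch of what that pricing means in practice. The Flash prices are from the entry above; the implied Pro prices are derived from the “1/4 Pro price” claim, and the sample request sizes are made up:

```python
# Gemini 3 Flash list prices from the entry above, USD per 1M tokens
FLASH_INPUT, FLASH_OUTPUT = 0.50, 3.00

# If Flash is 1/4 the Pro price, the implied Pro prices are:
pro_input, pro_output = FLASH_INPUT * 4, FLASH_OUTPUT * 4  # 2.00 / 12.00

def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in USD for one request at per-1M-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A hypothetical 100K-in / 5K-out request on Flash:
cost = request_cost(100_000, 5_000, FLASH_INPUT, FLASH_OUTPUT)
print(f"${cost:.3f}")  # $0.065
```

At fractions of a cent per big-context request, the “high-volume work” use case writes itself.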
OpenAI Image 1.5 (2025-12-16) — OpenAI image model. Pretty good, but nano banana from Google is better.
GPT-5.2 (2025-12-11) — OpenAI’s update. Pricing: $1.75/$14 per million tokens (40% higher than GPT-5.1), Pro $21/$168, Aug 2025 cutoff. Qualitatively, the benchmarks don’t reflect its real-world capabilities; it’s best suited to cold analysis. Responses are slow and the vibes are poor compared to Claude Opus 4.5. I still sometimes use GPT-5.2 Pro: it does have better attention to detail than Opus, but it is slow and expensive.
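The 40% figure checks out arithmetically. A tiny sanity check, assuming GPT-5.1 listed at $1.25/$10 per 1M tokens (the same figures quoted for GPT-5.1-Codex-Max below; I’m treating that as the old baseline):

```python
# Assumed GPT-5.1 baseline vs. GPT-5.2 prices from the entry above,
# USD per 1M tokens.
old = {"input": 1.25, "output": 10.00}
new = {"input": 1.75, "output": 14.00}

for kind in ("input", "output"):
    increase = new[kind] / old[kind] - 1
    print(f"{kind}: +{increase:.0%}")
# input: +40%
# output: +40%
```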
Gemini 3 Deep Think (2025-12-04) — Google AI Ultra subscribers only. Google’s extended thinking with parallel reasoning. Gold-medal IMO/ICPC. In practice I just use GPT-5.2 Pro, but for extremely dense analysis problems I might try Gemini Deep Think too.
GPT-5.1-Codex-Max (2025-12-04) — OpenAI agentic coding model. $1.25/$10 per million tokens. Integrations in VS Code, Cursor, Windsurf, Linear. OpenAI’s model proliferation continues: I have trouble deciding when to use GPT-5.2 Thinking vs GPT-5.2-Codex vs GPT-5.1-Codex-Max. In practice I just use Opus instead.
Enterprise Products
Disney-OpenAI Partnership (2025-12-11) — Disney invests $1B in OpenAI. 3-year license with year-one exclusivity to generate video with 200+ Disney/Pixar/Marvel/Star Wars characters via Sora. I don’t use Sora or other video generation models.
Google Managed MCP Servers (2025-12-10) — Fully managed remote MCP servers for Maps, BigQuery, Compute Engine, Kubernetes. Protected by Cloud IAM and Model Armor firewall. Works with Gemini CLI, AI Studio, Claude, ChatGPT. I like managed MCP servers and hope more companies move in this direction.
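Part of why I like managed MCP servers: on the client side, a remote server is typically just a config entry. A hypothetical sketch following the common `mcpServers` JSON convention many clients use; the server name and endpoint URL here are made up, and the exact config shape varies by client:

```json
{
  "mcpServers": {
    "google-maps": {
      "url": "https://example.googleapis.com/mcp"
    }
  }
}
```

No local process to run, no credentials file to babysit; auth and patching are the provider’s problem.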
Anthropic-Accenture Partnership (2025-12-09) — Multi-year partnership forming Accenture Anthropic Business Group. 30,000 professionals trained on Claude. First product targets CIOs for AI-powered software development.
Anthropic-Snowflake Partnership (2025-12-03) — $200M multi-year strategic partnership. Claude available to 12,600+ Snowflake customers. Claude Sonnet 4.5 powers Snowflake Intelligence with >90% accuracy on text-to-SQL.
Claude Code $1B Revenue (2025-12-02) — Claude Code reaches $1B run-rate revenue 6 months after May launch. Anthropic acquires Bun JavaScript runtime. I like Bun a lot and I like Anthropic a lot, so I hope they do good things together.
Amazon Nova 2.0 (2025-12-02) — Amazon’s new model. I haven’t used it and don’t plan to.
Open Source
GLM-4.7 (2025-12-27) — Open coding model with Interleaved/Preserved/Turn-level Thinking. Top open weights model on Artificial Analysis. 63 tok/s on 4x Mac M3 Ultras. The next time I need a local model, this is the first one I’ll try.
MiniMax-M2.1 (2025-12-23) — Open coding model. Looks interesting, but I would likely reach for GLM-4.7 first.
Gemma Scope 2 (2025-12-19) — Google’s interpretability toolkit for their open Gemma models. Seems academically interesting.
Nvidia Nemotron 3 (2025-12-15) — Open weights/data/recipes. 24GB VRAM—in theory it fits on consumer GPUs. Another interesting local model to try out.
Devstral 2 (2025-12-09) — Mistral’s Claude Code competitor. Cool idea, and it claims to be close to Sonnet 4.5, but that doesn’t cut it for me. I need Opus 4.5-level intelligence! That said, I do see value in coding models you can run locally with an Apache 2.0 license.
Mistral 3 (2025-12-02) — Mistral is back! I like open models in theory, but don’t use them in practice. To me the big story with Mistral 3 is that it uses a variant of the DeepSeek 3 architecture.
DeepSeek V3.2 (2025-12-01) — 685B MoE open-weights model under MIT license. DeepSeek Sparse Attention for efficient long-context. V3.2-Speciale achieves gold-medal results on IMO 2025 and IOI. Very cheap ($0.28/$0.42 per 1M tokens) but slow (~30-40 tok/s). Interesting for the architecture innovations.
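For intuition on why sparse attention helps with long context: score every key cheaply, then run full attention only over the top-k survivors. A toy single-query NumPy sketch of that idea, not DeepSeek’s actual implementation (which uses a learned indexer to pick the tokens rather than the raw scores):

```python
import numpy as np

def topk_sparse_attention(q, K, V, k):
    """Single-query attention restricted to the k highest-scoring keys."""
    scores = K @ q / np.sqrt(len(q))   # (seq_len,) scaled dot-product scores
    keep = np.argsort(scores)[-k:]     # indices of the top-k keys
    w = np.exp(scores[keep] - scores[keep].max())
    w /= w.sum()                       # softmax over the survivors only
    return w @ V[keep]

rng = np.random.default_rng(0)
q = rng.normal(size=16)
K, V = rng.normal(size=(128, 16)), rng.normal(size=(128, 16))

dense = topk_sparse_attention(q, K, V, 128)  # k == seq_len: exactly dense attention
sparse = topk_sparse_attention(q, K, V, 16)  # attend to only 1/8 of the context
```

The expensive part of dense attention is the value mixing over the whole sequence; here that cost drops from O(seq_len) to O(k) per query, at the price of scoring (or, in DeepSeek’s case, cheaply indexing) all keys first.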
Research
Activation Oracles (2025-12-19) — Training LLMs to decode their own neural activations and answer questions about them in natural language. Uncovers misaligned goals in fine-tuned models without specific training for it. I love this idea.
METR Claude Opus 4.5 Evaluation (2025-12-19) — 50% time horizon of 4h49m, significantly exceeding previous models. 80% time horizon of 27 minutes. METR says Opus 4.5 essentially saturated their benchmark and they will need to revise their methodology going forward. Full Post.
OpenAI FrontierScience (2025-12-16) — New benchmark to evaluate LLMs on scientific research tasks.
OpenRouter State of AI Report (2025-12-07) — Empirical study of 100T tokens shows reasoning models now >50% of usage. Claude owns ~60% of coding workloads with 20K+ token prompts. Programming drives >50% of paid traffic. Users prioritize quality over price.
Natural Emergent Misalignment (2025-11-21) — This is from November, but I missed it in the last roundup. It’s an important finding: “Don’t tell your model you’re rewarding it for A and then reward it for B, or it will learn you’re its adversary.” Anthropic Paper.
Developer Tools
Windsurf Wave 13 (2025-12-27) — Parallel agents, Git worktrees, multi-pane Cascade, dedicated terminal. I use Cursor instead, but competition is good.
Agent Skills (2025-12-18) — Anthropic releases “skills” as open standard. Adopted by OpenAI, Cursor, GitHub, VS Code, etc. I love skills and am happy it’s now standard. They basically allow models to dynamically load rules and tools based on the task at hand. This should go in the new Linux Foundation agent standards.
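Concretely, a skill is just a folder containing a SKILL.md: YAML frontmatter that tells the model when to load it, followed by the instructions themselves. A minimal hypothetical example; the skill name and contents here are made up for illustration:

```markdown
---
name: changelog-writer
description: Use when asked to draft or update a CHANGELOG entry for a release.
---

# Changelog Writer

1. Read the commit messages since the last tag.
2. Group changes under Added / Changed / Fixed.
3. Keep each bullet to one line and link the relevant PR numbers.
```

The frontmatter description is all the model sees up front; the body is only loaded when the task matches, which is what keeps skills cheap to carry around.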
ty (2025-12-16) — Astral’s type checker. I love Astral’s other products (uv and ruff) and will be using this going forward. uv in particular fixed Python for me: if you’re not using it, you should be! Speed matters, and ty is a lot faster than the alternatives.
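To make the “type checker” point concrete, here is the kind of annotation bug a checker like ty flags before runtime. A hypothetical snippet; note that Python happily runs it anyway, only the checker complains:

```python
def mean(values: list[int]) -> int:
    # Bug a type checker catches: true division always yields a float,
    # so the declared `-> int` return type is wrong.
    return sum(values) / len(values)

print(mean([1, 2, 3]))  # runs fine at runtime and prints 2.0
```

In practice you would run something like `ty check` (or `uvx ty check`) over the project and fix the annotation to `-> float`.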
Claude Code Updates (2025-12-15) — Syntax highlighting, ghost text suggestions, plugins marketplace.
Agentic AI Foundation (AAIF) (2025-12-09) — Linux Foundation creates neutral home for AI agent standards. Anthropic donates MCP, OpenAI donates AGENTS.md, Block donates Goose. Platinum members: AWS, Google, Microsoft, Bloomberg, Cloudflare.
Infrastructure
Data Center Deals Record $61B (2025-12-19) — Global data center dealmaking hit a record $61B in 2025. Debt issuance nearly doubled to $182B. Meta raised $62B debt since 2022. 100+ transactions in the first 11 months.
Hut 8 and Fluidstack Anthropic Data Center (2025-12-17) — Hut 8 and Fluidstack to build AI data center for Anthropic in Louisiana.
NVIDIA SLURM Acquisition (2025-12-15) — NVIDIA acquired SchedMD (SLURM workload scheduler).
NY RAISE Act Signed (2025-12-10) — New York Governor Hochul signed RAISE Act establishing AI safety requirements similar to California’s SB 53. Passed despite opposition lobbying from AI industry.
H200 Chip Sales to China (2025-12-08) — Trump admin approves Nvidia H200 sales to China with 25% tax. H200 is 6x more powerful than H20. Pushed by David Sacks. Substantially erodes US compute advantage. This is a bad idea.
Anthropic TPUv7 Purchase (2025-11-28) — Anthropic purchasing >1GW of Google TPUs. I missed this last month, but it’s important.
Financing
Meta acquires Manus AI (2025-12-29) — Meta Superintelligence Labs acquired Manus for ~$4B at $100M ARR, 9 months after launch. I’ve never used Manus and likely never will.
Nvidia acquires Groq’s assets (2025-12-24) — $20B non-exclusive licensing deal. Groq CEO/president join Nvidia. GroqCloud remains separate. Nvidia’s largest acquisition (prev: Mellanox, $7B). It’s hard for me to analyze this one: it looks like an acquihire?
Anthropic IPO Preparation (2025-12-20) — Hired IPO lawyers, possibly targeting 2026 at $300B+ valuation.
Harvey Series F (2025-12-04) — $160M Series F at $8B valuation, serving 700+ law firms in 58 countries.
Black Forest Labs Series B (2025-12-01) — $300M Series B for FLUX image generation development. FLUX models are solid for image gen and can run locally, but nano banana from Google (closed source) is still the best.