March 2026 AI Engineering Roundup
My daily driver is still Opus 4.6 via Claude Code, but I find I’m often using GPT 5.4 thinking xhigh to check its work. Opus can be a little lazy, and GPT 5.4 is good at getting the details right. It’s also great at assembling facts and giving rundowns.
Anthropic shipped more product in one week than most companies ship in a quarter. Auto Mode, Cowork, 1M context GA, Marketplace. I love this.
Judge Lin’s 43-page ruling in Anthropic vs. Department of War was devastating. The government’s case was retaliation dressed up as national security, and everyone knew it.
Meanwhile Cline and LiteLLM both got supply-chain-attacked in the same month.
Model Releases
Anthropic Capybara (leaked) (2026-03-27) — Fortune corroborated a leak revealing Capybara, a new tier above Opus with better coding, reasoning, and cybersecurity scores. Rollout constrained by cost and safety testing. If real, Anthropic is about to widen the gap again. I’m excited.
GPT-5.4 Mini and Nano (2026-03-17) — OpenAI’s most capable small models. Mini ($0.75/$4.50 per 1M tokens) is 2x faster than GPT-5 mini with 400K context. Nano ($0.20/$1.25) is cheaper than Gemini 3.1 Flash-Lite and benchmarks ahead of Haiku 4.5 on agentic tasks. The new defaults for subagent fan-out. I haven’t tried them yet but the pricing makes Nano interesting for batch jobs.
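As a rough illustration of why the pricing matters for batch jobs, the sketch below estimates cost from the per-1M-token prices listed above. It reads each price pair as input/output, which is an assumption, and the workload (10,000 requests at 2,000 input and 500 output tokens each) is invented for the example rather than measured.

```python
# Prices in USD per 1M tokens, taken from the entry above and read as
# (input, output). That reading is an assumption.
PRICES = {
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}


def batch_cost(model, requests, input_tokens, output_tokens):
    """Estimated USD cost of `requests` calls with fixed token counts per call."""
    input_price, output_price = PRICES[model]
    total_in = requests * input_tokens
    total_out = requests * output_tokens
    return (total_in * input_price + total_out * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, 2,000 tokens in, 500 tokens out.
    for model in PRICES:
        print(f"{model}: ${batch_cost(model, 10_000, 2_000, 500):.2f}")
```

Under these assumed numbers, Mini comes to about $37.50 and Nano to about $10.25 for the same job.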
Claude Opus 4.6 1M Context GA (2026-03-13) — Anthropic makes the 1M-token context the default for Opus 4.6 on Max/Team/Enterprise. It removed the API surcharge, dropped the beta header requirement, and expanded limits to 600 images/PDF pages per request. 78.3% on MRCR v2 at 1M tokens – a new long-context high-water mark. I used to hit compaction in every session. 1M by default is very nice.
Grok 4.20 Beta (2026-03-12) — xAI’s reasoning model offers 2M context at $2/$6 per 1M tokens. I don’t (and won’t) use xAI models. They’re out of the frontier model race, as far as I am concerned.
GPT-5.4 (2026-03-05) — OpenAI’s best model in a year and the first to unify reasoning, coding, and native computer use. 75% OSWorld-Verified (above human baseline), 82.7% BrowseComp, 1M context, 74 tok/s. $2.50/$15 per 1M tokens. Widely praised for improved personality and knowledge work. GPT-5.4 Pro launched same day with extreme reasoning – solved the first FrontierMath Open Problem. GPT-5.4 hit 5T tokens/day and $1B annualized net-new revenue in its first week. 5.4 (especially xhigh or pro) is a very powerful model, and I regularly use it alongside Opus 4.6.
Gemini 3.1 Pro (2026-03-04) — Google’s new frontier model. Cheaper than Sonnet. But user reports say post-training is still weak – slop language, empty responses, poor instruction following. I used it for a while alongside Opus 4.6, but have replaced it with GPT 5.4. Opus 4.6/GPT 5.4 is the Pareto frontier.
Enterprise Products
Anthropic vs. Department of War (2026-03-26) — The Pentagon designated Anthropic a supply-chain risk after the company refused exceptions for mass surveillance and autonomous weapons – unprecedented against a US tech firm. Federal Judge Rita Lin blocked it in a 43-page ruling, calling it ‘classic illegal First Amendment retaliation.’ Microsoft confirmed Anthropic products remain available to customers despite the blacklist.
Meta-Manus Acquisition Blocked by China (2026-03-25) — China bars Manus co-founders from leaving the country, blocking Meta’s $2B acquisition of the agentic AI startup. Chinese authorities questioned executives over foreign investment reporting rules. State control over AI companies is getting real.
Microsoft Execuhires AI2 Leadership (2026-03-24) — Microsoft poached AI2 leadership including Ali Farhadi, Hanna Hajishirzi, and Ranjay Krishna for Microsoft Superintelligence. Open research institutions cannot compete with hyperscaler salaries. The OLMo project’s future is unclear.
Sora Shut Down (2026-03-24) — OpenAI kills the Sora video platform, reportedly losing $500K/day. First casualty of Fidji Simo’s strategic cleanup. Compute reallocated to coding and enterprise products. The Google+ of generative video.
Claude Cowork (2026-03-17) — Anthropic’s biggest product week ever. Cowork is a VM-based knowledge work agent built on Claude Code with full mouse/keyboard/screen control, browser automation, and markdown-based reusable skills. Dispatch enables remote control from your phone. Channels adds Slack/Telegram/Discord integration. Built in 10 days from existing internal primitives. Moving from narrow model endpoint toward super-app. I use Cowork a lot: it’s great.
Google Gemini Embedding 2 (2026-03-12) — Google’s first natively multimodal embedding model maps text, images, audio, video, and PDFs into one vector space. 100+ languages, flexible output dimensions via Matryoshka Representation Learning. Available via Gemini API and Vertex AI. I haven’t tried it yet.
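Matryoshka-style embeddings are trained so that a prefix of the vector is itself a usable embedding, which is what makes flexible output dimensions possible. A minimal sketch of the client-side idea, assuming you already have a full-length vector: keep the first k dimensions and re-normalize to unit length. The function name and toy vector are illustrative and are not part of the Gemini API.

```python
import math


def truncate_embedding(vector, k):
    """Keep the first k dimensions and L2-normalize the result."""
    if not 0 < k <= len(vector):
        raise ValueError("k must be between 1 and len(vector)")
    head = vector[:k]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0, 0.5]  # toy stand-in for a real embedding
    print(truncate_embedding(full, 2))
```

Whether a given API returns shortened vectors already normalized varies, so re-normalizing is a safe default before cosine or dot-product comparisons.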
Perplexity Personal Computer (2026-03-11) — Perplexity announces an always-on local/cloud hybrid agent running on Mac mini that works across local files, apps, and sessions with remote control. I haven’t tried it.
OpenAI Acquires Promptfoo (2026-03-09) — OpenAI acquires Promptfoo, the most popular open-source LLM red-teaming and eval platform. AI labs are consolidating the security tooling ecosystem.
Claude Marketplace (2026-03-06) — Anthropic launches an app marketplace where users pay for third-party Claude-powered solutions using existing subscription tokens. Initial partners include GitLab, Harvey, Lovable, Replit, Rogo, and Snowflake.
Open Source
Cohere Transcribe (2026-03-26) — Cohere’s 2B Apache-2.0 speech model tops HuggingFace Open ASR leaderboard with 5.42 WER across 14 languages. Processes 33 hours of audio in 12 minutes on one A100 (~60x real-time). Contributed encoder-decoder optimizations to vLLM yielding 2x throughput gains. Cohere switching from non-commercial to Apache 2.0 is a big deal. I haven’t tried it, but I love local speech-to-text models and plan to check it out.
Voxtral TTS (2026-03-26) — Mistral’s 3.4B open-weights TTS model achieves 68.4% win rate vs ElevenLabs Flash v2.5 across 9 languages. ~90ms time-to-first-audio. Novel architecture combines autoregressive semantic tokens with flow matching for acoustic tokens. Supported by vLLM for serving.
Xiaomi MiMo-V2-Pro (Hunter Alpha) (2026-03-18) — The anonymous ‘Hunter Alpha’ dominating OpenRouter was Xiaomi’s MiMo-V2-Pro, a 1T-param MoE (42B active). 78% SWE-bench Verified vs Opus 4.6’s 80.8%. 1M context. $1/$3 per MTok. Built by former DeepSeek researcher Luo Fuli. Xiaomi stock jumped 5.8% on reveal.
NVIDIA Nemotron 3 Super (2026-03-11) — NVIDIA’s 120B total / 12B active open-weights model with Mamba-Transformer hybrid LatentMoE architecture and 1M context. First open model pre-trained with NVFP4 quantization. Up to 2.2x faster inference than comparable models in FP4, 484 tok/s. KV-cache ~3x lighter than Qwen3.5-122B. Fully open: weights, data, recipe, and tech report. I haven’t tried it yet.
Qwen 3.5 (2026-03-03) — Alibaba’s new open-weights family from 0.8B to 397B-A17B MoE, all multimodal with reasoning. Uses hybrid GDN+attention architecture. 27B variant uses 75% linear attention, runs on a single consumer GPU. 262K native context extendable to 1M. The best Qwen yet – released the same week lead researcher Junyang Lin and multiple core contributors resigned amid an org restructuring. Alibaba CEO attended an emergency all-hands.
Research
Claude Code Source Leak (2026-03-31) — Anthropic accidentally shipped .map files in Claude Code v2.1.88 on npm, exposing 512K lines of TypeScript source. Revealed KAIROS (autonomous daemon mode), Undercover Mode (stealth OSS contributions), and anti-distillation fake tool injection. Anthropic confirmed human error.
LiteLLM Supply Chain Attack (2026-03-24) — LiteLLM 1.82.7-1.82.8 on PyPI compromised via the CEO’s hacked GitHub account. Malicious payload exfiltrated cloud creds, SSH keys, K8s configs, and wallet files. Transitive risk hit packages like DSPy. Karpathy noted the entire filesystem becomes attack surface in agentic workflows. Two major AI tooling supply chain attacks in one month.
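A quick way to check whether an environment has one of the affected releases installed is to compare the installed version against the range named above. This is a minimal sketch that assumes the affected set is exactly 1.82.7 and 1.82.8, as this entry states. It only inspects the version string and does not detect whether a payload actually ran.

```python
import re
from importlib import metadata

# Affected releases as listed in the entry above (assumed to be exhaustive).
AFFECTED = {(1, 82, 7), (1, 82, 8)}
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def is_affected(version_text):
    """True if the leading X.Y.Z of the version string is in AFFECTED."""
    match = _VERSION_RE.match(version_text)
    if match is None:
        return False
    return tuple(int(group) for group in match.groups()) in AFFECTED


def check_installed(package="litellm"):
    """Describe whether the installed release of `package` is in AFFECTED."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    status = "AFFECTED" if is_affected(installed) else "not in the affected range"
    return f"{package} {installed}: {status}"


if __name__ == "__main__":
    print(check_installed())
```

Because the risk was transitive, it is worth running a check like this in every virtual environment and container image, not only in projects that list LiteLLM as a direct dependency.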
UK AISI Cyber Attack Scaling Law (2026-03-23) — UK AI Security Institute discovers a scaling law for autonomous cyber attacks. Average steps completed on a 32-step corporate network attack rose from 1.7 (GPT-4o, Aug 2024) to 9.8 (Opus 4.6, Feb 2026). Best single run: 22/32 steps. Scaling from 10M to 100M tokens yields up to 59% gains. Full autonomous cyber agents are getting close.
Mamba-3 (2026-03-17) — Albert Gu and Tri Dao release Mamba-3, a MIMO variant that improves model quality at similar decode speed. Strongest performance among linear models and fastest prefill+decode at 1.5B. Targets inference-heavy RL and long-rollout workloads. The SSM vs transformer debate continues.
Kimi Attention Residuals (2026-03-16) — Kimi.ai replaces fixed residual accumulation with input-dependent attention over prior layers. 25% compute advantage, under 2% inference latency overhead. Validated on Kimi Linear 48B. Called ‘the most interesting architectural ideas being discussed openly anywhere.’ I haven’t tried it but the efficiency claims are exciting.
Karpathy Autoresearch (2026-03-09) — Karpathy ran an agent-driven research loop on nanochat, finding ~20 additive changes that transferred from depth-12 to depth-24, improving Time-to-GPT-2 by 11% after ~700 autonomous experiments. Calls this ‘the final boss battle’ for frontier labs. Automated ML research is real.
Claude Finds 22 Firefox Vulnerabilities (2026-03-06) — Anthropic partners with Mozilla; Claude Opus 4.6 found 22 confirmed Firefox vulnerabilities in 2 weeks, 14 high severity – roughly 20% of Mozilla’s 2025 high-severity fixes.
Developer Tools
Codex Plugin for Claude Code (2026-03-30) — OpenAI ships a plugin that embeds Codex inside Claude Code for code review, adversarial review, and background task handoff. Rather than waiting for developers to switch from Claude Code (46% market share), OpenAI brought Codex to them. I have my own skill for running Codex from Claude Code. It’s great.
Stripe Projects.dev (2026-03-26) — Stripe lets agents instantly provision services via CLI. Run ‘stripe projects add posthog/analytics’ to create accounts, get API keys, set up billing. Part of the broader trend where every SaaS becomes an MCP endpoint. I like automation.
Claude Code Auto Mode (2026-03-24) — Claude Code gets a risk classifier that auto-approves safe tool calls and blocks risky ones, replacing the all-or-nothing choice between approving every action and --dangerously-skip-permissions. Enable with --enable-auto-mode, cycle with Shift+Tab. I like this direction, and this replaces my manual bash hook looking for bad commands via grep.
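For context, the kind of manual check that a classifier replaces can be sketched as a small deny-list filter over shell commands. The patterns and function name below are illustrative assumptions. They are not my actual hook and say nothing about how Anthropic's risk classifier works.

```python
import re

# Hypothetical deny-list; a real hook would need a longer, tested list.
RISKY_PATTERNS = [
    r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f",         # rm -rf and variants like -rvf
    r"\brm\s+-[a-zA-Z]*f[a-zA-Z]*r",         # rm -fr
    r"\bgit\s+push\b.*--force",              # also matches --force-with-lease
    r"\bcurl\b.*\|\s*(sudo\s+)?(ba)?sh\b",   # piping a download into a shell
    r"\bdd\b.*\bof=/dev/",                   # writing directly to a device
]
COMPILED = [re.compile(pattern) for pattern in RISKY_PATTERNS]


def find_risky(command):
    """Return the patterns that match `command` (empty list means no match)."""
    return [p.pattern for p in COMPILED if p.search(command)]


if __name__ == "__main__":
    samples = [
        "ls -la",
        "rm -rf build/",
        "curl https://example.com/install.sh | sh",
    ]
    for cmd in samples:
        print("BLOCK" if find_risky(cmd) else "allow", cmd)
```

A pattern list like this is easy to evade (for example, `rm -r -f` slips past both rm patterns), which is one reason a classifier that looks at the whole tool call is attractive.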
Chrome 146 MCP Support (2026-03-19) — Chrome ships native MCP support as a core DevTools feature. One toggle exposes your live browser session – including authenticated tabs – to any AI agent via MCP. No extensions or workarounds needed. This is going to be everywhere.
OpenAI Acquires Astral (2026-03-19) — OpenAI acquires Astral (uv, ruff, ty) – the tools behind 126M+ monthly PyPI downloads. Team joins Codex. Projects stay open source under permissive licenses. Third major AI lab devtools acquisition after Google/Antigravity and Anthropic/Bun. Signal that owning core developer tooling is now strategic. I use uv daily and I love Astral. I hope that OpenAI treats them well.
Claude Code Code Review (2026-03-09) — Multi-agent PR review for Claude Code. Agents hunt issues in parallel, verify findings, rank severity. Internal metrics: jumped from 16% to 54% of PRs getting meaningful comments, with under 1% incorrect findings. Expensive but good.
Cursor Cloud Agents and Composer 2 (2026-03-06) — Cursor ships cloud agents with full VM and computer use – agents test their own changes end-to-end and produce video demos. Agent usage now exceeds tab autocomplete at Cursor. Composer 2 follows as Cursor’s in-house frontier coding model, built on Kimi K2.5 via continued pretraining and RL through Fireworks AI. $0.50/$2.50 per 1M tokens, 61.3 CursorBench. I don’t use Cursor much anymore and have mostly switched to Claude Code.
Infrastructure
H100 Rental Prices Reversing (2026-03-27) — H100 rental prices have reversed their 2024 depreciation and have been climbing since December 2025, driven by reasoning model and agent demand. Dylan Patel says H100s are worth more today than 3 years ago. The GPU shortage is back.
Meta-AMD $100B AI Chip Deal (2026-03-26) — AMD announced a $100B agreement to supply up to 6 GW of AI capacity to Meta using custom MI450 Instinct GPUs. Meta increased its West Texas datacenter commitment sixfold to $10B targeting 1 GW by 2028.
xAI TERAFAB (2026-03-26) — Elon Musk announces TERAFAB, a $20B domestic chip manufacturing project by SpaceX and Tesla targeting a terawatt of compute per year. I’ll believe it when I see it.
NVIDIA GTC 2026 and Vera Rubin (2026-03-16) — Jensen’s GTC keynote: $1T sales backlog for 2027, positions inference as the central inflection point. Announced Vera Rubin next-gen platform with Google Cloud among the first to offer NVL72 rack-scale systems in H2 2026. H200 production reallocating to Vera Rubin chips. Also unveiled NemoClaw for enterprise-secure agentic compute.
NVIDIA Dynamo (2026-03-10) — NVIDIA open-sources a datacenter-scale inference engine supporting SGLang, TRT-LLM, and vLLM. Optimizes serving through prefill/decode disaggregation, intelligent scheduling, and Kubernetes orchestration. Designed around cost/latency/quality tradeoffs for multi-node LLM inference. I haven’t tried it but this fills a real gap.
Apple M5 Pro / M5 Max (2026-03-03) — Apple’s new chips claim 4x faster LLM prompt processing vs M4 Pro/Max. M5 Max: up to 128GB unified memory at 614GB/s. 2x faster SSD at 14.5GB/s. Major upgrade for local AI inference – running 397B MoE models on a laptop is now plausible. I want one!
Financing
Q1 2026 AI Venture Funding Record (2026-03-31) — Q1 2026 set an all-time record: $300B invested globally with $242B (80%) going to AI. The four largest rounds ever – OpenAI $122B, Anthropic $30B, xAI $20B, Waymo $16B – accounted for 65% of global venture investment.
Kimi $1B Raise (2026-03-14) — Moonshot AI (Kimi) raises $1B at $18B valuation, up 4x in three months. It is the best-funded Chinese AI lab and the one behind the Attention Residuals paper and Kimi K2.5.
Cursor $2B ARR / $50B Valuation (2026-03-12) — Cursor doubles ARR to $2B in three months, 60% from corporate customers. Raising at $50B valuation.
Replit Agent 4 / $9B Valuation (2026-03-11) — Replit triples valuation to $9B in 6 months. Agent 4 pivots from coding platform to full knowledge work suite with canvas, apps, slides, and videos. Part of the broader trend of coding agent builders expanding to general knowledge work. I haven’t tried it.
AMI Labs ($1B Seed) (2026-03-10) — Yann LeCun launches AMI Labs with a $1.03B seed at $3.5B pre-money – one of the largest seed rounds ever. Building world models around JEPA for robotics, explicitly betting against autoregressive LLMs as the path to intelligence. Team heavy on FAIR vision researchers.
OpenAI $110B Funding Round (2026-03-05) — OpenAI raises $110B from Amazon, Nvidia, and SoftBank. The round was the third most important thing Altman announced that day (after GPT-5.4 and Codex updates).