April 2026 AI Engineering Roundup

Opus 4.7 is my new daily driver. The xhigh default isn’t enough for me: I cranked it up to max. I’m pretty excited to get my hands on Mythos one day, but hope they fix all the security exploits first. Until then, Opus 4.7 max is pretty good. The White House is blocking expansion of project Glasswing because they’re worried Anthropic won’t be able to serve government demand. Ironically, Anthropic is still a supply chain risk.

GPT-5.5 thinking xhigh is also great. I have Claude Code call Codex, which calls GPT-5.5. It’s a solid coding model and sweats the details more than Opus (which is better for the big picture).

Open weights had a busy month: Gemma 4, DeepSeek V4, Kimi K2.6, GLM-5.1. I haven’t tried any of them. But they’re generating a lot of excitement. They’re all very cost-effective and the rough vibe is they’re at about Opus 4.5 level of quality, which is impressive.

Anthropic hit $30B ARR and is reportedly raising at $900B.


Model Releases

GPT-5.5 (2026-04-23) — OpenAI’s new flagship in Codex and paid ChatGPT. $5/$30 per 1M tokens, 1M context, SoTA on ARC-AGI-2 (85%). Matches Opus 4.7 max at one-quarter the cost. Strong on well-specified coding and raw intelligence per token; weaker at inferring intent. It’s a good model.

OpenAI ImageGen 2.0 (gpt-image-2) (2026-04-21) — OpenAI’s new image model, with a companion gpt-image-2-thinking variant that runs an agent loop with search and photoshop tools. ~$0.40 per high-quality 4K image. #1 across Image Arena leaderboards with a +242 Elo lead. Currently the best image model by a wide margin. Their demos are impressive, but I don’t use image models much.

Claude Opus 4.7 (2026-04-16) — Anthropic’s new flagship: better long-running tasks, instruction following, visual understanding, and computer use. Same per-token pricing as 4.6 ($5/$25 per 1M), but a new tokenizer makes it ~40% more expensive in practice. Adds an xhigh reasoning tier (Claude Code default).
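
A back-of-the-envelope sketch of the tokenizer effect: if the list price per token is unchanged but the same text now encodes to more tokens, the effective price scales by the same factor. The 1.40 inflation factor is the rough "~40%" estimate from the entry above, not a measured value, and `effective_price` is an illustrative helper.

```python
# Effective price when a tokenizer change inflates token counts.
# Assumption: the same text encodes to ~40% more tokens (the estimate above).

def effective_price(list_price_per_m: float, token_inflation: float) -> float:
    """Price per 1M old-tokenizer tokens once the token count has grown."""
    return list_price_per_m * token_inflation

INPUT_PRICE, OUTPUT_PRICE = 5.00, 25.00  # USD per 1M tokens, from the entry above
INFLATION = 1.40                         # assumed token-count multiplier

effective_input = effective_price(INPUT_PRICE, INFLATION)
effective_output = effective_price(OUTPUT_PRICE, INFLATION)

print(f"input:  ${effective_input:.2f} per 1M old-tokenizer tokens")
print(f"output: ${effective_output:.2f} per 1M old-tokenizer tokens")
```

Under that assumption, the $5/$25 list price behaves like roughly $7/$35 for the same workload. Actual inflation will vary with the mix of code, prose, and languages in the prompt.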

Codex computer use (2026-04-16) — OpenAI added background computer use to Codex on Mac, plus in-app browser, image gen, 90+ plugins, and SSH to remote devboxes. Codex now has 3M weekly users, with nearly half of usage non-coding. I use a Claude Code -> Codex -> GPT-5.5 chain.

Claude Mythos Preview / Project Glasswing (2026-04-07) — Anthropic’s largest training run, available only to ~40 launch partners rather than as a public API. Poses new cybersecurity risks: vulnerability finding jumped from 14% (Opus 4.6) to 83%; exploit construction from <1% to 72%. The model autonomously discovered exploits in every major OS and browser, including a 27-year-old OpenBSD bug. First major frontier model held back from public API on capability grounds. I want to get my hands on it.

Enterprise Products

Google Department of War deal (2026-04-30) — Google signed a Department of War deal allowing Gemini for any lawful government purpose, and committed to adjust safety filters at government request. 600+ Google employees signed a letter asking Sundar Pichai to reject it. The opposite of how Anthropic handled the Pentagon last month. In the long run, I think this will hurt Google’s ability to build better models.

OpenAI Codex platform expansion (2026-04-29) — OpenAI expanded Codex from coding tool to knowledge work. $0 seat fee for eligible Business/Enterprise customers through end of June. Added Supabase integration and a Figma plugin that turns implementation plans into FigJam boards.

Stripe Sessions 2026 (2026-04-29) — Link wallet (250M users) is now usable by AI agents via OAuth and virtual cards. Also launched the Machine Payments Protocol with Tempo, plus a partnership with Google to sell inside AI Mode and the Gemini app.

OpenAI-Microsoft deal modification (2026-04-27) — OpenAI got permission to make services available on non-Microsoft clouds, unblocking many enterprise deals. Models coming to AWS Bedrock in coming weeks. Microsoft keeps revenue share through 2030 plus its ~25% stake; the old AGI clause is effectively gone.

Apple Foundation Models on Gemini (2026-04-22) — Apple’s next-generation Foundation Models will be built on Gemini, centered on a custom 1.2T-parameter Gemini model dedicated to Siri and Apple Intelligence. Costs Apple roughly $1B per year. Powers the more personalized Siri expected at WWDC June 8. Siri needs an upgrade, and the Gemini models are pretty good. OpenAI or Anthropic would have been better.

Claude Design (2026-04-22) — Anthropic Labs launched Claude Design for prototypes, slides, and one-pagers from short text prompts. Exports to Canva, PDF, or PPTX, or hands off to Claude Code. Figma stock dropped on the news.

White House grants US agencies access to Mythos (2026-04-22) — The White House moved to give US agencies access to Mythos after Dario met with chief of staff Susie Wiles. They are also blocking expansion of Project Glasswing because they’re worried Anthropic can’t serve government demand. And yet, Anthropic is still a supply chain risk.

OpenAI Codex Chronicle (2026-04-20) — Research preview that lets Codex build memories from recent screen context. Background agents capture screenshots, extract memories, store on-device. Pro users on macOS only. Shifts memory from chat history to ambient context capture.

Meta Muse Spark (2026-04-08) — Meta Superintelligence Labs’ first new model after a 9-month stack rebuild. Hosted closed-source (departure from Llama), private API to select partners. Artificial Analysis scored it 52, behind only Gemini 3.1 Pro Preview, GPT-5.4, and Opus 4.6. Apollo Research found it verbalizes evaluation awareness at the highest rates of any model tested. I suspect that the other leading models are just as eval aware but have learned not to verbalize it yet (which worries me).

Open Source

I have yet to try any of these models.

Mistral Medium 3.5 (2026-04-29) — Mistral’s dense 128B vision reasoning model with 256K context and configurable reasoning effort. Runs locally on ~64GB RAM with q4 quantization (46.7 tok/s on Strix Halo). Modified MIT license restricts commercial use for companies above $20M/month revenue.
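
The ~64GB figure lines up with simple arithmetic on the weights alone. A minimal sketch, assuming exactly 4 bits per weight and ignoring KV cache and activations (`weight_memory_gb` is an illustrative helper, not part of any library):

```python
# Rough weight-only memory estimate for a dense model at a given precision.

def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

PARAMS_B = 128  # dense parameter count, from the entry above

fp16_gb = weight_memory_gb(PARAMS_B, 16)  # unquantized 16-bit baseline
q4_gb = weight_memory_gb(PARAMS_B, 4)     # assumes exactly 4 bits per weight

print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"4-bit weights:  {q4_gb:.0f} GB")
```

Real 4-bit formats usually store per-block scales as well, so actual files tend to be somewhat larger than this estimate, and long contexts add KV cache on top.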

vLLM v0.20.0 (2026-04-28) — vLLM 0.20.0 ships TurboQuant 2-bit KV cache for 4x KV capacity, plus DeepSeek V4 MegaMoE support across Blackwell, Jetson Thor, ROCm, and Intel XPU. Hit #1 output speed on Artificial Analysis for DeepSeek V3.2 at 230 tok/s. An FP8 KV-cache fix improved 128k needle-in-a-haystack from 13% to 89%.
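
For intuition on the 4x figure: KV cache size scales linearly with bits per element, so going from an 8-bit cache to a 2-bit cache would fit four times as many tokens in the same memory. The sketch below assumes the 4x is measured against an 8-bit baseline, which the entry above does not state. The model shape is hypothetical, and quantization metadata such as scales is ignored.

```python
# KV cache footprint per token as a function of element precision.
# Assumptions: hypothetical model shape; baseline is an 8-bit cache;
# per-block scales and other quantization metadata are ignored.

def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bits: float) -> float:
    """Bytes of KV cache for one token; the factor 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * bits / 8

LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128  # hypothetical shape for illustration

eight_bit = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 8)
two_bit = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 2)
capacity_gain = eight_bit / two_bit

print(f"8-bit cache: {eight_bit:.0f} bytes/token")
print(f"2-bit cache: {two_bit:.0f} bytes/token")
print(f"tokens per GB gain: {capacity_gain:.1f}x")
```

Against a 16-bit baseline the same arithmetic would give 8x, so the baseline matters when comparing capacity claims.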

Hermes Agent (2026-04-24) — Nous Research’s open-source agent harness crossed 50k GitHub stars with some developers migrating from Claude Code. v0.11.0 added a rewritten React TUI, more inference providers, and day-0 support for DeepSeek V4 and GPT-5.5. Pluggable memory across Honcho, mem0, Hindsight, and others. Sentdex says local Qwen3-Coder-Next plus Hermes now replaces a large part of his Claude Code workflow.

DeepSeek V4 Pro and Flash (2026-04-23) — DeepSeek’s first major version since V3. Pro is 1.6T total / 49B active MoE; Flash is 284B / 13B active. Both MIT-licensed with 1M context, hybrid thinking modes, and a checkpoint that fits on a single 8xB200 node. Pro is $1.74/$3.48 per 1M tokens. First major Chinese open-weights model designed to run on Huawei Ascend chips. Very impressive in terms of efficiency, less so in terms of intelligence (the chip export bans are working).

Xiaomi MiMo-V2.5 (2026-04-22) — Xiaomi open-sourced MiMo-V2.5-Pro and MiMo-V2.5 under MIT with 1M-token context. Pro is roughly 1T total / 42B active, claims SWE-bench Pro 57.2 with 1,000+ autonomous tool calls. $1/$3 per 1M tokens. Xiaomi also announced a 100T token grant for builders.

Kimi K2.6 (2026-04-21) — Moonshot’s refresh of K2.5: 1T-param MoE, 32B active, 256K context, native multimodal, Modified MIT License. Now #1 on OpenRouter weekly leaderboard. Practitioners report it sometimes fixes bugs DeepSeek V4 cannot.

GLM-5.1 (2026-04-07) — Zhipu AI’s MIT-licensed 754B-param open-weights agentic engineering model, trained on Huawei Ascend chips. SoTA on SWE-Bench Pro and #1 open on Terminal-Bench.

Gemma 4 (2026-04-02) — Google DeepMind’s open-weights Apache 2.0 family in four sizes from 2B to 31B dense. Native trimodal text/vision/audio with 256K context. Apache 2.0 is a major upgrade from prior restrictive Gemma terms.

Research

Stanford Generative Phage Design (2026-04-27) — Stanford researchers used a genome language model to design hundreds of novel bacteriophages from scratch. 16 worked, including one using a protein not found in any known organism on Earth. Dual-use: targeted antibiotic alternatives and biosecurity risk. I suspect that Mythos V2 will pose a similar bio risk to Mythos V1’s cyber risk.

Sony Ace robot beats pro table tennis players (2026-04-23) — Sony AI’s Project Ace published on the cover of Nature: first known autonomous robot competitive with elite and professional human table tennis players. Won 3 of 5 matches against five elite players and defeated all three pros at least once. Combines event-based sensing, deep RL, and an agile robot platform.

Anthropic Mythos unauthorized access (2026-04-22) — A small group from a private online forum gained unauthorized access to Mythos via a third-party contractor, and claim access to other unreleased Anthropic models. Use was reportedly harmless but raises concerns about whether China or other adversaries also got in. Anthropic’s third-party access controls aren’t sufficient.

Mozilla fixes 271 Firefox bugs with Mythos (2026-04-22) — Mozilla used an early Mythos preview to find and fix 271 vulnerabilities in Firefox 150, calling it as capable as the world’s best security researchers. CTO Bobby Holley argues this gives defenders a chance to win, decisively, once teams reprioritize. I’m not so sure.

Anthropic Automated Alignment Researchers (2026-04-20) — Anthropic ran teams of Opus 4.6 agents in parallel sandboxes on weak-to-strong supervision research, beating human researchers dramatically. Humans recovered 23% of the performance gap over 7 days; the AARs reached 97% in 5 more days at $22 per AAR-hour. Early sign that automated research on outcome-gradable problems is practical today. I’ve seen this myself: if you can measure something, Claude Code can often optimize it.

Claude Mythos AISI evaluation (2026-04-13) — UK AI Safety Institute reports Mythos is the first model to complete an AISI cyber range end-to-end, including a 32-step corporate network attack simulation. Drew Breunig framed the findings as security as proof of work: harden a system by spending more tokens discovering exploits than attackers spend exploiting them.

MirrorCode (2026-04-10) — Epoch AI and METR proposed MirrorCode, a benchmark testing whether AI can reimplement complex CLI programs from execute-only access plus visible test cases. Opus 4.6 reimplemented gotree, a 16,000-line Go bioinformatics toolkit estimated at 2-17 weeks of human engineer time. Authors warn the benchmark may already be saturated.

Developer Tools

Anthropic Claude Code Postmortem (2026-04-22) — Anthropic published a postmortem identifying three bugs that degraded Claude Code for over a month: a silent reasoning effort downgrade, a caching issue that lost reasoning history, and a system prompt change limiting responses to 25 words between tool calls. All fixed in v2.1.116; usage limits reset for subscribers. Validates the complaints that Claude felt dumber for the past month. I always use max reasoning, so only 2 of these hit me.

Claude Code Routines (2026-04-15) — Claude Code added scheduled routines as a research preview: give Claude a prompt and a cadence and it runs on schedule (e.g. nightly pull the top bug, attempt a fix, open a draft PR). /schedule in the CLI creates them. Can also trigger via API or GitHub events.

OpenAI Agents SDK (2026-04-15) — OpenAI split the agent harness from compute/storage and open-sourced it, with primitives for file/computer use, skills, memory, and compaction. Execution can be delegated to partner sandboxes (Cloudflare, Modal, Daytona, E2B, Vercel) instead of OpenAI infra. Differentiation moves toward orchestration and secure execution.

Anthropic mixed-model agent pattern (2026-04-10) — Anthropic shipped an advisor pattern as first-class beta: a cheap executor (Sonnet/Haiku) consults an expensive advisor (Opus) at hard decision points. Sonnet 4.6 + Opus advisor: 74.8% on SWE-bench Multilingual at $0.96/task vs Sonnet solo at 72.1%/$1.09. Haiku + Opus more than doubles BrowseComp vs Haiku alone.
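
The shape of the advisor pattern can be sketched in a few lines. This is a minimal illustration, not Anthropic's API: the model functions are plain-Python stand-ins, and the `is_hard` heuristic is hypothetical, since the entry above does not describe how the real beta decides when to escalate.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call; a real system would hit an LLM API.
Model = Callable[[str], str]

@dataclass
class AdvisedExecutor:
    executor: Model                 # cheap model, handles every step
    advisor: Model                  # expensive model, consulted only on hard steps
    is_hard: Callable[[str], bool]  # hypothetical escalation heuristic
    advisor_calls: int = 0

    def run_step(self, step: str) -> str:
        """Run one step, asking the advisor for guidance first if it looks hard."""
        if self.is_hard(step):
            self.advisor_calls += 1
            guidance = self.advisor(step)
            return self.executor(f"{step}\nAdvisor guidance: {guidance}")
        return self.executor(step)

def cheap_model(prompt: str) -> str:
    return f"[executor] {prompt}"

def expensive_model(prompt: str) -> str:
    return f"plan for '{prompt}'"

agent = AdvisedExecutor(cheap_model, expensive_model, is_hard=lambda s: "design" in s)
steps = ["rename variable", "design the migration strategy", "run tests"]
results = [agent.run_step(s) for s in steps]

print(f"advisor consulted on {agent.advisor_calls} of {len(steps)} steps")
```

The cost argument follows from the call counts: the expensive model is billed only for the steps flagged as hard, while the cheap model does the bulk of the token-heavy work.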

Claude Code Ultraplan (2026-04-05) — Claude Code v2.1.92 introduced Ultraplan: draft plans in the cloud, review with inline comments in your browser, then execute remotely or via CLI. Anthropic also launched Claude Code Web at claude.ai/code.

Hugging Face Kernels (2026-04-04) — Hugging Face introduced Kernels as a new repo type at PyTorch conference: shareable collections of optimized binary operations for CUDA, ROCm, Apple Silicon, and Intel XPU. Precompiled artifacts matched to exact GPU/PyTorch/OS combinations, with claimed 1.7x-2.5x speedups over PyTorch baselines.

Cursor 3 (2026-04-02) — Cursor 3 adds an agent-collaboration interface and a browser-based Design Mode for visual annotation of UI elements during coding sessions. Cursor also reported a multi-agent CUDA optimization system built with NVIDIA that delivered a 38% speedup across 235 problems. I use Claude Code instead of Cursor now (with the exception of editing text).

Infrastructure

Q1 2026 hyperscaler earnings: $700B AI capex (2026-04-29) — Big-four hyperscalers reported simultaneously on April 29: combined 2026 AI capex now $700B+, up 77% YoY. Microsoft AI run-rate $37B (+123%). Meta lifted capex to $125-145B and stock fell ~7% after-hours. Amazon AWS +28% to $37.6B.

Anthropic-Amazon $100B+ compute deal expansion (2026-04-22) — Anthropic expanded its Amazon deal to add up to 5GW of new compute, with 1GW of Trainium online by end of 2026. Total commitment over $100B over ten years. Amazon invests $5B at a $380B valuation, with up to $20B more.

Google TPU v8 (8t / 8i) (2026-04-22) — Google introduced 8th-gen TPUs at Cloud Next with a split design: 8t for training, 8i for inference. 8t delivers nearly 3x compute per pod vs Ironwood; 8i connects 1,152 TPUs per pod. OpenAI, Anthropic, and Meta are reportedly buying TPU capacity. Alexis Ohanian estimates Google now controls roughly 25% of global AI compute.

OpenAI $20B Cerebras deal + IPO (2026-04-17) — OpenAI committed $20B+ to Cerebras chips (double the prior $10B), takes warrants for up to 10% of Cerebras equity, and is funding $1B of data center construction. Same day Cerebras filed for IPO at $35B target valuation, over 4x its September 2025 mark. Inference-chip diversification away from Nvidia.

Financing

Anthropic $900B valuation talks (2026-04-29) — Anthropic is weighing a $50B funding round at over $900B valuation, with investors asked to submit allocations within 48 hours. Would top OpenAI’s $852B. Last round was at $380B in February. At least one acquisition bid is reportedly out at $1.05 trillion.

Ineffable Intelligence $1.1B seed (2026-04-27) — Ex-DeepMind researcher David Silver (AlphaGo, AlphaZero) launched Ineffable Intelligence with a $1.1B seed at $5.1B valuation, Europe’s largest seed ever. Backed by Sequoia, Lightspeed, Google, Nvidia, and the UK Sovereign AI Fund. Building a superlearner that discovers knowledge via RL from experience instead of human data.

Google’s $40B Anthropic Investment (2026-04-24) — Google reportedly plans to invest up to $40B in Anthropic in cash and compute, per FT.

Cursor xAI deal (2026-04-22) — xAI paid Cursor $10B for the option to acquire them at $60B. I suspect the xAI IPO complicated the deal, hence the odd structure. Both companies urgently need a coding model to compete with Claude Code / Codex. xAI has compute (and needs talent), Cursor has talent (and needs compute), so they’re making a trade.

Anthropic $30B ARR (2026-04-07) — Anthropic jumped from $19B ARR in March to $30B ARR in April, surpassing OpenAI’s $24B. $1M+ ARR customer count doubled from 500 to over 1,000 in under two months. Compute is now Anthropic’s binding constraint.