AI Weekly Update - May 19, 2025
OpenAI, Cursor, Windsurf, and others fight for AI coding supremacy
what to know for now
💻 OpenAI launches Codex cloud agent. The new Codex product adds an orchestration layer that runs coding, testing, and CI tasks in parallel across repos. It ships first to ChatGPT Pro, Team, and Enterprise, with a Plus rollout queued, and benchmarks show half the latency of Copilot. Read more
🗂️ Cursor 0.50 ships Tab model + Background Agent. The Tab model now spans multi-file edits with syntax-aware refactoring, while the preview Background Agent runs parallel tasks remotely, and users can attach to or take over any of them mid-flight. Read more
🏄‍♂️ Windsurf launches SWE-1 model family. SWE-1, SWE-1-lite, and SWE-1-mini are frontier-scale models tuned for the full software-engineering lifecycle (planning, refactoring, long-running tasks), not only code generation. The startup claims 99% workflow coverage and says a 15-person team trained the suite with startup-budget compute. Read more
🦾 Google debuts AlphaEvolve for continuous RLHF. AlphaEvolve pairs Gemini-2 backbones with an always-on reinforcement loop that retrains on live user interactions. Google says this halves catastrophic-forgetting errors and lifts tool-use accuracy on its internal EvalHub by 18%. Read more
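For readers curious what "parallel tasks across repos" looks like in practice, here is a minimal sketch of the fan-out pattern using Python's standard library. It is not how Codex or Cursor work internally; `run_task`, `fan_out`, and the repo and task names are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_task(repo: str, task: str) -> dict:
    """Hypothetical stand-in for one agent step (edit code, run tests, trigger CI)."""
    # A real agent would call out to a sandbox here; this stub just reports success.
    return {"repo": repo, "task": task, "status": "ok"}


def fan_out(repos: list[str], tasks: list[str], max_workers: int = 4) -> list[dict]:
    """Run every (repo, task) pair concurrently and collect the results."""
    jobs = [(r, t) for r in repos for t in tasks]
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_task, r, t) for r, t in jobs]
        for fut in as_completed(futures):
            results.append(fut.result())
    # Sort so the output order does not depend on thread scheduling.
    return sorted(results, key=lambda d: (d["repo"], d["task"]))


if __name__ == "__main__":
    for row in fan_out(["api", "web"], ["code", "test", "ci"]):
        print(row)
```

The part the vendors are competing on is everything this sketch leaves out: sandboxing, letting a user attach mid-run, and merging results back safely.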
🧪 AI Research of the Week
AlphaEvolve: A coding agent for scientific and algorithmic discovery
From Google DeepMind
Jake’s Take: AlphaEvolve wires a large language model into a loop: generate code, run benchmarks, keep winners, mutate losers, iterate. The agent trimmed 4 × 4 complex matrix multiplication from 49 to 48 scalar multiplications, ending a 56-year record set by Strassen. It also produced schedulers and chip-placement heuristics that save compute budget on internal Google workloads.
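The generate, benchmark, select, mutate loop described above can be sketched in a few lines. This is a toy illustration, not AlphaEvolve's actual implementation: the candidates are lists of numbers rather than programs, `evaluate` is a made-up benchmark, and `mutate` uses random nudges where the real system would ask a language model to propose an edit.

```python
import random

TARGET = [3.0, -1.0, 2.0]  # made-up optimum for the toy benchmark


def evaluate(candidate: list[float]) -> float:
    """Toy benchmark, lower is better. Stand-in for running real tests or timings."""
    return sum((c - t) ** 2 for c, t in zip(candidate, TARGET))


def mutate(candidate: list[float], rng: random.Random) -> list[float]:
    """Stand-in for an LLM-proposed edit: nudge one value at random."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.5)
    return child


def evolve(generations: int = 200, population: int = 20,
           keep: int = 5, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    pool = [[0.0, 0.0, 0.0] for _ in range(population)]
    for _ in range(generations):
        # Rank by benchmark score and keep the winners unchanged.
        pool.sort(key=evaluate)
        winners = pool[:keep]
        # Replace the losers with mutated copies of the winners.
        children = [mutate(rng.choice(winners), rng)
                    for _ in range(population - keep)]
        pool = winners + children
    return min(pool, key=evaluate)


if __name__ == "__main__":
    best = evolve()
    print(best, evaluate(best))
```

Because the winners survive each round untouched, the best score never gets worse. That is why the approach suits problems where a candidate can be checked automatically.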
Automated algorithm search shifts mathematical discovery from pencil to GPU rack. Tool builders gain a way to squeeze extra efficiency from every layer of the stack, from compiler passes to datacenter routing. Peer review must keep pace, supplying proofs and safety checks for machine-generated results. Research groups will likely start plugging similar agents into unsolved problems across physics, cryptography, and optimization, so regulators will need to keep up too.
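For context on the 49-to-48 figure in the take above: the old count of 49 comes from applying Strassen's 2 × 2 scheme (7 multiplications instead of 8) recursively to a 4 × 4 matrix split into 2 × 2 blocks, which gives 7 × 7 = 49. A small sketch of that arithmetic, with function names chosen here for illustration:

```python
def naive_mults(n: int) -> int:
    """Schoolbook n x n matrix product: one scalar multiplication per (i, j, k)."""
    return n ** 3


def strassen_mults(n: int) -> int:
    """Strassen's 2 x 2 scheme (7 products) applied recursively; n must be a power of two."""
    if n == 1:
        return 1
    return 7 * strassen_mults(n // 2)


if __name__ == "__main__":
    print(naive_mults(4), strassen_mults(4))  # prints: 64 49
```

Shaving one multiplication off 49 sounds small, but such schemes are applied recursively to larger matrices, so the saving compounds.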
what to know for later
🕵️‍♂️ Rumors swirl around Anthropic Sonnet and Opus refresh. New Claude Sonnet and Opus variants with extended “think time” and larger context windows are reportedly slated for early June, aiming to close the reasoning-depth gap with OpenAI. Read more
🔬 OpenAI scientist: ‘models can drive novel research’. Chief scientist Jakub Pachocki argues next-gen reasoning models will autonomously generate discoveries if given longer compute cycles, hinting at an open-weights release for researchers. Read more
💰 Perplexity eyes $14B valuation. The search-chat startup is finalizing a $500M round led by Accel, doubling last year’s price and positioning Perplexity as a top-five AI unicorn. Read more
📉 Meta delays Llama 4 Behemoth. The flagship model slips to fall amid performance concerns, raising questions about Meta’s $72B AI spend and prompting internal restructuring talk. Read more