AI Weekly Update - June 9, 2025
Cursor gets valued at $10B, Google pushes yet another Gemini update
Our weekly updates just got bigger! Free subscribers will continue to get the top stories each week, while Paid subscribers will get a few extra bullets to keep you even more up-to-date.
Last week’s top stories
💰 Cursor raises $900M at $9.9B valuation. Anysphere (Cursor’s parent company) hit $500 million ARR, then closed a Thrive-led round that included Andreessen Horowitz, Accel, and DST. The funds will expand research and enterprise rollout. Read more
🏛️ Anthropic debuts Claude Gov for agencies. The air-gapped model family runs only inside classified networks, embeds retrieval on secure data, and supports chain-of-thought auditing. Early users include intel and defense units. Read more
🚀 Gemini 2.5 Pro preview lifts Elo. Google pushed a June build that gains +24 on LMArena and +35 on WebDevArena while topping GPQA and HLE math reasoning. The release adds longer context and faster parallel sampling. Read more
⚖️ Reddit sues Anthropic over scraping. The complaint alleges mass copying of posts, seeks injunctive relief plus triple statutory damages, and cites licensing deals with OpenAI as precedent. Anthropic has 21 days to answer. Read more
🎬 AMC taps Runway for AI production. The studio will use Gen-3 tools for pre-viz, trailer cuts and contextual marketing assets, stitching directly into its production pipeline and DAM. Runway gains premium training footage. Read more
🛡️ Anduril doubles to $30B value. Founders Fund led a new defense-tech round, pushing revenue past $650 million and underwriting autonomous systems from AI towers to underwater drones. Anduril says margin improves with full-stack integration. Read more
🤖 Amazon pilots humanoid delivery bots. A “humanoid park” near Seattle runs Digit-class robots through obstacle courses beside Rivian vans to gauge last-meter drop-offs. Software uses fleet-scale RL with simulation feedback. Read more
🧪 AI Research of the Week
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
From Apple Machine Learning Research
Jake’s Take: Apple pitted “large reasoning models” (think o3, Sonnet 4 Thinking, Gemini 2.5 Pro) against controllable Tower-of-Hanoi and river-crossing puzzles that scale in compositional complexity. Accuracy climbs at first, then nosedives to zero once the move count crosses a small threshold, even when models receive the full solution algorithm. Standard LLMs beat the fancy “reasoning” variants on easy tasks, the two reach parity at medium difficulty, and both tank on hard cases, suggesting a need to rethink the core architecture of most LLMs (or, that Apple might be farther behind and more scared than we thought).
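To see why the move count crosses that threshold so fast, note that the minimal Tower-of-Hanoi solution grows exponentially with disk count. A minimal sketch (this is just the classic recursion, not code from the Apple paper, and disk count as the complexity knob is an assumption about their setup):

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Return the minimal move sequence for n disks via the classic recursion."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest, then restack on top.
    return (hanoi_moves(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi_moves(n - 1, aux, src, dst))

# Minimal solution length is 2^n - 1, so adding a single disk roughly
# doubles the number of steps a model must execute without error.
for n in range(1, 11):
    assert len(hanoi_moves(n)) == 2**n - 1
```

A 7-disk puzzle already demands 127 perfect consecutive moves, which is where a per-step error rate compounds into near-zero end-to-end accuracy.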
and then, even more news…
Subscribe to Handy AI to keep reading this post and get 7 days of free access to the full post archives.