Handy AI

Anthropic preps Sonnet 5, while OpenAI plans to strike back with GPT-5.3

AI Weekly Update - February 2, 2026

Jake Handy
Feb 02, 2026
∙ Paid

🎆 Tired of having to explain AI stuff to your coworkers? Share Handy AI with them so they can get the most important AI news delivered weekly to their inbox (in addition to our high-quality editorials).


last week’s top stories

🕵️ Claude Sonnet 5 rumor cycle heats up. A cluster of X and Threads posts claims Claude Sonnet 5 lands this week, with talk of a larger context window and tighter system policies. If Anthropic is rolling prompt and classifier changes ahead of a release, expect behavior shifts in refusals and tool-use safety. Read me

🔮 GPT-5.3 chatter picks up. eWeek rounded up leak claims around a GPT-5.3 preview, including higher throughput, longer-context upgrades, and pricing pressure on competing APIs. The plausible angle is product packaging: staged rollout across ChatGPT tiers, then API, then extra endpoints once eval coverage looks sane. Read me

🧑‍💻 Altman tees up Codex launches. Sam Altman posted that Codex-related launches arrive over the next month, starting next week. Codex has evolved from autocomplete into repo-level agents, so expect more surface area: new IDE hooks, task runners, policy gates, and tighter sandboxing. OpenAI also hinted at moving its preparedness posture toward “Cybersecurity High”. Read me

🧯 OpenClaw extensions show supply-chain risk. Tom’s Hardware reports more than a dozen OpenClaw extensions packaged as backdoors, using reverse proxying to hand remote control to an attacker. Agent ecosystems amplify this problem because extensions often hold API keys, local file access, and permission to click around the web. Read me

🌐 Chrome gets an agent with a mouse. Google added Gemini 3 Auto Browse in Chrome, a side-panel agent that can research, fill forms, shop, and handle multi-step flows. Transactions still require user approval, but the agent can click through pages, which raises the usual prompt-injection and malicious DOM traps. Read me

🎮 Genie 3 builds explorable worlds. Project Genie lets users sketch a world with text or an image, then walk around while Genie 3 generates the next frames in real time. Under the hood, this is a world model that couples visual generation with a learned dynamics prior, which is valuable for games, simulation, and robotics training data. Google is shipping it as a Labs prototype to AI Ultra subscribers. Read me

🎬 Aronofsky experiments with AI animation. Darren Aronofsky’s studio Primordial Soup produced an American Revolution series using AI tools for animation while keeping human voice actors in the loop. The pipeline question matters: once you can generate consistent characters and scenes across episodes, studios get a new cost curve for mid-budget storytelling. Read me

📄 Amodei drops a long warning. Anthropic CEO Dario Amodei published a January 2026 essay framing powerful AI as an imminent governance problem rather than a distant research topic. He focuses on three failure modes: bio risk, cyber offense, and economic concentration, plus practical levers like evals, deployment gating, and transparency. Even if you disagree with the timeline, the essay is a useful map for what serious safety work looks like beyond PR. Read me

🎙️ ElevenLabs releases The Eleven Album. ElevenLabs launched The Eleven Album, a studio-style release co-created with artists using its Eleven Music composition model. This is less about one album and more about the workflow: prompt-to-arrangement, iterative versions, then distribution to mainstream streaming. The music industry will argue over credit and royalties; the platform teams will focus on fingerprinting, provenance, and rights metadata at scale. Read me

🎵 ONCE ships music distribution MCP. ONCE published a Model Context Protocol server that exposes music distribution as tools an agent can call: metadata ingest, asset upload, routing, and status checks. The technical shift is composability, where the same MCP server can plug into multiple clients, so your “release manager” becomes a chat interface with audit logs. Read me
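To make the composability point concrete, here is a minimal plain-Python sketch of the MCP tool pattern the item describes: tools exposed with names, descriptions, and JSON schemas, plus a single dispatcher that logs every call for auditing. The tool names (`ingest_metadata`, `check_release_status`) and their schemas are hypothetical stand-ins, not ONCE's actual API, and the registry mimics MCP's tools/list and tools/call flow rather than using a real MCP SDK:

```python
import json

# Plain-Python stand-in for an MCP server's tool registry.
# Real MCP servers expose this over JSON-RPC (tools/list, tools/call);
# the tool names and schemas below are hypothetical examples.
TOOLS = {}

def tool(name, description, schema):
    """Register a function as a callable tool with a JSON input schema."""
    def wrap(fn):
        TOOLS[name] = {"description": description, "inputSchema": schema, "fn": fn}
        return fn
    return wrap

@tool("ingest_metadata", "Validate and stage release metadata",
      {"type": "object", "required": ["title", "artist"],
       "properties": {"title": {"type": "string"}, "artist": {"type": "string"}}})
def ingest_metadata(title, artist):
    # Stage the release and hand back an opaque id (fake here).
    return {"release_id": f"rel-{abs(hash((title, artist))) % 10000}", "status": "staged"}

@tool("check_release_status", "Look up the distribution status of a release",
      {"type": "object", "required": ["release_id"],
       "properties": {"release_id": {"type": "string"}}})
def check_release_status(release_id):
    # A real server would query the distributor; this sketch returns a canned state.
    return {"release_id": release_id, "status": "delivered_to_stores"}

def list_tools():
    """What an MCP client discovers: names, descriptions, input schemas."""
    return [{"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOLS.items()]

def call_tool(name, arguments):
    """Dispatch a tool call and emit an audit-log line for the request."""
    result = TOOLS[name]["fn"](**arguments)
    print(json.dumps({"audit": {"tool": name, "args": arguments, "result": result}}))
    return result

staged = call_tool("ingest_metadata", {"title": "First Light", "artist": "Example Band"})
status = call_tool("check_release_status", {"release_id": staged["release_id"]})
```

Because every request funnels through one dispatcher, the same registry can back multiple clients (a chat UI, a CI job, a release dashboard) while producing the audit trail the item mentions.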

🧪 OpenAI ships Prism for scientists. Prism is OpenAI’s LaTeX-native writing and collaboration workspace, wired into GPT-5.2 for drafting, editing, and claim checking. It competes with Overleaf by bundling authoring, citation management, and model-assisted review inside one UI. Read me

💸 Nvidia wobbles on OpenAI money. Reports around a mega Nvidia investment in OpenAI hit turbulence, and Jensen Huang publicly pushed back on the idea of a stalled deal. Even a smaller check matters, because funding rounds at this scale influence who gets priority access to hardware, cloud capacity, and distribution partners. The meta-signal: capital markets want receipts from AI spend, instead of another hype loop. Read me


🧪 AI Research of the Week

The Quiet Contributions: Insights into AI-Generated Silent Pull Requests
From Idaho State University

Jake’s Take: This is the first study to tackle something we’re all starting to notice in the wild: AI agents are now submitting pull requests (approval requests for code changes), and they’re doing it silently (no explanation, no discussion, just code dropped into a repo). The researchers analyzed nearly 5,000 of these “silent PRs” from five different AI agents across popular Python projects, trying to figure out why some get merged and others get rejected when there’s literally zero context to go on. They looked at whether these ghost contributions actually help or hurt, examining code complexity, quality issues, and security vulnerabilities.

The big implication here is that as AI becomes a regular contributor to open source, we need better ways to evaluate code that shows up without the usual human context of “here’s why I did this.” It’s a fascinating, albeit incomplete, glimpse into a future where your next code reviewer might be wondering whether the author was human or machine.


and then, even more news…

🔍 Google Search turns into chat. Google made Gemini 3 the default engine for AI Overviews and added follow-up questions that jump straight into AI Mode from the results page. For publishers, this pushes more answers into the SERP layer, so traffic economics get worse while attribution gets fuzzier. Read me

© 2026 Jake Handy