Simon Willison

@simonw

For a moment I thought this revealed the existence of Gemini 3.5 Flash, but it turns out that's a typo and it was Gemini 3 Flash that got second place after Opus 4.5 (which surprisingly beats Opus 4.6 here)

Kilian Lieret

@KLieret

· Feb 18

We just updated the official SWE-bench leaderboard comparing all models with the exact same scaffold (mini-SWE-agent v2). Detailed cost analysis & links to browsable trajectories in 🧵

9:27 PM · Feb 18, 2026