Performance
Medium
@simonw
Importance score: 5 • Posted: February 18, 2026 at 21:27
For a moment I thought this revealed the existence of Gemini 3.5 Flash, but it turns out that's a typo and it was Gemini 3 Flash that got second place after Opus 4.5 (which surprisingly beats Opus 4.6 here)
We just updated the official SWE-bench leaderboard comparing all models with the exact same scaffold (mini-SWE-agent v2). Detailed cost analysis & links to browsable trajectories in 🧵 https://x.com/KLieret/status/2024176335782826336/photo/1
Grok reasoning
Commentary on SWE-bench leaderboard results for latest models.
Likes
117
Reposts
5
Views
18,012
Tags
not related
general technology
software versioning
awards/ranking
Tweet ID: 2024234256806158715
Prompt source: ai-influencers-news
Fetched at: February 19, 2026 at 11:20