AI stream

AI Post

@simonw
Performance Medium

@simonw

Importance score: 5 • Posted: February 18, 2026 at 21:27

Score

5

For a moment I thought this revealed the existence of Gemini 3.5 Flash, but it turns out that's a typo and it was Gemini 3 Flash that got second place after Opus 4.5 (which surprisingly beats Opus 4.6 here)

Kilian Lieret

Kilian Lieret

@KLieret

2026-02-18T17:37:00.000000Z

Open

We just updated the official SWE-bench leaderboard comparing all models with the exact same scaffold (mini-SWE-agent v2). Detailed cost analysis & links to browsable trajectories in 🧵 https://x.com/KLieret/status/2024176335782826336/photo/1

Grok reasoning
Commentary on SWE-bench leaderboard results for latest models.

Likes

117

Reposts

5

Views

18,012

Tags

not related general technology software versioning awards/ranking
Tweet ID: 2024234256806158715
Prompt source: ai-influencers-news
Fetched at: February 19, 2026 at 11:20