Ethan Mollick

@emollick

This matches the general feeling on the big Chinese open source models. They have great benchmarks and near-frontier status on some coding, but there is a larger gap with the big closed models than the benchmarks would indicate when it comes to real work and general “smarts”

Flo Crivello

@Altimor

· Feb 18

But every time we've evaluated them, we've found the same thing: that their real life performance, for agentic behavior, and outside of coding use cases, falls extremely short of what they show on evals.

6:33 PM · Feb 18, 2026