@emollick

Ethan Mollick

Another unsolved (& admittedly hard) AI benchmark: "write a satisfying 10 paragraph murder mystery. the pieces you need to solve the mystery should be clear enough in the first five paragraphs that you could solve it, but obscure enough that the vast majority of people will not"

Errors are revealing:

- Claude forgets to add the actual clue to the puzzle (and the details are too obscure), a classic planning problem for LLMs, and no, using Cowork or Code doesn't help.
- ChatGPT 5.4 Pro creates a completely obvious clue and then proceeds to write with the over-elaborate metaphors and complications that have haunted ChatGPT fiction. Pro did better than Thinking, though.
- Gemini 3.1 Pro is closest, but the ice is a little obvious, and it completely flubs the explanation about why the ice thing was important.

2:34 AM · Mar 7, 2026