Simon Willison

@simonw

Which models would you recommend for longer-context tool calling? Are there any benchmarks for that which you find credible? I've not found a local model with tool calling good enough for me to trust with Claude Code or Codex, but I may not have been looking at the right options.