AI Post

@danielhanchen
Debugging Medium

Importance score: 4 • Posted: March 09, 2026 at 21:46

If you find Claude Code with local models to be 90% slower, it's because CC prepends some attribution headers that change on every message, which invalidates the entire prompt cache / KV cache. So generation becomes O(N^2) instead of O(N) for LLMs.
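To see why a changing prefix is so costly, here is a minimal sketch that counts how many prompt tokens a server would have to process over a conversation. It assumes a simplified model in which the server can reuse only the longest prefix shared with the previous request; real inference servers differ in detail. The helper names (`common_prefix_len`, `tokens_prefilled`, `build_conversation`) and the token strings are hypothetical and are not part of Claude Code or any inference server.

```python
def common_prefix_len(a, b):
    """Number of leading tokens that two token lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tokens_prefilled(prompts):
    """Total tokens processed when only the prefix shared with the
    previous request can be served from the cache (simplified model)."""
    total = 0
    cached = []
    for prompt in prompts:
        total += len(prompt) - common_prefix_len(cached, prompt)
        cached = prompt
    return total


def build_conversation(turns, tokens_per_turn, header_changes):
    """Build one prompt per turn: a one-token header plus the full history.
    The header is a stand-in for the prepended attribution IDs (assumption)."""
    prompts = []
    history = []
    for t in range(turns):
        history = history + [f"turn{t}-tok{i}" for i in range(tokens_per_turn)]
        header = [f"id-{t}"] if header_changes else ["id-fixed"]
        prompts.append(header + history)
    return prompts


stable = tokens_prefilled(build_conversation(10, 100, header_changes=False))
changing = tokens_prefilled(build_conversation(10, 100, header_changes=True))
print(stable)    # 1001: only the new tokens are processed each turn
print(changing)  # 5510: the whole history is reprocessed every turn
```

With a fixed header, the work grows linearly with conversation length. With a header that changes each turn, the very first token differs, nothing can be reused, and the total grows roughly quadratically.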

Unsloth AI

@UnslothAI

2026-03-09T21:43:56.000000Z

Note: Claude Code invalidates the KV cache for local models by prepending some IDs, making inference 90% slower. See how to fix it here: https://unsloth.ai/docs/basics/claude-code#fixing-90-slower-inference-in-claude-code https://x.com/UnslothAI/status/2031123848729600371/photo/1

Grok reasoning
A practical fix for a performance issue in Claude Code with local models, from an Unsloth AI founder; useful for developers.

Likes

881

Reposts

72

Views

93,505

Tweet ID: 2031124589557002457
Prompt source: ai-news
Fetched at: March 10, 2026 at 06:02