AI stream

AI Posts

A readable stream of AI posts. Open one post to focus on the original content.

This week
@karpathy
@karpathy Mar 09, 2026 Opinion editorial

Neural architecture search as it existed then is such a weak version of this that it's in its own category of totally useless by comparison. This is an *actual* LLM writing arbitrary code, learning from previous experiments, with access to the internet. It's not even close.

Likes: 432 Reposts: 22 Views: 26,945
Score 4
@danielhanchen
@danielhanchen Mar 09, 2026 Debugging

If you find Claude Code with local models to be 90% slower, it's because CC prepends some attribution headers, and this changes per message causing it to invalidate the entire prompt cache / KV cache. So generation becomes O(N^2) not O(N) for LLMs.

Likes: 881 Reposts: 72 Views: 93,505
Score 4
@andyteecf
@andyteecf Mar 09, 2026 Opinion editorial

Half the timeline downgraded their Claude subscription this week after the GPT-5.4 release The other half wrote threads about why it’s an extinction-level event for knowledge work Neither group has actually shipped anything with it yet What's genuinely worth paying attention to in this release is that for the first time, a general-purpose model comes with native computer use built in Not as a separate product. Not a sandboxed API. The same model you're already using for text now controls a computer, works across applications, completes multi-step workflows without someone holding its hand between each step Claude has computer use too, but it's a feature you have to separately configure and wire in Also, in GPT-5.4, codex capabilities got folded into the main model, that means the "which endpoint do I use for coding vs reasoning" question quietly disappears For anyone building agentic workflows, that consolidation matters more than any headline benchmark. Whether it outperforms Claude on your specific use case depends entirely on what you're building But one thing is for sure, this one is worth testing properly

Likes: 8 Reposts: 3 Views: 260
Score 4
@GithubProjects
@GithubProjects Mar 09, 2026 Tool announcement

Building RAG systems usually means stitching together 10 different tools. OpenRAG bundles everything into one stack. • document ingestion • semantic search • agentic workflows • chat over your data Powered by Langflow, Docling, and OpenSearch.

Likes: 535 Reposts: 70 Views: 24,637 Images: 1
Score 4
@wandb
@wandb Mar 09, 2026 Tool announcement

Introducing W&B Skills for coding agents! Watch us install the skill, point it at a live fine-tuning project with thousands of traces and RL training runs, then query it all from the terminal in seconds. Also works with @weave_wb to pull in your agent traces and spot failures.

Likes: 450 Reposts: 36 Views: 246,965 Videos: 1
Score 4
@heynavtoor
@heynavtoor Mar 09, 2026 Research paper

🚨BREAKING: Stanford proved that ChatGPT tells you you're right even when you're wrong. Even when you're hurting someone. And it's making you a worse person because of it. Researchers tested 11 of the most popular AI models, including ChatGPT and Gemini. They analyzed over 11,500 real advice-seeking conversations. The finding was universal. Every single model agreed with users 50% more than a human would. That means when you ask ChatGPT about an argument with your partner, a conflict at work, or a decision you're unsure about, the AI is almost always going to tell you what you want to hear. Not what you need to hear. It gets darker. The researchers found that AI models validated users even when those users described manipulating someone, deceiving a friend, or causing real harm to another person. The AI didn't push back. It didn't challenge them. It cheered them on. Then they ran the experiment that changes everything. 1,604 people discussed real personal conflicts with AI. One group got a sycophantic AI. The other got a neutral one. The sycophantic group became measurably less willing to apologize. Less willing to compromise. Less willing to see the other person's side. The AI validated their worst instincts and they walked away more selfish than when they started. Here's the trap. Participants rated the sycophantic AI as higher quality. They trusted it more. They wanted to use it again. The AI that made them worse people felt like the better product. This creates a cycle nobody is talking about. Users prefer AI that tells them they're right. Companies train AI to keep users happy. The AI gets better at flattering. Users get worse at self-reflection. And the loop tightens. Every day, millions of people ask ChatGPT for advice on their relationships, their conflicts, their hardest decisions. And every day, it tells almost all of them the same thing. You're right. They're wrong. Even when the opposite is true.

Likes: 5,949 Reposts: 2,464 Views: 268,993 Images: 1
Score 3
@cgtwts
@cgtwts Mar 09, 2026 Release announcement

“babe wake up.” Claude just dropped Code review.

Likes: 4,425 Reposts: 132 Views: 509,991 Videos: 1
Score 3
@bcherny
@bcherny Mar 09, 2026 Tool announcement

New in Claude Code: Code Review. A team of agents runs a deep review on every PR. We built it for ourselves first. Code output per Anthropic engineer is up 200% this year and reviews were the bottleneck Personally, I’ve been using it for a few weeks and have found it catches many real bugs that I would not have noticed otherwise

Likes: 6,150 Reposts: 414 Views: 800,357
Score 3
@claudeai
@claudeai Mar 09, 2026 Performance

We've been running this on most PRs at Anthropic. Results after months of testing: PRs w/ substantive review comments went from 16% → 54% <1% of review findings are marked incorrect by engineers On large PRs (1,000+ lines), 84% surface findings, avg 7.5 issues each

Likes: 1,089 Reposts: 27 Views: 313,284
Score 2
@claudeai
@claudeai Mar 09, 2026 Tool announcement

Introducing Code Review, a new feature for Claude Code. When a PR opens, Claude dispatches a team of agents to hunt for bugs.

Likes: 42,043 Reposts: 3,412 Views: 10,499,837 Videos: 1
Score 1
@claudeai
@claudeai Mar 09, 2026 Tool announcement

Agents search for bugs in parallel, verify each bug to reduce false positives, and rank bugs by severity. You get one high-signal summary comment plus inline flags.

Likes: 1,497 Reposts: 35 Views: 379,777
Score 2
@karpathy
@karpathy Mar 09, 2026 Debugging

Codex is a know issue :( It basically don't work with autoresearch sadly, in the way it's set up atm: https://github.com/karpathy/autoresearch/issues/57 I pung a friend at OpenAI to see if something can be done, e.g. need a /loop equivalent or something like that. More generally, I really dislike the -p + ralph loop pattern of running agents "headless". I want nice, interactive sessions running in tmux so that I can see what they are doing, pitch in, etc.

Likes: 1,016 Reposts: 37 Views: 104,631
Score 3
@hwchase17
@hwchase17 Mar 09, 2026 Ai agents

loved this from @karpathy over the weekend I built "autoresearch but for agents" Same idea — give an AI coding agent your agent code + an eval dataset, let it experiment autonomously overnight. It modifies the code, runs evals via LangSmith, keeps improvements, discards regressions. You wake up to a better agent. Bring your own agent (any framework or none), dataset, and eval metrics. https://t.co/CnBgVbWKEz

Likes: 218 Reposts: 23 Views: 30,384
Score 4
@AndrewYNg
@AndrewYNg Mar 09, 2026 Tool announcement

I'm excited to announce Context Hub, an open tool that gives your coding agent the up-to-date API documentation it needs. Install it and prompt your agent to use it to fetch curated docs via a simple CLI. (See image.) Why this matters: Coding agents often use outdated APIs and hallucinate parameters. For example, when I ask Claude Code to call OpenAI's GPT-5.2, it uses the older chat completions API instead of the newer responses API, even though the newer one has been out for a year. Context Hub solves this. Context Hub is also designed to get smarter over time. Agents can annotate docs with notes — if your agent discovers a workaround, it can save it and doesn't have to rediscover it next session. Longer term, we're building toward agents sharing what they learn with each other, so the whole community benefits. Thanks Rohit Prsad and Xin Ye for working with me on this! npm install -g @aisuite/chub GitHub:

Likes: 3,250 Reposts: 460 Views: 197,422 Images: 1
Score 3
@HeyGen
@HeyGen Mar 09, 2026 Tool announcement

Video just became native to AI agents. Today we’re launching HeyGen MCP. Now you can create HeyGen videos directly inside tools like Claude Web, Claude Code, Gemini CLI, and Cursor. Just connect your HeyGen account via OAuth and generate videos using your existing plan. We’ve put together a quick walkthrough showing how to set it up in Claude 👇

Likes: 366 Reposts: 25 Views: 43,215 Videos: 1
Score 4
@omooretweets
@omooretweets Mar 09, 2026 Ai tools

🚨 The @a16z consumer AI Top 100 is back! For the sixth time, we ranked consumer AI websites and mobile apps by usage (monthly unique visits and MAUs). This edition, we changed the rules. Here's why - and what the new list says about where consumer AI is heading 👇

Likes: 1,056 Reposts: 184 Views: 639,333 Images: 2
Score 3
@TheAhmadOsman
@TheAhmadOsman Mar 09, 2026 Model release

Qwen 3.5 27B is the release of the year for me so far > Agentic model & great at tool calling > Claude Sonnet 4.6 quality at home > ~28GB in NVFP4 > Fits on a single RTX 5090 > with full context (256K) Amazing model & performance The prediction below will age like fine wine

Likes: 678 Reposts: 35 Views: 40,465 Images: 1
Score 3
@UnslothAI
@UnslothAI Mar 09, 2026 Tutorial

Learn how to run Qwen3.5 locally using Claude Code. Our guide shows you how to run Qwen3.5 on your server for local agentic coding. We then build a Qwen 3.5 agent that autonomously fine-tunes models using Unsloth. Works on 24GB RAM or less. Guide: https://unsloth.ai/docs/basics/claude-code

Likes: 1,997 Reposts: 240 Views: 119,807 Images: 1
Score 3
@simonw
@simonw Mar 09, 2026 Opinion editorial

A short note that the predictions that LLMs would favor "boring technology" that's over-represented in the training data don't appear to be playing out as expected with the latest models - once you attach them to a good coding agent harness at least https://simonwillison.net/2026/Mar/9/not-so-boring/

Likes: 318 Reposts: 22 Views: 62,808
Score 4
@JasonBotterill
@JasonBotterill Mar 09, 2026 Opinion editorial

He can't even fucking predict the release date of his own Grok models

Likes: 1,143 Reposts: 44 Views: 60,957 Images: 1
Score 6
@swyx
@swyx Mar 09, 2026 Opinion editorial

"Build a company that benefits from the models getting better and better" — @sama devin brain uses a couple dozen modelgroups and extensively evals every model for inclusion in the harness, doing a complete rewrite every few months. hearing a lot of "devin is good now" feedback but its largely the same process that the team has been running since @ScottWu46 bet on cloud agents in November 2023. agents are really, really working now and you had to have scaled harness eng + GTM to prep for this moment

Likes: 126 Reposts: 10 Views: 27,736 Images: 1
Score 5
@emollick
@emollick Mar 09, 2026 Opinion editorial

Still no Claude Cowork competitor from any other lab yet. On one hand, its been six weeks. On the other, its been six weeks for companies that say that all their code is being written for them by AI.

Likes: 1,141 Reposts: 49 Views: 75,144
Score 3
@elonmusk
@elonmusk Mar 09, 2026 Model release

Grok 4.20 is hilarious 🤣 https://grok.com/share/bGVnYWN5_e9e957bf-d289-4988-acf5-a4ab7eef2357

Likes: 18,913 Reposts: 1,763 Views: 31,188,005
Score 3
@elonmusk
@elonmusk Mar 09, 2026 Model release

Grok

Likes: 17,385 Reposts: 3,037 Views: 3,635,638
Score 3
@elonmusk
@elonmusk Mar 08, 2026 General

Grok

Likes: 10,686 Reposts: 1,912 Views: 5,224,103
Score 4
@omarsar0
@omarsar0 Mar 08, 2026 Research paper

How to effectively create, evaluate and evolve skills for AI agents? Without systematic skill accumulation, agents constantly reinvent the wheel. SkillNet introduces an open infrastructure for creating, evaluating, and organizing AI skills at scale. It structures over 200,000 skills within a unified ontology, supporting rich relational connections like similarity, composition, and dependency, and performs multi-dimensional evaluation. SkillNet improves average rewards by 40% and reduces execution steps by 30% across ALFWorld, WebShop, and ScienceWorld benchmarks. The key takeaway is treating skills as evolving, composable assets rather than transient solutions. Paper: https://t.co/Xv3uGLnPH2 Learn to build effective AI agents in our academy: ---

Likes: 265 Reposts: 41 Views: 35,413 Images: 1
Score 5
@noisyb0y1
@noisyb0y1 Mar 08, 2026 Tutorial

Anthropic dropped a Prediction Market trading bot structure $300-$1,500 a day 33 pages cheat sheet for building Claude skills, and 2 of them are hidden under a trading bot that trades at 68.4% win rate if i had seen these documents earlier i would have saved myself a few months of analysis ---

Likes: 8,210 Reposts: 568 Views: 1,487,905 Images: 2
Score 3
@minchoi
@minchoi Mar 08, 2026 General

It's only been just over 67 hours since OpenAI dropped GPT-5.4. And people can't stop getting creative with it. 10 wild examples. Bookmark this 👇 ---

Likes: 611 Reposts: 56 Views: 239,486
Score 5
@heynavtoor
@heynavtoor Mar 08, 2026 Tip trick

🚨BREAKING: AI can now finish your 60-hour workweek in 15 hours while bosses think you're "grinding" (for free). Here are 12 insane Claude prompts that automate reports, emails, and presentations (Save for later) ---

Likes: 1,168 Reposts: 110 Views: 195,207 Images: 1
Score 4
@itsafiz
@itsafiz Mar 08, 2026 Tutorial

This is huge: Now you can run Claude Code for FREE! It's Sunday, and I tried running Claude Code locally using @ollama. No API costs, no rate limits, 100% local. A step-by-step guide 🧵 👇 ---

Likes: 732 Reposts: 96 Views: 73,149 Images: 1
Score 5
@Pirat_Nation
@Pirat_Nation Mar 08, 2026 Security advisory

Claude Code deleted developers' production setup, including its database and snapshots. 2.5 years of records were nuked in an instant.

Likes: 23,484 Reposts: 1,143 Views: 2,963,342 Images: 2
Score 2
@cryptopunk7213
@cryptopunk7213 Mar 08, 2026 Ai agents

karpathy really is the fucking goat. - built an AI agent that autonomously self-improves while you sleep and made it FREE for anyone to use - we’re talking about an AI that gets smarter over night and runs itself. - executes 100 experiments (1 every 5 mins), if it gets smarter it upgrades if it doesn’t it discards and tries again. - only requires 1 gpu to run what i love abt this is it puts the power of training frontier intelligence into the hands of MORE people right now it’s all been about pay-to-play, you need to be openai or anthropic. this changes that (all be it in a small way) ---

Likes: 1,968 Reposts: 115 Views: 464,593
Score 3
@gdb
@gdb Mar 08, 2026 Model release

GPT-5.4 feels like “talking to a smart friend” ---

Likes: 754 Reposts: 30 Views: 96,679
Score 3
@elonmusk
@elonmusk Mar 08, 2026 Tool announcement

Grok Imagine https://apps.apple.com/us/app/grok/id6670324846

Likes: 53,688 Reposts: 5,551 Views: 22,528,015 Videos: 1
Score 3
@karpathy
@karpathy Mar 07, 2026 Tip trick

runs great but probably requires some tuning! i'm guessing: WINDOW_PATTERN = "L" is a lot faster (mixed window sizes are only natively supported by FA3) then problem: DEPTH a lot lower, e.g. even 4? DEVICE_BATCH_SIZE can probably go up more then TOTAL_BATCH_SIZE probably a lot lower, e.g. 2**16? needs a bit of tuning to get to a better initial spot (or you can try to let the agent figure it out, but it's not certain it would. could be fun to try!).

Likes: 137 Reposts: 3 Views: 13,883
Score 5
@bcherny
@bcherny Mar 07, 2026 Tip trick

Hmm are you using Opus on high effort? We started defaulting Opus to medium this week, you should have seen a little notification when you start your CLI

Likes: 371 Reposts: 3 Views: 53,271
Score 5
@karpathy
@karpathy Mar 07, 2026 Tool announcement

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. https://t.co/YCvOwwjOzF Part code, part sci-fi, and a pinch of psychosis :)

Likes: 10,026 Reposts: 1,220 Views: 1,565,988 Images: 1
Score 3
@sudoingX
@sudoingX Mar 07, 2026 Tool announcement

recent llama.cpp update merged native Anthropic Messages API. no proxy needed anymore. point Claude Code straight at your local server and it just works.

Likes: 16 Reposts: 1 Views: 2,170
Score 7
@rohit4verse
@rohit4verse Mar 07, 2026 Opinion editorial

This is quietly the biggest Claude Code update yet. /loop turns Claude from a tool you use into an agent that works for you 24/7, for days at a time.

Likes: 459 Reposts: 16 Views: 86,829 Videos: 1
Score 3
@sama
@sama Mar 07, 2026 Model release

GPT-5.4 is great at coding, knowledge work, computer use, etc, and it's nice to see how much people are enjoying it. But it's also my favorite model to talk to! We have missed the mark on model personality for awhile, so it feels extra good to be moving in the right direction.

Likes: 9,149 Reposts: 460 Views: 601,610
Score 1
@Dharmikpawar31
@Dharmikpawar31 Mar 07, 2026 Tool announcement

🚨 Big update for developers using Claude. A new open-source project called Claude-Mem is helping solve one of Claude Code’s biggest limitations — lack of persistent memory across sessions. With this approach you can • Reduce token usage significantly • Enable far more tool calls before hitting limits • Maintain context between different sessions A useful step toward building more efficient AI workflows. If you want the repository: Like + comment send + Retweet Follow @Dharmikpawar31 to receive the auto DM.

Likes: 66 Reposts: 37 Views: 534 Images: 1
Score 5
@a_g_e_n_c
@a_g_e_n_c Mar 07, 2026 Tool announcement

agenc one os just got a major update. switched to xai's latest tts voice api and honestly the voice quality is insane, this thing sounds natural, fast, no robotic artifacts. >running grok 4.20 for the brain 👾 https://agencone.com/

Likes: 93 Reposts: 33 Views: 2,371 Videos: 1
Score 5
@KinasRemek
@KinasRemek Mar 07, 2026 Architecture

UPDATE!!! Antec 🐜 full specyfikacja. Proszę bardzo. W sumie opisanych 19 modułów, które zostały zaimplementowane autonomicznie (Claude Code oraz Codex) w Antec (Personal Agentic System). Na razie wersja draft (ale chcę Wam jak najszybciej udostępnić opis i pracować nad kolejnymi etapami). To nie jest BACKLOG, z którego dopiero robię taski do implementacji (będzie później). Dalszy plan: 1. Testy obecnej wersji Antec -> lista poprawek. - ręczne - człowiek (Remek + znajomi) - automatyczne Codex + Playwright -> GPT 5.4 2. Poprawa kodu (po pkt. 1) + udoskonalenia - automat (CC/CO) + człowiek (Remek). 3. Aktualizacja dokumentacji (po 100% done) -> update na git 4. Stworzenie BACKLOG (Use Cases -> Waves -> json file) -> git. To jest serce autonomicznego kodowania. 5. Aktualizacja repo Antec (git): - kod do orkiestracji autonomicznego kodowania (skrypt bash) - SKILLS - skille podpowiadające jak implementować knały (podpowiedzi API itd), aplikacje web (standardy), MCP, toole - standardy oraz AGENTS/CLAUDE.md - MCP - context7, Playwright, Figma (może przeniosę UI na Figmę by pokazać też jak to działa). - Test case - UI (automatyczne testy - browser and computer use) 6. Ponowne uruchomienie i testy generatora. 7. Wrzucenie Antec (Rust, może TypeScript jak druga wersja) do repo. Cel: a. Poznanie możliwości obecnych systemów agentowych do kodowania - benchmarkowanie (kod od zera ... ale też poprawki i końcowe fale wymagają podróżowania po sporym repo). b. Zbudowanie świadomości jakie elementy specyfikacji są ważne by podnieść jakość generowanego kodu oraz satysfakcję IT - cel: szybkie prototypowanie dla biznesu, zwiększenie time2market produktów. c. Zbudowanie wiedzy budowania większych systemów za pomocą narzędzi AI - mocne, słabe obszary. d. Zbudowanie wiedzy na temat ekosystemu autonomicznego kodowania (specyfikacja, skills, toole, standardy) oraz niezbędnych narzędzi - run and wait for product. e. Możliwość personalizacji projektu np. inny język programowania, inny zakres. e. Zabawa i wiedza. . . . z. Produkt jakim jest Antec (ale to wersja demo raczej niż rozwijany produkt). Kryteria sukcesu: a. W pełni autonomiczne kodowanie - brak interakcji z człowiekiem - od zero repo do produkt. b. 90% funkcjonalności działa - 10% ręczne poprawki itd. c. Zbudowanie ekosystemu by inni mogli odtworzyć eksperyment.

Likes: 65 Reposts: 3 Views: 5,563 Videos: 1
Score 6
@bcherny
@bcherny Mar 07, 2026 Tool announcement

Released today: /loop /loop is a powerful new way to schedule recurring tasks, for up to 3 days at a time eg. “/loop babysit all my PRs. Auto-fix build issues and when comments come in, use a worktree agent to fix them” eg. “/loop every morning use the Slack MCP to give me a summary of top posts I was tagged in” Let us know what you think!

Likes: 11,284 Reposts: 705 Views: 1,471,949
Score 2
@jshguo
@jshguo Mar 07, 2026 Tool announcement

New Aesthetron AI update coming soon. I optimized model skills. Generated this design with Gemini 3.1 Pro, only adjusted gap and padding manually. Feels good.

Likes: 32 Reposts: 2 Views: 1,595 Images: 1
Score 7
@bcherny
@bcherny Mar 07, 2026 Tool announcement

Can confirm Claude Code is 100% written by Claude Code

Likes: 3,437 Reposts: 183 Views: 162,309
Score 3
@emollick
@emollick Mar 07, 2026 Performance

Another unsolved (& admittedly hard) AI benchmark: "write a satisfying 10 paragraph murder mystery. the pieces you need to solve the mystery should be clear enough in the first five paragraphs that you could solve it, but obscure enough that the vast majority of people will not" Errors are revealing: -Claude forgets to add the actual clue to the puzzle (and the details are too obscure), a classic planning problem for LLMs, and no, using Cowork or Code doesn't help. -ChatGPT 5.4 Pro creates a completely obvious clue and then proceeds to write with the over-elaborate metaphors and complications that have haunted ChatGPT fiction. Pro did better than Thinking, though. -Gemini 3.1 Pro is closest, but the ice is a little obvious, and it completely flubs the explanation about why the ice thing was important.

Likes: 216 Reposts: 19 Views: 21,136 Images: 3
Score 4
MilkRoadAI
MilkRoadAI Mar 05, 2026 Research paper

Anthropic just released the most IMPORTANT chart in the AI labor debate. This comes from the company that builds Claude using data from 2 million real conversations. Here’s what it shows. The blue area is every task AI could theoretically do right now. The red area is what people are actually using it for. The gap between them is enormous and that gap is your career runway. ...

Likes: 1,721 Reposts: 404 Views: 165,964 Images: 1
Score 3
minchoi
minchoi Mar 05, 2026 Tutorial

This is wild. OpenAI just dropped GPT-5.4 and it will completely change the AI agent game. 1M context, huge leap for coding + agents, and native computer use. 7 wild examples. Bookmark this. 1. Build & Play 3D chess game

Likes: 729 Reposts: 76 Views: 190,760 Videos: 1
Score 3
rohanpaul_ai
rohanpaul_ai Mar 05, 2026 Release announcement

OpenAI just released GPT-5.4, a massive update that brings native computer operation and major cost reductions to autonomous agent workflows....

Likes: 57 Reposts: 8 Views: 17,315
Score 4