Good news: the same research team, in a more recent RCT, found that AIs prompted to act as a tutor improved learning outcomes! https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6423358
Grok automatically translating and recommending 𝕏 posts from other languages is starting to work
Anthropic just dropped the Claude Certified Architect exam. 13 free courses, 60 questions, 2 hours. Build, orchestrate, and master multi-agent AI systems.
Agents were getting confused by our pricing. Turns out JS-based sliders are not the best way for them to parse things. So today I’m shipping https://resend.com/pricing.md Plus, content negotiation (Accept: text/markdown). Pricing is now machine-readable for AI agents and LLMs.
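The post describes two routes to the same machine-readable pricing: a direct `.md` URL and content negotiation via the `Accept` header. A minimal client-side sketch of the latter using Python's stdlib, as one way an agent might ask for the markdown representation (the request is built but not sent here; the `/pricing` path and header value come from the post, the Python code is my own illustration):

```python
from urllib import request

# Build a request that asks the server for the markdown representation
# of the pricing page via content negotiation (Accept: text/markdown).
req = request.Request(
    "https://resend.com/pricing",
    headers={"Accept": "text/markdown"},
)

# An agent would then actually send it and read the markdown body:
# with request.urlopen(req) as resp:
#     pricing_md = resp.read().decode("utf-8")

print(req.get_header("Accept"))  # text/markdown
```

The appeal of the `Accept`-header route is that the same canonical URL serves humans HTML and agents markdown, with no separate link to discover.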
Let me explain what I mean using your chess analogy... Imagine a world where chess doesn't exist. In this world, humanity encounters an alien species, and they say "let's play a game of Glurg, it's our traditional pastime. Here are the rules, see you tomorrow" -- and it's the rules of chess. My claim is that following this interaction, a working group of the world's best minds, leveraging current externalized cognitive infrastructure (computers, the internet, etc.) would be able to analyze the rules and develop a working 3000 Elo chess engine within 24 hours, in time for the match. Give them an extra 3 weeks and they'd have a 3500 Elo engine that's 10x more compute efficient. So human intelligence is already at a level where we can go from "here are the rules" to "I can play at 3000 Elo" immediately. Not optimal yet, but not too far off.
Vertex AI helps improve model performance with minimal infrastructure overhead. Check out our new codelab to learn how to fine-tune Gemini 2.5 Flash and walk through the complete SFT workflow using the Vertex AI SDK for Python → https://cloud.google.com/blog/topics/developers-practitioners/mastering-model-adaptation-a-guide-to-fine-tuning-on-google-cloud?utm_source=twitter&utm_medium=unpaidsoc&utm_campaign=fy26q1-googlecloud-blog-ai-in_feed-no-brand-global&utm_content=-&utm_term=-&linkId=59008331
SOMEONE PASTED GOOGLE'S TURBOQUANT PAPER INTO CLAUDE & BUILT A TRADING BOT IN MINUTES THE BOT MADE 3,317 PREDICTIONS AND TURNED $1,500 INTO $83,115 ON POLYMARKET IN 72 HOURS THE PAPER WAS FREE. CLAUDE COSTS $20 A MONTH
Yesterday I gave Claude Cowork everything it needed to do my taxes and set it loose on TurboTax. Claude absolutely flew through the process. Incredibly confident, very few follow-up questions. It’s so over for accountants
🚨 Cambridge researchers just tested what happens when you overload an AI's memory with irrelevant data. They found a complete collapse of modern RAG systems. Not a minor hallucination. A total failure of the exact retrieval architecture that every enterprise AI relies on to access private data. The models simply drowned in the noise.

The researchers tested standard Retrieval-Augmented Generation (RAG) and filtering models like Self-RAG. They fed them information while slowly increasing the ratio of distracting, low-quality documents. Here is what they found.

Current read-time filtering failed completely. When the ratio of distractors hit 8:1, the accuracy of standard RAG systems plummeted to 0%. The AI lost the ability to find the truth.

It exposed a massive architectural flaw. We currently store every single document an AI reads, regardless of quality, and force the model to sort through the garbage at query time. It is highly inefficient and fundamentally broken.

The biological fix: the researchers built a new system called "Write-Time Gating," modeled after the human hippocampus. Instead of saving everything, it evaluates novelty, reliability, and source reputation before the data is even stored.

And then there is the finding that changes how we build AI: hierarchical archiving. When beliefs update, the system does not delete the old data. It deprioritizes it, maintaining a version history just like the human brain.

The result? The write-gated system maintained 100% accuracy even at massive distractor scales, all while costing one-ninth the compute of current systems.

The researchers made it clear. When you dump raw, unfiltered data into a database and expect the LLM to figure it out later, you are building a system designed to fail at scale. No reliable retrieval. No cost control. No accuracy guarantees. Nothing.
Right now, companies are building massive vector databases, throwing every piece of corporate documentation into them, and assuming the AI will magically find the signal in the noise. Stop treating AI memory like a hard drive. Start treating it like a biological filter. Build the gate at the entrance, not the exit.
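A minimal sketch of the write-time-gating idea described above: score documents for reliability and novelty before they enter the store, rather than filtering at query time. The source-reputation scores, threshold, and crude duplicate-based novelty check are all invented for illustration; this is not the paper's actual gate.

```python
# Toy "write-time gate": evaluate a document BEFORE storing it, instead of
# dumping everything in and filtering at query time. Scores and thresholds
# are made up for illustration.

TRUSTED_SOURCES = {"internal_wiki": 0.9, "vendor_docs": 0.7, "web_scrape": 0.2}

def gate(doc: dict, store: list, min_score: float = 0.5) -> bool:
    reliability = TRUSTED_SOURCES.get(doc["source"], 0.1)
    # Crude novelty check: reject exact duplicates already in the store.
    novelty = 0.0 if any(doc["text"] == d["text"] for d in store) else 1.0
    return reliability * novelty >= min_score

store = []
docs = [
    {"text": "Refunds take 5 days.", "source": "internal_wiki"},
    {"text": "Refunds take 5 days.", "source": "internal_wiki"},  # duplicate
    {"text": "Refunds are instant!!", "source": "web_scrape"},    # low trust
]
for d in docs:
    if gate(d, store):
        store.append(d)

print(len(store))  # only the first document passes the gate
```

The hierarchical-archiving piece from the post would replace the hard reject with a demotion to a versioned, low-priority tier, but the principle is the same: the gate sits at the entrance.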
A Microsoft course for building AI Agents. In Spanish, from scratch, 12 lessons. MCP, RAG, multi-agents, and more ↓
🚨BREAKING: Someone compiled every free AI agents resource from Microsoft, Google, OpenAI, Anthropic, and Hugging Face in one place. You can learn: - LangChain, LangGraph, CrewAI, AutoGen, OpenAI Swarm - RAG agents, multi-agent systems, task management, conversational AI - No-code agents with n8n, Vapi, and low-code workflows - 100+ resources with full courses, notebooks, and video lectures 100% Opensource.
🚨BREAKING: CLAUDE CAN NOW BUILD COMPLETE APPS FROM SCRATCH. No team. No code. No money. You just need to know how to use it. Here are 8 prompts to start trying it right now: 🔖 Save them, you'll thank yourself later.
I still remember when people thought "prompt engineering" was going to become a real career.
Anthropic CEO: “I have engineers within Anthropic who don’t write any code; they just let Claude write the code and they edit it and look it over.” “At Anthropic, writing code means designing the next version of Claude itself, so we essentially have Claude designing the next version of Claude, not completely, but most of it.” In the last 52 days, the Claude team dropped 50+ major feature launches. This is literally INSANE.
MSA breaks the 100M token barrier Memory Sparse Attention achieves unprecedented 100M token context lengths with near-linear complexity. The architecture maintains 94% accuracy at 1M tokens while outperforming RAG systems and frontier models, using end-to-end sparse attention with document-wise RoPE.
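For intuition only, here is a toy top-k sparse attention step in pure Python. It is not MSA's actual mechanism (and omits document-wise RoPE entirely); it just shows the general idea that each query attends to a small, selected subset of keys, so per-query cost scales with k rather than with the full sequence length.

```python
# Toy top-k sparse attention: score all keys, keep only the k best,
# softmax over that subset, and mix the corresponding values.
import math

def sparse_attention(q, keys, values, k=2):
    scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]

out = sparse_attention(
    q=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]],
    values=[[1.0], [2.0], [3.0], [4.0]],
    k=2,
)
print(out)
```

Real sparse-attention systems select keys with cheap approximate structures rather than scoring everything first; the scoring-then-select shape here is only to keep the example short.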
How to 10x your Claude with 4 .md files
🚨 DeepSeek just got a HUGE upgrade. It's been upgraded on the web app, and the outputs are much better. Check it out! They could be fine-tuning the outputs with their new-generation (V4) model, or this could even be our first look at the V4 series of models; not confirmed yet.
Another day done, thanks for the hard work (`・ω・´)ゞ Tweaking all sorts of things, aiming for maximum cuteness \( *'ω'*)/ Going to do my best to migrate from Pony-based to Illustrious-based models! With my homemade LoRA, I think Monaka-chan's face is mostly there now 🥴✨️
someone just dockerized an entire AI coding workstation and it's kind of insane one docker compose up and you get: → Claude Code with a browser UI → Gemini, Codex, Cursor, TaskMaster CLIs → Playwright + Chromium, pre-configured → 50+ dev tools (pandas, ffmpeg, prisma, gh...) no config. no debugging "why won't chromium run in docker" uses your existing Claude Max/Pro subscription
I made a ZIT LoRA and an LTX-2.3 LoRA. It's Ruri. First model release in a while. I remembered that I'm a model creator (lol). https://huggingface.co/Kotajiro/ZIT_ruri_LoRA https://huggingface.co/Kotajiro/LTX23-ruri_LoRA
Prompt Engineering is a SCAM. Please take it off your resume. The biggest lie on Tech Twitter right now is that you need to be an "AI Whisperer" to build software in 2026. Here is the reality check: If you need a 600-word prompt with 14 bullet points just to generate a stable React component... the AI isn't the problem. Your architecture is garbage. We spent the last few years teaching people to type "Act as a senior 10x developer and..." Modern models are now smart enough to ignore the fluff. They don't need magic words. They need Constraints. What actually separates a Senior Engineer from a "Prompt Bro" today: 1. System Boundaries: Knowing exactly where your Next.js frontend stops and your backend microservice begins. 2. Data Contracts: Defining strict schemas and types before you let the AI write a single loop. 3. State Management: The one thing autonomous agents still hallucinate on a daily basis. Stop trying to trick the machine with psychological hacks. Start feeding it clean, modular system architecture. If your only technical moat is "writing really good prompts," someone who actually understands database indexing is going to take your job by Q3. Good engineering fixes bad prompting. Good prompting cannot fix bad engineering.
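The "data contracts" point can be as simple as pinning down a schema before any generation happens. A sketch using a stdlib dataclass (the `Invoice` type and its fields are invented for illustration, not from the post):

```python
# Pin down the data contract first, then let the model write code against it.
# The Literal annotations are static-typing constraints, not runtime checks.
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class Invoice:
    id: str
    amount_cents: int           # integer cents by convention; no float money
    currency: Literal["USD", "EUR"]
    status: Literal["draft", "sent", "paid"]

# Any generated code now has an unambiguous boundary to target:
inv = Invoice(id="inv_001", amount_cents=4999, currency="USD", status="draft")
print(inv.amount_cents)
```

If you need the constraints enforced at runtime rather than just by a type checker, a validation library (pydantic, for example) is the heavier-weight version of the same contract-first move.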
The research team (including @hamsabastani who is on X) found that letting students just use AI resulted in them using it to accidentally shortcut learning But both that study and a separate RCT found that AIs prompted to act as a tutor improved learning https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6423358
I wish the corpus from the era was big enough that we could do counterfactual history. I wonder if you can generate enough synthetic data to get it to work. If I could, I'd love to assign a Victorian agentic scientist to discover the luminous aether. Very @nealstephenson
Very cool work, I wonder what other eras have a large enough corpus for training? https://huggingface.co/spaces/tventurella/mr_chatterbox
Want to talk to the past? Here is an LLM "trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899, drawn from a dataset made available by the British Library." Quite different from an LLM roleplaying a Victorian.
Claude Code 2.1.87 is now available. 1 CLI change Highlights: • Cowork Dispatch messages deliver reliably, ensuring dispatched communications reach recipients Full details are in thread ↓
Working on improving this. A bunch of efficiency wins incoming.
I do believe that a large collective of the smartest humans, aided by external tools, sits very close to the optimality bound -- i.e. humans should be able to solve any solvable problem (where the required information is available) if they pay enough attention to it
One of the biggest misconceptions people have about intelligence is seeing it as some kind of unbounded scalar stat, like height. "Future AI will have 10,000 IQ", that sort of thing. Intelligence is a conversion ratio, with an optimality bound. Increasing intelligence is not so much like "making the tower taller", it's more like "making the ball rounder". At some point it's already pretty damn spherical and any improvement is marginal. Now of course smart humans aren't quite at the optimal bound yet on an individual level, and machines will have many advantages besides intelligence -- mostly the removal of biological bottlenecks: greater processing speed, unlimited working memory, unlimited memory with perfect recall... but these are mostly things humans can also access through externalized cognitive tools.
A weird part of working at Anthropic: getting a few of these each day
As it turns out, it's empirically much less efficient than Vizier-style Bayesian hyperparameter search
Claude code is an AI agent that can read your codebase, reason about what to do, run commands, and lots more. And in this in-depth handbook, Vahe teaches you how to use it, from initial set up to advanced features. You'll learn about parallel workflows, skills and rules, security, and lots more - there's even some fun starter projects to try.
False equivalence is a trap I’ve been consciously steering away from in my work for the last 5 years. In Zuck’s case it cost him @GoogleDeepMind. In content/strategy it is common to go wide rather than deep: “oh we do a, b, c, and d” and put equal weight on all of them. But the world is not fair and power laws compound. Most school systems, bureaucrats, managers, and content curators are not set up for one thing to matter 50x more than the next thing. False equivalence killed my first devtools startup. False equivalence plagues policy making in my home country. False equivalence makes you underpay your top performers and spend too much time on lost causes. Rules: Carefully bet on a very small set of things. Don’t hedge, but keep reversibility. Set triggers/levels to monitor if you are wrong. Set tests to DOUBLE DOWN EARLY if you are more right than you thought.
Curious if there have been any good articles written on the impact of VLMs on low-vision and blind people. The advent of a universal text-reading and visual-description system seems like it would be a big advance as a result of AI, but I haven't seen anything written about it.
X search, long broken, has finally been fixed. All it took was a multi-hundred-million-dollar AI model (Seriously, X search is notoriously bad, but Grok is very good at this specific task, though it is funny that I have to burn a ton of tokens to do a keyword search on the site)
A lot of folks talk about "escaping the permanent underclass". If AGI pans out, the future class divide won't be based on wealth, but on cognitive agency. There will be a "focus class" (those who control their attention and actually do things) and a "slop class" (those whose reward loops are fully RL-managed by AI)
- Drafted a blog post - Used an LLM to meticulously improve the argument over 4 hours. - Wow, feeling great, it’s so convincing! - Fun idea: let’s ask it to argue the opposite. - LLM demolishes the entire argument and convinces me that the opposite is in fact true. - lol The LLMs may offer an opinion when asked but are extremely competent at arguing in almost any direction. This is actually super useful as a tool for forming your own opinions; just make sure to ask in different directions and be careful with the sycophancy.
No. Closed Labs don't publish research papers on their mainstream models.
Do you think OpenAI didn't profit from all the stuff published/open-sourced by FAIR, DeepSeek and all the others?
I was told by very reliable sources here on this platform that: • chatgpt killed google • claude killed software engineering • ai videos killed hollywood • deep learning killed classical ml • long context killed rag • mcp killed apis • laptops killed desktops • tablets killed laptops • native apps killed the web We never learn.
Nobody gives a clear answer on how to become an AI Engineer. So here's mine. Practical. No fluff. The skills, why they matter, and how to build them 👇 Skill 1: LLM Fundamentals Tokens, context windows, sampling, why models hallucinate. Resource: Read the model cards. Actually read them. ...
> bookmark 60 claude skills > save 60 workflows > star 60 repos > you built a collection > he built a product > you organized repos into folders > he organized customers into Stripe > you said "i'm researching" > he said "good enough" > that was the whole difference
For people who keep asking what to build in AI Engineering > Build your own Reasoner (Chain of Thought implementation) > Build your own Agent loop (ReAct pattern) > Build your own Inference Server (in C++/Rust) > Build your own Transformer from scratch (Attention is all you need) > Build your own Vector Database (HNSW index) > Build your own RAG pipeline > Build your own Flash Attention kernel (CUDA) > Build your own Quantization library (Int8/FP4 implementation) > Build your own Mixture of Experts (MoE) routing layer > Build your own Distributed training loop (FSDP/Tensor Parallelism) ...
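As a taste of the agent-loop item above, here is a skeleton of a ReAct-style loop with a hard-coded stub standing in for the model call. The stub, the `lookup` tool, and the question are all invented; the think → act → observe loop shape, repeated until a final answer, is the point.

```python
# Minimal ReAct-style loop: ask the "model" for the next step, execute
# tools, feed observations back, stop on a final answer.

def fake_model(history: list) -> dict:
    # A real implementation would send `history` to an LLM. This stub
    # always looks something up once, then answers.
    if not any(step["type"] == "observation" for step in history):
        return {"type": "action", "tool": "lookup", "arg": "capital of France"}
    return {"type": "final", "answer": "Paris"}

TOOLS = {"lookup": lambda q: "Paris is the capital of France."}

def react_loop(question: str, max_steps: int = 5) -> str:
    history = [{"type": "question", "text": question}]
    for _ in range(max_steps):
        step = fake_model(history)
        if step["type"] == "final":
            return step["answer"]
        observation = TOOLS[step["tool"]](step["arg"])
        history.append({"type": "observation", "text": observation})
    return "gave up"

print(react_loop("What is the capital of France?"))  # Paris
```

Swapping `fake_model` for a real LLM call and `TOOLS` for real tools is the whole project; the `max_steps` cap is the one safety rail worth keeping from day one.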
Research papers you must read for AI Engineer interviews: 1. Attention Is All You Need (Transformers) 2. LoRA (Low-Rank Adaptation) 3. PEFT (Parameter-Efficient Fine-Tuning) 4. ViT (Vision Transformers) 5. VAE (Variational Autoencoder) 6. GANs (Generative Adversarial Networks) 7. BERT (Bidirectional Encoder Representations from Transformers) 8. Diffusion Models (Stable Diffusion) 9. RAG (Retrieval-Augmented Generation) 10. GPT (Generative Pre-trained Transformers)
Guys, it’s time. There’s appetite now for building a future that doesn’t rely on the kindness of private companies. I pledge to upload all my sessions from the last 3 years. To do this right, we need a way to anonymise them and strip secrets. https://github.com/0xSero/ai-data-extraction
we as software engineers are becoming beholden to a handful of well-funded corporations. while they are our "friends" now, that may change due to incentives. i'm very uncomfortable with that. i believe we need to band together as a community and create a public, free-to-use repository of real-world (coding) agent sessions/traces. I want small labs, startups, and tinkerers to have access to the same data the big folks currently gobble up from all of us. So we, as a community, can do what e.g. Cursor does below, and take back a little bit of control again. Who's with me? https://t.co/PmRz0vURni
Naive RAG vs. Agentic RAG, explained visually: Naive RAG has well-known failure modes: - It retrieves once and generates once. If the context isn't relevant, it can't search again. ...
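To make the contrast concrete, here is a toy "agentic" loop that can judge its retrieved context and re-query, which naive RAG's retrieve-once design cannot do. The corpus, relevance check, and query-rewrite rule are all invented for illustration.

```python
# Toy agentic RAG: retrieve, check relevance, and if the context doesn't
# answer the question, reformulate the query and try again.

CORPUS = {
    "billing": "Invoices are issued on the 1st of each month.",
    "refunds": "Refunds are processed within 5 business days.",
}

def retrieve(query: str) -> str:
    return CORPUS.get(query, "")

def relevant(context: str, question: str) -> bool:
    # Stand-in for an LLM grading step ("does this context answer it?").
    return "refund" in question.lower() and "Refunds" in context

def agentic_rag(question: str) -> str:
    query = "billing"                      # deliberately bad first query
    for _ in range(3):                     # naive RAG would stop after 1
        context = retrieve(query)
        if relevant(context, question):
            return f"Answer based on: {context}"
        query = "refunds"                  # reformulate and retry
    return "Could not find relevant context."

print(agentic_rag("How long do refunds take?"))
```

In a real system both the relevance grade and the query rewrite would be LLM calls; the loop around the retriever is what makes it "agentic."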
Asking Codex to build a SimGothicManor game and really enjoying how much of its internal planning monologue has become obsessed with tongue-in-cheek gothic, such as worrying about "scope creep in a velvet cape"
For everyone with 16 GB of RAM or less who wants to run a coding model locally: this is an autocomplete code model that I’ve noticed does pretty well. Not as good as Cursor tab, but good enough. https://huggingface.co/zed-industries/zeta-2
👋 Roughly, the more tokens you throw at a coding problem, the better the result is. We call this test-time compute. One way to make the result even better is to use separate context windows. This is what makes subagents work, and also why one agent can cause bugs and another (using the same exact model!) can find them. In a way, it’s similar to engineers — if I cause a bug, my coworker reviewing the code might find it more reliably than I can. In the limit, agents will probably write perfect bug-free code. Until we get there, using multiple uncorrelated context windows tends to be a good approach.
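A back-of-envelope version of the uncorrelated-reviewers point (the probabilities are invented; independence across fresh context windows is the assumption doing the work):

```python
# If a reviewer agent with a fresh context misses a given bug with
# probability p, then n independent reviewers all miss it with
# probability p**n -- the payoff of uncorrelated context windows.

def p_all_miss(p_miss: float, n_reviewers: int) -> float:
    return p_miss ** n_reviewers

# One reviewer missing 30% of bugs -> two independent reviewers
# both miss only ~9% of them.
print(p_all_miss(0.30, 1))
print(p_all_miss(0.30, 2))
```

The catch is the independence assumption: two reviewers sharing the same contaminated context (or the same blind-spot-inducing prompt) are correlated, and the product rule stops applying.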