When you realise you've just joined MBBS and companies start announcing AI doctors that will take your job in 5 years 🤡
I'm not so sure about this. Not all, but a lot of SaaS moats really do rely on an implementation complexity that's rapidly fading.

Take SAML, for example: a classic example of a feature that is such a nightmare to implement that most SaaS startups delay it as long as possible and then hire specialists.

If that implementation time drops from months to days, it's yet another little piece of moat that just got eroded away.
I'm building a node-based tool that turns any SVG into an animated SVG using Gemini 3.1 Pro. It preserves the original aesthetics, and the results are insane.
Researchers built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.

And it hit 98.7% accuracy on a financial benchmark (SOTA).

Here's the core problem with RAG that this new approach solves:

[... full content about PageIndex ...]
Most people still prompt like it's 2022. Here's how to go from basic to expert-level: [ bookmark 🔖 this post for later ]

Level 1: Surface Prompts
- Zero-shot prompt: Just ask without examples and hope for the best.
- One-shot prompt: Provide one example to get slightly better results.
- Few-shot prompt: Share multiple examples to guide the answer.
- Easy tasks: Summarize, rewrite, brainstorm, explain like I'm 5.
This is where most stop. It's quick, but basic. You get generic answers, not high-quality output.

Level 2: Real Work Zone
- Role: Tell the AI who to be and how to sound.
- Tone and style: Define the voice, clarity, or formality.
- Plan → Act → Summarize: Direct the process.
- Define the task: Be specific about what you want.
- Add constraints: Set clear limits and boundaries.
- Provide context: Share background, audience & restrictions.
- Temporary chats: Use ChatGPT without its memory of you.
- Define output format: Bullets, tables, or any structure.
- Tool policy: Turn web browsing on or off.
- Share examples of quality outputs: Set the standard.
- Memory management: Keep projects organized.
This is where quality improves. You get targeted, practical, and useful results.

Level 3: Where the Magic Happens
- Pick the right model: Select the best tool for the job.
- Thinking vs Fast: Decide if you want thorough or quick answers.
- Reasoning instructions: Tell the AI to think step-by-step.
- Chain-of-Thought: Guide logic instead of just giving commands.
- Iteration loop: Review, revise, and improve responses.
- Problem-solving: Focus on the 20% that gets 80% of results.
- Combine role, context, examples & revision for expert-level output.

The deeper you go, the better your results get.

📌 Get Advanced ChatGPT Guide (free): https://t.co/kOBWfKrBaX
👉 Follow me @AndrewBolis for more and 🔄 Repost this to help others use AI
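A minimal sketch of what climbing from Level 1 to Level 2 looks like in code: a few-shot prompt wrapped with a role, a constraint, and an output format. The classifier task and examples here are invented for illustration, not from the post.

```python
# Hedged sketch: few-shot examples (Level 1) plus role, constraint, and
# output format (Level 2). The sentiment task and examples are made up.
EXAMPLES = [
    ("The meeting overran by an hour.", "negative"),
    ("Shipped the feature a day early!", "positive"),
]

def build_prompt(text: str) -> str:
    shots = "\n\n".join(f"Text: {t}\nLabel: {label}" for t, label in EXAMPLES)
    return (
        "You are a terse sentiment classifier.\n"                  # role
        "Answer with exactly one word: positive or negative.\n\n"  # constraint + format
        f"{shots}\n\nText: {text}\nLabel:"                         # few-shot + query
    )

print(build_prompt("Great demo today."))
```

The same string would be sent as the user message to whichever model you use; the point is that each Level adds one more explicit instruction instead of hoping the model guesses.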
Visualizes data through AI agents https://github.com/microsoft/data-formulator
Our current so-called "AI" models do not think. Not one bit. Here's a new paper by Apple that proves it.
RAG & Fine-tuning in LLMs, explained visually:
🚨 @AnthropicAI just released their 2026 Agentic Coding Trends. Verdict → everyone has become a developer. We've moved from single assistants to autonomous agent swarms. They now form teams, work for days on full systems, and let non-techies ship full apps 💥 18-page report in 🧵↓
2 AI agents working on the same feature
[Download 496-page PDF eBook] Applied Causal #Inference Powered by #MachineLearning and #AI: https://arxiv.org/abs/2403.02467 ————— #ML #DataScience #Algorithms #Statistics #DataScientist #PredictiveAnalytics
Unicorns have always been used to measure sparks of AGI. (This was written by GPT-2 in February, 2019)
No, that study itself is flawed and doesn't prove anything. https://arxiv.org/html/2506.09250v1 And since late last year, semi-autonomous discoveries by AI have become possible, such as a proof of an open Erdős problem (https://mathstodon.xyz/@tao/115855840223258103) and the derivation of an exact solution for a gluon scattering amplitude formula (https://arxiv.org/abs/2602.12176). Whether or not it "thinks" is beside the point…
- Claude for coding.
- Supabase for backend.
- Vercel for deploying.
- Namecheap for domain.
- Stripe for payments.
- GitHub for version control.
- Resend for emails.
- Clerk for auth.
- Cloudflare for DNS.
- PostHog for analytics.
- Sentry for error tracking.
- Upstash for Redis.
- Pinecone for vector DB.

You can literally ship a startup from your bedroom now. It's not that deep bro.
AI agent for code reviews using SOLID principles https://github.com/sanyuan0704/code-review-expert
Way more substantial comments on LinkedIn and Facebook than on X for paper announcements. It's been obvious for quite a while that X is lost for science.
If you have a large pool of people, their "jaggedness" cancels out because they have diverse skills and talents. 1,000 agents of the same model are not the same thing: they share the same weak spots and are potentially more vulnerable to groupthink-like problems than humans are.
Jaggedness remains a key feature of LLMs & I have yet to see a clearly articulated argument about why it will disappear. A jagged general intelligence (not quite an oxymoron, as humans are too) still creates lots of bottlenecks that require people & slow many kinds of take-off.
Grok 4.20’s multi-agent system now powers Grokipedia in real time. Grok writes, updates, and perfects entries instantly, while Grokipedia feeds Grok a constantly refreshed, truth-focused knowledge base. No corporate spin. No edit wars. No slow human gatekeepers. Source: @grok, @grokipedia
"But humans will stop using all this software, it will be AI agents instead!" -- Great, then these services will see 10x more usage.
The maximalist form of my thesis is basically this: SaaS is not about code, it is about solving a problem customers have and selling them the solution. Services + sales. If the cost of code goes to *zero*, SaaS will *not* go away. It will *benefit*, since code is a cost center.
Software engineering accounts for nearly 50% of all AI agent tool calls. Healthcare, legal, finance, and a dozen other verticals are barely touched, each under 5%. That's a hundred AI unicorns waiting to be built. https://garryslist.org/posts/half-the-ai-agent-market-is-one-category-the-rest-is-wide-open
🇺🇸 Elon sat down with Tucker to talk about the future of AI. They covered everything from superintelligence to why the tech needs guardrails as it scales fast. “[My perception is that we] need to take AI safety seriously enough. We need transparency, we need people to know what’s going on.” Source: @elonmusk, @tuckercarlson, @TheCaptainEli, Fox News
No changes recently. Opus 4.6 and Sonnet 4.6 are more intelligent and use more tokens than previous models. If you want less thinking and lower token usage, run /model and set effort to low or medium.
The best way to use AI is as an interface to information that lets you deepen and improve your own knowledge and mental models. The worst way to use AI is as a crutch to outsource and forsake your own cognition.
Different LLMs. Different Personalities. Different Purpose.

> GPT-5.2 (OpenAI)
• Boardroom consultant energy
• Polished, safe, authoritative by design

> Claude 4.6 (Anthropic)
• Reflective ethics professor vibe
• Nuanced, cautious, highly articulate

> Gemini 3.1 Pro (Google)
• Hyperactive polymath
• Jumps across text, video, code, voice seamlessly

> Llama 4 (Meta)
• Gritty tinkerer energy
• Community-driven, hackable, customizable

> DeepSeek V3.2 / R1
• Quiet math Olympiad
• Minimal words, maximum reasoning

> Qwen 3.5 (Alibaba)
• Global overachiever
• Culturally fluent, pragmatic, business-first

> Grok 4 (xAI)
• Edgy back-row commentator
• Meme-aware, spicy, culturally plugged-in

> Mistral Magistral (Mistral AI)
• Sleek minimalist
• Fast, sharp, zero-bloat responses

> Command R+ (Cohere)
• Corporate archivist
• Structured, factual, citation-driven

> Kimi K2.5 (Moonshot AI)
• Unblinking memory champion
• Detail-obsessed, long-document master
This account keeps posting older papers as new releases with AI generated commentary, but this paper is from June 2025, where it sparked some interesting debate but basically turned out to not be that relevant in the last year as models improved. https://t.co/es7yFdrhE0
🚨 This week's top AI/ML research papers:
- GLM-5
- Experiential Reinforcement Learning
- Image Generation with a Sphere Encoder
- World Action Models are Zero-shot Policies
- Unified Latents
- Fast KV Compaction via Attention Matching
- Adam Improves Muon
- LUCID
- The Molecular Structure of Thought
- Arcee Trinity Large Technical Report

read this in thread mode for the best experience
A good Claw can already do most lightweight phone “doing” work, and those agents are unoptimized messes right now. Makes me wonder what Apple is giving up by bowing out of the LLM building world. I suspect a lot more than they thought.
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs A comprehensive 114-page paper (2024) exploring fine-tuning techniques from foundational methods to advanced strategies, including extensions to multimodal models and domain-specific applications in medicine and finance. Paper: https://t.co/eK8oH6sOLX Make it your weekend read.
This 𝗖𝗟𝗔𝗨𝗗𝗘.𝗺𝗱 file will make you a 10x engineer 👇

Boris Cherny (creator of Claude Code at Anthropic) shared on X the internal best practices and workflows he and his team actually use with Claude Code daily. Someone turned those threads into a structured 𝗖𝗟𝗔𝗨𝗗𝗘.𝗺𝗱 you can drop into any project.

It includes:
• Workflow orchestration
• Subagent strategy
• Self-improvement loop
• Verification before done
• Autonomous bug fixing
• Core principles

This is a compounding system. Every correction you make gets captured as a rule. Over time, Claude's mistake rate drops because it learns from your feedback. If you build with AI daily, this will save you a lot of time.
🚨 BREAKING: Someone leaked the full system prompts of every major AI tool in one GitHub repo. You can now see exactly how they built: → Cursor, Devin AI, Windsurf, Claude Code, Replit → v0, Lovable, Manus, Warp, Perplexity, Notion AI → 30,000+ lines of hidden instructions exposed → The exact rules, tools, and personas behind each product 100% open source
Google isn't trying to win the AI race. They're trying to own the entire AI agent ecosystem.

While everyone argues ChatGPT vs Claude, Google quietly built:

Models → Gemini Pro, Flash, Deep Think, Gemma
Design → Stitch, Whisk, Imagen
Research → NotebookLM, AI Mode
Video → Veo, Flow, Google Vids
Coding → Antigravity IDE, Gemini CLI, Jules
Agents → A2A, ADK, FileSearch API

The scary part? All of these tools talk to each other. That means:
- 10x faster prototypes
- End-to-end AI workflows
- Production-ready agents on GCP

The next AI war won't be model vs model. It'll be ecosystem vs ecosystem.

Save. Share. Build.
Excellent article! "Anatomy of a High-Performance Agent: PEFT"

Key Insights:

The Problem:
1. Large context windows are inefficient:
• Quadratic computational cost
• "Needle in a haystack" accuracy degradation
• Expensive and slow
2. Full fine-tuning is unsustainable:
• 70B model = ~140GB
• 10 specialized agents = 1.4TB of storage
• Days of GPU training per agent

The Solution: PEFT (Parameter-Efficient Fine-Tuning)

LoRA analogy: instead of recoloring the entire coloring book, put a transparent overlay with a gradient on top.

Technical:
• Freeze the base model (W₀)
• Train tiny adapter matrices (A, B): ΔW ≈ BA
• Adapter = 10-100MB vs the 140GB full model
• Save the base model once, swap adapters per task

Benefits for Agent Fleets:
• 1 base model + tiny adapters instead of a full model per task
• Shorter prompts (knowledge baked into the weights)
• Faster inference (fewer tokens to process)
• Affordable specialization

───

How This Relates to ULTRON:

Memory vs Context Window:
• Article: "Large context is inefficient, bake knowledge into weights"
• ULTRON: "Persistent memory is efficient, don't re-explain everything"

Both solve the same problem:
• PEFT: specialist models with domain knowledge embedded
• ULTRON: memory-driven agents that learn and remember

───

Content Angle:

Twitter/LinkedIn: "Google Cloud just published the definitive guide on building specialized AI agents without breaking the bank. The key insight: large context windows are a performance trap. Quadratic costs, accuracy degradation, expensive inference. The solution: PEFT (LoRA), which bakes specialization into tiny 100MB adapters instead of retraining 140GB models. Same principle we've been pushing: don't stuff everything into context. Build memory that persists. PEFT for weights. Persistent memory for experiences. Both beat the context window tax. 🧠"

https://t.co/AakR3DuKHm
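The ΔW ≈ BA idea is small enough to sketch in a few lines of NumPy. The dimensions and rank below are illustrative, not from the article; real models apply this per attention/MLP weight matrix.

```python
import numpy as np

# Hedged LoRA sketch: freeze W0, train only the low-rank pair (A, B),
# so the effective weight is W0 + B @ A. Sizes here are illustrative.
d, k, r = 512, 512, 8
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d, k))        # frozen base weight
A = rng.standard_normal((r, k)) * 0.01  # trainable, small init
B = np.zeros((d, r))                    # trainable, zero init: delta starts at 0

def forward(x):
    # never materialize W0 + B @ A; apply the adapter as two small matmuls
    return W0 @ x + B @ (A @ x)

# the adapter is a tiny fraction of the full matrix
ratio = (A.size + B.size) / W0.size
print(f"adapter params: {ratio:.1%} of full")  # 3.1% at rank 8
```

Swapping agents then means swapping (A, B) pairs while W0 stays loaded once, which is the "save base model once, swap adapters per task" point above.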
Still using ChatGPT, Gemini, or Grok? There's a non-mainstream LLM that is SERIOUSLY UNDERRATED. It's called Qwen. It's been around for a while, but somehow it never got as popular as the others. And this LLM is REALLY GOOD! AND IT'S FREE! Here's a demo of its features, from generating images to BUILDING GAMES! Watch it here
Not all AI agents are built the same. So what sets them apart? Here's a breakdown of 10 core types of AI agents you'll come across in real-world systems, from simple reactive agents to complex multi-agent systems.

1. Task-Specific AI Agent
Built for one focused task like summarizing or translating. It follows a fixed process with no learning or adaptation.

2. Reactive Agent
Responds to immediate input without using memory or history. Think of it like a reflex - it reacts, not plans.

3. Model-Based Agent
Builds an internal map of its environment. Simulates outcomes before acting to make smarter, context-aware decisions.

4. Goal-Based Agent
Starts with a goal and works backward. It plans steps, simulates paths, and selects the route that achieves the goal.

5. Utility-Based Agent
Chooses actions based on how beneficial they are. It weighs all options and picks the one with the highest value.

6. Learning Agent
Improves over time by learning from past actions. Adjusts its strategy using feedback and stores new knowledge.

7. Planning Agent
Focuses on long-term strategy. It defines a goal, maps out steps, and adjusts based on progress, not just reaction.

8. Reflex Agent with Memory
Uses preset rules but with added memory of past inputs. Helps respond better when situations repeat or evolve.

9. Multi-Agent System Agent
Works with or against other agents. They share environments, negotiate roles, and coordinate to reach a bigger goal.

10. Rational Agent
Always selects the most logical option. It analyzes the full picture, predicts outcomes, and chooses the smartest path.

Save this if you're exploring Agentic AI or designing intelligent decision-making systems.
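A toy sketch of the difference between a reactive agent (type 2) and a reflex agent with memory (type 8). The percepts and rules are invented for illustration only.

```python
# Hedged sketch: purely reactive agent vs. reflex agent with memory.
# Percepts ("obstacle", "clear") and rules are made up for this example.
RULES = {"obstacle": "turn", "clear": "forward"}

def reactive_agent(percept: str) -> str:
    # type 2: responds only to the current input, no history at all
    return RULES.get(percept, "wait")

class ReflexAgentWithMemory:
    # type 8: same rules, plus memory of past percepts
    def __init__(self) -> None:
        self.history: list[str] = []

    def act(self, percept: str) -> str:
        self.history.append(percept)
        if self.history[-2:] == ["obstacle", "obstacle"]:
            return "reverse"  # turning didn't help last step; escalate
        return reactive_agent(percept)

agent = ReflexAgentWithMemory()
print([agent.act(p) for p in ["clear", "obstacle", "obstacle"]])
# → ['forward', 'turn', 'reverse']
```

The memory-equipped version reacts differently to the *second* obstacle in a row, which the stateless agent cannot do.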
Billions of dollars going to training, thousands of dollars going to independent benchmarking.
Cool! I only had a quick skim earlier today, but really enjoyed a number of the ideas, even ones unrelated to the claw part, especially around the skills system. In deep learning there were a number of meta-learning approaches (e.g. the MAML paper in 2017) where the goal is to optimize the model such that it fine-tunes to any new task in very few steps. Like, the most potent model. I always wondered what the equivalent of that is in traditional software: the most easily forkable repo. Was reminded of that.
yesterday we chatted with @martin_casado and @sarahdingwang on the pod, and he happened to do basic math™ on the logic of ASICs. today @taalas_inc launched their HC1 ASIC, which can run inference at 17k tok/s. Sure, today it's a shitty 3.1 8B, which is a 1.5-year gap. But read the details on the HC2 this winter, and do the math: this timeline will converge to 0 in the next 2 years. Build accordingly.
Fascinating Google paper: just repeating your prompt 2 times can seriously boost LLM performance, sometimes pushing accuracy from 21% to 97% on certain search tasks.

An LLM reads your prompt left to right, so early words get processed before the model has seen the later words that might change what they mean. If you paste the same prompt again, the model reaches the 2nd copy already knowing the full prompt from the 1st copy, so it can interpret the 2nd copy with the full context. That means the model gets a cleaner "what am I supposed to do" picture right before it answers, instead of guessing too early and sticking with a bad setup.

This helps most when the task needs details that appear late, like when answer choices show up before the actual question, because the 2nd pass sees both together in the right order.

In the Google tests, this simple trick took one hard search-style task from 21.33% correct to 97.33% correct for a model setting with no step-by-step reasoning. Across 7 models and 7 benchmarks, repeating the prompt beat the normal prompt in 47 out of 70 cases, and it never did worse in a statistically meaningful way.

The big deal is that it is almost free to try, it often boosts accuracy a lot, and it shows many LLM mistakes are "reading order" problems rather than a pure lack of knowledge.

----
Paper Link: arxiv.org/abs/2512.14982
Paper Title: "Prompt Repetition Improves Non-Reasoning LLMs"
Introducing: built-in git worktree support for Claude Code. Now agents can run in parallel without interfering with one another. Each agent gets its own worktree and can work independently. The Claude Code Desktop app has had built-in support for worktrees for a while, and now we're bringing it to the CLI too. Learn more about worktrees:
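The underlying git feature works standalone; a minimal sketch of what the support automates, in plain git (the repo path and branch names are placeholders):

```shell
set -e
# Hedged sketch: give a second agent its own checkout via a git worktree.
# Paths and branch names are placeholders.
tmp=$(mktemp -d) && cd "$tmp"
git init -q main && cd main
git -c user.email=a@example.com -c user.name=a commit -q --allow-empty -m init

# a separate working directory on its own branch, sharing the same repo
git worktree add -q ../agent-b -b agent-b

git worktree list   # shows main and ../agent-b side by side
```

Each worktree has its own files and branch but shares the object store, which is why parallel agents can commit independently without stepping on each other's uncommitted changes.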
If you're looking to buy a Mac Mini, wait 4-6 months, a lot of used Mac Minis in mint condition are about to hit the market
First there was chat, then there was code, now there is claw. Ez
Claude Code on desktop can now preview your running apps, review your code, and handle CI failures and PRs in the background. Here’s what's new:
Introducing Claude Code Security, now in limited research preview. It scans codebases for vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix issues that traditional tools often miss. Learn more: https://www.anthropic.com/news/claude-code-security
1. go to Chrome DevTools
2. in the Memory tab, take a snapshot & download it
3. drop it into @cursor_ai

@cursor_ai will write Python scripts to analyze the snapshot and point out what's making your website feel sluggish
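A sketch of the kind of script this workflow tends to produce, assuming the V8 `.heapsnapshot` layout (a flat `nodes` array described by `snapshot.meta.node_fields`). The tiny inline snapshot below is synthetic; a real one would be loaded with `json.load()`.

```python
from collections import Counter

# Hedged sketch: sum self_size bytes per node type in a Chrome heap
# snapshot. The inline SNAPSHOT is a fake, minimal stand-in for a real
# .heapsnapshot file (which follows the same field layout).
SNAPSHOT = {
    "snapshot": {"meta": {
        "node_fields": ["type", "name", "id", "self_size", "edge_count"],
        "node_types": [["object", "string", "closure"]],
    }},
    "nodes": [
        0, 0, 1, 4096, 0,   # object "Window", 4 KiB
        1, 1, 2, 128,  0,   # string "cached html", 128 B
        0, 2, 3, 2048, 0,   # object "Detached <div>", 2 KiB
    ],
    "strings": ["Window", "cached html", "Detached <div>"],
}

def size_by_type(snap):
    meta = snap["snapshot"]["meta"]
    fields, types = meta["node_fields"], meta["node_types"][0]
    t_i, s_i, stride = fields.index("type"), fields.index("self_size"), len(fields)
    sizes = Counter()
    for i in range(0, len(snap["nodes"]), stride):
        node = snap["nodes"][i:i + stride]
        sizes[types[node[t_i]]] += node[s_i]
    return sizes

print(size_by_type(SNAPSHOT).most_common())  # objects dominate this fake heap
```

Grouping by type (or by retainer, in a fuller script) is usually how the "what's making it sluggish" answer falls out: one category of detached nodes or cached strings dominates the total.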
We just open-sourced the system we use to manage 30 parallel AI coding agents per person. 40K lines of TypeScript. 3,288 tests. 17 plugins. Built in 8 days — by the agents it orchestrates. Yes, we used Agent Orchestrator to build Agent Orchestrator. Some numbers: → 500+ agent-hours in 24 human-hours (20x leverage) → 86 of 102 PRs created by AI (84%) → After Day 4, I stopped writing code entirely Spawn agents. Step away. Ship faster.
Here's my AI coding workflow and all the skills I'm using:

Idea -> /write-a-prd -> PRD
PRD -> /prd-to-issues -> Kanban Board
Kanban -> ralph.sh -> Ralph Loop
Ralph Loop -> Manual QA

Links below to skills
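The post doesn't show `ralph.sh` itself, but the Ralph-loop pattern is just re-running an agent against the same standing prompt. A toy sketch, with `echo` standing in for the real agent CLI and a bounded loop instead of `while :`:

```shell
set -e
# Hedged sketch of a ralph.sh-style loop: hand the agent the same standing
# prompt over and over until the board is empty. `echo` is a stand-in for
# the real agent call (e.g. claude -p "$(cat PROMPT.md)"); bounded to 3
# runs here so the example terminates.
tmp=$(mktemp -d)
printf 'Pick the top open Kanban issue and finish it.\n' > "$tmp/PROMPT.md"
for run in 1 2 3; do
  echo "run $run: $(cat "$tmp/PROMPT.md")"   # swap echo for the agent CLI
done
```

The real version typically loops forever and relies on the Kanban board (and Manual QA) as the stopping condition rather than a fixed run count.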
An AI coding bot took down Amazon Web Services https://arstechnica.com/ai/2026/02/an-ai-coding-bot-took-down-amazon-web-services/?utm_campaign=dhtwitter&utm_content=%3Cmedia_url%3E&utm_medium=social&utm_source=twitter
Claude FULL COURSE 1 HOUR (Build & Automate Anything)