AI stream

AI Posts

A readable stream of AI posts. Open one post to focus on the original content.

This week
CasJam Unknown date Ai llm

How I write & build my newsletter with a Claude Code Skill (and @kit). Posted Mar 27, 2026 at 9:55PM

Likes: 0 Reposts: 0 Views: 0
Score 10
ianneo_ai Unknown date Ai llm

@ruby_runner retweeted Found another gem! Getting Claude Code onto the web takes just one Skill: web-access, open source on GitHub. Once it's installed, Claude can browse pages and drive the browser directly. What problem does it solve? The built-in web access is weak: WebFetch often brings back garbled text, WebSearch results still need manual filtering, and pages that require a login are a dead end. This Skill's approach is blunt: it takes over your local, already-logged-in Chrome. No cookie setup, no tokens; any page you can open in Chrome, it can operate on. And it picks its own tool: curl for simple pages, the Chrome DevTools Protocol for pages that need JS rendering, and a search engine when it needs to look things up. Posted Mar 28, 2026 at 7:55AM
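For a feel of the "take over your logged-in Chrome" idea, here is a minimal sketch of driving an existing Chrome session over the Chrome DevTools Protocol. It assumes Chrome was launched with --remote-debugging-port=9222; this is not the web-access Skill's actual code, just the underlying mechanism.

```python
import json
import requests
import websocket  # pip install websocket-client

# List the tabs of the already-running, already-logged-in Chrome.
tabs = requests.get("http://localhost:9222/json").json()
page = next(t for t in tabs if t["type"] == "page")

# Attach to the first tab and navigate it via a raw CDP command.
ws = websocket.create_connection(page["webSocketDebuggerUrl"])
ws.send(json.dumps({"id": 1, "method": "Page.navigate",
                    "params": {"url": "https://example.com"}}))
print(ws.recv())  # CDP acknowledgement, including the frameId
ws.close()
```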

Likes: 0 Reposts: 0 Views: 0
Score 10
faridmovsumov Unknown date Ai llm

@ruby_runner retweeted Shopify made an important change to the App Store's robots.txt file, allowing popular LLMs to access content on search pages. Previously, bots couldn't check the page because the rules didn't allow them to index these pages, so they relied on outdated articles online about "the best X Shopify apps" to get results. Now they will check the results against the main source of truth, see real rankings, and make better suggestions to users. Results: LLMs will make better app suggestions to merchants. Note: after this change, I tested with ChatGPT, and it verifies that the content is now accessible, but Claude is still getting some errors. I hope this will also be resolved soon. Farid @faridmovsumov This is surprising to me. Is Shopify really blocking the Claude bot, in the era of AI, from fetching content from the Shopify App Store search results page? Merchants are asking LLMs for app suggestions, and instead of using a trusted source like an app store, bots are relying on outdated articles. Posted Mar 25, 2026 at 6:39PM Posted Mar 28, 2026 at 6:40AM
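You can check this kind of robots.txt change yourself with Python's standard library. GPTBot and ClaudeBot are the real public crawler user-agents; the search URL is illustrative, and the output depends on Shopify's live rules at the time you run it.

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the App Store's live robots.txt, then test a search URL.
rp = RobotFileParser("https://apps.shopify.com/robots.txt")
rp.read()
for bot in ("GPTBot", "ClaudeBot"):
    allowed = rp.can_fetch(bot, "https://apps.shopify.com/search?q=crm")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```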

Likes: 0 Reposts: 0 Views: 0
Score 10
semrush Unknown date Ai llm

@ruby_runner retweeted Your buyers discovered you on TikTok. Validated you on Reddit. Got a second opinion from ChatGPT. And your attribution model has no idea any of that happened 👀 This isn't a future problem, it's already here. • Google holds 73% of discovery across 41 major surfaces (not the 90%+ most marketers plan around) • The other 27% is where opinions form, and decisions get made without you • 43% of consumers have already discovered a brand through AI. Here's what multi-platform discovery can look like 👇 Posted Mar 27, 2026 at 6:21PM

Likes: 0 Reposts: 0 Views: 0
Score 10
AYi_AInotes Unknown date Ai llm

@ruby_runner retweeted 2026 guide to getting online, part 5: how do you verify your IP is actually clean? Once connected, don't rush into Claude; run a check first to confirm the IP's identity really changed. Method: whoer.net or ipinfo.io https://t.co/wRYUVY9Son, and re-read those two fields. ASN type should now say residential or isp, no longer hosting / data center. The IP location should show a specific US or Singapore city and ISP name. One small detail: the first time I checked after switching, the ISP field showed the name of a real US home-broadband carrier, not some "XXX Cloud" or "XXX Hosting". That's when "residential IP" finally clicked for me: it isn't simulated, it really comes from a home broadband line at a physical address. If you see that result, you're set; open Claude and the features should be fully back: code interpreter, long outputs, web search. I call this before-and-after process the "IP face-wash": look in the mirror before washing to see how dirty you are, look again afterwards to confirm you're clean. Sounds silly, but it works. 阿绎 AYi @AYi_AInotes The recent wave of Claude account bans has hit everyone. To stop Chinese AI companies from distilling its models, Anthropic is mass-flagging proxy IPs, and ordinary users on regular commercial proxy services ("airports") are getting caught in the crossfire. I used to run my own VPS plus an airport subscription, and Claude kept getting dumbed down: using it to write code, it would be fine one day, and the next day replies got shorter, the code interpreter disappeared, and slightly complex questions got perfunctory answers. Same with GPT: responses slowed noticeably, and sometimes it wouldn't even draw images anymore. At first I thought the model was being quietly downgraded, and even suspected an official A/B test had put me in the degraded group. Eventually I figured it out: most likely it wasn't the model, it was my IP. This article covers three things: how to tell whether your IP has been flagged by the platforms; what actually separates residential IPs from datacenter IPs; and how to switch to a clean static residential IP in three minutes.

1. First, understand one thing: your IP has an identity. Most people's understanding of IPs stops at "as long as it connects." Mine did too. But to an AI platform's risk-control system, every IP address carries an identity label, and that label decides how the platform treats you: full features, or a quiet downgrade. How do you check? Open whoer.net or ipinfo.io and look at just two fields. First, ASN type: if it says hosting or data center, then in the platform's eyes you're not a normal user, you're a "proxy user." Second, IP location: if the country doesn't match the node you picked, the IP's ownership records are a mess, and the platform trusts you even less. Back when I was on a regular airport I checked once: the ASN type field said hosting, at a glance. No wonder Claude kept giving me crippled replies. I call this the "IP identity check." Do it before you fiddle with anything else. As long as the ASN type isn't residential, strict platforms like OpenAI, Claude, and X may downrank you. You paid the full Pro subscription, but you may be getting the free-tier experience.

2. Residential vs. datacenter IPs: what's actually different? Once you know your IP's identity, the next question follows naturally: what counts as a "clean" IP? It comes down to origin. Datacenter IPs are generated in bulk in server farms: cheap, decently fast, but thousands of people share the same ranges, and those ranges are already flagged in the platforms' risk databases. The moment you connect, the platform doesn't need any clever analysis; it just labels you a proxy user, then degrades, throttles, or in bad cases bans you. A residential IP is a different animal: it comes from a real home-broadband ISP with a real physical location. To the platform you look like a local resident on home WiFi, and the risk-check pass rate isn't in the same league. An analogy: a datacenter IP is a tenement address with thousands of people behind one door number; the courier takes one look and knows something is off. A residential IP is your own front door, a single household, and the courier has no reason to doubt you. The platform's risk control is that courier: it doesn't care what plan you bought or what your usage history looks like, it checks one thing only, where your IP comes from. So the core problem isn't that your AI got dumber; it's that your IP is introducing you as "proxy user, please downgrade."

3. Switching to a clean residential IP in three minutes. So, knowing where the problem is, how do you fix it? My current setup is EqualVPN with Clash Verge; here is the whole process. Step 1: open equaldcdn.com and register with an email; no phone number, no foreign credit card needed, which is friendly to users in China. Step 2: pick a plan, three tiers: Plus at $5/month, 50GB, shared, enough for daily use; Pro at $9/month, 100GB, shared with only one other person, cleaner; Max at $15/month, 150GB, fully dedicated, maximum purity. Pay directly with Alipay, billed monthly, no auto-renewal traps. Step 3: after paying, the dashboard generates a subscription link; copy it. Step 4: open Clash Verge → subscription management → paste the link → import → pick a node → connect. Done! iPhone users can import the same subscription link in Shadowrocket; Windows, Mac, Linux, iOS, Android, and HarmonyOS are all covered. From sign-up to connected took me about three or four minutes without opening the docs. I don't like tools that demand half a day of setup, and this one really is plug-and-play.

4. How to verify your IP actually got clean. Once connected, don't rush into Claude; verify first that the IP's identity changed. Same method as the "IP identity check" above: open whoer.net or ipinfo.io again and re-read the two fields. ASN type should show residential or isp, no longer hosting / data center. The IP location should show a specific US or Singapore city and ISP name. One small detail: the first time I checked after switching, the ISP field showed a real US home-broadband carrier's name, not some "XXX Cloud" or "XXX Hosting." That's the moment "residential IP" really made sense: it isn't simulated, it genuinely comes from a home broadband line at a physical address. Seeing that result means you're set; open Claude and the features should be complete: code interpreter, long outputs, web search. I call this before-and-after comparison the "IP face-wash": check the mirror before washing to see how dirty you are, check again after to confirm you're clean. Sounds silly, but it works.

5. A few usage tips I've worked out. If you only run Claude and ChatGPT, the Plus plan's quota is enough for daily use. For heavy use or several devices online at once, Pro is steadier: you share with only one other person, so the odds of the IP getting polluted are much lower. For scenarios where IP purity is critical (warming up X accounts, cross-border e-commerce, anything that must stay off the risk-control radar long term), go straight to the dedicated Max plan so nobody else's misbehavior drags down your IP reputation. Nodes are currently mostly US and Singapore; coverage of other regions is still being built out, so check before picking. Occasionally a node won't connect; switching to another node usually fixes it, not a big deal. One more thing to be clear about: a regular airport is perfectly fine for videos and browsing, no need to switch. But if you're a heavy AI user, or seriously building content on X and don't want risk control in your way, the IP layer is worth taking seriously. It isn't a nice-to-have; it's infrastructure.

Closing thoughts. Honestly, I fiddled with IPs for a long time. I used to think connecting was all that mattered, and only after getting downgraded several times did I realize connection quality and connectivity are two different things. What's scariest about AI tools isn't not knowing how to use them; it's using them normally while some "infrastructure-level" problem you don't even know about keeps degrading your experience, and you blame yourself. IP is one of those problems. The EqualVPN team reached out to me about the product; after using it for a while I wrote up this tutorial, hoping it helps people stuck on IP problems. Still exploring; I'll keep updating as I learn more. They're running a limited-time 20%-off promotion 👇 https://www.equaldcdn.com/?ref=Demo-AYi Questions in the comments; I'll reply to everything I see. Posted Mar 26, 2026 at 11:54AM Posted Mar 28, 2026 at 3:08PM
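The "IP identity check" described above boils down to one API call. A minimal sketch against ipinfo.io's public JSON endpoint; the keyword heuristic is illustrative, since the structured ASN-type field the post refers to requires an ipinfo token or the whoer.net UI.

```python
import requests

# Fetch identity data for the IP you are currently exiting from.
info = requests.get("https://ipinfo.io/json", timeout=10).json()
org = info.get("org", "")  # e.g. "AS15169 Google LLC"

# Rough heuristic: datacenter org names tend to carry these words.
flagged = any(w in org.lower() for w in ("hosting", "cloud", "datacenter"))
print(info.get("ip"), "|", org, "| looks like a datacenter:", flagged)
```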

Likes: 0 Reposts: 0 Views: 0
Score 10
om_patel5 Unknown date Ai llm

@Schappi retweeted this guy 3D printed and vibe coded a tiny Claude robot for his desk. it's called "Clawd Mochi." runs on an ESP32 chip with a tiny display that shows animated expressions. hosts its own WiFi hotspot. zero cloud and zero internet required. fully offline. live-switch between animated faces, a terminal emulator, and a drawing canvas from your browser. total cost: under $8. takes less than an hour to build. 3D print files AND the full build are both open source too. this is the greatest thing anyone has built with vibe coding Posted Mar 28, 2026 at 6:19AM

Likes: 0 Reposts: 0 Views: 0
Score 10
barelyknown Unknown date Ai llm

Finite and infinite races… Great post by @pnickdurham Nick Durham @pnickdurham Company building in the AI era will be a hidden battle between Apollonian and Dionysian ideals. This is simultaneously the most leveraged humans have been and the most physically exhausting knowledge work has felt. For the class of people really pushing the models daily, the cognitive load is creeping higher and higher. Every model interaction reflects back the clarity of your understanding, your technical competence, your domain knowledge, your ability to learn quickly, your physical endurance. If your language isn't clear, you're wasting time and burning load. But getting super clear is demanding.

@karpathy now functions as a de facto spokesman for tech's view of AI's frontier. I try not to have any original thoughts on AI unless Karpathy has validated it. Lisan al-Gaib. He posted in December that he has never felt this behind as a programmer. "Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession." This is of course Apollonian instinct laid bare. Optimize. Adapt. Master the new abstraction layer. Do it now or stand in the breadlines with the permanent underclass.

In The Birth of Tragedy, Nietzsche argued that Greek culture achieved greatness not through Apollo alone (the god of form, order, reason, individuation) but through its tension with Dionysus (the god of dissolution, ecstasy, and collective surrender). Nietzsche described the Dionysian as the moment when "all the stubborn, hostile barriers, which necessity, caprice, or 'shameless fashion' has set up between man and man, are broken down." AI has supercharged the Apollonian and starved the Dionysian. Every knowledge worker is on the treadmill. Out of rational tool choice, pursuit of euphoria, or existential anxiety. The feeling Karpathy describes, that a failure to claim the boost feels decidedly like a skill issue, is spreading to every profession from engineering to law, to finance, to design. Nolan Lawson wrote an essay this year called "We Mourn Our Craft" about programmers grieving the displacement of hand-written code. This grief obviously will not be unique to programmers. It will be the universal experience of watching your professional identity dissolve faster than you can rebuild it. Karpathy later posted in a reply about what programmers should do: "Experienced devs have a real advantage but only if they rapidly progress through their grief cycle and adapt, now and onwards. Categorically rejecting or ignoring the new layer would be a mistake." Grief that is not metabolized turns into burnout, cynicism, depression, or attrition.

@barelyknown's The Dionysus Program offers the only framework I have seen for solving this problem. A core principle from the book is the distinction between Run Time and Ritual Time. Run Time is Apollonian. Its focus is on pure work execution, decisions, accountability to plans, and elimination of variance. Ritual Time is Dionysian. Protected containers where a team can metabolize loss, correct near-fatal mistakes, dissolve outdated identities, resolve personal grievances, and rebuild meaning. Ritual Time sounds like a euphemism for an HR-approved DEI vulnerability exercise. Soft, emotional, indulgent, the opposite of industrialist American culture. In reality, Ritual Time is the harder discipline.

Every durable institution in human history grew through disruptive societal change with ritual. Every world religion runs on rituals and ritualized calendars. The liturgical year. Ramadan. Shabbat. These calendars are designed as mandatory interruptions of productive time. Before every major campaign, Napoleon and his officers dined together. Jensen Huang runs NVIDIA through ritualized whiteboard sessions. Even Elon, who would puke a little at this post so I'm not going to tag him, is running Ritual Time in his companies unknowingly. Does his famous "we dine in hell" Tesla production battle cry sound Apollonian to you? The split of Run/Ritual is up for debate. Elon might be 99/1. Maybe the norm is 95/5. But zero will rip your company apart. You cannot engineer ritual out of human organizations any more than you can engineer sleep out of human bodies. It is a basic need, and the organizations that deny it get brittle.

@zebriez posted recently: "Every startup I know is in an all out sprint right now. With full awareness that they're going marathon distance. I'm predicting the teams that take their water breaks together are the ones that are building the healthiest, happiest cultures." That's a sound prediction. Drink more water. Marc Andreessen does not want to introspect. I don't want to either, Marc. Who wants to slow down and performatively celebrate small wins when you are trying to solve the world's problems? If there's any experience more painful than shaking yourself out of the euphoric and manic phases of Run Time, I'd love to hear about it. That's why so few leaders do it well. Yet, the strength of trust within your human teams might be your biggest asset in an era of infinite leverage. I have two boys, 4 and 2. I spend most of my dad hours with them describing the function of large steel machinery in as dream-like of a state as I can. Every night we go to bed browsing photos of @ahmedshubber25 dozers and excavators. The Apollonian impulse is decidedly not a Silicon Valley phenomenon. It's in our bones. Nietzsche's whole point was that Apollo without Dionysus produces rigidity. And rigidity, under enough pressure, shatters. The race is long enough that the ones who never stop will not finish. The winners will take a breath from time to time, order more jugs of water for the office, and occasionally, Greek wine. twitter.com/i... Posted Mar 27, 2026 at 9:27PM Posted Mar 27, 2026 at 9:29PM

Likes: 0 Reposts: 0 Views: 0
Score 10
cooperx86 Unknown date Ai llm

Claude Code users: I've not seen anyone mention this settings.json feature yet, but: "feedbackSurveyRate": 0 Posted Mar 27, 2026 at 9:12PM
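For context, a sketch of where that key would sit, assuming the usual Claude Code settings file at ~/.claude/settings.json. The neighboring "model" key is just an example of an existing setting, and the effect of 0 (presumably disabling feedback surveys) is as the post implies, not something documented here.

```json
{
  "model": "claude-sonnet-4-5",
  "feedbackSurveyRate": 0
}
```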

Likes: 0 Reposts: 0 Views: 0
Score 10
jasonbosco Unknown date Ai llm

@jasonbosco retweeted I've had this laundry list of TODOs sitting in my backlog, that I've always wanted to get to, but couldn't convince myself of the value vs effort involved. Thanks to coding agents, that list is now finally shrinking in size. This actually feels... really good! Still no one-shotting here. I still review every line of code and in some cases, I've had to punt on the idea because the changes involved were too invasive... but at least the coding agent helped do the research for me and helped me clarify my thinking. And that was still worth it. Posted Mar 27, 2026 at 2:58PM

Likes: 0 Reposts: 0 Views: 0
Score 10
gauravmc Unknown date Ai llm

Jarvis becoming reality, more and more every week. Set up Claude Code channels on my Raspberry Pi 5, running headless as an always-on server, connected via Telegram. It's pointed at my Obsidian vault with full context on my stuff, notes, code. So on-the-go things like "add this to today's daily note" or "what was I working on last week?" work super well. The plugin's a bit buggy, but really useful already! Thariq @trq212 We just released Claude Code channels, which allows you to control your Claude Code session through select MCPs, starting with Telegram and Discord. Use this to message Claude Code directly from your phone. Posted Mar 19, 2026 at 10:36PM Posted Mar 27, 2026 at 8:46PM

Likes: 0 Reposts: 0 Views: 0
Score 10
iamdevloper Unknown date Ai llm

Black Mirror S8E1: In 2027, developers are allocated a daily Claude token allowance by the government. A junior dev burns through his entire month's supply trying to centre a div. His family starve. He is forced to write the code himself. He can't. Society collapses. Posted Mar 27, 2026 at 9:11AM

Likes: 0 Reposts: 0 Views: 0
Score 10
al3rez Unknown date Ai llm

Bro is cooking with claude code & gaming at the same time, wagmi Posted Mar 28, 2026 at 4:09PM

Likes: 0 Reposts: 0 Views: 0
Score 10
al3rez Unknown date Ai llm

Ok so I hate working with Figma and spending time on grunt work, so I decided to try something last week. I used Puppeteer to take screenshots of all the SaaS and client projects we've done at AstroMVP and then sent them to Nano Banana 2. Set up a Google API key for Nano Banana from Google AI Studio, ask Claude Code to set up Puppeteer or the Chrome MCP to take the screenshots, and then send each image one by one to Nano Banana with a prompt like the ones below.

Present this screenshot inside a browser window. Follow these rules EXACTLY: 1. The browser frame is a thin light gray (#e5e5e5) top bar, about 40px tall, with 3 small colored dots (red, yellow, green) on the left side only. No URL bar, no tabs, no other elements. 2. Below the top bar, the screenshot fills the ENTIRE remaining space edge to edge. No inner padding, no inner margins. 3. The browser window must fill at least 95% of the total image area. Only a tiny sliver of white background visible around the edges (about 2-3% on each side). 4. The browser window has very subtle rounded corners (about 8px radius) and a faint drop shadow. 5. Pure white (#ffffff) background behind the browser window. 6. The screenshot content must stay sharp, readable, and UNMODIFIED. 7. CRITICAL: The browser frame must be EXACTLY the same size, shape, and proportions in every image. Do not vary it. 8. Output 3:2 landscape ratio at maximum resolution. 9. No laptop body, no desk, no 3D perspective. Flat, front-facing only.

For mobile apps (phone frame): Present this mobile app screenshot inside a smartphone. Follow these rules EXACTLY: 1. Use a minimal, modern black phone frame with rounded corners and a small notch at the top. 2. The phone is perfectly centered vertically and horizontally on a pure white background. 3. The screenshot fills the entire phone screen with no gaps. 4. The phone should be large - taking up about 85-90% of the image height. 5. Pure white (#ffffff) background. 6. No shadow, no desk, no hands, no 3D angle. Flat, front-facing. 7. The screenshot content must stay sharp, readable, and UNMODIFIED. 8. Output 3:2 landscape ratio with the phone centered.

And that's it... you can sip your coffee/tea while this gets done, or do some other work instead of doing it manually. The result is 80% okay-ish; with future models it'll be hard to tell whether it was done manually or with AI. Good luck! Posted Mar 28, 2026 at 2:28PM
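A sketch of that screenshot-to-framed-image loop using the google-genai SDK. The model id, folder names, and the FRAME_PROMPT constant are assumptions to swap for your own; the post's full prompt text goes in FRAME_PROMPT.

```python
from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")
FRAME_PROMPT = "Present this screenshot inside a browser window. ..."  # full rules from the post

Path("framed").mkdir(exist_ok=True)
for shot in sorted(Path("screenshots").glob("*.png")):
    resp = client.models.generate_content(
        model="nano-banana-2",  # assumed id; use whatever AI Studio lists
        contents=[FRAME_PROMPT,
                  types.Part.from_bytes(data=shot.read_bytes(),
                                        mime_type="image/png")],
    )
    # Write the first image part the model returns, keeping the filename.
    for part in resp.candidates[0].content.parts:
        if part.inline_data:
            (Path("framed") / shot.name).write_bytes(part.inline_data.data)
            break
```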

Likes: 0 Reposts: 0 Views: 0
Score 10
kaka_ruto Unknown date Ai llm

@kaka_ruto retweeted I've been building agent CLIs a lot lately and I ran into a basic problem recently. The CLI would change, but the agentic instructions would not (important if you supply instructions that teach agents how to use your CLI). So an agent could be "correct" and still fail, because it was following steps from the wrong CLI version. I tried a few ideas that sounded easier but weren't ideal - eg one global "latest" SKILL.md, but this meant older CLI installs would break because they read newer instructions. Then I had an idea today - ship the CLI with a SKILL.md, and make that file version-linked to the CLI release. Here's the magic about that: when you install an older CLI, you get the matching older skill; when you upgrade the CLI, the managed skill updates with it! So now I keep the skill in the CLI repo, update it in the same PRs, and release them together. On install/update, I place it in a predictable path like .agents/skills/<skill-name>/SKILL.md. This completely eliminates drift, and for agent workflows that matters more than anything else, try it! Posted Mar 27, 2026 at 7:34PM
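A minimal sketch of that install/update step, assuming a Python-packaged CLI that bundles its SKILL.md next to its own code; the skill name is illustrative.

```python
import shutil
from pathlib import Path

# The skill ships inside the package, so it is pinned to this release:
# installing an older CLI yields the matching older SKILL.md.
BUNDLED_SKILL = Path(__file__).parent / "SKILL.md"
SKILL_NAME = "my-cli"  # illustrative

def install_managed_skill(root: Path = Path.cwd()) -> Path:
    """Copy the release-matched SKILL.md to the predictable agent path.

    Run this on every install/update: upgrading the CLI then swaps in
    the matching instructions, which is what eliminates drift."""
    dest = root / ".agents" / "skills" / SKILL_NAME / "SKILL.md"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(BUNDLED_SKILL, dest)
    return dest
```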

Likes: 0 Reposts: 0 Views: 0
Score 10
CasJam Unknown date Ai llm

My series on multi-agent teams with OpenClaw 🦞 has shipped for Builder Methods Pro members. Ready-to-build spec for your custom task scheduling app. Technical setup cheat sheets. 15 deep-dive videos. Hundreds of builders are in. Join us! buildermethods.com/pro/openclaw-s… Posted Mar 27, 2026 at 7:51PM

Likes: 0 Reposts: 0 Views: 0
Score 10
honglilai Unknown date Ai llm

The AI users community seems to be converging towards a concept: the best way to employ agents is to have a clear, objectively measurable goal, and then have the agent iterate towards that, with intermediate feedback driving the next round. This feedback could be a test suite fail/pass or it could be something like a written review of its output. Gianfranco @gianfrancopiana Last week Gumclaw made 206 commits to our repo while I slept. It fixed 13 flaky tests. I didn't write a single line of test code. Gumclaw is Gumroad's team AI assistant. It runs on OpenClaw on a Mac mini at our Brooklyn office. It answers questions, reviews PRs, and now, apparently, fixes flaky tests. Flaky tests are detective work with a 20-minute feedback loop. They pass locally, fail in CI, and after enough false alarms the team starts ignoring red builds. Nobody wants to fix them. So nobody does. I wanted to see if Gumclaw could do the grinding for me. Spoiler alert: it did. This freed me to do my job. My highest-value work is building product, not debugging why a tax test fails 1 in 20 runs. Gumclaw ran overnight while I shipped features. Here's how you can set up the same system for yourself.

The tool: I built openclaw-autoresearch, a plugin for OpenClaw. It's a port of pi-autoresearch (by Tobi Lutke) to the OpenClaw plugin system. The idea is simple. You give it a command that measures something. Gumclaw runs it, gets a baseline, makes a change, runs it again. If the numbers improve, it commits. If they don't, it logs what it learned and what to try next. Then it loops. All state lives in plain files. If the session crashes, you type /autoresearch resume and Gumclaw picks up where it left off.

What happened: I pointed Gumclaw at our test suite on March 18. One week later: 206 commits, 94 CI runs, 13 merged PRs. Race conditions, timing issues, browser session corruption, test cleanup hooks leaking between tests. The best find wasn't even a flaky test. It was a real bug: when remapping file IDs, A became B, then B became C, silently corrupting file references. The flake was just the symptom.

What the agent found: It was methodical. Fix a class of failures, trigger CI, log the results, move to the next class. When a fix didn't hold, it wrote down why and what to try next. Those notes fed an ideas backlog that kept it from repeating failed approaches. By experiment 20, it had built a map of which tests were flaky and why. Some fixes took multiple iterations. One tax input field went through four different approaches before Gumclaw found one that held across CI runs.

What I learned: Flaky tests are a perfect target for this. Green or red. Pass or fail. The agent ran 30+ CI cycles overnight without getting bored. The ideas backlog is the killer feature. Every failed experiment forces Gumclaw to write down what it tried. It stops repeating mistakes. It takes time: 206 commits for 13 PRs. Fixing a flaky test is easy. Proving it's fixed means running CI enough times to trust the flake is gone and not hiding. The loop handles that grind.

Try it now: openclaw plugin install @gianfrancopiana/openclaw-autoresearch, then /autoresearch setup. openclaw-autoresearch is open source, and we would love your contributions! Posted Mar 26, 2026 at 4:07PM Posted Mar 27, 2026 at 2:06PM
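A minimal sketch of the measure-baseline-change-commit loop described above, with state kept in plain files so a crashed session can resume. The measure command and paths are assumptions, and the agent supplies the make_change step; this is not the plugin's actual code.

```python
import json
import subprocess
from pathlib import Path

STATE = Path(".autoresearch/state.json")  # plain-file state, resumable

def measure() -> int:
    """Run the measurement command; here, count failing tests."""
    out = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return out.stdout.count("FAILED")

def step(make_change) -> None:
    """One iteration: change, re-measure, commit or revert and log an idea."""
    state = (json.loads(STATE.read_text()) if STATE.exists()
             else {"baseline": measure(), "ideas": []})
    make_change()  # the agent edits code here
    score = measure()
    if score < state["baseline"]:  # fewer failures counts as improvement
        subprocess.run(["git", "commit", "-am",
                        f"autoresearch: {state['baseline']} -> {score} failures"])
        state["baseline"] = score
    else:
        subprocess.run(["git", "checkout", "--", "."])  # discard the attempt
        state["ideas"].append(f"scored {score}, no better; try another angle")
    STATE.parent.mkdir(exist_ok=True)
    STATE.write_text(json.dumps(state, indent=2))
```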

Likes: 0 Reposts: 0 Views: 0
Score 10
rwrrll Unknown date Ai llm

As an introvert who's recently installed a maths genius in my terminal, I'm realising how much being good at questions beats being good at answers Posted Mar 27, 2026 at 10:51PM

Likes: 0 Reposts: 0 Views: 0
Score 10
eoghan Unknown date Ai llm

@manuel_frigerio retweeted Last week we quietly shipped the most significant new technology in the customer service agent category since we started it three years ago. It’s a brand new model for Fin trained by our AI Group called Apex, and it’s objectively the highest performing, fastest, and cheapest model for customer service. It beats the very best models in the industry including GPT-5.4 and Opus 4.5. In this post, I’ll share the news of this launch, the implications it will have for our category, but most importantly, I think, the implications this has for the frontier labs landscape.

The news: Fin was already the highest performing and most sophisticated agent in the customer service space, consistently beating our impressive competitors like Decagon, Sierra and more at an average win rate in the 70s. It operates at tremendous scale, now resolving almost 2M customer issues per week, a number that’s growing at an exponential clip. In its short life it’s grown to nearly $100M in recurring revenue. As of last week, ~100% of all (English language, chat and email) customer conversations are now running on Apex. Since day 1, the Fin engine has comprised a system of models, and last year we started replacing the off-the-shelf models with our own, custom trained on our proprietary data. But the core answering model was always a frontier labs offering—initially versions of GPT and recently Sonnet 4.0. But now that core answering model is Apex 1.0. This model resolves customer issues at a materially higher rate than any other model available. One of our largest customers in the gaming space saw their resolution rate improve overnight from 68% to 75% (i.e. a reduction in unresolved conversations of 22%). We’ve never seen a jump this large from a single improvement since we started Fin. But importantly it’s also dramatically faster, has fewer hallucinations, and is far cheaper than all other available models—all factors that weigh significantly in the consideration of companies deploying these agents to their service operations. This is an extraordinarily difficult thing to achieve. And we owe this breakthrough to the foundational research coming out of our 60-person AI group run by Fergal Reid. But even for elite teams like his, this cannot be replicated without the domain specific proprietary evals that comprise our billions of human and agent customer service interaction data points created by our Fin resolution engine, which had already been hand tuned to be the most effective in the category. Training with this system makes our setup a flywheel whereby we can continue to train new models that improve at the edge of our system’s abilities. Put another way, I expect that the results we’re enjoying with Apex 1.0 are just the tip of the iceberg.

What this means for the customer service agent category: Service is arguably one of only two or three categories where generative AI has thus far had a material commercial, economic, real world impact. The other being coding, and perhaps the other being the legal industry. The TAM in each of these spaces is insanely large ($250B-$1T?) and as a result they are hotly contested by multiple companies, which have been aggressively capitalized. We believe that the winners in such spaces must and will become full stack AI companies. And we’ve already seen this just last week with Cursor making the first such move, with Fin being the second. As features become ~free to build, the technology factors that will differentiate the players will be the AI under the hood, and if you’re using the same general purpose off-the-shelf model as everyone else, you have no durable differentiation. This means that our competitors will indeed need to eventually release their own models. But we see them just starting now to hire for the talent required to do this and so we think we likely have at least a year head start on the space. Interestingly, 2-3 years ago, this is not how I imagined AI applications would play out. I thought that the points of differentiation would be all of the things we built around the third party models. The AI game will humble you and no doubt make at least some of my predictions in this post eventually look dumb too.

What this means for the AI industry at large: In a podcast interview last week, Andrej Karpathy said: "I do think we should expect more speciation in the intelligences. The animal kingdom is extremely [diverse] in the brains that exist. And there’s lots of different niches of nature… And I think we should be able to see more speciation. And you don’t need this oracle that knows everything. You kind of speciate it. And then you put it on a specific task. And we should be seeing some of that because you should be able to have much smaller models that still have the cognitive core." The frontier labs still have the very best models, but the open-weight models are not that far behind. So it’s not hard to see pre-training as a commodity of sorts. Where we think the frontier will move next is to post-training. And Karpathy’s prediction is exactly what we’re seeing with Apex (and Cursor’s Composer 2) and what we’re going to see significantly going forward. As such, the labs are in an interesting position where on one hand the horizontal, general purpose models are actually over-serving the market for specific use cases. E.g. their models are more generally intelligent than is needed for customer service. And on the other hand, the open-weight models are more than good enough where high quality domain specific post-training can make the resulting models superior at the special purpose jobs, and in the ways that matter to that particular job. E.g. in service, the soft factors really matter, like judgement, pleasantness, attentiveness (as well as the hard factors mentioned prior, like the ability to effectively resolve problems, quickly and cheaply). Personally, I’m still very bullish on the labs. And we remain very heavy customers of Anthropic, whether as part of the broader system of models used for the Fin Engine, or with the depth of our usage of Claude Code in our engineering org. Yet classic disruption (a la the late, great Clay Christensen) is now at their door. The only way out is to disrupt themselves by building cheaper specialized models too. And the only way to do that is to acquire the evals (or the companies with the evals) needed for that specific task. Which means there will be some interesting data partnerships, or M&A consolidation, or you’re going to see some hyper specific model providers who go it alone and compete with the labs head to head. Likely all of the above. In the meantime, we’re happy to be the only vendor in our space with a custom model that’s also objectively superior to everything else out there. And we’re excited to deploy it far and wide for the benefit of end customers everywhere. Our next announcement that’s coming very soon will help us do exactly that.
Posted Mar 26, 2026 at 3:59PM

Likes: 0 Reposts: 0 Views: 0
Score 10
mem0ai Unknown date Ai llm

@alazycoder2 retweeted The #1 repo on GitHub right now is a superagent harness, DeerFlow by ByteDance. It stores memory in JSON. Here’s how it works. Most frameworks simulate memory by replaying chat history, stuffing everything into the context window and hoping the model picks what matters. DeerFlow does something different. It doesn’t store conversations. It extracts facts about the user, scores them by confidence, and injects what fits within a 2,000-token budget into each prompt, async, without bloating the context. In this article, I’ll break down how it works based on reverse-engineering a self-hosted DeerFlow instance I run in Docker and inspecting its JSON memory.

What Is DeerFlow: DeerFlow (Deep Exploration and Efficient Research Flow) is an open-source super agent harness that orchestrates sub-agents, memory, and sandboxes to do almost anything, powered by extensible skills. It is currently trending #1 on GitHub with 49.2K stars. Check out their repository here. The DeerFlow UI looks like a ChatGPT or Claude interface: a chat pane plus agent options in the left sidebar. I tested it by having conversations for about 3 hours, running it locally in a Docker container on my MacBook. What I tested: I conversed with DeerFlow about technical topics (memory benchmarks like LOCOMO and LongMemEval, the DeerFlow architecture itself) and general topics like my hobby (paragliding recommendations). I tested memory retrieval by asking questions and by asking the agent to omit certain memories. The part that doesn’t get enough attention is the memory layer.

The Memory Interface: The Memory panel organizes everything into: User Context (Work + Personal), Current Focus (Top of Mind), history layers (recent months, earlier context, long-term background), and a Facts table. You can see everything the system learned about you, when it learned it, and how confident it is. The source field on each fact is the actual thread UUID, so the system knows exactly which conversation each piece of knowledge came from.

Where the memory sits: Memory in DeerFlow lives as a file on disk: backend/.deer-flow/memory.json. It’s local and it persists across every session. A structured JSON file. Within one session, DeerFlow builds a structured profile. Each fact has content similar to what we saw in the Memory panel UI. Facts below 0.7 confidence don’t get included. The store caps at 100 facts total, evicting the lowest-confidence ones first when it overflows.

How It Actually Works: DeerFlow’s Lead Agent runs on LangGraph, a graph-based orchestration framework where each agent turn is a node in a stateful execution graph. Memory isn’t a plugin or sidecar. It’s baked directly into the middleware chain at position #8, meaning it runs on every agent turn automatically. The key component is MemoryMiddleware. MemoryMiddleware sits right after TitleMiddleware and before the vision and loop-detection layers. This is intentional: memory updates should happen after the title is generated (so the LLM knows what the conversation was about) but before any loop or clarification checks cut the session short. MemoryMiddleware doesn't update memory synchronously. It queues the conversation for async processing. The flow works like this: User sends a message. Agent responds. MemoryMiddleware filters the exchange; only user inputs and final AI responses are considered. The conversation is added to an async queue with a 30-second debounce timer. If another message arrives from any thread within 30 seconds, the timer resets. The LLM extractor runs against the conversation and produces a diff: newFacts to add, factsToRemove to delete, and shouldUpdate flags for each summary section. The memory JSON is updated. Before appending a new fact, the system checks for exact content duplication (normalized by stripping whitespace). Note: different phrasings of the same semantic fact will still get added; the dedup is text-based, not semantic. You can point model_name at a cheaper model for the extraction step; the memory LLM doesn't need to be your best model, it just needs to be good at structured extraction. What I found interesting in step 3 is that if the same thread_id already has a pending update in the queue, the new entry replaces it rather than appending. You never process stale mid-conversation snapshots; only the final state of each thread gets processed. Memory ingestion doesn’t add latency to the agent’s response: if a user sends a message, they get a response right away. This is an example of a good harness in an agentic system.

Where the Memory Actually Shows Up: When you start a new conversation, every fact that fits within a 2,000-token budget, sorted by confidence, plus all the user, history, and topOfMind summaries, gets injected into the system prompt inside <memory> tags. There's no hardcoded "top N" limit; it's a token budget, not a count. Tiktoken counts the tokens precisely as facts are added one by one until the budget runs out. The agent sees something like this at the start of every session, from my actual session, formatted exactly as format_memory_for_injection(). The agent never has to ask "what are you working on?" It already knows, based on my Top of Mind history.

What It Looks Like In Practice: After a 3-hour session, this is what DeerFlow showed in the Facts section of the Memory panel. This is what gets compacted into 2,000 tokens.

Memory Retrieval Test: I tested further to see whether it could retrieve my information based on the conversations I'd had. Retrieval worked as expected. I also tried deleting memories through the chat. That only adds another preference to the fact table ("I don't want to talk about this topic"), despite the LLM extractor step recording a factsToRemove diff. There is a workaround: modifying the JSON in the codebase. This shows a limitation of the memory system. Another limitation: after I discussed memory benchmarks in one session, the suggested follow-up questions appeared in Chinese, despite my user profile showing an English preference.

Conclusion. Highlights of DeerFlow memory: Local and yours. No vendor lock-in on the memory store. It's a JSON file. You can read it, edit it, reset it. Async by default. Memory updates never slow down your response. The 30-second debounce is clever; it absorbs conversational noise without burning LLM calls on every single message. Confidence-scored storage. A single mention of something ("I might try Rust someday") won't permanently alter your memory profile. The confidence threshold filters out weak signals before they accumulate. Atomic writes. Write-then-rename is the right pattern here. Memory corruption on process crash is a real failure mode that most toy implementations ignore. Deduplication. The duplicate check before appending means the system won't keep layering the same preference 50 times in 50 different phrasings.

What It Doesn't Do: No semantic search. Injection is the highest-confidence facts that fit in the token budget, not the most relevant facts for your current query. If you have 100 stored facts and ask a highly specific question, you get the top facts by confidence, not by relevance to that question. No semantic deduplication. The duplicate check is text-based: it strips whitespace and compares strings exactly. I can see this in my actual memory file: facts about comparing DeerFlow's memory against a filesystem + markdown approach, same semantic intent, different phrasing, stored as two separate entries. At 9 facts after one session, this isn't a problem. At 80 facts after a month, it starts to matter. Limited intelligence layer and potential compaction issues. Memory is capped at 100 facts, which means it can struggle as context grows, though the confidence score helps prioritize which memories to keep or drop. No vector embeddings. This is a deliberate tradeoff: no embedding pipeline, no similarity search. It keeps the system simple and dependency-light, but it brings limitations.

Why It Matters: Most agent frameworks treat memory as a retrieval problem: embed everything, store it in a vector database, and retrieve the most similar chunks at query time. That works, but it comes at a cost. You’re adding an embedding model, a vector store, and a retrieval step to every request. That means more infrastructure, more latency, and more complexity. DeerFlow takes a different approach: don’t store conversations, store understanding. It runs an asynchronous LLM extraction pass that distills raw conversations into structured facts, then injects those facts directly into future prompts. The expensive step happens after the response is already delivered, on a debounced 30-second timer, using any model you choose. Overall, it delivers on its promise as a SuperAgent harness. The memory system is efficient and production-ready in its current form, with some clear limitations: it takes an agentic approach to scoring and injecting facts, but still lacks a robust memory intelligence layer for some use cases. As a SuperAgent harness for deep research and agentic work, the DeerFlow team did great work implementing the memory system in this release. The result is an agent that knows you, not because it remembers everything you said, but because it builds a persistent model of your preferences, goals, and context. And that model lives as a simple JSON file on your machine. DeerFlow builds memory directly into its middleware, so it works out of the box, on every conversation, by default. That’s the part worth paying attention to. Try the repo below and let me know what you think!

Reference: DeerFlow repository: https://github.com/bytedance/deer-flow Image generated from @claudeai based on personal prompts. In Context #1: This blog is part of In Context, a mem0 blog series covering AI agent memory and context engineering. mem0 is an intelligent, open-source memory layer designed for LLMs and AI agents to provide long-term, personalized, and context-aware interactions across sessions. Get your free API key at app.mem0.ai or self-host mem0 from our open-source GitHub repository. Author: Livia Ellen (@ellen_in_sf), growth engineer at mem0. Disclaimer: This is a personal view based on analysis and personal testing of the DeerFlow codebase at the time of writing (March 27, 2026). If the source code changes upstream, this analysis might age. The analysis used Azure OpenAI GPT 5.2 in a Docker container with the default config; you might have a different experience depending on the model you use. For questions regarding this publication, reach me at livia[dot]ellen[at]mem0[dot]ai Posted Mar 27, 2026 at 3:01PM
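A compact sketch of the storage and injection rules the post describes: the 0.7 confidence gate, the 100-fact cap with lowest-confidence eviction, whitespace-normalized text dedup, and the 2,000-token injection budget counted with tiktoken. Field names follow the post; everything else is illustrative, not DeerFlow's actual code.

```python
import tiktoken

MIN_CONFIDENCE, MAX_FACTS, TOKEN_BUDGET = 0.7, 100, 2000
enc = tiktoken.get_encoding("cl100k_base")  # encoding choice is an assumption

def add_fact(facts: list[dict], fact: dict) -> None:
    if fact["confidence"] < MIN_CONFIDENCE:
        return  # weak signals never land in the store
    norm = " ".join(fact["content"].split())
    if any(" ".join(f["content"].split()) == norm for f in facts):
        return  # text-based dedup: exact match after whitespace stripping
    facts.append(fact)
    if len(facts) > MAX_FACTS:  # evict lowest-confidence facts first
        facts.sort(key=lambda f: f["confidence"], reverse=True)
        del facts[MAX_FACTS:]

def format_memory_for_injection(facts: list[dict]) -> str:
    lines, used = [], 0
    for f in sorted(facts, key=lambda f: f["confidence"], reverse=True):
        cost = len(enc.encode(f["content"]))
        if used + cost > TOKEN_BUDGET:
            break  # a token budget, not a top-N count
        lines.append(f["content"])
        used += cost
    return "<memory>\n" + "\n".join(lines) + "\n</memory>"
```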

Likes: 0 Reposts: 0 Views: 0
Score 10
thecodenamev Unknown date Ai llm

This is one hell of an edition you're not gonna want to miss! A huge thank you to @mattsolt for keeping this up (he curates this by hand!). And a feature segment for ups.dev ❣️ Matt Solt @mattsolt It's finally here. The one year anniversary edition of Ruby AI News is live! The 27th edition features the rise of AI agent-driven business creation, tooling to deploy your AI experiments more than ONCE, a new cognitive architecture for Ruby AI, and so much more 🔻 Posted Mar 27, 2026 at 4:38PM Posted Mar 27, 2026 at 6:19PM

Likes: 0 Reposts: 0 Views: 0
Score 10
rbqconf Unknown date Ai llm

Chris built an AI feature into his company's app. He shares the steps he took to implement, orchestrate, track, and protect outcomes. RBQConf Posted Mar 27, 2026 at 5:00PM

Likes: 0 Reposts: 0 Views: 0
Score 10
wilbertliu Unknown date Ai llm

One thing I learned from this launch: early access mode is doomed! 💥 In this "you can just prompt things" era, the best way to validate the market is to show, not tell (and money, of course). That's why I'm rushing towards the Rinhelp MVP. Locked in now. Wilbert Liu @wilbertliu Introducing rinhelp.com ✨ It's a slight pivot from Chatlane. While building Chatlane, I realized the hardest part of support was never the routine questions. It was the technical tickets that completely broke the flow. The integration problem. The billing mismatch. The webhook failure. The tickets that turn into 30 minutes of jumping between Intercom, Sentry, the DB, and the codebase just to figure out what broke. That's why I built Rinhelp. Rinhelp investigates technical support issues and gives back a draft to review, with evidence, before you reply or touch the product. If you're a SaaS founder or engineer dealing with the same thing, join our early access: rinhelp.com Posted Mar 26, 2026 at 1:37PM Posted Mar 27, 2026 at 1:05PM

Likes: 0 Reposts: 0 Views: 0
Score 10
joker1007 Unknown date Ai llm

These days Claude writes all my shell scripts, so "I can't remember this stuff" has turned into "I don't need to remember it." #terminalnight Posted Mar 27, 2026 at 11:26AM

Likes: 0 Reposts: 0 Views: 0
Score 10
awilkinson Unknown date Ai llm

@supamaiku retweeted Software is about to look a lot like ecommerce. Shitty margins. Unlimited competition. A hard way to make a living. Why? Because over the next few years, Anthropic, Google, and OpenAI are going to drink the software industry's milkshake 🥤

If you were looking for a hotel in 2010, this is how it went. 2010: Google "hotels in New York" → Google links you to TripAdvisor. But by 2020... 2020: Google "hotels in New York" → Google shows its own hotel booking system integrated directly into the search results. RIP TripAdvisor 🪦📉 (check their stock price 2015 vs today).

Google made a fortune by building products that captured demand on the keywords where they had the most traffic, like travel. But Google had finite resources. They only had so many developers to build these products, so it only made sense to do this for the largest categories: hotels, flights, shopping. This same thing is about to happen to most digital services and software products. Except this time, the constraint that protected smaller categories is gone.

2025: Ask ChatGPT for the best CRM software → It directs you to Attio, Pipedrive, and Zoho. 2028: Ask ChatGPT for the best CRM → It builds one, imports your data, and runs it for you at a fraction of the cost.

The difference between OG Google and today's frontier models is that OG Google needed human engineers to build each vertical product. OpenAI, Anthropic, and the Google of today (Gemini) won't have this constraint. When the cost to build and maintain software approaches zero, there's no reason to stop at hotels and flights. You do it for everything, on demand.

Right now, vibe coding is still fiddly. It requires a human in the loop, it's insecure, and it depends on third-party hosting and infrastructure. But I expect the frontier model companies to build out their own vertical infrastructure to run the software they generate, removing the current friction entirely. Think Claude's artifacts, except full-fledged digital products—hosted, maintained, and updated by the same AI that built them.

The moat for most software companies isn't the code. It's the switching cost and the ecosystem lock-in. When an AI can rebuild your tool in seconds and migrate your data automatically, that moat disappears. Everyone understands that vibe coding = infinite competition. But this is different. They're taking your customer before they can even get to you. So, software becomes a lot like ecommerce. Near zero margin unless you own distribution and aren't reliant on Google/Meta for customers. TLDR: They drink your milkshake. They'll drink it up. Posted Mar 26, 2026 at 2:43PM

Likes: 0 Reposts: 0 Views: 0
Score 10
vitaliidodonov Unknown date Ai llm

@supamaiku retweeted I’m launching Stanley (AI Head of Content) on June 1, 2026. Over the next 6-12 months, I'll grow it from $0 → $10M ARR. In public. (While running my $30M ARR business full time.) I will also be using only AI employees to do it, so you can copy the playbook. I'll update this thread as I go. Bookmark it and follow along to hold me accountable 🤝 Comment "alpha" to get early access. My social media stats as of today: Posted Mar 26, 2026 at 12:00PM

Likes: 0 Reposts: 0 Views: 0
Score 10
GitHub_Daily Unknown date Ai llm

@ruby_runner retweeted Traditional support-ticket systems mostly just record the problem; replies are still typed out by hand, one by one, so the efficiency gains are limited. I came across Tentix, an open-source project billing itself as an AI-native customer-service platform that roughly doubles ticket-handling throughput. It automatically analyzes the user's issue, retrieves relevant content from the knowledge base, and generates a reply, with almost no human intervention across the whole flow. GitHub: https://t.co/Pwfa313sfD Historical tickets, high-value conversations, and regular documentation are distilled into a unified vector knowledge base, so the AI's answers understand the business better over time. It natively supports multi-channel message delivery such as Feishu/Lark, with modular interfaces reserved for hooking up other messaging tools later. If you're looking for a support system that genuinely cuts repetitive work for your team, you can spin it up quickly with Docker and try it. Posted Mar 27, 2026 at 6:40AM

Likes: 0 Reposts: 0 Views: 0
Score 10
badlogicgames Unknown date Ai llm

@5katkov retweeted it should also be "fucking obvious" that the rate of technical debt a team of humans can add to a codebase is much lower than that of a team of agents. humans will eventually fix some of that debt, due to the self-inflicted pain. agents feel no such pain. Erik Meijer @headinthebox It seems that with the advent of AI coding people have completely forgotten that human-authored code suffered badly from quality degradation. That is why we coined the term "technical debt" and why companies like Meta incentivized "Better Engineering" as part of their performance review cycle. But suddenly, just like # of times of sex per week, IQ, ..., many humans now believe their code quality is higher than average. Sorry to disturb your dreams folks, but your code sucks louder than a jet engine. The real opportunity here is to play the movie below in reverse. You have an unhealthy codebase but you use agents as some kind of white blood cells to continuously fight the disease and clean shit up. This is so fucking obvious, I keep wondering if I live in some alternate universe full of midwits and I am the only reasonable person around. twitter.com/i... Posted Mar 26, 2026 at 4:29PM Posted Mar 26, 2026 at 9:45PM

Likes: 0 Reposts: 0 Views: 0
Score 10
strzibnyj Unknown date Ai llm

There are two hard things in computer science: cache invalidation, naming things, off-by-one errors. But AI will fix it 😌 Posted Mar 27, 2026 at 11:09AM

Likes: 0 Reposts: 0 Views: 0
Score 10
yarotheslav Unknown date Ai llm

I'm feeling like a 🤡 paying $240/m for "overloaded_error" every 20 minutes.Seriously considering giving Codex and other tools another chance Posted Mar 27, 2026 at 10:00AM

Likes: 0 Reposts: 0 Views: 0
Score 10
bitforth Unknown date Ai llm

@pantulis retweeted I was an engineer at Meta, and I always followed FAIR from the inside. What they've just published is the version they're allowed to publish. But even that is more than enough to tell you exactly what's going on. TRIBE v2 predicts, vertex by vertex across the cerebral cortex, which regions any video activates. No scanners. No humans. You upload the content and get the neural map (emotional activation, suppression of critical reasoning, prefrontal modulation) before a single user has seen the video. Now consider Meta's position: it has years of Reels data on which content retains attention, generates anger, and drives shares. They know empirically what works. TRIBE v2 gives them the causal mechanism for why it works, at the level of cortical tissue. That turns historical correlation into predictive capability over new content. Internally there are tools called Gatekeepers and Quick Promotions whose purpose is to inject content into the feeds of arbitrary populations at scale. Brain-response simulator + empirical knowledge of effective content + selective distribution machinery: the pipeline is complete. And then there's Thiel. Investor and personal friend of Zuck. Founder of Palantir, whose business is population-scale analysis for governments and intelligence agencies. It is NOT far-fetched to observe that the incentives of platforms built by the same people are converging. The CC BY-NC license means Meta retains the commercial rights to the most accurate brain-response predictor ever built. And remember, this is what they decided to make public. AI at Meta @AIatMeta Today we're introducing TRIBE v2 (Trimodal Brain Encoder), a foundation model trained to predict how the human brain responds to almost any sight or sound. Building on our Algonauts 2025 award-winning architecture, TRIBE v2 draws on 500+ hours of fMRI recordings from 700+ people to create a digital twin of neural activity and enable zero-shot predictions for new subjects, languages, and tasks. Try the demo and learn more here: https://t.co/VkMd1YpQWI Posted Mar 26, 2026 at 1:04PM Posted Mar 27, 2026 at 1:52AM

Likes: 0 Reposts: 0 Views: 0
Score 10
Vtrivedy10 Unknown date Ai llm

@OrtegaCManuel retweeted TLDR: The best agent evals directly measure an agent behavior we care about. Here’s how we source data, create metrics, and run well-scoped, targeted experiments over time to make agents more accurate and reliable. Evals shape agent behaviorWe’ve been curating evaluations to measure and improve Deep Agents. Deep Agents is an open source, model agnostic agent harness that powers products like Fleet and Open SWE. Evals define and shape agent behavior, which is why it’s so important to design them thoughtfully. Every eval is a vector that shifts the behavior of your agentic system. For example, if an eval for efficient file reading fails, you’ll likely tweak the system prompt or the read_file tool description to nudge behavior until it passes. Every eval you keep applies pressure on the overall system over time. It is crucial to be thoughtful when adding evals. It can be tempting to blindly add hundreds (or thousands) of tests. This leads to an illusion of “improving your agent” by scoring well on an eval suite that may not accurately reflect behaviors you care about in production. More evals ≠ better agents. Instead, build targeted evals that reflect desired behaviors in production. When building Deep Agents, we catalog the behaviors that matter in production, such as retrieving content across multiple files in the filesystem or accurately composing 5+ tool calls in sequence. Rather than using benchmark tasks in aggregate, we take the following approach to eval curation: Decide which behaviors we want our agent to follow. Then research and curate targeted evals that measure those behaviors in a verifiable way. For each eval, add a docstring that explains how it measures an agent capability. This ensures each eval is self-documenting. We also tag each eval with categories like tool_use to enable grouped runs. Review output traces to understand failure modes and update eval coverage. Because we trace every eval run to a shared LangSmith project, anyone on the team can jump in to analyze issues, make fixes, and reassess the value of a given eval. This creates shared responsibility for adding and maintaining good evals. Running many models across many evals can also get expensive, so targeted evals save money while improving your agent. In this blog we cover: How we curate data How we define metrics How we run the evals How we curate dataThere’s a few ways we source evals: Using feedback from dogfooding our agents Pulling selected evals from external benchmarks (like Terminal Bench 2.0 or BFCL) and often adapting them for a particular agent Writing our own (artisanal) evals and unit tests by hand for behaviors we think are important We dogfood our agents every day. Every error becomes an opportunity to write an eval and update our agent definition & context engineering practices.Note: We separate SDK unit and integration tests (system prompt passthrough, interrupt config, subagent routing) from model capability evals. Any model passes those tests, so including them in scoring adds no signal. You should absolutely write unit and integration tests, but this blog focuses solely on model capability evals. Dogfooding agents & reading traces are great sources of evalsThis makes finding mistakes possible. Traces give us data to understand agent behavior. Because traces are often large, we use a built-in agent like Polly or Insights to analyze them at scale. 
Because we trace every eval run to a shared LangSmith project, anyone on the team can jump in to analyze issues, make fixes, and reassess the value of a given eval. This creates shared responsibility for adding and maintaining good evals. Running many models across many evals can also get expensive, so targeted evals save money while improving your agent.

In this blog we cover:
- How we curate data
- How we define metrics
- How we run the evals

How we curate data

There are a few ways we source evals:
- Using feedback from dogfooding our agents
- Pulling selected evals from external benchmarks (like Terminal Bench 2.0 or BFCL), often adapting them for a particular agent
- Writing our own (artisanal) evals and unit tests by hand for behaviors we think are important

We dogfood our agents every day. Every error becomes an opportunity to write an eval and update our agent definition & context engineering practices.

Note: We separate SDK unit and integration tests (system prompt passthrough, interrupt config, subagent routing) from model capability evals. Any model passes those tests, so including them in scoring adds no signal. You should absolutely write unit and integration tests, but this blog focuses solely on model capability evals.

Dogfooding agents & reading traces are great sources of evals

Dogfooding makes finding mistakes possible, and traces give us the data to understand agent behavior. Because traces are often large, we use a built-in agent like Polly or Insights to analyze them at scale. You can do the same with other agents (like Claude Code or the Deep Agents CLI) plus a way to pull down traces, like the LangSmith CLI. Our goal is to understand each failure mode, propose a fix, rerun the agent, and track progress and regressions over time. For example, a large fraction of bug-fix PRs are now driven through Open SWE, our open source background coding agent. Teams using it touch many different codebases with different context, conventions, and goals. This naturally leads to mistakes. Every interaction with Open SWE is traced, so those interactions can easily become evals that make sure a mistake doesn't happen again.

Other evals are pulled and adjusted from existing benchmarks like BFCL for function calling. For coding tasks, we integrate with Harbor to run selected tasks from datasets like Terminal Bench 2.0 in sandboxed environments. Many evals are written from scratch and act as focused tests to observe isolated behavior, like testing a read_file tool.

We group evals by what they test

It's helpful to have a taxonomy of evals to get a middle view of how agents perform (not a single number, not individual runs). Tip: Create that taxonomy by looking at what the evals test, not where they come from. For example, tasks from FRAMES and BFCL could be tagged "external benchmarks," but that would not show how they measure retrieval and tool use, respectively. Here are some categories we define and what they test:

Today, all evals are end-to-end runs of an agent on a task. We intentionally encourage diversity in eval structure. Some tasks finish in a single step from an input prompt, while others take 10+ turns with another model simulating a user.

How we define metrics

When choosing a model for our agent, we start with correctness. If a model can't reliably complete the tasks we care about, nothing else matters. We run multiple models on our evals and refine the harness over time to address the issues we uncover. Measuring correctness depends on what's being tested. Most internal evals use custom assertions such as "did the agent parallelize tool calls?" External benchmarks like BFCL use exact matching against ground-truth answers from the dataset. For evals where correctness is semantic, like whether the agent persisted the correct thing in memory, we use LLM-as-a-judge.

Once several models clear that bar, we move to efficiency. Two models that solve the same task can behave very differently in practice. One might take extra turns, make unnecessary tool calls, or move through the task more slowly because of model size. In production, those differences show up as higher latency, higher cost, and a worse overall user experience.

Among the metrics we measure for each evaluator run is solve rate, which measures how quickly an agent solves a task, normalized by the expected number of steps. Like latency ratio, it captures end-to-end time to solve the task, including model round trips, provider latency, wrong turns, and tool execution time. For simple tasks where we can define an ideal trajectory, solve rate can be easier to work with than latency ratio because it only requires measuring the given agent's task duration. (A rough sketch of both metrics follows.)
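To make the two efficiency metrics concrete, here is a minimal sketch of how they might be computed. This is an assumed reading of "normalized by the expected number of steps" (expected steps over actual steps, capped at 1, zero if incorrect); the post doesn't give the exact formulas, and the numbers anticipate the time-and-weather example in the next section:

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    correct: bool       # did the run pass the eval's correctness checks?
    steps_taken: int    # agent turns actually used
    duration_s: float   # wall-clock time, including model and tool latency


def solve_rate(run: EvalRun, expected_steps: int) -> float:
    """Score in [0, 1]: 1.0 means solved on the ideal trajectory.

    Incorrect runs score 0; correct runs are penalized in proportion to
    the extra steps they took beyond the expected trajectory.
    """
    if not run.correct:
        return 0.0
    return min(1.0, expected_steps / max(run.steps_taken, 1))


def latency_ratio(run: EvalRun, baseline_duration_s: float) -> float:
    """How many times slower this run was than a baseline (ideal) run."""
    return run.duration_s / baseline_duration_s


# The ideal run takes 4 steps / ~8 s; the correct-but-inefficient run
# takes 6 steps / ~14 s (see the trajectory example below).
ideal = EvalRun(correct=True, steps_taken=4, duration_s=8.0)
slow = EvalRun(correct=True, steps_taken=6, duration_s=14.0)

assert solve_rate(ideal, expected_steps=4) == 1.0
assert round(solve_rate(slow, expected_steps=4), 2) == 0.67
assert latency_ratio(slow, baseline_duration_s=8.0) == 1.75
```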
This gives us a simple way to choose models with a targeted eval set:
- Check correctness first: which models are accurate enough on the tasks you actually care about?
- Then compare efficiency: among the models that are good enough, which one gives the best tradeoff between correctness, latency, and cost?

Example of useful metrics around evals

To make model comparisons actionable, we examine how models succeed and fail. That requires a concrete reference point for what "good" execution looks like beyond accuracy. One primitive we use is an ideal trajectory: a sequence of steps that produces a correct outcome with no "unnecessary" actions. For simple, well-scoped tasks, the variables are defined tightly enough that the optimal path is usually obvious. For more open-ended tasks, we approximate a trajectory using the best-performing model we've seen so far, then revisit the baseline as models and harnesses improve. In this way, observing agent behavior helps us refine our priors about ideal trajectories.

Consider a simple request: "What is the current time and weather where I live?" An agent's ideal trajectory might look like this:
- It makes the fewest necessary tool calls (e.g., resolve user → resolve location → fetch time and weather)
- It parallelizes independent tool calls where possible
- It produces the final answer without unnecessary intermediate turns

Ideal trajectory: 4 steps, 4 tool calls, ~8 seconds. Now compare that with a trajectory that is still technically correct but less efficient. Inefficient trajectory: 6 agent steps, 5 tool calls, ~14 seconds; it includes an unnecessary tool call and doesn't parallelize tool calls. The above examples are illustrative: a REPL could solve this task even faster, but the tool-calling version makes the idea easier to explain. Both runs are correct, but the second run increases latency and cost and creates more opportunities for failure. This framing lets us evaluate both correctness and efficiency over evals. We maintain and update metrics to distill the runs into measurable numbers we can use to compare experiments. From the example above, the inefficient but correct run would score:

How we run evals

We use pytest with GitHub Actions to run evals in CI, so changes run in a clean, reproducible environment. Each eval creates a Deep Agent instance with a given model, feeds it a task, and computes correctness and efficiency metrics. We can also run a subset of evals using tags to save costs and run targeted experiments. For example, if building an agent that requires a lot of local file processing and synthesis, we may focus on the file_operations and tool_use tagged subsets:

export LANGSMITH_API_KEY="lsv2_..."
uv run pytest tests/evals --eval-category file_operations --eval-category tool_use --model baseten:nvidia/zai-org/GLM-5

Our eval architecture and implementation are open-sourced in the Deep Agents repository.

What's next

We're expanding our eval suite and doing more work around open source LLMs! Some things we're excited to share soon:
- How open models measure against closed frontier models across eval categories
- Evals as a mechanism to auto-improve agents for tasks in real time
- How we maintain, reduce, and expand evals per agent over time

Thanks to the great team who helped review and co-write this blog: @masondrxy @veryboldbagel @hwchase17. Also published on the LangChain blog here. Deep Agents is fully open source. Try it and let us know what you think! We're excited to help teams build great agents & evals. Posted Mar 26, 2026 at 4:23PM

Likes: 0 Reposts: 0 Views: 0
Score 10
r
rhiannon_io Unknown date Ai llm

"The system is more mature than I expected" - Claude on the system it built... 🤔 Posted Mar 26, 2026 at 11:47AM

Likes: 0 Reposts: 0 Views: 0
Score 10
b
brunoborges Unknown date Ai llm

@lazaronixon retweeted When you write code, you review it as you write it. The two acts are inseparable. When using someone else's code, you extend some degree of trust because you know they did the same. AI-generated code breaks this contract. The review may only happen afterward, by someone who wasn't there, or by no one at all. How do we bring trust back to this new contract? Posted Mar 26, 2026 at 3:42PM

Likes: 0 Reposts: 0 Views: 0
Score 10
i
ibuildthecloud Unknown date Ai llm

@lazaronixon retweeted I'm not kidding. Has anyone seen a unit test that AI has written that provided any value whatsoever? It's truly fascinating how useless they are. Posted Mar 26, 2026 at 8:02PM

Likes: 0 Reposts: 0 Views: 0
Score 10
M
MatthewRideout Unknown date Ai llm

@lazaronixon retweeted Anyone who thinks LLMs are good at coding is really bad at coding. Posted Mar 26, 2026 at 2:47PM

Likes: 0 Reposts: 0 Views: 0
Score 10
b
bradgessler Unknown date Ai llm

❤️ @claudeai code Posted Mar 27, 2026 at 12:04AM

Likes: 0 Reposts: 0 Views: 0
Score 10
t
taggy Unknown date Ai llm

hey @PortkeyAI, is it possible to track token or cost consumption at the prompt level? Looking to set the right budgets and track which prompts need work! Posted Mar 27, 2026 at 11:05AM

Likes: 0 Reposts: 0 Views: 0
Score 10
c
cayt3r Unknown date Ai llm

Pure vibecoders in this era are just like startups that raised a lot of funding in the ZIRP era. My take: try to ship as much as you can using the heavily subsidized AI coding tools before they flip the switch. Gergely Orosz @GergelyOrosz Devs who can code WITHOUT AI are looking to become 10x more valuable. They are the ones who won't panic or be idle when their Claude quota runs out… So much for all the advice on how learning to code is not worth it any more… twitter.com/i... Posted Mar 27, 2026 at 6:22AM Posted Mar 27, 2026 at 7:07AM

Likes: 0 Reposts: 0 Views: 0
Score 10
G
GergelyOrosz Unknown date Ai llm

Devs who can code WITHOUT AI are looking to become 10x more valuable. They are the ones who won't panic or be idle when their Claude quota runs out… So much for all the advice on how learning to code is not worth it any more… twitter.com/i... Posted Mar 27, 2026 at 6:22AM

Likes: 0 Reposts: 0 Views: 0
Score 10
s
samsaffron Unknown date Ai llm

Very thankful we invented AI cause at least something understands all the nuance of Google Cloud OAuth Posted Mar 27, 2026 at 4:28AM

Likes: 0 Reposts: 0 Views: 0
Score 10
w
wintonARK Unknown date Ai llm

We have been surpassed: AI-written output exceeded human-written output in 2025 Posted Mar 26, 2026 at 4:40PM

Likes: 0 Reposts: 0 Views: 0
Score 10
s
samcraigjohnson Unknown date Ai llm

we truly live in the dumbest timeline Chubby♨️ @kimmonismus OpenAI is backing Isara, a new startup founded by two 23-year-old AI researchers that coordinates thousands of AI agents to solve complex problems, like using ~2,000 agents to forecast gold prices. The company just raised $94M at a $650M valuation and plans to sell predictive modeling tools to finance firms first. twitter.com/i... Posted Mar 26, 2026 at 1:17PM Posted Mar 27, 2026 at 2:06AM

Likes: 0 Reposts: 0 Views: 0
Score 10
k
kimmonismus Unknown date Ai llm

OpenAI is backing Isara, a new startup founded by two 23-year-old AI researchers that coordinates thousands of AI agents to solve complex problems, like using ~2,000 agents to forecast gold prices. The company just raised $94M at a $650M valuation and plans to sell predictive modeling tools to finance firms first. twitter.com/i... Posted Mar 26, 2026 at 1:17PM

Likes: 0 Reposts: 0 Views: 0
Score 10
S
Shpigford Unknown date Ai llm

first big screw up by claude. it churned on a problem for 2 full hours, then misread one of my responses as approval to go a different route and so it deleted all its work. 😭 Posted Mar 27, 2026 at 1:30AM

Likes: 0 Reposts: 0 Views: 0
Score 10
c
coorasse Unknown date Ai llm

This. 100%. Matt @MatthewRideout Anyone who thinks LLMs are good at coding is really bad at coding. Posted Mar 26, 2026 at 2:47PM Posted Mar 27, 2026 at 6:26AM

Likes: 0 Reposts: 0 Views: 0
Score 10