Papers
Research papers from arXiv and related sources
Asking Forever: Universal Activations Behind Turn Amplification in Conversational LLMs
Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in conversational LLMs: turn amplification, in which ...
Zachary Coalson, Bo Fang, Sanghyun Hong
CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild
Hands play a central role in daily life, yet modeling natural hand motions remains underexplored. Existing methods that tackle text-to-hand-motion generation or hand animation captioning rely on st...
Balamurugan Thambiraja, Omid Taheri, Radek Danecek, Giorgio Becherini, Gerard Pons-Moll, Justus T...
Sink-Aware Pruning for Diffusion Language Models
Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics largely inherited from autoregressive (AR) LLMs, typ...
Aidar Myrzakhan, Tianyi Li, Bowei Guo, Shengkun Tang, Zhiqiang Shen
The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems
Agentic AI systems are increasingly capable of performing professional and personal tasks with limited human involvement. However, tracking these developments is difficult because the AI agent ecos...
Leon Staufer, Kevin Feng, Kevin Wei, Luke Bailey, Yawen Duan, Mick Yang, A. Pinar Ozisik, Stephen...
Mine and Refine: Optimizing Graded Relevance in E-commerce Search Retrieval
We propose a two-stage "Mine and Refine" contrastive training framework for semantic text embeddings to enhance multi-category e-commerce search retrieval. Large-scale e-commerce search demands emb...
Jiaqi Xi, Raghav Saboo, Luming Chen, Martin Wang, Sudeep Das
Multi-Round Human-AI Collaboration with User-Specified Requirements
As humans increasingly rely on multi-round conversational AI for high-stakes decisions, principled frameworks are needed to ensure such interactions reliably improve decision quality. We adopt a hum...
Sima Noorani, Shayan Kiyani, Hamed Hassani, George Pappas
FAMOSE: A ReAct Approach to Automated Feature Discovery
Feature engineering remains a critical yet challenging bottleneck in machine learning, particularly for tabular data, as identifying optimal features from an exponentially large feature space tradi...
Keith Burghardt, Jienan Liu, Sadman Sakib, Yuning Hao, Bo Li
When to Trust the Cheap Check: Weak and Strong Verification for Reasoning
Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Extern...
Shayan Kiyani, Sima Noorani, George Pappas, Hamed Hassani
Unmasking the Factual-Conceptual Gap in Persian Language Models
While emerging Persian NLP benchmarks have expanded into pragmatics and politeness, they rarely distinguish between memorized cultural facts and the ability to reason about implicit social norms. W...
Alireza Sakhaeirad, Ali Ma'manpoosh, Arshia Hemmat
What Makes a Good LLM Agent for Real-world Penetration Testing?
LLM-based agents show promise for automating penetration testing, yet reported performance varies widely across systems and benchmarks. We analyze 28 LLM-based penetration testing systems and evalu...
Gelei Deng, Yi Liu, Yuekang Li, Ruozhao Yang, Xiaofei Xie, Jie Zhang, Han Qiu, Tianwei Zhang
Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
Reinforcement learning (RL) is widely used to improve large language models on reasoning tasks, and asynchronous RL training is attractive because it increases end-to-end throughput. However, for w...
Luke Huang, Zhuoyang Zhang, Qinghao Hu, Shang Yang, Song Han
Exploring Novel Data Storage Approaches for Large-Scale Numerical Weather Prediction
Driven by scientific and industry ambition, HPC and AI applications such as operational Numerical Weather Prediction (NWP) require processing and storing ever-increasing data volumes as fast as pos...
Nicolau Manubens Gil
Towards Anytime-Valid Statistical Watermarking
The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promisi...
Baihe Huang, Eric Xu, Kannan Ramchandran, Jiantao Jiao, Michael I. Jordan
AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing
PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-base...
Jianda Du, Youran Sun, Haizhao Yang
MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models
Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the ...
Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui ...
Art2Mus: Artwork-to-Music Generation via Visual Conditioning and Large-Scale Cross-Modal Alignment
Music generation has advanced markedly through multimodal deep learning, enabling models to synthesize audio from text and, more recently, from images. However, existing image-conditioned systems s...
Ivan Rinaldi, Matteo Mendula, Nicola Fanelli, Florence Levé, Matteo Testi, Giovanna Castellano, G...
The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?
Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper$\to$LLM cascades. We show this through ...
Jayadev Billa
AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Con...
Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L....
Building an AI-native Research Ecosystem for Experimental Particle Physics: A Community Vision
Experimental particle physics seeks to understand the universe by probing its fundamental particles and forces and exploring how they govern the large-scale processes that shape cosmic evolution. T...
Thea Klaeboe Aarrestad, Alaa Abdelhamid, Haider Abidi, Jahred Adelman, Jennifer Adelman-McCarthy,...
Momentum Measurement of Charged Particles in FASER's Emulsion Detector at the LHC
We present a momentum measurement method based on multiple Coulomb scattering (MCS) in the FASER$\nu$ emulsion detector. The measurement of charged-particle momenta is essential for studying neutrino...
FASER Collaboration, Roshan Mammen Abraham, Xiaocong Ai, Saul Alonso Monsalve, John Anders, Emma...