Research

Papers

Research papers from arXiv and related sources

Total: 4513; AI/LLM: 2483; Testing: 2030
AI LLM

Asking Forever: Universal Activations Behind Turn Amplification in Conversational LLMs

Multi-turn interaction length is a dominant factor in the operational costs of conversational LLMs. In this work, we present a new failure mode in conversational LLMs: turn amplification, in which ...

Zachary Coalson, Bo Fang, Sanghyun Hong

2602.17778 2026-02-19
AI LLM

CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild

Hands play a central role in daily life, yet modeling natural hand motions remains underexplored. Existing methods that tackle text-to-hand-motion generation or hand animation captioning rely on st...

Balamurugan Thambiraja, Omid Taheri, Radek Danecek, Giorgio Becherini, Gerard Pons-Moll, Justus T...

2602.17770 2026-02-19
AI LLM

Sink-Aware Pruning for Diffusion Language Models

Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics largely inherited from autoregressive (AR) LLMs, typ...

Aidar Myrzakhan, Tianyi Li, Bowei Guo, Shengkun Tang, Zhiqiang Shen

2602.17664 2026-02-19
AI LLM

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

Agentic AI systems are increasingly capable of performing professional and personal tasks with limited human involvement. However, tracking these developments is difficult because the AI agent ecos...

Leon Staufer, Kevin Feng, Kevin Wei, Luke Bailey, Yawen Duan, Mick Yang, A. Pinar Ozisik, Stephen...

2602.17753 2026-02-19
AI LLM

Mine and Refine: Optimizing Graded Relevance in E-commerce Search Retrieval

We propose a two-stage "Mine and Refine" contrastive training framework for semantic text embeddings to enhance multi-category e-commerce search retrieval. Large-scale e-commerce search demands emb...

Jiaqi Xi, Raghav Saboo, Luming Chen, Martin Wang, Sudeep Das

2602.17654 2026-02-19
AI LLM

Multi-Round Human-AI Collaboration with User-Specified Requirements

As humans increasingly rely on multi-round conversational AI for high-stakes decisions, principled frameworks are needed to ensure such interactions reliably improve decision quality. We adopt a hum...

Sima Noorani, Shayan Kiyani, Hamed Hassani, George Pappas

2602.17646 2026-02-19
AI LLM

FAMOSE: A ReAct Approach to Automated Feature Discovery

Feature engineering remains a critical yet challenging bottleneck in machine learning, particularly for tabular data, as identifying optimal features from an exponentially large feature space tradi...

Keith Burghardt, Jienan Liu, Sadman Sakib, Yuning Hao, Bo Li

2602.17641 2026-02-19
AI LLM

When to Trust the Cheap Check: Weak and Strong Verification for Reasoning

Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Extern...

Shayan Kiyani, Sima Noorani, George Pappas, Hamed Hassani

2602.17633 2026-02-19
AI LLM

Unmasking the Factual-Conceptual Gap in Persian Language Models

While emerging Persian NLP benchmarks have expanded into pragmatics and politeness, they rarely distinguish between memorized cultural facts and the ability to reason about implicit social norms. W...

Alireza Sakhaeirad, Ali Ma'manpoosh, Arshia Hemmat

2602.17623 2026-02-19
AI LLM

What Makes a Good LLM Agent for Real-world Penetration Testing?

LLM-based agents show promise for automating penetration testing, yet reported performance varies widely across systems and benchmarks. We analyze 28 LLM-based penetration testing systems and evalu...

Gelei Deng, Yi Liu, Yuekang Li, Ruozhao Yang, Xiaofei Xie, Jie Zhang, Han Qiu, Tianwei Zhang

2602.17622 2026-02-19
AI LLM

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs

Reinforcement learning (RL) is widely used to improve large language models on reasoning tasks, and asynchronous RL training is attractive because it increases end-to-end throughput. However, for w...

Luke Huang, Zhuoyang Zhang, Qinghao Hu, Shang Yang, Song Han

2602.17616 2026-02-19
AI LLM

Exploring Novel Data Storage Approaches for Large-Scale Numerical Weather Prediction

Driven by scientific and industry ambition, HPC and AI applications such as operational Numerical Weather Prediction (NWP) require processing and storing ever-increasing data volumes as fast as pos...

Nicolau Manubens Gil

2602.17610 2026-02-19
AI LLM

Towards Anytime-Valid Statistical Watermarking

The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promisi...

Baihe Huang, Eric Xu, Kannan Ramchandran, Jiantao Jiao, Michael I. Jordan

2602.17608 2026-02-19
AI LLM

AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-base...

Jianda Du, Youran Sun, Haizhao Yang

2602.17607 2026-02-19
AI LLM

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the ...

Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui ...

2602.17602 2026-02-19
AI LLM

Art2Mus: Artwork-to-Music Generation via Visual Conditioning and Large-Scale Cross-Modal Alignment

Music generation has advanced markedly through multimodal deep learning, enabling models to synthesize audio from text and, more recently, from images. However, existing image-conditioned systems s...

Ivan Rinaldi, Matteo Mendula, Nicola Fanelli, Florence Levé, Matteo Testi, Giovanna Castellano, G...

2602.17599 2026-02-19
AI LLM

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper$\to$LLM cascades. We show this through ...

Jayadev Billa

2602.17598 2026-02-19
AI LLM

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Con...

Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L....

2602.17594 2026-02-19
AI LLM

Building an AI-native Research Ecosystem for Experimental Particle Physics: A Community Vision

Experimental particle physics seeks to understand the universe by probing its fundamental particles and forces and exploring how they govern the large-scale processes that shape cosmic evolution. T...

Thea Klaeboe Aarrestad, Alaa Abdelhamid, Haider Abidi, Jahred Adelman, Jennifer Adelman-McCarthy,...

2602.17582 2026-02-19
AI LLM

Momentum Measurement of Charged Particles in FASER's Emulsion Detector at the LHC

We present a momentum measurement method based on multiple Coulomb scattering (MCS) in the FASER$\nu$ emulsion detector. The measurement of charged-particle momenta is essential for studying neutrino...

FASER Collaboration, Roshan Mammen Abraham, Xiaocong Ai, Saul Alonso Monsalve, John Anders, Emma...

2602.17575 2026-02-19