Papers
Research papers from arXiv and related sources
Precise Measurement of Matter-Antimatter Asymmetry with Entangled Hyperon Antihyperon Pairs
A search for $CP$ violation with an entangled system of $\Xi^-$-$\bar{\Xi}^+$ pairs is performed, using $(10,087\pm44)\times10^{6}$ $J/\psi$ events collected with the BESIII experiment. A nine-dimensional h...
BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso,...
Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination
Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts. Imitation learning has emerged as one of...
Rakshit Trivedi, Kartik Sharma, David C Parkes
FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill
In long-context large language model (LLM) inference, the prefill stage dominates computation due to self-attention over the complete input context. Sparse attention significantly reduces self-atte...
Rakshith Jayanth, Viktor Prasanna
From Performance to Purpose: A Sociotechnical Taxonomy for Evaluating Large Language Model Utility
As large language models (LLMs) continue to improve at completing discrete tasks, they are being integrated into increasingly complex and diverse real-world systems. However, task-level success alo...
Gavin Levinson, Keith Feldman
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
Unified multimodal models can both understand and generate visual content within a single architecture. Existing models, however, remain data-hungry and too heavy for deployment on edge devices. We...
Abdelrahman Shaker, Ahmed Heakl, Jaseel Muhammad, Ritesh Thawkar, Omkar Thawakar, Senmao Li, Hish...
Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code,...
David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko
Agentic AI for Scalable and Robust Optical Systems Control
We present AgentOptics, an agentic AI framework for high-fidelity, autonomous optical system control built on the Model Context Protocol (MCP). AgentOptics interprets natural language tasks and exe...
Zehao Wang, Mingzhe Han, Wei Cheng, Yue-Kai Huang, Philip Ji, Denton Wu, Mahdi Safari, Flemming H...
Do Large Language Models Understand Data Visualization Rules?
Data visualization rules, derived from decades of research in design and perception, ensure trustworthy chart communication. While prior work has shown that large language models (LLMs) can generate ...
Martin Sinnona, Valentin Bonas, Emmanuel Iarussi, Viviana Siless
KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration
Large language models (LLMs) have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these systems remains bottlenecked by the t...
Mohammad Amanlou, Erfan Shafiee Moghaddam, Yasaman Amou Jafari, Mahdi Noori, Farhan Farsi, Behnam...
AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization
The paradigm of automated program generation is shifting from one-shot generation to inference-time search, where Large Language Models (LLMs) function as semantic mutation operators within evoluti...
Mert Cemri, Shubham Agrawal, Akshat Gupta, Shu Liu, Audrey Cheng, Qiuyang Mang, Ashwin Naren, Lut...
LAD: Learning Advantage Distribution for Reasoning
Current reinforcement learning objectives for large-model reasoning primarily focus on maximizing expected rewards. This paradigm can lead to overfitting to dominant reward signals, while neglectin...
Wendi Li, Sharon Li
To Reason or Not to: Selective Chain-of-Thought in Medical Question Answering
Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy. Methods: We propose Sele...
Zaifu Zhan, Min Zeng, Shuang Zhou, Yiran Song, Xiaoyi Chen, Yu Hou, Yifan Wu, Yang Ruan, Rui Zhang
NanoKnow: How to Know What Your Language Model Knows
How do large language models (LLMs) know what they know? Answering this question has been difficult because pre-training data is often a "black box" -- unknown or inaccessible. The recent release o...
Lingwei Gu, Nour Jedidi, Jimmy Lin
Benchmarking Unlearning for Vision Transformers
Research in machine unlearning (MU) has gained strong momentum: MU is now widely regarded as a critical capability for building safe and fair AI. In parallel, research into transformer architecture...
Kairan Zhao, Iurie Luca, Peter Triantafillou
Align When They Want, Complement When They Need! Human-Centered Ensembles for Adaptive Human-AI Collaboration
In human-AI decision making, designing AI that complements human expertise has been a natural strategy to enhance human-AI collaboration, yet it often comes at the cost of decreased AI performance ...
Hasan Amin, Ming Yin, Rajiv Khanna
BarrierSteer: LLM Safety via Learning Barrier Steering
Despite the state-of-the-art performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a major obstacle to ...
Thanh Q. Tran, Arun Verma, Kiwan Wong, Bryan Kian Hsiang Low, Daniela Rus, Wei Xiao
Transcending the Annotation Bottleneck: AI-Powered Discovery in Biology and Medicine
The dependence on expert annotation has long constituted the primary rate-limiting step in the application of artificial intelligence to biomedicine. While supervised learning drove the initial wav...
Soumick Chatterjee
CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching
As large language models (LLMs) witness increasing deployment in complex, high-stakes decision-making scenarios, it becomes imperative to ground their reasoning in causality rather than spurious co...
Yuzhe Wang, Yaochen Zhu, Jundong Li
How Retrieved Context Shapes Internal Representations in RAG
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by conditioning generation on retrieved external documents, but the effect of retrieved context is often non-trivial. In r...
Samuel Yeh, Sharon Li
Do Large Language Models Understand Data Visualization Principles?
Data visualization principles, derived from decades of research in design and perception, ensure proper visual communication. While prior work has shown that large language models (LLMs) can genera...
Martin Sinnona, Valentin Bonas, Viviana Siless, Emmanuel Iarussi