Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

Precise Measurement of Matter-Antimatter Asymmetry with Entangled Hyperon Antihyperon Pairs

A search for $CP$ violation with an entangled system of $\Xi^-$-$\bar{\Xi}^+$ pairs is performed, using $(10,087\pm44)\times10^{6}$ $J/\psi$ events collected with the BESIII experiment. A nine-dimensional h...

BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso,...

2602.20524 2026-02-24
AI LLM

Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination

Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts. Imitation learning has emerged as one of...

Rakshit Trivedi, Kartik Sharma, David C Parkes

2602.20517 2026-02-24
AI LLM

FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill

In long-context large language model (LLM) inference, the prefill stage dominates computation due to self-attention over the complete input context. Sparse attention significantly reduces self-atte...

Rakshith Jayanth, Viktor Prasanna

2602.20515 2026-02-24
AI LLM

From Performance to Purpose: A Sociotechnical Taxonomy for Evaluating Large Language Model Utility

As large language models (LLMs) continue to improve at completing discrete tasks, they are being integrated into increasingly complex and diverse real-world systems. However, task-level success alo...

Gavin Levinson, Keith Feldman

2602.20513 2026-02-24
AI LLM

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Unified multimodal models can both understand and generate visual content within a single architecture. Existing models, however, remain data-hungry and too heavy for deployment on edge devices. We...

Abdelrahman Shaker, Ahmed Heakl, Jaseel Muhammad, Ritesh Thawkar, Omkar Thawakar, Senmao Li, Hish...

2602.20161 2026-02-23
AI LLM

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code,...

David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko

2602.20156 2026-02-23
AI LLM

Agentic AI for Scalable and Robust Optical Systems Control

We present AgentOptics, an agentic AI framework for high-fidelity, autonomous optical system control built on the Model Context Protocol (MCP). AgentOptics interprets natural language tasks and exe...

Zehao Wang, Mingzhe Han, Wei Cheng, Yue-Kai Huang, Philip Ji, Denton Wu, Mahdi Safari, Flemming H...

2602.20144 2026-02-23
AI LLM

Do Large Language Models Understand Data Visualization Rules?

Data visualization rules, derived from decades of research in design and perception, ensure trustworthy chart communication. While prior work has shown that large language models (LLMs) can generate ...

Martin Sinnona, Valentin Bonas, Emmanuel Iarussi, Viviana Siless

2602.20137 2026-02-23
AI LLM

KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration

Large language models (LLMs) have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these systems remains bottlenecked by the t...

Mohammad Amanlou, Erfan Shafiee Moghaddam, Yasaman Amou Jafari, Mahdi Noori, Farhan Farsi, Behnam...

2602.20135 2026-02-23
AI LLM

AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization

The paradigm of automated program generation is shifting from one-shot generation to inference-time search, where Large Language Models (LLMs) function as semantic mutation operators within evoluti...

Mert Cemri, Shubham Agrawal, Akshat Gupta, Shu Liu, Audrey Cheng, Qiuyang Mang, Ashwin Naren, Lut...

2602.20133 2026-02-23
AI LLM

LAD: Learning Advantage Distribution for Reasoning

Current reinforcement learning objectives for large-model reasoning primarily focus on maximizing expected rewards. This paradigm can lead to overfitting to dominant reward signals, while neglectin...

Wendi Li, Sharon Li

2602.20132 2026-02-23
AI LLM

To Reason or Not to: Selective Chain-of-Thought in Medical Question Answering

Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy. Methods: We propose Sele...

Zaifu Zhan, Min Zeng, Shuang Zhou, Yiran Song, Xiaoyi Chen, Yu Hou, Yifan Wu, Yang Ruan, Rui Zhang

2602.20130 2026-02-23
AI LLM

NanoKnow: How to Know What Your Language Model Knows

How do large language models (LLMs) know what they know? Answering this question has been difficult because pre-training data is often a "black box" -- unknown or inaccessible. The recent release o...

Lingwei Gu, Nour Jedidi, Jimmy Lin

2602.20122 2026-02-23
AI LLM

Benchmarking Unlearning for Vision Transformers

Research in machine unlearning (MU) has gained strong momentum: MU is now widely regarded as a critical capability for building safe and fair AI. In parallel, research into transformer architecture...

Kairan Zhao, Iurie Luca, Peter Triantafillou

2602.20114 2026-02-23
AI LLM

Align When They Want, Complement When They Need! Human-Centered Ensembles for Adaptive Human-AI Collaboration

In human-AI decision making, designing AI that complements human expertise has been a natural strategy to enhance human-AI collaboration, yet it often comes at the cost of decreased AI performance ...

Hasan Amin, Ming Yin, Rajiv Khanna

2602.20104 2026-02-23
AI LLM

BarrierSteer: LLM Safety via Learning Barrier Steering

Despite the state-of-the-art performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a major obstacle to ...

Thanh Q. Tran, Arun Verma, Kiwan Wong, Bryan Kian Hsiang Low, Daniela Rus, Wei Xiao

2602.20102 2026-02-23
AI LLM

Transcending the Annotation Bottleneck: AI-Powered Discovery in Biology and Medicine

The dependence on expert annotation has long constituted the primary rate-limiting step in the application of artificial intelligence to biomedicine. While supervised learning drove the initial wav...

Soumick Chatterjee

2602.20100 2026-02-23
AI LLM

CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching

As large language models (LLMs) witness increasing deployment in complex, high-stakes decision-making scenarios, it becomes imperative to ground their reasoning in causality rather than spurious co...

Yuzhe Wang, Yaochen Zhu, Jundong Li

2602.20094 2026-02-23
AI LLM

How Retrieved Context Shapes Internal Representations in RAG

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by conditioning generation on retrieved external documents, but the effect of retrieved context is often non-trivial. In r...

Samuel Yeh, Sharon Li

2602.20091 2026-02-23
AI LLM

Do Large Language Models Understand Data Visualization Principles?

Data visualization principles, derived from decades of research in design and perception, ensure proper visual communication. While prior work has shown that large language models (LLMs) can genera...

Martin Sinnona, Valentin Bonas, Viviana Siless, Emmanuel Iarussi

2602.20084 2026-02-23