Research

Papers

Research papers from arXiv and related sources

Total: 4694; AI/LLM: 2583; Testing: 2111
AI LLM

Evolutionary Multimodal Reasoning via Hierarchical Semantic Representation for Intent Recognition

Multimodal intent recognition aims to infer human intents by jointly modeling various modalities, playing a pivotal role in real-world dialogue systems. However, current methods struggle to model h...

Qianrui Zhou, Hua Xu, Yunjin Gu, Yifan Wang, Songze Li, Hanlei Zhang

2603.03827 2026-03-04
AI LLM

In-Context Environments Induce Evaluation-Awareness in Language Models

Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent evaluation awareness...

Maheep Chaudhary

2603.03824 2026-03-04
AI LLM

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing, as evidenced by benchmarks like SWE-bench. Howeve...

Jialong Chen, Xander Xu, Hu Wei, Chuan Chen, Bing Zhao

2603.03823 2026-03-04
AI LLM

Structure-aware Prompt Adaptation from Seen to Unseen for Open-Vocabulary Compositional Zero-Shot Learning

The goal of Open-Vocabulary Compositional Zero-Shot Learning (OV-CZSL) is to recognize attribute-object compositions in the open-vocabulary setting, where compositions of both seen and unseen attri...

Yihang Duan, Jiong Wang, Pengpeng Zeng, Ji Zhang, Lei Zhao, Chong Wang, Jingkuan Song, Lianli Gao

2603.03815 2026-03-04
AI LLM

Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement

Audio-Visual Speech Recognition (AVSR) integrates acoustic and visual information to enhance robustness in adverse acoustic conditions. Recent advances in Large Language Models (LLMs) have yielded ...

Fei Su, Cancan Li, Juan Liu, Wei Ju, Hongbin Suo, Ming Li

2603.03811 2026-03-04
TESTING

A Rubric-Supervised Critic from Sparse Real-World Outcomes

Academic benchmarks for coding agents tend to reward autonomous task completion, measured by verifiable rewards such as unit-test success. In contrast, real-world coding agents operate with humans ...

Xingyao Wang, Valerie Chen, Heng Ji, Graham Neubig

2603.03800 2026-03-04
TESTING

Enhancing Variational Quantum Eigensolvers for SU(2) Lattice Gauge Theory via Systematic State Preparation

Computing the vacuum and energy spectrum in non-Abelian, interacting lattice gauge theories remains an open challenge, in part because approximating the continuum limit requires large lattices and ...

Klaus Liegener, Dominik Mattern, Alexander Korobov, Lisa Krüger, Manuel Geiger, Malay Singh, Long...

2603.03799 2026-03-04
TESTING

The Stellar Mass Function for Nine Massive Galaxy Clusters in the Local Universe

We measure galaxy stellar mass functions (SMFs) for nine of the most massive galaxy clusters in the local universe ($0.07 < z < 0.11$) using deep and complete spectroscopy from the MAssive Cluster ...

Jong-In Park, Jubee Sohn, Margaret J. Geller, Ken J. Rines, Antonaldo Diaferio

2603.03797 2026-03-04
TESTING

When and Where to Reset Matters for Long-Term Test-Time Adaptation

When continual test-time adaptation (TTA) persists over the long term, errors accumulate in the model and further cause it to predict only a few classes for all inputs, a phenomenon known as model ...

Taejun Lim, Joong-Won Hwang, Kibok Lee

2603.03796 2026-03-04
TESTING

Loading of Relativistic Maxwellian-type Distributions Revisited

A simple numerical method for loading of a relativistic Maxwellian-type distribution is proposed based on inverse transform sampling. The relativistic Maxwellian energy distribution is introduced a...

Takayuki Umeda

2603.03791 2026-03-04
AI LLM

T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

Think about how humans handle complex reading tasks: marking key points, inferring their relationships, and structuring information to guide understanding and responses. Likewise, can a large langu...

Qinsi Wang, Hancheng Ye, Jinhee Kim, Jinghan Ke, Yifei Wang, Martin Kuo, Zishan Shao, Dongting Li...

2603.03790 2026-03-04
AI LLM

Specification-Driven Generation and Evaluation of Discrete-Event World Models via the DEVS Formalism

World models are essential for planning and evaluation in agentic systems, yet existing approaches lie at two extremes: hand-engineered simulators that offer consistency and reproducibility but are...

Zheyu Chen, Zhuohuan Li, Chuanhao Li

2603.03784 2026-03-04
AI LLM

LifeBench: A Benchmark for Long-Horizon Multi-Source Memory

Long-term memory is fundamental for personalized agents capable of accumulating knowledge, reasoning over user experiences, and adapting across time. However, existing memory benchmarks primarily t...

Zihao Cheng, Weixin Wang, Yu Zhao, Ziyang Ren, Jiaxuan Chen, Ruiyang Xu, Shuai Huang, Yang Chen, ...

2603.03781 2026-03-04
AI LLM

MACC: Multi-Agent Collaborative Competition for Scientific Exploration

Scientific discovery still relies heavily on the manual efforts of individual researchers, leading to limited exploration, redundant trials, and reduced reproducibility. Human-participant data anal...

Satoshi Oyama, Yuko Sakurai, Hisashi Kashima

2603.03780 2026-03-04
AI LLM

Towards Effective Orchestration of AI x DB Workloads

AI-driven analytics are increasingly crucial to data-centric decision-making. The practice of exporting data to machine learning runtimes incurs high overhead, limits robustness to data drift, and ...

Naili Xing, Haotian Gao, Zhanhao Zhao, Shaofeng Cai, Zhaojing Luo, Yuncheng Wu, Zhongle Xie, Meih...

2603.03772 2026-03-04
AI LLM

LiDAR Prompted Spatio-Temporal Multi-View Stereo for Autonomous Driving

Accurate metric depth is critical for autonomous driving perception and simulation, yet current approaches struggle to achieve high metric accuracy, multi-view and temporal consistency, and cross-d...

Qihao Sun, Jiarun Liu, Ziqian Ni, Jianyun Xu, Tao Xie, Lijun Zhao, Ruifeng Li, Sheng Yang

2603.03765 2026-03-04
TESTING

Seeing as Experts Do: A Knowledge-Augmented Agent for Open-Set Fine-Grained Visual Understanding

Fine-grained visual understanding is shifting from static classification to knowledge-augmented reasoning, where models must justify as well as recognise. Existing approaches remain limited by clos...

Junhan Chen, Zilu Zhou, Yujun Tong, Dongliang Chang, Yitao Luo, Zhanyu Ma

2603.03762 2026-03-04
AI LLM

AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation

LLM agents are rapidly becoming the practical interface for task automation, yet the ecosystem lacks a principled way to choose among an exploding space of deployable configurations. Existing LLM l...

Yunxiao Shi, Wujiang Xu, Tingwei Chen, Haoning Shang, Ling Yang, Yunfeng Wan, Zhuo Cao, Xing Zi, ...

2603.03761 2026-03-04
AI LLM

MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning ...

Zonglin Yang, Lidong Bing

2603.03756 2026-03-04
AI LLM

Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing

The ongoing shift of AI models from centralized cloud APIs to local AI agents on edge devices is enabling Client-Side Autonomous Agents (CSAAs), persistent personal agents that can plan,...

Taotao Wang, Lizhao You, Jingwen Tong, Chonghe Zhao, Shengli Zhang

2603.03753 2026-03-04