Papers
Research papers from arXiv and related sources
Evolutionary Multimodal Reasoning via Hierarchical Semantic Representation for Intent Recognition
Multimodal intent recognition aims to infer human intents by jointly modeling various modalities, playing a pivotal role in real-world dialogue systems. However, current methods struggle to model h...
Qianrui Zhou, Hua Xu, Yunjin Gu, Yifan Wang, Songze Li, Hanlei Zhang
In-Context Environments Induce Evaluation-Awareness in Language Models
Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent evaluation awareness...
Maheep Chaudhary
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing, as evidenced by benchmarks like SWE-bench. Howeve...
Jialong Chen, Xander Xu, Hu Wei, Chuan Chen, Bing Zhao
Structure-aware Prompt Adaptation from Seen to Unseen for Open-Vocabulary Compositional Zero-Shot Learning
The goal of Open-Vocabulary Compositional Zero-Shot Learning (OV-CZSL) is to recognize attribute-object compositions in the open-vocabulary setting, where compositions of both seen and unseen attri...
Yihang Duan, Jiong Wang, Pengpeng Zeng, Ji Zhang, Lei Zhao, Chong Wang, Jingkuan Song, Lianli Gao
Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement
Audio-Visual Speech Recognition (AVSR) integrates acoustic and visual information to enhance robustness in adverse acoustic conditions. Recent advances in Large Language Models (LLMs) have yielded ...
Fei Su, Cancan Li, Juan Liu, Wei Ju, Hongbin Suo, Ming Li
A Rubric-Supervised Critic from Sparse Real-World Outcomes
Academic benchmarks for coding agents tend to reward autonomous task completion, measured by verifiable rewards such as unit-test success. In contrast, real-world coding agents operate with humans ...
Xingyao Wang, Valerie Chen, Heng Ji, Graham Neubig
Enhancing Variational Quantum Eigensolvers for SU(2) Lattice Gauge Theory via Systematic State Preparation
Computing the vacuum and energy spectrum in non-Abelian, interacting lattice gauge theories remains an open challenge, in part because approximating the continuum limit requires large lattices and ...
Klaus Liegener, Dominik Mattern, Alexander Korobov, Lisa Krüger, Manuel Geiger, Malay Singh, Long...
The Stellar Mass Function for Nine Massive Galaxy Clusters in the Local Universe
We measure galaxy stellar mass functions (SMFs) for nine of the most massive galaxy clusters in the local universe ($0.07 < z < 0.11$) using deep and complete spectroscopy from the MAssive Cluster ...
Jong-In Park, Jubee Sohn, Margaret J. Geller, Ken J. Rines, Antonaldo Diaferio
When and Where to Reset Matters for Long-Term Test-Time Adaptation
When continual test-time adaptation (TTA) persists over the long term, errors accumulate in the model and further cause it to predict only a few classes for all inputs, a phenomenon known as model ...
Taejun Lim, Joong-Won Hwang, Kibok Lee
Loading of Relativistic Maxwellian-type Distributions Revisited
A simple numerical method for loading of a relativistic Maxwellian-type distribution is proposed based on inverse transform sampling. The relativistic Maxwellian energy distribution is introduced a...
Takayuki Umeda
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
Think about how humans handle complex reading tasks: marking key points, inferring their relationships, and structuring information to guide understanding and responses. Likewise, can a large langu...
Qinsi Wang, Hancheng Ye, Jinhee Kim, Jinghan Ke, Yifei Wang, Martin Kuo, Zishan Shao, Dongting Li...
Specification-Driven Generation and Evaluation of Discrete-Event World Models via the DEVS Formalism
World models are essential for planning and evaluation in agentic systems, yet existing approaches lie at two extremes: hand-engineered simulators that offer consistency and reproducibility but are...
Zheyu Chen, Zhuohuan Li, Chuanhao Li
LifeBench: A Benchmark for Long-Horizon Multi-Source Memory
Long-term memory is fundamental for personalized agents capable of accumulating knowledge, reasoning over user experiences, and adapting across time. However, existing memory benchmarks primarily t...
Zihao Cheng, Weixin Wang, Yu Zhao, Ziyang Ren, Jiaxuan Chen, Ruiyang Xu, Shuai Huang, Yang Chen, ...
MACC: Multi-Agent Collaborative Competition for Scientific Exploration
Scientific discovery still relies heavily on the manual efforts of individual researchers, leading to limited exploration, redundant trials, and reduced reproducibility. Human-participant data anal...
Satoshi Oyama, Yuko Sakurai, Hisashi Kashima
Towards Effective Orchestration of AI x DB Workloads
AI-driven analytics are increasingly crucial to data-centric decision-making. The practice of exporting data to machine learning runtimes incurs high overhead, limits robustness to data drift, and ...
Naili Xing, Haotian Gao, Zhanhao Zhao, Shaofeng Cai, Zhaojing Luo, Yuncheng Wu, Zhongle Xie, Meih...
LiDAR Prompted Spatio-Temporal Multi-View Stereo for Autonomous Driving
Accurate metric depth is critical for autonomous driving perception and simulation, yet current approaches struggle to achieve high metric accuracy, multi-view and temporal consistency, and cross-d...
Qihao Sun, Jiarun Liu, Ziqian Ni, Jianyun Xu, Tao Xie, Lijun Zhao, Ruifeng Li, Sheng Yang
Seeing as Experts Do: A Knowledge-Augmented Agent for Open-Set Fine-Grained Visual Understanding
Fine-grained visual understanding is shifting from static classification to knowledge-augmented reasoning, where models must justify as well as recognise. Existing approaches remain limited by clos...
Junhan Chen, Zilu Zhou, Yujun Tong, Dongliang Chang, Yitao Luo, Zhanyu Ma
AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation
LLM agents are rapidly becoming the practical interface for task automation, yet the ecosystem lacks a principled way to choose among an exploding space of deployable configurations. Existing LLM l...
Yunxiao Shi, Wujiang Xu, Tingwei Chen, Haoning Shang, Ling Yang, Yunfeng Wan, Zhuo Cao, Xing Zi, ...
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning ...
Zonglin Yang, Lidong Bing
Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing
The ongoing shift of AI models from centralized cloud APIs to local AI agents on edge devices is enabling Client-Side Autonomous Agents (CSAAs) -- persistent personal agents that can plan,...
Taotao Wang, Lizhao You, Jingwen Tong, Chonghe Zhao, Shengli Zhang