Papers

Research papers from arXiv and related sources

Total: 4513; AI/LLM: 2483; Testing: 2030
AI LLM

CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS

Current Text-to-Speech (TTS) systems typically use separate models for speech-prompted and text-prompted timbre control. While unifying both control signals into a single model is desirable, the ch...

Zihao Zheng, Wen Wu, Chao Zhang, Mengyue Wu, Xuenan Xu

2603.16280 2026-03-17
AI LLM

Adaptive Theory of Mind for LLM-based Multi-Agent Coordination

Theory of Mind (ToM) refers to the ability to reason about others' mental states, and higher-order ToM involves considering that others also possess their own ToM. Equipping large language model (L...

Chunjiang Mu, Ya Zeng, Qiaosheng Zhang, Kun Shao, Chen Chu, Hao Guo, Danyang Jia, Zhen Wang, Shuy...

2603.16264 2026-03-17
AI LLM

Human/AI Collective Intelligence for Deliberative Democracy: A Human-Centred Design Approach

This chapter introduces the concept of Collective Intelligence for Deliberative Democracy (CI4DD). We propose that the use of computational tools, specifically artificial intelligence to advance de...

Anna De Liddo, Lucas Anastasiou, Simon Buckingham Shum

2603.16260 2026-03-17
AI LLM

When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition

Recently, Multimodal Large Language Models (MLLMs) have demonstrated significant potential in complex visual tasks through the integration of Chain-of-Thought (CoT) reasoning. However, in Video Que...

Xiaokun Sun, Yubo Wang, Haoyu Cao, Linli Xu

2603.16256 2026-03-17
AI LLM

Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models

Vision-language process reward models (VL-PRMs) are increasingly used to score intermediate reasoning steps and rerank candidates under test-time scaling. However, they often function as black-box ...

Junxin Wang, Dai Guan, Weijie Qiu, Zhihang Li, Yongbo Gai, Zhengyi Yang, Mengyu Zhou, Erchao Zhao...

2603.16253 2026-03-17
AI LLM

Visual Prompt Discovery via Semantic Exploration

LVLMs encounter significant challenges in image understanding and visual reasoning, leading to critical perception failures. Visual prompts, which incorporate image manipulation code, have shown pr...

Jaechang Kim, Yotaro Shimose, Zhao Wang, Kuang-Da Wang, Jungseul Ok, Shingo Takamatsu

2603.16250 2026-03-17
AI LLM

How to Utilize Complementary Vision-Text Information for 2D Structure Understanding

LLMs typically linearize 2D tables into 1D sequences to fit their autoregressive architecture, which weakens row-column adjacency and other layout cues. In contrast, purely visual encoders can capt...

Jiancheng Dong, Pengyue Jia, Derong Xu, Jiawei Cheng, Jingyu Peng, Chao Zhang, Bowen Liu, Xin Sun...

2603.16245 2026-03-17
AI LLM

More Rounds, More Noise: Why Multi-Turn Review Fails to Improve Cross-Context Verification

Cross-Context Review (CCR) improves LLM verification by separating production and review into independent sessions. A natural extension is multi-turn review: letting the reviewer ask follow-up ques...

Song Tae-Eun

2603.16244 2026-03-17
AI LLM

Mixture-of-Depths Attention

Scaling depth is a key driver for large language models (LLMs). Yet, as LLMs become deeper, they often suffer from signal degradation: informative features formed in shallow layers are gradually di...

Lianghui Zhu, Yuxin Fang, Bencheng Liao, Shijie Wang, Tianheng Cheng, Zilong Huang, Chen Chen, La...

2603.15619 2026-03-16
AI LLM

Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for robotic manipulation, in which reliable action prediction critically depends on accurately interpreting and int...

Yulin Luo, Hao Chen, Zhuangzhe Wu, Bowen Sui, Jiaming Liu, Chenyang Gu, Zhuoyang Liu, Qiuxuan Fen...

2603.15618 2026-03-16
AI LLM

HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

Can AI make progress on important, unsolved mathematical problems? Large language models are now capable of sophisticated mathematical and scientific reasoning, but whether they can perform novel r...

Erik Y. Wang, Sumeet Motwani, James V. Roggeveen, Eliot Hodges, Dulhan Jayalath, Charles London, ...

2603.15617 2026-03-16
AI LLM

Mechanistic Origin of Moral Indifference in Language Models

Existing behavioral alignment techniques for Large Language Models (LLMs) often neglect the discrepancy between surface compliance and internal unaligned representations, leaving LLMs vulnerable to...

Lingyu Li, Yan Teng, Yingchun Wang

2603.15615 2026-03-16
AI LLM

Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion

Recent video diffusion models have made remarkable strides in visual quality, yet precise, fine-grained control remains a key bottleneck that limits practical customizability for content creation. ...

Zhenghong Zhou, Xiaohang Zhan, Zhiqin Chen, Soo Ye Kim, Nanxuan Zhao, Haitian Zheng, Qing Liu, He...

2603.15614 2026-03-16
AI LLM

HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

We present HSImul3R, a unified framework for simulation-ready 3D reconstruction of human-scene interactions (HSI) from casual captures, including sparse-view images and monocular videos. Existing m...

Yukang Cao, Haozhe Xie, Fangzhou Hong, Long Zhuo, Zhaoxi Chen, Liang Pan, Ziwei Liu

2603.15612 2026-03-16
AI LLM

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewa...

Aozhe Wang, Yuchen Yan, Nan Zhou, Zhengxi Lu, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

2603.15611 2026-03-16
AI LLM

SmartSearch: How Ranking Beats Structure for Conversational Memory Retrieval

Recent conversational memory systems invest heavily in LLM-based structuring at ingestion time and learned retrieval policies at query time. We show that neither is necessary. SmartSearch retrieves...

Jesper Derehag, Carlos Calva, Timmy Ghiurau

2603.15599 2026-03-16
AI LLM

AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer

Existing video-to-audio (V2A) generation methods predominantly rely on text prompts alongside visual information to synthesize audio. However, two critical bottlenecks persist: semantic granularity...

Pengjun Fang, Yingqing He, Yazhou Xing, Qifeng Chen, Ser-Nam Lim, Harry Yang

2603.15597 2026-03-16
AI LLM

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet the development of high-performance search agents remains dominated by industria...

Yuwen Du, Rui Ye, Shuo Tang, Xinyu Zhu, Yijun Lu, Yuzhu Cai, Siheng Chen

2603.15594 2026-03-16
AI LLM

Effective Distillation to Hybrid xLSTM Architectures

There have been numerous attempts to distill quadratic attention-based large language models (LLMs) into sub-quadratic linearized architectures. However, despite extensive research, such distilled ...

Lukas Hauzenberger, Niklas Schmidinger, Thomas Schmied, Anamaria-Roberta Hartl, David Stap, Piete...

2603.15590 2026-03-16
AI LLM

LEXI: Lossless Exponent Coding for Efficient Inter-Chiplet Communication in Hybrid LLMs

Data movement overheads increase the inference latency of state-of-the-art large language models (LLMs). These models commonly use the bfloat16 (BF16) format for stable training. Floating-point sta...

Miao Sun, Alish Kanani, Kaushik Shroff, Umit Ogras

2603.15589 2026-03-16