Research

Papers

Research papers from arXiv and related sources

Total: 4694; AI/LLM: 2583; Testing: 2111
AI LLM

Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts

Large Language Models (LLMs) have emerged as a new paradigm for multi-agent systems. However, existing research on the behaviour of LLM-based multi-agents relies on ad hoc prompts and lacks a princ...

Hongbo Bo, Jingyu Hu, Weiru Liu

2603.09890 2026-03-10
AI LLM

Benchmarking Political Persuasion Risks Across Frontier Large Language Models

Concerns persist regarding the capacity of Large Language Models (LLMs) to sway political views. Although prior research has claimed that LLMs are not more persuasive than standard political campai...

Zhongren Chen, Joshua Kalla, Quan Le

2603.09884 2026-03-10
AI LLM

DISPLAY: Directable Human-Object Interaction Video Generation via Sparse Motion Guidance and Multi-Task Auxiliary

Human-centric video generation has advanced rapidly, yet existing methods struggle to produce controllable and physically consistent Human-Object Interaction (HOI) videos. Existing works rely on de...

Jiazhi Guan, Quanwei Yang, Luying Huang, Junhao Liang, Borong Liang, Haocheng Feng, Wei He, Kaisi...

2603.09883 2026-03-10
AI LLM

Do What I Say: A Spoken Prompt Dataset for Instruction-Following

Speech Large Language Models (SLLMs) have rapidly expanded, supporting a wide range of tasks. These models are typically evaluated using text prompts, which may not reflect real-world scenarios whe...

Maike Züfle, Sara Papi, Fabian Retkowski, Szymon Mazurek, Marek Kasztelnik, Alexander Waibel, Lui...

2603.09881 2026-03-10
AI LLM

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases

Advances in large language models (LLMs) have enabled significant capabilities in audio processing, resulting in state-of-the-art models now known as Large Audio Language Models (LALMs). However, m...

Laya Iyer, Angelina Wang, Sanmi Koyejo

2603.09853 2026-03-10
AI LLM

RecThinker: An Agentic Framework for Tool-Augmented Reasoning in Recommendation

Large Language Models (LLMs) have revolutionized recommendation agents by providing superior reasoning and flexible decision-making capabilities. However, existing methods mainly follow a passive i...

Haobo Zhang, Yutao Zhu, Kelong Mao, Tianhao Li, Zhicheng Dou

2603.09843 2026-03-10
AI LLM

Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents

Sequential multi-agent reasoning frameworks such as Chain-of-Agents (CoA) handle long-context queries by decomposing inputs into chunks and processing them sequentially using LLM-based worker agent...

Naman Gupta, Vaibhav Singh, Arun Iyer, Kirankumar Shiragur, Pratham Grover, Ramakrishna B. Bairi,...

2603.09835 2026-03-10
AI LLM

Prompt-Driven Color Accessibility Evaluation in Diffusion-based Image Generation Models

Generative models are increasingly integrated into creative workflows. While text-to-image generation excels in visual quality and diversity, color accessibility for users with Color Vision Deficie...

Xinyao Zhuang, Jose Echevarria, Kaan Akşit

2603.09832 2026-03-10
AI LLM

MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

As embodied models become powerful, humans will collaborate with multiple embodied AI agents at their workplace or home in the future. To ensure better communication between human users and the mul...

Kangsan Kim, Yanlai Yang, Suji Kim, Woongyeong Yeo, Youngwan Lee, Mengye Ren, Sung Ju Hwang

2603.09827 2026-03-10
AI LLM

One-Eval: An Agentic System for Automated and Traceable LLM Evaluation

Reliable evaluation is essential for developing and deploying large language models, yet in practice it often requires substantial manual effort: practitioners must identify appropriate benchmarks,...

Chengyu Shen, Yanheng Hou, Minghui Pan, Runming He, Zhen Hao Wong, Meiyi Qiang, Zhou Liu, Hao Lia...

2603.09821 2026-03-10
AI LLM

EmoSURA: Towards Accurate Evaluation of Detailed and Long-Context Emotional Speech Captions

Recent advancements in speech captioning models have enabled the generation of rich, fine-grained captions for emotional speech. However, the evaluation of such captions remains a critical bottlene...

Xin Jing, Andreas Triantafyllopoulos, Jiadong Wang, Shahin Amiriparian, Jun Luo, Björn Schuller

2603.09820 2026-03-10
AI LLM

RA-SSU: Towards Fine-Grained Audio-Visual Learning with Region-Aware Sound Source Understanding

Audio-Visual Learning (AVL) is one fundamental task of multi-modality learning and embodied intelligence, displaying the vital role in scene understanding and interaction. However, previous researc...

Muyi Sun, Yixuan Wang, Hong Wang, Chen Su, Man Zhang, Xingqun Qi, Qi Li, Zhenan Sun

2603.09809 2026-03-10
AI LLM

A Hybrid Model-Assisted Approach for Path Loss Prediction in Suburban Scenarios

Accurate path loss prediction is crucial for wireless network planning and optimization in suburban environments with complex terrain variation and diverse land cover. This paper proposes a model a...

Chenlong Wang, Bo Ai, Ruiming Chen, Ruisi He, Mi Yang, Yuxin Zhang, Weirong Liu, Liu Liu

2603.09808 2026-03-10
AI LLM

MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations

Large-scale scientific collaborations, such as the Compact Muon Solenoid (CMS) at CERN, produce a vast and ever-growing corpus of internal documentation. Navigating this complex information landsca...

Abhishikth Mallampalli, Sridhara Dasu

2603.09800 2026-03-10
AI LLM

Quantifying the Necessity of Chain of Thought through Opaque Serial Depth

Large language models (LLMs) tend to externalize their reasoning in their chain of thought, making the chain of thought a good target for monitoring. This is partially an inherent feature of the Tr...

Jonah Brown-Cohen, David Lindner, Rohin Shah

2603.09786 2026-03-10
AI LLM

TIMID: Time-Dependent Mistake Detection in Videos of Robot Executions

As robotic systems execute increasingly difficult task sequences, so does the number of ways in which they can fail. Video Anomaly Detection (VAD) frameworks typically focus on singular, low-level ...

Nerea Gallego, Fernando Salanova, Claudio Mannarano, Cristian Mahulea, Eduardo Montijano

2603.09782 2026-03-10
AI LLM

CLIOPATRA: Extracting Private Information from LLM Insights

As AI assistants become widely used, privacy-aware platforms like Anthropic's Clio have been introduced to generate insights from real-world AI use. Clio's privacy protections rely on layering mult...

Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, Peter Kairouz

2603.09781 2026-03-10
AI LLM

Ego: Embedding-Guided Personalization of Vision-Language Models

AI assistants that support humans in daily life are becoming increasingly feasible, driven by the rapid advancements in multimodal language models. A key challenge lies in overcoming the generic na...

Soroush Seifi, Simon Gardier, Vaggelis Dorovatas, Daniel Olmeda Reino, Rahaf Aljundi

2603.09771 2026-03-10
AI LLM

LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control

Recent advances in text-to-image generation have been remarkable, but generating multilingual design logos that harmoniously integrate visual and textual elements remains a challenging task. Existi...

Mingyu Kang, Hyein Seo, Yuna Jeong, Junhyeong Park, Yong Suk Choi

2603.09759 2026-03-10
AI LLM

Beyond Fine-Tuning: Robust Food Entity Linking under Ontology Drift with FoodOntoRAG

Standardizing food terms from product labels and menus into ontology concepts is a prerequisite for trustworthy dietary assessment and safety reporting. The dominant approach to Named Entity Linkin...

Jan Drole, Ana Gjorgjevikj, Barbara Koroušić Seljak, Tome Eftimov

2603.09758 2026-03-10