Papers
Research papers from arXiv and related sources
Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts
Large Language Models (LLMs) have emerged as a new paradigm for multi-agent systems. However, existing research on the behaviour of LLM-based multi-agents relies on ad hoc prompts and lacks a princ...
Hongbo Bo, Jingyu Hu, Weiru Liu
Benchmarking Political Persuasion Risks Across Frontier Large Language Models
Concerns persist regarding the capacity of Large Language Models (LLMs) to sway political views. Although prior research has claimed that LLMs are not more persuasive than standard political campai...
Zhongren Chen, Joshua Kalla, Quan Le
DISPLAY: Directable Human-Object Interaction Video Generation via Sparse Motion Guidance and Multi-Task Auxiliary
Human-centric video generation has advanced rapidly, yet existing methods struggle to produce controllable and physically consistent Human-Object Interaction (HOI) videos. Existing works rely on de...
Jiazhi Guan, Quanwei Yang, Luying Huang, Junhao Liang, Borong Liang, Haocheng Feng, Wei He, Kaisi...
Do What I Say: A Spoken Prompt Dataset for Instruction-Following
Speech Large Language Models (SLLMs) have rapidly expanded, supporting a wide range of tasks. These models are typically evaluated using text prompts, which may not reflect real-world scenarios whe...
Maike Züfle, Sara Papi, Fabian Retkowski, Szymon Mazurek, Marek Kasztelnik, Alexander Waibel, Lui...
SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases
Advances in large language models (LLMs) have enabled significant capabilities in audio processing, resulting in state-of-the-art models now known as Large Audio Language Models (LALMs). However, m...
Laya Iyer, Angelina Wang, Sanmi Koyejo
RecThinker: An Agentic Framework for Tool-Augmented Reasoning in Recommendation
Large Language Models (LLMs) have revolutionized recommendation agents by providing superior reasoning and flexible decision-making capabilities. However, existing methods mainly follow a passive i...
Haobo Zhang, Yutao Zhu, Kelong Mao, Tianhao Li, Zhicheng Dou
Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents
Sequential multi-agent reasoning frameworks such as Chain-of-Agents (CoA) handle long-context queries by decomposing inputs into chunks and processing them sequentially using LLM-based worker agent...
Naman Gupta, Vaibhav Singh, Arun Iyer, Kirankumar Shiragur, Pratham Grover, Ramakrishna B. Bairi,...
Prompt-Driven Color Accessibility Evaluation in Diffusion-based Image Generation Models
Generative models are increasingly integrated into creative workflows. While text-to-image generation excels in visual quality and diversity, color accessibility for users with Color Vision Deficie...
Xinyao Zhuang, Jose Echevarria, Kaan Akşit
MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents
As embodied models become powerful, humans will collaborate with multiple embodied AI agents at their workplace or home in the future. To ensure better communication between human users and the mul...
Kangsan Kim, Yanlai Yang, Suji Kim, Woongyeong Yeo, Youngwan Lee, Mengye Ren, Sung Ju Hwang
One-Eval: An Agentic System for Automated and Traceable LLM Evaluation
Reliable evaluation is essential for developing and deploying large language models, yet in practice it often requires substantial manual effort: practitioners must identify appropriate benchmarks,...
Chengyu Shen, Yanheng Hou, Minghui Pan, Runming He, Zhen Hao Wong, Meiyi Qiang, Zhou Liu, Hao Lia...
EmoSURA: Towards Accurate Evaluation of Detailed and Long-Context Emotional Speech Captions
Recent advancements in speech captioning models have enabled the generation of rich, fine-grained captions for emotional speech. However, the evaluation of such captions remains a critical bottlene...
Xin Jing, Andreas Triantafyllopoulos, Jiadong Wang, Shahin Amiriparian, Jun Luo, Björn Schuller
RA-SSU: Towards Fine-Grained Audio-Visual Learning with Region-Aware Sound Source Understanding
Audio-Visual Learning (AVL) is one fundamental task of multi-modality learning and embodied intelligence, displaying the vital role in scene understanding and interaction. However, previous researc...
Muyi Sun, Yixuan Wang, Hong Wang, Chen Su, Man Zhang, Xingqun Qi, Qi Li, Zhenan Sun
A Hybrid Model-Assisted Approach for Path Loss Prediction in Suburban Scenarios
Accurate path loss prediction is crucial for wireless network planning and optimization in suburban environments with complex terrain variation and diverse land cover. This paper proposes a model a...
Chenlong Wang, Bo Ai, Ruiming Chen, Ruisi He, Mi Yang, Yuxin Zhang, Weirong Liu, Liu Liu
MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations
Large-scale scientific collaborations, such as the Compact Muon Solenoid (CMS) at CERN, produce a vast and ever-growing corpus of internal documentation. Navigating this complex information landsca...
Abhishikth Mallampalli, Sridhara Dasu
Quantifying the Necessity of Chain of Thought through Opaque Serial Depth
Large language models (LLMs) tend to externalize their reasoning in their chain of thought, making the chain of thought a good target for monitoring. This is partially an inherent feature of the Tr...
Jonah Brown-Cohen, David Lindner, Rohin Shah
TIMID: Time-Dependent Mistake Detection in Videos of Robot Executions
As robotic systems execute increasingly difficult task sequences, so does the number of ways in which they can fail. Video Anomaly Detection (VAD) frameworks typically focus on singular, low-level ...
Nerea Gallego, Fernando Salanova, Claudio Mannarano, Cristian Mahulea, Eduardo Montijano
CLIOPATRA: Extracting Private Information from LLM Insights
As AI assistants become widely used, privacy-aware platforms like Anthropic's Clio have been introduced to generate insights from real-world AI use. Clio's privacy protections rely on layering mult...
Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, Peter Kairouz
Ego: Embedding-Guided Personalization of Vision-Language Models
AI assistants that support humans in daily life are becoming increasingly feasible, driven by the rapid advancements in multimodal language models. A key challenge lies in overcoming the generic na...
Soroush Seifi, Simon Gardier, Vaggelis Dorovatas, Daniel Olmeda Reino, Rahaf Aljundi
LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control
Recent advances in text-to-image generation have been remarkable, but generating multilingual design logos that harmoniously integrate visual and textual elements remains a challenging task. Existi...
Mingyu Kang, Hyein Seo, Yuna Jeong, Junhyeong Park, Yong Suk Choi
Beyond Fine-Tuning: Robust Food Entity Linking under Ontology Drift with FoodOntoRAG
Standardizing food terms from product labels and menus into ontology concepts is a prerequisite for trustworthy dietary assessment and safety reporting. The dominant approach to Named Entity Linkin...
Jan Drole, Ana Gjorgjevikj, Barbara Korouši'c Seljak, Tome Eftimov