Papers
Research papers from arXiv and related sources
Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
The pursuit of human-like conversational agents has long been guided by the Turing test. For modern speech-to-speech (S2S) systems, a critical yet unanswered question is whether they can converse l...
Xiang Li, Jiabao Gao, Sipei Lin, Xuan Zhou, Chi Zhang, Bo Cheng, Jiale Han, Benyou Wang
A Novel Hierarchical Multi-Agent System for Payments Using LLMs
Large language model (LLM) agents, such as OpenAI's Operator and Claude's Computer Use, can automate workflows but are unable to handle payment tasks. Existing agentic solutions have gained significant...
Joon Kiat Chua, Donghao Huang, Zhaoxia Wang
Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis
Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across language tasks. We test this claim through a compr...
Donghao Huang, Zhaoxia Wang
CIRCLE: A Framework for Evaluating AI from a Real-World Lens
This paper proposes CIRCLE, a six-stage, lifecycle-based framework to bridge the reality gap between model-centric performance metrics and AI's materialized outcomes in deployment. While existing f...
Reva Schwartz, Carina Westling, Morgan Briggs, Marzieh Fadaee, Isar Nejadgholi, Matthew Holmes, F...
Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving
Large Language Model (LLM) adapters enable low-cost model specialization, but introduce complex caching and scheduling challenges in distributed serving systems where hundreds of adapters must be h...
Ferran Agullo, Joan Oliveras, Chen Wang, Alberto Gutierrez-Torre, Olivier Tardieu, Alaa Youssef, ...
RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models
Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that overlook the epistemic uncertainty in reward m...
Daniel Yang, Samuel Stante, Florian Redhardt, Lena Libon, Parnian Kassraie, Ido Hakimi, Barna Pás...
Designing AI Tutors for Interest-Based Learning: Insights from Human Instructors
Interest-based learning (IBL) is a paradigm of instruction in which educational content is contextualized using learners' interests to enhance content relevance. IBL has been shown to result in imp...
Abhishek Kulkarni, Sharon Lynn Chu
Breaking the Illusion of Artificial Consensus: Clone-Robust Weighting for Arbitrary Metric Spaces
Independent media are central to democratic decision-making, yet recent technological developments, such as social media, pseudonymous identities, and generative AI, have made them more vulnerable ...
Damien Berriaud, Roger Wattenhofer
Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection
Video anomaly detection (VAD) aims to identify abnormal events in videos. Traditional VAD methods generally suffer from the high costs of labeled data and full training, thus some recent works have...
Zhaolin Cai, Fan Li, Huiyu Duan, Lijun He, Guangtao Zhai
Interpretable Debiasing of Vision-Language Models for Social Fairness
The rapid advancement of Vision-Language models (VLMs) has raised growing concerns that their black-box reasoning processes could lead to unintended forms of social bias. Current debiasing approach...
Na Min An, Yoonna Jang, Yusuke Hirota, Ryo Hachiuma, Isabelle Augenstein, Hyunjung Shim
Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, an...
Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu
Ask don't tell: Reducing sycophancy in large language models
Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and ...
Magda Dubois, Cozmin Ududec, Christopher Summerfield, Lennart Luettgau
HotelQuEST: Balancing Quality and Efficiency in Agentic Search
Agentic search has emerged as a promising paradigm for adaptive retrieval systems powered by large language models (LLMs). However, existing benchmarks primarily focus on quality, overlooking effic...
Guy Hadad, Shadi Iskander, Oren Kalinsky, Sofia Tolmach, Ran Levy, Haggai Roitman
High-Modularity Graph Partitioning Through NLP Techniques and Maximal Clique Enumeration
Natural Language Processing (NLP) provides highly effective tools for interpreting and handling human language, offering a broad spectrum of applications. In this paper, we address a classic combin...
Marco D'Elia, Irene Finocchi, Maurizio Patrignani
The Astonishing Ability of Large Language Models to Parse Jabberwockified Language
We show that large language models (LLMs) have an astonishing ability to recover meaning from severely degraded English texts. Texts in which content words have been randomly substituted by nonsens...
Gary Lupyan, Senyi Yang
The Moment of Capture: How the First Seconds of a Speaker's Nonverbal and Verbal Performance Shapes Audience Judgments
Why do some speakers capture a room almost instantly while others fail to connect? The real-time architecture of audience engagement remains largely a black box. Here, we used motion-captured anima...
Ralf Schmälzle, Yuetong Du, Sue Lim, Gary Bente
Personal Data as a Human Right: A New Social Contract Based on Data Sovereignty, Human Dignity and Data Personalism
In an era of ubiquitous data collection, platform dominance, and AI-mediated governance, the social contract of digital life is increasingly shaped by a few private actors rather than democratic de...
J. M. Alvarez-Pallete, R. Calderón, M. T. Corzo, E. C. Garrido-Merchán, G. López, I. Navarro-Mend...
Novice Developers Produce Larger Review Overhead for Project Maintainers while Vibe Coding
AI coding agents allow software developers to generate code quickly, which raises a practical question for project managers and open source maintainers: can vibe coders with less development experi...
Syed Ammar Asdaque, Imran Haider, Muhammad Umar Malik, Maryam Abdul Ghafoor, Abdul Ali Bangash
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
Referring Expression Comprehension (REC) links language to region-level visual perception. Standard benchmarks (RefCOCO, RefCOCO+, RefCOCOg) have progressed rapidly with multimodal LLMs but remain ...
Qihua Dong, Kuo Yang, Lin Ju, Handong Zhao, Yitian Zhang, Yizhou Wang, Huimin Zeng, Jianglin Lu, ...
AoE: Always-on Egocentric Human Video Collection for Embodied AI
Embodied foundation models require large-scale, high-quality real-world interaction data for pre-training and scaling. However, existing data collection methods suffer from high infrastructure cost...
Bowen Yang, Zishuo Li, Yang Sun, Changtao Miao, Yifan Yang, Man Luo, Xiaotong Yan, Feng Jiang, Ji...