Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction

The pursuit of human-like conversational agents has long been guided by the Turing test. For modern speech-to-speech (S2S) systems, a critical yet unanswered question is whether they can converse l...

Xiang Li, Jiabao Gao, Sipei Lin, Xuan Zhou, Chi Zhang, Bo Cheng, Jiale Han, Benyou Wang

2602.24080 2026-02-27
AI LLM

A Novel Hierarchical Multi-Agent System for Payments Using LLMs

Large language model (LLM) agents, such as OpenAI's Operator and Claude's Computer Use, can automate workflows but are unable to handle payment tasks. Existing agentic solutions have gained significant...

Joon Kiat Chua, Donghao Huang, Zhaoxia Wang

2602.24068 2026-02-27
AI LLM

Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis

Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across language tasks. We test this claim through a compr...

Donghao Huang, Zhaoxia Wang

2602.24060 2026-02-27
AI LLM

CIRCLE: A Framework for Evaluating AI from a Real-World Lens

This paper proposes CIRCLE, a six-stage, lifecycle-based framework to bridge the reality gap between model-centric performance metrics and AI's materialized outcomes in deployment. While existing f...

Reva Schwartz, Carina Westling, Morgan Briggs, Marzieh Fadaee, Isar Nejadgholi, Matthew Holmes, F...

2602.24055 2026-02-27
AI LLM

Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving

Large Language Model (LLM) adapters enable low-cost model specialization, but introduce complex caching and scheduling challenges in distributed serving systems where hundreds of adapters must be h...

Ferran Agullo, Joan Oliveras, Chen Wang, Alberto Gutierrez-Torre, Olivier Tardieu, Alaa Youssef, ...

2602.24044 2026-02-27
AI LLM

RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models

Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that overlook the epistemic uncertainty in reward m...

Daniel Yang, Samuel Stante, Florian Redhardt, Lena Libon, Parnian Kassraie, Ido Hakimi, Barna Pás...

2602.24040 2026-02-27
AI LLM

Designing AI Tutors for Interest-Based Learning: Insights from Human Instructors

Interest-based learning (IBL) is a paradigm of instruction in which educational content is contextualized using learners' interests to enhance content relevance. IBL has been shown to result in imp...

Abhishek Kulkarni, Sharon Lynn Chu

2602.24036 2026-02-27
AI LLM

Breaking the Illusion of Artificial Consensus: Clone-Robust Weighting for Arbitrary Metric Spaces

Independent media are central to democratic decision-making, yet recent technological developments, such as social media, pseudonymous identities, and generative AI, have made them more vulnerable ...

Damien Berriaud, Roger Wattenhofer

2602.24024 2026-02-27
AI LLM

Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection

Video anomaly detection (VAD) aims to identify abnormal events in videos. Traditional VAD methods generally suffer from the high costs of labeled data and full training; thus, some recent works have...

Zhaolin Cai, Fan Li, Huiyu Duan, Lijun He, Guangtao Zhai

2602.24021 2026-02-27
AI LLM

Interpretable Debiasing of Vision-Language Models for Social Fairness

The rapid advancement of Vision-Language models (VLMs) has raised growing concerns that their black-box reasoning processes could lead to unintended forms of social bias. Current debiasing approach...

Na Min An, Yoonna Jang, Yusuke Hirota, Ryo Hachiuma, Isabelle Augenstein, Hyunjung Shim

2602.24014 2026-02-27
AI LLM

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, an...

Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu

2602.24009 2026-02-27
AI LLM

Ask don't tell: Reducing sycophancy in large language models

Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and ...

Magda Dubois, Cozmin Ududec, Christopher Summerfield, Lennart Luettgau

2602.23971 2026-02-27
AI LLM

HotelQuEST: Balancing Quality and Efficiency in Agentic Search

Agentic search has emerged as a promising paradigm for adaptive retrieval systems powered by large language models (LLMs). However, existing benchmarks primarily focus on quality, overlooking effic...

Guy Hadad, Shadi Iskander, Oren Kalinsky, Sofia Tolmach, Ran Levy, Haggai Roitman

2602.23949 2026-02-27
AI LLM

High-Modularity Graph Partitioning Through NLP Techniques and Maximal Clique Enumeration

Natural Language Processing (NLP) provides highly effective tools for interpreting and handling human language, offering a broad spectrum of applications. In this paper, we address a classic combin...

Marco D'Elia, Irene Finocchi, Maurizio Patrignani

2602.23948 2026-02-27
AI LLM

The Astonishing Ability of Large Language Models to Parse Jabberwockified Language

We show that large language models (LLMs) have an astonishing ability to recover meaning from severely degraded English texts. Texts in which content words have been randomly substituted by nonsens...

Gary Lupyan, Senyi Yang

2602.23928 2026-02-27
AI LLM

The Moment of Capture: How the First Seconds of a Speaker's Nonverbal and Verbal Performance Shapes Audience Judgments

Why do some speakers capture a room almost instantly while others fail to connect? The real-time architecture of audience engagement remains largely a black box. Here, we used motion-captured anima...

Ralf Schmälzle, Yuetong Du, Sue Lim, Gary Bente

2602.23920 2026-02-27
AI LLM

Personal Data as a Human Right: A New Social Contract Based on Data Sovereignty, Human Dignity and Data Personalism

In an era of ubiquitous data collection, platform dominance, and AI-mediated governance, the social contract of digital life is increasingly shaped by a few private actors rather than democratic de...

J. M. Alvarez-Pallete, R. Calderón, M. T. Corzo, E. C. Garrido-Merchán, G. López, I. Navarro-Mend...

2602.23918 2026-02-27
AI LLM

Novice Developers Produce Larger Review Overhead for Project Maintainers while Vibe Coding

AI coding agents allow software developers to generate code quickly, which raises a practical question for project managers and open source maintainers: can vibe coders with less development experi...

Syed Ammar Asdaque, Imran Haider, Muhammad Umar Malik, Maryam Abdul Ghafoor, Abdul Ali Bangash

2602.23905 2026-02-27
AI LLM

Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks

Referring Expression Comprehension (REC) links language to region-level visual perception. Standard benchmarks (RefCOCO, RefCOCO+, RefCOCOg) have progressed rapidly with multimodal LLMs but remain ...

Qihua Dong, Kuo Yang, Lin Ju, Handong Zhao, Yitian Zhang, Yizhou Wang, Huimin Zeng, Jianglin Lu, ...

2602.23898 2026-02-27
AI LLM

AoE: Always-on Egocentric Human Video Collection for Embodied AI

Embodied foundation models require large-scale, high-quality real-world interaction data for pre-training and scaling. However, existing data collection methods suffer from high infrastructure cost...

Bowen Yang, Zishuo Li, Yang Sun, Changtao Miao, Yifan Yang, Man Luo, Xiaotong Yan, Feng Jiang, Ji...

2602.23893 2026-02-27