Personal Assistant Web

AI LLM

Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction

The pursuit of human-like conversational agents has long been guided by the Turing test. For modern speech-to-speech (S2S) systems, a critical yet unanswered question is whether they can converse l...

Xiang Li, Jiabao Gao, Sipei Lin, Xuan Zhou, Chi Zhang, Bo Cheng, Jiale Han, Benyou Wang

2602.24080 • 2026-02-27

AI LLM

A Novel Hierarchical Multi-Agent System for Payments Using LLMs

Large language model (LLM) agents, such as OpenAI's Operator and Claude's Computer Use, can automate workflows but are unable to handle payment tasks. Existing agentic solutions have gained significant...

Joon Kiat Chua, Donghao Huang, Zhaoxia Wang

2602.24068 • 2026-02-27

AI LLM

Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis

Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across language tasks. We test this claim through a compr...

Donghao Huang, Zhaoxia Wang

2602.24060 • 2026-02-27

AI LLM

CIRCLE: A Framework for Evaluating AI from a Real-World Lens

This paper proposes CIRCLE, a six-stage, lifecycle-based framework to bridge the reality gap between model-centric performance metrics and AI's materialized outcomes in deployment. While existing f...

Reva Schwartz, Carina Westling, Morgan Briggs, Marzieh Fadaee, Isar Nejadgholi, Matthew Holmes, F...

2602.24055 • 2026-02-27

AI LLM

Data Driven Optimization of GPU efficiency for Distributed LLM Adapter Serving

Large Language Model (LLM) adapters enable low-cost model specialization, but introduce complex caching and scheduling challenges in distributed serving systems where hundreds of adapters must be h...

Ferran Agullo, Joan Oliveras, Chen Wang, Alberto Gutierrez-Torre, Olivier Tardieu, Alaa Youssef, ...

2602.24044 • 2026-02-27

AI LLM

RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models

Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that overlook the epistemic uncertainty in reward m...

Daniel Yang, Samuel Stante, Florian Redhardt, Lena Libon, Parnian Kassraie, Ido Hakimi, Barna Pás...

2602.24040 • 2026-02-27

AI LLM

Designing AI Tutors for Interest-Based Learning: Insights from Human Instructors

Interest-based learning (IBL) is a paradigm of instruction in which educational content is contextualized using learners' interests to enhance content relevance. IBL has been shown to result in imp...

Abhishek Kulkarni, Sharon Lynn Chu

2602.24036 • 2026-02-27

AI LLM

Breaking the Illusion of Artificial Consensus: Clone-Robust Weighting for Arbitrary Metric Spaces

Independent media are central to democratic decision-making, yet recent technological developments, such as social media, pseudonymous identities, and generative AI, have made them more vulnerable ...

Damien Berriaud, Roger Wattenhofer

2602.24024 • 2026-02-27

AI LLM

Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection

Video anomaly detection (VAD) aims to identify abnormal events in videos. Traditional VAD methods generally suffer from the high costs of labeled data and full training, thus some recent works have...

Zhaolin Cai, Fan Li, Huiyu Duan, Lijun He, Guangtao Zhai

2602.24021 • 2026-02-27

AI LLM

Interpretable Debiasing of Vision-Language Models for Social Fairness

The rapid advancement of Vision-Language models (VLMs) has raised growing concerns that their black-box reasoning processes could lead to unintended forms of social bias. Current debiasing approach...

Na Min An, Yoonna Jang, Yusuke Hirota, Ryo Hachiuma, Isabelle Augenstein, Hyunjung Shim

2602.24014 • 2026-02-27

AI LLM

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Jailbreak techniques for large language models (LLMs) evolve faster than benchmarks, making robustness estimates stale and difficult to compare across papers due to drift in datasets, harnesses, an...

Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu

2602.24009 • 2026-02-27

AI LLM

Ask don't tell: Reducing sycophancy in large language models

Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and ...

Magda Dubois, Cozmin Ududec, Christopher Summerfield, Lennart Luettgau

2602.23971 • 2026-02-27

AI LLM

HotelQuEST: Balancing Quality and Efficiency in Agentic Search

Agentic search has emerged as a promising paradigm for adaptive retrieval systems powered by large language models (LLMs). However, existing benchmarks primarily focus on quality, overlooking effic...

Guy Hadad, Shadi Iskander, Oren Kalinsky, Sofia Tolmach, Ran Levy, Haggai Roitman

2602.23949 • 2026-02-27

AI LLM

High-Modularity Graph Partitioning Through NLP Techniques and Maximal Clique Enumeration

Natural Language Processing (NLP) provides highly effective tools for interpreting and handling human language, offering a broad spectrum of applications. In this paper, we address a classic combin...

Marco D'Elia, Irene Finocchi, Maurizio Patrignani

2602.23948 • 2026-02-27

AI LLM

The Astonishing Ability of Large Language Models to Parse Jabberwockified Language

We show that large language models (LLMs) have an astonishing ability to recover meaning from severely degraded English texts. Texts in which content words have been randomly substituted by nonsens...

Gary Lupyan, Senyi Yang

2602.23928 • 2026-02-27

AI LLM

The Moment of Capture: How the First Seconds of a Speaker's Nonverbal and Verbal Performance Shapes Audience Judgments

Why do some speakers capture a room almost instantly while others fail to connect? The real-time architecture of audience engagement remains largely a black box. Here, we used motion-captured anima...

Ralf Schmälzle, Yuetong Du, Sue Lim, Gary Bente

2602.23920 • 2026-02-27

AI LLM

Personal Data as a Human Right: A New Social Contract Based on Data Sovereignty, Human Dignity and Data Personalism

In an era of ubiquitous data collection, platform dominance, and AI-mediated governance, the social contract of digital life is increasingly shaped by a few private actors rather than democratic de...

J. M. Alvarez-Pallete, R. Calderón, M. T. Corzo, E. C. Garrido-Merchán, G. López, I. Navarro-Mend...

2602.23918 • 2026-02-27

AI LLM

Novice Developers Produce Larger Review Overhead for Project Maintainers while Vibe Coding

AI coding agents allow software developers to generate code quickly, which raises a practical question for project managers and open source maintainers: can vibe coders with less development experi...

Syed Ammar Asdaque, Imran Haider, Muhammad Umar Malik, Maryam Abdul Ghafoor, Abdul Ali Bangash

2602.23905 • 2026-02-27

AI LLM

Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks

Referring Expression Comprehension (REC) links language to region level visual perception. Standard benchmarks (RefCOCO, RefCOCO+, RefCOCOg) have progressed rapidly with multimodal LLMs but remain ...

Qihua Dong, Kuo Yang, Lin Ju, Handong Zhao, Yitian Zhang, Yizhou Wang, Huimin Zeng, Jianglin Lu, ...

2602.23898 • 2026-02-27

AI LLM

AoE: Always-on Egocentric Human Video Collection for Embodied AI

Embodied foundation models require large-scale, high-quality real-world interaction data for pre-training and scaling. However, existing data collection methods suffer from high infrastructure cost...

Bowen Yang, Zishuo Li, Yang Sun, Changtao Miao, Yifan Yang, Man Luo, Xiaotong Yan, Feng Jiang, Ji...

2602.23893 • 2026-02-27
