Papers

Research papers from arXiv and related sources

Total: 4694 | AI/LLM: 2583 | Testing: 2111
AI LLM

Ask don't tell: Reducing sycophancy in large language models

Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and ...

Magda Dubois, Cozmin Ududec, Christopher Summerfield, Lennart Luettgau

2602.23971 2026-02-27
TESTING

RAD-DPO: Robust Adaptive Denoising Direct Preference Optimization for Generative Retrieval in E-commerce

Generative Retrieval (GR) has emerged as a powerful paradigm in e-commerce search, retrieving items via autoregressive decoding of Semantic IDs (SIDs). However, aligning GR with complex user prefer...

Zhiguo Chen, Guohao Sun, Yiming Qiu, Xingzhi Yao, Mingming Li, Huimu Wang, Yangqi Zhang, Songlin ...

2602.23964 2026-02-27
TESTING

SHINE: Sequential Hierarchical Integration Network for EEG and MEG

How natural speech is represented in the brain constitutes a major challenge for cognitive neuroscience, with cortical envelope-following responses playing a central role in speech decoding. This p...

Xiran Xu, Yujie Yan, Xihong Wu, Jing Chen

2602.23960 2026-02-27
TESTING

The Vocabulary of Flaky Tests in the Context of SAP HANA

Background. Automated test execution is an important activity to gather information about the quality of a software project. So-called flaky tests, however, negatively affect this process. Such tes...

Alexander Berndt, Zoltán Nochta, Thomas Bach

2602.23957 2026-02-27
AI LLM

HotelQuEST: Balancing Quality and Efficiency in Agentic Search

Agentic search has emerged as a promising paradigm for adaptive retrieval systems powered by large language models (LLMs). However, existing benchmarks primarily focus on quality, overlooking effic...

Guy Hadad, Shadi Iskander, Oren Kalinsky, Sofia Tolmach, Ran Levy, Haggai Roitman

2602.23949 2026-02-27
AI LLM

High-Modularity Graph Partitioning Through NLP Techniques and Maximal Clique Enumeration

Natural Language Processing (NLP) provides highly effective tools for interpreting and handling human language, offering a broad spectrum of applications. In this paper, we address a classic combin...

Marco D'Elia, Irene Finocchi, Maurizio Patrignani

2602.23948 2026-02-27
TESTING

Hierarchical Concept-based Interpretable Models

Modern deep neural networks remain challenging to interpret due to the opacity of their latent representations, impeding model understanding, debugging, and debiasing. Concept Embedding Models (CEM...

Oscar Hill, Mateo Espinosa Zarlenga, Mateja Jamnik

2602.23947 2026-02-27
TESTING

EDDA-Coordinata: An Annotated Dataset of Historical Geographic Coordinates

This paper introduces a dataset of enriched geographic coordinates retrieved from Diderot and d'Alembert's eighteenth-century Encyclopédie. Automatically recovering geographic coordinates from hist...

Ludovic Moncla, Pierre Nugues, Thierry Joliveau, Katherine McDonough

2602.23941 2026-02-27
TESTING

Benchmarking BERT-based Models for Sentence-level Topic Classification in Nepali Language

Transformer-based models such as BERT have significantly advanced Natural Language Processing (NLP) across many languages. However, Nepali, a low-resource language written in Devanagari script, rem...

Nischal Karki, Bipesh Subedi, Prakash Poudyal, Rupak Raj Ghimire, Bal Krishna Bal

2602.23940 2026-02-27
AI LLM

The Astonishing Ability of Large Language Models to Parse Jabberwockified Language

We show that large language models (LLMs) have an astonishing ability to recover meaning from severely degraded English texts. Texts in which content words have been randomly substituted by nonsens...

Gary Lupyan, Senyi Yang

2602.23928 2026-02-27
TESTING

Mixed Choice in Asynchronous Multiparty Session Types

We present a multiparty session type (MST) framework with asynchronous mixed choice (MC). We propose a core construct for MC that allows transient inconsistencies in protocol state between distribu...

Laura Bocchi, Raymond Hu, Adriana Laura Voinea, Simon Thompson

2602.23927 2026-02-27
TESTING

Teleoperated Omni-directional Dual Arm Mobile Manipulation Robotic System with Shared Control for Retail Store

The swiftly expanding retail sector is increasingly adopting autonomous mobile robots empowered by artificial intelligence and machine learning algorithms to gain an edge in the competitive market....

Rolif Lima, Somdeb Saha, Nijil George, Vismay Vakharia, Shubham Parab, Sahil Gaonkar, Vighnesh Va...

2602.23923 2026-02-27
TESTING

Invariant-Driven Automated Testing

Microservice architectures are an emergent technology that builds business logic into a suite of small services. Each microservice runs in its process and the communication is made through lightwei...

Ana Catarina Ribeiro

2602.23922 2026-02-27
AI LLM

The Moment of Capture: How the First Seconds of a Speaker's Nonverbal and Verbal Performance Shapes Audience Judgments

Why do some speakers capture a room almost instantly while others fail to connect? The real-time architecture of audience engagement remains largely a black box. Here, we used motion-captured anima...

Ralf Schmälzle, Yuetong Du, Sue Lim, Gary Bente

2602.23920 2026-02-27
AI LLM

Personal Data as a Human Right: A New Social Contract Based on Data Sovereignty, Human Dignity and Data Personalism

In an era of ubiquitous data collection, platform dominance, and AI-mediated governance, the social contract of digital life is increasingly shaped by a few private actors rather than democratic de...

J. M. Alvarez-Pallete, R. Calderón, M. T. Corzo, E. C. Garrido-Merchán, G. López, I. Navarro-Mend...

2602.23918 2026-02-27
TESTING

Impact of non-standard neutrino-electron interactions on Big Bang Nucleosynthesis

Neutrino non-standard interactions (NSI) with electrons, predicted in many extended theoretical models of particle physics, are known to alter the picture of neutrino decoupling from the cosmic pla...

Stefano Gariazzo, Jaume Moncho, Sergio Pastor, Ofelia Pisanti

2602.23915 2026-02-27
TESTING

Online Bootstrap Inference for the Trend of Nonstationary Time Series

This article proposes an online bootstrap scheme for nonparametric level estimation in nonstationary time series. Our approach applies to a broad class of level estimators expressible as weighted s...

Thomas Nagler, Tobias Brock, Nicolai Palm

2602.23911 2026-02-27
TESTING

Automated selection of r for stationary and nonstationary models for r largest order statistics

In generalized extreme value model for the r largest order statistics, denoted by rGEV, the selection of r is critical. The existing entropy difference test for selecting r is applicable to large s...

Yire Shin, Jihong Park, Jeong-Soo Park

2602.23909 2026-02-27
AI LLM

Novice Developers Produce Larger Review Overhead for Project Maintainers while Vibe Coding

AI coding agents allow software developers to generate code quickly, which raises a practical question for project managers and open source maintainers: can vibe coders with less development experi...

Syed Ammar Asdaque, Imran Haider, Muhammad Umar Malik, Maryam Abdul Ghafoor, Abdul Ali Bangash

2602.23905 2026-02-27
AI LLM

Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks

Referring Expression Comprehension (REC) links language to region level visual perception. Standard benchmarks (RefCOCO, RefCOCO+, RefCOCOg) have progressed rapidly with multimodal LLMs but remain ...

Qihua Dong, Kuo Yang, Lin Ju, Handong Zhao, Yitian Zhang, Yizhou Wang, Huimin Zeng, Jianglin Lu, ...

2602.23898 2026-02-27