Papers
Research papers from arXiv and related sources
Ask don't tell: Reducing sycophancy in large language models
Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and ...
Magda Dubois, Cozmin Ududec, Christopher Summerfield, Lennart Luettgau
RAD-DPO: Robust Adaptive Denoising Direct Preference Optimization for Generative Retrieval in E-commerce
Generative Retrieval (GR) has emerged as a powerful paradigm in e-commerce search, retrieving items via autoregressive decoding of Semantic IDs (SIDs). However, aligning GR with complex user prefer...
Zhiguo Chen, Guohao Sun, Yiming Qiu, Xingzhi Yao, Mingming Li, Huimu Wang, Yangqi Zhang, Songlin ...
SHINE: Sequential Hierarchical Integration Network for EEG and MEG
How natural speech is represented in the brain constitutes a major challenge for cognitive neuroscience, with cortical envelope-following responses playing a central role in speech decoding. This p...
Xiran Xu, Yujie Yan, Xihong Wu, Jing Chen
The Vocabulary of Flaky Tests in the Context of SAP HANA
Background. Automated test execution is an important activity to gather information about the quality of a software project. So-called flaky tests, however, negatively affect this process. Such tes...
Alexander Berndt, Zoltán Nochta, Thomas Bach
HotelQuEST: Balancing Quality and Efficiency in Agentic Search
Agentic search has emerged as a promising paradigm for adaptive retrieval systems powered by large language models (LLMs). However, existing benchmarks primarily focus on quality, overlooking effic...
Guy Hadad, Shadi Iskander, Oren Kalinsky, Sofia Tolmach, Ran Levy, Haggai Roitman
High-Modularity Graph Partitioning Through NLP Techniques and Maximal Clique Enumeration
Natural Language Processing (NLP) provides highly effective tools for interpreting and handling human language, offering a broad spectrum of applications. In this paper, we address a classic combin...
Marco D'Elia, Irene Finocchi, Maurizio Patrignani
Hierarchical Concept-based Interpretable Models
Modern deep neural networks remain challenging to interpret due to the opacity of their latent representations, impeding model understanding, debugging, and debiasing. Concept Embedding Models (CEM...
Oscar Hill, Mateo Espinosa Zarlenga, Mateja Jamnik
EDDA-Coordinata: An Annotated Dataset of Historical Geographic Coordinates
This paper introduces a dataset of enriched geographic coordinates retrieved from Diderot and d'Alembert's eighteenth-century Encyclopédie. Automatically recovering geographic coordinates from hist...
Ludovic Moncla, Pierre Nugues, Thierry Joliveau, Katherine McDonough
Benchmarking BERT-based Models for Sentence-level Topic Classification in Nepali Language
Transformer-based models such as BERT have significantly advanced Natural Language Processing (NLP) across many languages. However, Nepali, a low-resource language written in Devanagari script, rem...
Nischal Karki, Bipesh Subedi, Prakash Poudyal, Rupak Raj Ghimire, Bal Krishna Bal
The Astonishing Ability of Large Language Models to Parse Jabberwockified Language
We show that large language models (LLMs) have an astonishing ability to recover meaning from severely degraded English texts. Texts in which content words have been randomly substituted by nonsens...
Gary Lupyan, Senyi Yang
Mixed Choice in Asynchronous Multiparty Session Types
We present a multiparty session type (MST) framework with asynchronous mixed choice (MC). We propose a core construct for MC that allows transient inconsistencies in protocol state between distribu...
Laura Bocchi, Raymond Hu, Adriana Laura Voinea, Simon Thompson
Teleoperated Omni-directional Dual Arm Mobile Manipulation Robotic System with Shared Control for Retail Store
The swiftly expanding retail sector is increasingly adopting autonomous mobile robots empowered by artificial intelligence and machine learning algorithms to gain an edge in the competitive market....
Rolif Lima, Somdeb Saha, Nijil George, Vismay Vakharia, Shubham Parab, Sahil Gaonkar, Vighnesh Va...
Invariant-Driven Automated Testing
Microservice architectures are an emergent technology that builds business logic into a suite of small services. Each microservice runs in its own process and the communication is made through lightwei...
Ana Catarina Ribeiro
The Moment of Capture: How the First Seconds of a Speaker's Nonverbal and Verbal Performance Shapes Audience Judgments
Why do some speakers capture a room almost instantly while others fail to connect? The real-time architecture of audience engagement remains largely a black box. Here, we used motion-captured anima...
Ralf Schmälzle, Yuetong Du, Sue Lim, Gary Bente
Personal Data as a Human Right: A New Social Contract Based on Data Sovereignty, Human Dignity and Data Personalism
In an era of ubiquitous data collection, platform dominance, and AI-mediated governance, the social contract of digital life is increasingly shaped by a few private actors rather than democratic de...
J. M. Alvarez-Pallete, R. Calderón, M. T. Corzo, E. C. Garrido-Merchán, G. López, I. Navarro-Mend...
Impact of non-standard neutrino-electron interactions on Big Bang Nucleosynthesis
Neutrino non-standard interactions (NSI) with electrons, predicted in many extended theoretical models of particle physics, are known to alter the picture of neutrino decoupling from the cosmic pla...
Stefano Gariazzo, Jaume Moncho, Sergio Pastor, Ofelia Pisanti
Online Bootstrap Inference for the Trend of Nonstationary Time Series
This article proposes an online bootstrap scheme for nonparametric level estimation in nonstationary time series. Our approach applies to a broad class of level estimators expressible as weighted s...
Thomas Nagler, Tobias Brock, Nicolai Palm
Automated selection of r for stationary and nonstationary models for r largest order statistics
In the generalized extreme value model for the r largest order statistics, denoted by rGEV, the selection of r is critical. The existing entropy difference test for selecting r is applicable to large s...
Yire Shin, Jihong Park, Jeong-Soo Park
Novice Developers Produce Larger Review Overhead for Project Maintainers while Vibe Coding
AI coding agents allow software developers to generate code quickly, which raises a practical question for project managers and open source maintainers: can vibe coders with less development experi...
Syed Ammar Asdaque, Imran Haider, Muhammad Umar Malik, Maryam Abdul Ghafoor, Abdul Ali Bangash
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
Referring Expression Comprehension (REC) links language to region level visual perception. Standard benchmarks (RefCOCO, RefCOCO+, RefCOCOg) have progressed rapidly with multimodal LLMs but remain ...
Qihua Dong, Kuo Yang, Lin Ju, Handong Zhao, Yitian Zhang, Yizhou Wang, Huimin Zeng, Jianglin Lu, ...