Papers
Research papers from arXiv and related sources
LiveCultureBench: a Multi-Agent, Multi-Cultural Benchmark for Large Language Models in Dynamic Social Simulations
Large language models (LLMs) are increasingly deployed as autonomous agents, yet evaluations focus primarily on task success rather than cultural appropriateness or evaluator reliability. We introd...
Viet-Thanh Pham, Lizhen Qu, Thuy-Trang Vu, Gholamreza Haffari, Dinh Phung
PreSight: Preoperative Outcome Prediction for Parkinson's Disease via Region-Prior Morphometry and Patient-Specific Weighting
Preoperative improvement rate prediction for Parkinson's disease surgery is clinically important yet difficult because imaging signals are subtle and patients are heterogeneous. We address this set...
Yand Wang, Chen Zhang, Lanyun Zhu, Yixin Chen, Qunbo Wang, Yutong Bai, Jurgen Germann, Yinghong W...
When Numbers Tell Half the Story: Human-Metric Alignment in Topic Model Evaluation
Topic models uncover latent thematic structures in text corpora, yet evaluating their quality remains challenging, particularly in specialized domains. Existing methods often rely on automated metr...
Thibault Prouteau, Francis Lareau, Nicolas Dugué, Jean-Charles Lamirel, Christophe Malaterre
A Simulation Study to Compare Inferential Properties when Modelling Ordinal Outcomes: The Case for the (Plain but Robust) Proportional Odds Model
Ordinal measurements are common outcomes in studies within psychology, as well as in the social and behavioral sciences. Choosing an appropriate regression model for analysing such data poses a dif...
Stefan Inerle, Markus Pauly, Moritz Berger
Ignore All Previous Instructions: Jailbreaking as a de-escalatory peace building practise to resist LLM social media bots
Large Language Models have intensified the scale and strategic manipulation of political discourse on social media, leading to conflict escalation. The existing literature largely focuses on platfo...
Huw Day, Adrianna Jezierska, Jessica Woodgate
CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification
Developing multi-turn interactive tool-use agents is challenging because real-world user needs are often complex and ambiguous, yet agents must execute deterministic actions to satisfy them. To add...
Jinpeng Chen, Cheng Gong, Hanbo Li, Ziru Liu, Zichen Tian, Xinyu Fu, Shi Wu, Chenyang Zhang, Wu Z...
Dream2Learn: Structured Generative Dreaming for Continual Learning
Continual learning requires balancing plasticity and stability while mitigating catastrophic forgetting. Inspired by human dreaming as a mechanism for internal simulation and knowledge restructurin...
Salvatore Calcagno, Matteo Pennisi, Federica Proietto Salanitri, Amelia Sorrenti, Simone Palazzo,...
Real Money, Fake Models: Deceptive Model Claims in Shadow APIs
Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferat...
Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang
Fast Entropy Decoding for Sparse MVM on GPUs
We present a novel, practical approach to speed up sparse matrix-vector multiplication (SpMVM) on GPUs. The novel key idea is to apply lossless entropy coding to further compress the sparse matrix ...
Emil Schätzle, Tommaso Pegolotti, Markus Püschel
AdaPonderLM: Gated Pondering Language Models with Token-Wise Adaptive Depth
Test-time scaling via recurrent/iterative Transformers enables large language models to spend more computation at inference, but most pretrained recurrent LMs run a fixed number of iterations, wast...
Shixiang Song, He Li, Zitong Wang, Boyi Zeng, Feichen Song, Yixuan Wang, Zhiqin John Xu, Ziwei He...
Demonstrating ViviDoc: Generating Interactive Documents through Human-Agent Collaboration
Interactive articles help readers engage with complex ideas through exploration, yet creating them remains costly, requiring both domain expertise and web development skills. Recent LLM-based agent...
Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Tingfeng Lan, Wei Chen
FLANS at SemEval-2026 Task 7: RAG with Open-Sourced Smaller LLMs for Everyday Knowledge Across Diverse Languages and Cultures
This system paper describes our participation in the SemEval-2025 Task-7 "Everyday Knowledge Across Diverse Languages and Cultures". We participated in two subtasks, i.e., Track 1: Short Answer Question...
Liliia Bogdanova, Shiran Sun, Lifeng Han, Natalia Amat Lefort, Flor Miriam Plaza-del-Arco
Exploring $\widetilde{R}_2$ Leptoquarks and Majorana Neutrinos via same-sign dimuons at the HL-LHC
We study the phenomenology of scalar leptoquark (sLQ) $\widetilde{R}_2$ coupled to right-handed neutrinos (RHNs) at the High-Luminosity Large Hadron Collider (HL-LHC), focusing on signatures that d...
Subham Saha, Arvind Bhaskar, Manimala Mitra
Dynamic Connectivity and Local Frequency Strength under Stochastic Variations
This paper introduces a novel metric, termed the Generalized Fiedler Vector (GFV), to evaluate the dynamic connectivity in power systems. The proposed metric leverages the network connecti...
Bruno Pinheiro, Daniel Dotta
B-fields And dust in interstelLar fiLAments using Dust POLarization (BALLAD-POL): VI. Grain alignment mechanisms in the massive quiescent filament G16.96+0.27 using dust polarization observations from JCMT/POL-2
Dust polarization induced by aligned non-spherical grains acts as an important tool to trace the magnetic field (B-field) morphologies and strengths in molecular clouds and constrain grain properti...
Saikhom Pravash, Thiem Hoang, Archana Soam, Qi-Lao Gu, Tie Liu, Pham Ngoc Diep, Le Ngoc Tram, Ngu...
Agentic Code Reasoning
Can LLM agents explore codebases and reason about code semantics without executing the code? We study this capability, which we call agentic code reasoning, and introduce semi-formal reasoning: a s...
Shubham Ugare, Satish Chandra
VietSuperSpeech: A Large-Scale Vietnamese Conversational Speech Dataset for ASR Fine-Tuning in Chatbot, Customer Support, and Call Center Applications
We introduce VietSuperSpeech, a large-scale Vietnamese automatic speech recognition (ASR) dataset of 52,023 audio-text pairs totaling 267.39 hours, with a distinctive focus on casual conversational...
Loan Do, Thanh Ngoc Nguyen, Thanh Pham, Vinh Do, Hien Nguyen, Charlotte Nguyen
Generative Visual Chain-of-Thought for Image Editing
Existing image editing methods struggle to perceive where to edit, especially under complex scenes and nuanced spatial instructions. To address this issue, we propose Generative Visual Chain-of-Tho...
Zijin Yin, Tiankai Hang, Yiji Cheng, Shiyi Zhang, Runze He, Yu Xu, Chunyu Wang, Bing Li, Zheng Ch...
Asymptotic Analysis of Shallow Water Moment Equations
The Shallow Water Moment Equations (SWME) are an extension of the Shallow Water Equations (SWE) for improved modelling of free-surface flows. In contrast to the SWE, the SWME incorporate vertical v...
Mieke Daemen, Julio Careaga, Zhenning Cai, Julian Koellermeier
Growth factor in teleparallel Gauss-Bonnet gravity
Teleparallel gravity offers a competing geometric framework on which to build cosmological models. The Gauss-Bonnet invariant captures key aspects of the underlying geometry that has been shown to ...
Shivam Kumar Mishra, Jackson Levi Said, B. Mishra