Papers
Research papers from arXiv and related sources
Functionality-Oriented LLM Merging on the Fisher--Rao Manifold
Weight-space merging aims to combine multiple fine-tuned LLMs into a single model without retraining, yet most existing approaches remain fundamentally parameter-space heuristics. This creates thre...
Jiayu Wang, Zuojun Ye, Wenpeng Yin
MPCEval: A Benchmark for Multi-Party Conversation Generation
Multi-party conversation generation, such as smart reply and collaborative assistants, is an increasingly important capability of generative AI, yet its evaluation remains a critical bottleneck. Co...
Minxing Zhang, Yi Yang, Zhuofan Jia, Xuan Yang, Jian Pei, Yuchen Zang, Xingwang Deng, Xianglong Chen
When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
Preference alignment is an essential step in adapting large language models (LLMs) to human values, but existing approaches typically depend on costly human annotations or large-scale API-based mod...
Amirabbas Afzali, Myeongho Jeon, Maria Brbic
WaterSIC: information-theoretically (near) optimal linear layer quantization
This paper considers the problem of converting a given dense linear layer to low precision. The tradeoff between compressed length and output discrepancy is analyzed information theoretically (IT)....
Egor Lifar, Semyon Savkin, Or Ordentlich, Yury Polyanskiy
Retrieval-Augmented Generation with Covariate Time Series
While RAG has greatly enhanced LLMs, extending this paradigm to Time-Series Foundation Models (TSFMs) remains a challenge. This is exemplified in the Predictive Maintenance of the Pressure Regulati...
Kenny Ye Liang, Zhongyi Pei, Huan Zhang, Yuhui Liu, Shaoxu Song, Jianmin Wang
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space
Scaling inference-time compute for Large Language Models (LLMs) has unlocked unprecedented reasoning capabilities. However, existing inference-time scaling methods typically rely on inefficient and...
Peihao Wang, Ruisi Cai, Zhen Wang, Hongyuan Mei, Qiang Liu, Pan Li, Zhangyang Wang
LocalSUG: Geography-Aware LLM for Query Suggestion in Local-Life Services
In local-life service platforms, the query suggestion module plays a crucial role in enhancing user experience by generating candidate queries based on user input prefixes, thus reducing user effor...
Jinwen Chen, Shuai Gong, Shiwen Zhang, Zheng Zhang, Yachao Zhao, Lingxiang Wang, Haibo Zhou, Yuan...
Detecting RAG Advertisements Across Advertising Styles
Large language models (LLMs) enable a new form of advertising for retrieval-augmented generation (RAG) systems in which organic responses are blended with contextually relevant ads. The prospect of...
Sebastian Heineking, Wilhelm Pertsch, Ines Zelch, Janek Bevendorff, Benno Stein, Matthias Hagen, ...
AILS-NTUA at SemEval-2026 Task 10: Agentic LLMs for Psycholinguistic Marker Extraction and Conspiracy Endorsement Detection
This paper presents a novel agentic LLM pipeline for SemEval-2026 Task 10 that jointly extracts psycholinguistic conspiracy markers and detects conspiracy endorsement. Unlike traditional classifier...
Panagiotis Alexios Spanakis, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorg...
BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning
Proximal constraints are fundamental to the stability of the Large Language Model reinforcement learning. While the canonical clipping mechanism in PPO serves as an efficient surrogate for trust re...
Yuan Li, Bo Wang, Yufei Gao, Yuqian Yao, Xinyuan Wang, Zhangyue Yin, Xipeng Qiu
Roomify: Spatially-Grounded Style Transformation for Immersive Virtual Environments
We present Roomify, a spatially-grounded transformation system that generates themed virtual environments anchored to users' physical rooms while maintaining spatial structure and functional semant...
Xueyang Wang, Qinxuan Cen, Weitao Bi, Yunxiang Ma, Xin Yi, Robert Xiao, Xinyi Fu, Hewu Li
EVMbench: Evaluating AI Agents on Smart Contract Security
Smart contracts on public blockchains now manage large amounts of value, and vulnerabilities in these systems can lead to substantial losses. As AI agents become more capable at reading, writing, a...
Justin Wang, Andreas Bigger, Xiaohai Xu, Justin W. Lin, Andy Applebaum, Tejal Patwardhan, Alpin Y...
Alignment Backfire: Language-Dependent Reversal of Safety Interventions Across 16 Languages in LLM Multi-Agent Systems
In perpetrator treatment, a recurring observation is the dissociation between insight and action: offenders articulate remorse yet behavioral change does not follow. We report four preregistered st...
Hiroki Fukui
AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows
Agentic systems are increasingly acting on users' behalf, accessing calendars, email, and personal files to complete everyday tasks. Privacy evaluation for these systems has focused on the input an...
Ivoline C. Ngong, Keerthiram Murugesan, Swanand Kadhe, Justin D. Weisz, Amit Dhurandhar, Karthike...
EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents via Blame-Aware Mutation and Diversity-Aware Selection
LLM-based agents depend on effective tool-use policies to solve complex tasks, yet optimizing these policies remains challenging due to delayed supervision and the difficulty of credit assignment i...
Shuo Yang, Soyeon Caren Han, Xueqi Ma, Yan Li, Mohammad Reza Ghasemi Madani, Eduard Hovy
U-Parking: Distributed UWB-Assisted Autonomous Parking System with Robust Localization and Intelligent Planning
This demonstration presents U-Parking, a distributed Ultra-Wideband (UWB)-assisted autonomous parking system. By integrating Large Language Models (LLMs)-assisted planning with robust fusion locali...
Yiang Wu, Qiong Wu, Pingyi Fan, Kezhi Wang, Wen Chen, Guoqiang Mao, Khaled B. Letaief
Can LLMs Capture Expert Uncertainty? A Comparative Analysis of Value Alignment in Ethnographic Qualitative Research
Qualitative analysis of open-ended interviews plays a central role in ethnographic and economic research by uncovering individuals' values, motivations, and culturally embedded financial behaviors....
Arina Kostina, Marios Dikaiakos, Alejandro Porcel, Tassos Stassopoulos
SEA-TS: Self-Evolving Agent for Autonomous Code Generation of Time Series Forecasting Algorithms
Accurate time series forecasting underpins decision-making across domains, yet conventional ML development suffers from data scarcity in new deployments, poor adaptability under distribution shift,...
Longkun Xu, Xiaochun Zhang, Qiantu Tuo, Rui Li
Diffusion-Based sRGB Real Noise Generation via Prompt-Driven Noise Representation Learning
Denoising in the sRGB image space is challenging due to noise variability. Although end-to-end methods perform well, their effectiveness in real-world scenarios is limited by the scarcity of real n...
Jaekyun Ko, Dongjin Kim, Soomin Lee, Guanghui Wang, Tae Hyun Kim
K-Gen: A Multimodal Language-Conditioned Approach for Interpretable Keypoint-Guided Trajectory Generation
Generating realistic and diverse trajectories is a critical challenge in autonomous driving simulation. While Large Language Models (LLMs) show promise, existing methods often rely on structured da...
Mingxuan Mu, Guo Yang, Lei Chen, Ping Wu, Jianxun Cui