Papers
Research papers from arXiv and related sources
Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation
Large Language Model (LLM)-based agents are increasingly adopted in high-stakes settings, but current benchmarks evaluate mainly whether a task was completed, not how. We introduce Procedure-Aware ...
Hongliu Cao, Ilias Driouich, Eoin Thomas
Evaluating Performance Drift from Model Switching in Multi-Turn LLM Systems
Deployed multi-turn LLM systems routinely switch models mid-interaction due to upgrades, cross-provider routing, and fallbacks. Such handoffs create a context mismatch: the model generating later t...
Raad Khraishi, Iman Zafar, Katie Myles, Greig A Cowan
RAIN: Secure and Robust Aggregation under Shuffle Model of Differential Privacy
Secure aggregation is a foundational building block of privacy-preserving learning, yet achieving robustness under adversarial behavior remains challenging. Modern systems increasingly adopt the sh...
Yuhang Li, Yajie Wang, Xiangyun Tang, Peng Jiang, Yu-an Tan, Liehuang Zhu
Area minimising hypersurfaces mod $p$ do not admit immersed branch points
We show that area minimising hypersurfaces mod $p$ do not admit immersed branch points, namely branch points about which all classical singularities are immersed. Furthermore, we show that if an $n...
Paul Minter, Sidney Stanbury
Compact Prompting in Instruction-tuned LLMs for Joint Argumentative Component Detection
Argumentative component detection (ACD) is a core subtask of Argument(ation) Mining (AM) and one of its most challenging aspects, as it requires jointly delimiting argumentative spans and classifyi...
Sofiane Elguendouze, Erwan Hain, Elena Cabrio, Serena Villata
Safe and Robust Domains of Attraction for Discrete-Time Systems: A Set-Based Characterization and Certifiable Neural Network Estimation
Analyzing nonlinear systems with attracting robust invariant sets (RISs) requires estimating their domains of attraction (DOAs). Despite extensive research, accurately characterizing DOAs for gener...
Mohamed Serry, Maxwell Fitzsimmons, Jun Liu
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
Large language models (LLMs) have achieved remarkable success across diverse applications but remain vulnerable to jailbreak attacks, where attackers craft prompts that bypass safety alignment and ...
Zhi Xu, Jiaqi Li, Xiaotong Zhang, Hong Yu, Han Liu
Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation
LLM-based explainable recommenders can produce fluent explanations that are factually correct, yet still justify items using attributes that conflict with a user's historical preferences. Such pref...
Chengkai Wang, Baisong Liu
Probabilistic and Alarm-Based Evaluation of a b-Value-Driven Deep Learning Earthquake Forecast
We evaluate the forecasting performance of a deep learning model, originally introduced as a pattern-extraction framework, that operates on the spatiotemporal evolution of seismic b-values in a sho...
Jonas Köhler, Wei Li, Johannes Faber, Georg Rümpker, Nishtha Srivastava
RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization
Agentic Reinforcement Learning (Agentic RL) has shown remarkable potential in large language model (LLM)-based agents. These works can empower LLM agents to tackle complex tasks via multi-step, too...
Siwei Zhang, Yun Xiong, Xi Chen, Zi'an Jia, Renhong Huang, Jiarong Xu, Jiawei Zhang
TinyIceNet: Low-Power SAR Sea Ice Segmentation for On-Board FPGA Inference
Accurate sea ice mapping is essential for safe maritime navigation in polar regions, where rapidly changing ice conditions require timely and reliable information. While Sentinel-1 Synthetic Apertu...
Mhd Rashed Al Koutayni, Mohamed Selim, Gerd Reis, Alain Pagani, Didier Stricker
Design Generative AI for Practitioners: Exploring Interaction Approaches Aligned with Creative Practice
Design is a non-linear, reflective process in which practitioners engage with visual, semantic, and other expressive materials to explore, iterate, and refine ideas. As Generative AI (GenAI) become...
Xiaohan Peng, Wendy E. Mackay, Janin Koch
Context Adaptive Extended Chain Coding for Semantic Map Compression
Semantic maps are increasingly utilized in areas such as robotics, autonomous systems, and extended reality, motivating the investigation of efficient compression methods that preserve structured s...
Runyu Yang, Junqi Liao, Hyomin Choi, Fabien Racapé, Ivan V. Bajić
TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
Large language models (LLMs) are increasingly used to assist scientists across diverse workflows. A key challenge is generating high-quality figures from textual descriptions, often represented as ...
Christian Greisinger, Steffen Eger
EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education
While AI-generated content (AIGC) models have achieved remarkable success in generating photorealistic videos, their potential to support visual, story-driven learning in education remains largely ...
Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui
V3DB: Audit-on-Demand Zero-Knowledge Proofs for Verifiable Vector Search over Committed Snapshots
Dense retrieval services increasingly underpin semantic search, recommendation, and retrieval-augmented generation, yet clients typically receive only a top-$k$ list with no auditable evidence of h...
Zipeng Qiu, Wenjie Qu, Jiaheng Zhang, Binhang Yuan
Radius-Flow Entanglement in Hadron States and Gravitational Form Factors
We propose a lattice-ready entanglement observable for QCD hadrons: the vacuum-subtracted radius flow of the ball Rényi entropy, $\mathfrak{s}_n(R;h)\equiv R\,\partial_R \Delta S_n(B_R;h)$, defined via th...
Kiminad A. Mamo
DLIOS: An LLM-Augmented Real-Time Multi-Modal Interactive Enhancement Overlay System for Douyin Live Streaming
We present DLIOS, a Large Language Model (LLM)-augmented real-time multi-modal interactive enhancement overlay system for Douyin (TikTok) live streaming. DLIOS employs a three-layer transparent win...
Shuide Wen, Sungil Seok, Beier Ku, Richee Li, Yubin He, Bowen Qu, Yang Yang, Ping Su, Can Jiao
An HCI Perspective on Sustainable GenAI Integration in Architectural Design Education
Generative AI (genAI) is increasingly influencing architectural design practice and is expected to affect, or even transform, the profession, even though its benefits and costs remain unresolved. I...
Alex Binh Vinh Duc Nguyen
PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems
Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from docto...
Sudip Bhujel