AI LLM

Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

Large Language Model (LLM)-based agents are increasingly adopted in high-stakes settings, but current benchmarks evaluate mainly whether a task was completed, not how. We introduce Procedure-Aware ...

Hongliu Cao, Ilias Driouich, Eoin Thomas

2603.03116 • 2026-03-03

AI LLM

Evaluating Performance Drift from Model Switching in Multi-Turn LLM Systems

Deployed multi-turn LLM systems routinely switch models mid-interaction due to upgrades, cross-provider routing, and fallbacks. Such handoffs create a context mismatch: the model generating later t...

Raad Khraishi, Iman Zafar, Katie Myles, Greig A Cowan

2603.03111 • 2026-03-03

TESTING

RAIN: Secure and Robust Aggregation under Shuffle Model of Differential Privacy

Secure aggregation is a foundational building block of privacy-preserving learning, yet achieving robustness under adversarial behavior remains challenging. Modern systems increasingly adopt the sh...

Yuhang Li, Yajie Wang, Xiangyun Tang, Peng Jiang, Yu-an Tan, Liehuang Zhu

2603.03108 • 2026-03-03

TESTING

Area minimising hypersurfaces mod $p$ do not admit immersed branch points

We show that area minimising hypersurfaces mod $p$ do not admit immersed branch points, namely branch points about which all classical singularities are immersed. Furthermore, we show that if an $n...

Paul Minter, Sidney Stanbury

2603.03100 • 2026-03-03

AI LLM

Compact Prompting in Instruction-tuned LLMs for Joint Argumentative Component Detection

Argumentative component detection (ACD) is a core subtask of Argument(ation) Mining (AM) and one of its most challenging aspects, as it requires jointly delimiting argumentative spans and classifyi...

Sofiane Elguendouze, Erwan Hain, Elena Cabrio, Serena Villata

2603.03095 • 2026-03-03

TESTING

Safe and Robust Domains of Attraction for Discrete-Time Systems: A Set-Based Characterization and Certifiable Neural Network Estimation

Analyzing nonlinear systems with attracting robust invariant sets (RISs) requires estimating their domains of attraction (DOAs). Despite extensive research, accurately characterizing DOAs for gener...

Mohamed Serry, Maxwell Fitzsimmons, Jun Liu

2603.03082 • 2026-03-03

AI LLM

TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models

Large language models (LLMs) have achieved remarkable success across diverse applications but remain vulnerable to jailbreak attacks, where attackers craft prompts that bypass safety alignment and ...

Zhi Xu, Jiaqi Li, Xiaotong Zhang, Hong Yu, Han Liu

2603.03081 • 2026-03-03

AI LLM

Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

LLM-based explainable recommenders can produce fluent explanations that are factually correct, yet still justify items using attributes that conflict with a user's historical preferences. Such pref...

Chengkai Wang, Baisong Liu

2603.03080 • 2026-03-03

TESTING

Probabilistic and Alarm-Based Evaluation of a b-Value-Driven Deep Learning Earthquake Forecast

We evaluate the forecasting performance of a deep learning model, originally introduced as a pattern-extraction framework, that operates on the spatiotemporal evolution of seismic b-values in a sho...

Jonas Köhler, Wei Li, Johannes Faber, Georg Rümpker, Nishtha Srivastava

2603.03079 • 2026-03-03

AI LLM

RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

Agentic Reinforcement Learning (Agentic RL) has shown remarkable potential in large language model-based (LLM) agents. These works can empower LLM agents to tackle complex tasks via multi-step, too...

Siwei Zhang, Yun Xiong, Xi Chen, Zi'an Jia, Renhong Huang, Jiarong Xu, Jiawei Zhang

2603.03078 • 2026-03-03

AI LLM

TinyIceNet: Low-Power SAR Sea Ice Segmentation for On-Board FPGA Inference

Accurate sea ice mapping is essential for safe maritime navigation in polar regions, where rapidly changing ice conditions require timely and reliable information. While Sentinel-1 Synthetic Apertu...

Mhd Rashed Al Koutayni, Mohamed Selim, Gerd Reis, Alain Pagani, Didier Stricker

2603.03075 • 2026-03-03

AI LLM

Design Generative AI for Practitioners: Exploring Interaction Approaches Aligned with Creative Practice

Design is a non-linear, reflective process in which practitioners engage with visual, semantic, and other expressive materials to explore, iterate, and refine ideas. As Generative AI (GenAI) become...

Xiaohan Peng, Wendy E. Mackay, Janin Koch

2603.03074 • 2026-03-03

TESTING

Context Adaptive Extended Chain Coding for Semantic Map Compression

Semantic maps are increasingly utilized in areas such as robotics, autonomous systems, and extended reality, motivating the investigation of efficient compression methods that preserve structured s...

Runyu Yang, Junqi Liao, Hyomin Choi, Fabien Racapé, Ivan V. Bajić

2603.03073 • 2026-03-03

AI LLM

TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

Large language models (LLMs) are increasingly used to assist scientists across diverse workflows. A key challenge is generating high-quality figures from textual descriptions, often represented as ...

Christian Greisinger, Steffen Eger

2603.03072 • 2026-03-03

AI LLM

EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education

While AI-generated content (AIGC) models have achieved remarkable success in generating photorealistic videos, their potential to support visual, story-driven learning in education remains largely ...

Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui

2603.03066 • 2026-03-03

TESTING

V3DB: Audit-on-Demand Zero-Knowledge Proofs for Verifiable Vector Search over Committed Snapshots

Dense retrieval services increasingly underpin semantic search, recommendation, and retrieval-augmented generation, yet clients typically receive only a top-$k$ list with no auditable evidence of h...

Zipeng Qiu, Wenjie Qu, Jiaheng Zhang, Binhang Yuan

2603.03065 • 2026-03-03

TESTING

Radius-Flow Entanglement in Hadron States and Gravitational Form Factors

We propose a lattice-ready entanglement observable for QCD hadrons: the vacuum-subtracted radius flow of the ball Rényi entropy, $\mathfrak{s}_n(R;h)\equiv R\,\partial_R \Delta S_n(B_R;h)$, defined via th...

Kiminad A. Mamo

2603.03064 • 2026-03-03

AI LLM

DLIOS: An LLM-Augmented Real-Time Multi-Modal Interactive Enhancement Overlay System for Douyin Live Streaming

We present DLIOS, a Large Language Model (LLM)-augmented real-time multi-modal interactive enhancement overlay system for Douyin (TikTok) live streaming. DLIOS employs a three-layer transparent win...

Shuide Wen, Sungil Seok, Beier Ku, Richee Li, Yubin He, Bowen Qu, Yang Yang, Ping Su, Can Jiao

2603.03060 • 2026-03-03

AI LLM

An HCI Perspective on Sustainable GenAI Integration in Architectural Design Education

Generative AI (genAI) is increasingly influencing architectural design practice and is expected to affect, or even transform, the profession, even though its benefits and costs remain unresolved. I...

Alex Binh Vinh Duc Nguyen

2603.03059 • 2026-03-03

AI LLM

PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems

Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from docto...

Sudip Bhujel

2603.03054 • 2026-03-03
