Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
AI LLM

Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

Large Language Model (LLM)-based agents are increasingly adopted in high-stakes settings, but current benchmarks evaluate mainly whether a task was completed, not how. We introduce Procedure-Aware ...

Hongliu Cao, Ilias Driouich, Eoin Thomas

2603.03116 2026-03-03
AI LLM

Evaluating Performance Drift from Model Switching in Multi-Turn LLM Systems

Deployed multi-turn LLM systems routinely switch models mid-interaction due to upgrades, cross-provider routing, and fallbacks. Such handoffs create a context mismatch: the model generating later t...

Raad Khraishi, Iman Zafar, Katie Myles, Greig A Cowan

2603.03111 2026-03-03
TESTING

RAIN: Secure and Robust Aggregation under Shuffle Model of Differential Privacy

Secure aggregation is a foundational building block of privacy-preserving learning, yet achieving robustness under adversarial behavior remains challenging. Modern systems increasingly adopt the sh...

Yuhang Li, Yajie Wang, Xiangyun Tang, Peng Jiang, Yu-an Tan, Liehuang Zhu

2603.03108 2026-03-03
TESTING

Area minimising hypersurfaces mod $p$ do not admit immersed branch points

We show that area minimising hypersurfaces mod $p$ do not admit immersed branch points, namely branch points about which all classical singularities are immersed. Furthermore, we show that if an $n...

Paul Minter, Sidney Stanbury

2603.03100 2026-03-03
AI LLM

Compact Prompting in Instruction-tuned LLMs for Joint Argumentative Component Detection

Argumentative component detection (ACD) is a core subtask of Argument(ation) Mining (AM) and one of its most challenging aspects, as it requires jointly delimiting argumentative spans and classifyi...

Sofiane Elguendouze, Erwan Hain, Elena Cabrio, Serena Villata

2603.03095 2026-03-03
TESTING

Safe and Robust Domains of Attraction for Discrete-Time Systems: A Set-Based Characterization and Certifiable Neural Network Estimation

Analyzing nonlinear systems with attracting robust invariant sets (RISs) requires estimating their domains of attraction (DOAs). Despite extensive research, accurately characterizing DOAs for gener...

Mohamed Serry, Maxwell Fitzsimmons, Jun Liu

2603.03082 2026-03-03
AI LLM

TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models

Large language models (LLMs) have achieved remarkable success across diverse applications but remain vulnerable to jailbreak attacks, where attackers craft prompts that bypass safety alignment and ...

Zhi Xu, Jiaqi Li, Xiaotong Zhang, Hong Yu, Han Liu

2603.03081 2026-03-03
AI LLM

Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

LLM-based explainable recommenders can produce fluent explanations that are factually correct, yet still justify items using attributes that conflict with a user's historical preferences. Such pref...

Chengkai Wang, Baisong Liu

2603.03080 2026-03-03
TESTING

Probabilistic and Alarm-Based Evaluation of a b-Value-Driven Deep Learning Earthquake Forecast

We evaluate the forecasting performance of a deep learning model, originally introduced as a pattern-extraction framework, that operates on the spatiotemporal evolution of seismic b-values in a sho...

Jonas Köhler, Wei Li, Johannes Faber, Georg Rümpker, Nishtha Srivastava

2603.03079 2026-03-03
AI LLM

RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

Agentic Reinforcement Learning (Agentic RL) has shown remarkable potential in large language model (LLM)-based agents. These works can empower LLM agents to tackle complex tasks via multi-step, too...

Siwei Zhang, Yun Xiong, Xi Chen, Zi'an Jia, Renhong Huang, Jiarong Xu, Jiawei Zhang

2603.03078 2026-03-03
AI LLM

TinyIceNet: Low-Power SAR Sea Ice Segmentation for On-Board FPGA Inference

Accurate sea ice mapping is essential for safe maritime navigation in polar regions, where rapidly changing ice conditions require timely and reliable information. While Sentinel-1 Synthetic Apertu...

Mhd Rashed Al Koutayni, Mohamed Selim, Gerd Reis, Alain Pagani, Didier Stricker

2603.03075 2026-03-03
AI LLM

Design Generative AI for Practitioners: Exploring Interaction Approaches Aligned with Creative Practice

Design is a non-linear, reflective process in which practitioners engage with visual, semantic, and other expressive materials to explore, iterate, and refine ideas. As Generative AI (GenAI) become...

Xiaohan Peng, Wendy E. Mackay, Janin Koch

2603.03074 2026-03-03
TESTING

Context Adaptive Extended Chain Coding for Semantic Map Compression

Semantic maps are increasingly utilized in areas such as robotics, autonomous systems, and extended reality, motivating the investigation of efficient compression methods that preserve structured s...

Runyu Yang, Junqi Liao, Hyomin Choi, Fabien Racapé, Ivan V. Bajić

2603.03073 2026-03-03
AI LLM

TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

Large language models (LLMs) are increasingly used to assist scientists across diverse workflows. A key challenge is generating high-quality figures from textual descriptions, often represented as ...

Christian Greisinger, Steffen Eger

2603.03072 2026-03-03
AI LLM

EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education

While AI-generated content (AIGC) models have achieved remarkable success in generating photorealistic videos, their potential to support visual, story-driven learning in education remains largely ...

Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui

2603.03066 2026-03-03
TESTING

V3DB: Audit-on-Demand Zero-Knowledge Proofs for Verifiable Vector Search over Committed Snapshots

Dense retrieval services increasingly underpin semantic search, recommendation, and retrieval-augmented generation, yet clients typically receive only a top-$k$ list with no auditable evidence of h...

Zipeng Qiu, Wenjie Qu, Jiaheng Zhang, Binhang Yuan

2603.03065 2026-03-03
TESTING

Radius-Flow Entanglement in Hadron States and Gravitational Form Factors

We propose a lattice-ready entanglement observable for QCD hadrons: the vacuum-subtracted radius flow of the ball Rényi entropy, $\mathfrak{s}_n(R;h)\equiv R\,\partial_R\,\Delta S_n(B_R;h)$, defined via th...

Kiminad A. Mamo

2603.03064 2026-03-03
AI LLM

DLIOS: An LLM-Augmented Real-Time Multi-Modal Interactive Enhancement Overlay System for Douyin Live Streaming

We present DLIOS, a Large Language Model (LLM)-augmented real-time multi-modal interactive enhancement overlay system for Douyin (TikTok) live streaming. DLIOS employs a three-layer transparent win...

Shuide Wen, Sungil Seok, Beier Ku, Richee Li, Yubin He, Bowen Qu, Yang Yang, Ping Su, Can Jiao

2603.03060 2026-03-03
AI LLM

An HCI Perspective on Sustainable GenAI Integration in Architectural Design Education

Generative AI (genAI) is increasingly influencing architectural design practice and is expected to affect, or even transform, the profession, even though its benefits and costs remain unresolved. I...

Alex Binh Vinh Duc Nguyen

2603.03059 2026-03-03
AI LLM

PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems

Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from docto...

Sudip Bhujel

2603.03054 2026-03-03