Papers
Research papers from arXiv and related sources
RippleGUItester: Change-Aware Exploratory Testing
Software systems evolve continuously through frequent code changes, yet such changes often introduce unintended bugs despite extensive testing and code review. Existing testing approaches are large...
Yanqi Su, Michael Pradel, Chunyang Chen
AI Space Physics: Constitutive boundary semantics for open AI institutions
Agentic AI deployments increasingly behave as persistent institutions rather than one-shot inference endpoints: they accumulate state, invoke external tools, coordinate multiple runtimes, and modif...
Oleg Romanchuk, Roman Bondar
Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation
Large Language Model (LLM)-based agents are increasingly adopted in high-stakes settings, but current benchmarks evaluate mainly whether a task was completed, not how. We introduce Procedure-Aware ...
Hongliu Cao, Ilias Driouich, Eoin Thomas
Evaluating Performance Drift from Model Switching in Multi-Turn LLM Systems
Deployed multi-turn LLM systems routinely switch models mid-interaction due to upgrades, cross-provider routing, and fallbacks. Such handoffs create a context mismatch: the model generating later t...
Raad Khraishi, Iman Zafar, Katie Myles, Greig A Cowan
Compact Prompting in Instruction-tuned LLMs for Joint Argumentative Component Detection
Argumentative component detection (ACD) is a core subtask of Argument(ation) Mining (AM) and one of its most challenging aspects, as it requires jointly delimiting argumentative spans and classifyi...
Sofiane Elguendouze, Erwan Hain, Elena Cabrio, Serena Villata
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
Large language models (LLMs) have achieved remarkable success across diverse applications but remain vulnerable to jailbreak attacks, where attackers craft prompts that bypass safety alignment and ...
Zhi Xu, Jiaqi Li, Xiaotong Zhang, Hong Yu, Han Liu
Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation
LLM-based explainable recommenders can produce fluent explanations that are factually correct, yet still justify items using attributes that conflict with a user's historical preferences. Such pref...
Chengkai Wang, Baisong Liu
RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization
Agentic Reinforcement Learning (Agentic RL) has shown remarkable potential in large language model (LLM)-based agents. Such methods can empower LLM agents to tackle complex tasks via multi-step, too...
Siwei Zhang, Yun Xiong, Xi Chen, Zi'an Jia, Renhong Huang, Jiarong Xu, Jiawei Zhang
TinyIceNet: Low-Power SAR Sea Ice Segmentation for On-Board FPGA Inference
Accurate sea ice mapping is essential for safe maritime navigation in polar regions, where rapidly changing ice conditions require timely and reliable information. While Sentinel-1 Synthetic Apertu...
Mhd Rashed Al Koutayni, Mohamed Selim, Gerd Reis, Alain Pagani, Didier Stricker
Design Generative AI for Practitioners: Exploring Interaction Approaches Aligned with Creative Practice
Design is a non-linear, reflective process in which practitioners engage with visual, semantic, and other expressive materials to explore, iterate, and refine ideas. As Generative AI (GenAI) become...
Xiaohan Peng, Wendy E. Mackay, Janin Koch
TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
Large language models (LLMs) are increasingly used to assist scientists across diverse workflows. A key challenge is generating high-quality figures from textual descriptions, often represented as ...
Christian Greisinger, Steffen Eger
EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education
While AI-generated content (AIGC) models have achieved remarkable success in generating photorealistic videos, their potential to support visual, story-driven learning in education remains largely ...
Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui
DLIOS: An LLM-Augmented Real-Time Multi-Modal Interactive Enhancement Overlay System for Douyin Live Streaming
We present DLIOS, a Large Language Model (LLM)-augmented real-time multi-modal interactive enhancement overlay system for Douyin (TikTok) live streaming. DLIOS employs a three-layer transparent win...
Shuide Wen, Sungil Seok, Beier Ku, Richee Li, Yubin He, Bowen Qu, Yang Yang, Ping Su, Can Jiao
An HCI Perspective on Sustainable GenAI Integration in Architectural Design Education
Generative AI (genAI) is increasingly influencing architectural design practice and is expected to affect, or even transform, the profession, even though its benefits and costs remain unresolved. I...
Alex Binh Vinh Duc Nguyen
PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems
Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from docto...
Sudip Bhujel
TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health
While Large Language Models (LLMs) demonstrate significant potential in providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the do...
Zixin Xiong, Ziteng Wang, Haotian Fan, Xinjie Zhang, Wenxuan Wang
Step-Level Sparse Autoencoder for Reasoning Process Interpretation
Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning. However, their reasoning patterns remain too complicated to analyze. While...
Xuan Yang, Jiayu Liu, Yuhang Lai, Hao Xu, Zhenya Huang, Ning Miao
REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry
Enterprise engineering organizations produce high-volume, heterogeneous telemetry from version control systems, CI/CD pipelines, issue trackers, and observability platforms. Large Language Models (...
Yuvraj Agrawal
Reproducing and Comparing Distillation Techniques for Cross-Encoders
Recent advances in Information Retrieval have established transformer-based cross-encoders as a keystone in IR. Recent studies have focused on knowledge distillation and showed that, with the right...
Victor Morand, Mathias Vast, Basile Van Cooten, Laure Soulier, Josiane Mothe, Benjamin Piwowarski
OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents
Multi-agent large language model frameworks are promising for complex multi-step reasoning, yet existing systems remain weak in scientific and knowledge-intensive domains due to static prompts and...
Yichao Feng, Haoran Luo, Zhenghong Lin, Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh, Anh Tuan Luu