Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

RippleGUItester: Change-Aware Exploratory Testing

Software systems evolve continuously through frequent code changes, yet such changes often introduce unintended bugs despite extensive testing and code review. Existing testing approaches are large...

Yanqi Su, Michael Pradel, Chunyang Chen

2603.03121 2026-03-03
AI LLM

AI Space Physics: Constitutive boundary semantics for open AI institutions

Agentic AI deployments increasingly behave as persistent institutions rather than one-shot inference endpoints: they accumulate state, invoke external tools, coordinate multiple runtimes, and modif...

Oleg Romanchuk, Roman Bondar

2603.03119 2026-03-03
AI LLM

Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

Large Language Model (LLM)-based agents are increasingly adopted in high-stakes settings, but current benchmarks evaluate mainly whether a task was completed, not how. We introduce Procedure-Aware ...

Hongliu Cao, Ilias Driouich, Eoin Thomas

2603.03116 2026-03-03
AI LLM

Evaluating Performance Drift from Model Switching in Multi-Turn LLM Systems

Deployed multi-turn LLM systems routinely switch models mid-interaction due to upgrades, cross-provider routing, and fallbacks. Such handoffs create a context mismatch: the model generating later t...

Raad Khraishi, Iman Zafar, Katie Myles, Greig A Cowan

2603.03111 2026-03-03
AI LLM

Compact Prompting in Instruction-tuned LLMs for Joint Argumentative Component Detection

Argumentative component detection (ACD) is a core subtask of Argument(ation) Mining (AM) and one of its most challenging aspects, as it requires jointly delimiting argumentative spans and classifyi...

Sofiane Elguendouze, Erwan Hain, Elena Cabrio, Serena Villata

2603.03095 2026-03-03
AI LLM

TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models

Large language models (LLMs) have achieved remarkable success across diverse applications but remain vulnerable to jailbreak attacks, where attackers craft prompts that bypass safety alignment and ...

Zhi Xu, Jiaqi Li, Xiaotong Zhang, Hong Yu, Han Liu

2603.03081 2026-03-03
AI LLM

Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

LLM-based explainable recommenders can produce fluent explanations that are factually correct, yet still justify items using attributes that conflict with a user's historical preferences. Such pref...

Chengkai Wang, Baisong Liu

2603.03080 2026-03-03
AI LLM

RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

Agentic Reinforcement Learning (Agentic RL) has shown remarkable potential in large language model-based (LLM) agents. These works can empower LLM agents to tackle complex tasks via multi-step, too...

Siwei Zhang, Yun Xiong, Xi Chen, Zi'an Jia, Renhong Huang, Jiarong Xu, Jiawei Zhang

2603.03078 2026-03-03
AI LLM

TinyIceNet: Low-Power SAR Sea Ice Segmentation for On-Board FPGA Inference

Accurate sea ice mapping is essential for safe maritime navigation in polar regions, where rapidly changing ice conditions require timely and reliable information. While Sentinel-1 Synthetic Apertu...

Mhd Rashed Al Koutayni, Mohamed Selim, Gerd Reis, Alain Pagani, Didier Stricker

2603.03075 2026-03-03
AI LLM

Design Generative AI for Practitioners: Exploring Interaction Approaches Aligned with Creative Practice

Design is a non-linear, reflective process in which practitioners engage with visual, semantic, and other expressive materials to explore, iterate, and refine ideas. As Generative AI (GenAI) become...

Xiaohan Peng, Wendy E. Mackay, Janin Koch

2603.03074 2026-03-03
AI LLM

TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

Large language models (LLMs) are increasingly used to assist scientists across diverse workflows. A key challenge is generating high-quality figures from textual descriptions, often represented as ...

Christian Greisinger, Steffen Eger

2603.03072 2026-03-03
AI LLM

EduVQA: Benchmarking AI-Generated Video Quality Assessment for Education

While AI-generated content (AIGC) models have achieved remarkable success in generating photorealistic videos, their potential to support visual, story-driven learning in education remains largely ...

Baoliang Chen, Xinlong Bu, Lingyu Zhu, Hanwei Zhu, Xiangjie Sui

2603.03066 2026-03-03
AI LLM

DLIOS: An LLM-Augmented Real-Time Multi-Modal Interactive Enhancement Overlay System for Douyin Live Streaming

We present DLIOS, a Large Language Model (LLM)-augmented real-time multi-modal interactive enhancement overlay system for Douyin (TikTok) live streaming. DLIOS employs a three-layer transparent win...

Shuide Wen, Sungil Seok, Beier Ku, Richee Li, Yubin He, Bowen Qu, Yang Yang, Ping Su, Can Jiao

2603.03060 2026-03-03
AI LLM

An HCI Perspective on Sustainable GenAI Integration in Architectural Design Education

Generative AI (genAI) is increasingly influencing architectural design practice and is expected to affect, or even transform, the profession, even though its benefits and costs remain unresolved. I...

Alex Binh Vinh Duc Nguyen

2603.03059 2026-03-03
AI LLM

PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems

Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from docto...

Sudip Bhujel

2603.03054 2026-03-03
AI LLM

TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health

While Large Language Models (LLMs) demonstrate significant potential in providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the do...

Zixin Xiong, Ziteng Wang, Haotian Fan, Xinjie Zhang, Wenxuan Wang

2603.03047 2026-03-03
AI LLM

Step-Level Sparse Autoencoder for Reasoning Process Interpretation

Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning. However, their reasoning patterns remain too complicated to analyze. While...

Xuan Yang, Jiayu Liu, Yuhang Lai, Hao Xu, Zhenya Huang, Ning Miao

2603.03031 2026-03-03
AI LLM

REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry

Enterprise engineering organizations produce high-volume, heterogeneous telemetry from version control systems, CI/CD pipelines, issue trackers, and observability platforms. Large Language Models (...

Yuvraj Agrawal

2603.03018 2026-03-03
AI LLM

Reproducing and Comparing Distillation Techniques for Cross-Encoders

Recent advances in Information Retrieval have established transformer-based cross-encoders as a keystone in IR. Recent studies have focused on knowledge distillation and showed that, with the right...

Victor Morand, Mathias Vast, Basile Van Cooten, Laure Soulier, Josiane Mothe, Benjamin Piwowarski

2603.03010 2026-03-03
AI LLM

OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents

Multi-agent large language model frameworks are promising for complex multi-step reasoning, yet existing systems remain weak for scientific and knowledge-intensive domains due to static prompts and...

Yichao Feng, Haoran Luo, Zhenghong Lin, Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh, Anh Tuan Luu

2603.03005 2026-03-03