Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

Reasoning over Semantic IDs Enhances Generative Recommendation

Recent advances in generative recommendation have leveraged pretrained LLMs by formulating sequential recommendation as autoregressive generation over a unified token space comprising language toke...

Yingzhi He, Yan Sun, Junfei Tan, Yuxin Chen, Xiaoyu Kong, Chunxu Shen, Xiang Wang, An Zhang, Tat-...

2603.23183 2026-03-24
TESTING

SAiW: Source-Attributable Invisible Watermarking for Proactive Deepfake Defense

Deepfakes generated by modern generative models pose a serious threat to information integrity, digital identity, and public trust. Existing detection methods are largely reactive, attempting to id...

Bibek Das, Chandranath Adak, Soumi Chattopadhyay, Zahid Akhtar, Soumya Dutta

2603.23178 2026-03-24
AI LLM

From Synthetic to Native: Benchmarking Multilingual Intent Classification in Logistics Customer Service

Multilingual intent classification is central to customer-service systems on global logistics platforms, where models must process noisy user queries across languages and hierarchical label spaces....

Haoyu He, Jinyu Zhuang, Haoran Chu, Shuhang Yu, J, T AI Group, Hao Wang, Kunpeng Han

2603.23172 2026-03-24
AI LLM

Robust Safety Monitoring of Language Models via Activation Watermarking

Large language models (LLMs) can be misused to reveal sensitive information, such as weapon-making instructions or writing malware. LLM providers rely on monitoring to detect and flag unsa...

Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas

2603.23171 2026-03-24
AI LLM

UniDial-EvalKit: A Unified Toolkit for Evaluating Multi-Faceted Conversational Abilities

Benchmarking AI systems in multi-turn interactive scenarios is essential for understanding their practical capabilities in real-world applications. However, existing evaluation protocols are highly...

Qi Jia, Haodong Zhao, Dun Pei, Xiujie Song, Shibo Wang, Zijian Chen, Zicheng Zhang, Xiangyang Zhu...

2603.23160 2026-03-24
TESTING

PHANTOM Hand

Tendon-driven underactuated hands excel in adaptive grasping but often suffer from kinematic unpredictability and highly non-linear force transmission. This ambiguity limits their ability to perfor...

Teng Yan, Jiongxu Chen, Qixiang Hua, Yue Yu, Zihang Wang, Yaohua Liu, Bingzhuo Zhong

2603.23152 2026-03-24
AI LLM

Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy

The widespread adoption of Large Language Models (LLMs) has made the detection of AI-Generated text a pressing and complex challenge. Although many detection systems report high benchmark accuracy,...

Shushanta Pudasaini, Luis Miralles-Pechuán, David Lillis, Marisa Llorens Salvador

2603.23146 2026-03-24
AI LLM

Can Language Models Pass Software Testing Certification Exams? a case study

Large Language Models (LLMs) play a pivotal role in both academic research and broader societal applications. LLMs are increasingly used in software testing activities such as test case generation,...

Fitash Ul Haq, Jordi Cabot

2603.23142 2026-03-24
AI LLM

DAK-UCB: Diversity-Aware Prompt Routing for LLMs and Generative Models

The expansion of generative AI and LLM services underscores the growing need for adaptive mechanisms to select an appropriate available model to respond to a user's prompts. Recent works have propo...

Donya Jafari, Farzan Farnia

2603.23140 2026-03-24
AI LLM

HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature

Automated knowledge graph (KG) construction is essential for navigating the rapidly expanding body of scientific literature. However, existing approaches struggle to recognize long multi-word entit...

Devvrat Joshi, Islem Rekik

2603.23136 2026-03-24
TESTING

Polaris: A Gödel Agent Framework for Small Language Models through Experience-Abstracted Policy Repair

Gödel agents realize recursive self-improvement: an agent inspects its own policy and traces and then modifies that policy in a tested loop. We introduce Polaris, a Gödel agent for compact models th...

Aditya Kakade, Vivek Srivastava, Shirish Karande

2603.23129 2026-03-24
TESTING

Agentic Verifier-in-the-Loop Solver Orchestration for Cell-Free Massive MIMO Downlink Power Control

Cell-free massive multiple-input multiple-output (MIMO) systems can provide uniformly strong service through distributed access points, but performance still depends critically on downlink power co...

Zhichao Gao

2603.23128 2026-03-24
AI LLM

From Questions to Trust Reports: A LLM-IR Framework for the TREC 2025 DRAGUN Track

The DRAGUN Track at TREC 2025 targets the growing need for effective support tools that help users evaluate the trustworthiness of online news. We describe the UR_Trecking system submitted for both...

Ignacy Alwasiak, Kene Nnolim, Jaclyn Thi, Samy Ateia, Markus Bink, Gregor Donabauer, David Elswei...

2603.23125 2026-03-24
AI LLM

Automatic Segmentation of 3D CT scans with SAM2 using a zero-shot approach

Foundation models for image segmentation have shown strong generalization in natural images, yet their applicability to 3D medical imaging remains limited. In this work, we study the zero-shot use ...

Miquel Lopez Escoriza, Pau Amargant Alvarez

2603.23116 2026-03-24
AI LLM

AgentFoX: LLM Agent-Guided Fusion with eXplainability for AI-Generated Image Detection

The increasing realism of AI-Generated Images (AIGI) has created an urgent need for forensic tools capable of reliably distinguishing synthetic content from authentic imagery. Existing detectors ar...

Yangxin Yu, Yue Zhou, Bin Li, Kaiqing Lin, Haodong Li, Jiangqun Ni, Bo Cao

2603.23115 2026-03-24
AI LLM

Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment

A human's moral decision depends heavily on the context. Yet research on LLM morality has largely studied fixed scenarios. We address this gap by introducing Contextual MoralChoice, a dataset of mo...

Adrian Sauter, Mona Schirmer

2603.23114 2026-03-24
TESTING

Power System Studies Using Open-Access Software

The use of open-access software is an option that can be considered by those interested in power system studies. In addition, the combination of two or more of these tools can expand the capabiliti...

Juan A. Martinez-Velasco, Pau Casals-Torrens, Ricard Bosch-Tous, Alexandre Serrano-Fontova

2603.23103 2026-03-24
AI LLM

SpecXMaster Technical Report

Intelligent spectroscopy serves as a pivotal element in AI-driven closed-loop scientific discovery, functioning as the critical bridge between matter structure and artificial intelligence. However,...

Yutang Ge, Yaning Cui, Hanzheng Li, Jun-Jie Wang, Fanjie Xu, Jinhan Dong, Yongqi Jin, Dongxu Cui,...

2603.23101 2026-03-24
TESTING

DSO Led-Bilevel Optimization Framework for TSO-DSO Coordination across Active Distribution Networks

This work presents a bilevel coordination model that captures the hierarchical interaction between the transmission and distribution layers under a Distribution System Operator (DSO)-led configurati...

Fernando García-Muñoz, Martín Venegas Escalona

2603.23099 2026-03-24
AI LLM

When Language Models Lose Their Mind: The Consequences of Brain Misalignment

While brain-aligned large language models (LLMs) have garnered attention for their potential as cognitive models and for enhanced safety and trustworthiness in AI, the role of this br...

Gabriele Merlin, Mariya Toneva

2603.23091 2026-03-24