Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
AI LLM

End-to-End Chatbot Evaluation with Adaptive Reasoning and Uncertainty Filtering

Large language models (LLMs) combined with retrieval augmented generation have enabled the deployment of domain-specific chatbots, but these systems remain prone to generating unsupported or incorr...

Nhi Dang, Tung Le, Huy Tien Nguyen

2603.10570 2026-03-11
TESTING

TacLoc: Global Tactile Localization on Objects from a Registration Perspective

Pose estimation is essential for robotic manipulation, particularly when visual perception is occluded during gripper-object interactions. Existing tactile-based methods generally rely on tactile s...

Zirui Zhang, Boyang Zhang, Fumin Zhang, Huan Yin

2603.10565 2026-03-11
AI LLM

Adaptive RAN Slicing Control via Reward-Free Self-Finetuning Agents

The integration of Generative AI models into AI-native network systems offers a transformative path toward achieving autonomous and adaptive control. However, the application of such models to cont...

Yuanhao Li, Haozhe Wang, Geyong Min, Nektarios Georgalas, Wang Miao

2603.10564 2026-03-11
AI LLM

PET-F2I: A Comprehensive Benchmark and Parameter-Efficient Fine-Tuning of LLMs for PET/CT Report Impression Generation

PET/CT imaging is pivotal in oncology and nuclear medicine, yet summarizing complex findings into precise diagnostic impressions is labor-intensive. While LLMs have shown promise in medical text ge...

Yuchen Liu, Wenbo Zhang, Liling Peng, Yichi Zhang, Yu Fu, Xin Guo, Chao Qu, Yuan Qi, Le Xue

2603.10560 2026-03-11
TESTING

A Bipartite Graph Approach to U.S.-China Cross-Market Return Forecasting

This paper studies cross-market return predictability through a machine learning framework that preserves economic structure. Exploiting the non-overlapping trading hours of the U.S. and Chinese eq...

Jing Liu, Maria Grith, Xiaowen Dong, Mihai Cucuringu

2603.10559 2026-03-11
TESTING

FP-Predictor - False Positive Prediction for Static Analysis Reports

Static Application Security Testing (SAST) tools play a vital role in modern software development by automatically detecting potential vulnerabilities in source code. However, their effectiveness i...

Tom Ohlmer, Michael Schlichtig, Eric Bodden

2603.10558 2026-03-11
AI LLM

CD-Raft: Reducing the Latency of Distributed Consensus in Cross-Domain Sites

Today's massive AI computation loads push heavy data synchronization across sites, i.e., nodes in data centers. Any reduction in such consensus latency can significantly improve the overall perform...

Yangyang Wang, Ziqian Cheng, Yucong Dong, Zichen Xu

2603.10555 2026-03-11
AI LLM

Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues

Active infrared thermography (AIRT) is currently witnessing a surge of artificial intelligence (AI) methodologies being deployed for automated subsurface defect analysis of high performance carbon ...

Mohammed Salah, Eman Ouda, Giuseppe Dell'Avvocato, Fabrizio Sarasini, Ester D'Accardi, Jorge Dias...

2603.10549 2026-03-11
AI LLM

Automatic End-to-End Data Integration using Large Language Models

Designing data integration pipelines typically requires substantial manual effort from data engineers to configure pipeline components and label training data. While LLMs have shown promise in hand...

Aaron Steiner, Christian Bizer

2603.10547 2026-03-11
AI LLM

Prompting with the human-touch: evaluating model-sensitivity of foundation models for musculoskeletal CT segmentation

Promptable Foundation Models (FMs), initially introduced for natural image segmentation, have also revolutionized medical image segmentation. The increasing number of models, along with evaluations...

Caroline Magg, Maaike A. ter Wee, Johannes G. G. Dobbe, Geert J. Streekstra, Leendert Blankevoort...

2603.10541 2026-03-11
AI LLM

Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning

Reinforcement learning significantly enhances LLM capabilities but suffers from a critical issue: length inflation, where models adopt verbosity or inefficient reasoning to maximize rewards. Prior ...

Zichao Li, Jie Lou, Fangchen Dong, Zhiyuan Fan, Mengjie Ren, Hongyu Lin, Xianpei Han, Debing Zhan...

2603.10535 2026-03-11
AI LLM

AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations

We present the AILS-NTUA system for SemEval-2026 Task 8 (MTRAGEval), addressing all three subtasks of multi-turn retrieval-augmented generation: passage retrieval (A), reference-grounded response g...

Dimosthenis Athanasiou, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorgos St...

2603.10524 2026-03-11
AI LLM

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

Instruction hierarchy (IH) defines how LLMs prioritize system, developer, user, and tool instructions under conflict, providing a concrete, trust-ordered policy for resolving instruction conflicts....

Chuan Guo, Juan Felipe Ceron Uribe, Sicheng Zhu, Christopher A. Choquette-Choo, Steph Lin, Nikhil...

2603.10521 2026-03-11
AI LLM

Resource-constrained Amazons chess decision framework integrating large language models and graph attention

Artificial intelligence has advanced significantly through the development of intelligent game-playing systems, providing rigorous testbeds for decision-making, strategic planning, and adaptive lea...

Tianhao Qian, Zhuoxuan Li, Jinde Cao, Xinli Shi, Hanjie Liu, Leszek Rutkowski

2603.10512 2026-03-11
AI LLM

Safe and Scalable Web Agent Learning via Recreated Websites

Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We pro...

Hyungjoo Chae, Jungsoo Park, Alan Ritter

2603.10505 2026-03-11
AI LLM

Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection

Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabili...

Sunpill Kim, Chanwoo Hwang, Minsu Kim, Jae Hong Seo

2603.10504 2026-03-11
AI LLM

Efficiency vs Demand in AI Electricity: Implications for Post-AGI Scaling

As AI capabilities and deployment accelerate toward a post-AGI era, concerns are growing about electricity demand and carbon emissions from AI computing, yet it is rarely represented explicitly in ...

Doyi Kim, Jiseok Ahn, Haewon McJeon, Changick Kim

2603.10498 2026-03-11
AI LLM

VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization

Brief Hospital Course (BHC) narratives must be clinically useful yet faithful to fragmented EHR evidence. LLM-based clinical summarizers still introduce unsupported statements, and alignment can en...

Weixin Liu, Congning Ni, Qingyuan Song, Susannah L. Rose, Christopher Symons, Murat Kantarcioglu,...

2603.10494 2026-03-11
AI LLM

Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent

We present PULSE, a medical reasoning agent that combines a domain-tuned large language model with scientific literature retrieval to support diagnostic decision-making in complex real-world cases....

Zhongzhen Huang, Yan Ling, Hong Chen, Ye Feng, Li Wu, Linjie Mu, Shaoting Zhang, Xiaofan Zhang, K...

2603.10492 2026-03-11
TESTING

From Verification to Herding: Exploiting Software's Sparsity of Influence

Software verification is now costly, taking over half the project effort while failing on modern complex systems. We hence propose a shift from verification and modeling to herding: treating testin...

Tim Menzies, Kishan Kumar Ganguly

2603.10478 2026-03-11