Papers
Research papers from arXiv and related sources
End-to-End Chatbot Evaluation with Adaptive Reasoning and Uncertainty Filtering
Large language models (LLMs) combined with retrieval augmented generation have enabled the deployment of domain-specific chatbots, but these systems remain prone to generating unsupported or incorr...
Nhi Dang, Tung Le, Huy Tien Nguyen
TacLoc: Global Tactile Localization on Objects from a Registration Perspective
Pose estimation is essential for robotic manipulation, particularly when visual perception is occluded during gripper-object interactions. Existing tactile-based methods generally rely on tactile s...
Zirui Zhang, Boyang Zhang, Fumin Zhang, Huan Yin
Adaptive RAN Slicing Control via Reward-Free Self-Finetuning Agents
The integration of Generative AI models into AI-native network systems offers a transformative path toward achieving autonomous and adaptive control. However, the application of such models to cont...
Yuanhao Li, Haozhe Wang, Geyong Min, Nektarios Georgalas, Wang Miao
PET-F2I: A Comprehensive Benchmark and Parameter-Efficient Fine-Tuning of LLMs for PET/CT Report Impression Generation
PET/CT imaging is pivotal in oncology and nuclear medicine, yet summarizing complex findings into precise diagnostic impressions is labor-intensive. While LLMs have shown promise in medical text ge...
Yuchen Liu, Wenbo Zhang, Liling Peng, Yichi Zhang, Yu Fu, Xin Guo, Chao Qu, Yuan Qi, Le Xue
A Bipartite Graph Approach to U.S.-China Cross-Market Return Forecasting
This paper studies cross-market return predictability through a machine learning framework that preserves economic structure. Exploiting the non-overlapping trading hours of the U.S. and Chinese eq...
Jing Liu, Maria Grith, Xiaowen Dong, Mihai Cucuringu
FP-Predictor - False Positive Prediction for Static Analysis Reports
Static Application Security Testing (SAST) tools play a vital role in modern software development by automatically detecting potential vulnerabilities in source code. However, their effectiveness i...
Tom Ohlmer, Michael Schlichtig, Eric Bodden
CD-Raft: Reducing the Latency of Distributed Consensus in Cross-Domain Sites
Today's massive AI computation loads push heavy data synchronization across sites, i.e., nodes in data centers. Any reduction in such consensus latency can significantly improve the overall perform...
Yangyang Wang, Ziqian Cheng, Yucong Dong, Zichen Xu
Towards Cognitive Defect Analysis in Active Infrared Thermography with Vision-Text Cues
Active infrared thermography (AIRT) is currently witnessing a surge of artificial intelligence (AI) methodologies being deployed for automated subsurface defect analysis of high performance carbon ...
Mohammed Salah, Eman Ouda, Giuseppe Dell'Avvocato, Fabrizio Sarasini, Ester D'Accardi, Jorge Dias...
Automatic End-to-End Data Integration using Large Language Models
Designing data integration pipelines typically requires substantial manual effort from data engineers to configure pipeline components and label training data. While LLMs have shown promise in hand...
Aaron Steiner, Christian Bizer
Prompting with the human-touch: evaluating model-sensitivity of foundation models for musculoskeletal CT segmentation
Promptable Foundation Models (FMs), initially introduced for natural image segmentation, have also revolutionized medical image segmentation. The increasing number of models, along with evaluations...
Caroline Magg, Maaike A. ter Wee, Johannes G. G. Dobbe, Geert J. Streekstra, Leendert Blankevoort...
Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning
Reinforcement learning significantly enhances LLM capabilities but suffers from a critical issue: length inflation, where models adopt verbosity or inefficient reasoning to maximize rewards. Prior ...
Zichao Li, Jie Lou, Fangchen Dong, Zhiyuan Fan, Mengjie Ren, Hongyu Lin, Xianpei Han, Debing Zhan...
AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations
We present the AILS-NTUA system for SemEval-2026 Task 8 (MTRAGEval), addressing all three subtasks of multi-turn retrieval-augmented generation: passage retrieval (A), reference-grounded response g...
Dimosthenis Athanasiou, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorgos St...
IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs
Instruction hierarchy (IH) defines how LLMs prioritize system, developer, user, and tool instructions under conflict, providing a concrete, trust-ordered policy for resolving instruction conflicts....
Chuan Guo, Juan Felipe Ceron Uribe, Sicheng Zhu, Christopher A. Choquette-Choo, Steph Lin, Nikhil...
Resource-constrained Amazons chess decision framework integrating large language models and graph attention
Artificial intelligence has advanced significantly through the development of intelligent game-playing systems, providing rigorous testbeds for decision-making, strategic planning, and adaptive lea...
Tianhao Qian, Zhuoxuan Li, Jinde Cao, Xinli Shi, Hanjie Liu, Leszek Rutkowski
Safe and Scalable Web Agent Learning via Recreated Websites
Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We pro...
Hyungjoo Chae, Jungsoo Park, Alan Ritter
Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection
Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabili...
Sunpill Kim, Chanwoo Hwang, Minsu Kim, Jae Hong Seo
Efficiency vs Demand in AI Electricity: Implications for Post-AGI Scaling
As AI capabilities and deployment accelerate toward a post-AGI era, concerns are growing about electricity demand and carbon emissions from AI computing, yet it is rarely represented explicitly in ...
Doyi Kim, Jiseok Ahn, Haewon McJeon, Changick Kim
VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization
Brief Hospital Course (BHC) narratives must be clinically useful yet faithful to fragmented EHR evidence. LLM-based clinical summarizers still introduce unsupported statements, and alignment can en...
Weixin Liu, Congning Ni, Qingyuan Song, Susannah L. Rose, Christopher Symons, Murat Kantarcioglu,...
Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent
We present PULSE, a medical reasoning agent that combines a domain-tuned large language model with scientific literature retrieval to support diagnostic decision-making in complex real-world cases....
Zhongzhen Huang, Yan Ling, Hong Chen, Ye Feng, Li Wu, Linjie Mu, Shaoting Zhang, Xiaofan Zhang, K...
From Verification to Herding: Exploiting Software's Sparsity of Influence
Software verification is now costly, taking over half the project effort while failing on modern complex systems. We hence propose a shift from verification and modeling to herding: treating testin...
Tim Menzies, Kishan Kumar Ganguly