Research

Papers

Research papers from arXiv and related sources

Total: 4513 AI/LLM: 2483 Testing: 2030
AI LLM

Why Does RLAIF Work At All?

Reinforcement Learning from AI Feedback (RLAIF) enables language models to improve by training on their own preference judgments, yet no theoretical account explains why this self-improvement seemi...

Robin Young

2603.03000 2026-03-03
AI LLM

Contextualized Privacy Defense for LLM Agents

LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive defenses, s...

Yule Wen, Yanzhe Zhang, Jianxun Lian, Xiaoyuan Yi, Xing Xie, Diyi Yang

2603.02983 2026-03-03
AI LLM

TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation

Vision-Language Navigation (VLN) presents a unique challenge for Large Vision-Language Models (VLMs) due to their inherent architectural mismatch: VLMs are primarily pretrained on static, disembodi...

Jiaxing Liu, Zexi Zhang, Xiaoyan Li, Boyue Wang, Yongli Hu, Baocai Yin

2603.02972 2026-03-03
AI LLM

Delegation and Verification Under AI

As AI systems enter institutional workflows, workers must decide whether to delegate task execution to AI and how much effort to invest in verifying AI outputs, while institutions evaluate workers ...

Lingxiao Huang, Wenyang Xiao, Nisheeth K. Vishnoi

2603.02961 2026-03-03
AI LLM

Architecting Trust in Artificial Epistemic Agents

Large language models increasingly function as epistemic agents -- entities that can 1) autonomously pursue epistemic goals and 2) actively shape our shared knowledge environment. They curate the i...

Nahema Marchal, Stephanie Chan, Matija Franklin, Manon Revel, Geoff Keeling, Roberta Fischli, Bil...

2603.02960 2026-03-03
AI LLM

Changing Pedagogical Paradigms: Integrating Generative AI in Mathematics to Enhance Digital Literacy through 'Mathematical Battles with AI'

This paper introduces `Math Battles with AI', an innovative competitive format designed at ITMO University to redefine the role of generative AI in mathematics education. Moving away from a purely ...

Maria Moskalenko, Alexander Trifanov, Roman Popkov, Arina Tabieva, Maria Smirnova, Konstantin Pra...

2603.02955 2026-03-03
AI LLM

The Geometry of Learning Under AI Delegation

As AI systems shift from tools to collaborators, a central question is how the skills of humans relying on them change over time. We study this question mathematically by modeling the joint evoluti...

Lingxiao Huang, Nisheeth K. Vishnoi

2603.02950 2026-03-03
AI LLM

SEALing the Gap: A Reference Framework for LLM Inference Carbon Estimation via Multi-Benchmark Driven Embodiment

Large Language Models are rapidly gaining traction in software engineering, yet their growing carbon footprint raises pressing sustainability concerns. While training emissions are substantial, inf...

Priyavanshi Pathania, Rohit Mehra, Vibhu Saujanya Sharma, Vikrant Kaulgud, Tiffani Nevels, Sanjay...

2603.02949 2026-03-03
AI LLM

Reducing Labeling Effort in Architecture Technical Debt Detection through Active Learning and Explainable AI

Self-Admitted Technical Debt (SATD) refers to technical compromises explicitly admitted by developers in natural language artifacts such as code comments, commit messages, and issue trackers. Among...

Edi Sutoyo, Paris Avgeriou, Andrea Capiluppi

2603.02944 2026-03-03
AI LLM

ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization

Recent advancements in reinforcement fine-tuning have significantly improved the reasoning ability of large language models (LLMs). In particular, methods such as group relative policy optimization...

Yang Zhan, Yunhao Li, Zhang Chao, Yuxu Lu, Yan Li

2603.02939 2026-03-03
AI LLM

Beyond One-Size-Fits-All: Adaptive Subgraph Denoising for Zero-Shot Graph Learning with Large Language Models

Graph-based tasks in the zero-shot setting remain a significant challenge due to data scarcity and the inability of traditional Graph Neural Networks (GNNs) to generalize to unseen domains or label...

Fengzhi Li, Liang Zhang, Yuan Zuo, Ruiqing Zhao, YanSong Liu, Yunfei Ma, Fanyu Meng, Junlan Feng

2603.02938 2026-03-03
AI LLM

GloPath: An Entity-Centric Foundation Model for Glomerular Lesion Assessment and Clinicopathological Insights

Glomerular pathology is central to the diagnosis and prognosis of renal diseases, yet the heterogeneity of glomerular morphology and fine-grained lesion patterns remain challenging for current AI a...

Qiming He, Jing Li, Tian Guan, Yifei Ma, Zimo Zhao, Yanxia Wang, Hongjing Chen, Yingming Xu, Shua...

2603.02926 2026-03-03
AI LLM

Eliciting Numerical Predictive Distributions of LLMs Without Autoregression

Large Language Models (LLMs) have recently been successfully applied to regression tasks -- such as time series forecasting and tabular prediction -- by leveraging their in-context learning abiliti...

Julianna Piskorz, Katarzyna Kobalczyk, Mihaela van der Schaar

2603.02913 2026-03-03
AI LLM

Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement

Articulated objects are ubiquitous in daily life. Our goal is to achieve a high-quality reconstruction, segmentation of independent moving parts, and analysis of articulation. Recent methods analys...

Hao Ai, Wenjie Chang, Jianbo Jiao, Ales Leonardis, Ofek Eyal

2603.02910 2026-03-03
AI LLM

Learning to Generate and Extract: A Multi-Agent Collaboration Framework For Zero-shot Document-level Event Arguments Extraction

Document-level event argument extraction (DEAE) is essential for knowledge acquisition, aiming to extract participants of events from documents.In the zero-shot setting, existing methods employ LLM...

Guangjun Zhang, Hu Zhang, Yazhou Han, Yue Fan, Yuhang Shao, Ru Li, Hongye Tan

2603.02909 2026-03-03
AI LLM

SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

In recent years, pre-trained large language models have achieved remarkable success across diverse tasks. Besides the pivotal role of self-supervised pre-training, their effectiveness in downstream...

Qi Zhang, Yifei Wang, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Yisen Wang

2603.02908 2026-03-03
AI LLM

Speech recognition assisted by large language models to command software orally -- Application to an augmented and virtual reality web app for immersive molecular graphics

This project successfully developed, evaluated and integrated a Voice User Interface (VUI) into a web application that we are developing for immersive molecular graphics. Said app provides augmente...

Fabio Cortes Rodriguez, Luciano Abriata

2603.02901 2026-03-03
AI LLM

SpecLoop: An Agentic RTL-to-Specification Framework with Formal Verification Feedback Loop

RTL implementations frequently lack up-to-date or consistent specifications, making comprehension, maintenance, and verification costly and error-prone. While prior work has explored generating spe...

Fu-Chieh Chang, Yu-Hsin Yang, Hung-Ming Huang, Yun-Chia Hsu, Yin-Yu Lin, Ming-Fang Tsai, Chun-Chi...

2603.02895 2026-03-03
AI LLM

Kraken: Higher-order EM Side-Channel Attacks on DNNs in Near and Far Field

The multi-million dollar investment required for modern machine learning (ML) has made large ML models a prime target for theft. In response, the field of model stealing has emerged. Attacks based ...

Peter Horvath, Ilia Shumailov, Lukasz Chmielewski, Lejla Batina, Yuval Yarom

2603.02891 2026-03-03
AI LLM

LLandMark: A Multi-Agent Framework for Landmark-Aware Multimodal Interactive Video Retrieval

The increasing diversity and scale of video data demand retrieval systems capable of multimodal understanding, adaptive reasoning, and domain-specific knowledge integration. This paper presents LLa...

Minh-Chi Phung, Thien-Bao Le, Cam-Tu Tran-Thi, Thu-Dieu Nguyen-Thi, Vu-Hung Dao

2603.02888 2026-03-03