Research

Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
AI LLM

From Entropy to Calibrated Uncertainty: Training Language Models to Reason About Uncertainty

Large Language Models (LLMs) that can express interpretable and calibrated uncertainty are crucial in high-stakes domains. While methods to compute uncertainty post-hoc exist, they are often sampli...

Azza Jenane, Nassim Walha, Lukas Kuhn, Florian Buettner

2603.06317 2026-03-06
AI LLM

The EpisTwin: A Knowledge Graph-Grounded Neuro-Symbolic Architecture for Personal AI

Personal Artificial Intelligence is currently hindered by the fragmentation of user data across isolated silos. While Retrieval-Augmented Generation offers a partial remedy, its reliance on unstruc...

Giovanni Servedio, Potito Aghilar, Alessio Mattiace, Gianni Carmosino, Francesco Musicco, Gabriel...

2603.06290 2026-03-06
AI LLM

Learning Where the Physics Is: Probabilistic Adaptive Sampling for Stiff PDEs

Modeling stiff partial differential equations (PDEs) with sharp gradients remains a significant challenge for scientific machine learning. While Physics-Informed Neural Networks (PINNs) struggle wi...

Akshay Govind Srinivasan, Balaji Srinivasan

2603.06287 2026-03-06
AI LLM

SuperSuit: An Isomorphic Bimodal Interface for Scalable Mobile Manipulation

High-quality, long-horizon demonstrations are essential for embodied AI, yet acquiring such data for tightly coupled wheeled mobile manipulators remains a fundamental bottleneck. Unlike fixed-base ...

Tongqing Chen, Hang Wu, Jiasen Wang, Xiaotao Li, Zhu Jin, Lu Fang

2603.06280 2026-03-06
AI LLM

Story Point Estimation Using Large Language Models

This study investigates the use of large language models (LLMs) for story point estimation. Story points are unitless, project-specific effort estimates that help developers on the scrum team forec...

Pranam Prakash Shetty, Adarsh Balakrishnan, Mengqiao Xu, Xiaoyin Xi, Zhe Yu

2603.06276 2026-03-06
AI LLM

Stem: Rethinking Causal Information Flow in Sparse Attention

The quadratic computational complexity of self-attention remains a fundamental bottleneck for scaling Large Language Models (LLMs) to long contexts, particularly during the pre-filling phase. In th...

Lin Niu, Xin Luo, Linchuan Xie, Yifu Sun, Guanghua Yu, Jianchen Zhu, S Kevin Zhou

2603.06274 2026-03-06
AI LLM

Agentic retrieval-augmented reasoning reshapes collective reliability under model variability in radiology question answering

Agentic retrieval-augmented reasoning pipelines are increasingly used to structure how large language models (LLMs) incorporate external evidence in clinical decision support. These systems iterati...

Mina Farajiamiri, Jeta Sopa, Saba Afza, Lisa Adams, Felix Barajas Ordonez, Tri-Thien Nguyen, Mahs...

2603.06271 2026-03-06
AI LLM

Mind the Gap: Pitfalls of LLM Alignment with Asian Public Opinion

Large Language Models (LLMs) are increasingly being deployed in multilingual, multicultural settings, yet their reliance on predominantly English-centric training data risks misalignment with the d...

Hari Shankar, Vedanta S P, Sriharini Margapuri, Debjani Mazumder, Ponnurangam Kumaraguru, Abhijna...

2603.06264 2026-03-06
AI LLM

NOVA: Next-step Open-Vocabulary Autoregression for 3D Multi-Object Tracking in Autonomous Driving

Generalizing across unknown targets is critical for open-world perception, yet existing 3D Multi-Object Tracking (3D MOT) pipelines remain limited by closed-set assumptions and "semantic-blind" h...

Kai Luo, Xu Wang, Rui Fan, Kailun Yang

2603.06254 2026-03-06
AI LLM

MLLMRec-R1: Incentivizing Reasoning Capability in Large Language Models for Multimodal Sequential Recommendation

Group relative policy optimization (GRPO) has become a standard post-training paradigm for improving reasoning and preference alignment in large language models (LLMs), and has recently shown stron...

Yu Wang, Yonghui Yang, Le Wu, Jiancan Wu, Hefei Xu, Hui Lin

2603.06243 2026-03-06
AI LLM

Human, Algorithm, or Both? Gender Bias in Human-Augmented Recruiting

Recent years have seen rapid growth in the market for HR technology and AI-driven HR solutions in particular. This popularity has also resulted in increased attention to the negative aspects of usi...

Mesut Kaya, Toine Bogers

2603.06240 2026-03-06
AI LLM

What are AI researchers worried about?

As AI attracts vast investment and attention, there are competing concerns about the technology's opportunities and uncertainties that blend technical and social questions. The public debate, domin...

Cian O'Donovan, Sarp Gurakan, Ananya Karanam, Xiaomeng Wu, Jack Stilgoe

2603.06223 2026-03-06
AI LLM

Conversational Demand Response: Bidirectional Aggregator-Prosumer Coordination through Agentic AI

Residential demand response depends on sustained prosumer participation, yet existing coordination is either fully automated, or limited to one-way dispatch signals and price alerts that offer litt...

Reda El Makroum, Sebastian Zwickl-Bernhard, Lukas Kranzl, Hans Auer

2603.06217 2026-03-06
AI LLM

LIT-RAGBench: Benchmarking Generator Capabilities of Large Language Models in Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a framework in which a Generator, such as a Large Language Model (LLM), produces answers by retrieving documents from an external collection using a Retrieve...

Koki Itai, Shunichi Hasegawa, Yuta Yamamoto, Gouki Minegishi, Masaki Otsuki

2603.06198 2026-03-06
AI LLM

Wisdom of the AI Crowd (AI-CROWD) for Ground Truth Approximation in Content Analysis: A Research Protocol & Validation Using Eleven Large Language Models

Large-scale content analysis is increasingly limited by the absence of observable ground truth or gold-standard labels, as creating such benchmarks through extensive human coding becomes impractica...

Luis de-Marcos, Manuel Goyanes, Adrián Domínguez-Díaz

2603.06197 2026-03-06
AI LLM

Transformer-Based Pulse Shape Discrimination in HPGe Detectors with Masked Autoencoder Pre-training

Pulse-shape discrimination (PSD) in high-purity germanium (HPGe) detectors is central to rare-event searches such as neutrinoless double-beta decay (0νββ), yet conventional approaches compress each...

Marta Babicz, Saúl Alonso-Monsalve, Alain Fauquex, Laura Baudis

2603.06192 2026-03-06
AI LLM

CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation

We introduce CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation that assesses reports based on diagnostic correctness, contextual relevance, and patient safety. U...

Mohammed Baharoon, Thibault Heintz, Siavash Raissi, Mahmoud Alabbad, Mona Alhammad, Hassan AlOmai...

2603.06183 2026-03-06
AI LLM

Towards Motion Turing Test: Evaluating Human-Likeness in Humanoid Robots

Humanoid robots have achieved significant progress in motion generation and control, exhibiting movements that appear increasingly natural and human-like. Inspired by the Turing Test, we propose th...

Mingzhe Li, Mengyin Liu, Zekai Wu, Xincheng Lin, Junsheng Zhang, Ming Yan, Zengye Xie, Changwang ...

2603.06181 2026-03-06
AI LLM

Partial Policy Gradients for RL in LLMs

Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for modeling policy structure in policy gradients. The key idea is to...

Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai

2603.06138 2026-03-06
AI LLM

Making Implicit Premises Explicit in Logical Understanding of Enthymemes

Real-world arguments in text and dialogues are normally enthymemes (i.e. some of their premises and/or claims are implicit). Natural language processing (NLP) methods for handling enthymemes can po...

Xuyao Feng, Anthony Hunter

2603.06114 2026-03-06