Papers
Research papers from arXiv and related sources
From Entropy to Calibrated Uncertainty: Training Language Models to Reason About Uncertainty
Large Language Models (LLMs) that can express interpretable and calibrated uncertainty are crucial in high-stakes domains. While methods to compute uncertainty post-hoc exist, they are often sampli...
Azza Jenane, Nassim Walha, Lukas Kuhn, Florian Buettner
The EpisTwin: A Knowledge Graph-Grounded Neuro-Symbolic Architecture for Personal AI
Personal Artificial Intelligence is currently hindered by the fragmentation of user data across isolated silos. While Retrieval-Augmented Generation offers a partial remedy, its reliance on unstruc...
Giovanni Servedio, Potito Aghilar, Alessio Mattiace, Gianni Carmosino, Francesco Musicco, Gabriel...
Learning Where the Physics Is: Probabilistic Adaptive Sampling for Stiff PDEs
Modeling stiff partial differential equations (PDEs) with sharp gradients remains a significant challenge for scientific machine learning. While Physics-Informed Neural Networks (PINNs) struggle wi...
Akshay Govind Srinivasan, Balaji Srinivasan
SuperSuit: An Isomorphic Bimodal Interface for Scalable Mobile Manipulation
High-quality, long-horizon demonstrations are essential for embodied AI, yet acquiring such data for tightly coupled wheeled mobile manipulators remains a fundamental bottleneck. Unlike fixed-base ...
Tongqing Chen, Hang Wu, Jiasen Wang, Xiaotao Li, Zhu Jin, Lu Fang
Story Point Estimation Using Large Language Models
This study investigates the use of large language models (LLMs) for story point estimation. Story points are unitless, project-specific effort estimates that help developers on the scrum team forec...
Pranam Prakash Shetty, Adarsh Balakrishnan, Mengqiao Xu, Xiaoyin Xi, Zhe Yu
Stem: Rethinking Causal Information Flow in Sparse Attention
The quadratic computational complexity of self-attention remains a fundamental bottleneck for scaling Large Language Models (LLMs) to long contexts, particularly during the pre-filling phase. In th...
Lin Niu, Xin Luo, Linchuan Xie, Yifu Sun, Guanghua Yu, Jianchen Zhu, S Kevin Zhou
Agentic retrieval-augmented reasoning reshapes collective reliability under model variability in radiology question answering
Agentic retrieval-augmented reasoning pipelines are increasingly used to structure how large language models (LLMs) incorporate external evidence in clinical decision support. These systems iterati...
Mina Farajiamiri, Jeta Sopa, Saba Afza, Lisa Adams, Felix Barajas Ordonez, Tri-Thien Nguyen, Mahs...
Mind the Gap: Pitfalls of LLM Alignment with Asian Public Opinion
Large Language Models (LLMs) are increasingly being deployed in multilingual, multicultural settings, yet their reliance on predominantly English-centric training data risks misalignment with the d...
Hari Shankar, Vedanta S P, Sriharini Margapuri, Debjani Mazumder, Ponnurangam Kumaraguru, Abhijna...
NOVA: Next-step Open-Vocabulary Autoregression for 3D Multi-Object Tracking in Autonomous Driving
Generalizing across unknown targets is critical for open-world perception, yet existing 3D Multi-Object Tracking (3D MOT) pipelines remain limited by closed-set assumptions and "semantic-blind" h...
Kai Luo, Xu Wang, Rui Fan, Kailun Yang
MLLMRec-R1: Incentivizing Reasoning Capability in Large Language Models for Multimodal Sequential Recommendation
Group relative policy optimization (GRPO) has become a standard post-training paradigm for improving reasoning and preference alignment in large language models (LLMs), and has recently shown stron...
Yu Wang, Yonghui Yang, Le Wu, Jiancan Wu, Hefei Xu, Hui Lin
Human, Algorithm, or Both? Gender Bias in Human-Augmented Recruiting
Recent years have seen rapid growth in the market for HR technology and AI-driven HR solutions in particular. This popularity has also resulted in increased attention to the negative aspects of usi...
Mesut Kaya, Toine Bogers
What are AI researchers worried about?
As AI attracts vast investment and attention, there are competing concerns about the technology's opportunities and uncertainties that blend technical and social questions. The public debate, domin...
Cian O'Donovan, Sarp Gurakan, Ananya Karanam, Xiaomeng Wu, Jack Stilgoe
Conversational Demand Response: Bidirectional Aggregator-Prosumer Coordination through Agentic AI
Residential demand response depends on sustained prosumer participation, yet existing coordination is either fully automated, or limited to one-way dispatch signals and price alerts that offer litt...
Reda El Makroum, Sebastian Zwickl-Bernhard, Lukas Kranzl, Hans Auer
LIT-RAGBench: Benchmarking Generator Capabilities of Large Language Models in Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is a framework in which a Generator, such as a Large Language Model (LLM), produces answers by retrieving documents from an external collection using a Retrieve...
Koki Itai, Shunichi Hasegawa, Yuta Yamamoto, Gouki Minegishi, Masaki Otsuki
Wisdom of the AI Crowd (AI-CROWD) for Ground Truth Approximation in Content Analysis: A Research Protocol & Validation Using Eleven Large Language Models
Large-scale content analysis is increasingly limited by the absence of observable ground truth or gold-standard labels, as creating such benchmarks through extensive human coding becomes impractica...
Luis de-Marcos, Manuel Goyanes, Adrián Domínguez-Díaz
Transformer-Based Pulse Shape Discrimination in HPGe Detectors with Masked Autoencoder Pre-training
Pulse-shape discrimination (PSD) in high-purity germanium (HPGe) detectors is central to rare-event searches such as neutrinoless double-beta decay (0νββ), yet conventional approaches compress each...
Marta Babicz, Saúl Alonso-Monsalve, Alain Fauquex, Laura Baudis
CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation
We introduce CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation that assesses reports based on diagnostic correctness, contextual relevance, and patient safety. U...
Mohammed Baharoon, Thibault Heintz, Siavash Raissi, Mahmoud Alabbad, Mona Alhammad, Hassan AlOmai...
Towards Motion Turing Test: Evaluating Human-Likeness in Humanoid Robots
Humanoid robots have achieved significant progress in motion generation and control, exhibiting movements that appear increasingly natural and human-like. Inspired by the Turing Test, we propose th...
Mingzhe Li, Mengyin Liu, Zekai Wu, Xincheng Lin, Junsheng Zhang, Ming Yan, Zengye Xie, Changwang ...
Partial Policy Gradients for RL in LLMs
Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for modeling policy structure in policy gradients. The key idea is to...
Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai
Making Implicit Premises Explicit in Logical Understanding of Enthymemes
Real-world arguments in text and dialogues are normally enthymemes (i.e. some of their premises and/or claims are implicit). Natural language processing (NLP) methods for handling enthymemes can po...
Xuyao Feng, Anthony Hunter