Papers
Research papers from arXiv and related sources
Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions
This study provides a cross-disciplinary examination of Explainable Artificial Intelligence (XAI) approaches, focusing on deep neural networks (DNNs) and large language models (LLMs), and identifies ...
Saleh Afroogh, Seyd Ishtiaque Ahmed, Petra Ahrweiler, David Alvarez-Melis, Mansur Maturidi Arief,...
LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics
We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textboo...
Antoine Peyronnet, Fabian Gloeckle, Amaury Hayat
ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models
Argumentative LLMs (ArgLLMs) are an existing approach leveraging Large Language Models (LLMs) and computational argumentation for decision-making, with the aim of making the resulting decisions fai...
Adam Dejl, Deniz Gorur, Francesca Toni
Hypothesis Testing over Observable Regimes in Singular Models
Hypothesis testing in singular statistical models is often regarded as inherently problematic due to non-identifiability and degeneracy of the Fisher information. We show that the fundamental obstr...
Sean Plummer
What You Read is What You Classify: Highlighting Attributions to Text and Text-Like Inputs
At present, there are no easily understood explainable artificial intelligence (AI) methods for discrete token inputs, like text. Most explainable AI techniques do not extend well to token sequence...
Daniel S. Berman, Brian Merritt, Stanley Ta, Dana Udwin, Amanda Ernlund, Jeremy Ratcliff, Vijay N...
Robust Skills, Brittle Grounding: Diagnosing Restricted Generalization in Vision-Language Action Policies via Multi-Object Picking
Vision-language action (VLA) policies often report strong manipulation benchmark performance with relatively few demonstrations, but it remains unclear whether this reflects robust language-to-obje...
David Emukpere, Romain Deffayet, Jean-Michel Renders
"Make It Sound Like a Lawyer Wrote It": Scenarios of Potential Impacts of Generative AI for Legal Conflict Resolution
Generative AI (GenAI) tools are transforming critical societal domains, including the legal sector. While these tools create opportunities such as increased efficiency and potential improvements in...
Kimon Kieslich, Natali Helberger, Nicholas Diakopoulos
Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek
This study presents the first systematic, reference-free human evaluation of large language model (LLM) machine translation (MT) for Ancient Greek (AG) technical prose. We evaluate translations by ...
James L. Zainaldin, Cameron Pattison, Manuela Marai, Jacob Wu, Mark J. Schiefsky
Agentic AI-RAN: Enabling Intent-Driven, Explainable and Self-Evolving Open RAN Intelligence
Open RAN (O-RAN) exposes rich control and telemetry interfaces across the Non-RT RIC, Near-RT RIC, and distributed units, but also makes it harder to operate multi-tenant, multi-objective RANs in a...
Zhizhou He, Yang Luo, Xinkai Liu, Mahdi Boloursaz Mashhadi, Mohammad Shojafar, Merouane Debbah, R...
Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification
Vision-language models (VLMs) show promise in drafting radiology reports, yet they frequently suffer from logical inconsistencies, generating diagnostic impressions unsupported by their own percept...
Vikash Singh, Debargha Ganguly, Haotian Yu, Chengwei Zhou, Prerna Singh, Brandon Lee, Vipin Chaud...
Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance
Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the complex reasoning capabilities of Large Reasoning Models. However, standard outcome-based ...
Yanwei Ren, Haotian Zhang, Likang Xiao, Xikai Zhang, Jiaxing Huang, Jiayan Qiu, Baosheng Yu, Quan...
ARGUS: Seeing the Influence of Narrative Features on Persuasion in Argumentative Texts
Can narratives make arguments more persuasive? And to this end, which narrative features matter most? Although stories are often seen as powerful tools for persuasion, their specific role in online...
Sara Nabhani, Federico Pianzola, Khalid Al-Khatib, Malvina Nissim
Context-Aware Functional Test Generation via Business Logic Extraction and Adaptation
Functional testing is essential for verifying that the business logic of mobile applications aligns with user requirements, serving as the primary methodology for quality assurance in software deve...
Yakun Zhang, Zihan Wang, Xinzhi Peng, Zihao Xie, Xiaodong Wang, Xutao Li, Dan Hao, Lu Zhang, Yunm...
Artificial Agency Program: Curiosity, compression, and communication in agents
This paper presents the Artificial Agency Program (AAP), a position and research agenda for building AI systems as reality-embedded, resource-bounded agents whose development is driven by curiosity...
Richard Csaky
Bi-level RL-Heuristic Optimization for Real-world Winter Road Maintenance
Winter road maintenance is critical for ensuring public safety and reducing environmental impacts, yet existing methods struggle to manage large-scale routing problems effectively and mostly rely ...
Yue Xie, Zizhen Xu, William Beazley, Fumiya Iida
The impacts of artificial intelligence on environmental sustainability and human well-being
Artificial Intelligence (AI) is changing the world, but its impacts on the environment and human well-being remain uncertain. We conducted a systematic literature review of 1,291 studies selected f...
Noemi Luna Carmeno, Tiago Domingos, Daniel W. O'Neill
Precision Studies and Searches for CP Asymmetries in the Inclusive Decay $\Lambda_{c}^{+}\to \Lambda X$
Based on $e^+e^-$ annihilation data collected with the BESIII detector at center-of-mass energies from 4.600 to 4.699 GeV, corresponding to an integrated luminosity of 4.5 fb$^{-1}$, we present the...
BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, C. S. Akondi, R. Alibert...
Shaping the Digital Future of ErUM Research: Sustainability & Ethics
This workshop report from "Shaping the Digital Future of ErUM Research: Sustainability & Ethics" (Aachen, 2025) reviews progress on sustainability measures in data-intensive ErUM-Data research sinc...
Luca Di Bella, Jan Bürger, Markus Demleitner, Torsten Enßlin, Johannes Erdmann, Martin Erdmann, B...
The Subjectivity of Monoculture
Machine learning models, including large language models (LLMs), are often said to exhibit monoculture, where outputs agree strikingly often. But what does it actually mean for models to agree ...
Nathanael Jo, Nikhil Garg, Manish Raghavan
Preference Packing: Efficient Preference Optimization for Large Language Models
Resource-efficient training optimization techniques are becoming increasingly important as the size of large language models (LLMs) continues to grow. In particular, batch packing is commonly used ...
Jaekyung Cho