Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
AI LLM

Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions

This study provides a cross-disciplinary examination of Explainable Artificial Intelligence (XAI) approaches, focusing on deep neural networks (DNNs) and large language models (LLMs), and identifies ...

Saleh Afroogh, Seyd Ishtiaque Ahmed, Petra Ahrweiler, David Alvarez-Melis, Mansur Maturidi Arief,...

2602.24176 2026-02-27
AI LLM

LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics

We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textboo...

Antoine Peyronnet, Fabian Gloeckle, Amaury Hayat

2602.24173 2026-02-27
AI LLM

ArgLLM-App: An Interactive System for Argumentative Reasoning with Large Language Models

Argumentative LLMs (ArgLLMs) are an existing approach leveraging Large Language Models (LLMs) and computational argumentation for decision-making, with the aim of making the resulting decisions fai...

Adam Dejl, Deniz Gorur, Francesca Toni

2602.24172 2026-02-27
TESTING

Hypothesis Testing over Observable Regimes in Singular Models

Hypothesis testing in singular statistical models is often regarded as inherently problematic due to non-identifiability and degeneracy of the Fisher information. We show that the fundamental obstr...

Sean Plummer

2602.24165 2026-02-27
AI LLM

What You Read is What You Classify: Highlighting Attributions to Text and Text-Like Inputs

At present, there are no easily understood explainable artificial intelligence (AI) methods for discrete token inputs, like text. Most explainable AI techniques do not extend well to token sequence...

Daniel S. Berman, Brian Merritt, Stanley Ta, Dana Udwin, Amanda Ernlund, Jeremy Ratcliff, Vijay N...

2602.24149 2026-02-27
TESTING

Robust Skills, Brittle Grounding: Diagnosing Restricted Generalization in Vision-Language Action Policies via Multi-Object Picking

Vision-language action (VLA) policies often report strong manipulation benchmark performance with relatively few demonstrations, but it remains unclear whether this reflects robust language-to-obje...

David Emukpere, Romain Deffayet, Jean-Michel Renders

2602.24143 2026-02-27
AI LLM

"Make It Sound Like a Lawyer Wrote It": Scenarios of Potential Impacts of Generative AI for Legal Conflict Resolution

Generative AI (GenAI) tools are transforming critical societal domains, including the legal sector. While these tools create opportunities such as increased efficiency and potential improvements in...

Kimon Kieslich, Natali Helberger, Nicholas Diakopoulos

2602.24130 2026-02-27
AI LLM

Terminology Rarity Predicts Catastrophic Failure in LLM Translation of Low-Resource Ancient Languages: Evidence from Ancient Greek

This study presents the first systematic, reference-free human evaluation of large language model (LLM) machine translation (MT) for Ancient Greek (AG) technical prose. We evaluate translations by ...

James L. Zainaldin, Cameron Pattison, Manuela Marai, Jacob Wu, Mark J. Schiefsky

2602.24119 2026-02-27
AI LLM

Agentic AI-RAN: Enabling Intent-Driven, Explainable and Self-Evolving Open RAN Intelligence

Open RAN (O-RAN) exposes rich control and telemetry interfaces across the Non-RT RIC, Near-RT RIC, and distributed units, but also makes it harder to operate multi-tenant, multi-objective RANs in a...

Zhizhou He, Yang Luo, Xinkai Liu, Mahdi Boloursaz Mashhadi, Mohammad Shojafar, Merouane Debbah, R...

2602.24115 2026-02-27
TESTING

Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification

Vision-language models (VLMs) show promise in drafting radiology reports, yet they frequently suffer from logical inconsistencies, generating diagnostic impressions unsupported by their own percept...

Vikash Singh, Debargha Ganguly, Haotian Yu, Chengwei Zhou, Prerna Singh, Brandon Lee, Vipin Chaud...

2602.24111 2026-02-27
TESTING

Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the complex reasoning capabilities of Large Reasoning Models. However, standard outcome-based ...

Yanwei Ren, Haotian Zhang, Likang Xiao, Xikai Zhang, Jiaxing Huang, Jiayan Qiu, Baosheng Yu, Quan...

2602.24110 2026-02-27
AI LLM

ARGUS: Seeing the Influence of Narrative Features on Persuasion in Argumentative Texts

Can narratives make arguments more persuasive? And to this end, which narrative features matter most? Although stories are often seen as powerful tools for persuasion, their specific role in online...

Sara Nabhani, Federico Pianzola, Khalid Al-Khatib, Malvina Nissim

2602.24109 2026-02-27
TESTING

Context-Aware Functional Test Generation via Business Logic Extraction and Adaptation

Functional testing is essential for verifying that the business logic of mobile applications aligns with user requirements, serving as the primary methodology for quality assurance in software deve...

Yakun Zhang, Zihan Wang, Xinzhi Peng, Zihao Xie, Xiaodong Wang, Xutao Li, Dan Hao, Lu Zhang, Yunm...

2602.24108 2026-02-27
AI LLM

Artificial Agency Program: Curiosity, compression, and communication in agents

This paper presents the Artificial Agency Program (AAP), a position and research agenda for building AI systems as reality embedded, resource-bounded agents whose development is driven by curiosity...

Richard Csaky

2602.24100 2026-02-27
AI LLM

Bi-level RL-Heuristic Optimization for Real-world Winter Road Maintenance

Winter road maintenance is critical for ensuring public safety and reducing environmental impacts, yet existing methods struggle to manage large-scale routing problems effectively and mostly rely ...

Yue Xie, Zizhen Xu, William Beazley, Fumiya Iida

2602.24097 2026-02-27
AI LLM

The impacts of artificial intelligence on environmental sustainability and human well-being

Artificial Intelligence (AI) is changing the world, but its impacts on the environment and human well-being remain uncertain. We conducted a systematic literature review of 1,291 studies selected f...

Noemi Luna Carmeno, Tiago Domingos, Daniel W. O'Neill

2602.24091 2026-02-27
AI LLM

Precision Studies and Searches for CP Asymmetries in the Inclusive Decay $\Lambda_{c}^{+}\to \Lambda X$

Based on $e^+e^-$ annihilation data collected with the BESIII detector at center-of-mass energies from 4.600 to 4.699 GeV, corresponding to an integrated luminosity of 4.5 fb$^{-1}$, we present the...

BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, C. S. Akondi, R. Alibert...

2602.24089 2026-02-27
AI LLM

Shaping the Digital Future of ErUM Research: Sustainability & Ethics

This workshop report from "Shaping the Digital Future of ErUM Research: Sustainability & Ethics" (Aachen, 2025) reviews progress on sustainability measures in data-intensive ErUM-Data research sinc...

Luca Di Bella, Jan Bürger, Markus Demleitner, Torsten Enßlin, Johannes Erdmann, Martin Erdmann, B...

2602.24087 2026-02-27
AI LLM

The Subjectivity of Monoculture

Machine learning models -- including large language models (LLMs) -- are often said to exhibit monoculture, where outputs agree strikingly often. But what does it actually mean for models to agree ...

Nathanael Jo, Nikhil Garg, Manish Raghavan

2602.24086 2026-02-27
AI LLM

Preference Packing: Efficient Preference Optimization for Large Language Models

Resource-efficient training optimization techniques are becoming increasingly important as the size of large language models (LLMs) continues to grow. In particular, batch packing is commonly used ...

Jaekyung Cho

2602.24082 2026-02-27