Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

Why AI systems don't learn and what to do about it: Lessons on autonomous learning from cognitive science

We critically examine the limitations of current AI models in achieving autonomous learning and propose a learning architecture inspired by human and animal cognition. The proposed framework integr...

Emmanuel Dupoux, Yann LeCun, Jitendra Malik

2603.15381 2026-03-16
AI LLM

More Test-Time Compute Can Hurt: Overestimation Bias in LLM Beam Search

Wider beam search should improve LLM reasoning, but when should you stop widening? Prior work on beam width selection has focused on inference efficiency (qin2025dsbd; freitag2017beam), witho...

Gal Dalal, Assaf Hallak, Gal Chechik, Yftach Ziser

2603.15377 2026-03-16
AI LLM

Formalizing and validating properties in Asmeta with Large Language Models (Extended Abstract)

Writing temporal logic properties is often a challenging task for users of model-based development frameworks, particularly when translating informal requirements into formal specifications. In thi...

Andrea Bombarda, Silvia Bonfanti, Angelo Gargantini, Nico Pellegrinelli

2603.15375 2026-03-16
AI LLM

GradCFA: A Hybrid Gradient-Based Counterfactual and Feature Attribution Explanation Algorithm for Local Interpretation of Neural Networks

Explainable Artificial Intelligence (XAI) is increasingly essential as AI systems are deployed in critical fields such as healthcare and finance, offering transparency into AI-driven decisions. Two...

Jacob Sanderson, Hua Mao, Wai Lok Woo

2603.15373 2026-03-16
AI LLM

SKILLS: Structured Knowledge Injection for LLM-Driven Telecommunications Operations

As telecommunications operators accelerate adoption of AI-enabled automation, a practical question remains unresolved: can general-purpose large language model (LLM) agents reliably execute telecom...

Ivo Brett

2603.15372 2026-03-16
AI LLM

Brain-Inspired Graph Multi-Agent Systems for LLM Reasoning

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of language tasks, yet complex multi-step reasoning remains a fundamental challenge. While Large Reasoning...

Guangfu Hao, Yuming Dai, Xianzhe Qin, Shan Yu

2603.15371 2026-03-16
AI LLM

CRASH: Cognitive Reasoning Agent for Safety Hazards in Autonomous Driving

As autonomous vehicles (AVs) grow in complexity and diversity, identifying the root causes of operational failures has become increasingly difficult. The heterogeneity of system architectures across manufacturers, ranging...

Erick Silva, Rehana Yasmin, Ali Shoker

2603.15364 2026-03-16
AI LLM

PMAx: An Agentic Framework for AI-Driven Process Mining

Process mining provides powerful insights into organizational workflows, but extracting these insights typically requires expertise in specialized query languages and data science tools. Large Lang...

Anton Antonov, Humam Kourani, Alessandro Berti, Gyunam Park, Wil M. P. van der Aalst

2603.15351 2026-03-16
AI LLM

Intelligent Co-Design: An Interactive LLM Framework for Interior Spatial Design via Multi-Modal Agents

In architectural interior design, miscommunication frequently arises as clients lack design knowledge, while designers struggle to explain complex spatial relationships, leading to delayed timeline...

Ren Jian Lim, Rushi Dai

2603.15341 2026-03-16
AI LLM

The Neuroscience of Transformers

Neuroscience has long informed the development of artificial neural networks, but the success of modern architectures invites, in turn, the converse: can modern networks teach us lessons about brai...

Peter Koenig, Mario Negrello

2603.15339 2026-03-16
AI LLM

PYTHEN: A Flexible Framework for Legal Reasoning in Python

This paper introduces PYTHEN, a novel Python-based framework for defeasible legal reasoning. PYTHEN is designed to model the inherently defeasible nature of legal argumentation, providing a flexibl...

Ha-Thanh Nguyen, Ken Satoh

2603.15317 2026-03-16
AI LLM

CCTU: A Benchmark for Tool Use under Complex Constraints

Solving problems through tool use under explicit constraints constitutes a highly challenging yet unavoidable scenario for large language models (LLMs), requiring capabilities such as function call...

Junjie Ye, Guoqiang Zhang, Wenjie Fu, Tao Gui, Qi Zhang, Xuanjing Huang

2603.15309 2026-03-16
AI LLM

The Impact of AI-Assisted Development on Software Security: A Study of Gemini and Developer Experience

The ongoing shortage of skilled developers, particularly in security-critical software development, has led organizations to increasingly adopt AI-powered development tools to boost productivity an...

Nadine Jost, Benjamin Berens, Manuel Karl, Stefan Albert Horstmann, Martin Johns, Alena Naiakshina

2603.15298 2026-03-16
AI LLM

Evolutionary Transfer Learning for Dragonchess

Dragonchess, a three-dimensional chess variant introduced by Gary Gygax, presents unique strategic and computational challenges that make it an ideal environment for studying the transfer of artifi...

Jim O'Connor, Annika Hoag, Sarah Goyette, Gary B. Parker

2603.15297 2026-03-16
AI LLM

Datasets for Verb Alternations across Languages: BLM Templates and Data Augmentation Strategies

Large language models (LLMs) have shown remarkable performance across various sentence-based linguistic phenomena, yet their ability to capture cross-sentence paradigmatic patterns, such as verb al...

Giuseppe Samo, Paola Merlo

2603.15295 2026-03-16
AI LLM

From Documents to Spans: Code-Centric Learning for LLM-based ICD Coding

ICD coding is a critical yet challenging task in healthcare. Recently, LLM-based methods have demonstrated stronger generalization than discriminative methods in ICD coding. However, fine-tuning LLMs for...

Xu Zhang, Wenxin Ma, Chenxu Wu, Rongsheng Wang, Kun Zhang, S. Kevin Zhou

2603.15270 2026-03-16
AI LLM

Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search

Modern e-commerce search is evolving to resolve complex user intents. While Large Language Models (LLMs) offer strong reasoning, existing LLM-based paradigms face a fundamental blindness-latency di...

Mengxiang Chen, Zhouwei Zhai, Jin Li

2603.15262 2026-03-16
AI LLM

Directional Embedding Smoothing for Robust Vision Language Models

The safety and reliability of vision-language models (VLMs) are a crucial part of deploying trustworthy agentic AI systems. However, VLMs remain vulnerable to jailbreaking attacks that undermine th...

Ye Wang, Jing Liu, Toshiaki Koike-Akino

2603.15259 2026-03-16
AI LLM

SAGE: Multi-Agent Self-Evolution for LLM Reasoning

Reinforcement learning with verifiable rewards improves reasoning in large language models (LLMs), but many methods still rely on large human-labeled datasets. While self-play reduces this dependen...

Yulin Peng, Xinxin Zhu, Chenxing Wei, Nianbo Zeng, Leilei Wang, Ying Tiffany He, F. Richard Yu

2603.15255 2026-03-16
AI LLM

Mechanistic Foundations of Goal-Directed Control

Mechanistic interpretability has transformed the analysis of transformer circuits by decomposing model behavior into competing algorithms, identifying phase transitions during training, and derivin...

Alma Lago

2603.15248 2026-03-16