Research

Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
AI LLM

Understanding the Interplay between LLMs' Utilisation of Parametric and Contextual Knowledge: A keynote at ECIR 2025

Language Models (LMs) acquire parametric knowledge from their training process, embedding it within their weights. The increasing scalability of LMs, however, poses significant challenges for under...

Isabelle Augenstein

2603.09654 2026-03-10
AI LLM

MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants

With the rapid advancement of Large Language Models (LLMs) in code generation, human-AI interaction is evolving from static text responses to dynamic, interactive HTML-based applications, which we ...

Zuhao Zhang, Chengyue Yu, Yuante Li, Chenyi Zhuang, Linjian Mo, Shuai Li

2603.09652 2026-03-10
AI LLM

MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings

Current evaluation frameworks and benchmarks for LLM-powered agents focus on text-chat-driven agents. These frameworks do not expose the persona of the user to the agent, thus operating in a user agnos...

Anupam Purwar, Aditya Choudhary

2603.09643 2026-03-10
AI LLM

PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories: A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution

LLM agents that store knowledge as natural language suffer steep retrieval degradation as condition count grows, often struggle to compose learned rules reliably, and typically lack explicit mechan...

Arash Shahmansoori

2603.09641 2026-03-10
AI LLM

Tracking Cancer Through Text: Longitudinal Extraction From Radiology Reports Using Open-Source Large Language Models

Radiology reports capture crucial longitudinal information on tumor burden, treatment response, and disease progression, yet their unstructured narrative format complicates automated analysis. Whil...

Luc Builtjes, Alessa Hering

2603.09638 2026-03-10
AI LLM

X-GS: An Extensible Open Framework Unifying 3DGS Architectures with Downstream Multimodal Models

3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis, subsequently extending into numerous spatial AI applications. However, most existing 3DGS methods are isol...

Yueen Ma, Irwin King

2603.09632 2026-03-10
TESTING

On the last time and the number of times an estimator is more than epsilon from its target value

Suppose $\widehat{\theta}_n$ is a strongly consistent estimator for $\theta_0$ in some i.i.d. situation. Let $N_\varepsilon$ and $Q_\varepsilon$ be respectively the last $n$ and the total number of $n$ for whi...

Nils Lid Hjort, Grete Fenstad

2603.09629 2026-03-10
TESTING

The Architecture of Inter-Level Representation

Inter-level connections in science routinely require constructs that neither of the connected theories contains. Statistical mechanics requires assumptions such as the Stosszahlansatz to generate t...

Harry Sticker

2603.09626 2026-03-10
AI LLM

Context Engineering: From Prompts to Corporate Multi-Agent Architecture

As artificial intelligence (AI) systems evolve from stateless chatbots to autonomous multi-step agents, prompt engineering (PE), the discipline of crafting individual queries, proves necessary but ...

Vera V. Vishnyakova

2603.09619 2026-03-10
AI LLM

A saccade-inspired approach to image classification using vision transformer attention maps

Human vision achieves remarkable perceptual performance while operating under strict metabolic constraints. A key ingredient is the selective attention mechanism, driven by rapid saccadic eye movem...

Matthis Dallain, Laurent Rodriguez, Laurent Udo Perrinet, Benoît Miramond

2603.09613 2026-03-10
TESTING

Analytic treatment of a polaron in a nonparabolic conduction band

We develop and compare several analytical approximations for the polaron problem in finite-width, non-parabolic conduction bands. The main focus of the work is an extension of the Feynman variation...

S. N. Klimin, J. Tempere, M. Houtput, I. Zappacosta, S. Ragni, T. Hahn, L. Celiberti, C. Franchin...

2603.09609 2026-03-10
AI LLM

A Variational Latent Equilibrium for Learning in Cortex

Brains remain unrivaled in their ability to recognize and generate complex spatiotemporal patterns. While AI is able to reproduce some of these capabilities, deep learning algorithms remain largely...

Simon Brandt, Paul Haider, Walter Senn, Federico Benitez, Mihai A. Petrovici

2603.09600 2026-03-10
AI LLM

Preparing Students for AI-Driven Agile Development: A Project-Based AI Engineering Curriculum

Generative AI and agentic tools are reshaping agile software development, yet many engineering curricula still teach agile methods and AI competencies separately and largely lecture-based. This pap...

Andreas Rausch, Stefan Wittek, Tobias Geger, David Inkermann

2603.09599 2026-03-10
TESTING

Build, Borrow, or Just Fine-Tune? A Political Scientist's Guide to Choosing NLP Models

Political scientists increasingly face a consequential choice when adopting natural language processing tools: build a domain-specific model from scratch, borrow and adapt an existing one, or simpl...

Shreyas Meher

2603.09595 2026-03-10
TESTING

Benchmarking Dataset for Presence-Only Passive Reconnaissance in Wireless Smart-Grid Communications

Benchmarking presence-only passive reconnaissance in smart-grid communications is challenging because the adversary is receive-only, yet nearby observers can still alter propagation through additio...

Bochra Al Agha, Razane Tajeddine

2603.09590 2026-03-10
AI LLM

Routing without Forgetting

Continual learning in transformers is commonly addressed through parameter-efficient adaptation: prompts, adapters, or LoRA modules are specialized per task while the backbone remains frozen. Altho...

Alessio Masano, Giovanni Bellitto, Dipam Goswami, Joost Van de Weijer, Concetto Spampinato

2603.09576 2026-03-10
TESTING

SCDP: Learning Humanoid Locomotion from Partial Observations via Mixed-Observation Distillation

Distilling humanoid locomotion control from offline datasets into deployable policies remains a challenge, as existing methods rely on privileged full-body states that require complex and often unr...

Milo Carroll, Tianhu Peng, Lingfan Bao, Chengxu Zhou, Zhibin Li

2603.09574 2026-03-10
TESTING

a-TMFG: Scalable Triangulated Maximally Filtered Graphs via Approximate Nearest Neighbors

The traditional Triangular Maximally Filtered Graph (TMFG) construction requires pre-computation and storage of a dense correlation matrix; this limits its applicability to small and medium-sized d...

Lionel Yelibi

2603.09564 2026-03-10
AI LLM

ALARM: Audio-Language Alignment for Reasoning Models

Large audio language models (ALMs) extend LLMs with auditory understanding. A common approach freezes the LLM and trains only an adapter on self-generated targets. However, this fails for reasoning...

Petr Grinberg, Hassan Shahmohammadi

2603.09556 2026-03-10
TESTING

GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision

While Vision-Language Models (VLMs) have significantly advanced remote sensing interpretation, enabling them to perform complex, step-by-step reasoning remains highly challenging. Recent efforts to...

Lang Sun, Ronghao Fu, Zhuoran Duan, Haoran Liu, Xueyan Liu, Bo Yang

2603.09551 2026-03-10