Research

Papers

Research papers from arXiv and related sources

Total: 4694; AI/LLM: 2583; Testing: 2111
AI LLM

Eval4Sim: An Evaluation Framework for Persona Simulation

Large Language Model (LLM) personas with explicit specifications of attributes, background, and behavioural tendencies are increasingly used to simulate human conversations for tasks such as user m...

Eliseo Bao, Anxo Perez, Xi Wang, Javier Parapar

2603.02876 2026-03-03
AI LLM

LaTeX Compilation: Challenges in the Era of LLMs

As large language models (LLMs) increasingly assist scientific writing, limitations and the significant token cost of TeX become more and more visible. This paper analyzes TeX's fundamental defects...

Tianyou Liu, Ziqiang Li, Yansong Li, Xurui Liu

2603.02873 2026-03-03
TESTING

Floating-point consistent cross-verification methodology for reproducible and interoperable DDA solvers with fair benchmarking

The discrete dipole approximation (DDA) is a widely used and versatile numerical method for solving electromagnetic scattering by arbitrarily shaped objects. Despite its popularity, quantitative co...

Clément Argentin, Patrick C. Chaumet, Michel Gross, Maxim A. Yurkin

2603.02871 2026-03-03
AI LLM

LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for Reasoning about Debates

Large Language Models (LLMs) achieve strong performance in analyzing and generating text, yet they struggle with explicit, transparent, and verifiable reasoning over complex texts such as those con...

Gianvincenzo Alfano, Sergio Greco, Lucio La Cava, Stefano Francesco Monea, Irina Trubitsyna

2603.02858 2026-03-03
TESTING

Two-phase stratified MHD flows in rectangular ducts

The characteristics of two-phase stratified magnetohydrodynamic (MHD) flow in horizontal rectangular ducts are investigated for a system consisting of a conductive liquid and a non-conductive gas. ...

Subham Pal, Ilya Barmak, Arseniy Parfenov, Alexander Gelfgat, Neima Brauner

2603.02853 2026-03-03
TESTING

SPARC: Spatial-Aware Path Planning via Attentive Robot Communication

Efficient communication is critical for decentralized Multi-Robot Path Planning (MRPP), yet existing learned communication methods treat all neighboring robots equally regardless of their spatial p...

Sayang Mu, Xiangyu Wu, Bo An

2603.02845 2026-03-03
TESTING

Scale-invariant Gaussian derivative residual networks

Generalisation across image scales remains a fundamental challenge for deep networks, which often fail to handle images at scales not seen during training (the out-of-distribution problem). In this...

Andrzej Perzanowski, Tony Lindeberg

2603.02843 2026-03-03
AI LLM

A Browser-based Open Source Assistant for Multimodal Content Verification

Disinformation and false content produced by generative AI pose a significant challenge for journalists and fact-checkers who must rapidly verify digital media information. While there is an abunda...

Rosanna Milner, Michael Foster, Olesya Razuvayevskaya, Ian Roberts, Valentin Porcellini, Denis Te...

2603.02842 2026-03-03
AI LLM

Faster, Cheaper, More Accurate: Specialised Knowledge Tracing Models Outperform LLMs

Predicting future student responses to questions is particularly valuable for educational learning platforms where it enables effective interventions. One of the key approaches to do this has been ...

Prarthana Bhattacharyya, Joshua Mitton, Ralph Abboud, Simon Woodhead

2603.02830 2026-03-03
AI LLM

Toward Early Quality Assessment of Text-to-Image Diffusion Models

Recent text-to-image (T2I) diffusion and flow-matching models can produce highly realistic images from natural language prompts. In practical scenarios, T2I systems are often run in a ``generate--t...

Huanlei Guo, Hongxin Wei, Bingyi Jing

2603.02829 2026-03-03
TESTING

Grounded String Representations of Series-Parallel Graphs without Transitive Edges

In a grounded string representation of a graph there is a horizontal line $\ell$ and each vertex is represented as a simple curve below $\ell$ with one end point on $\ell$ such that two curve...

Sabine Cornelsen, Jan Kratochvíl, Miriam Münch, Giacomo Ortali, Alexandra Weinberger, Alexander W...

2603.02827 2026-03-03
TESTING

Charging power enhancement at the phase transition of a non-integrable quantum battery

Exploiting many-body interaction and critical phenomena to improve the performance of quantum batteries is an emerging and promising line of research. A central question in this direction is whethe...

D. Farina, M. Sassetti, V. Cataudella, D. Ferraro, N. Traverso Ziani

2603.02819 2026-03-03
TESTING

Merged amplitude encoding for Chebyshev quantum Kolmogorov–Arnold networks: trading qubits for circuit executions

Quantum Kolmogorov–Arnold networks based on Chebyshev polynomials (CCQKAN) evaluate each edge activation function as a quantum inner product, creating a trade-off between qubit count and the numbe...

Hikaru Wakaura

2603.02818 2026-03-03
TESTING

Single-star optical turbulence profiling techniques for the SHIMM and other Shack-Hartmann instruments

Atmospheric optical turbulence (OT) monitoring is crucial for site characterisation at astronomical observatories and optical communications ground stations. The Shack-Hartmann Image Motion Monitor...

Ryan Griffiths, Timothy Butterley, Richard Wilson, James Osborn

2603.02817 2026-03-03
AI LLM

BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

The rapid advancement of text-to-video (T2V) models has revolutionized content creation, yet their commercial potential remains largely untapped. We introduce, for the first time, the task of seaml...

Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu

2603.02816 2026-03-03
AI LLM

Benchmarking Speech Systems for Frontline Health Conversations: The DISPLACE-M Challenge

The DIarization and Speech Processing for LAnguage understanding in Conversational Environments - Medical (DISPLACE-M) challenge introduces a conversational AI benchmark focused on understanding go...

Dhanya E, Ankita Meena, Manas Nanivadekar, Noumida A, Victor Azad, Ashwini Nagaraj Shenoy, Pratik...

2603.02813 2026-03-03
TESTING

The Price of Robustness: Stable Classifiers Need Overparameterization

The relationship between overparameterization, stability, and generalization remains incompletely understood in the setting of discontinuous classifiers. We address this gap by establishing a gener...

Jonas von Berg, Adalbert Fono, Massimiliano Datres, Sohir Maskey, Gitta Kutyniok

2603.02806 2026-03-03
AI LLM

Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

As LLM-powered agents have been used for high-stakes decision-making, such as clinical diagnosis, it becomes critical to develop reliable verification of their decisions to facilitate trustworthy d...

Yichi Zhang, Nabeel Seedat, Yinpeng Dong, Peng Cui, Jun Zhu, Mihaela van der Schaar

2603.02798 2026-03-03
AI LLM

From Heuristic Selection to Automated Algorithm Design: LLMs Benefit from Strong Priors

Large Language Models (LLMs) have already been widely adopted for automated algorithm design, demonstrating strong abilities in generating and evolving algorithms across various fields. Existing wo...

Qi Huang, Furong Ye, Ananta Shahane, Thomas Bäck, Niki van Stein

2603.02792 2026-03-03
TESTING

Designing UNICORN: a Unified Benchmark for Imaging in Computational Pathology, Radiology, and Natural Language

Medical foundation models show promise to learn broadly generalizable features from large, diverse datasets. This could be the base for reliable cross-modality generalization and rapid adaptation t...

Michelle Stegeman, Lena Philipp, Fennie van der Graaf, Marina D'Amato, Clément Grisi, Luc Builtje...

2603.02790 2026-03-03