Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

AgentFactory: A Self-Evolving Framework Through Executable Subagent Accumulation and Reuse

Building LLM-based agents has become increasingly important. Recent works on LLM-based agent self-evolution primarily record successful experiences as textual prompts or reflections, which cannot r...

Zhang Zhang, Shuqi Lu, Hongjin Qian, Di He, Zheng Liu

2603.18000 2026-03-18
AI LLM

The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering

We present a training-free framework for continuous and controllable image editing at test time for text-conditioned generative models. In contrast to prior approaches that rely on additional train...

Yigit Ekin, Yossi Gandelsman

2603.17998 2026-03-18
TESTING

On min-Storey estimators for multiple testing and conformal novelty detection

In a multiple testing task, finding an appropriate estimator of the proportion $\pi_0$ of non-signal in the data to boost the power of false discovery rate (FDR) controlling procedures is a long-standing...

Gao Zijun, Roquain Etienne

2603.17984 2026-03-18
AI LLM

TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis

AI coding agents can resolve real-world software issues, yet they frequently introduce regressions, breaking tests that previously passed. Current benchmarks focus almost exclusively on resolution ...

Pepe Alonso

2603.17973 2026-03-18
AI LLM

LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition

Media design layer generation enables the creation of fully editable, layered design documents such as posters, flyers, and logos using only natural language prompts. Existing methods either restri...

Vlad-Constantin Lungu-Stan, Ionut Mironica, Mariana-Iuliana Georgescu

2603.17965 2026-03-18
TESTING

The chemical DNA of the Magellanic Clouds VI. Origin and evolution of neutron-capture elements in the SMC

Context. In the context of galactic archaeology, the study of the Small Magellanic Cloud (SMC) is of crucial importance, as it represents a unique opportunity to study a nearby massive dwarf system...

Marco Palla, Alessio Mucciarelli, Donatella Romano, Samuele Anoardo, Francesca Matteucci

2603.17963 2026-03-18
AI LLM

ConGA: Guidelines for Contextual Gender Annotation. A Framework for Annotating Gender in Machine Translation

Handling gender across languages remains a persistent challenge for Machine Translation (MT) and Large Language Models (LLMs), especially when translating from gender-neutral languages into morphol...

Argentina Anna Rescigno, Eva Vanmassenhove, Johanna Monti

2603.17962 2026-03-18
TESTING

Unified Policy Value Decomposition for Rapid Adaptation

Rapid adaptation in complex control systems remains a central challenge in reinforcement learning. We introduce a framework in which policy and value functions share a low-dimensional coefficient v...

Cristiano Capone, Luca Falorsi, Andrea Ciardiello, Luca Manneschi

2603.17947 2026-03-18
TESTING

ShapleyLaw: A Game-Theoretic Approach to Multilingual Scaling Laws

In multilingual pretraining, the test loss of a pretrained model is heavily influenced by the proportion of each language in the pretraining data, namely the language mixture ratios. Multi...

Xuyang Cao, Qianying Liu, Chuan Xiao, Yusuke Oda, Pontus Stenetorp, Daisuke Kawahara, Makoto Oniz...

2603.17945 2026-03-18
AI LLM

Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing

Large language models (LLMs) exhibit latent multi-token prediction (MTP) capabilities despite being trained solely for next-token generation. We propose a simple, training-free MTP approach that pr...

Raghavv Goel, Mukul Gagrani, Mingu Lee, Chris Lott

2603.17942 2026-03-18
TESTING

State-dependent temperature control in Langevin diffusions using numerical exploratory Hamiltonian-Jacobi-Bellman equations

Choosing how much noise to add in Langevin dynamics is essential for making these algorithms effective in challenging optimization problems. One promising approach is to determine this noise by sol...

Taorui Wang, Xun Li, Gu Wang, Zhongqiang Zhang

2603.17934 2026-03-18
AI LLM

Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning

The widespread adoption of dashcams has made video evidence in traffic accidents increasingly abundant, yet transforming "what happened in the video" into "who is responsible under which legal prov...

Jingchun Yang, Jinchang Zhang

2603.17930 2026-03-18
AI LLM

A practical artificial intelligence framework for legal age estimation using clavicle computed tomography scans

Legal age estimation plays a critical role in forensic and medico-legal contexts, where decisions must be supported by accurate, robust, and reproducible methods with explicit uncertainty quantific...

Javier Venema, Stefano De Luca, Pablo Mesejo, Óscar Ibáñez

2603.17926 2026-03-18
TESTING

Multi-Armed Sequential Hypothesis Testing by Betting

We consider a variant of sequential testing by betting where, at each time step, the statistician is presented with multiple data sources (arms) and obtains data by choosing one of the arms. We con...

Ricardo J. Sandoval, Ian Waudby-Smith, Michael I. Jordan

2603.17925 2026-03-18
AI LLM

Training Diffusion Language Models for Black-Box Optimization

We study offline black-box optimization (BBO), aiming to discover improved designs from an offline dataset of designs and labels, a problem common in robotics, DNA, and materials science with limit...

Zipeng Sun, Can Chen, Ye Yuan, Haolun Wu, Jiayao Gu, Christopher Pal, Xue Liu

2603.17919 2026-03-18
AI LLM

Only relative ranks matter in weight-clustered large language models

Large language models (LLMs) contain billions of parameters, yet many exact values are not essential. We show that what matters most is the relative rank of weights: whether one connection is strong...

Borja Aizpurua, Sukhbinder Singh, Román Orús

2603.17917 2026-03-18
AI LLM

IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia

As large language models (LLMs) are deployed in multilingual settings, their safety behavior in culturally diverse, low-resource languages remains poorly understood. We present the first systematic...

Priyaranjan Pattnayak, Sanchari Chowdhuri

2603.17915 2026-03-18
AI LLM

Noise-Aware Misclassification Attack Detection in Collaborative DNN Inference

Collaborative inference of object classification deep neural networks (DNNs), where resource-constrained end-devices offload partially processed data to remote edge servers to complete end-to-end pr...

Shima Yousefi, Saptarshi Debroy

2603.17914 2026-03-18
AI LLM

Pretrained Multilingual Transformers Reveal Quantitative Distance Between Human Languages

Understanding the distance between human languages is central to linguistics, anthropology, and tracing human evolutionary history. Yet, while linguistics has long provided rich qualitative account...

Yue Zhao, Jiatao Gu, Paloma Jeretič, Weijie Su

2603.17912 2026-03-18
TESTING

In Perfect Harmony: Orchestrating Causality in Actor-Based Systems

Runtime verification has gained popularity as a lightweight approach for increasing assurance in systems under scrutiny. Performing runtime checks enables dynamic monitoring and alerts for unexpect...

Vladyslav Mikytiv, Bernardo Toninho, Carla Ferreira

2603.17909 2026-03-18