Papers
Research papers from arXiv and related sources
AgentFactory: A Self-Evolving Framework Through Executable Subagent Accumulation and Reuse
Building LLM-based agents has become increasingly important. Recent works on LLM-based agent self-evolution primarily record successful experiences as textual prompts or reflections, which cannot r...
Zhang Zhang, Shuqi Lu, Hongjin Qian, Di He, Zheng Liu
The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering
We present a training-free framework for continuous and controllable image editing at test time for text-conditioned generative models. In contrast to prior approaches that rely on additional train...
Yigit Ekin, Yossi Gandelsman
On min-Storey estimators for multiple testing and conformal novelty detection
In a multiple testing task, finding an appropriate estimator of the proportion $\pi_0$ of non-signal in the data to boost power of false discovery rate (FDR) controlling procedures is a long-standing...
Gao Zijun, Roquain Etienne
TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis
AI coding agents can resolve real-world software issues, yet they frequently introduce regressions, breaking tests that previously passed. Current benchmarks focus almost exclusively on resolution ...
Pepe Alonso
LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition
Media design layer generation enables the creation of fully editable, layered design documents such as posters, flyers, and logos using only natural language prompts. Existing methods either restri...
Vlad-Constantin Lungu-Stan, Ionut Mironica, Mariana-Iuliana Georgescu
The chemical DNA of the Magellanic Clouds VI. Origin and evolution of neutron-capture elements in the SMC
Context. In the context of galactic archaeology, the study of the Small Magellanic Cloud (SMC) is of crucial importance, as it represents a unique opportunity to study a nearby massive dwarf system...
Marco Palla, Alessio Mucciarelli, Donatella Romano, Samuele Anoardo, Francesca Matteucci
ConGA: Guidelines for Contextual Gender Annotation. A Framework for Annotating Gender in Machine Translation
Handling gender across languages remains a persistent challenge for Machine Translation (MT) and Large Language Models (LLMs), especially when translating from gender-neutral languages into morphol...
Argentina Anna Rescigno, Eva Vanmassenhove, Johanna Monti
Unified Policy Value Decomposition for Rapid Adaptation
Rapid adaptation in complex control systems remains a central challenge in reinforcement learning. We introduce a framework in which policy and value functions share a low-dimensional coefficient v...
Cristiano Capone, Luca Falorsi, Andrea Ciardiello, Luca Manneschi
ShapleyLaw: A Game-Theoretic Approach to Multilingual Scaling Laws
In multilingual pretraining, the test loss of a pretrained model is heavily influenced by the proportion of each language in the pretraining data, namely the language mixture ratios. Multi...
Xuyang Cao, Qianying Liu, Chuan Xiao, Yusuke Oda, Pontus Stenetorp, Daisuke Kawahara, Makoto Oniz...
Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing
Large language models (LLMs) exhibit latent multi-token prediction (MTP) capabilities despite being trained solely for next-token generation. We propose a simple, training-free MTP approach that pr...
Raghavv Goel, Mukul Gagrani, Mingu Lee, Chris Lott
State-dependent temperature control in Langevin diffusions using numerical exploratory Hamiltonian-Jacobi-Bellman equations
Choosing how much noise to add in Langevin dynamics is essential for making these algorithms effective in challenging optimization problems. One promising approach is to determine this noise by sol...
Taorui Wang, Xun Li, Gu Wang, Zhongqiang Zhang
Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning
The widespread adoption of dashcams has made video evidence in traffic accidents increasingly abundant, yet transforming "what happened in the video" into "who is responsible under which legal prov...
Jingchun Yang, Jinchang Zhang
A practical artificial intelligence framework for legal age estimation using clavicle computed tomography scans
Legal age estimation plays a critical role in forensic and medico-legal contexts, where decisions must be supported by accurate, robust, and reproducible methods with explicit uncertainty quantific...
Javier Venema, Stefano De Luca, Pablo Mesejo, Óscar Ibáñez
Multi-Armed Sequential Hypothesis Testing by Betting
We consider a variant of sequential testing by betting where, at each time step, the statistician is presented with multiple data sources (arms) and obtains data by choosing one of the arms. We con...
Ricardo J. Sandoval, Ian Waudby-Smith, Michael I. Jordan
Training Diffusion Language Models for Black-Box Optimization
We study offline black-box optimization (BBO), aiming to discover improved designs from an offline dataset of designs and labels, a problem common in robotics, DNA, and materials science with limit...
Zipeng Sun, Can Chen, Ye Yuan, Haolun Wu, Jiayao Gu, Christopher Pal, Xue Liu
Only relative ranks matter in weight-clustered large language models
Large language models (LLMs) contain billions of parameters, yet many exact values are not essential. We show that what matters most is the relative rank of weights: whether one connection is strong...
Borja Aizpurua, Sukhbinder Singh, Román Orús
IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia
As large language models (LLMs) are deployed in multilingual settings, their safety behavior in culturally diverse, low-resource languages remains poorly understood. We present the first systematic...
Priyaranjan Pattnayak, Sanchari Chowdhuri
Noise-Aware Misclassification Attack Detection in Collaborative DNN Inference
Collaborative inference of object classification deep neural networks (DNNs) where resource-constrained end-devices offload partially processed data to remote edge servers to complete end-to-end pr...
Shima Yousefi, Saptarshi Debroy
Pretrained Multilingual Transformers Reveal Quantitative Distance Between Human Languages
Understanding the distance between human languages is central to linguistics, anthropology, and tracing human evolutionary history. Yet, while linguistics has long provided rich qualitative account...
Yue Zhao, Jiatao Gu, Paloma Jeretič, Weijie Su
In Perfect Harmony: Orchestrating Causality in Actor-Based Systems
Runtime verification has gained popularity as a lightweight approach for increasing assurance in systems under scrutiny. Performing runtime checks enables dynamic monitoring and alerts for unexpect...
Vladyslav Mikytiv, Bernardo Toninho, Carla Ferreira