Papers
Research papers from arXiv and related sources
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
LLM agents like Claude Code can not only write code but also be used for autonomous AI research and engineering \citep{rank2026posttrainbench, novikov2025alphaevolve}. We show that an \emph{autores...
Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, Maksym A...
Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes
Labels on platforms such as GitHub support triage and coordination, yet little is known about how well they align with code modifications or how such alignment affects collaboration across contribu...
Matteo Vaccargiu, Sabrina Aufiero, Silvia Bartolucci, Ronnie de Souza Santos, Roberto Tonelli, Gi...
Using Educational Comics in Physics Teaching for Chemistry and Biochemistry Students: Impact on Motivation and Domain-Specific Conceptual Gains
This study investigates the impact of educational comics as an active learning strategy in physics workshops for undergraduate students in Chemistry and Pharmacy and Biochemistry during the second ...
Mauricio Echiburu, Camilo Henriquez, Rodrigo Valdés, Cristobal Ríos
Video-Only ToM: Enhancing Theory of Mind in Multimodal Large Language Models
As large language models (LLMs) continue to advance, there is increasing interest in their ability to infer human mental states and demonstrate a human-like Theory of Mind (ToM). Most existing ToM ...
Siqi Liu, Xinyang Li, Bochao Zou, Junbao Zhuo, Huimin Ma, Jiansheng Chen
Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA
Miscalibrated confidence scores are a practical obstacle to deploying AI in clinical settings. A model that is always overconfident offers no useful signal for deferral. We present a multi-agent fr...
John Ray B. Martinez
Robust synchrotron-based deep learning algorithm for intracochlear segmentation in clinical scans: development and international validation
Clinical imaging is routinely used for cochlear implant surgical planning yet lacks the resolution and contrast necessary to visualize the fine intracochlear structures critical for individualized ...
Ashley Micuda, Daniel Newsted, Nastaran Shakourifar, Sachin Pandey, Asma Alahmadi, Kevin D. Brown...
Conformalized Transfer Learning for Li-ion Battery State of Health Forecasting under Manufacturing and Usage Variability
Accurate forecasting of state-of-health (SOH) is essential for ensuring safe and reliable operation of lithium-ion cells. However, existing models calibrated on laboratory tests at specific conditi...
Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Faissal El Idrissi, Marcello Canova
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?
Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can...
Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dohyung Kim, Jiwon Jeon, Dongsheng Li, Yuqin...
Counting Without Numbers & Finding Without Words
Every year, 10 million pets enter shelters, separated from their families. Despite desperate searches by both guardians and lost animals, 70% never reunite, not because matches do not exist, but be...
Badri Narayana Patro
The VLT/ERIS grating vector Apodizing Phase Plate coronagraph
We describe the design, laboratory manufacture, and on-sky testing of the grating vector apodizing phase plate (gvAPP) coronagraph for the Enhanced Resolution Imager and Spectrograph (ERIS) on the ...
M. A. Kenworthy, F. A. Dannert, J. Hayoz, D. Doelman, B. J. Sutlieff, P. Liu, F. Snik, M. J. Bons...
Mechanic: Sorrifier-Driven Formal Decomposition Workflow for Automated Theorem Proving
Recent advances in large language models (LLMs) and LLM-based agents have substantially improved the capabilities of automated theorem proving. However, for problems requiring complex mathematical ...
Ruichen Qiu, Yichuan Cao, Junqi Liu, Dakai Guo, Xiao-Shan Gao, Lihong Zhi, Ruyong Feng
Study of Low-Frequency Core-Edge Coupling in a Tokamak: II. Spatial Channeling & Focusing In Antenna-Driven MHD
Motivated by evidence for core-edge coupling in the form of double-peaked fishbone-like low-frequency modes ($\lesssim 20\,{\rm kHz}$) in KSTAR, which exhibit synchronized Alfvénic activity both in...
Andreas Bierwage, Wonjun Lee, Young-chul Ghim, Panith Adulsiriswad, Nobuyuki Aiba, Seungmin Bong,...
Unleashing Vision-Language Semantics for Deepfake Video Detection
Recent Deepfake Video Detection (DFD) studies have demonstrated that pre-trained Vision-Language Models (VLMs) such as CLIP exhibit strong generalization capabilities in detecting artifacts across ...
Jiawen Zhu, Yunqi Miao, Xueyi Zhang, Jiankang Deng, Guansong Pang
Stable corrections for perturbed diagonally implicit Runge–Kutta methods
A mixed accuracy framework for Runge–Kutta methods presented in Grant [JSC 2022] and applied to diagonally implicit Runge–Kutta (DIRK) methods can significantly speed up the computation by replac...
John Driscoll, Sigal Gottlieb, Zachary J. Grant, César Herrera, Tej Sai Kakumanu, Michael H. Sawi...
Integrating Causal Machine Learning into Clinical Decision Support Systems: Insights from Literature and Practice
Current clinical decision support systems (CDSSs) typically base their predictions on correlation, not causation. In recent years, causal machine learning (ML) has emerged as a promising way to imp...
Domenique Zipperling, Lukas Schmidt, Benedikt Hahn, Niklas Kühl, Steven Kimbrough
What and When to Learn: CURriculum Ranking Loss for Large-Scale Speaker Verification
Speaker verification at large scale remains an open challenge as fixed-margin losses treat all samples equally regardless of quality. We hypothesize that mislabeled or degraded samples introduce no...
Massa Baali, Sarthak Bisht, Rita Singh, Bhiksha Raj
Learning Response-Statistic Shifts and Parametric Roll Episodes from Wave–Vessel Time Series via LSTM Functional Models
Parametric roll is a rare but high-consequence instability that can trigger abrupt regime changes in ship response, including pronounced shifts in roll statistics and tail risk. This paper develops...
Jose del Aguila Ferrandis
Iterate to Differentiate: Enhancing Discriminability and Reliability in Zero-Shot TTS Evaluation
Reliable evaluation of modern zero-shot text-to-speech (TTS) models remains challenging. Subjective tests are costly and hard to reproduce, while objective metrics often saturate, failing to distin...
Shengfan Shen, Di Wu, Xingchen Song, Dinghao Zhou, Liumeng Xue, Meng Meng, Jian Luan, Shuai Wang
OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework
Generative Retrieval (GR) has emerged as a promising paradigm for modern search systems. Compared to multi-stage cascaded architecture, it offers advantages such as end-to-end joint optimization an...
Ben Chen, Siyuan Wang, Yufei Ma, Zihan Liang, Xuxin Zhang, Yue Lv, Ying Yang, Huangyu Dai, Lingta...
Reconstructing effective ultrasound transducer models via distributed source inversion
Accurate modeling of ultrasound wave propagation is essential for high-fidelity simulation and imaging in ultrasonic testing. A primary challenge lies in characterizing the excitation source, parti...
Tim Bürchner, Simon Schmid, Ernst Rank, Stefan Kollmannsberger, Andreas Fichtner