Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

LLM agents like Claude Code can not only write code but also be used for autonomous AI research and engineering \citep{rank2026posttrainbench, novikov2025alphaevolve}. We show that an \emph{autores...

Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, Maksym A...

2603.24511 2026-03-25
TESTING

Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes

Labels on platforms such as GitHub support triage and coordination, yet little is known about how well they align with code modifications or how such alignment affects collaboration across contribu...

Matteo Vaccargiu, Sabrina Aufiero, Silvia Bartolucci, Ronnie de Souza Santos, Roberto Tonelli, Gi...

2603.24501 2026-03-25
TESTING

Using Educational Comics in Physics Teaching for Chemistry and Biochemistry Students: Impact on Motivation and Domain-Specific Conceptual Gains

This study investigates the impact of educational comics as an active learning strategy in physics workshops for undergraduate students in Chemistry and Pharmacy and Biochemistry during the second ...

Mauricio Echiburu, Camilo Henriquez, Rodrigo Valdés, Cristobal Ríos

2603.24498 2026-03-25
AI LLM

Video-Only ToM: Enhancing Theory of Mind in Multimodal Large Language Models

As large language models (LLMs) continue to advance, there is increasing interest in their ability to infer human mental states and demonstrate a human-like Theory of Mind (ToM). Most existing ToM ...

Siqi Liu, Xinyang Li, Bochao Zou, Junbao Zhuo, Huimin Ma, Jiansheng Chen

2603.24484 2026-03-25
AI LLM

Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA

Miscalibrated confidence scores are a practical obstacle to deploying AI in clinical settings. A model that is always overconfident offers no useful signal for deferral. We present a multi-agent fr...

John Ray B. Martinez

2603.24481 2026-03-25
TESTING

Robust synchrotron-based deep learning algorithm for intracochlear segmentation in clinical scans: development and international validation

Clinical imaging is routinely used for cochlear implant surgical planning yet lacks the resolution and contrast necessary to visualize the fine intracochlear structures critical for individualized ...

Ashley Micuda, Daniel Newsted, Nastaran Shakourifar, Sachin Pandey, Asma Alahmadi, Kevin D. Brown...

2603.24476 2026-03-25
TESTING

Conformalized Transfer Learning for Li-ion Battery State of Health Forecasting under Manufacturing and Usage Variability

Accurate forecasting of state-of-health (SOH) is essential for ensuring safe and reliable operation of lithium-ion cells. However, existing models calibrated on laboratory tests at specific conditi...

Samuel Filgueira da Silva, Mehmet Fatih Ozkan, Faissal El Idrissi, Marcello Canova

2603.24475 2026-03-25
AI LLM

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can...

Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dohyung Kim, Jiwon Jeon, Dongsheng Li, Yuqin...

2603.24472 2026-03-25
AI LLM

Counting Without Numbers & Finding Without Words

Every year, 10 million pets enter shelters, separated from their families. Despite desperate searches by both guardians and lost animals, 70% never reunite, not because matches do not exist, but be...

Badri Narayana Patro

2603.24470 2026-03-25
TESTING

The VLT/ERIS grating vector Apodizing Phase Plate coronagraph

We describe the design, laboratory manufacture, and on-sky testing of the grating vector apodizing phase plate (gvAPP) coronagraph for the Enhanced Resolution Imager and Spectrograph (ERIS) on the ...

M. A. Kenworthy, F. A. Dannert, J. Hayoz, D. Doelman, B. J. Sutlieff, P. Liu, F. Snik, M. J. Bons...

2603.24469 2026-03-25
AI LLM

Mechanic: Sorrifier-Driven Formal Decomposition Workflow for Automated Theorem Proving

Recent advances in large language models (LLMs) and LLM-based agents have substantially improved the capabilities of automated theorem proving. However, for problems requiring complex mathematical ...

Ruichen Qiu, Yichuan Cao, Junqi Liu, Dakai Guo, Xiao-Shan Gao, Lihong Zhi, Ruyong Feng

2603.24465 2026-03-25
TESTING

Study of Low-Frequency Core-Edge Coupling in a Tokamak: II. Spatial Channeling & Focusing In Antenna-Driven MHD

Motivated by evidence for core-edge coupling in the form of double-peaked fishbone-like low-frequency modes ($\lesssim 20\,{\rm kHz}$) in KSTAR, which exhibit synchronized Alfvénic activity both in...

Andreas Bierwage, Wonjun Lee, Young-chul Ghim, Panith Adulsiriswad, Nobuyuki Aiba, Seungmin Bong,...

2603.24463 2026-03-25
AI LLM

Unleashing Vision-Language Semantics for Deepfake Video Detection

Recent Deepfake Video Detection (DFD) studies have demonstrated that pre-trained Vision-Language Models (VLMs) such as CLIP exhibit strong generalization capabilities in detecting artifacts across ...

Jiawen Zhu, Yunqi Miao, Xueyi Zhang, Jiankang Deng, Guansong Pang

2603.24454 2026-03-25
TESTING

Stable corrections for perturbed diagonally implicit Runge–Kutta methods

A mixed accuracy framework for Runge–Kutta methods presented in Grant [JSC 2022] and applied to diagonally implicit Runge–Kutta (DIRK) methods can significantly speed up the computation by replac...

John Driscoll, Sigal Gottlieb, Zachary J. Grant, César Herrera, Tej Sai Kakumanu, Michael H. Sawi...

2603.24451 2026-03-25
AI LLM

Integrating Causal Machine Learning into Clinical Decision Support Systems: Insights from Literature and Practice

Current clinical decision support systems (CDSSs) typically base their predictions on correlation, not causation. In recent years, causal machine learning (ML) has emerged as a promising way to imp...

Domenique Zipperling, Lukas Schmidt, Benedikt Hahn, Niklas Kühl, Steven Kimbrough

2603.24448 2026-03-25
TESTING

What and When to Learn: CURriculum Ranking Loss for Large-Scale Speaker Verification

Speaker verification at large scale remains an open challenge as fixed-margin losses treat all samples equally regardless of quality. We hypothesize that mislabeled or degraded samples introduce no...

Massa Baali, Sarthak Bisht, Rita Singh, Bhiksha Raj

2603.24432 2026-03-25
TESTING

Learning Response-Statistic Shifts and Parametric Roll Episodes from Wave–Vessel Time Series via LSTM Functional Models

Parametric roll is a rare but high-consequence instability that can trigger abrupt regime changes in ship response, including pronounced shifts in roll statistics and tail risk. This paper develops...

Jose del Aguila Ferrandis

2603.24431 2026-03-25
TESTING

Iterate to Differentiate: Enhancing Discriminability and Reliability in Zero-Shot TTS Evaluation

Reliable evaluation of modern zero-shot text-to-speech (TTS) models remains challenging. Subjective tests are costly and hard to reproduce, while objective metrics often saturate, failing to distin...

Shengfan Shen, Di Wu, Xingchen Song, Dinghao Zhou, Liumeng Xue, Meng Meng, Jian Luan, Shuai Wang

2603.24430 2026-03-25
TESTING

OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework

Generative Retrieval (GR) has emerged as a promising paradigm for modern search systems. Compared to multi-stage cascaded architecture, it offers advantages such as end-to-end joint optimization an...

Ben Chen, Siyuan Wang, Yufei Ma, Zihan Liang, Xuxin Zhang, Yue Lv, Ying Yang, Huangyu Dai, Lingta...

2603.24422 2026-03-25
TESTING

Reconstructing effective ultrasound transducer models via distributed source inversion

Accurate modeling of ultrasound wave propagation is essential for high-fidelity simulation and imaging in ultrasonic testing. A primary challenge lies in characterizing the excitation source, parti...

Tim Bürchner, Simon Schmid, Ernst Rank, Stefan Kollmannsberger, Andreas Fichtner

2603.24415 2026-03-25