Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

FAMOSE: A ReAct Approach to Automated Feature Discovery

Feature engineering remains a critical yet challenging bottleneck in machine learning, particularly for tabular data, as identifying optimal features from an exponentially large feature space tradi...

Keith Burghardt, Jienan Liu, Sadman Sakib, Yuning Hao, Bo Li

2602.17641 2026-02-19
AI LLM

When to Trust the Cheap Check: Weak and Strong Verification for Reasoning

Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Extern...

Shayan Kiyani, Sima Noorani, George Pappas, Hamed Hassani

2602.17633 2026-02-19
AI LLM

Unmasking the Factual-Conceptual Gap in Persian Language Models

While emerging Persian NLP benchmarks have expanded into pragmatics and politeness, they rarely distinguish between memorized cultural facts and the ability to reason about implicit social norms. W...

Alireza Sakhaeirad, Ali Ma'manpoosh, Arshia Hemmat

2602.17623 2026-02-19
AI LLM

What Makes a Good LLM Agent for Real-world Penetration Testing?

LLM-based agents show promise for automating penetration testing, yet reported performance varies widely across systems and benchmarks. We analyze 28 LLM-based penetration testing systems and evalu...

Gelei Deng, Yi Liu, Yuekang Li, Ruozhao Yang, Xiaofei Xie, Jie Zhang, Han Qiu, Tianwei Zhang

2602.17622 2026-02-19
AI LLM

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs

Reinforcement learning (RL) is widely used to improve large language models on reasoning tasks, and asynchronous RL training is attractive because it increases end-to-end throughput. However, for w...

Luke Huang, Zhuoyang Zhang, Qinghao Hu, Shang Yang, Song Han

2602.17616 2026-02-19
AI LLM

Exploring Novel Data Storage Approaches for Large-Scale Numerical Weather Prediction

Driven by scientific and industry ambition, HPC and AI applications such as operational Numerical Weather Prediction (NWP) require processing and storing ever-increasing data volumes as fast as pos...

Nicolau Manubens Gil

2602.17610 2026-02-19
AI LLM

Towards Anytime-Valid Statistical Watermarking

The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promisi...

Baihe Huang, Eric Xu, Kannan Ramchandran, Jiantao Jiao, Michael I. Jordan

2602.17608 2026-02-19
AI LLM

AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-base...

Jianda Du, Youran Sun, Haizhao Yang

2602.17607 2026-02-19
TESTING

Adapting Actively on the Fly: Relevance-Guided Online Meta-Learning with Latent Concepts for Geospatial Discovery

In many real-world settings, such as environmental monitoring, disaster response, or public health, with costly and difficult data collection and dynamic environments, strategically sampling from u...

Jowaria Khan, Anindya Sarkar, Yevgeniy Vorobeychik, Elizabeth Bondi-Kelly

2602.17605 2026-02-19
AI LLM

MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the ...

Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui ...

2602.17602 2026-02-19
AI LLM

Art2Mus: Artwork-to-Music Generation via Visual Conditioning and Large-Scale Cross-Modal Alignment

Music generation has advanced markedly through multimodal deep learning, enabling models to synthesize audio from text and, more recently, from images. However, existing image-conditioned systems s...

Ivan Rinaldi, Matteo Mendula, Nicola Fanelli, Florence Levé, Matteo Testi, Giovanna Castellano, G...

2602.17599 2026-02-19
AI LLM

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?

Current speech LLMs largely perform implicit ASR: on tasks solvable from a transcript, they are behaviorally and mechanistically equivalent to simple Whisper$\to$LLM cascades. We show this through ...

Jayadev Billa

2602.17598 2026-02-19
TESTING

Stochastic galactic supernova flux of semi-relativistic particles

New exotic particles with MeV masses, such as axion-like particles or light dark matter, can be emitted from core-collapse supernovae (SNe) with semi-relativistic velocities. Due to their speed dis...

David Alonso-González, David Cerdeño, Marina Cermeño, Andres D. Perez

2602.17597 2026-02-19
TESTING

Asymptotic Smoothing of the Lipschitz Loss Landscape in Overparameterized One-Hidden-Layer ReLU Networks

We study the topology of the loss landscape of one-hidden-layer ReLU networks under overparameterization. On the theory side, we (i) prove that for convex $L$-Lipschitz losses with an $\ell_1$-regu...

Saveliy Baturin

2602.17596 2026-02-19
AI LLM

AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Con...

Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L....

2602.17594 2026-02-19
TESTING

BMW: Bayesian Model-Assisted Adaptive Phase II Clinical Trial Design for Win Ratio Statistic

The win ratio (WR) statistic is increasingly used to evaluate treatment effects based on prioritized composite endpoints, yet existing Bayesian adaptive designs are not directly applicable because ...

Di Zhu, Yong Zang

2602.17592 2026-02-19
TESTING

BMC4TimeSec: Verification Of Timed Security Protocols

We present BMC4TimeSec, an end-to-end tool for verifying Timed Security Protocols (TSP) based on SMT-based bounded model checking and multi-agent modelling in the form of Timed Interpreted Systems ...

Agnieszka M. Zbrzezny

2602.17590 2026-02-19
TESTING

Asymptotically Optimal Sequential Testing with Markovian Data

We study one-sided and $\alpha$-correct sequential hypothesis testing for data generated by an ergodic Markov chain. The null hypothesis is that the unknown transition matrix belongs to a prescribed set...

Alhad Sethi, Kavali Sofia Sagar, Shubhada Agrawal, Debabrota Basu, P. N. Karthik

2602.17587 2026-02-19
AI LLM

Building an AI-native Research Ecosystem for Experimental Particle Physics: A Community Vision

Experimental particle physics seeks to understand the universe by probing its fundamental particles and forces and exploring how they govern the large-scale processes that shape cosmic evolution. T...

Thea Klaeboe Aarrestad, Alaa Abdelhamid, Haider Abidi, Jahred Adelman, Jennifer Adelman-McCarthy,...

2602.17582 2026-02-19
TESTING

Optimal control of stochastic Volterra integral equations with completely monotone kernels and stochastic differential equations on Hilbert spaces with unbounded control and diffusion operators

The dynamic programming approach is one of the most powerful ones in optimal control. However, when dealing with optimal control problems of stochastic Volterra integral equations (SVIEs) with comp...

Gabriele Bolli, Filippo de Feo

2602.17578 2026-02-19