Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
TESTING

DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation

Developing efficient CUDA kernels is a fundamental yet challenging task in the generative AI industry. Recent research leverages Large Language Models (LLMs) to automatically convert PyTorch refer...

Siqi Guo, Ming Lin, Tianbao Yang

2603.21465 2026-03-23
TESTING

Cross-Context Verification: Hierarchical Detection of Benchmark Contamination through Session-Isolated Analysis

LLM coding benchmarks face a credibility crisis: widespread solution leakage and test quality issues undermine SWE-bench Verified, while existing detection methods--paraphrase consistency, n-gram o...

Tae-Eun Song

2603.21454 2026-03-23
TESTING

Unified Sensitivity-Based Heuristic for Optimal Line Switching and Substation Reconfiguration

Optimal transmission switching (OTS) determines which transmission lines to remove from service to minimize dispatch costs. Unlike topology design, it alters the operational status of operating lin...

Zongqi Hu, Weiqi Meng, Bai Cui

2603.21429 2026-03-22
TESTING

Active-power control strategies in grid-forming power converters to improve transient stability in power systems with 100% converter-based generation

Grid-forming voltage source converters (GFM-VSCs) play a crucial role in the stability of power systems with large amounts of converter-based generation. Transient stability (angle stability under ...

Régulo E. Ávila-Martínez, Luis Rouco, Javier Renedo, Lukas Sigrist, Aurelio Garcia-Cerrada

2603.21428 2026-03-22
TESTING

Dynasto: Validity-Aware Dynamic-Static Parameter Optimization for Autonomous Driving Testing

Extensive simulation-based testing is important for assuring the safety of autonomous driving systems (ADS). However, generating safety-critical traffic scenarios remains challenging because failur...

Dmytro Humeniuk, Mohammad Hamdaqa, Houssem Ben Braiek, Amel Bennaceur, Foutse Khomh

2603.21427 2026-03-22
TESTING

Tiny but uniform improvements of adaptive BH procedures via compound e-values

After the seminal Benjamini-Hochberg (BH) procedure for controlling the false discovery rate (FDR) was proposed, dozens of papers have attempted to improve its power by adapting to the unknown prop...

Nikolaos Ignatiadis, Ruodu Wang, Aaditya Ramdas

2603.21424 2026-03-22
TESTING

Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach

Adversarial-example-based fingerprinting approaches, which leverage the decision boundary characteristics of deep neural networks (DNNs) to craft fingerprints, have proven effective for model owner...

Guang Yang, Ziye Geng, Yihang Chen, Changqing Luo

2603.21411 2026-03-22
TESTING

The Myhill-Nerode Theorem for Bounded Interaction: Canonical Abstractions via Agent-Bounded Indistinguishability

Any capacity-limited observer induces a canonical quotient on its environment: two situations that no bounded agent can distinguish are, for that agent, the same. We formalise this for finite POMDP...

Anthony T. Nixon

2603.21399 2026-03-22
TESTING

A Constructive Approach to $q$-Gaussian Distributions: $α$-Divergence as Rate Function and Generalized de Moivre-Laplace Theorem

The Large Deviation Principle (LDP) and the Central Limit Theorem (CLT) are concepts of information theory and probability. While their formulations are established under the i.i.d. assumption, the...

Hiroki Suyari, Antonio M. Scarfone

2603.21391 2026-03-22
TESTING

TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference

Large language models run every token through every layer, regardless of difficulty. We present TIDE, a post-training system that attaches tiny learned routers at periodic checkpoint layers and, at...

Jaber Jaber, Osama Jaber

2603.21365 2026-03-22
TESTING

AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet ...

Liang Ding

2603.21357 2026-03-22
TESTING

Probabilistic theories stable under teleportation

A long-standing problem in the foundations of quantum mechanics is to identify a physical principle that explains why algebraically maximal violations of Bell inequalities can generally not be achi...

Lionel J. Dmello, David Gross

2603.21347 2026-03-22
TESTING

RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

Improving embodied reasoning in multimodal large language models (MLLMs) is essential for building vision-language-action models (VLAs) on top of them to readily translate multimodal understanding ...

Dongyoung Kim, Sumin Park, Woomin Song, Seungku Kim, Taeyoung Kim, Huiwon Jang, Jinwoo Shin, Jaeh...

2603.21341 2026-03-22
TESTING

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

Writing high-performance GPU kernels is among the most labor-intensive tasks in machine learning systems engineering. We present AutoKernel, an open-source framework that applies an autonomous agen...

Jaber Jaber, Osama Jaber

2603.21331 2026-03-22
TESTING

COINBench: Moving Beyond Individual Perspectives to Collective Intent Understanding

Understanding human intent is a high-level cognitive challenge for Large Language Models (LLMs), requiring sophisticated reasoning over noisy, conflicting, and non-linear discourse. While LLMs exce...

Xiaozhe Li, Tianyi Lyu, Siyi Yang, Yizhao Yang, Yuxi Gong, Jinxuan Huang, Ligao Zhang, Zhuoyi Hua...

2603.21329 2026-03-22
TESTING

Improving Coherence and Persistence in Agentic AI for System Optimization

Designing high-performance system heuristics is a creative, iterative process requiring experts to form hypotheses and execute multi-step conceptual shifts. While Large Language Models (LLMs) show ...

Pantea Karimi, Kimia Noorbakhsh, Mohammad Alizadeh, Hari Balakrishnan

2603.21321 2026-03-22
TESTING

A Parametric, Geometry-Aware Residential Construction Cost Estimation Model for Ghana: Design, Validation, and the "Completeness Gap" in Informal Contractor Quotes

Ghana faces a residential housing deficit of two million units. A key driver of project failure is the "completeness gap", a systematic discrepancy between informal contractor quotes and actual cos...

Emmanuel Apaaboah, Bernard Opoku, the GhanaHousePlanner Research Team

2603.21314 2026-03-22
AI/LLM

MuSteerNet: Human Reaction Generation from Videos via Observation-Reaction Mutual Steering

Video-driven human reaction generation aims to synthesize 3D human motions that directly react to observed video sequences, which is crucial for building human-like interactive AI systems. However,...

Yuan Zhou, Yongzhi Li, Yanqi Dai, Xingyu Zhu, Yi Tan, Qingshan Xu, Beier Zhu, Richang Hong, Hanwa...

2603.20187 2026-03-20
TESTING

Prediction and Experimental Verification of Electrolyte Solvation Structure from an OMol25-Trained Interatomic Potential

Machine learning interatomic potentials (MLIPs) trained on large, chemically diverse datasets are revolutionizing computational chemistry, enabling molecular dynamics simulations of battery electro...

Nitesh Kumar, Jianwei Lai, Casey S. Mezerkor, Jiaqi Wang, Kamila M. Wiaderek, J. David Bazak, Sam...

2603.20183 2026-03-20
AI/LLM

IndoorR2X: Indoor Robot-to-Everything Coordination with LLM-Driven Planning

Although robot-to-robot (R2R) communication improves indoor scene understanding beyond what a single robot can achieve, R2R alone cannot overcome partial observability without substantial explorati...

Fan Yang, Soumya Teotia, Shaunak A. Mehta, Prajit KrisshnaKumar, Quanting Xie, Jun Liu, Yueqi Son...

2603.20182 2026-03-20