Personal Assistant Web

TESTING

DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation

Developing efficient CUDA kernels is a fundamental yet challenging task in the generative AI industry. Recent research leverages Large Language Models (LLMs) to automatically convert PyTorch refer...

Siqi Guo, Ming Lin, Tianbao Yang

2603.21465 • 2026-03-23

TESTING

Cross-Context Verification: Hierarchical Detection of Benchmark Contamination through Session-Isolated Analysis

LLM coding benchmarks face a credibility crisis: widespread solution leakage and test quality issues undermine SWE-bench Verified, while existing detection methods--paraphrase consistency, n-gram o...

Tae-Eun Song

2603.21454 • 2026-03-23

TESTING

Unified Sensitivity-Based Heuristic for Optimal Line Switching and Substation Reconfiguration

Optimal transmission switching (OTS) determines which transmission lines to remove from service to minimize dispatch costs. Unlike topology design, it alters the operational status of operating lin...

Zongqi Hu, Weiqi Meng, Bai Cui

2603.21429 • 2026-03-22

TESTING

Active-power control strategies in grid-forming power converters to improve transient stability in power systems with 100% converter-based generation

Grid-forming voltage source converters (GFM-VSCs) play a crucial role in the stability of power systems with large amounts of converter-based generation. Transient stability (angle stability under ...

Régulo E. Ávila-Martínez, Luis Rouco, Javier Renedo, Lukas Sigrist, Aurelio Garcia-Cerrada

2603.21428 • 2026-03-22

TESTING

Dynasto: Validity-Aware Dynamic-Static Parameter Optimization for Autonomous Driving Testing

Extensive simulation-based testing is important for assuring the safety of autonomous driving systems (ADS). However, generating safety-critical traffic scenarios remains challenging because failur...

Dmytro Humeniuk, Mohammad Hamdaqa, Houssem Ben Braiek, Amel Bennaceur, Foutse Khomh

2603.21427 • 2026-03-22

TESTING

Tiny but uniform improvements of adaptive BH procedures via compound e-values

After the seminal Benjamini-Hochberg (BH) procedure for controlling the false discovery rate (FDR) was proposed, dozens of papers have attempted to improve its power by adapting to the unknown prop...

Nikolaos Ignatiadis, Ruodu Wang, Aaditya Ramdas

2603.21424 • 2026-03-22

TESTING

Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach

Adversarial-example-based fingerprinting approaches, which leverage the decision boundary characteristics of deep neural networks (DNNs) to craft fingerprints, have proven effective for model owner...

Guang Yang, Ziye Geng, Yihang Chen, Changqing Luo

2603.21411 • 2026-03-22

TESTING

The Myhill-Nerode Theorem for Bounded Interaction: Canonical Abstractions via Agent-Bounded Indistinguishability

Any capacity-limited observer induces a canonical quotient on its environment: two situations that no bounded agent can distinguish are, for that agent, the same. We formalise this for finite POMDP...

Anthony T. Nixon

2603.21399 • 2026-03-22

TESTING

A Constructive Approach to $q$-Gaussian Distributions: $α$-Divergence as Rate Function and Generalized de Moivre-Laplace Theorem

The Large Deviation Principle (LDP) and the Central Limit Theorem (CLT) are concepts of information theory and probability. While their formulations are established under the i.i.d. assumption, the...

Hiroki Suyari, Antonio M. Scarfone

2603.21391 • 2026-03-22

TESTING

TIDE: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference

Large language models run every token through every layer, regardless of difficulty. We present TIDE, a post-training system that attaches tiny learned routers at periodic checkpoint layers and, at...

Jaber Jaber, Osama Jaber

2603.21365 • 2026-03-22

TESTING

AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet ...

Liang Ding

2603.21357 • 2026-03-22

TESTING

Probabilistic theories stable under teleportation

A long-standing problem in the foundations of quantum mechanics is to identify a physical principle that explains why algebraically maximal violations of Bell inequalities can generally not be achi...

Lionel J. Dmello, David Gross

2603.21347 • 2026-03-22

TESTING

RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

Improving embodied reasoning in multimodal large language models (MLLMs) is essential for building vision-language-action models (VLAs) on top of them to readily translate multimodal understanding ...

Dongyoung Kim, Sumin Park, Woomin Song, Seungku Kim, Taeyoung Kim, Huiwon Jang, Jinwoo Shin, Jaeh...

2603.21341 • 2026-03-22

TESTING

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

Writing high-performance GPU kernels is among the most labor-intensive tasks in machine learning systems engineering. We present AutoKernel, an open-source framework that applies an autonomous agen...

Jaber Jaber, Osama Jaber

2603.21331 • 2026-03-22

TESTING

COINBench: Moving Beyond Individual Perspectives to Collective Intent Understanding

Understanding human intent is a high-level cognitive challenge for Large Language Models (LLMs), requiring sophisticated reasoning over noisy, conflicting, and non-linear discourse. While LLMs exce...

Xiaozhe Li, Tianyi Lyu, Siyi Yang, Yizhao Yang, Yuxi Gong, Jinxuan Huang, Ligao Zhang, Zhuoyi Hua...

2603.21329 • 2026-03-22

TESTING

Improving Coherence and Persistence in Agentic AI for System Optimization

Designing high-performance system heuristics is a creative, iterative process requiring experts to form hypotheses and execute multi-step conceptual shifts. While Large Language Models (LLMs) show ...

Pantea Karimi, Kimia Noorbakhsh, Mohammad Alizadeh, Hari Balakrishnan

2603.21321 • 2026-03-22

TESTING

A Parametric, Geometry-Aware Residential Construction Cost Estimation Model for Ghana: Design, Validation, and the "Completeness Gap" in Informal Contractor Quotes

Ghana faces a residential housing deficit of two million units. A key driver of project failure is the "completeness gap", a systematic discrepancy between informal contractor quotes and actual cos...

Emmanuel Apaaboah, Bernard Opoku, the GhanaHousePlanner Research Team

2603.21314 • 2026-03-22

AI LLM

MuSteerNet: Human Reaction Generation from Videos via Observation-Reaction Mutual Steering

Video-driven human reaction generation aims to synthesize 3D human motions that directly react to observed video sequences, which is crucial for building human-like interactive AI systems. However,...

Yuan Zhou, Yongzhi Li, Yanqi Dai, Xingyu Zhu, Yi Tan, Qingshan Xu, Beier Zhu, Richang Hong, Hanwa...

2603.20187 • 2026-03-20

TESTING

Prediction and Experimental Verification of Electrolyte Solvation Structure from an OMol25-Trained Interatomic Potential

Machine learning interatomic potentials (MLIPs) trained on large, chemically diverse datasets are revolutionizing computational chemistry, enabling molecular dynamics simulations of battery electro...

Nitesh Kumar, Jianwei Lai, Casey S. Mezerkor, Jiaqi Wang, Kamila M. Wiaderek, J. David Bazak, Sam...

2603.20183 • 2026-03-20

AI LLM

IndoorR2X: Indoor Robot-to-Everything Coordination with LLM-Driven Planning

Although robot-to-robot (R2R) communication improves indoor scene understanding beyond what a single robot can achieve, R2R alone cannot overcome partial observability without substantial explorati...

Fan Yang, Soumya Teotia, Shaunak A. Mehta, Prajit KrisshnaKumar, Quanting Xie, Jun Liu, Yueqi Son...

2603.20182 • 2026-03-20
