TESTING

PDE foundation model-accelerated inverse estimation of system parameters in inertial confinement fusion

PDE foundation models are typically pretrained on large, diverse corpora of PDE datasets and can be adapted to new settings with limited task-specific data. However, most downstream evaluations foc...

Mahindra Rautela, Alexander Scheinker, Bradley Love, Diane Oyen, Nathan DeBardeleben, Earl Lawren...

2603.04606 • 2026-03-04

TESTING

Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development

Code generation has emerged as one of AI's highest-impact use cases, yet existing benchmarks measure isolated tasks rather than the complete "zero-to-one" process of building a working application ...

Hung Tran, Langston Nashold, Rayan Krishnan, Antoine Bigeard, Alex Gu

2603.04601 • 2026-03-04

TESTING

PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing

Composed Image Retrieval (CIR) has made significant progress, yet current benchmarks are limited to single ground-truth answers and lack the annotations needed to evaluate false positive avoidance,...

Rohan Mahadev, Joyce Yuan, Patrick Poirson, David Xue, Hao-Yu Wu, Dmitry Kislyuk

2603.04598 • 2026-03-04

TESTING

PulSKASim: A Pulsar Simulator for SKA-Scale Interferometric Observations

Accurate simulation of pulsar flux variability is critical for testing Square Kilometre Array (SKA) interferometric pipelines. However, most existing simulators neglect the effects of integration t...

X. Li, V. Stolyarov

2603.04593 • 2026-03-04

TESTING

Industrial Survey on Robustness Testing In Cyber Physical Systems

Cyber-Physical Systems (CPS) play a critical role in modern industrial domains, including manufacturing, energy, transportation, and healthcare, where they enable automation, optimization, and real...

Christophe Ponsard, Abiola Paterne Chokki, Jean-François Daune

2603.04587 • 2026-03-04

TESTING

Token Taxes: mitigating AGI's economic risks

The development of AGI threatens to erode government tax bases, lower living standards, and disempower citizens -- risks that make the 40-year stagnation of wages during the first industrial revolu...

Lucas Irwin, Tung-Yu Wu, Fazl Barez

2603.04555 • 2026-03-04

TESTING

EBLM XVII - Tidal Synchronization and Circularization in Tight Stellar Binaries

Tidal interactions in close stellar binaries are central to their orbital and rotational evolution, making observational tests of theoretical predictions essential for our understanding of the evol...

Ritika Sethi, David V. Martin, Adrian Barker, Pierre F. L. Maxted, Amaury H. M. J. Triaud, Vedad ...

2603.04554 • 2026-03-04

TESTING

Discovering mathematical concepts through a multi-agent system

Mathematical concepts emerge through an interplay of processes, including experimentation, efforts at proof, and counterexamples. In this paper, we present a new multi-agent model for computational...

Daattavya Aggarwal, Oisin Kim, Carl Henrik Ek, Challenger Mishra

2603.04528 • 2026-03-04

TESTING

Coexistence of Chromatic Flares and an Achromatic QPO in the Gamma-ray Blazar PG 1553+113

The physical origin of quasi-periodic oscillations (QPOs) in blazars remains debated, with geometric and plasma-driven scenarios as the main competing interpretations. Discriminating between them r...

Elena Madero, Alberto Domínguez

2603.04527 • 2026-03-04

TESTING

Towards Predictive Quantum Algorithmic Performance: Modeling Time-Correlated Noise at Scale

Combining tensor network techniques with quantum autoregressive moving average models, we quantify the effects of time-correlated noise on quantum algorithms and predict their performance at scale....

Amit Jamadagni, Gregory Quiroz, Eugene Dumitrescu

2603.04524 • 2026-03-04

TESTING

MXDFz4.4: A LyC emitter 250Myr after the epoch of reionization and a first test of Ly-alpha morphology as a tracer of LyC escape at high redshift

Assessing the contribution of ionizing sources to cosmic reionization is a central goal of extragalactic astrophysics. Understanding and quantifying ionizing escape remains challenging near the epo...

Ilias Goovaerts, Marc Rafelski, Alexander Beckett, Grecco Oyarzùn, Annalisa Citro, Farhanul Hasan...

2603.04517 • 2026-03-04

TESTING

The erasure of Galactic bar resonances by dark matter subhaloes

In the context of increasing appreciation for the coupling between the Galactic bar and the halo, we introduce a new framework using stars trapped in resonance with the bar to probe the Galactic da...

Elliot Y. Davies, Adam M. Dillamore, Vasily Belokurov, Lina Necib

2603.04490 • 2026-03-04

TESTING

NASA's Pandora SmallSat Mission: Simulated Modeling and Retrieval of Near-Infrared Exoplanet Transmission Spectra

Pandora is a SmallSat mission dedicated to understanding exoplanets and their host stars by disentangling the impact of stellar heterogeneity on exoplanet transmission spectra. Selected as a NASA A...

Yoav Rotman, Peter McGill, Luis Welbanks, Benjamin V. Rackham, Aishwarya Iyer, Daniel Apai, Micha...

2603.04488 • 2026-03-04

TESTING

HyQBench: A Benchmark Suite for Hybrid CV-DV Quantum Computing

Hybrid continuous-variable (CV)-discrete-variable (DV) quantum systems present a promising direction for quantum computing by combining the high dimensional encoding capabilities of qumodes with th...

Shubdeep Mohapatra, Yuan Liu, Eddy Z. Zhang, Huiyang Zhou

2603.04398 • 2026-03-04

TESTING

CLARC: C/C++ Benchmark for Robust Code Search

Efficient code retrieval is critical for developer productivity, yet existing benchmarks largely focus on Python and rarely stress-test robustness beyond superficial lexical cues. To address the ga...

Kaicheng Wang, Liyan Huang, Weike Fang, Weihang Wang

2603.04484 • 2026-03-04

AI LLM

SELDON: Supernova Explosions Learned by Deep ODE Networks

The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory's Legacy Survey of Space and Time comes online, overwhelming the tradi...

Jiezhong Wu, Jack O'Brien, Jennifer Li, M. S. Krafczyk, Ved G. Shah, Amanda R. Wasserman, Daniel ...

2603.04392 • 2026-03-04

AI LLM

A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development

WebGIS development requires rigor, yet agentic AI frequently fails due to five large language model (LLM) limitations: context constraints, cross-session forgetting, stochasticity, instruction fail...

Boyuan Guan, Wencong Cui, Levente Juhasz

2603.04390 • 2026-03-04

TESTING

ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and $π^3$ have a computational cost that scales quadratically with the number of i...

Haian Jin, Rundi Wu, Tianyuan Zhang, Ruiqi Gao, Jonathan T. Barron, Noah Snavely, Aleksander Holy...

2603.04385 • 2026-03-04

TESTING

TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species within the same genus or family. We ...

Maximilian von Klinski, Maximilian Schall

2603.04380 • 2026-03-04

AI LLM

Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce ...

Furkan Mumcu, Yasin Yilmaz

2603.04378 • 2026-03-04
