Research

Papers

Research papers from arXiv and related sources

Total: 4694; AI/LLM: 2583; Testing: 2111
AI LLM

Predicting Conflict Impact on Performance in O-RAN

The O-RAN Alliance promotes the integration of intelligent autonomous agents to control the Radio Access Network (RAN). This improves flexibility, performance, and observability in the RAN, but int...

Pietro Brach del Prever, Niloofar Mohamadi, Salvatore D'Oro, Leonardo Bonati, Michele Polese, Łuk...

2603.08685 2026-03-09
AI LLM

A New Lower Bound for the Random Offerer Mechanism in Bilateral Trade using AI-Guided Evolutionary Search

The celebrated Myerson-Satterthwaite theorem shows that in bilateral trade, no mechanism can be simultaneously fully efficient, Bayesian incentive compatible (BIC), and budget balanced (BB). This ...

Yang Cai, Vineet Gupta, Zun Li, Aranyak Mehta

2603.08679 2026-03-09
TESTING

Exp-Force: Experience-Conditioned Pre-Grasp Force Selection with Vision-Language Models

Accurate pre-contact grasp force selection is critical for safe and reliable robotic manipulation. Adaptive controllers regulate force after contact but still require a reasonable initial estimate....

Siqi Shang, Minchao Huang, Bill Fan, Lillian Chin

2603.08668 2026-03-09
AI LLM

Cybersecurity AI: Hacking Consumer Robots in the AI Era

Is robot cybersecurity broken by AI? Consumer robots, from autonomous lawnmowers to powered exoskeletons and window cleaners, are rapidly entering homes and workplaces, yet their security remai...

Víctor Mayoral-Vilches, Unai Ayucar-Carbajo, Olivier Laflamme, Ruikai Peng, María Sanz-Gómez, Fra...

2603.08665 2026-03-09
AI LLM

How Far Can Unsupervised RLVR Scale LLM Training?

Unsupervised reinforcement learning with verifiable rewards (URLVR) offers a pathway to scale LLM training beyond the supervision bottleneck by deriving rewards without ground truth labels. Recent ...

Bingxiang He, Yuxin Zuo, Zeyuan Liu, Shangziqi Zhao, Zixuan Fu, Junlin Yang, Cheng Qian, Kaiyan Z...

2603.08660 2026-03-09
TESTING

Context-free Self-Conditioned GAN for Trajectory Forecasting

In this paper, we present a context-free unsupervised approach based on a self-conditioned GAN to learn different modes from 2D trajectories. Our intuition is that each mode indicates a different b...

Tiago Rodrigues de Almeida, Eduardo Gutierrez Maestro, Oscar Martinez Mozos

2603.08658 2026-03-09
AI LLM

OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

We introduce OfficeQA Pro, a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus consists of U.S. Treasury Bulletins ...

Krista Opsahl-Ong, Arnav Singhvi, Jasmine Collins, Ivan Zhou, Cindy Wang, Ashutosh Baheti, Owen O...

2603.08655 2026-03-09
AI LLM

CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

Recent advancements in Unified Multimodal Models (UMMs) have significantly advanced text-to-image (T2I) generation, particularly through the integration of Chain-of-Thought (CoT) reasoning. However...

Haodong Li, Chunmei Qing, Huanyu Zhang, Dongzhi Jiang, Yihang Zou, Hongbo Peng, Dingming Li, Yuho...

2603.08652 2026-03-09
TESTING

Divide and Predict: An Architecture for Input Space Partitioning and Enhanced Accuracy

In this article the authors develop an intrinsic measure for quantifying heterogeneity in training data for supervised learning. This measure is the variance of a random variable which factors thro...

Fenix W. Huang, Henning S. Mortveit, Christian M. Reidys

2603.08649 2026-03-09
AI LLM

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

AI agents have become surprisingly proficient at software engineering over the past year, largely due to improvements in reasoning capabilities. This raises a deeper question: can these systems ext...

Ben Rank, Hardik Bhatnagar, Ameya Prabhu, Shira Eisenberg, Karina Nguyen, Matthias Bethge, Maksym...

2603.08640 2026-03-09
TESTING

UNBOX: Unveiling Black-box visual models with Natural-language

Ensuring trustworthiness in open-world visual recognition requires models that are interpretable, fair, and robust to distribution shifts. Yet modern vision systems are increasingly deployed as pro...

Simone Carnemolla, Chiara Russo, Simone Palazzo, Quentin Bouniot, Daniela Giordano, Zeynep Akata,...

2603.08639 2026-03-09
AI LLM

Reachability-based Temporal Logic Verification for Reliable LLM-guided Human-Autonomy Teaming

We propose a reachability-based framework for reliable LLM-guided human-autonomy teaming (HAT) using signal temporal logic (STL). In the proposed framework, an LLM is leveraged as a translator that tr...

Joonwon Choi, Kartik Anand Pant, Karthik Nune, Inseok Hwang

2603.08633 2026-03-09
TESTING

Secondary gravitational waves against a strong gravitational wave in the Bianchi VI universe

A proper-time method for constructing models of dynamic gravitational-wave fields is presented. Using the proper-time method, analytical (not numerical) models of secondary gravitational waves are ...

Konstantin E. Osetrin

2603.08628 2026-03-09
AI LLM

Coverage-Guided Multi-Agent Harness Generation for Java Library Fuzzing

Coverage-guided fuzzing has proven effective for software testing, but targeting library code requires specialized fuzz harnesses that translate fuzzer-generated inputs into valid API invocations. ...

Nils Loose, Nico Winkel, Kristoffer Hempel, Felix Mächtle, Julian Hans, Thomas Eisenbarth

2603.08616 2026-03-09
TESTING

Query-Guided Analysis and Mitigation of Data Verification Errors (Extended Version)

Data verification, the process of labeling data items as correct or incorrect, is a preprocessing step that may critically affect the quality of results in data-driven pipelines. Despite recent adv...

Ran Schreiber, Yael Amsterdamer

2603.08612 2026-03-09
TESTING

RESAPLE: An Approximate One-Step Restricted Likelihood Estimator of Spatial Dependence for Exploratory Spatial Analysis

Diagnostics such as Moran's index and approximate profile likelihood-based estimators (APLE) for Gaussian spatial autoregressive models are widely used in exploratory data analysis to assess the st...

Aditya Khan, Meredith Franklin

2603.08607 2026-03-09
AI LLM

What to Make Sense of in the Era of LLM? A Perspective from the Structure and Efforts in Sensemaking

Sensemaking tasks often entail navigating through complex, ambiguous data to construct coherent insights. Prior work has shown that crowds can effectively distribute cognitive load, pooling diverse...

Tianyi Li, Satya Samhita Bonepalli, Vikram Mohanty

2603.08604 2026-03-09
TESTING

Bilevel Planning with Learned Symbolic Abstractions from Interaction Data

Intelligent agents must reason over both continuous dynamics and discrete representations to generate effective plans in complex environments. Previous studies have shown that symbolic abstractions...

Fatih Dogangun, Burcu Kilic, Serdar Bahar, Emre Ugur

2603.08599 2026-03-09
TESTING

The Grasshopper Problem on the Sphere

The spherical grasshopper problem is a geometric optimization problem that arises in the context of Bell inequalities and can be interpreted as identifying the best local hidden variable approximat...

David Llamas, Dmitry Chistikov, Adrian Kent, Mike Paterson, Olga Goulko

2603.08579 2026-03-09
TESTING

Drift-to-Action Controllers: Budgeted Interventions with Online Risk Certificates

Deployed machine learning systems face distribution drift, yet most monitoring pipelines stop at alarms and leave the response underspecified under labeling, compute, and latency constraints. We in...

Ismail Lamaakal, Chaymae Yahyati, Khalid El Makkaoui, Ibrahim Ouahbi, Yassine Maleh

2603.08578 2026-03-09