Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
TESTING

Quantum Random Forest for the Regression Problem

The Random Forest model is one of the most popular machine learning models. We present a quantum algorithm for the testing (forecasting) process of the Random Forest machine learning model for the Regres...

Kamil Khadiev, Liliya Safina

2603.22790 2026-03-24
TESTING

Understanding Bugs in Quantum Simulators: An Empirical Study

Quantum simulators are a foundational component of the quantum software ecosystem. They are widely used to develop and debug quantum programs, validate compiler transformations, and support empiric...

Krishna Upadhyay, Moshood Fakorede, Umar Farooq

2603.22789 2026-03-24
TESTING

Exposure-Normalized Bed and Chair Fall Rates via Continuous AI Monitoring

This retrospective cohort study used continuous AI monitoring to estimate fall rates by exposure time rather than occupied bed-days. From August 2024 to December 2025, 3,980 eligible monitoring uni...

Paolo Gabriel, Peter Rehani, Zack Drumm, Tyler Troy, Tiffany Wyatt, Narinder Singh

2603.22785 2026-03-24
TESTING

Caterpillar of Thoughts: The Optimal Test-Time Algorithm for Large Language Models

Large language models (LLMs) can often produce substantially better outputs when allowed to use additional test-time computation, such as sampling, chain of thought, backtracking, or revising parti...

Amir Azarmehr, Soheil Behnezhad, Alma Ghafari

2603.22784 2026-03-24
TESTING

MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding

The rapid progress of Large Language Models (LLMs) has spurred growing interest in Multi-modal LLMs (MLLMs) and motivated the development of benchmarks to evaluate their perceptual and comprehensio...

Purui Bai, Tao Wu, Jiayang Sun, Xinyue Liu, Huaibo Huang, Ran He

2603.22756 2026-03-24
TESTING

PRISM: A Dual View of LLM Reasoning through Semantic Flow and Latent Computation

Large language models (LLMs) solve complex problems by generating multi-step reasoning traces. Yet these traces are typically analyzed from only one of two perspectives: the sequence of tokens acro...

Ruidi Chang, Jiawei Zhou, Hanjie Chen

2603.22754 2026-03-24
TESTING

CLiGNet: Clinical Label-Interaction Graph Network for Medical Specialty Classification from Clinical Transcriptions

Automated classification of clinical transcriptions into medical specialties is essential for routing, coding, and clinical decision support, yet prior work on the widely used MTSamples benchmark s...

Pronob Kumar Barman, Pronoy Kumar Barman

2603.22752 2026-03-24
TESTING

Experimental investigation of magnetic properties of MnFeCo$_{4}$Si$_{2}$ discovered by GNoME

AI-driven inorganic materials research has garnered significant attention due to its ability to reduce the time, labor, and cost associated with experiments. An AI model known as GNoME, recently de...

Shuhei Naganuma, Jiro Kitagawa

2603.22748 2026-03-24
TESTING

Pre-Seismic Quiescence and Dynamical Regime Transitions in the Japan and Chile Earthquake Catalogs: Evidence from KR Critical Slowing Down Indicators

We present the KR excitation regulation framework, a coupled ordinary differential equation system that produces Critical Slowing Down (CSD) indicators from rolling earthquake magnitude windows, an...

Ramakrishna Pasupuleti

2603.22745 2026-03-24
TESTING

Beyond Binary Correctness: Scaling Evaluation of Long-Horizon Agents on Subjective Enterprise Tasks

Large language models excel on objectively verifiable tasks such as math and programming, where evaluation reduces to unit tests or a single correct answer. In contrast, real-world enterprise work ...

Abhishek Chandwani, Ishan Gupta

2603.22744 2026-03-24
TESTING

Explanation Generation for Contradiction Reconciliation with LLMs

Existing NLP work commonly treats contradictions as errors to be resolved by choosing which statements to accept or discard. Yet a key aspect of human reasoning in social interactions and professio...

Jason Chan, Zhixue Zhao, Robert Gaizauskas

2603.22735 2026-03-24
TESTING

How Utilitarian Are OpenAI's Models Really? Replicating and Reinterpreting Pfeffer, Krügel, and Uhl (2025)

Pfeffer, Krügel, and Uhl (2025) report that OpenAI's reasoning model o1-mini produces more utilitarian responses to the trolley problem and footbridge dilemma than the non-reasoning model GPT-4o. I...

Johannes Himmelreich

2603.22730 2026-03-24
TESTING

Beyond Explanation: Evidentiary Rights for Algorithmic Accountability

Algorithmic accountability scholarship has focused heavily on explanation, helping affected parties understand why decisions were made. We argue this focus is insufficient. Explanation without evid...

Matthew Stewart

2603.22716 2026-03-24
TESTING

Detecting Non-Membership in LLM Training Data via Rank Correlations

As large language models (LLMs) are trained on increasingly vast and opaque text corpora, determining which data contributed to training has become essential for copyright enforcement, compliance a...

Pranav Shetty, Mirazul Haque, Zhiqiang Ma, Xiaomo Liu

2603.22707 2026-03-24
TESTING

Testing Properties of Edge Distributions

We initiate the study of distribution testing for probability distributions over the edges of a graph, motivated by the closely related question of "edge-distribution-free" graph property testing...

Yumou Fei

2603.22702 2026-03-24
TESTING

A Clinically Anchored Radiomics Dictionary for Explainable TI-RADS-Based Thyroid Nodule Classification in Ultrasound; Dictionary Version TU1.0

Artificial intelligence based radiomics models for thyroid ultrasound (US) often achieve strong diagnostic performance but remain difficult to interpret, limiting clinical trust and adoption. We de...

Mohammad Salmanpour, Shahram Taeb, Ali Fathi Jouzdani, Mohammad Ayazi, Siavash Hosseinpour Saffar...

2603.22692 2026-03-24
TESTING

BlindMarket: Enabling Verifiable, Confidential, and Traceable IP Core Distribution in Zero-Trust Settings

We present BlindMarket, an end-to-end zero-trust distribution framework for hardware IP cores. BlindMarket allows two parties, the IP user and the IP vendor, to complete an IP trading process with ...

Zhaoxiang Liu, Samuel Judson, Raj Dutta, Mark Santolucito, Xiaolong Guo, Ning Luo

2603.22685 2026-03-24
TESTING

Fixed-level calibration of the Cauchy combination test

The Cauchy combination test (CCT) is widely used because it gives a closed-form combined $p$-value and is known to be asymptotically valid as the nominal level $\alpha \downarrow 0$ under broad dependence...

Hirofumi Ota

2603.22668 2026-03-24
TESTING

Variable-Resolution Virtual Maps for Autonomous Exploration with Unmanned Surface Vehicles (USVs)

Autonomous exploration by unmanned surface vehicles (USVs) in near-shore waters requires reliable localisation and consistent mapping over extended areas, but this is challenged by GNSS degradation...

Ye Li, Yewei Huang, Wenlong GaoZhang, Alberto Quattrini Li, Brendan Englot, Yuanchang Liu

2603.22667 2026-03-24
TESTING

Toward Faithful Segmentation Attribution via Benchmarking and Dual-Evidence Fusion

Attribution maps for semantic segmentation are almost always judged by visual plausibility. Yet looking convincing does not guarantee that the highlighted pixels actually drive the model's predicti...

Abu Noman Md Sakib, OFM Riaz Rahman Aranya, Kevin Desai, Zijie Zhang

2603.22624 2026-03-23