Research

Papers

Research papers from arXiv and related sources

Total: 4694; AI/LLM: 2583; Testing: 2111
TESTING

Exploring the $S_8$ Tension: Insights from the CatNorth 1.5-Million Quasar Candidates

The parameter $S_8$, a key probe of cosmic structure growth, exhibits a persistent $\sim 3\sigma$ tension between high-redshift measurements from cosmic microwave background (CMB) anisotropies and low-re...

Jin Qin, Xue-Bing Wu, Yuming Fu, Haojie Xu, Yuxuan Pang, Yun-Hao Zhang, Pengjie Zhang

2603.09457 2026-03-10
TESTING

Declarative Scenario-based Testing with RoadLogic

Scenario-based testing is a key method for cost-effective and safe validation of autonomous vehicles (AVs). Existing approaches rely on imperative scenario definitions, requiring developers to manu...

Ezio Bartocci, Alessio Gambi, Felix Gigler, Cristinel Mateis, Dejan Ničković

2603.09455 2026-03-10
TESTING

Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers

Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a pri...

Albus Yizhuo Li, Matthew Wicker

2603.09453 2026-03-10
AI LLM

CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?

Analyzing Open Source Intelligence (OSINT) from large volumes of data is critical for drafting and publishing comprehensive CTI reports. This process usually follows a three-stage workflow -- triag...

Xiangsen Chen, Xuan Feng, Shuo Chen, Matthieu Maitre, Sudipto Rakshit, Diana Duvieilh, Ashley Pic...

2603.09452 2026-03-10
TESTING

Feasible Set and the Transformation of Values

This paper proposes a shift in perspective on two long-standing problems in political economy: the reduction of complex labor and the transformation problem. Rather than searching for a unique cons...

Jiyuan Lyu

2603.09450 2026-03-10
AI LLM

A Guideline-Aware AI Agent for Zero-Shot Target Volume Auto-Delineation

Delineating the clinical target volume (CTV) in radiotherapy involves complex margins constrained by tumor location and anatomical barriers. While deep learning models automate this process, their ...

Yoon Jo Kim, Wonyoung Cho, Jongmin Lee, Han Joo Chae, Hyunki Park, Sang Hoon Seo, Noh Jae Myung, ...

2603.09448 2026-03-10
AI LLM

AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems

The rapid rollout of AI in heterogeneous public and societal sectors has subsequently escalated the need for compliance with regulatory standards and frameworks. The EU AI Act has emerged as a land...

Athanasios Davvetas, Michael Papademas, Xenia Ziouvelou, Vangelis Karkaletsis

2603.09435 2026-03-10
AI LLM

Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs

Large Language Models (LLMs) are increasingly deployed across diverse real-world applications and user communities. As such, it is crucial that these models remain both morally grounded and knowled...

Saugata Purkayastha, Pranav Kushare, Pragya Paramita Pal, Sukannya Purkayastha

2603.09434 2026-03-10
TESTING

CERES: A Probabilistic Early Warning System for Acute Food Insecurity

We present CERES (Calibrated Early-warning and Risk Estimation System), an automated probabilistic forecasting system for acute food insecurity. CERES generates 90-day ahead probability estimates o...

Tom Danny S. Pedersen

2603.09425 2026-03-10
TESTING

MetaDAT: Generalizable Trajectory Prediction via Meta Pre-training and Data-Adaptive Test-Time Updating

Existing trajectory prediction methods exhibit significant performance degradation under distribution shifts during test time. Although test-time training techniques have been explored to enable ad...

Yuning Wang, Pu Zhang, Yuan He, Ke Wang, Jianru Xue

2603.09419 2026-03-10
AI LLM

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health

Large Language Models (LLMs) excel in Natural Language Processing (NLP) tasks, but they often propagate biases embedded in their training data, which is potentially impactful in sensitive domains l...

Trung Hieu Ngo, Adrien Bazoge, Solen Quiniou, Pierre-Antoine Gourraud, Emmanuel Morin

2603.09416 2026-03-10
AI LLM

PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue

Document Layout Analysis (DLA) is crucial for document artificial intelligence and has recently received increasing attention, resulting in an influx of large-scale public DLA datasets. Existing wo...

Zirui Zhang, Yaping Zhang, Lu Xiang, Yang Zhao, Feifei Zhai, Yu Zhou, Chengqing Zong

2603.09414 2026-03-10
AI LLM

LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation

Validating evaluation metrics for NLG typically relies on expensive and time-consuming human annotations, which predominantly exist only for English datasets. We propose LLM as a Meta-Judge...

Lukáš Eigler, Jindřich Libovický, David Hurych

2603.09403 2026-03-10
AI LLM

Reward Prediction with Factorized World States

Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could introduce biases inh...

Yijun Shen, Delong Chen, Xianming Hu, Jiaming Mi, Hongbo Zhao, Kai Zhang, Pascale Fung

2603.09400 2026-03-10
TESTING

Deep Learning Search for Gravitational Waves from Compact Binary Coalescence

Gravitational wave searches rely on a combination of methods, including matched filtering, coherent analyses, and more recent machine learning based pipelines. For compact binary coalescences, wher...

Lorenzo Mobilia, Tito Dal Canton, Gianluca Maria Guidi

2603.09386 2026-03-10
TESTING

SinGeo: Unlock Single Model's Potential for Robust Cross-View Geo-Localization

Robust cross-view geo-localization (CVGL) remains challenging despite the surge in recent progress. Existing methods still rely on field-of-view (FoV)-specific training paradigms, where models are ...

Yang Chen, Xieyuanli Chen, Junxiang Li, Jie Tang, Tao Wu

2603.09377 2026-03-10
AI LLM

Quantifying and extending the coverage of spatial categorization data sets

Variation in spatial categorization across languages is often studied by eliciting human labels for the relations depicted in a set of scenes known as the Topological Relations Picture Series (TRPS...

Wanchun Li, Alexandra Carstensen, Yang Xu, Terry Regier, Charles Kemp

2603.09373 2026-03-10
TESTING

Verified delegated quantum computation requires techniques beyond cut-and-choose

Delegated quantum computation enables a client with limited quantum capabilities to outsource computations to a more powerful quantum server while preserving correctness and privacy. Verification i...

Fabian Wiesner, Anna Pappa

2603.09368 2026-03-10
TESTING

ProvAgent: Threat Detection Based on Identity-Behavior Binding and Multi-Agent Collaborative Attack Investigation

Advanced Persistent Threats (APTs) pose critical challenges to modern cybersecurity due to their multi-stage and stealthy nature. While provenance-based detection approaches show promise in capturi...

Wenhao Yan, Ning An, Linxu Li, Bingsheng Bi, Bo Jiang, Zhigang Lu, Baoxu Liu, Junrong Liu, Cong Dong

2603.09358 2026-03-10
AI LLM

Democratising Clinical AI through Dataset Condensation for Classical Clinical Models

Dataset condensation (DC) learns a compact synthetic dataset that enables models to match the performance of full-data training, prioritising utility over distributional fidelity. While typically e...

Anshul Thakur, Soheila Molaei, Pafue Christy Nganjimi, Joshua Fieggen, Andrew A. S. Soltan, Danie...

2603.09356 2026-03-10