Papers
Research papers from arXiv and related sources
Exploring the $S_8$ Tension: Insights from the CatNorth 1.5-Million Quasar Candidates
The parameter $S_8$, a key probe of cosmic structure growth, exhibits a persistent $\sim 3\sigma$ tension between high-redshift measurements from cosmic microwave background (CMB) anisotropies and low-re...
Jin Qin, Xue-Bing Wu, Yuming Fu, Haojie Xu, Yuxuan Pang, Yun-Hao Zhang, Pengjie Zhang
Declarative Scenario-based Testing with RoadLogic
Scenario-based testing is a key method for cost-effective and safe validation of autonomous vehicles (AVs). Existing approaches rely on imperative scenario definitions, requiring developers to manu...
Ezio Bartocci, Alessio Gambi, Felix Gigler, Cristinel Mateis, Dejan Ničković
Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers
Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a pri...
Albus Yizhuo Li, Matthew Wicker
CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?
Analyzing Open Source Intelligence (OSINT) from large volumes of data is critical for drafting and publishing comprehensive CTI reports. This process usually follows a three-stage workflow -- triag...
Xiangsen Chen, Xuan Feng, Shuo Chen, Matthieu Maitre, Sudipto Rakshit, Diana Duvieilh, Ashley Pic...
Feasible Set and the Transformation of Values
This paper proposes a shift in perspective on two long-standing problems in political economy: the reduction of complex labor and the transformation problem. Rather than searching for a unique cons...
Jiyuan Lyu
A Guideline-Aware AI Agent for Zero-Shot Target Volume Auto-Delineation
Delineating the clinical target volume (CTV) in radiotherapy involves complex margins constrained by tumor location and anatomical barriers. While deep learning models automate this process, their ...
Yoon Jo Kim, Wonyoung Cho, Jongmin Lee, Han Joo Chae, Hyunki Park, Sang Hoon Seo, Noh Jae Myung, ...
AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems
The rapid rollout of AI in heterogeneous public and societal sectors has subsequently escalated the need for compliance with regulatory standards and frameworks. The EU AI Act has emerged as a land...
Athanasios Davvetas, Michael Papademas, Xenia Ziouvelou, Vangelis Karkaletsis
Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs
Large Language Models (LLMs) are increasingly deployed across diverse real-world applications and user communities. As such, it is crucial that these models remain both morally grounded and knowled...
Saugata Purkayastha, Pranav Kushare, Pragya Paramita Pal, Sukannya Purkayastha
CERES: A Probabilistic Early Warning System for Acute Food Insecurity
We present CERES (Calibrated Early-warning and Risk Estimation System), an automated probabilistic forecasting system for acute food insecurity. CERES generates 90-day ahead probability estimates o...
Tom Danny S. Pedersen
MetaDAT: Generalizable Trajectory Prediction via Meta Pre-training and Data-Adaptive Test-Time Updating
Existing trajectory prediction methods exhibit significant performance degradation under distribution shifts during test time. Although test-time training techniques have been explored to enable ad...
Yuning Wang, Pu Zhang, Yuan He, Ke Wang, Jianru Xue
Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health
Large Language Models (LLMs) excel in Natural Language Processing (NLP) tasks, but they often propagate biases embedded in their training data, which is potentially impactful in sensitive domains l...
Trung Hieu Ngo, Adrien Bazoge, Solen Quiniou, Pierre-Antoine Gourraud, Emmanuel Morin
PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue
Document Layout Analysis (DLA) is crucial for document artificial intelligence and has recently received increasing attention, resulting in an influx of large-scale public DLA datasets. Existing wo...
Zirui Zhang, Yaping Zhang, Lu Xiang, Yang Zhao, Feifei Zhai, Yu Zhou, Chengqing Zong
LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation
Validating evaluation metrics for NLG typically relies on expensive and time-consuming human annotations, which predominantly exist only for English datasets. We propose LLM as a Meta-Judge...
Lukáš Eigler, Jindřich Libovický, David Hurych
Reward Prediction with Factorized World States
Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could introduce biases inh...
Yijun Shen, Delong Chen, Xianming Hu, Jiaming Mi, Hongbo Zhao, Kai Zhang, Pascale Fung
Deep Learning Search for Gravitational Waves from Compact Binary Coalescence
Gravitational wave searches rely on a combination of methods, including matched filtering, coherent analyses, and more recent machine learning based pipelines. For compact binary coalescences, wher...
Lorenzo Mobilia, Tito Dal Canton, Gianluca Maria Guidi
SinGeo: Unlock Single Model's Potential for Robust Cross-View Geo-Localization
Robust cross-view geo-localization (CVGL) remains challenging despite the surge in recent progress. Existing methods still rely on field-of-view (FoV)-specific training paradigms, where models are ...
Yang Chen, Xieyuanli Chen, Junxiang Li, Jie Tang, Tao Wu
Quantifying and extending the coverage of spatial categorization data sets
Variation in spatial categorization across languages is often studied by eliciting human labels for the relations depicted in a set of scenes known as the Topological Relations Picture Series (TRPS...
Wanchun Li, Alexandra Carstensen, Yang Xu, Terry Regier, Charles Kemp
Verified delegated quantum computation requires techniques beyond cut-and-choose
Delegated quantum computation enables a client with limited quantum capabilities to outsource computations to a more powerful quantum server while preserving correctness and privacy. Verification i...
Fabian Wiesner, Anna Pappa
ProvAgent: Threat Detection Based on Identity-Behavior Binding and Multi-Agent Collaborative Attack Investigation
Advanced Persistent Threats (APTs) pose critical challenges to modern cybersecurity due to their multi-stage and stealthy nature. While provenance-based detection approaches show promise in capturi...
Wenhao Yan, Ning An, Linxu Li, Bingsheng Bi, Bo Jiang, Zhigang Lu, Baoxu Liu, Junrong Liu, Cong Dong
Democratising Clinical AI through Dataset Condensation for Classical Clinical Models
Dataset condensation (DC) learns a compact synthetic dataset that enables models to match the performance of full-data training, prioritising utility over distributional fidelity. While typically e...
Anshul Thakur, Soheila Molaei, Pafue Christy Nganjimi, Joshua Fieggen, Andrew A. S. Soltan, Danie...