Papers
Research papers from arXiv and related sources
The economic alignment problem of artificial intelligence
Artificial intelligence (AI) is advancing exponentially and is likely to have profound impacts on human wellbeing, social equity, and environmental sustainability. Here we argue that the "alignment...
Daniel W. O'Neill, Stefano Vrizzi, Noemi Luna Carmeno, Felix Creutzig, Jefim Vogel
Resilient Federated Chain: Transforming Blockchain Consensus into an Active Defense Layer for Federated Learning
Federated Learning (FL) has emerged as a key paradigm for building Trustworthy AI systems by enabling privacy-preserving, decentralized model training. However, FL is highly susceptible to adversar...
Mario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera
Ours go to 211: Euler pseudoprimes to 47 prime bases (from Carmichael numbers)
In this paper we show that a certain subset of the Carmichael numbers contains good Euler pseudoprimes, composite numbers that for many bases survive the Solovay-Strassen primality test. We present...
Alejandra Alcantarilla Sánchez, Jolijn Cottaar, Tanja Lange, Benne de Weger
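The condition this abstract refers to can be sketched as follows. This is a minimal illustration written for this listing, not code from the paper. An odd composite n is an Euler pseudoprime to base a when the Jacobi symbol (a/n) is nonzero and a^((n-1)/2) ≡ (a/n) (mod n), which is the congruence the Solovay-Strassen test checks. The function names and the trial-division primality check are assumptions chosen for brevity and only suit small n.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0, computed via quadratic reciprocity."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        # Remove factors of 2, using (2/n) = -1 iff n = 3 or 5 (mod 8).
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Quadratic reciprocity: the sign flips iff both are 3 (mod 4).
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    # Ending at n == 1 means gcd(a, n) == 1; otherwise the symbol is 0.
    return result if n == 1 else 0


def is_prime(n: int) -> bool:
    """Deterministic trial division (hypothetical helper, small n only)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_euler_pseudoprime(n: int, a: int) -> bool:
    """True if n is an odd composite that passes the Solovay-Strassen check for base a."""
    if n < 3 or n % 2 == 0 or is_prime(n):
        return False
    j = jacobi(a, n)
    if j == 0:  # gcd(a, n) > 1, so the test already exposes n as composite
        return False
    # Compare a^((n-1)/2) mod n against the Jacobi symbol, mapping -1 to n - 1.
    return pow(a, (n - 1) // 2, n) == j % n


if __name__ == "__main__":
    # 561 = 3 * 11 * 17 is the smallest Carmichael number.
    print(is_euler_pseudoprime(561, 2))  # True
    print(is_euler_pseudoprime(561, 3))  # False: 3 divides 561
    print([p for p in (2, 3, 5, 7) if is_euler_pseudoprime(561, p)])  # [2]
```

The usage lines show why the paper's question is interesting: even a Carmichael number, which fools the plain Fermat test for every coprime base, passes this stricter check for only some prime bases.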
UniVBench: Towards Unified Evaluation for Video Foundation Models
Video foundation models aim to integrate video understanding, generation, editing, and instruction following within a single framework, making them a central direction for next-generation multimoda...
Jianhui Wei, Xiaotian Zhang, Yichen Li, Yuan Wang, Yan Zhang, Ziyi Chen, Zhihang Tang, Wei Xu, Zu...
From Restructuring to Stabilization: A Large-Scale Experiment on Iterative Code Readability Refactoring with Large Language Models
Large language models (LLMs) are increasingly used for automated code refactoring tasks. Although these models can quickly refactor code, the quality may exhibit inconsistencies and unpredictable b...
Norman Peitek, Julia Hess, Sven Apel
A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios
AI is increasingly being used to assist fraud and cybercrime. However, it is unclear whether current large language models can assist complex criminal activity. Working with law enforcement and pol...
Kimberly T. Mai, Anna Gausen, Magda Dubois, Mona Murad, Bessie O'Dell, Nadine Staes-Polet, Christ...
The supersonic nature of jellyfish galaxies
All gas-rich galaxies in cluster environments are expected to experience ram-pressure stripping from the intra-cluster medium. However, only a fraction of these develop ongoing star-formation in th...
Alessandro Ignesti, Francesca Loi, Antonino Marasco, Benedetta Vulcani, Bianca M. Poggianti, Chri...
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
SkyReels V4 is a unified multi-modal video foundation model for joint video-audio generation, inpainting, and editing. The model adopts a dual-stream Multimodal Diffusion Transformer (MMDiT) archit...
Guibin Chen, Dixuan Lin, Jiangping Yang, Youqiang Zhang, Zhengcong Fei, Debang Li, Sheng Chen, Ch...
Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per conditi...
Heejin Jo
An Empirical Study of Bugs in Modern LLM Agent Frameworks
LLM agents have been widely adopted in real-world applications, relying on agent frameworks for workflow execution and multi-agent coordination. As these systems scale, understanding bugs in the un...
Xinxue Zhu, Jiacong Wu, Xiaoyu Zhang, Tianlin Li, Yanzhou Mu, Juan Zhai, Chao Shen, Yang Liu
An Evaluation of Context Length Extrapolation in Long Code via Positional Embeddings and Efficient Attention
The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools in software engineering, capable of performing various code-related tasks such as code...
Madhusudan Ghosh, Rishabh Gupta
MulCovFuzz: A Multi-Component Coverage-Guided Greybox Fuzzer for 5G Protocol Testing
As mobile networks transition to 5G infrastructure, ensuring robust security becomes more important due to the complex architecture and expanded attack surface. Traditional security testing approac...
Yu Wang, Yang Xiang, Chandra Thapa, Hajime Suzuki
p-Hacking Inflates Type I Error Rates in the Error Statistical Approach but not in the Formal Inference Approach
p-hacking occurs when researchers conduct multiple significance tests (e.g., $p_{1;H_{0,1}}$ and $p_{2;H_{0,2}}$) and then selectively report tests that yield desirable (usually significant) results (e.g., $p_2$ < 0....
Mark Rubin
CALIMA: On-the-fly dust and PAH evolution for radiation-hydrodynamics galaxy formation simulations
Dust grains and polycyclic aromatic hydrocarbons (PAHs) actively contribute to the thermodynamics, chemistry, and radiative state of the interstellar medium (ISM), yet most ISM models and galaxy si...
Francisco Rodríguez Montero, Yohan Dubois, Harley Katz, Adrianne Slyz, Julien Devriendt
D-COT: Disciplined Chain-of-Thought Learning for Efficient Reasoning in Small Language Models
Chain-of-Thought (CoT) distillation from Large Language Models (LLMs) often induces "overthinking" in Small Language Models (SLMs), leading to performance degradation and excessive token consumptio...
Shunsuke Ubukata
Therapist-Robot-Patient Physical Interaction is Worth a Thousand Words: Enabling Intuitive Therapist Guidance via Remote Haptic Control
Robotic systems can enhance the amount and repeatability of physically guided motor training. Yet their real-world adoption is limited, partly due to non-intuitive trainer/therapist-trainee/patient...
Beatrice Luciani, Alex van den Berg, Matti Lang, Alexandre L. Ratschat, Laura Marchal-Crespo
Beyond Static Artifacts: A Forensic Benchmark for Video Deepfake Reasoning in Vision Language Models
Current Vision-Language Models (VLMs) for deepfake detection excel at identifying spatial artifacts but overlook a critical dimension: temporal inconsistencies in video forgeries. Adapting VLMs to ...
Zheyuan Gu, Qingsong Zhao, Yusong Wang, Zhaohong Huang, Xinqi Li, Cheng Yuan, Jiaowei Shao, Chi Z...
Linear Perturbations and Multi-Probe Diagnostics in Dark-Sector Selective $f(R,T_\chi)$ Gravity
We develop a dark-sector selective trace-coupled extension of gravity in which the matter–curvature coupling depends exclusively on the trace of the dark-matter energy–momentum tensor, $T_\chi$, def...
L. Yildiz, D. Kayki, E. Gudekli
RAMSeS: Robust and Adaptive Model Selection for Time-Series Anomaly Detection Algorithms
Time-series data vary widely across domains, making a universal anomaly detector impractical. Methods that perform well on one dataset often fail to transfer because what counts as an anomaly is co...
Mohamed Abdelmaksoud, Sheng Ding, Andrey Morozov, Ziawasch Abedjan
Generalisation of RLHF under Reward Shift and Clipped KL Regularisation
Alignment and adaptation in large language models heavily rely on reinforcement learning from human feedback (RLHF); yet, theoretical understanding of its generalisability remains immature, especi...
Kenton Tang, Yuzhu Chen, Fengxiang He