Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

The economic alignment problem of artificial intelligence

Artificial intelligence (AI) is advancing exponentially and is likely to have profound impacts on human wellbeing, social equity, and environmental sustainability. Here we argue that the "alignment...

Daniel W. O'Neill, Stefano Vrizzi, Noemi Luna Carmeno, Felix Creutzig, Jefim Vogel

2602.21843 2026-02-25
AI LLM

Resilient Federated Chain: Transforming Blockchain Consensus into an Active Defense Layer for Federated Learning

Federated Learning (FL) has emerged as a key paradigm for building Trustworthy AI systems by enabling privacy-preserving, decentralized model training. However, FL is highly susceptible to adversar...

Mario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera

2602.21841 2026-02-25
TESTING

Ours go to 211: Euler pseudoprimes to 47 prime bases (from Carmichael numbers)

In this paper we show that a certain subset of the Carmichael numbers contains good Euler pseudoprimes, composite numbers that for many bases survive the Solovay-Strassen primality test. We present...

Alejandra Alcantarilla Sánchez, Jolijn Cottaar, Tanja Lange, Benne de Weger

2602.21840 2026-02-25
AI LLM

UniVBench: Towards Unified Evaluation for Video Foundation Models

Video foundation models aim to integrate video understanding, generation, editing, and instruction following within a single framework, making them a central direction for next-generation multimoda...

Jianhui Wei, Xiaotian Zhang, Yichen Li, Yuan Wang, Yan Zhang, Ziyi Chen, Zhihang Tang, Wei Xu, Zu...

2602.21835 2026-02-25
AI LLM

From Restructuring to Stabilization: A Large-Scale Experiment on Iterative Code Readability Refactoring with Large Language Models

Large language models (LLMs) are increasingly used for automated code refactoring tasks. Although these models can quickly refactor code, the quality may exhibit inconsistencies and unpredictable b...

Norman Peitek, Julia Hess, Sven Apel

2602.21833 2026-02-25
AI LLM

A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios

AI is increasingly being used to assist fraud and cybercrime. However, it is unclear whether current large language models can assist complex criminal activity. Working with law enforcement and pol...

Kimberly T. Mai, Anna Gausen, Magda Dubois, Mona Murad, Bessie O'Dell, Nadine Staes-Polet, Christ...

2602.21831 2026-02-25
TESTING

The supersonic nature of jellyfish galaxies

All gas-rich galaxies in cluster environments are expected to experience ram-pressure stripping from the intra-cluster medium. However, only a fraction of these develop ongoing star-formation in th...

Alessandro Ignesti, Francesca Loi, Antonino Marasco, Benedetta Vulcani, Bianca M. Poggianti, Chri...

2602.21821 2026-02-25
AI LLM

SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model

SkyReels-V4 is a unified multi-modal video foundation model for joint video-audio generation, inpainting, and editing. The model adopts a dual-stream Multimodal Diffusion Transformer (MMDiT) archit...

Guibin Chen, Dixuan Lin, Jiangping Yang, Youqiang Zhang, Zhengcong Fei, Debang Li, Sheng Chen, Ch...

2602.21818 2026-02-25
AI LLM

Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem

Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per conditi...

Heejin Jo

2602.21814 2026-02-25
AI LLM

An Empirical Study of Bugs in Modern LLM Agent Frameworks

LLM agents have been widely adopted in real-world applications, relying on agent frameworks for workflow execution and multi-agent coordination. As these systems scale, understanding bugs in the un...

Xinxue Zhu, Jiacong Wu, Xiaoyu Zhang, Tianlin Li, Yanzhou Mu, Juan Zhai, Chao Shen, Yang Liu

2602.21806 2026-02-25
AI LLM

An Evaluation of Context Length Extrapolation in Long Code via Positional Embeddings and Efficient Attention

The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools in software engineering, capable of performing various code-related tasks such as code...

Madhusudan Ghosh, Rishabh Gupta

2602.21800 2026-02-25
TESTING

MulCovFuzz: A Multi-Component Coverage-Guided Greybox Fuzzer for 5G Protocol Testing

As mobile networks transition to 5G infrastructure, ensuring robust security becomes more important due to the complex architecture and expanded attack surface. Traditional security testing approac...

Yu Wang, Yang Xiang, Chandra Thapa, Hajime Suzuki

2602.21794 2026-02-25
TESTING

p-Hacking Inflates Type I Error Rates in the Error Statistical Approach but not in the Formal Inference Approach

p-hacking occurs when researchers conduct multiple significance tests (e.g., $p_1; H_{0,1}$ and $p_2; H_{0,2}$) and then selectively report tests that yield desirable (usually significant) results (e.g., $p_2$ < 0....

Mark Rubin

2602.21792 2026-02-25
TESTING

CALIMA: On-the-fly dust and PAH evolution for radiation-hydrodynamics galaxy formation simulations

Dust grains and polycyclic aromatic hydrocarbons (PAHs) actively contribute to the thermodynamics, chemistry, and radiative state of the interstellar medium (ISM), yet most ISM models and galaxy si...

Francisco Rodríguez Montero, Yohan Dubois, Harley Katz, Adrianne Slyz, Julien Devriendt

2602.21790 2026-02-25
AI LLM

D-COT: Disciplined Chain-of-Thought Learning for Efficient Reasoning in Small Language Models

Chain-of-Thought (CoT) distillation from Large Language Models (LLMs) often induces "overthinking" in Small Language Models (SLMs), leading to performance degradation and excessive token consumptio...

Shunsuke Ubukata

2602.21786 2026-02-25
TESTING

Therapist-Robot-Patient Physical Interaction is Worth a Thousand Words: Enabling Intuitive Therapist Guidance via Remote Haptic Control

Robotic systems can enhance the amount and repeatability of physically guided motor training. Yet their real-world adoption is limited, partly due to non-intuitive trainer/therapist-trainee/patient...

Beatrice Luciani, Alex van den Berg, Matti Lang, Alexandre L. Ratschat, Laura Marchal-Crespo

2602.21783 2026-02-25
TESTING

Beyond Static Artifacts: A Forensic Benchmark for Video Deepfake Reasoning in Vision Language Models

Current Vision-Language Models (VLMs) for deepfake detection excel at identifying spatial artifacts but overlook a critical dimension: temporal inconsistencies in video forgeries. Adapting VLMs to ...

Zheyuan Gu, Qingsong Zhao, Yusong Wang, Zhaohong Huang, Xinqi Li, Cheng Yuan, Jiaowei Shao, Chi Z...

2602.21779 2026-02-25
TESTING

Linear Perturbations and Multi-Probe Diagnostics in Dark-Sector Selective $f(R,T_χ)$ Gravity

We develop a dark-sector selective trace-coupled extension of gravity in which the matter-curvature coupling depends exclusively on the trace of the dark-matter energy-momentum tensor, $T_χ$, def...

L. Yildiz, D. Kayki, E. Gudekli

2602.21774 2026-02-25
TESTING

RAMSeS: Robust and Adaptive Model Selection for Time-Series Anomaly Detection Algorithms

Time-series data vary widely across domains, making a universal anomaly detector impractical. Methods that perform well on one dataset often fail to transfer because what counts as an anomaly is co...

Mohamed Abdelmaksoud, Sheng Ding, Andrey Morozov, Ziawasch Abedjan

2602.21766 2026-02-25
AI LLM

Generalisation of RLHF under Reward Shift and Clipped KL Regularisation

Alignment and adaptation in large language models heavily rely on reinforcement learning from human feedback (RLHF); yet, theoretical understanding of its generalisability remains premature, especi...

Kenton Tang, Yuzhu Chen, Fengxiang He

2602.21765 2026-02-25