Personal Assistant Web

AI LLM

The economic alignment problem of artificial intelligence

Artificial intelligence (AI) is advancing exponentially and is likely to have profound impacts on human wellbeing, social equity, and environmental sustainability. Here we argue that the "alignment...

Daniel W. O'Neill, Stefano Vrizzi, Noemi Luna Carmeno, Felix Creutzig, Jefim Vogel

2602.21843 • 2026-02-25

AI LLM

Resilient Federated Chain: Transforming Blockchain Consensus into an Active Defense Layer for Federated Learning

Federated Learning (FL) has emerged as a key paradigm for building Trustworthy AI systems by enabling privacy-preserving, decentralized model training. However, FL is highly susceptible to adversar...

Mario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera

2602.21841 • 2026-02-25

TESTING

Ours go to 211: Euler pseudoprimes to 47 prime bases (from Carmichael numbers)

In this paper we show that a certain subset of the Carmichael numbers contains good Euler pseudoprimes, composite numbers that for many bases survive the Solovay-Strassen primality test. We present...

Alejandra Alcantarilla Sánchez, Jolijn Cottaar, Tanja Lange, Benne de Weger

2602.21840 • 2026-02-25

AI LLM

UniVBench: Towards Unified Evaluation for Video Foundation Models

Video foundation models aim to integrate video understanding, generation, editing, and instruction following within a single framework, making them a central direction for next-generation multimoda...

Jianhui Wei, Xiaotian Zhang, Yichen Li, Yuan Wang, Yan Zhang, Ziyi Chen, Zhihang Tang, Wei Xu, Zu...

2602.21835 • 2026-02-25

AI LLM

From Restructuring to Stabilization: A Large-Scale Experiment on Iterative Code Readability Refactoring with Large Language Models

Large language models (LLMs) are increasingly used for automated code refactoring tasks. Although these models can quickly refactor code, the quality may exhibit inconsistencies and unpredictable b...

Norman Peitek, Julia Hess, Sven Apel

2602.21833 • 2026-02-25

AI LLM

A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios

AI is increasingly being used to assist fraud and cybercrime. However, it is unclear whether current large language models can assist complex criminal activity. Working with law enforcement and pol...

Kimberly T. Mai, Anna Gausen, Magda Dubois, Mona Murad, Bessie O'Dell, Nadine Staes-Polet, Christ...

2602.21831 • 2026-02-25

TESTING

The supersonic nature of jellyfish galaxies

All gas-rich galaxies in cluster environments are expected to experience ram-pressure stripping from the intra-cluster medium. However, only a fraction of these develop ongoing star-formation in th...

Alessandro Ignesti, Francesca Loi, Antonino Marasco, Benedetta Vulcani, Bianca M. Poggianti, Chri...

2602.21821 • 2026-02-25

AI LLM

SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model

SkyReels-V4 is a unified multi-modal video foundation model for joint video-audio generation, inpainting, and editing. The model adopts a dual-stream Multimodal Diffusion Transformer (MMDiT) archit...

Guibin Chen, Dixuan Lin, Jiangping Yang, Youqiang Zhang, Zhengcong Fei, Debang Li, Sheng Chen, Ch...

2602.21818 • 2026-02-25

AI LLM

Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem

Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per conditi...

Heejin Jo

2602.21814 • 2026-02-25

AI LLM

An Empirical Study of Bugs in Modern LLM Agent Frameworks

LLM agents have been widely adopted in real-world applications, relying on agent frameworks for workflow execution and multi-agent coordination. As these systems scale, understanding bugs in the un...

Xinxue Zhu, Jiacong Wu, Xiaoyu Zhang, Tianlin Li, Yanzhou Mu, Juan Zhai, Chao Shen, Yang Liu

2602.21806 • 2026-02-25

AI LLM

An Evaluation of Context Length Extrapolation in Long Code via Positional Embeddings and Efficient Attention

The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools in software engineering, capable of performing various code-related tasks such as code...

Madhusudan Ghosh, Rishabh Gupta

2602.21800 • 2026-02-25

TESTING

MulCovFuzz: A Multi-Component Coverage-Guided Greybox Fuzzer for 5G Protocol Testing

As mobile networks transition to 5G infrastructure, ensuring robust security becomes more important due to the complex architecture and expanded attack surface. Traditional security testing approac...

Yu Wang, Yang Xiang, Chandra Thapa, Hajime Suzuki

2602.21794 • 2026-02-25

TESTING

p-Hacking Inflates Type I Error Rates in the Error Statistical Approach but not in the Formal Inference Approach

p-hacking occurs when researchers conduct multiple significance tests (e.g., $p_1; H_{0,1}$ and $p_2; H_{0,2}$) and then selectively report tests that yield desirable (usually significant) results (e.g., $p_2$ < 0....

Mark Rubin

2602.21792 • 2026-02-25

TESTING

CALIMA: On-the-fly dust and PAH evolution for radiation-hydrodynamics galaxy formation simulations

Dust grains and polycyclic aromatic hydrocarbons (PAHs) actively contribute to the thermodynamics, chemistry, and radiative state of the interstellar medium (ISM), yet most ISM models and galaxy si...

Francisco Rodríguez Montero, Yohan Dubois, Harley Katz, Adrianne Slyz, Julien Devriendt

2602.21790 • 2026-02-25

AI LLM

D-COT: Disciplined Chain-of-Thought Learning for Efficient Reasoning in Small Language Models

Chain-of-Thought (CoT) distillation from Large Language Models (LLMs) often induces "overthinking" in Small Language Models (SLMs), leading to performance degradation and excessive token consumptio...

Shunsuke Ubukata

2602.21786 • 2026-02-25

TESTING

Therapist-Robot-Patient Physical Interaction is Worth a Thousand Words: Enabling Intuitive Therapist Guidance via Remote Haptic Control

Robotic systems can enhance the amount and repeatability of physically guided motor training. Yet their real-world adoption is limited, partly due to non-intuitive trainer/therapist-trainee/patient...

Beatrice Luciani, Alex van den Berg, Matti Lang, Alexandre L. Ratschat, Laura Marchal-Crespo

2602.21783 • 2026-02-25

TESTING

Beyond Static Artifacts: A Forensic Benchmark for Video Deepfake Reasoning in Vision Language Models

Current Vision-Language Models (VLMs) for deepfake detection excel at identifying spatial artifacts but overlook a critical dimension: temporal inconsistencies in video forgeries. Adapting VLMs to ...

Zheyuan Gu, Qingsong Zhao, Yusong Wang, Zhaohong Huang, Xinqi Li, Cheng Yuan, Jiaowei Shao, Chi Z...

2602.21779 • 2026-02-25

TESTING

Linear Perturbations and Multi-Probe Diagnostics in Dark-Sector Selective $f(R,T_χ)$ Gravity

We develop a dark-sector selective trace-coupled extension of gravity in which the matter--curvature coupling depends exclusively on the trace of the dark-matter energy--momentum tensor, $T_χ$, def...

L. Yildiz, D. Kayki, E. Gudekli

2602.21774 • 2026-02-25

TESTING

RAMSeS: Robust and Adaptive Model Selection for Time-Series Anomaly Detection Algorithms

Time-series data vary widely across domains, making a universal anomaly detector impractical. Methods that perform well on one dataset often fail to transfer because what counts as an anomaly is co...

Mohamed Abdelmaksoud, Sheng Ding, Andrey Morozov, Ziawasch Abedjan

2602.21766 • 2026-02-25

AI LLM

Generalisation of RLHF under Reward Shift and Clipped KL Regularisation

Alignment and adaptation in large language models heavily rely on reinforcement learning from human feedback (RLHF); yet, theoretical understanding of its generalisability remains premature, especi...

Kenton Tang, Yuzhu Chen, Fengxiang He

2602.21765 • 2026-02-25
