Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

Distill and Align Decomposition for Enhanced Claim Verification

Complex claim verification requires decomposing sentences into verifiable subclaims, yet existing methods struggle to align decomposition quality with verification performance. We propose a reinfor...

Jabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Fernando Acero, Arturo Oncevay, Cha...

2602.21857 2026-02-25
AI LLM

Understanding Annotation Error Propagation and Learning an Adaptive Policy for Expert Intervention in Barrett's Video Segmentation

Accurate annotation of endoscopic videos is essential yet time-consuming, particularly for challenging datasets such as dysplasia in Barrett's esophagus, where the affected regions are irregular an...

Lokesha Rasanjalee, Jin Lin Tan, Dileepa Pitawela, Rajvinder Singh, Hsiang-Ting Chen

2602.21855 2026-02-25
AI LLM

FewMMBench: A Benchmark for Multimodal Few-Shot Learning

As multimodal large language models (MLLMs) advance in handling interleaved image-text data, assessing their few-shot learning capabilities remains an open challenge. In this paper, we introduce Fe...

Mustafa Dogan, Ilker Kesen, Iacer Calixto, Aykut Erdem, Erkut Erdem

2602.21854 2026-02-25
AI LLM

The economic alignment problem of artificial intelligence

Artificial intelligence (AI) is advancing exponentially and is likely to have profound impacts on human wellbeing, social equity, and environmental sustainability. Here we argue that the "alignment...

Daniel W. O'Neill, Stefano Vrizzi, Noemi Luna Carmeno, Felix Creutzig, Jefim Vogel

2602.21843 2026-02-25
AI LLM

Resilient Federated Chain: Transforming Blockchain Consensus into an Active Defense Layer for Federated Learning

Federated Learning (FL) has emerged as a key paradigm for building Trustworthy AI systems by enabling privacy-preserving, decentralized model training. However, FL is highly susceptible to adversar...

Mario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera

2602.21841 2026-02-25
AI LLM

UniVBench: Towards Unified Evaluation for Video Foundation Models

Video foundation models aim to integrate video understanding, generation, editing, and instruction following within a single framework, making them a central direction for next-generation multimoda...

Jianhui Wei, Xiaotian Zhang, Yichen Li, Yuan Wang, Yan Zhang, Ziyi Chen, Zhihang Tang, Wei Xu, Zu...

2602.21835 2026-02-25
AI LLM

From Restructuring to Stabilization: A Large-Scale Experiment on Iterative Code Readability Refactoring with Large Language Models

Large language models (LLMs) are increasingly used for automated code refactoring tasks. Although these models can quickly refactor code, the quality may exhibit inconsistencies and unpredictable b...

Norman Peitek, Julia Hess, Sven Apel

2602.21833 2026-02-25
AI LLM

A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios

AI is increasingly being used to assist fraud and cybercrime. However, it is unclear whether current large language models can assist complex criminal activity. Working with law enforcement and pol...

Kimberly T. Mai, Anna Gausen, Magda Dubois, Mona Murad, Bessie O'Dell, Nadine Staes-Polet, Christ...

2602.21831 2026-02-25
AI LLM

SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model

SkyReels-V4 is a unified multimodal video foundation model for joint video-audio generation, inpainting, and editing. The model adopts a dual-stream Multimodal Diffusion Transformer (MMDiT) archit...

Guibin Chen, Dixuan Lin, Jiangping Yang, Youqiang Zhang, Zhengcong Fei, Debang Li, Sheng Chen, Ch...

2602.21818 2026-02-25
AI LLM

Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem

Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per conditi...

Heejin Jo

2602.21814 2026-02-25
AI LLM

An Empirical Study of Bugs in Modern LLM Agent Frameworks

LLM agents have been widely adopted in real-world applications, relying on agent frameworks for workflow execution and multi-agent coordination. As these systems scale, understanding bugs in the un...

Xinxue Zhu, Jiacong Wu, Xiaoyu Zhang, Tianlin Li, Yanzhou Mu, Juan Zhai, Chao Shen, Yang Liu

2602.21806 2026-02-25
AI LLM

An Evaluation of Context Length Extrapolation in Long Code via Positional Embeddings and Efficient Attention

The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools in software engineering, capable of performing various code-related tasks such as code...

Madhusudan Ghosh, Rishabh Gupta

2602.21800 2026-02-25
AI LLM

D-COT: Disciplined Chain-of-Thought Learning for Efficient Reasoning in Small Language Models

Chain-of-Thought (CoT) distillation from Large Language Models (LLMs) often induces "overthinking" in Small Language Models (SLMs), leading to performance degradation and excessive token consumptio...

Shunsuke Ubukata

2602.21786 2026-02-25
AI LLM

Generalisation of RLHF under Reward Shift and Clipped KL Regularisation

Alignment and adaptation in large language models heavily rely on reinforcement learning from human feedback (RLHF); yet, theoretical understanding of its generalisability remains premature, especi...

Kenton Tang, Yuzhu Chen, Fengxiang He

2602.21765 2026-02-25
AI LLM

Improving Implicit Discourse Relation Recognition with Natural Language Explanations from LLMs

Implicit Discourse Relation Recognition (IDRR) remains a challenging task due to the requirement for deep semantic understanding in the absence of explicit discourse markers. A further limitation i...

Heng Wang, Changxing Wu

2602.21763 2026-02-25
AI LLM

SAPNet++: Evolving Point-Prompted Instance Segmentation with Semantic and Spatial Awareness

Single-point annotation is increasingly prominent in visual tasks as a way to reduce labeling cost. However, it challenges tasks requiring high precision, such as the point-prompted instance segmentatio...

Zhaoyang Wei, Xumeng Han, Xuehui Yu, Xue Yang, Guorong Li, Zhenjun Han, Jianbin Jiao

2602.21762 2026-02-25
AI LLM

Offline Reasoning for Efficient Recommendation: LLM-Empowered Persona-Profiled Item Indexing

Recent advances in large language models (LLMs) offer new opportunities for recommender systems by capturing the nuanced semantics of user interests and item characteristics through rich semantic u...

Deogyong Kim, Junseong Lee, Jeongeun Lee, Changhoe Kim, Junguel Lee, Jungseok Lee, Dongha Lee

2602.21756 2026-02-25
AI LLM

From Words to Amino Acids: Does the Curse of Depth Persist?

Protein language models (PLMs) have become widely adopted as general-purpose models, demonstrating strong performance in protein engineering and de novo design. Like large language models (LLMs), t...

Aleena Siji, Amir Mohammad Karimi Mamaghan, Ferdinand Kapl, Tobias Höppe, Emmanouil Angelis, Andr...

2602.21750 2026-02-25
AI LLM

fEDM+: A Risk-Based Fuzzy Ethical Decision Making Framework with Principle-Level Explainability and Pluralistic Validation

In previous work, we introduced the fuzzy Ethical Decision-Making framework (fEDM), a risk-based ethical reasoning architecture grounded in fuzzy logic. The original model combined a fuzzy Ethica...

Abeer Dyoub, Francesca A. Lisi

2602.21746 2026-02-25
AI LLM

The ASIR Courage Model: A Phase-Dynamic Framework for Truth Transitions in Human and AI Systems

We introduce the ASIR (Awakened Shared Intelligence Relationship) Courage Model, a phase-dynamic framework that formalizes truth-disclosure as a state transition rather than a personality trait. Th...

Hyo Jin Kim

2602.21745 2026-02-25