Papers
Research papers from arXiv and related sources
Distill and Align Decomposition for Enhanced Claim Verification
Complex claim verification requires decomposing sentences into verifiable subclaims, yet existing methods struggle to align decomposition quality with verification performance. We propose a reinfor...
Jabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Fernando Acero, Arturo Oncevay, Cha...
Understanding Annotation Error Propagation and Learning an Adaptive Policy for Expert Intervention in Barrett's Video Segmentation
Accurate annotation of endoscopic videos is essential yet time-consuming, particularly for challenging datasets such as dysplasia in Barrett's esophagus, where the affected regions are irregular an...
Lokesha Rasanjalee, Jin Lin Tan, Dileepa Pitawela, Rajvinder Singh, Hsiang-Ting Chen
FewMMBench: A Benchmark for Multimodal Few-Shot Learning
As multimodal large language models (MLLMs) advance in handling interleaved image-text data, assessing their few-shot learning capabilities remains an open challenge. In this paper, we introduce Fe...
Mustafa Dogan, Ilker Kesen, Iacer Calixto, Aykut Erdem, Erkut Erdem
The economic alignment problem of artificial intelligence
Artificial intelligence (AI) is advancing exponentially and is likely to have profound impacts on human wellbeing, social equity, and environmental sustainability. Here we argue that the "alignment...
Daniel W. O'Neill, Stefano Vrizzi, Noemi Luna Carmeno, Felix Creutzig, Jefim Vogel
Resilient Federated Chain: Transforming Blockchain Consensus into an Active Defense Layer for Federated Learning
Federated Learning (FL) has emerged as a key paradigm for building Trustworthy AI systems by enabling privacy-preserving, decentralized model training. However, FL is highly susceptible to adversar...
Mario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera
UniVBench: Towards Unified Evaluation for Video Foundation Models
Video foundation models aim to integrate video understanding, generation, editing, and instruction following within a single framework, making them a central direction for next-generation multimoda...
Jianhui Wei, Xiaotian Zhang, Yichen Li, Yuan Wang, Yan Zhang, Ziyi Chen, Zhihang Tang, Wei Xu, Zu...
From Restructuring to Stabilization: A Large-Scale Experiment on Iterative Code Readability Refactoring with Large Language Models
Large language models (LLMs) are increasingly used for automated code refactoring tasks. Although these models can quickly refactor code, the quality may exhibit inconsistencies and unpredictable b...
Norman Peitek, Julia Hess, Sven Apel
A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios
AI is increasingly being used to assist in fraud and cybercrime. However, it is unclear whether current large language models can assist with complex criminal activity. Working with law enforcement and pol...
Kimberly T. Mai, Anna Gausen, Magda Dubois, Mona Murad, Bessie O'Dell, Nadine Staes-Polet, Christ...
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
SkyReels V4 is a unified multimodal video foundation model for joint video-audio generation, inpainting, and editing. The model adopts a dual-stream Multimodal Diffusion Transformer (MMDiT) archit...
Guibin Chen, Dixuan Lin, Jiangping Yang, Youqiang Zhang, Zhengcong Fei, Debang Li, Sheng Chen, Ch...
Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per conditi...
Heejin Jo
An Empirical Study of Bugs in Modern LLM Agent Frameworks
LLM agents have been widely adopted in real-world applications, relying on agent frameworks for workflow execution and multi-agent coordination. As these systems scale, understanding bugs in the un...
Xinxue Zhu, Jiacong Wu, Xiaoyu Zhang, Tianlin Li, Yanzhou Mu, Juan Zhai, Chao Shen, Yang Liu
An Evaluation of Context Length Extrapolation in Long Code via Positional Embeddings and Efficient Attention
The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools in software engineering, capable of performing various code-related tasks such as code...
Madhusudan Ghosh, Rishabh Gupta
D-COT: Disciplined Chain-of-Thought Learning for Efficient Reasoning in Small Language Models
Chain-of-Thought (CoT) distillation from Large Language Models (LLMs) often induces "overthinking" in Small Language Models (SLMs), leading to performance degradation and excessive token consumptio...
Shunsuke Ubukata
Generalisation of RLHF under Reward Shift and Clipped KL Regularisation
Alignment and adaptation in large language models rely heavily on reinforcement learning from human feedback (RLHF); yet, theoretical understanding of its generalisability remains immature, especi...
Kenton Tang, Yuzhu Chen, Fengxiang He
Improving Implicit Discourse Relation Recognition with Natural Language Explanations from LLMs
Implicit Discourse Relation Recognition (IDRR) remains a challenging task due to the requirement for deep semantic understanding in the absence of explicit discourse markers. A further limitation i...
Heng Wang, Changxing Wu
SAPNet++: Evolving Point-Prompted Instance Segmentation with Semantic and Spatial Awareness
Single-point annotation is increasingly prominent in visual tasks for labeling cost reduction. However, it challenges tasks requiring high precision, such as the point-prompted instance segmentatio...
Zhaoyang Wei, Xumeng Han, Xuehui Yu, Xue Yang, Guorong Li, Zhenjun Han, Jianbin Jiao
Offline Reasoning for Efficient Recommendation: LLM-Empowered Persona-Profiled Item Indexing
Recent advances in large language models (LLMs) offer new opportunities for recommender systems by capturing the nuanced semantics of user interests and item characteristics through rich semantic u...
Deogyong Kim, Junseong Lee, Jeongeun Lee, Changhoe Kim, Junguel Lee, Jungseok Lee, Dongha Lee
From Words to Amino Acids: Does the Curse of Depth Persist?
Protein language models (PLMs) have become widely adopted as general-purpose models, demonstrating strong performance in protein engineering and de novo design. Like large language models (LLMs), t...
Aleena Siji, Amir Mohammad Karimi Mamaghan, Ferdinand Kapl, Tobias Höppe, Emmanouil Angelis, Andr...
fEDM+: A Risk-Based Fuzzy Ethical Decision Making Framework with Principle-Level Explainability and Pluralistic Validation
In a previous work, we introduced the fuzzy Ethical Decision-Making framework (fEDM), a risk-based ethical reasoning architecture grounded in fuzzy logic. The original model combined a fuzzy Ethica...
Abeer Dyoub, Francesca A. Lisi
The ASIR Courage Model: A Phase-Dynamic Framework for Truth Transitions in Human and AI Systems
We introduce the ASIR (Awakened Shared Intelligence Relationship) Courage Model, a phase-dynamic framework that formalizes truth-disclosure as a state transition rather than a personality trait. Th...
Hyo Jin Kim