Papers
Research papers from arXiv and related sources
FeatDistill: A Feature Distillation Enhanced Multi-Expert Ensemble Framework for Robust AI-generated Image Detection
The rapid iteration and widespread dissemination of deepfake technology have posed severe challenges to information security, making robust and generalizable detection of AI-generated forged images...
Zhilin Tu, Kemou Li, Fengpeng Li, Jianwei Fei, Jiamin Zhang, Haiwei Wu
MultiBind: A Benchmark for Attribute Misbinding in Multi-Subject Generation
Subject-driven image generation is increasingly expected to support fine-grained control over multiple entities within a single image. In multi-reference workflows, users may provide several subjec...
Wenqing Tian, Hanyi Mao, Zhaocheng Liu, Lihua Zhang, Qiang Liu, Jian Wu, Liang Wang
Camera-Agnostic Pruning of 3D Gaussian Splats via Descriptor-Based Beta Evidence
The pruning of 3D Gaussian splats is essential for reducing their complexity to enable efficient storage, transmission, and downstream processing. However, most of the existing pruning strategies d...
Peter Fasogbon, Ugurcan Budak, Patrice Rondao Alface, Hamed Rezazadegan Tavakoli
The Golden Subspace: Where Efficiency Meets Generalization in Continual Test-Time Adaptation
Continual Test-Time Adaptation (CTTA) aims to enable models to adapt online to unlabeled data streams under distribution shift without accessing source data. Existing CTTA methods face an efficienc...
Guannan Lai, Da-Wei Zhou, Zhenguo Li, Han-Jia Ye
Disengagement Analysis and Field Tests of a Prototypical Open-Source Level 4 Autonomous Driving System
Proprietary Autonomous Driving Systems are typically evaluated through disengagements, unplanned manual interventions to alter vehicle behavior, as annually reported by the California Department of...
Marvin Seegert, Christian Oefinger, Korbinian Moller, Christoph Bank, Johannes Betz
Guideline-grounded retrieval-augmented generation for ophthalmic clinical decision support
In this work, we propose Oph-Guid-RAG, a multimodal visual RAG system for ophthalmology clinical question answering and decision support. We treat each guideline page as an independent evidence uni...
Shuying Chen, Sen Cui, Zhong Cao
APEG: Adaptive Physical Layer Authentication with Channel Extrapolation and Generative AI
With the rapid advancement of 6G, identity authentication has become increasingly critical for ensuring wireless security. The lightweight and keyless Physical Layer Authentication (PLA) is regarde...
Xiqi Cheng, Rui Meng, Xiaodong Xu, Haixiao Gao, Ping Zhang, Dusit Niyato
AnkleType: A Hands- and Eyes-free Foot-based Text Entry Technique in Virtual Reality
Virtual Reality (VR) emphasizes immersive experiences, while text entry often requires hands or visual attention, which may disrupt the interaction flows in VR. We present AnkleType, a hand- and ey...
Xiyun Luo, Weirong Luo, Kening Zhu, Taizhou Chen
Collision-Free Velocity Scheduling for Multi-Agent Systems on Predefined Routes via Inexact-Projection ADMM
In structured multi-agent transportation systems, agents often must follow predefined routes, making spatial rerouting undesirable or impossible. This paper addresses route-constrained multi-agent ...
Seungyeop Lee, Jong-Han Kim
From Scores to Strategies: Towards Gaze-Informed Diagnostic Assessment for Visualization Literacy
Visualization literacy assessments typically rely on correctness to classify performance, providing little evidence about how readers arrive at their answers. We argue that gaze can address this ga...
Kathrin Schnizer
HMS-VesselNet: Hierarchical Multi-Scale Attention Network with Topology-Preserving Loss for Retinal Vessel Segmentation
Retinal vessel segmentation methods based on standard overlap losses tend to miss thin peripheral vessels because these structures occupy very few pixels and have low contrast against the backgroun...
Amarnath R
On the Constraints and Observational Manifestations of Failed Solar Eruptions in Toroidal Magnetic Cage
Observations show that many solar eruptions remain confined within strong overlying magnetic fields, forming a so-called magnetic cage. While confinement by poloidal overlying fields has been widel...
Jinhan Guo, Y. Guo, H. Wu, B. Schmieder, P. Démoulin, Y. W. Ni, C. Wang, S. Poedts, T. Li, Wensi ...
P^2O: Joint Policy and Prompt Optimization
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). However, vanilla RLVR suffers from...
Xinyu Lu, Kaiqi Zhang, Jinglin Yang, Boxi Cao, Yaojie Lu, Hongyu Lin, Min He, Xianpei Han, Le Sun
Disentangling Speaker Traits for Deepfake Source Verification via Chebyshev Polynomial and Riemannian Metric Learning
Speech deepfake source verification systems aim to determine whether two synthetic speech utterances originate from the same source generator, often assuming that the resulting source embeddings a...
Xi Xuan, Wenxin Zhang, Zhiyu Li, Jennifer Williams, Ville Hautamäki, Tomi H. Kinnunen
Adversarial Camouflage
While the rapid development of facial recognition algorithms has enabled numerous beneficial applications, their widespread deployment has raised significant concerns about the risks of mass survei...
Paweł Borsukiewicz, Daniele Lunghi, Melissa Tessa, Jacques Klein, Tegawendé F. Bissyandé
Tacit Knowledge Management with Generative AI: Proposal of the GenAI SECI Model
The emergence of generative AI is bringing about a significant transformation in knowledge management. Generative AI has the potential to address the limitations of conventional knowledge managemen...
Naoshi Uchihira
Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation
Video generation has recently emerged as a central task in the field of generative AI. However, the substantial computational cost inherent in video synthesis makes model distillation a critical te...
Yuyang You, Yongzhi Li, Jiahui Li, Yadong Mu, Quan Chen, Peng Jiang
Climate Prompting: Generating the Madden-Julian Oscillation using Video Diffusion and Low-Dimensional Conditioning
Generative deep learning is a powerful tool for modeling the Madden-Julian oscillation (MJO) in the tropics, yet its relationship to traditional theoretical frameworks remains poorly understood....
Sulian Thual, Feiyang Cai, Jingjing Wang, Feng Luo
Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
Do large language models reason morally, or do they merely sound like they do? We investigate whether LLM responses to moral dilemmas exhibit genuine developmental progression through Kohlberg's st...
Aryan Kasat, Smriti Singh, Aman Chadha, Vinija Jain
Verify Implementation Equivalence of Large Models
Verifying whether two implementations of the same large model are equivalent across frameworks is difficult in practice. Even when they realize the same computation, their graphs may differ subst...
Qi Zhan, Xing Hu, Xin Xia, Shanping Li