Papers
Research papers from arXiv and related sources
DQN Based Joint UAV Trajectory and Association Planning in NTN Assisted Networks
Advanced Air Mobility (AAM) has emerged as a key pillar of next-generation transportation systems, encompassing a wide range of uncrewed aerial vehicle (UAV) applications. To enable AAM, maintainin...
Afsoon Alidadi Shamsabadi, Cosmas Mwaba, Thomas Nugent, Jie Gao, Pablo Madoery, Halim Yanikomerog...
Mamba-VMR: Multimodal Query Augmentation via Generated Videos for Precise Temporal Grounding
Text-driven video moment retrieval (VMR) remains challenging due to limited capture of hidden temporal dynamics in untrimmed videos, leading to imprecise grounding in long sequences. Traditional me...
Yunzhuo Sun, Xinyue Liu, Yanyang Li, Nanding Wu, Yifang Xu, Linlin Zong, Xianchao Zhang, Wenxin L...
Programming Manufacturing Robots with Imperfect AI: LLMs as Tuning Experts for FDM Print Configuration Selection
We use fused deposition modeling (FDM) 3D printing as a case study of how manufacturing robots can use imperfect AI to acquire process expertise. In FDM, print configuration strongly affects output...
Ekta U. Samani, Christopher G. Atkeson
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
Reinforcement learning with verifiable rewards (RLVR) has substantially improved the reasoning capabilities of large language models. While existing analyses identify that RLVR-induced changes are ...
Kexin Huang, Haoming Meng, Junkang Wu, Jinda Lu, Chiyu Ma, Ziqian Chen, Xue Wang, Bolin Ding, Jia...
Lemma Discovery in Agentic Program Verification
Deductive verification provides strong correctness guarantees for code by extracting verification conditions (VCs) and writing formal proofs for them. The expertise-intensive task of VC proving is ...
Huan Zhao, Haoxin Tu, Zhengyao Liu, Martin Rinard, Abhik Roychoudhury
From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI
Over time, the shared understanding that makes a software system safe to change quietly erodes. This gradual loss of understanding across a team increases cognitive debt, while the loss of captured...
Margaret-Anne Storey
Multiperspectivity as a Resource for Narrative Similarity Prediction
Predicting narrative similarity can be understood as an inherently interpretive task: different, equally valid readings of the same text can produce divergent interpretations and thus different sim...
Max Upravitelev, Veronika Solopova, Jing Yang, Charlott Jakob, Premtim Sahitaj, Ariana Sahitaj, V...
GSEM: Graph-based Self-Evolving Memory for Experience Augmented Clinical Reasoning
Clinical decision-making agents can benefit from reusing prior decision experience. However, many memory-augmented methods store experiences as independent records without explicit relational struc...
Xiao Han, Yuzheng Fan, Sendong Zhao, Haochun Wang, Bing Qin
P-Flow: Prompting Visual Effects Generation
Recent advancements in video generation models have significantly improved their ability to follow text prompts. However, the customization of dynamic visual effects, defined as temporally evolving...
Rui Zhao, Mike Zheng Shou
A Context Engineering Framework for Improving Enterprise AI Agents based on Digital-Twin MDP
Despite rapid progress in AI agents for enterprise automation and decision-making, their real-world deployment and further performance gains remain constrained by limited data quality and quantity,...
Xi Yang, Aurelie Lozano, Naoki Abe, Bhavya, Saurabh Jha, Noah Zheutlin, Rohan R. Arora, Yu Deng,...
Adapting Point Cloud Analysis via Multimodal Bayesian Distribution Learning
Multimodal 3D vision-language models show strong generalization across diverse 3D tasks, but their performance still degrades notably under domain shifts. This has motivated recent studies on test-...
Xingyu Zhu, Liang Yi, Shuo Wang, Wenbo Zhu, Yonglinag Wu, Beier Zhu, Hanwang Zhang
On the Failure of Topic-Matched Contrast Baselines in Multi-Directional Refusal Abliteration
Inasmuch as the removal of refusal behavior from instruction-tuned language models by directional abliteration requires the extraction of refusal-mediating directions from the residual stream activ...
Valentin Petrov
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning
Despite the remarkable success of large-scale pre-trained image representation models (i.e., vision encoders) across various vision tasks, they are predominantly trained on 2D image data and theref...
Byungwoo Jeon, Dongyoung Kim, Huiwon Jang, Insoo Kim, Jinwoo Shin
Dual-Space Knowledge Distillation with Key-Query Matching for Large Language Models with Vocabulary Mismatch
Large language models (LLMs) achieve state-of-the-art (SOTA) performance across language tasks, but are costly to deploy due to their size and resource demands. Knowledge Distillation (KD) addresse...
Stella Eva Tsiapali, Cong-Thanh Do, Kate Knill
Dynamic analysis enhances issue resolution
Translating natural language descriptions into viable code fixes remains a fundamental challenge in software engineering. While the proliferation of agentic large language models (LLMs) has vastly ...
Mingwei Liu, Zihao Wang, Zhenxi Chen, Zheng Pei, Yanlin Wang, Zibin Zheng
DTVI: Dual-Stage Textual and Visual Intervention for Safe Text-to-Image Generation
Text-to-Image (T2I) diffusion models have demonstrated strong generation ability, but their potential to generate unsafe content raises significant safety concerns. Existing inference-time defense ...
Binhong Tan, Zhaoxin Wang, Handing Wang
On the Challenges and Opportunities of Learned Sparse Retrieval for Code
Retrieval over large codebases is a key component of modern LLM-based software engineering systems. Existing approaches predominantly rely on dense embedding models, while learned sparse retrieval ...
Simon Lupart, Maxime Louis, Thibault Formal, Hervé Déjean, Stéphane Clinchant
VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
Vision-Language-Action (VLA) models typically map visual observations and linguistic instructions directly to robotic control signals. This "black-box" mapping forces a single forward pass to simul...
Zixuan Wang, Yuxin Chen, Yuqi Liu, Jinhui Ye, Pengguang Chen, Changsheng Lu, Shu Liu, Jiaya Jia
Surfacing and Applying Meaning: Supporting Hermeneutical Autonomy for LGBTQ+ People in Taiwan
After Taiwan's legalization of same-sex marriage in 2019, LGBTQ+ communities continue to face hostility on social media. Using the lens of hermeneutical injustice and autonomy, we examine how techn...
Yi-Tong Chen, En-Kai Chang, Nanyi Bi, Nitesh Goyal
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
We present daVinci-MagiHuman, an open-source audio-video generative foundation model for human-centric generation. daVinci-MagiHuman jointly generates synchronized video and audio using a single-st...
SII-GAIR, Sand.ai, Ethan Chern, Hansi Teng, Hanwen Sun, Hao Wang, Hong Pan, Hongyu Jia, Jia...