Papers
Research papers from arXiv and related sources
Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition
Speech Large Language Models (Speech-LLMs) have emerged as a powerful approach for automatic speech recognition (ASR) by aligning speech encoders with large language models. However, adapting these...
Yuxiang Mei, Delai Qiu, Shengping Liu, Jiaen Liang, Yanhua Long
Prompt-Free Universal Region Proposal Network
Identifying potential objects is critical for object recognition and analysis across various computer vision applications. Existing methods typically localize potential objects by relying on exempl...
Qihong Tang, Changhan Liu, Shaofeng Zhang, Wenbin Li, Qi Fan, Yang Gao
Deep Learning-Based Airway Segmentation in Systemic Lupus Erythematosus Patients with Interstitial Lung Disease (SLE-ILD): A Comparative High-Resolution CT Analysis
To characterize lobar and segmental airway volume differences between systemic lupus erythematosus (SLE) patients with interstitial lung disease (ILD) and those without ILD (non-ILD) using a deep l...
Sirong Piao, Ying Ming, Ruijie Zhao, Jiaru Wang, Ran Xiao, Rui Zhao, Zicheng Liao, Qiqi Xu, Shaoz...
Deploying Semantic ID-based Generative Retrieval for Large-Scale Podcast Discovery at Spotify
Podcast listening is often grounded in a set of favorite shows, while listener intent can evolve over time. This combination of stable preferences and changing intent motivates recommendation appro...
Edoardo D'Amico, Marco De Nadai, Praveen Chandar, Divita Vohra, Shawn Lin, Max Lefarov, Paul Gigi...
Informative Semi-Factuals for XAI: The Elaborated Explanations that People Prefer
Recently, in eXplainable AI (XAI), "even if" explanations -- so-called semi-factuals -- have emerged as a popular strategy that explains how a predicted outcome can remain the sam...
Saugat Aryal, Mark T. Keane
A Unified Language Model for Large Scale Search, Recommendation, and Reasoning
LLMs are increasingly applied to recommendation, retrieval, and reasoning, yet deploying a single end-to-end model that can jointly support these behaviors over large, heterogeneous catalogs remain...
Marco De Nadai, Edoardo D'Amico, Max Lefarov, Alexandre Tamborrino, Divita Vohra, Mark VanMiddles...
Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing
Recent advancements in diffusion-based image editing pose a significant threat to the authenticity of digital visual content. Traditional embedding-based watermarking methods often introduce percep...
Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Xiaojun Chen, Wu Liu, Weiping Wang
Detecting the Machine: A Comprehensive Benchmark of AI-Generated Text Detectors Across Architectures, Domains, and Adversarial Conditions
The rapid proliferation of large language models (LLMs) has created an urgent need for robust and generalizable detectors of machine-generated text. Existing benchmarks typically evaluate a single ...
Madhav S. Baidya, S. S. Baidya, Chirag Chawla
Proof-of-Authorship for Diffusion-based AI Generated Content
Recent advancements in AI-generated content (AIGC) have introduced new challenges in intellectual property protection and the authentication of generated objects. We focus on scenarios in which an ...
De Zhang Lee, Han Fang, Ee-Chien Chang
Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality
Large language models (LLMs) exhibit strong general intelligence, yet their multilingual performance remains highly imbalanced. Although LLMs encode substantial cross-lingual knowledge in a unified...
Mengyu Bu, Yang Feng
Interpreting Context-Aware Human Preferences for Multi-Objective Robot Navigation
Robots operating in human-shared environments must not only achieve task-level navigation objectives such as safety and efficiency, but also adapt their behavior to human preferences. However, as h...
Tharun Sethuraman, Subham Agrawal, Nils Dengler, Jorge de Heuvel, Teena Hassan, Maren Bennewitz
Inducing Epistemological Humility in Large Language Models: A Targeted SFT Approach to Reducing Hallucination
Large language models (LLMs) often hallucinate, producing fluent but false information, partly because supervised fine-tuning (SFT) implicitly rewards always responding. We introduce HypoT...
Cem Uluoglakci, Tugba Taskaya Temizel
From Optimizable to Interactable: Mixed Digital Twin-Empowered Testing of Vehicle-Infrastructure Cooperation Systems
Sufficient testing under corner cases is critical for the long-term operation of vehicle-infrastructure cooperation systems (VICS). However, existing corner-case generation methods are primarily AI...
Jianghong Dong, Chunying Yang, Mengchi Cai, Chaoyi Chen, Qing Xu, Jianqiang Wang, Keqiang Li
DustNET: enabling machine learning and AI models of dusty plasmas
Dusty plasmas are ubiquitous throughout the universe, spanning laboratory and industrial plasmas, fusion devices, planetary environments, cometary comae, and interstellar media. Despite decades of ...
Zhehui Wang, Justin C. Burton, Niklas Dormagen, Cheng-Ran Du, Yan Feng, John E. Foster, Max Klein...
Learning When to Attend: Conditional Memory Access for Long-Context LLMs
Language models struggle to generalize beyond pretraining context lengths, limiting long-horizon reasoning and retrieval. Continued pretraining on long-context data can help but is expensive due to...
Sakshi Choudhary, Aditya Chattopadhyay, Luca Zancato, Elvis Nunez, Matthew Trager, Wei Xia, Stefa...
VirPro: Visual-referred Probabilistic Prompt Learning for Weakly-Supervised Monocular 3D Detection
Monocular 3D object detection typically relies on pseudo-labeling techniques to reduce dependency on real-world annotations. Recent advances demonstrate that deterministic linguistic cues can serve...
Chupeng Liu, Jiyong Rao, Shangquan Sun, Runkai Zhao, Weidong Cai
Efficient Soft Actor-Critic with LLM-Based Action-Level Guidance for Continuous Control
We present GuidedSAC, a novel reinforcement learning (RL) algorithm that facilitates efficient exploration in vast state-action spaces. GuidedSAC leverages large language models (LLMs) as intellige...
Hao Ma, Zhiqiang Pu, Xiaolin Ai, Huimu Wang
Multi-stage Flow Scheduling for LLM Serving
Meeting stringent Time-To-First-Token (TTFT) requirements is crucial for LLM applications. To improve efficiency, modern LLM serving systems adopt disaggregated architectures with diverse paralleli...
Yijun Sun, Xudong Liao, Songrun Xie, Hao Chen, Han Tian, Wenxue Li, Yiming Zhang, Kai Chen
VLM2Rec: Resolving Modality Collapse in Vision-Language Model Embedders for Multimodal Sequential Recommendation
Sequential Recommendation (SR) in multimodal settings typically relies on small frozen pretrained encoders, which limits semantic capacity and prevents Collaborative Filtering (CF) signals from bei...
Junyoung Kim, Woojoo Kim, Jaehyung Lim, Dongha Kim, Hwanjo Yu
Large Language Models as a Semantic Interface and Ethical Mediator in Neuro-Digital Ecosystems: Conceptual Foundations and a Regulatory Imperative
This article introduces and substantiates the concept of Neuro-Linguistic Integration (NLI), a novel paradigm for human-technology interaction where Large Language Models (LLMs) act as a key semant...
Alexander V. Shenderuk-Zhidkov, Alexander E. Hramov