Papers
Research papers from arXiv and related sources
From Reactive to Map-Based AI: Tuned Local LLMs for Semantic Zone Inference in Object-Goal Navigation
Object-Goal Navigation (ObjectNav) requires an agent to find and navigate to a target object category in unknown environments. While recent Large Language Model (LLM)-based agents exhibit zero-shot...
Yudai Noda, Kanji Tanaka

The AI Amplifier Effect: Defining Human-AI Intimacy and Romantic Relationships with Conversational AI
What does it mean to fall in love with something we know is virtual? The proliferation of conversational AI enables users to create customizable companions, fostering new intimate relationships tha...
Ching Christie Pang, Yi Gao, Xuetong Wang, Pan Hui

High-Fidelity Pruning for Large Language Models
Large Language Models (LLMs) have demonstrated exceptional performance across a wide range of tasks, yet their significant computational and memory requirements present major challenges for deploym...
Yijun Zhu, Jianxin Wang, Chengchao Shen

Why Large Language Models can Secretly Outperform Embedding Similarity in Information Retrieval
With the emergence of Large Language Models (LLMs), new methods in Information Retrieval are available in which relevance is estimated directly through language understanding and reasoning, instead...
Matei Benescu, Ivo Pascal de Jong

Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models
Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and insp...
Xuesong Wang, Caisheng Wang

In-Context Reinforcement Learning for Tool Use in Large Language Models
While large language models (LLMs) exhibit strong reasoning abilities, their performance on complex tasks is often constrained by the limitations of their internal knowledge. A compelling approach ...
Yaoqi Ye, Yiran Zhao, Keyu Duan, Zeyu Zheng, Kenji Kawaguchi, Cihang Xie, Michael Qizhe Shieh

Deterministic Differentiable Structured Pruning for Large Language Models
Structured pruning reduces LLM inference cost by removing low-importance architectural components. This can be viewed as learning a multiplicative gate for each component under an l0 sparsity const...
Weiyu Huang, Pengle Zhang, Xiaolu Zhang, Jun Zhou, Jun Zhu, Jianfei Chen

CinemaWorld: Generative Augmented Reality with LLMs and 3D Scene Generation for Movie Augmentation
We introduce CinemaWorld, a generative augmented reality system that augments the viewer's physical surroundings with automatically generated mixed reality 3D content extracted from and synchronize...
Keiichi Ihara, DaeHo Lee, Manato Abe, Hye-Young Jo, Ryo Suzuki

Stabilized Fine-Tuning with LoRA in Federated Learning: Mitigating the Side Effect of Client Size and Rank via the Scaling Factor
Large Language Models (LLMs) are pivotal in natural language processing. The impracticality of full fine-tuning has prompted Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation ...
Jiayu Huang, Xiaohu Wu, Tiantian He, Qicheng Lao

BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations
The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handl...
Thomas Monninger, Shaoyuan Xie, Qi Alfred Chen, Sihao Ding

SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning
Surgeons don't just see -- they interpret. When an expert observes a surgical scene, they understand not only what instrument is being used, but why it was chosen, what risk it poses, and what come...
Alejandra Perez, Anita Rau, Lee White, Busisiwe Mlambo, Chinedu Nwoye, Muhammad Abdullah Jamal, O...

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
Vision Language Model (VLM) development has largely relied on scaling model size, which hinders deployment on compute-constrained mobile and edge devices such as smartphones and robots. In this wor...
Boqiang Zhang, Lei Ke, Ruihan Yang, Qi Gao, Tianyuan Qu, Rossell Chen, Dong Yu, Leoweiliang

The Pen: Episodic Cognitive Assistance via an Ear-Worn Interface
Wearable AI is often designed as always-available, yet continuous availability can conflict with how people work and socialize, creating discomfort around privacy, disruption, and unclear system bo...
Yonatan Tussa, Andy Heredia

RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering
Conversational generative AI is rapidly entering healthcare, where general-purpose models must integrate heterogeneous patient signals and support diverse interaction styles while producing clinica...
Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo

Evaluating the Predictability of Selected Weather Extremes with Aurora, an AI Weather Forecast Model
AI weather foundation models now achieve forecast skill comparable to numerical weather prediction at far lower computational cost, yet their predictability for high-impact extremes across dynamica...
Qin Huang, Moyan Liu, Yeongbin Kwon, Upmanu Lall

When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion Models
While diffusion models have revolutionized visual content generation, their rapid adoption has underscored the critical need to investigate vulnerabilities, e.g., to backdoor attacks. In multimodal...
Qitong Wang, Haoran Dai, Haotian Zhang, Christopher Rasmussen, Binghui Wang

Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing
Recent advances in multimodal Retrieval-Augmented Generation (RAG) enable Large Language Models (LLMs) to analyze enterprise spreadsheet workbooks containing millions of cells, cross-sheet dependen...
Anmol Gulati, Sahil Sen, Waqar Sarguroh, Kevin Paul

COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a fundamental trade-off: sample-efficient methods su...
Kartik Sharma, Rakshit S. Trivedi

NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches
We introduce NOBLE (Nonlinear lOw-rank Branch for Linear Enhancement), an architectural augmentation that adds nonlinear low-rank branches to transformer linear layers. Unlike LoRA and other parame...
Ethan Smith

Do Foundation Models Know Geometry? Probing Frozen Features for Continuous Physical Measurement
Vision-language models encode continuous geometry that their text pathway fails to express: a 6,000-parameter linear probe extracts hand joint angles at 6.1 degrees MAE from frozen features, while ...
Yakov Pyotr Shkolnikov