Papers

Research papers from arXiv and related sources

Total: 4694 (AI/LLM: 2583, Testing: 2111)
AI LLM

From Reactive to Map-Based AI: Tuned Local LLMs for Semantic Zone Inference in Object-Goal Navigation

Object-Goal Navigation (ObjectNav) requires an agent to find and navigate to a target object category in unknown environments. While recent Large Language Model (LLM)-based agents exhibit zero-shot...

Yudai Noda, Kanji Tanaka

2603.08086 2026-03-09
AI LLM

The AI Amplifier Effect: Defining Human-AI Intimacy and Romantic Relationships with Conversational AI

What does it mean to fall in love with something we know is virtual? The proliferation of conversational AI enables users to create customizable companions, fostering new intimate relationships tha...

Ching Christie Pang, Yi Gao, Xuetong Wang, Pan Hui

2603.08084 2026-03-09
AI LLM

High-Fidelity Pruning for Large Language Models

Large Language Models (LLMs) have demonstrated exceptional performance across a wide range of tasks, yet their significant computational and memory requirements present major challenges for deploym...

Yijun Zhu, Jianxin Wang, Chengchao Shen

2603.08083 2026-03-09
AI LLM

Why Large Language Models can Secretly Outperform Embedding Similarity in Information Retrieval

With the emergence of Large Language Models (LLMs), new methods in Information Retrieval are available in which relevance is estimated directly through language understanding and reasoning, instead...

Matei Benescu, Ivo Pascal de Jong

2603.08077 2026-03-09
AI LLM

Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models

Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and insp...

Xuesong Wang, Caisheng Wang

2603.08069 2026-03-09
AI LLM

In-Context Reinforcement Learning for Tool Use in Large Language Models

While large language models (LLMs) exhibit strong reasoning abilities, their performance on complex tasks is often constrained by the limitations of their internal knowledge. A compelling approach ...

Yaoqi Ye, Yiran Zhao, Keyu Duan, Zeyu Zheng, Kenji Kawaguchi, Cihang Xie, Michael Qizhe Shieh

2603.08068 2026-03-09
AI LLM

Deterministic Differentiable Structured Pruning for Large Language Models

Structured pruning reduces LLM inference cost by removing low-importance architectural components. This can be viewed as learning a multiplicative gate for each component under an l0 sparsity const...

Weiyu Huang, Pengle Zhang, Xiaolu Zhang, Jun Zhou, Jun Zhu, Jianfei Chen

2603.08065 2026-03-09
AI LLM

CinemaWorld: Generative Augmented Reality with LLMs and 3D Scene Generation for Movie Augmentation

We introduce CinemaWorld, a generative augmented reality system that augments the viewer's physical surroundings with automatically generated mixed reality 3D content extracted from and synchronize...

Keiichi Ihara, DaeHo Lee, Manato Abe, Hye-Young Jo, Ryo Suzuki

2603.08060 2026-03-09
AI LLM

Stabilized Fine-Tuning with LoRA in Federated Learning: Mitigating the Side Effect of Client Size and Rank via the Scaling Factor

Large Language Models (LLMs) are pivotal in natural language processing. The impracticality of full fine-tuning has prompted Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation ...

Jiayu Huang, Xiaohu Wu, Tiantian He, Qicheng Lao

2603.08058 2026-03-09
AI LLM

BEVLM: Distilling Semantic Knowledge from LLMs into Bird's-Eye View Representations

The integration of Large Language Models (LLMs) into autonomous driving has attracted growing interest for their strong reasoning and semantic understanding abilities, which are essential for handl...

Thomas Monninger, Shaoyuan Xie, Qi Alfred Chen, Sihao Ding

2603.06576 2026-03-06
AI LLM

SUREON: A Benchmark and Vision-Language-Model for Surgical Reasoning

Surgeons don't just see -- they interpret. When an expert observes a surgical scene, they understand not only what instrument is being used, but why it was chosen, what risk it poses, and what come...

Alejandra Perez, Anita Rau, Lee White, Busisiwe Mlambo, Chinedu Nwoye, Muhammad Abdullah Jamal, O...

2603.06570 2026-03-06
AI LLM

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Vision Language Model (VLM) development has largely relied on scaling model size, which hinders deployment on compute-constrained mobile and edge devices such as smartphones and robots. In this wor...

Boqiang Zhang, Lei Ke, Ruihan Yang, Qi Gao, Tianyuan Qu, Rossell Chen, Dong Yu, Leoweiliang

2603.06569 2026-03-06
AI LLM

The Pen: Episodic Cognitive Assistance via an Ear-Worn Interface

Wearable AI is often designed as always-available, yet continuous availability can conflict with how people work and socialize, creating discomfort around privacy, disruption, and unclear system bo...

Yonatan Tussa, Andy Heredia

2603.06564 2026-03-06
AI LLM

RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering

Conversational generative AI is rapidly entering healthcare, where general-purpose models must integrate heterogeneous patient signals and support diverse interaction styles while producing clinica...

Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo

2603.06542 2026-03-06
AI LLM

Evaluating the Predictability of Selected Weather Extremes with Aurora, an AI Weather Forecast Model

AI weather foundation models now achieve forecast skill comparable to numerical weather prediction at far lower computational cost, yet their predictability for high-impact extremes across dynamica...

Qin Huang, Moyan Liu, Yeongbin Kwon, Upmanu Lall

2603.06516 2026-03-06
AI LLM

When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion Models

While diffusion models have revolutionized visual content generation, their rapid adoption has underscored the critical need to investigate vulnerabilities, e.g., to backdoor attacks. In multimodal...

Qitong Wang, Haoran Dai, Haotian Zhang, Christopher Rasmussen, Binghui Wang

2603.06508 2026-03-06
AI LLM

Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing

Recent advances in multimodal Retrieval-Augmented Generation (RAG) enable Large Language Models (LLMs) to analyze enterprise spreadsheet workbooks containing millions of cells, cross-sheet dependen...

Anmol Gulati, Sahil Sen, Waqar Sarguroh, Kevin Paul

2603.06503 2026-03-06
AI LLM

COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics

Activation steering methods enable inference-time control of large language model (LLM) behavior without retraining, but current approaches face a fundamental trade-off: sample-efficient methods su...

Kartik Sharma, Rakshit S. Trivedi

2603.06495 2026-03-06
AI LLM

NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches

We introduce NOBLE (Nonlinear lOw-rank Branch for Linear Enhancement), an architectural augmentation that adds nonlinear low-rank branches to transformer linear layers. Unlike LoRA and other parame...

Ethan Smith

2603.06492 2026-03-06
AI LLM

Do Foundation Models Know Geometry? Probing Frozen Features for Continuous Physical Measurement

Vision-language models encode continuous geometry that their text pathway fails to express: a 6,000-parameter linear probe extracts hand joint angles at 6.1 degrees MAE from frozen features, while ...

Yakov Pyotr Shkolnikov

2603.06459 2026-03-06