Papers
Research papers from arXiv and related sources
Click-to-Ask: An AI Live Streaming Assistant with Offline Copywriting and Online Interactive QA
Live streaming commerce has become a prominent form of broadcasting in the modern era. To facilitate more efficient and convenient product promotions for streamers, we present Click-to-Ask, an AI-d...
Ruizhi Yu, Keyang Zhong, Peng Liu, Qi Wu, Haoran Zhang, Yanhao Zhang, Chen Chen, Haonan Lu
An Onto-Relational-Sophic Framework for Governing Synthetic Minds
The rapid evolution of artificial intelligence, from task-specific systems to foundation models exhibiting broad, flexible competence across reasoning, creative synthesis, and social interaction, h...
Huansheng Ning, Jianguo Ding
D-Mem: A Dual-Process Memory System for LLM Agents
Driven by the development of persistent, self-adapting autonomous agents, equipping these systems with high-fidelity memory access for long-horizon reasoning has emerged as a critical requirement. ...
Zhixing You, Jiachen Yuan, Jason Cai
GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection?
In recent years, AI-generated videos have become increasingly realistic and sophisticated. Meanwhile, Large Vision-Language Models (LVLMs) have shown strong potential for detecting such content. Ho...
Yueying Zou, Pei Pei Li, Zekun Li, Xinyu Guo, Xing Cui, Huaibo Huang, Ran He
REST: Receding Horizon Explorative Steiner Tree for Zero-Shot Object-Goal Navigation
Zero-shot object-goal navigation (ZSON) requires navigating unknown environments to find a target object without task-specific training. Prior hierarchical training-free solutions invest in scene u...
Shuqi Xiao, Maani Ghaffari, Chengzhong Xu, Hui Kong
ZEBRAARENA: A Diagnostic Simulation Environment for Studying Reasoning-Action Coupling in Tool-Augmented LLMs
Tool-augmented large language models (LLMs) must tightly couple multi-step reasoning with external actions, yet existing benchmarks often confound this interplay with complex environment dynamics, ...
Wanjia Zhao, Ludwig Schmidt, James Zou, Vidhisha Balachandran, Lingjiao Chen
SQL-Commenter: Aligning Large Language Models for SQL Comment Generation with Direct Preference Optimization
SQL query comprehension is a significant challenge due to complex syntax, diverse join types, and deep nesting. Many queries lack adequate comments, severely hindering code readability, maintainabi...
Lei Yu, Peng Wang, Jingyuan Zhang, Xin Wang, Jia Xu, Li Yang, Changzhi Deng, Jiajia Ma, Fengjun Z...
From Connectivity to Multi-Orbit Intelligence: Space-Based Data Center Architectures for 6G and Beyond
Direct handset-to-satellite (DHTS) communication is emerging as a core capability of 6G non-terrestrial networks, enabling standard devices to directly access low Earth orbit (LEO) satellites. Whil...
Shimaa Naser, Maryam Tariq, Raneem Abdel-Rahim, De Mi, Azzam Mourad, Hadi Otrok, Mahmoud Al-Qutay...
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
Token pruning is essential for enhancing the computational efficiency of vision-language models (VLMs), particularly for video-based tasks where temporal redundancy is prevalent. Prior approaches t...
Jianrui Zhang, Yue Yang, Rohun Tripathi, Winson Han, Ranjay Krishna, Christopher Clark, Yong Jae ...
AgentFactory: A Self-Evolving Framework Through Executable Subagent Accumulation and Reuse
Building LLM-based agents has become increasingly important. Recent works on LLM-based agent self-evolution primarily record successful experiences as textual prompts or reflections, which cannot r...
Zhang Zhang, Shuqi Lu, Hongjin Qian, Di He, Zheng Liu
The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering
We present a training-free framework for continuous and controllable image editing at test time for text-conditioned generative models. In contrast to prior approaches that rely on additional train...
Yigit Ekin, Yossi Gandelsman
TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis
AI coding agents can resolve real-world software issues, yet they frequently introduce regressions, breaking tests that previously passed. Current benchmarks focus almost exclusively on resolution ...
Pepe Alonso
LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition
Media design layer generation enables the creation of fully editable, layered design documents such as posters, flyers, and logos using only natural language prompts. Existing methods either restri...
Vlad-Constantin Lungu-Stan, Ionut Mironica, Mariana-Iuliana Georgescu
ConGA: Guidelines for Contextual Gender Annotation. A Framework for Annotating Gender in Machine Translation
Handling gender across languages remains a persistent challenge for Machine Translation (MT) and Large Language Models (LLMs), especially when translating from gender-neutral languages into morphol...
Argentina Anna Rescigno, Eva Vanmassenhove, Johanna Monti
Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing
Large language models (LLMs) exhibit latent multi-token prediction (MTP) capabilities despite being trained solely for next-token generation. We propose a simple, training-free MTP approach that pr...
Raghavv Goel, Mukul Gagrani, Mingu Lee, Chris Lott
Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning
The widespread adoption of dashcams has made video evidence in traffic accidents increasingly abundant, yet transforming "what happened in the video" into "who is responsible under which legal prov...
Jingchun Yang, Jinchang Zhang
A practical artificial intelligence framework for legal age estimation using clavicle computed tomography scans
Legal age estimation plays a critical role in forensic and medico-legal contexts, where decisions must be supported by accurate, robust, and reproducible methods with explicit uncertainty quantific...
Javier Venema, Stefano De Luca, Pablo Mesejo, Óscar Ibáñez
Training Diffusion Language Models for Black-Box Optimization
We study offline black-box optimization (BBO), aiming to discover improved designs from an offline dataset of designs and labels, a problem common in robotics, DNA, and materials science with limit...
Zipeng Sun, Can Chen, Ye Yuan, Haolun Wu, Jiayao Gu, Christopher Pal, Xue Liu
Only relative ranks matter in weight-clustered large language models
Large language models (LLMs) contain billions of parameters, yet many exact values are not essential. We show that what matters most is the relative rank of weights: whether one connection is strong...
Borja Aizpurua, Sukhbinder Singh, Román Orús
IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia
As large language models (LLMs) are deployed in multilingual settings, their safety behavior in culturally diverse, low-resource languages remains poorly understood. We present the first systematic...
Priyaranjan Pattnayak, Sanchari Chowdhuri