Papers
Research papers from arXiv and related sources
When More Is Less: A Systematic Analysis of Spatial and Commonsense Information for Visual Spatial Reasoning
Visual spatial reasoning (VSR) remains challenging for modern vision-language models (VLMs), despite advances in multimodal architectures. A common strategy is to inject additional information at i...
Muku Akasaka, Soyeon Caren Han
Structurally Aligned Subtask-Level Memory for Software Engineering Agents
Large Language Models (LLMs) have demonstrated significant potential as autonomous software engineering (SWE) agents. Recent work has further explored augmenting these agents with memory mechanisms...
Kangning Shen, Jingyuan Zhang, Chenxi Sun, Wencong Zeng, Yang Yue
MixSarc: A Bangla-English Code-Mixed Corpus for Implicit Meaning Identification
Bangla-English code-mixing is widespread across South Asian social media, yet resources for implicit meaning identification in this setting remain scarce. Existing sentiment and sarcasm models larg...
Kazi Samin Yasar Alam, Md Tanbir Chowdhury, Tamim Ahmed, Ajwad Abrar, Md Rafid Haque
Inverse prediction of capacitor multiphysics dynamic parameters using deep generative model
Finite element simulations are run by package design engineers to model design structures. The process is irreversible, meaning every minute structural adjustment requires a fresh input parameter ru...
Kart-Leong Lim, Rahul Dutta, Mihai Rotaru
Towards Autonomous Graph Data Analytics with Analytics-Augmented Generation
This paper argues that reliable end-to-end graph data analytics cannot be achieved by retrieval- or code-generation-centric LLM agents alone. Although large language models (LLMs) provide strong re...
Qiange Wang, Chaoyi Chen, Jingqi Gao, Zihan Wang, Yanfeng Zhang, Ge Yu
AQR-HNSW: Accelerating Approximate Nearest Neighbor Search via Density-aware Quantization and Multi-stage Re-ranking
Approximate Nearest Neighbor (ANN) search has become fundamental to modern AI infrastructure, powering recommendation systems, search engines, and large language models across industry leaders from...
Ganap Ashit Tewary, Nrusinga Charan Gantayat, Jeff Zhang
Retrieval Challenges in Low-Resource Public Service Information: A Case Study on Food Pantry Access
Public service information systems are often fragmented, inconsistently formatted, and outdated. These characteristics create low-resource retrieval environments that hinder timely access to critic...
Touseef Hasan, Laila Cure, Souvika Sarkar
SPOC: Safety-Aware Planning Under Partial Observability And Physical Constraints
Embodied Task Planning with large language models faces safety challenges in real-world environments, where partial observability and physical constraints must be respected. Existing benchmarks oft...
Hyungmin Kim, Hobeom Jeon, Dohyung Kim, Minsu Jang, Jeahong Kim
Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection
Generative images have proliferated on Web platforms, across social media and online copyright distribution scenarios, and semantic watermarking has increasingly been integrated into diffusion models to...
Zheng Gao, Xiaoyu Li, Zhicheng Bao, Xiaoyan Feng, Jiaojiao Jiang
CADC: Content Adaptive Diffusion-Based Generative Image Compression
Diffusion-based generative image compression has demonstrated remarkable potential for achieving realistic reconstruction at ultra-low bitrates. The key to unlocking this potential lies in making t...
Xihua Sheng, Lingyu Zhu, Tianyu Zhang, Dong Liu, Shiqi Wang, Jing Wang
Hall effect on nontrivial quadrupole order in quasi-kagome compound URhSn
This study focuses on the transport properties of the quasi-kagome compound URhSn, which exhibits successive phase transitions at T_C = 16 K (ferromagnetic phase) and T_O = 54 K (intermediate phase). A...
Yusei Shimizu, Arvind Maurya, Yoshiya Homma, Motoi Kimata, Toni Helm, Ai Nakamura, Dexin Li, Atsu...
Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
Many applications seek to optimize LLM outputs at test time by iteratively proposing, scoring, and refining candidates over a discrete output space. Existing methods use a calibrated scalar evaluat...
Sweta Karlekar, Carolina Zheng, Magnus Saebo, Nicolas Beltran-Velez, Shuyang Yu, John Bowlan, Mic...
Exploring Human-Machine Coexistence in Symmetrical Reality
In the context of the evolution of artificial intelligence (AI), the interaction between humans and AI entities has become increasingly salient, challenging the conventional human-centric paradigms...
Zhenliang Zhang
Power and Limitations of Aggregation in Compound AI Systems
When designing compound AI systems, a common approach is to query multiple copies of the same model and aggregate the responses to produce a synthesized output. Given the homogeneity of these model...
Nivasini Ananthakrishnan, Meena Jagadeesan
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive KV-Cach...
Yongtong Wu, Shaoyuan Chen, Yinmin Zhong, Rilin Huang, Yixuan Tan, Wentao Zhang, Liyue Zhang, Sha...
RAC: Relation-Aware Cache Replacement for Large Language Models
The scaling of Large Language Model (LLM) services faces significant cost and latency challenges, making effective caching under tight capacity crucial. Existing cache replacement policies, from he...
Yuchong Wu, Zihuan Xu, Wangze Ni, Peng Cheng, Lei Chen, Xuemin Lin, Heng Tao Shen, Kui Ren
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
Agentic reinforcement learning (ARL) has rapidly gained attention as a promising paradigm for training agents to solve complex, multi-step interactive tasks. Despite encouraging early results, ARL ...
Xiaoxuan Wang, Han Zhang, Haixin Wang, Yidan Shi, Ruoyan Li, Kaiqiao Han, Chenyi Tong, Haoran Den...
Reasoning-Driven Design of Single Atom Catalysts via a Multi-Agent Large Language Model Framework
Large language models (LLMs) are increasingly being applied beyond natural language processing, demonstrating strong capabilities in complex scientific tasks that traditionally require human exp...
Dong Hyeon Mok, Seoin Back, Victor Fung, Guoxiang Hu
One Brain, Omni Modalities: Towards Unified Non-Invasive Brain Decoding with Large Language Models
Deciphering brain function through non-invasive recordings requires synthesizing complementary high-frequency electromagnetic (EEG/MEG) and low-frequency metabolic (fMRI) signals. However, despite ...
Changli Tang, Shurui Li, Junliang Wang, Qinfan Xiao, Zhonghao Zhai, Lei Bai, Yu Qiao, Bowen Zhou,...
Which Tool Response Should I Trust? Tool-Expertise-Aware Chest X-ray Agent with Multimodal Agentic Learning
AI agents with tool-use capabilities show promise for integrating the domain expertise of various tools. In the medical field, however, tools are usually AI models that are inherently error-prone a...
Zheang Huai, Honglong Yang, Xiaomeng Li