Papers
Research papers from arXiv and related sources
A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, f...
Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette
GoAgent: Group-of-Agents Communication Topology Generation for LLM-based Multi-Agent Systems
Large language model (LLM)-based multi-agent systems (MAS) have demonstrated exceptional capabilities in solving complex tasks, yet their effectiveness depends heavily on the underlying communicati...
Hongjiang Chen, Xin Zheng, Yixin Liu, Pengfei Jiao, Shiyuan Li, Huan Liu, Zhidong Zhao, Ziqi Xu, ...
ATHENA: Adaptive Test-Time Steering for Improving Count Fidelity in Diffusion Models
Text-to-image diffusion models achieve high visual fidelity but surprisingly exhibit systematic failures in numerical control when prompts specify explicit object counts. To address this limitation...
Mohammad Shahab Sepehri, Asal Mehradfar, Berk Tinaz, Salman Avestimehr, Mahdi Soltanolkotabi
Structured Prompting for Arabic Essay Proficiency: A Trait-Centric Evaluation Approach
This paper presents a novel prompt engineering framework for trait-specific Automatic Essay Scoring (AES) in Arabic, leveraging large language models (LLMs) under zero-shot and few-shot configurati...
Salim Al Mandhari, Hieu Pham Dinh, Mo El-Haj, Paul Rayson
GenFacet: End-to-End Generative Faceted Search via Multi-Task Preference Alignment in E-Commerce
Faceted search acts as a critical bridge for navigating massive e-commerce catalogs, yet traditional systems rely on static rule-based extraction or statistical ranking, struggling with emerging voc...
Zhouwei Zhai, Min Yang, Jin Li
The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference
The key-value (KV) cache is widely treated as essential state in transformer inference, and a large body of work engineers policies to compress, evict, or approximate its entries. We prove that thi...
Kaleem Ullah Qasim, Jiashu Zhang, Muhammad Kafeel Shaheen, Razan Alharith, Heying Zhang
Accurate Open-Loop Control of a Soft Continuum Robot Through Visually Learned Latent Representations
This work addresses open-loop control of a soft continuum robot (SCR) from video-learned latent dynamics. Visual Oscillator Networks (VONs) from previous work are used, which provide mechanistically...
Henrik Krauss, Johann Licher, Naoya Takeishi, Annika Raatz, Takehisa Yairi
Ensembles-based Feature Guided Analysis
Recent Deep Neural Network (DNN) applications call for techniques that can explain their behavior. Existing solutions, such as Feature Guided Analysis (FGA), extract rules on their internal behavio...
Federico Formica, Stefano Gregis, Andrea Rota, Aurora Francesca Zanenga, Mark Lawford, Claudio Me...
PolicySim: An LLM-Based Agent Social Simulation Sandbox for Proactive Policy Optimization
Social platforms serve as central hubs for information exchange, where user behaviors and platform interventions jointly shape opinions. However, intervention policies like recommendation and conte...
Renhong Huang, Ning Tang, Jiarong Xu, Yuxuan Cao, Qingqian Tu, Sheng Guo, Bo Zheng, Huiyuan Liu, ...
HyEvo: Self-Evolving Hybrid Agentic Workflows for Efficient Reasoning
Although agentic workflows have demonstrated strong potential for solving complex tasks, existing automated generation methods remain inefficient and underperform, as they rely on predefined operat...
Beibei Xu, Yutong Ye, Chuyun Shen, Yingbo Zhou, Cheng Chen, Mingsong Chen
BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection
The exponential expansion of context windows in LLMs has unlocked capabilities for long-document understanding but introduced severe bottlenecks in inference latency and information utilization. Ex...
Zhengpei Hu, Kai Li, Dapeng Fu, Chang Zeng, Yue Li, Yuanhao Tang, Jianqiang Huang
MetaCues: Enabling Critical Engagement with Generative AI for Information Seeking and Sensemaking
Generative AI (GenAI) search tools are increasingly used for information seeking, yet their design tends to encourage cognitive offloading, which may lead to passive engagement, selective attention...
Anjali Singh, Karan Taneja, Zhitong Guan, Soo Young Rieh
Dual Prompt-Driven Feature Encoding for Nighttime UAV Tracking
Robust feature encoding constitutes the foundation of UAV tracking by enabling the nuanced perception of target appearance and motion, thereby playing a pivotal role in ensuring reliable tracking. ...
Yiheng Wang, Changhong Fu, Liangliang Yao, Haobo Zuo, Zijie Zhang
A Concept of Next-Generation Atmospheric Cherenkov Telescope Array (NG-ACTA)
The Next-Generation Atmospheric Cherenkov Telescope Array (NG-ACTA) is proposed as a prospective infrastructure for very high energy (VHE) gamma-ray astronomy, consisting of a mixed-aperture array ...
Jiancheng Wang, Jirong Mao
Blow-up of solutions to the Euler-Poisson-Darboux equation with critical power nonlinearity
In our recent previous work, we established the finite-time blow-up result and an upper bound on the lifespan estimate for the singular Cauchy problem of the semilinear Euler-Poisson-Darboux equation in R^n wi...
Mengting Fan, Ning-An Lai, Hiroyuki Takamura
Universal method for optimized robustness in self-testing of quantum resources
Self-testing is a phenomenon where the use of specific quantum states or measurements can be inferred solely from the correlations they generate. We introduce a universal method for conducting robu...
Shin-Liang Chen, Nikolai Miklin
Demonstrations, CoT, and Prompting: A Theoretical Analysis of ICL
In-Context Learning (ICL) enables pretrained LLMs to adapt to downstream tasks by conditioning on a small set of input-output demonstrations, without any parameter updates. Although there have been...
Xuhan Tong, Yuchen Zeng, Jiawei Zhang
ParallelVLM: Lossless Video-LLM Acceleration with Visual Alignment Aware Parallel Speculative Decoding
Although current Video-LLMs achieve impressive performance in video understanding tasks, their autoregressive decoding efficiency remains constrained by the massive number of video tokens. Visual t...
Quan Kong, Yuhao Shen, Yicheng Ji, Huan Li, Cong Wang
Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning
Video generation models are increasingly used as world simulators for storytelling, simulation, and embodied AI. As these models advance, a key question arises: do generated videos obey the physica...
Qin Zhang, Peiyu Jing, Hong-Xing Yu, Fangqiang Ding, Fan Nie, Weimin Wang, Yilun Du, James Zou, J...
CO-EVOLVE: Bidirectional Co-Evolution of Graph Structure and Semantics for Heterophilous Learning
The integration of Large Language Models (LLMs) and Graph Neural Networks (GNNs) promises to unify semantic understanding with structural reasoning, yet existing methods typically rely on static, u...
Jinming Xing, Muhammad Shahzad