Papers
Research papers from arXiv and related sources
OCR or Not? Rethinking Document Information Extraction in the MLLMs Era with Real-World Large-Scale Datasets
Multimodal Large Language Models (MLLMs) enhance the potential of natural language processing. However, their actual impact on document information extraction remains unclear. In particular, it is...
Jiyuan Shen, Peiyue Yuan, Atin Ghosh, Yifan Mai, Daniel Dahlmeier
Agentified Assessment of Logical Reasoning Agents
We present a framework for evaluating and benchmarking logical reasoning agents when assessment itself must be reproducible, auditable, and robust to execution failures. Building on agentified asse...
Zhiyu Ni, Yifeng Xiao, Zheng Liang
Rethinking Code Similarity for Automated Algorithm Design with LLMs
The rise of Large Language Model-based Automated Algorithm Design (LLM-AAD) has transformed algorithm development by autonomously generating code implementations of expert-level algorithms. Unlike ...
Rui Zhang, Zhichao Lu
From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench
Large Language Models (LLMs) show significant potential in AI mathematical tutoring, yet current evaluations often rely on simplistic metrics or narrow pedagogical scenarios, failing to assess comp...
Weikang Shi, Houxing Ren, Junting Pan, Aojun Zhou, Ke Wang, Zimu Lu, Yunqiao Yang, Yuxuan Hu, Lin...
Agentic Self-Evolutionary Replanning for Embodied Navigation
Failure is inevitable for embodied navigation in complex environments. To enhance resilience, replanning (RP) is a viable option, where the robot is allowed to fail but is capable of adjusting...
Guoliang Li, Ruihua Han, Chengyang Li, He Li, Shuai Wang, Wenchao Ding, Hong Zhang, Chengzhong Xu
EvoSkill: Automated Skill Discovery for Multi-Agent Systems
Coding agents are increasingly used as general-purpose problem solvers, but their flexibility does not by itself confer the domain expertise needed for specialized tasks. Recent work addresses this...
Salaheddin Alzubi, Noah Provenzano, Jaydon Bingham, Weiyuan Chen, Tu Vu
Seeing Clearly without Training: Mitigating Hallucinations in Multimodal LLMs for Remote Sensing
Multimodal large language models (MLLMs) suffer from pronounced hallucinations in remote sensing visual question-answering (RS-VQA), primarily caused by visual grounding failures in large-scale sce...
Yi Liu, Jing Zhang, Di Wang, Xiaoyu Tian, Haonan Guo, Bo Du
CoShadow: Multi-Object Shadow Generation for Image Compositing via Diffusion Model
Realistic shadow generation is crucial for achieving seamless image compositing, yet existing methods primarily focus on single-object insertion and often fail to generalize when multiple foregroun...
Waqas Ahmed, Dean Diepeveen, Ferdous Sohel
Hardware Implementation of Photonic Spiking Hash Retrieval
Hashing retrieval is a pivotal technology for large-scale similarity search, widely applied in retrieval-augmented generation (RAG) for large language models (LLMs), massive image repositories, and...
Shangxuan Shi, Shuiying Xiang, Xintao Zeng, Yonghang Chen, Wanting Yu, Yahui Zhang, Xingxing Guo,...
Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
Conventional LLM inference architectures suffer from high energy and latency due to frequent data movement across memory hierarchies. We propose Ouroboros, a wafer-scale SRAM-based Computing-in-Mem...
Yiqi Liu, Yudong Pan, Mengdi Wang, Shixin Zhao, Haonan Zhu, Yinhe Han, Lei Zhang, Ying Wang
Two-stage Convolutional Neural Network for six-dimensional phase space reconstruction
In particle accelerators, full knowledge of the six-dimensional (6D) beam phase space is crucial but difficult to obtain with conventional beam diagnostics. We develop a two-stage convolutional neu...
Sayantan Mukherjee, Masao Kuriki, Zachary John Liptak, Hitoshi Hayano, Masakazu Kurata, Nobuhiro ...
Single Microphone Own Voice Detection based on Simulated Transfer Functions for Hearing Aids
This paper presents a simulation-based approach to own voice detection (OVD) in hearing aids using a single microphone. While OVD can significantly improve user comfort and speech intelligibility, ...
Mathuranathan Mayuravaani, W. Bastiaan Kleijn, Andrew Lensen, Charlotte Sørensen
An Empirical Analysis of Calibration and Selective Prediction in Multimodal Clinical Condition Classification
As artificial intelligence systems move toward clinical deployment, ensuring reliable prediction behavior is fundamental for safety-critical decision-making tasks. One proposed safeguard is selecti...
L. Julián Lechuga López, Farah E. Shamout, Tim G. J. Rudner
Decoupling Intrinsic Molecular Efficacy from Platform Effects: An Interpretable Machine Learning Framework for Unbiased Perovskite Passivator Discovery
Rational design of interface passivators for perovskite solar cells is hindered by the entanglement of intrinsic molecular efficacy with extrinsic platform-dependent performance - a confounding fac...
Jing Zhang, Ziyuan Li, Shan Gao, Zhen Zhu, Jing Wang, Xiangmei Duan
Designing XY and Dzyaloshinskii–Moriya couplings in Majorana Cooper pair boxes
We theoretically study how to design spin couplings in networks of Majorana Cooper pair boxes (MCBs) connected by multiple normal-metal leads. The inter-box interaction is generated by the conducti...
Manato Teranishi, Shintaro Hoshino, Ai Yamakage
From "What" to "How": Constrained Reasoning for Autoregressive Image Generation
Autoregressive image generation has seen recent improvements with the introduction of chain-of-thought and reinforcement learning. However, current methods merely specify "What" details to depict b...
Ruxue Yan, Xubo Liu, Wenya Guo, Zhengkun Zhang, Ying Zhang, Xiaojie Yuan
A Natural Language Agentic Approach to Study Affective Polarization
Affective polarization has been central to political and social studies, with growing focus on social media, where partisan divisions are often exacerbated. Real-world studies tend to have limited ...
Stephanie Anneris Malvicini, Ewelina Gajewska, Arda Derbent, Katarzyna Budzynska, Jarosław A. Chu...
FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing
The financial domain involves a variety of important time-series problems. Recently, time-series analysis methods that jointly leverage textual and numerical information have gained increasing atte...
Jaehoon Lee, Suhwan Park, Tae Yoon Lim, Seunghan Lee, Jun Seo, Dongwan Kang, Hwanil Choi, Minjae ...
Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization
Optimizing communication topology is fundamental to the efficiency and effectiveness of Large Language Model (LLM)-based Multi-Agent Systems (MAS). While recent approaches utilize reinforcement lea...
Yueyang Cang, Xiaoteng Zhang, Erlu Zhao, Zehua Ji, Yuhang Liu, Yuchen He, Zhiyuan Ning, Chen Yiju...
Exact Moment Estimation of Stochastic Differential Dynamics
Moment estimation for stochastic differential equations (SDEs) is fundamental to the formal reasoning and verification of stochastic dynamical systems, yet remains challenging and is rarely availab...
Shenghua Feng, Jie An, Naijun Zhan, Fanjiang Xu