Research

Papers

Research papers from arXiv and related sources

Total: 4694; AI/LLM: 2583; Testing: 2111
AI LLM

OCR or Not? Rethinking Document Information Extraction in the MLLMs Era with Real-World Large-Scale Datasets

Multimodal Large Language Models (MLLMs) enhance the potential of natural language processing. However, their actual impact on document information extraction remains unclear. In particular, it is ...

Jiyuan Shen, Peiyue Yuan, Atin Ghosh, Yifan Mai, Daniel Dahlmeier

2603.02789 2026-03-03
TESTING

Agentified Assessment of Logical Reasoning Agents

We present a framework for evaluating and benchmarking logical reasoning agents when assessment itself must be reproducible, auditable, and robust to execution failures. Building on agentified asse...

Zhiyu Ni, Yifeng Xiao, Zheng Liang

2603.02788 2026-03-03
AI LLM

Rethinking Code Similarity for Automated Algorithm Design with LLMs

The rise of Large Language Model-based Automated Algorithm Design (LLM-AAD) has transformed algorithm development by autonomously generating code implementations of expert-level algorithms. Unlike ...

Rui Zhang, Zhichao Lu

2603.02787 2026-03-03
AI LLM

From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench

Large Language Models (LLMs) show significant potential in AI mathematical tutoring, yet current evaluations often rely on simplistic metrics or narrow pedagogical scenarios, failing to assess comp...

Weikang Shi, Houxing Ren, Junting Pan, Aojun Zhou, Ke Wang, Zimu Lu, Yunqiao Yang, Yuxuan Hu, Lin...

2603.02775 2026-03-03
AI LLM

Agentic Self-Evolutionary Replanning for Embodied Navigation

Failure is inevitable for embodied navigation in complex environments. To enhance the resilience, replanning (RP) is a viable option, where the robot is allowed to fail, but is capable of adjusting...

Guoliang Li, Ruihua Han, Chengyang Li, He Li, Shuai Wang, Wenchao Ding, Hong Zhang, Chengzhong Xu

2603.02772 2026-03-03
AI LLM

EvoSkill: Automated Skill Discovery for Multi-Agent Systems

Coding agents are increasingly used as general-purpose problem solvers, but their flexibility does not by itself confer the domain expertise needed for specialized tasks. Recent work addresses this...

Salaheddin Alzubi, Noah Provenzano, Jaydon Bingham, Weiyuan Chen, Tu Vu

2603.02766 2026-03-03
AI LLM

Seeing Clearly without Training: Mitigating Hallucinations in Multimodal LLMs for Remote Sensing

Multimodal large language models (MLLMs) suffer from pronounced hallucinations in remote sensing visual question-answering (RS-VQA), primarily caused by visual grounding failures in large-scale sce...

Yi Liu, Jing Zhang, Di Wang, Xiaoyu Tian, Haonan Guo, Bo Du

2603.02754 2026-03-03
AI LLM

CoShadow: Multi-Object Shadow Generation for Image Compositing via Diffusion Model

Realistic shadow generation is crucial for achieving seamless image compositing, yet existing methods primarily focus on single-object insertion and often fail to generalize when multiple foregroun...

Waqas Ahmed, Dean Diepeveen, Ferdous Sohel

2603.02743 2026-03-03
AI LLM

Hardware Implementation of Photonic Spiking Hash Retrieval

Hashing retrieval is a pivotal technology for large-scale similarity search, widely applied in retrieval-augmented generation (RAG) for large language models (LLMs), massive image repositories, and...

Shangxuan Shi, Shuiying Xiang, Xintao Zeng, Yonghang Chen, Wanting Yu, Yahui Zhang, Xingxing Guo,...

2603.02738 2026-03-03
AI LLM

Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference

Conventional LLM inference architectures suffer from high energy and latency due to frequent data movement across memory hierarchies. We propose Ouroboros, a wafer-scale SRAM-based Computing-in-Mem...

Yiqi Liu, Yudong Pan, Mengdi Wang, Shixin Zhao, Haonan Zhu, Yinhe Han, Lei Zhang, Ying Wang

2603.02737 2026-03-03
TESTING

Two-stage Convolutional Neural Network for six-dimensional phase space reconstruction

In particle accelerators, full knowledge of the six-dimensional (6D) beam phase space is crucial but difficult to obtain with conventional beam diagnostics. We develop a two-stage convolutional neu...

Sayantan Mukherjee, Masao Kuriki, Zachary John Liptak, Hitoshi Hayano, Masakazu Kurata, Nobuhiro ...

2603.02733 2026-03-03
TESTING

Single Microphone Own Voice Detection based on Simulated Transfer Functions for Hearing Aids

This paper presents a simulation-based approach to own voice detection (OVD) in hearing aids using a single microphone. While OVD can significantly improve user comfort and speech intelligibility, ...

Mathuranathan Mayuravaani, W. Bastiaan Kleijn, Andrew Lensen, Charlotte Sørensen

2603.02724 2026-03-03
AI LLM

An Empirical Analysis of Calibration and Selective Prediction in Multimodal Clinical Condition Classification

As artificial intelligence systems move toward clinical deployment, ensuring reliable prediction behavior is fundamental for safety-critical decision-making tasks. One proposed safeguard is selecti...

L. Julián Lechuga López, Farah E. Shamout, Tim G. J. Rudner

2603.02719 2026-03-03
TESTING

Decoupling Intrinsic Molecular Efficacy from Platform Effects: An Interpretable Machine Learning Framework for Unbiased Perovskite Passivator Discovery

Rational design of interface passivators for perovskite solar cells is hindered by the entanglement of intrinsic molecular efficacy with extrinsic platform-dependent performance - a confounding fac...

Jing Zhang, Ziyuan Li, Shan Gao, Zhen Zhu, Jing Wang, Xiangmei Duan

2603.02717 2026-03-03
AI LLM

Designing XY and Dzyaloshinskii-Moriya couplings in Majorana Cooper pair boxes

We theoretically study how to design spin couplings in networks of Majorana Cooper pair boxes (MCBs) connected by multiple normal-metal leads. The inter-box interaction is generated by the conducti...

Manato Teranishi, Shintaro Hoshino, Ai Yamakage

2603.02713 2026-03-03
AI LLM

From "What" to "How": Constrained Reasoning for Autoregressive Image Generation

Autoregressive image generation has seen recent improvements with the introduction of chain-of-thought and reinforcement learning. However, current methods merely specify "What" details to depict b...

Ruxue Yan, Xubo Liu, Wenya Guo, Zhengkun Zhang, Ying Zhang, Xiaojie Yuan

2603.02712 2026-03-03
AI LLM

A Natural Language Agentic Approach to Study Affective Polarization

Affective polarization has been central to political and social studies, with growing focus on social media, where partisan divisions are often exacerbated. Real-world studies tend to have limited ...

Stephanie Anneris Malvicini, Ewelina Gajewska, Arda Derbent, Katarzyna Budzynska, Jarosław A. Chu...

2603.02711 2026-03-03
AI LLM

FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

The financial domain involves a variety of important time-series problems. Recently, time-series analysis methods that jointly leverage textual and numerical information have gained increasing atte...

Jaehoon Lee, Suhwan Park, Tae Yoon Lim, Seunghan Lee, Jun Seo, Dongwan Kang, Hwanil Choi, Minjae ...

2603.02702 2026-03-03
AI LLM

Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization

Optimizing communication topology is fundamental to the efficiency and effectiveness of Large Language Model (LLM)-based Multi-Agent Systems (MAS). While recent approaches utilize reinforcement lea...

Yueyang Cang, Xiaoteng Zhang, Erlu Zhao, Zehua Ji, Yuhang Liu, Yuchen He, Zhiyuan Ning, Chen Yiju...

2603.02701 2026-03-03
TESTING

Exact Moment Estimation of Stochastic Differential Dynamics

Moment estimation for stochastic differential equations (SDEs) is fundamental to the formal reasoning and verification of stochastic dynamical systems, yet remains challenging and is rarely availab...

Shenghua Feng, Jie An, Naijun Zhan, Fanjiang Xu

2603.02696 2026-03-03