Research

Papers

Research papers from arXiv and related sources

Total: 4694 | AI/LLM: 2583 | Testing: 2111
AI LLM

ProductResearch: Training E-Commerce Deep Research Agents via Multi-Agent Synthetic Trajectory Distillation

Large Language Model (LLM)-based agents show promise for e-commerce conversational shopping, yet existing implementations lack the interaction depth and contextual breadth required for complex prod...

Jiangyuan Wang, Kejun Xiao, Huaipeng Zhao, Tao Luo, Xiaoyi Zeng

2602.23716 2026-02-27
AI LLM

A Reliable Indoor Navigation System for Humans Using AR-based Technique

Reliable navigation systems are not available indoors, such as in campuses and small areas. Users must depend on confusing, time-consuming static signage or floor maps. In this paper, an AR-based t...

Vijay U. Rathod, Manav S. Sharma, Shambhavi Verma, Aadi Joshi, Sachin Aage, Sujal Shahane

2602.23706 2026-02-27
AI LLM

From Flat Logs to Causal Graphs: Hierarchical Failure Attribution for LLM-based Multi-Agent Systems

LLM-powered Multi-Agent Systems (MAS) have demonstrated remarkable capabilities in complex domains but suffer from inherent fragility and opaque failure mechanisms. Existing failure attribution met...

Yawen Wang, Wenjie Wu, Junjie Wang, Qing Wang

2602.23701 2026-02-27
TESTING

Privacy-Preserving Local Energy Trading Considering Network Fees

Driven by the widespread deployment of distributed energy resources, local energy markets (LEMs) have emerged as a promising approach for enabling direct trades among prosumers and consumers to bal...

Eman Alqahtani, Mustafa A. Mustafa

2602.23698 2026-02-27
AI LLM

Does Personalized Nudging Wear Off? A Longitudinal Study of AI Self-Modeling for Behavioral Engagement

Sustaining the effectiveness of behavior change technologies remains a key challenge. AI self-modeling, which generates personalized portrayals of one's ideal self, has shown promise for motivating...

Qing He, Zeyu Wang, Yuzhou Du, Jiahuan Ding, Yuanchun Shi, Yuntao Wang

2602.23688 2026-02-27
AI LLM

ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference

The paradigm of large language model (LLM) reasoning is shifting from parameter scaling to test-time compute scaling, yet many existing approaches still rely on uniform brute-force sampling (for ex...

Siyuan Ma, Bo Gao, Xiaojun Jia, Simeng Qin, Tianlin Li, Ke Ma, Xiaoshuang Jia, Wenqi Ren, Yang Liu

2602.23681 2026-02-27
AI LLM

Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering

Automated radiology report generation using vision-language models (VLMs) is limited by the risk of prior-comparison hallucination, where the model generates historical findings unsupported by the ...

Ao Li, Rui Liu, Mingjie Li, Sheng Liu, Lei Wang, Xiaodan Liang, Lina Yao, Xiaojun Chang, Lei Xing

2602.23676 2026-02-27
AI LLM

PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents

Large language model (LLM) agents typically rely on reactive decision-making paradigms such as ReAct, selecting actions conditioned on growing execution histories. While effective for short tasks, ...

Yihan, Wen, Xin Chen

2602.23668 2026-02-27
AI LLM

TRIZ-RAGNER: A Retrieval-Augmented Large Language Model for TRIZ-Aware Named Entity Recognition in Patent-Based Contradiction Mining

TRIZ-based contradiction mining is a fundamental task in patent analysis and systematic innovation, as it enables the identification of improving and worsening technical parameters that drive inven...

Zitong Xu, Yuqing Wu, Yue Zhao

2602.23656 2026-02-27
TESTING

ProtoDCS: Towards Robust and Efficient Open-Set Test-Time Adaptation for Vision-Language Models

Large-scale Vision-Language Models (VLMs) exhibit strong zero-shot recognition, yet their real-world deployment is challenged by distribution shifts. While Test-Time Adaptation (TTA) can mitigate t...

Wei Luo, Yangfan Ou, Jin Deng, Zeshuai Deng, Xiquan Yan, Zhiquan Wen, Mingkui Tan

2602.23653 2026-02-27
AI LLM

AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech

We introduce AudioCapBench, a benchmark for evaluating audio captioning capabilities of large multimodal models. AudioCapBench covers three distinct audio domains, including environmental sound, music, a...

Jielin Qiu, Jianguo Zhang, Zixiang Chen, Liangwei Yang, Ming Zhu, Juntao Tan, Haolin Chen, Wentin...

2602.23649 2026-02-27
AI LLM

SGAgent: Suggestion-Guided LLM-Based Multi-Agent Framework for Repository-Level Software Repair

The rapid advancement of Large Language Models (LLMs) has led to the emergence of intelligent agents capable of autonomously interacting with environments and invoking external tools. Recently, age...

Quanjun Zhang, Chengyu Gao, Yu Han, Ye Shang, Chunrong Fang, Zhenyu Chen, Liang Xiao

2602.23647 2026-02-27
AI LLM

AI Must Embrace Specialization via Superhuman Adaptable Intelligence

Everyone from AI executives and researchers to doomsayers, politicians, and activists is talking about Artificial General Intelligence (AGI). Yet, they often don't seem to agree on its exact defini...

Judah Goldfeder, Philippe Wyder, Yann LeCun, Ravid Shwartz Ziv

2602.23643 2026-02-27
TESTING

Stress-Testing Assumptions: A Guide to Bayesian Sensitivity Analyses in Causal Inference

While observational data are routinely used to estimate causal effects of biomedical treatments, doing so requires special methods to adjust for observed confounding. These methods invariably rely ...

Arman Oganisian

2602.23640 2026-02-27
TESTING

Learning to Reflect and Correct: Towards Better Decoding Trajectories for Large-Scale Generative Recommendation

Generative Recommendation (GR) has become a promising paradigm for large-scale recommendation systems. However, existing GR models typically perform single-pass decoding without explicit refinement...

Haibo Xing, Hao Deng, Lingyu Mu, Jinxin Hu, Yu Zhang, Xiaoyi Zeng, Jing Zhang

2602.23639 2026-02-27
AI LLM

FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation

Ensuring the safety of LLM-generated content is essential for real-world deployment. Most existing guardrail models formulate moderation as a fixed binary classification task, implicitly assuming a...

Zhihao Ding, Jinming Li, Ze Lu, Jieming Shi

2602.23636 2026-02-27
AI LLM

When LLMs Help -- and Hurt -- Teaching Assistants in Proof-Based Courses

Teaching assistants (TAs) are essential to grading and feedback provision in proof-based courses, yet these tasks are time-intensive and difficult to scale. Although Large Language Models (LLMs) ha...

Romina Mahinpei, Sofiia Druchyna, Manoel Horta Ribeiro

2602.23635 2026-02-27
TESTING

MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs

Synthesizing high-quality training data is crucial for enhancing domain models' reasoning abilities. Existing methods face limitations in long-tail knowledge coverage, effectiveness verification, a...

Lun Zhan, Feng Xiong, Huanyong Liu, Feng Zhang, Yuhui Yin

2602.23632 2026-02-27
AI LLM

Toward E2E Intelligence in 6G Networks: An AI Agent-Based RAN-CN Converged Intelligence Framework

Recent advances in intelligent network control have primarily relied on task-specific Artificial Intelligence (AI) models deployed separately within the Radio Access Network (RAN) and Core Network ...

Youbin Han, Haneul Ko, Namseok Ko, Tarik Taleb, Yan Chen

2602.23623 2026-02-27
AI LLM

DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model

Significant progress has been made in the field of Instruction-based Image Editing Models (IIEMs). However, while these models demonstrate plausible adherence to instructions and strong reasoning a...

Shibo Hong, Boxian Ai, Jun Kuang, Wei Wang, FengJiao Chen, Zhongyuan Peng, Chenhao Huang, Yixin Cao

2602.23622 2026-02-27