Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models

This paper presents a comprehensive evaluation framework for assessing the cultural competence of large language models (LLMs) in Persian. Existing Persian cultural benchmarks rely predominantly on...

Reihaneh Iranmanesh, Saeedeh Davoudi, Pasha Abrishamchian, Ophir Frieder, Nazli Goharian

2602.22827 2026-02-26
AI LLM

Face Time Traveller: Travel Through Ages Without Losing Identity

Face aging, an ill-posed problem shaped by environmental and genetic factors, is vital in entertainment, forensics, and digital archiving, where realistic age transformations must preserve both ide...

Purbayan Kar, Ayush Ghadiya, Vishal Chudasama, Pankaj Wasnik, C. V. Jawahar

2602.22819 2026-02-26
AI LLM

When Should an AI Act? A Human-Centered Model of Scene, Context, and Behavior for Agentic AI Design

Agentic AI increasingly intervenes proactively by inferring users' situations from contextual data yet often fails for lack of principled judgment about when, why, and whether to act. We address th...

Soyoung Jung, Daehoo Yoon, Sung Gyu Koh, Young Hwan Kim, Yehan Ahn, Sung Park

2602.22814 2026-02-26
AI LLM

Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching

Since local LLM inference on resource-constrained edge devices imposes a severe performance bottleneck, this paper proposes distributed prompt caching to enhance inference performance by cooperativ...

Hiroki Matsutani, Naoki Matsuda, Naoto Sugiura

2602.22812 2026-02-26
AI LLM

PhotoAgent: Agentic Photo Editing with Exploratory Visual Aesthetic Planning

With the recent fast development of generative models, instruction-based image editing has shown great potential in generating high-quality images. However, the quality of editing highly depends on...

Mingde Yao, Zhiyuan You, Tam-King Man, Menglu Wang, Tianfan Xue

2602.22809 2026-02-26
AI LLM

MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks

Despite the remarkable progress of large language models (LLMs), the capabilities of standalone LLMs have begun to plateau when tackling real-world, complex tasks that require interaction with exte...

Shiqian Su, Sen Xing, Xuan Dong, Muyan Zhong, Bin Wang, Xizhou Zhu, Yuntao Chen, Wenhai Wang, Yue...

2602.22808 2026-02-26
AI LLM

Natural Language Declarative Prompting (NLD-P): A Modular Governance Method for Prompt Design Under Model Drift

The rapid evolution of large language models (LLMs) has transformed prompt engineering from a localized craft into a systems-level governance challenge. As models scale and update across generation...

Hyunwoo Kim, Hanau Yi, Jaehee Bae, Yumin Kim

2602.22790 2026-02-26
AI LLM

Probing for Knowledge Attribution in Large Language Models

Large language models (LLMs) often generate fluent but unfounded claims, or hallucinations, which fall into two types: (i) faithfulness violations - misusing user context - and (ii) factuality viol...

Ivo Brink, Alexander Boer, Dennis Ulmer

2602.22787 2026-02-26
AI LLM

ClinDet-Bench: Beyond Abstention, Evaluating Judgment Determinability of LLMs in Clinical Decision-Making

Clinical decisions are often required under incomplete information. Clinical experts must identify whether available information is sufficient for judgment, as both premature conclusion and unneces...

Yusuke Watanabe, Yohei Kobashi, Takeshi Kojima, Yusuke Iwasawa, Yasushi Okuno, Yutaka Matsuo

2602.22771 2026-02-26
AI LLM

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is critical for achieving strong performance. However, a sign...

Yujie Zhao, Boqin Yuan, Junbo Huang, Haocheng Yuan, Zhongming Yu, Haozhou Xu, Lanxiang Hu, Abhila...

2602.22769 2026-02-26
AI LLM

Towards Better RL Training Data Utilization via Second-Order Rollout

Reinforcement Learning (RL) has empowered Large Language Models (LLMs) with strong reasoning capabilities, but vanilla RL mainly focuses on generation capability improvement by training with only f...

Zhe Yang, Yudong Wang, Rang Li, Zhifang Sui

2602.22765 2026-02-26
AI LLM

Evaluating and Improving Automated Repository-Level Rust Issue Resolution with LLM-based Agents

The Rust programming language presents a steep learning curve and significant coding challenges, making the automation of issue resolution essential for its broader adoption. Recently, LLM-powered ...

Jiahong Xiang, Wenxiao He, Xihua Wang, Hongliang Tian, Yuqun Zhang

2602.22764 2026-02-26
AI LLM

An AI-Based Structured Semantic Control Model for Stable and Coherent Dynamic Interactive Content Generation

This study addresses the challenge that generative models struggle to balance flexibility, stability, and controllability in complex interactive scenarios. It proposes a controllable generation fra...

Rui Liu

2602.22762 2026-02-26
AI LLM

Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study

Training large language models (LLMs) requires substantial compute and energy. At the same time, renewable energy sources regularly produce more electricity than the grid can absorb, leading to cur...

Philipp Wiesner, Soeren Becker, Brett Cornick, Dominik Scheinert, Alexander Acker, Odej Kao

2602.22760 2026-02-26
AI LLM

Decomposing Physician Disagreement in HealthBench

We decompose physician disagreement in the HealthBench medical AI evaluation dataset to understand where variance resides and what observable features can explain it. Rubric identity accounts for 1...

Satya Borgohain, Roy Mariathas

2602.22758 2026-02-26
AI LLM

AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors

We introduce AuditBench, an alignment auditing benchmark. AuditBench consists of 56 language models with implanted hidden behaviors. Each model has one of 14 concerning behaviors--such as sycophant...

Abhay Sheshadri, Aidan Ewart, Kai Fronsdal, Isha Gupta, Samuel R. Bowman, Sara Price, Samuel Mark...

2602.22755 2026-02-26
AI LLM

Measurements of branching fractions of $Λ_{c}^{+}\toΣ^{0}K_{S}^{0}π^{+}$ and $Λ_{c}^{+}\toΣ^{0}K_{S}^{0}K^{+}$

Based on a data sample corresponding to an integrated luminosity of 6.4 fb$^{-1}$ of $e^+e^-$ annihilation and collected with the BESIII detector at 13 center-of-mass energy points ranging between ...

BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, C. S. Akondi, R. Alibert...

2602.22754 2026-02-26
AI LLM

Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction

The transition of Large Language Models (LLMs) from exploratory tools to active "silicon subjects" in social science lacks extensive validation of operational validity. This study introduces Condit...

Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger

2602.22752 2026-02-26
AI LLM

The Inference Bottleneck: Antitrust and Neutrality Duties in the Age of Cognitive Infrastructure

As generative AI commercializes, competitive advantage is shifting from one-time model training toward continuous inference, distribution, and routing. At the frontier, large-scale inference can fu...

Gaston Besanson, Marcelo Celani

2602.22750 2026-02-26
AI LLM

SPATIALALIGN: Aligning Dynamic Spatial Relationships in Video Generation

Most text-to-video (T2V) generators prioritize aesthetic quality, but often ignore the spatial constraints in the generated videos. In this work, we present SPATIALALIGN, a self-improvement frame...

Fengming Liu, Tat-Jen Cham, Chuanxia Zheng

2602.22745 2026-02-26