Papers
Research papers from arXiv and related sources
TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models
This paper presents a comprehensive evaluation framework for assessing the cultural competence of large language models (LLMs) in Persian. Existing Persian cultural benchmarks rely predominantly on...
Reihaneh Iranmanesh, Saeedeh Davoudi, Pasha Abrishamchian, Ophir Frieder, Nazli Goharian
Face Time Traveller: Travel Through Ages Without Losing Identity
Face aging, an ill-posed problem shaped by environmental and genetic factors, is vital in entertainment, forensics, and digital archiving, where realistic age transformations must preserve both ide...
Purbayan Kar, Ayush Ghadiya, Vishal Chudasama, Pankaj Wasnik, C. V. Jawahar
When Should an AI Act? A Human-Centered Model of Scene, Context, and Behavior for Agentic AI Design
Agentic AI increasingly intervenes proactively by inferring users' situations from contextual data, yet it often fails because it lacks principled judgment about when, why, and whether to act. We address th...
Soyoung Jung, Daehoo Yoon, Sung Gyu Koh, Young Hwan Kim, Yehan Ahn, Sung Park
Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching
Since local LLM inference on resource-constrained edge devices imposes a severe performance bottleneck, this paper proposes distributed prompt caching to enhance inference performance by cooperativ...
Hiroki Matsutani, Naoki Matsuda, Naoto Sugiura
PhotoAgent: Agentic Photo Editing with Exploratory Visual Aesthetic Planning
With the rapid recent development of generative models, instruction-based image editing has shown great potential for generating high-quality images. However, the quality of editing depends heavily on...
Mingde Yao, Zhiyuan You, Tam-King Man, Menglu Wang, Tianfan Xue
MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks
Despite the remarkable progress of large language models (LLMs), the capabilities of standalone LLMs have begun to plateau when tackling real-world, complex tasks that require interaction with exte...
Shiqian Su, Sen Xing, Xuan Dong, Muyan Zhong, Bin Wang, Xizhou Zhu, Yuntao Chen, Wenhai Wang, Yue...
Natural Language Declarative Prompting (NLD-P): A Modular Governance Method for Prompt Design Under Model Drift
The rapid evolution of large language models (LLMs) has transformed prompt engineering from a localized craft into a systems-level governance challenge. As models scale and update across generation...
Hyunwoo Kim, Hanau Yi, Jaehee Bae, Yumin Kim
Probing for Knowledge Attribution in Large Language Models
Large language models (LLMs) often generate fluent but unfounded claims, or hallucinations, which fall into two types: (i) faithfulness violations - misusing user context - and (ii) factuality viol...
Ivo Brink, Alexander Boer, Dennis Ulmer
ClinDet-Bench: Beyond Abstention, Evaluating Judgment Determinability of LLMs in Clinical Decision-Making
Clinical decisions are often required under incomplete information. Clinical experts must identify whether available information is sufficient for judgment, as both premature conclusion and unneces...
Yusuke Watanabe, Yohei Kobashi, Takeshi Kojima, Yusuke Iwasawa, Yasushi Okuno, Yutaka Matsuo
AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications
Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is critical for achieving strong performance. However, a sign...
Yujie Zhao, Boqin Yuan, Junbo Huang, Haocheng Yuan, Zhongming Yu, Haozhou Xu, Lanxiang Hu, Abhila...
Towards Better RL Training Data Utilization via Second-Order Rollout
Reinforcement Learning (RL) has empowered Large Language Models (LLMs) with strong reasoning capabilities, but vanilla RL mainly focuses on generation capability improvement by training with only f...
Zhe Yang, Yudong Wang, Rang Li, Zhifang Sui
Evaluating and Improving Automated Repository-Level Rust Issue Resolution with LLM-based Agents
The Rust programming language presents a steep learning curve and significant coding challenges, making the automation of issue resolution essential for its broader adoption. Recently, LLM-powered ...
Jiahong Xiang, Wenxiao He, Xihua Wang, Hongliang Tian, Yuqun Zhang
An AI-Based Structured Semantic Control Model for Stable and Coherent Dynamic Interactive Content Generation
This study addresses the challenge that generative models struggle to balance flexibility, stability, and controllability in complex interactive scenarios. It proposes a controllable generation fra...
Rui Liu
Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study
Training large language models (LLMs) requires substantial compute and energy. At the same time, renewable energy sources regularly produce more electricity than the grid can absorb, leading to cur...
Philipp Wiesner, Soeren Becker, Brett Cornick, Dominik Scheinert, Alexander Acker, Odej Kao
Decomposing Physician Disagreement in HealthBench
We decompose physician disagreement in the HealthBench medical AI evaluation dataset to understand where variance resides and what observable features can explain it. Rubric identity accounts for 1...
Satya Borgohain, Roy Mariathas
AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors
We introduce AuditBench, an alignment auditing benchmark. AuditBench consists of 56 language models with implanted hidden behaviors. Each model has one of 14 concerning behaviors--such as sycophant...
Abhay Sheshadri, Aidan Ewart, Kai Fronsdal, Isha Gupta, Samuel R. Bowman, Sara Price, Samuel Mark...
Measurements of branching fractions of $\Lambda_{c}^{+}\to\Sigma^{0}K_{S}^{0}\pi^{+}$ and $\Lambda_{c}^{+}\to\Sigma^{0}K_{S}^{0}K^{+}$
Based on a data sample corresponding to an integrated luminosity of 6.4~fb$^{-1}$ of $e^+e^-$ annihilation and collected with the BESIII detector at 13 center-of-mass energy points ranging between ...
BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, C. S. Akondi, R. Alibert...
Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction
The transition of Large Language Models (LLMs) from exploratory tools to active "silicon subjects" in social science lacks extensive validation of operational validity. This study introduces Condit...
Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger
The Inference Bottleneck: Antitrust and Neutrality Duties in the Age of Cognitive Infrastructure
As generative AI commercializes, competitive advantage is shifting from one-time model training toward continuous inference, distribution, and routing. At the frontier, large-scale inference can fu...
Gaston Besanson, Marcelo Celani
SPATIALALIGN: Aligning Dynamic Spatial Relationships in Video Generation
Most text-to-video (T2V) generators prioritize aesthetic quality but often ignore the spatial constraints in the generated videos. In this work, we present SPATIALALIGN, a self-improvement frame...
Fengming Liu, Tat-Jen Cham, Chuanxia Zheng