Research

Papers

Research papers from arXiv and related sources

Total: 4513 AI/LLM: 2483 Testing: 2030
AI LLM

"When to Hand Off, When to Work Together": Expanding Human-Agent Co-Creative Collaboration through Concurrent Interaction

Human collaborators coordinate dynamically through process visibility and workspace awareness, yet AI agents typically either provide only final outputs or expose read-only execution processes (e.g...

Kihoon Son, Hyewon Lee, DaEun Choi, Yoonsu Kim, Tae Soo Kim, Yoonjoo Lee, John Joon Young Chung, ...

2603.02050 2026-03-02
AI LLM

Expanding LLM Agent Boundaries with Strategy-Guided Exploration

Reinforcement learning (RL) has demonstrated notable success in post-training large language models (LLMs) as agents for tasks such as computer use, tool calling, and coding. However, exploration r...

Andrew Szot, Michael Kirchhof, Omar Attia, Alexander Toshev

2603.02045 2026-03-02
AI LLM

EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training

Large language models (LLMs) are predominantly trained on English-centric data, resulting in uneven performance for smaller languages. We study whether continued pretraining (CPT) can substantially...

Aleksei Dorkin, Taido Purason, Emil Kalbaliyev, Hele-Andra Kuulmets, Marii Ojastu, Mark Fišel, Ta...

2603.02041 2026-03-02
AI LLM

MetaRCA: A Generalizable Root Cause Analysis Framework for Cloud-Native Systems Powered by Meta Causal Knowledge

The dynamics and complexity of cloud-native systems present significant challenges for Root Cause Analysis (RCA). While causality-based RCA methods have shown significant progress in recent years, ...

Shuai Liang, Pengfei Chen, Bozhe Tian, Gou Tan, Maohong Xu, Youjun Qu, Yahui Zhao, Yiduo Shang, C...

2603.02032 2026-03-02
AI LLM

Rich Insights from Cheap Signals: Efficient Evaluations via Tensor Factorization

Moving beyond evaluations that collapse performance across heterogeneous prompts toward fine-grained evaluation at the prompt level, or within relatively homogeneous subsets, is necessary to diagno...

Felipe Maia Polo, Aida Nematzadeh, Virginia Aglietti, Adam Fisch, Isabela Albuquerque

2603.02029 2026-03-02
AI LLM

Learning to Read Where to Look: Disease-Aware Vision-Language Pretraining for 3D CT

Recent 3D CT vision-language models align volumes with reports via contrastive pretraining, but typically rely on limited public data and provide only coarse global supervision. We train a 3D CT vi...

Simon Ging, Philipp Arnold, Sebastian Walter, Hani Alnahas, Hannah Bast, Elmar Kotter, Jiancheng ...

2603.02026 2026-03-02
AI LLM

Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards

Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful repres...

Faisal Mohamed, Catherine Ji, Benjamin Eysenbach, Glen Berseth

2603.02008 2026-03-02
AI LLM

Bespoke OLAP: Synthesizing Workload-Specific One-size-fits-one Database Engines

Modern OLAP engines are designed to support arbitrary analytical workloads, but this generality incurs structural overhead, including runtime schema interpretation, indirection layers, and abstract...

Johannes Wehrstein, Timo Eckmann, Matthias Jasny, Carsten Binnig

2603.02001 2026-03-02
AI LLM

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Recent advances in generative AI have significantly enhanced the realism of multimodal media manipulation, thereby posing substantial challenges to manipulation detection. Existing manipulation det...

Yuchen Zhang, Yaxiong Wang, Kecheng Han, Yujiao Wu, Lianwei Wu, Li Zhu, Zhedong Zheng

2603.01993 2026-03-02
AI LLM

According to Me: Long-Term Personalized Referential Memory QA

Personalized AI assistants must recall and reason over long-term user memory, which naturally spans multiple modalities and sources such as images, videos, and emails. However, existing Long-term M...

Jingbiao Mei, Jinghong Chen, Guangyu Yang, Xinyu Hou, Margaret Li, Bill Byrne

2603.01990 2026-03-02
AI LLM

ViTex: Visual Texture Control for Multi-Track Symbolic Music Generation via Discrete Diffusion Models

In automatic music generation, a central challenge is to design controls that enable meaningful human-machine interaction. Existing systems often rely on extrinsic inputs such as text prompts or me...

Xiaoyu Yi, Qi He, Gus Xia, Ziyu Wang

2603.01984 2026-03-02
AI LLM

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

This report presents CharacterFlywheel, an iterative flywheel process for improving large language models (LLMs) in production social chat applications across Instagram, WhatsApp, and Messenger. St...

Yixin Nie, Lin Guan, Zhongyao Ma, Anchit Gupta, Yipin Zhou, Xiao Li, Zhengping Zhou, Raymond Zeng...

2603.01973 2026-03-02
AI LLM

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Long-horizon interactions between users and LLM-based assistants necessitate effective memory management, yet current approaches face challenges in training and evaluation of memory. Existing memor...

Cheng Jiayang, Dongyu Ru, Lin Qiu, Yiyang Li, Xuezhi Cao, Yangqiu Song, Xunliang Cai

2603.01966 2026-03-02
AI LLM

LiveCultureBench: a Multi-Agent, Multi-Cultural Benchmark for Large Language Models in Dynamic Social Simulations

Large language models (LLMs) are increasingly deployed as autonomous agents, yet evaluations focus primarily on task success rather than cultural appropriateness or evaluator reliability. We introd...

Viet-Thanh Pham, Lizhen Qu, Thuy-Trang Vu, Gholamreza Haffari, Dinh Phung

2603.01952 2026-03-02
AI LLM

Ignore All Previous Instructions: Jailbreaking as a de-escalatory peace building practise to resist LLM social media bots

Large Language Models have intensified the scale and strategic manipulation of political discourse on social media, leading to conflict escalation. The existing literature largely focuses on platfo...

Huw Day, Adrianna Jezierska, Jessica Woodgate

2603.01942 2026-03-02
AI LLM

Dream2Learn: Structured Generative Dreaming for Continual Learning

Continual learning requires balancing plasticity and stability while mitigating catastrophic forgetting. Inspired by human dreaming as a mechanism for internal simulation and knowledge restructurin...

Salvatore Calcagno, Matteo Pennisi, Federica Proietto Salanitri, Amelia Sorrenti, Simone Palazzo,...

2603.01935 2026-03-02
AI LLM

Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferat...

Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang

2603.01919 2026-03-02
AI LLM

Fast Entropy Decoding for Sparse MVM on GPUs

We present a novel, practical approach to speed up sparse matrix-vector multiplication (SpMVM) on GPUs. The novel key idea is to apply lossless entropy coding to further compress the sparse matrix ...

Emil Schätzle, Tommaso Pegolotti, Markus Püschel

2603.01915 2026-03-02
AI LLM

Demonstrating ViviDoc: Generating Interactive Documents through Human-Agent Collaboration

Interactive articles help readers engage with complex ideas through exploration, yet creating them remains costly, requiring both domain expertise and web development skills. Recent LLM-based agent...

Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Tingfeng Lan, Wei Chen

2603.01912 2026-03-02
AI LLM

FLANS at SemEval-2026 Task 7: RAG with Open-Sourced Smaller LLMs for Everyday Knowledge Across Diverse Languages and Cultures

This system paper describes our participation in the SemEval-2025 Task-7 ``Everyday Knowledge Across Diverse Languages and Cultures''. We attended two subtasks, i.e., Track 1: Short Answer Question...

Liliia Bogdanova, Shiran Sun, Lifeng Han, Natalia Amat Lefort, Flor Miriam Plaza-del-Arco

2603.01910 2026-03-02