Research

Papers

Research papers from arXiv and related sources

Total: 4694 | AI/LLM: 2583 | Testing: 2111
AI LLM

Expanding LLM Agent Boundaries with Strategy-Guided Exploration

Reinforcement learning (RL) has demonstrated notable success in post-training large language models (LLMs) as agents for tasks such as computer use, tool calling, and coding. However, exploration r...

Andrew Szot, Michael Kirchhof, Omar Attia, Alexander Toshev

2603.02045 2026-03-02
AI LLM

EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training

Large language models (LLMs) are predominantly trained on English-centric data, resulting in uneven performance for smaller languages. We study whether continued pretraining (CPT) can substantially...

Aleksei Dorkin, Taido Purason, Emil Kalbaliyev, Hele-Andra Kuulmets, Marii Ojastu, Mark Fišel, Ta...

2603.02041 2026-03-02
TESTING

On-surface synthesis and aromaticity of large cyclocarbons

Molecular rings of N carbon atoms, that is, cyclo[N]carbons, or $C_N$, can be formed by tip-induced chemistry [1-7]. Because of their monocyclic geometry, cyclocarbons are fundamentally important f...

Lisanne Sellies, Marco Vitek, Yueze Gao, Fabian Paschke, Florian Albrecht, Jakob Eckrich, Beren D...

2603.02040 2026-03-02
AI LLM

MetaRCA: A Generalizable Root Cause Analysis Framework for Cloud-Native Systems Powered by Meta Causal Knowledge

The dynamics and complexity of cloud-native systems present significant challenges for Root Cause Analysis (RCA). While causality-based RCA methods have shown significant progress in recent years, ...

Shuai Liang, Pengfei Chen, Bozhe Tian, Gou Tan, Maohong Xu, Youjun Qu, Yahui Zhao, Yiduo Shang, C...

2603.02032 2026-03-02
AI LLM

Rich Insights from Cheap Signals: Efficient Evaluations via Tensor Factorization

Moving beyond evaluations that collapse performance across heterogeneous prompts toward fine-grained evaluation at the prompt level, or within relatively homogeneous subsets, is necessary to diagno...

Felipe Maia Polo, Aida Nematzadeh, Virginia Aglietti, Adam Fisch, Isabela Albuquerque

2603.02029 2026-03-02
TESTING

Latent attention on masked patches for flow reconstruction

Vision transformers have demonstrated outstanding performance on image generation applications, but their adoption in scientific disciplines, like fluid dynamics, has been limited. We introduce the...

Ben Eze, Luca Magri, Andrea Nóvoa

2603.02028 2026-03-02
AI LLM

Learning to Read Where to Look: Disease-Aware Vision-Language Pretraining for 3D CT

Recent 3D CT vision-language models align volumes with reports via contrastive pretraining, but typically rely on limited public data and provide only coarse global supervision. We train a 3D CT vi...

Simon Ging, Philipp Arnold, Sebastian Walter, Hani Alnahas, Hannah Bast, Elmar Kotter, Jiancheng ...

2603.02026 2026-03-02
TESTING

PonderLM-3: Adaptive Token-Wise Pondering with Differentiable Masking

Test-time scaling has shown that allocating additional computation at inference can improve generation quality, motivating a natural follow-up question: where should this computation be spent?...

He Li, Feichen Song, Boyi Zeng, Shixiang Song, Zhiqin John Xu, Ziwei He, Zhouhan Lin

2603.02023 2026-03-02
TESTING

CausalWrap: Model-Agnostic Causal Constraint Wrappers for Tabular Synthetic Data

Tabular synthetic data generators are typically trained to match observational distributions, which can yield high conventional utility (e.g., column correlations, predictive accuracy) yet poor pre...

Amir Asiaee, Zhuohui J. Liang, Chao Yan

2603.02015 2026-03-02
AI LLM

Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards

Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful repres...

Faisal Mohamed, Catherine Ji, Benjamin Eysenbach, Glen Berseth

2603.02008 2026-03-02
TESTING

Kruskal-EDS: Edge Dynamic Stratification

We introduce Kruskal-EDS (Edge Dynamic Stratification), a distribution-adaptive variant of Kruskal's minimum spanning tree (MST) algorithm that replaces the mandatory $\Theta(m \log m)$ global sort with ...

Yves Mercadier

2603.02006 2026-03-02
AI LLM

Bespoke OLAP: Synthesizing Workload-Specific One-size-fits-one Database Engines

Modern OLAP engines are designed to support arbitrary analytical workloads, but this generality incurs structural overhead, including runtime schema interpretation, indirection layers, and abstract...

Johannes Wehrstein, Timo Eckmann, Matthias Jasny, Carsten Binnig

2603.02001 2026-03-02
AI LLM

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Recent advances in generative AI have significantly enhanced the realism of multimodal media manipulation, thereby posing substantial challenges to manipulation detection. Existing manipulation det...

Yuchen Zhang, Yaxiong Wang, Kecheng Han, Yujiao Wu, Lianwei Wu, Li Zhu, Zhedong Zheng

2603.01993 2026-03-02
AI LLM

According to Me: Long-Term Personalized Referential Memory QA

Personalized AI assistants must recall and reason over long-term user memory, which naturally spans multiple modalities and sources such as images, videos, and emails. However, existing Long-term M...

Jingbiao Mei, Jinghong Chen, Guangyu Yang, Xinyu Hou, Margaret Li, Bill Byrne

2603.01990 2026-03-02
TESTING

Wasserstein-based identification of metastable states in time series data via change point detection and segment clustering

Change point detection for time series analysis is a difficult and important problem in applied statistics, for which a variety of approaches have been developed in the past several decades. Here, ...

David Gentile, Joshua Huang, James M. Murphy

2603.01989 2026-03-02
AI LLM

ViTex: Visual Texture Control for Multi-Track Symbolic Music Generation via Discrete Diffusion Models

In automatic music generation, a central challenge is to design controls that enable meaningful human-machine interaction. Existing systems often rely on extrinsic inputs such as text prompts or me...

Xiaoyu Yi, Qi He, Gus Xia, Ziyu Wang

2603.01984 2026-03-02
TESTING

Robust White Blood Cell Classification with Stain-Normalized Decoupled Learning and Ensembling

White blood cell (WBC) classification is fundamental for hematology applications such as infection assessment, leukemia screening, and treatment monitoring. However, real-world WBC datasets present...

Luu Le, Hoang-Loc Cao, Ha-Hieu Pham, Thanh-Huy Nguyen, Ulas Bagci

2603.01976 2026-03-02
AI LLM

CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production

This report presents CharacterFlywheel, an iterative flywheel process for improving large language models (LLMs) in production social chat applications across Instagram, WhatsApp, and Messenger. St...

Yixin Nie, Lin Guan, Zhongyao Ma, Anchit Gupta, Yipin Zhou, Xiao Li, Zhengping Zhou, Raymond Zeng...

2603.01973 2026-03-02
AI LLM

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Long-horizon interactions between users and LLM-based assistants necessitate effective memory management, yet current approaches face challenges in training and evaluation of memory. Existing memor...

Cheng Jiayang, Dongyu Ru, Lin Qiu, Yiyang Li, Xuezhi Cao, Yangqiu Song, Xunliang Cai

2603.01966 2026-03-02
TESTING

CoVAE: correlated multimodal generative modeling

Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space tha...

Federico Caretti, Guido Sanguinetti

2603.01965 2026-03-02