Papers
Research papers from arXiv and related sources
Expanding LLM Agent Boundaries with Strategy-Guided Exploration
Reinforcement learning (RL) has demonstrated notable success in post-training large language models (LLMs) as agents for tasks such as computer use, tool calling, and coding. However, exploration r...
Andrew Szot, Michael Kirchhof, Omar Attia, Alexander Toshev
EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training
Large language models (LLMs) are predominantly trained on English-centric data, resulting in uneven performance for smaller languages. We study whether continued pretraining (CPT) can substantially...
Aleksei Dorkin, Taido Purason, Emil Kalbaliyev, Hele-Andra Kuulmets, Marii Ojastu, Mark Fišel, Ta...
On-surface synthesis and aromaticity of large cyclocarbons
Molecular rings of N carbon atoms, that is, cyclo[N]carbons, or $C_N$, can be formed by tip-induced chemistry [1-7]. Because of their monocyclic geometry, cyclocarbons are fundamentally important f...
Lisanne Sellies, Marco Vitek, Yueze Gao, Fabian Paschke, Florian Albrecht, Jakob Eckrich, Beren D...
MetaRCA: A Generalizable Root Cause Analysis Framework for Cloud-Native Systems Powered by Meta Causal Knowledge
The dynamics and complexity of cloud-native systems present significant challenges for Root Cause Analysis (RCA). While causality-based RCA methods have shown significant progress in recent years, ...
Shuai Liang, Pengfei Chen, Bozhe Tian, Gou Tan, Maohong Xu, Youjun Qu, Yahui Zhao, Yiduo Shang, C...
Rich Insights from Cheap Signals: Efficient Evaluations via Tensor Factorization
Moving beyond evaluations that collapse performance across heterogeneous prompts toward fine-grained evaluation at the prompt level, or within relatively homogeneous subsets, is necessary to diagno...
Felipe Maia Polo, Aida Nematzadeh, Virginia Aglietti, Adam Fisch, Isabela Albuquerque
Latent attention on masked patches for flow reconstruction
Vision transformers have demonstrated outstanding performance on image generation applications, but their adoption in scientific disciplines, like fluid dynamics, has been limited. We introduce the...
Ben Eze, Luca Magri, Andrea Nóvoa
Learning to Read Where to Look: Disease-Aware Vision-Language Pretraining for 3D CT
Recent 3D CT vision-language models align volumes with reports via contrastive pretraining, but typically rely on limited public data and provide only coarse global supervision. We train a 3D CT vi...
Simon Ging, Philipp Arnold, Sebastian Walter, Hani Alnahas, Hannah Bast, Elmar Kotter, Jiancheng ...
PonderLM-3: Adaptive Token-Wise Pondering with Differentiable Masking
Test-time scaling has shown that allocating additional computation at inference can improve generation quality, motivating a natural follow-up question: where should this computation be spent?...
He Li, Feichen Song, Boyi Zeng, Shixiang Song, Zhiqin John Xu, Ziwei He, Zhouhan Lin
CausalWrap: Model-Agnostic Causal Constraint Wrappers for Tabular Synthetic Data
Tabular synthetic data generators are typically trained to match observational distributions, which can yield high conventional utility (e.g., column correlations, predictive accuracy) yet poor pre...
Amir Asiaee, Zhuohui J. Liang, Chao Yan
Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards
Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful repres...
Faisal Mohamed, Catherine Ji, Benjamin Eysenbach, Glen Berseth
Kruskal-EDS: Edge Dynamic Stratification
We introduce Kruskal-EDS (Edge Dynamic Stratification), a distribution-adaptive variant of Kruskal's minimum spanning tree (MST) algorithm that replaces the mandatory $\Theta(m \log m)$ global sort with ...
Yves Mercadier
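For context, the global sort the abstract refers to is the first step of the classic Kruskal algorithm. A minimal sketch of that baseline (the textbook algorithm, not the paper's EDS variant; all names here are illustrative):

```python
def kruskal_mst(n, edges):
    """Classic Kruskal MST over nodes 0..n-1.

    edges: list of (weight, u, v) tuples. The global sort below is the
    Theta(m log m) step that Kruskal-EDS proposes to replace.
    """
    parent = list(range(n))  # union-find forest

    def find(x):
        # find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):  # mandatory global sort in the classic variant
        ru, rv = find(u), find(v)
        if ru != rv:               # edge joins two components: keep it
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total
```

On a triangle graph with edge weights 1, 2, and 3, the algorithm keeps the two cheapest edges, giving an MST of total weight 3.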
Bespoke OLAP: Synthesizing Workload-Specific One-size-fits-one Database Engines
Modern OLAP engines are designed to support arbitrary analytical workloads, but this generality incurs structural overhead, including runtime schema interpretation, indirection layers, and abstract...
Johannes Wehrstein, Timo Eckmann, Matthias Jasny, Carsten Binnig
Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection
Recent advances in generative AI have significantly enhanced the realism of multimodal media manipulation, thereby posing substantial challenges to manipulation detection. Existing manipulation det...
Yuchen Zhang, Yaxiong Wang, Kecheng Han, Yujiao Wu, Lianwei Wu, Li Zhu, Zhedong Zheng
According to Me: Long-Term Personalized Referential Memory QA
Personalized AI assistants must recall and reason over long-term user memory, which naturally spans multiple modalities and sources such as images, videos, and emails. However, existing Long-term M...
Jingbiao Mei, Jinghong Chen, Guangyu Yang, Xinyu Hou, Margaret Li, Bill Byrne
Wasserstein-based identification of metastable states in time series data via change point detection and segment clustering
Change point detection for time series analysis is a difficult and important problem in applied statistics, for which a variety of approaches have been developed in the past several decades. Here, ...
David Gentile, Joshua Huang, James M. Murphy
ViTex: Visual Texture Control for Multi-Track Symbolic Music Generation via Discrete Diffusion Models
In automatic music generation, a central challenge is to design controls that enable meaningful human-machine interaction. Existing systems often rely on extrinsic inputs such as text prompts or me...
Xiaoyu Yi, Qi He, Gus Xia, Ziyu Wang
Robust White Blood Cell Classification with Stain-Normalized Decoupled Learning and Ensembling
White blood cell (WBC) classification is fundamental for hematology applications such as infection assessment, leukemia screening, and treatment monitoring. However, real-world WBC datasets present...
Luu Le, Hoang-Loc Cao, Ha-Hieu Pham, Thanh-Huy Nguyen, Ulas Bagci
CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production
This report presents CharacterFlywheel, an iterative flywheel process for improving large language models (LLMs) in production social chat applications across Instagram, WhatsApp, and Messenger. St...
Yixin Nie, Lin Guan, Zhongyao Ma, Anchit Gupta, Yipin Zhou, Xiao Li, Zhengping Zhou, Raymond Zeng...
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
Long-horizon interactions between users and LLM-based assistants necessitate effective memory management, yet current approaches face challenges in training and evaluation of memory. Existing memor...
Cheng Jiayang, Dongyu Ru, Lin Qiu, Yiyang Li, Xuezhi Cao, Yangqiu Song, Xunliang Cai
CoVAE: correlated multimodal generative modeling
Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space tha...
Federico Caretti, Guido Sanguinetti