Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

Operationalization of Machine Learning with Serverless Architecture: An Industrial Implementation for Harmonized System Code Prediction

This paper presents a serverless MLOps framework orchestrating the complete ML lifecycle, from data ingestion, training, and deployment to monitoring and retraining, using event-driven pipelines and ma...

Sai Vineeth Kandappareddigari, Santhoshkumar Jagadish, Gauri Verma, Ilhuicamina Contreras, Christ...

2602.17102 2026-02-19
AI LLM

AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

Large language model (LLM)-driven multi-agent systems (MAS) coordinate specialized agents through predefined interaction topologies and have shown promise for complex tasks such as competition-level ...

Siyu Wang, Ruotian Lu, Zhihao Yang, Yuchao Wang, Yanzhou Zhang, Lei Xu, Qimin Xu, Guojun Yin, Cai...

2602.17100 2026-02-19
AI LLM

AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing

Despite recent breakthroughs, audio foundation models struggle in processing complex multi-source acoustic scenes. We refer to this challenging domain as audio stories, which can have multiple spea...

William Chen, Prem Seetharaman, Rithesh Kumar, Oriol Nieto, Shinji Watanabe, Justin Salamon, Zeyu...

2602.17097 2026-02-19
AI LLM

Agentic Wireless Communication for 6G: Intent-Aware and Continuously Evolving Physical-Layer Intelligence

As 6G wireless systems evolve, growing functional complexity and diverse service demands are driving a shift from rule-based control to intent-driven autonomous intelligence. User requirements are ...

Zhaoyang Li, Xingzhi Jin, Junyu Pan, Qianqian Yang, Zhiguo Shi

2602.17096 2026-02-19
AI LLM

FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment

Parameter-efficient fine-tuning techniques such as low-rank adaptation (LoRA) enable large language models (LLMs) to adapt to downstream tasks efficiently. Federated learning (FL) further facilitat...

Chuiyang Meng, Ming Tang, Vincent W. S. Wong

2602.17095 2026-02-19
TESTING

A Locality Radius Framework for Understanding Relational Inductive Bias in Database Learning

Foreign key discovery and related schema-level prediction tasks are often modeled using graph neural networks (GNNs), implicitly assuming that relational inductive bias improves performance. Howeve...

Aadi Joshi, Kavya Bhand

2602.17092 2026-02-19
AI LLM

What to Cut? Predicting Unnecessary Methods in Agentic Code Generation

Agentic Coding, powered by autonomous agents such as GitHub Copilot and Cursor, enables developers to generate code, tests, and pull requests from natural language instructions alone. While this ac...

Kan Watanabe, Tatsuya Shirai, Yutaro Kashiwa, Hajimu Iida

2602.17091 2026-02-19
AI LLM

Synergizing Transport-Based Generative Models and Latent Geometry for Stochastic Closure Modeling

Diffusion models recently developed for generative AI tasks can produce high-quality samples while still maintaining diversity among samples to promote mode coverage, providing a promising path for...

Xinghao Dong, Huchen Yang, Jin-long Wu

2602.17089 2026-02-19
AI LLM

How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses

The rapid adoption of large language models has led to the emergence of AI coding agents that autonomously create pull requests on GitHub. However, how these agents differ in their pull request des...

Kan Watanabe, Rikuto Tsuchida, Takahiro Monno, Bin Huang, Kazuma Yamasaki, Youmei Fan, Kazumasa S...

2602.17084 2026-02-19
AI LLM

Rememo: A Research-through-Design Inquiry Towards an AI-in-the-loop Therapist's Tool for Dementia Reminiscence

Reminiscence therapy (RT) is a common non-pharmacological intervention in dementia care. Recent technology-mediated interventions have largely focused on people with dementia through solutions that...

Celeste Seah, Yoke Chuan Lee, Jung-Joo Lee, Ching-Chiuan Yen, Clement Zheng

2602.17083 2026-02-19
TESTING

Environmental policy in the context of complex systems: Statistical optimization and sensitivity analysis for ABMs

Coupled human-environment systems are increasingly being understood as complex adaptive systems (CAS), in which micro-level interactions between components lead to emergent behavior. Agent-based mo...

Dylan Munson, Arijit Dey, Simon Mak

2602.17079 2026-02-19
AI LLM

BankMathBench: A Benchmark for Numerical Reasoning in Banking Scenarios

Large language model (LLM)-based chatbots are increasingly being adopted in the financial domain, particularly in digital banking, to handle customer inquiries about products such as deposits, sa...

Yunseung Lee, Subin Kim, Youngjun Kwak, Jaegul Choo

2602.17072 2026-02-19
TESTING

A Long-term Value Prediction Framework In Video Ranking

Accurately modeling long-term value (LTV) at the ranking stage of short-video recommendation remains challenging. While delayed feedback and extended engagement have been explored, fine-grained att...

Huabin Chen, Xinao Wang, Huiping Chu, Keqin Xu, Chenhao Zhai, Chenyi Wang, Kai Meng, Yuning Jiang

2602.17058 2026-02-19
TESTING

ALPS: A Diagnostic Challenge Set for Arabic Linguistic & Pragmatic Reasoning

While recent Arabic NLP benchmarks focus on scale, they often rely on synthetic or translated data, which may benefit from deeper linguistic verification. We introduce ALPS (Arabic Linguistic & Prag...

Hussein S. Al-Olimat, Ahmad Alshareef

2602.17054 2026-02-19
AI LLM

RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models

Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We in...

Yunseok Han, Yejoon Lee, Jaeyoung Do

2602.17053 2026-02-19
AI LLM

Large Language Models Persuade Without Planning Theory of Mind

A growing body of work attempts to evaluate the theory of mind (ToM) abilities of humans and large language models (LLMs) using static, non-interactive question-and-answer benchmarks. However, theo...

Jared Moore, Rasmus Overmark, Ned Cooper, Beba Cibralic, Nick Haber, Cameron R. Jones

2602.17045 2026-02-19
TESTING

Quantifying the limits of human athletic performance: A Bayesian analysis of elite decathletes

Because the decathlon tests many facets of athleticism, including sprinting, throwing, jumping, and endurance, many consider it to be the ultimate test of athletic ability. On this view, estimating...

Paul-Hieu V. Nguyen, James M. Smoliga, Benton Lindaman, Sameer K. Deshpande

2602.17043 2026-02-19
AI LLM

Phase-Aware Mixture of Experts for Agentic Reinforcement Learning

Reinforcement learning (RL) has equipped LLM agents with a strong ability to solve complex tasks. However, existing RL methods normally use a single policy network, causing simplicity ...

Shengtian Yang, Yu Li, Shuo He, Yewen Li, Qingpeng Cai, Peng Jiang, Lei Feng

2602.17038 2026-02-19
AI LLM

Wink: Recovering from Misbehaviors in Coding Agents

Autonomous coding agents, powered by large language models (LLMs), are increasingly being adopted in the software industry to automate complex engineering tasks. However, these agents are prone to ...

Rahul Nanda, Chandra Maddila, Smriti Jha, Euna Mehnaz Khan, Matteo Paltenghi, Satish Chandra

2602.17037 2026-02-19
TESTING

Product Hardy Spaces on Spaces of Homogeneous Type: Discrete Product Calderón-Type Reproducing Formula, Atomic Characterization, and Product Calderón--Zygmund Operators

Let $i\in\{1,2\}$ and $X_i$ be a space of homogeneous type in the sense of Coifman and Weiss with the upper dimension $ω_i$. Also let $η_i$ be the smoothness index of the Auscher--Hytönen wavelet f...

Ziyi He, Dachun Yang, Taotao Zheng

2602.17031 2026-02-19