Research

Papers

Research papers from arXiv and related sources

Total: 4694 AI/LLM: 2583 Testing: 2111
TESTING

Less is More: Robust Zero-Communication 3D Pursuit-Evasion via Representational Parsimony

Asymmetric 3D pursuit-evasion in cluttered voxel environments is difficult under communication latency, partial observability, and nonholonomic maneuver limits. While many MARL methods rely on rich...

Jialin Ying, Zhihao Li, Zicheng Dong, Guohua Wu, Yihuan Liao

2603.08273 2026-03-09
TESTING

SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM

In-context imitation learning allows robots to acquire skills from demonstrations, yet one-shot trajectory generation remains fragile under environmental variation. We propose SAIL, a framework tha...

Makoto Sato, Yusuke Iwasawa, Yujin Tang, So Kuroki

2603.08269 2026-03-09
TESTING

Lattice Determination of the Baryon Junction Mass in $(2+1)$ Dimensions

This contribution investigates baryonic flux tube configurations in $SU(3)$ Yang--Mills theory in $(2+1)$ dimensions. Leveraging recent next-to-leading-order results within the Effective String The...

Dario Panfalone, Michele Caselle, Nicodemo Magnoli, Lorenzo Verzichelli

2603.08268 2026-03-09
AI LLM

Towards a more efficient bias detection in financial language models

Bias in financial language models constitutes a major obstacle to their adoption in real-world applications. Detecting such bias is challenging, as it requires identifying inputs whose predictions ...

Firas Hadj Kacem, Ahmed Khanfir, Mike Papadakis

2603.08267 2026-03-09
AI LLM

FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use

The integration of Large Language Models (LLMs) into the financial domain is driving a paradigm shift from passive information retrieval to dynamic, agentic interaction. While general-purpose tool ...

Jiaxuan Lu, Kong Wang, Yemin Wang, Qingmei Tang, Hongwei Zeng, Xiang Chen, Jiahao Pi, Shujian Den...

2603.08262 2026-03-09
AI LLM

Seed2Scale: A Self-Evolving Data Engine for Embodied AI via Small to Large Model Synergy and Multimodal Evaluation

Existing data generation methods suffer from exploration limits, embodiment gaps, and low signal-to-noise ratios, leading to performance degradation during self-iteration. To address these challeng...

Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Zhengbin Long, Haodong Xiang, Rong Shi, Zhuo Cui...

2603.08260 2026-03-09
AI LLM

NCL-UoR at SemEval-2026 Task 5: Embedding-Based Methods, Fine-Tuning, and LLMs for Word Sense Plausibility Rating

Word sense plausibility rating requires predicting the human-perceived plausibility of a given word sense on a 1--5 scale in the context of short narrative stories containing ambiguous homonyms. Th...

Tong Wu, Thanet Markchom, Huizhi Liang

2603.08256 2026-03-09
AI LLM

Not All Queries Need Deep Thought: CoFiCot for Adaptive Coarse-to-fine Stateful Refinement

Scaling test-time computation enhances LLM reasoning ability but faces a uniform computation paradox. Allocating identical resources leads to over-correction on simple tasks and insufficient refine...

Dongxu Zhang, Hongqiang Lin, Yiding Sun, Pengyu Wang, Qirui Wang, Ning Yang, Jihua Zhu

2603.08251 2026-03-09
AI LLM

Sensivity of LLMs' Explanations to the Training Randomness: Context, Class & Task Dependencies

Transformer models are now a cornerstone in natural language processing. Yet, explaining their decisions remains a challenge. It was shown recently that the same model trained on the same data with...

Romain Loncour, Jérémie Bogaert, François-Xavier Standaert

2603.08241 2026-03-09
AI LLM

Fibration Policy Optimization

Large language models are increasingly trained as heterogeneous systems spanning multiple domains, expert partitions, and agentic pipelines, yet prevalent proximal objectives operate at a single sc...

Chang Li, Tshihao Tsu, Yaren Zhang, Chao Xue, Xiaodong He

2603.08239 2026-03-09
AI LLM

The Struggle Between Continuation and Refusal: A Mechanistic Analysis of the Continuation-Triggered Jailbreak in LLMs

With the rapid advancement of large language models (LLMs), the safety of LLMs has become a critical concern. Despite significant efforts in safety alignment, current LLMs remain vulnerable to jail...

Yonghong Deng, Zhen Yang, Ping Jian, Xinyue Zhang, Zhongbin Guo, Chengzhi Li

2603.08234 2026-03-09
AI LLM

Computationally Efficient Data-Driven Topology Design Independent from High-Infoentropy Initial Dataset

Topology optimization (TO) has been widely adopted in engineering design; however, it is prone to being trapped in local optima, particularly in strongly nonlinear problems. Sensitivity-free data-d...

Jun Yang, Ziliang Wang, Shintaro Yamasaki

2603.08233 2026-03-09
TESTING

Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks

Paralinguistic speech tasks are often considered relatively language-agnostic, as they rely on extralinguistic acoustic cues rather than lexical content. However, prior studies report performance d...

Pol Buitrago, Oriol Pareras, Federico Costa, Javier Hernando

2603.08231 2026-03-09
TESTING

From Design to Validation: Preparing a LEO-Capable UE for End-to-End System Evaluation

The extension of 5G connectivity through Low-Earth Orbit satellite systems introduces significant technical challenges, particularly due to time-varying propagation delays and high Doppler shifts r...

Amedeo Giuliani, Pol Henarejos, Erislandy Mozo, Màrius Caus, Miguel Ángel Solis Gallego, Jaime Su...

2603.08229 2026-03-09
TESTING

Phase Transitions, Geodesic Structure, and Thermodynamic Properties Measurement of Einstein-Maxwell-Power Yang-Mills Black Hole Models

In this work, we test the geometrical structure and thermodynamic properties of the Einstein-Maxwell-Power-Yang-Mills black hole (BH) models, which constitute a nonlinear generalization of the stan...

Abdelmalek Bouzenada, Allan R. P. Moreira, Shi-Hai Dong, Guo-Hua Sun, Muhammad Sharif

2603.08222 2026-03-09
AI LLM

SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration

Enterprise adoption of cloud-based AI agents faces a fundamental privacy dilemma: leveraging powerful cloud models requires sharing sensitive data, while local processing limits capability. Current...

Jianshu She

2603.08221 2026-03-09
AI LLM

DualTurn: Learning Turn-Taking from Dual-Channel Generative Speech Pretraining

Speech-to-speech models handle turn-taking naturally but offer limited support for tool-calling or complex reasoning, while production ASR-LLM-TTS voice pipelines offer these capabilities but rely ...

Shangeth Rajaa

2603.08216 2026-03-09
TESTING

Fusion-Poly: A Polyhedral Framework Based on Spatial-Temporal Fusion for 3D Multi-Object Tracking

LiDAR-camera 3D multi-object tracking (MOT) combines rich visual semantics with accurate depth cues to improve trajectory consistency and tracking reliability. In practice, however, LiDAR and camer...

Xian Wu, Yitao Wu, Xiaoyu Li, Zijia Li, Lijun Zhao, Lining Sun

2603.08199 2026-03-09
TESTING

Algorithm with variable coefficients for computing matrix inverses

We present a general scheme for the construction of new efficient generalized Schultz iterative methods for computing the inverse matrix. These methods have the form $$ X_{k+1} = X_k(a_0^{(k)}I+a_1^...

Mihailo Krstić, Marko D. Petković, Kostadin Rajković, Marko Kostadinov

2603.08196 2026-03-09
AI LLM

Human-AI Collaboration for Scaling Agile Regression Testing: An Agentic-AI Teammate from Manual to Automated Testing

Agile organizations increasingly rely on automated regression testing to sustain rapid, high-quality software delivery. However, as systems grow and requirements evolve, a persistent bottleneck ari...

Moustapha El Outmani, Manthan Venkataramana Shenoy, Ahmad Hatahet, Andreas Rausch, Tim Niklas Kni...

2603.08190 2026-03-09