Research

Papers

Research papers from arXiv and related sources

Total: 4694 | AI/LLM: 2583 | Testing: 2111
TESTING

DiSCTT: Consensus-Guided Self-Curriculum for Efficient Test-Time Adaptation in Reasoning

Test-time adaptation offers a promising avenue for improving reasoning performance in large language models without additional supervision, but existing approaches often apply a uniform optimizatio...

Mohammad Mahdi Moradi, Sudhir Mudur

2603.05357 2026-03-05
AI LLM

InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context

Retrieval-augmented generation (RAG) for long-context question answering is bottlenecked by inference-time prefilling over large retrieved contexts. A common strategy is to precompute key-value (KV...

Xin Teng, Canyu Zhang, Shaoyi Zheng, Danyang Zhuo, Tianyi Zhou, Shengjie Wang

2603.05353 2026-03-05
TESTING

Ailed: A Psyche-Driven Chess Engine with Dynamic Emotional Modulation

Chess engines passed human strength years ago, but they still don't play like humans. A grandmaster under clock pressure blunders in ways a club player on a hot streak never would. Conventional eng...

Diego Armando Resendez Prado

2603.05352 2026-03-05
TESTING

A Shift-Invariant Deep Learning Framework for Automated Analysis of XPS Spectra

X-ray Photoelectron Spectroscopy (XPS) is a crucial technique for material surface analysis, yet interpreting its spectra is often challenging for both human analysts and automated methods due to t...

Issa Saddiq, Yuxin Fan, Robert G. Palgrave, Mark A. Isaacs, David Morgan, Keith T. Butler

2603.05350 2026-03-05
AI LLM

Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned

The landscape of AI coding assistance is undergoing a fundamental shift from complex IDE plugins to versatile, terminal-native agents. Operating directly where developers manage source control, exe...

Nghi D. Q. Bui

2603.05344 2026-03-05
TESTING

Evaluation of Feynman integrals via numerical integration of differential equations

We revisit the idea of numerically integrating the differential form of Feynman integrals. With a novel approach for the treatment of branch cuts, we develop an integrator capable of evaluating a b...

Pau Petit Rosàs

2603.05336 2026-03-05
TESTING

PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration

Punctuation restoration is essential for improving the readability and downstream utility of automatic speech recognition (ASR) outputs, yet remains underexplored for Persian despite its importance...

Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery

2603.05314 2026-03-05
TESTING

Exploring $T_{ΥΥ}$ tetraquark candidates in a coupled-channels formalism

We investigate the spectrum of $T_{ΥΥ}$ tetraquark candidates within a coupled-channels framework. The analysis includes all $L\leq2$ combinations of $Υ(1S)$, $Υ(2S)$, $η_b(1S)$, and $η_b(2S)$ in t...

P. G. Ortega, D. R. Entem, F. Fernandez, J. Segovia

2603.05311 2026-03-05
AI LLM

Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

Assessing whether an article supports an assertion is essential for hallucination detection and claim verification. While large language models (LLMs) have the potential to automate this task, achi...

Qiao Jin, Yin Fang, Lauren He, Yifan Yang, Guangzhi Xiong, Zhizheng Wang, Nicholas Wan, Joey Chan...

2603.05308 2026-03-05
TESTING

Maximum of sparsely equicorrelated Gaussian fields and applications

We investigate the extreme values of a sparse and equicorrelated Gaussian field on a triangle: the correlations on every vertical or horizontal line are all equal to a parameter $r \in [0,1/2]$ and...

Johannes Heiny, Tiefeng Jiang, Tuan Pham, Yongcheng Qi

2603.05306 2026-03-05
AI LLM

STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Recent advances in large language models (LLMs) have enabled agentic systems for sequential decision-making. Such agents must perceive their environment, reason across multiple time steps, and take...

Elita Lobo, Xu Chen, Jingjing Meng, Nan Xi, Yang Jiao, Chirag Agarwal, Yair Zick, Yan Gao

2603.05294 2026-03-05
AI LLM

Knowledge Divergence and the Value of Debate for Scalable Oversight

AI safety via debate and reinforcement learning from AI feedback (RLAIF) are both proposed methods for scalable oversight of advanced AI systems, yet no formal framework relates them or characteriz...

Robin Young

2603.05293 2026-03-05
AI LLM

X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating patte...

Gao Tianxi, Cai Yufan, Yuan Yusi, Dong Jin Song

2603.05290 2026-03-05
TESTING

The Local Tremaine-Weinberg Method for Galactic Pattern Speed: Theory and its Application to IllustrisTNG

The Tremaine-Weinberg (TW) method and its variations provide the most direct means to measure the pattern speeds of galactic bars. We establish a unifying framework by deriving an integral form of ...

Hangci Du, Yougang Wang, Junqiang Ge, Rui Guo

2603.05287 2026-03-05
TESTING

From Code to Road: A Vehicle-in-the-Loop and Digital Twin-Based Framework for Central Car Server Testing in Autonomous Driving

Simulation is one of the most essential parts in the development stage of automotive software. However, purely virtual simulations often struggle to accurately capture all real-world factors due to...

Chengdong Wu, Sven Kirchner, Nils Purschke, Axel Torschmied, Norbert Kroth, Yinglei Song, André S...

2603.05279 2026-03-05
AI LLM

A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models

Large language models (LLMs) can be used to support software development tasks, e.g., through code completion or code generation. However, their effectiveness drops significantly when considering l...

David Delgado, Lola Burgueño, Robert Clarisó

2603.05278 2026-03-05
AI LLM

Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts

In the landscape of modern machine learning, frozen pre-trained models provide stability and efficiency but often underperform on specific tasks due to mismatched data distributions. This paper int...

Samandar Samandarov, Nazirjon Ismoiljonov, Abdullah Sattorov, Temirlan Sabyrbayev

2603.05276 2026-03-05
TESTING

Monitoring Covariance in Multichannel Profiles via Functional Graphical Models

Most statistical process monitoring methods for multichannel profiles focus solely on the mean and are almost ineffective when changes involve the covariance structure. Although it is known to be c...

Christian Capezza, Davide Forcina, Antonio Lepore, Biagio Palumbo

2603.05274 2026-03-05
AI LLM

Oral to Web: Digitizing 'Zero Resource' Languages of Bangladesh

We present the Multilingual Cloud Corpus, the first national-scale, parallel, multimodal linguistic dataset of Bangladesh's ethnic and indigenous languages. Despite being home to approximately 40 m...

Mohammad Mamun Or Rashid

2603.05272 2026-03-05
AI LLM

VietJobs: A Vietnamese Job Advertisement Dataset

VietJobs is the first large-scale, publicly available corpus of Vietnamese job advertisements, comprising 48,092 postings and over 15 million words collected from all 34 provinces and municipalitie...

Hieu Pham Dinh, Hung Nguyen Huy, Mo El-Haj

2603.05262 2026-03-05