Papers
Research papers from arXiv and related sources
DiSCTT: Consensus-Guided Self-Curriculum for Efficient Test-Time Adaptation in Reasoning
Test-time adaptation offers a promising avenue for improving reasoning performance in large language models without additional supervision, but existing approaches often apply a uniform optimizatio...
Mohammad Mahdi Moradi, Sudhir Mudur
InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context
Retrieval-augmented generation (RAG) for long-context question answering is bottlenecked by inference-time prefilling over large retrieved contexts. A common strategy is to precompute key-value (KV...
Xin Teng, Canyu Zhang, Shaoyi Zheng, Danyang Zhuo, Tianyi Zhou, Shengjie Wang
Ailed: A Psyche-Driven Chess Engine with Dynamic Emotional Modulation
Chess engines passed human strength years ago, but they still don't play like humans. A grandmaster under clock pressure blunders in ways a club player on a hot streak never would. Conventional eng...
Diego Armando Resendez Prado
A Shift-Invariant Deep Learning Framework for Automated Analysis of XPS Spectra
X-ray Photoelectron Spectroscopy (XPS) is a crucial technique for material surface analysis, yet interpreting its spectra is often challenging for both human analysts and automated methods due to t...
Issa Saddiq, Yuxin Fan, Robert G. Palgrave, Mark A. Isaacs, David Morgan, Keith T. Butler
Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned
The landscape of AI coding assistance is undergoing a fundamental shift from complex IDE plugins to versatile, terminal-native agents. Operating directly where developers manage source control, exe...
Nghi D. Q. Bui
Evaluation of Feynman integrals via numerical integration of differential equations
We revisit the idea of numerically integrating the differential form of Feynman integrals. With a novel approach for the treatment of branch cuts, we develop an integrator capable of evaluating a b...
Pau Petit Rosàs
PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration
Punctuation restoration is essential for improving the readability and downstream utility of automatic speech recognition (ASR) outputs, yet remains underexplored for Persian despite its importance...
Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery
Exploring $T_{\Upsilon\Upsilon}$ tetraquark candidates in a coupled-channels formalism
We investigate the spectrum of $T_{\Upsilon\Upsilon}$ tetraquark candidates within a coupled-channels framework. The analysis includes all $L\leq2$ combinations of $\Upsilon(1S)$, $\Upsilon(2S)$, $\eta_b(1S)$, and $\eta_b(2S)$ in t...
P. G. Ortega, D. R. Entem, F. Fernandez, J. Segovia
Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution
Assessing whether an article supports an assertion is essential for hallucination detection and claim verification. While large language models (LLMs) have the potential to automate this task, achi...
Qiao Jin, Yin Fang, Lauren He, Yifan Yang, Guangzhi Xiong, Zhizheng Wang, Nicholas Wan, Joey Chan...
Maximum of sparsely equicorrelated Gaussian fields and applications
We investigate the extreme values of a sparse and equicorrelated Gaussian field on a triangle: the correlations on every vertical or horizontal line are all equal to a parameter $r \in [0,1/2]$ and...
Johannes Heiny, Tiefeng Jiang, Tuan Pham, Yongcheng Qi
STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks
Recent advances in large language models (LLMs) have enabled agentic systems for sequential decision-making. Such agents must perceive their environment, reason across multiple time steps, and take...
Elita Lobo, Xu Chen, Jingjing Meng, Nan Xi, Yang Jiao, Chirag Agarwal, Yair Zick, Yan Gao
Knowledge Divergence and the Value of Debate for Scalable Oversight
AI safety via debate and reinforcement learning from AI feedback (RLAIF) are both proposed methods for scalable oversight of advanced AI systems, yet no formal framework relates them or characteriz...
Robin Young
X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes
Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating patte...
Gao Tianxi, Cai Yufan, Yuan Yusi, Dong Jin Song
The Local Tremaine-Weinberg Method for Galactic Pattern Speed: Theory and its Application to IllustrisTNG
The Tremaine-Weinberg (TW) method and its variations provide the most direct means to measure the pattern speeds of galactic bars. We establish a unifying framework by deriving an integral form of ...
Hangci Du, Yougang Wang, Junqiang Ge, Rui Guo
From Code to Road: A Vehicle-in-the-Loop and Digital Twin-Based Framework for Central Car Server Testing in Autonomous Driving
Simulation is one of the most essential parts in the development stage of automotive software. However, purely virtual simulations often struggle to accurately capture all real-world factors due to...
Chengdong Wu, Sven Kirchner, Nils Purschke, Axel Torschmied, Norbert Kroth, Yinglei Song, André S...
A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models
Large language models (LLMs) can be used to support software development tasks, e.g., through code completion or code generation. However, their effectiveness drops significantly when considering l...
David Delgado, Lola Burgueño, Robert Clarisó
Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts
In the landscape of modern machine learning, frozen pre-trained models provide stability and efficiency but often underperform on specific tasks due to mismatched data distributions. This paper int...
Samandar Samandarov, Nazirjon Ismoiljonov, Abdullah Sattorov, Temirlan Sabyrbayev
Monitoring Covariance in Multichannel Profiles via Functional Graphical Models
Most statistical process monitoring methods for multichannel profiles focus solely on the mean and are almost ineffective when changes involve the covariance structure. Although it is known to be c...
Christian Capezza, Davide Forcina, Antonio Lepore, Biagio Palumbo
Oral to Web: Digitizing 'Zero Resource' Languages of Bangladesh
We present the Multilingual Cloud Corpus, the first national-scale, parallel, multimodal linguistic dataset of Bangladesh's ethnic and indigenous languages. Despite being home to approximately 40 m...
Mohammad Mamun Or Rashid
VietJobs: A Vietnamese Job Advertisement Dataset
VietJobs is the first large-scale, publicly available corpus of Vietnamese job advertisements, comprising 48,092 postings and over 15 million words collected from all 34 provinces and municipalitie...
Hieu Pham Dinh, Hung Nguyen Huy, Mo El-Haj