Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111

TESTING

Low-Degree Method Fails to Predict Robust Subspace Recovery

The low-degree polynomial framework has been highly successful in predicting computational versus statistical gaps for high-dimensional problems in average-case analysis and machine learning. This ...

He Jia, Aravindan Vijayaraghavan

2603.02594 2026-03-03
TESTING

Maximizing Generalization: The Effect of Different Augmentation Techniques on Lightweight Vision Transformer for Bengali Character Classification

Deep learning models have proven to be highly effective in computer vision, with deep convolutional neural networks achieving impressive results across various computer vision tasks. However, these...

Rafi Hassan Chowdhury, Naimul Haque, Kaniz Fatiha

2603.02591 2026-03-03
TESTING

ExpGuard: LLM Content Moderation in Specialized Domains

With the growing deployment of large language models (LLMs) in real-world applications, establishing robust safety guardrails to moderate their inputs and outputs has become essential to ensure adh...

Minseok Choi, Dongjin Kim, Seungbin Yang, Subin Kim, Youngjun Kwak, Juyoung Oh, Jaegul Choo, Jung...

2603.02588 2026-03-03
TESTING

LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

As large language models grow more capable, general AI agents have become increasingly prevalent in practical applications. However, existing benchmarks face significant limitations, failing to rep...

Hao Li, Huan Wang, Jinjie Gu, Wenjie Wang, Chenyi Zhuang, Sikang Bian

2603.02586 2026-03-03
TESTING

An Augmented Rating System for Test cricket: adapting Glicko's model

ICC's current ranking system does not adequately account for key contextual factors such as home advantage, toss impact and scheduling imbalances, leading to inconsistencies in team evaluation in T...

Rhitankar Bandyopadhyay, Diganta Mukherjee

2603.02574 2026-03-03
TESTING

Molecular Dynamics Simulations Reveal PolyQ-Length-Dependent Conformational Changes in Huntingtin Exon-1: Implications for Environmental Co-Solvent Modulation of Aggregation-Prone States

Huntington's disease (HD) is caused by CAG-repeat expansion in HTT, which lengthens the polyglutamine (polyQ) tract in huntingtin (HTT) and promotes misfolding and aggregation. While polyQ-length-d...

Jai Geddes-Nelson, Xiaochen Liu, Ken-Tye Yong

2603.02572 2026-03-03
TESTING

An LLM-Assisted Toolkit for Inspectable Multimodal Emotion Data Annotation

Multimodal Emotion Recognition (MER) increasingly depends on fine-grained, evidence-grounded annotations, yet inspection and label construction are hard to scale when cues are dynamic and misaligne...

Zheyuan Kuang, Weiwei Jiang, Nicholas Koemel, Matthew Ahmadi, Emmanuel Stamatakis, Benjamin Tag, ...

2603.02569 2026-03-03
TESTING

Relevance Matters: A Multi-Task and Multi-Stage Large Language Model Approach for E-commerce Query Rewriting

For e-commerce search, user experience is measured by users' behavioral responses to returned products, such as click-through rate and conversion rate, as well as the relevance between returned produc...

Aijun Dai, Jixiang Zhang, Haiqing Hu, Guoyu Tang, Lin Liu, Ziguang Cheng

2603.02555 2026-03-03
TESTING

Fuzzing Microservices in Face of Intrinsic Uncertainties

The widespread adoption of microservices has fundamentally transformed how modern software systems are designed, deployed, operated and maintained. However, well-known microservice properties (e.g....

Man Zhang, Tao Yue, Andrea Arcuri

2603.02551 2026-03-03
TESTING

A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities

Large language models (LLMs) exhibit a unified "general factor" of capability across 10 benchmarks, a finding confirmed by our factor analysis of 156 models, yet they still struggle with simple, tr...

Faiz Ghifari Haznitrama, Faeyza Rishad Ardi, Alice Oh

2603.02540 2026-03-03
TESTING

Exploiting PendingIntent Provenance Confusion to Spoof Android SDK Authentication

A single authentication bypass in a partner SDK grants attackers the identity of every partner in the ecosystem -- and millions of apps use SDKs with exactly this vulnerability. OWASP's 2024 Mobile...

Ramanpreet Singh Khinda

2603.02539 2026-03-03
TESTING

PathSpace: Rapid continuous map approximation for efficient SLAM using B-Splines in constrained environments

Simultaneous Localization and Mapping (SLAM) plays a crucial role in enabling autonomous vehicles to navigate previously unknown environments. Semantic SLAM mostly extends visual SLAM, leveraging...

Aduen Benjumea, Andrew Bradley, Alexander Rast, Matthias Rolf

2603.02538 2026-03-03
TESTING

Agentic Mixed-Source Multi-Modal Misinformation Detection with Adaptive Test-Time Scaling

Vision-language models (VLMs) have been proven effective for detecting multi-modal misinformation on social platforms, especially in zero-shot settings with unavailable or delayed annotations. Howe...

Wei Jiang, Tong Chen, Wei Yuan, Quoc Viet Hung Nguyen, Hongzhi Yin

2603.02519 2026-03-03
TESTING

Multiscale Ultrabroadband Polymer Scattering Media with Tailored Emittance for Radiative Thermal Management

A surface that selectively emits heat in the long-wave infrared (LWIR) can enable passive cooling in hot environments while retaining partial radiative insulation in cold conditions, but its real-w...

Zhenpeng Li, Mathis Degeorges, Nithin Jo Varghese, Jyotirmoy Mandal

2603.02513 2026-03-03
TESTING

Measurement of a quantum system using spin-mechanical conversion

Levitated macroscopic particles exhibiting quantum mechanical effects are garnering increased attention as a means for precision sensing and testing quantum mechanics. Defects in diamond, such as t...

A. A. Wood, D. S. Rice, T. Xie, F. H. Cassells, R. M. Goldblatt, T. Delord, G. Hétet, A. M. Martin

2603.02507 2026-03-03
TESTING

NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect

Large Language Models (LLMs) achieve strong performance on natural language tasks but remain unreliable in mathematical reasoning, frequently generating fluent yet logically inconsistent solutions....

Pratibha Zunjare, Michael Hsiao

2603.02504 2026-03-03
TESTING

Joint Estimation of Dynamic O-D Demand and Choice Models for Dynamic Multi-modal Networks: Computational Graph-Based Learning and Hypothesis Tests

Understanding travel demand and behavior, particularly route and mode choices, is critical for effective transportation planning and policy design in multi-modal systems with emerging mobility opti...

Xiaoyu Ma, Sean Qian

2603.02503 2026-03-03
TESTING

Probing Planck-Scale Physics with High-Frequency Gravitational Waves

We develop a framework for testing quantum gravity through the stochastic gravitational-wave background produced by evaporating near-Planck-mass primordial black holes. Because gravitons free-strea...

Stefano Profumo

2603.02493 2026-03-03
TESTING

Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

Training on verifiable symbolic data is a promising way to expand the reasoning frontier of language models beyond what standard pre-training corpora provide. Yet existing procedural generators oft...

Valentin Lacombe, Valentin Quesnel, Damien Sileo

2603.02208 2026-03-02
AI/LLM

VoiceAgentRAG: Solving the RAG Latency Bottleneck in Real-Time Voice Agents Using Dual-Agent Architectures

We present VoiceAgentRAG, an open-source dual-agent memory router that decouples retrieval from response generation. A background Slow Thinker agent continuously monitors the conversation stream, p...

Jielin Qiu, Jianguo Zhang, Zixiang Chen, Liangwei Yang, Ming Zhu, Juntao Tan, Haolin Chen, Wentin...

2603.02206 2026-03-02