Personal Assistant Web

TESTING

Bayesian Optimization in Chemical Compound Sub-Spaces using Low-Dimensional Molecular Descriptors

Efficient optimization of molecules with targeted properties remains a significant challenge due to the vast size and discrete nature of chemical compound space. Conventional machine-learning-based...

Yun-Wen Mao, Roman V. Krems

2603.02605 • 2026-03-03

AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows

Autonomous AI agents are deployed at unprecedented scale, yet no principled methodology exists for verifying that an agent has not regressed after changes to its prompts, tools, models, or orch...

Varun Pratap Bhardwaj

2603.02601 • 2026-03-03

Synthetic-Child: An AIGC-Based Synthetic Data Pipeline for Privacy-Preserving Child Posture Estimation

Accurate child posture estimation is critical for AI-powered study companion devices, yet collecting large-scale annotated datasets of children is both expensive and ethically prohibitive due to pr...

Taowen Zeng

2603.02598 • 2026-03-03

GPUTOK: GPU Accelerated Byte Level BPE Tokenization

As large language models move toward million-token context windows, CPU tokenizers become a major slowdown because they process text one step at a time while powerful GPUs sit unused. We built a GP...

Venu Gopal Kadamba, Kanishkha Jaisankar

2603.02597 • 2026-03-03

Low-Degree Method Fails to Predict Robust Subspace Recovery

The low-degree polynomial framework has been highly successful in predicting computational versus statistical gaps for high-dimensional problems in average-case analysis and machine learning. This ...

He Jia, Aravindan Vijayaraghavan

2603.02594 • 2026-03-03

Maximizing Generalization: The Effect of Different Augmentation Techniques on Lightweight Vision Transformer for Bengali Character Classification

Deep learning models have proven to be highly effective in computer vision, with deep convolutional neural networks achieving impressive results across various computer vision tasks. However, these...

Rafi Hassan Chowdhury, Naimul Haque, Kaniz Fatiha

2603.02591 • 2026-03-03

ExpGuard: LLM Content Moderation in Specialized Domains

With the growing deployment of large language models (LLMs) in real-world applications, establishing robust safety guardrails to moderate their inputs and outputs has become essential to ensure adh...

Minseok Choi, Dongjin Kim, Seungbin Yang, Subin Kim, Youngjun Kwak, Juyoung Oh, Jaegul Choo, Jung...

2603.02588 • 2026-03-03

LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

As large language models grow more capable, general AI agents have become increasingly prevalent in practical applications. However, existing benchmarks face significant limitations, failing to rep...

Hao Li, Huan Wang, Jinjie Gu, Wenjie Wang, Chenyi Zhuang, Sikang Bian

2603.02586 • 2026-03-03

An Augmented Rating System for Test cricket: adapting Glicko's model

ICC's current ranking system does not adequately account for key contextual factors such as home advantage, toss impact and scheduling imbalances, leading to inconsistencies in team evaluation in T...

Rhitankar Bandyopadhyay, Diganta Mukherjee

2603.02574 • 2026-03-03

Molecular Dynamics Simulations Reveal PolyQ-Length-Dependent Conformational Changes in Huntingtin Exon-1: Implications for Environmental Co-Solvent Modulation of Aggregation-Prone States

Huntington's disease (HD) is caused by CAG-repeat expansion in HTT, which lengthens the polyglutamine (polyQ) tract in huntingtin (HTT) and promotes misfolding and aggregation. While polyQ-length-d...

Jai Geddes-Nelson, Xiaochen Liu, Ken-Tye Yong

2603.02572 • 2026-03-03

An LLM-Assisted Toolkit for Inspectable Multimodal Emotion Data Annotation

Multimodal Emotion Recognition (MER) increasingly depends on fine-grained, evidence-grounded annotations, yet inspection and label construction are hard to scale when cues are dynamic and misaligne...

Zheyuan Kuang, Weiwei Jiang, Nicholas Koemel, Matthew Ahmadi, Emmanuel Stamatakis, Benjamin Tag, ...

2603.02569 • 2026-03-03

Relevance Matters: A Multi-Task and Multi-Stage Large Language Model Approach for E-commerce Query Rewriting

For e-commerce search, user experience is measured by users' behavioral responses to returned products, like click-through rate and conversion rate, as well as the relevance between returned produc...

Aijun Dai, Jixiang Zhang, Haiqing Hu, Guoyu Tang, Lin Liu, Ziguang Cheng

2603.02555 • 2026-03-03

Fuzzing Microservices in Face of Intrinsic Uncertainties

The widespread adoption of microservices has fundamentally transformed how modern software systems are designed, deployed, operated and maintained. However, well-known microservice properties (e.g....

Man Zhang, Tao Yue, Andrea Arcuri

2603.02551 • 2026-03-03

A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities

Large language models (LLMs) exhibit a unified "general factor" of capability across 10 benchmarks, a finding confirmed by our factor analysis of 156 models, yet they still struggle with simple, tr...

Faiz Ghifari Haznitrama, Faeyza Rishad Ardi, Alice Oh

2603.02540 • 2026-03-03

Exploiting PendingIntent Provenance Confusion to Spoof Android SDK Authentication

A single authentication bypass in a partner SDK grants attackers the identity of every partner in the ecosystem -- and millions of apps use SDKs with exactly this vulnerability. OWASP's 2024 Mobile...

Ramanpreet Singh Khinda

2603.02539 • 2026-03-03

PathSpace: Rapid continuous map approximation for efficient SLAM using B-Splines in constrained environments

Simultaneous Localization and Mapping (SLAM) plays a crucial role in enabling autonomous vehicles to navigate previously unknown environments. Semantic SLAM mostly extends visual SLAM, leveraging...

Aduen Benjumea, Andrew Bradley, Alexander Rast, Matthias Rolf

2603.02538 • 2026-03-03

Agentic Mixed-Source Multi-Modal Misinformation Detection with Adaptive Test-Time Scaling

Vision-language models (VLMs) have been proven effective for detecting multi-modal misinformation on social platforms, especially in zero-shot settings with unavailable or delayed annotations. Howe...

Wei Jiang, Tong Chen, Wei Yuan, Quoc Viet Hung Nguyen, Hongzhi Yin

2603.02519 • 2026-03-03

Multiscale Ultrabroadband Polymer Scattering Media with Tailored Emittance for Radiative Thermal Management

A surface that selectively emits heat in the long-wave infrared (LWIR) can enable passive cooling in hot environments while retaining partial radiative insulation in cold conditions, but its real-w...

Zhenpeng Li, Mathis Degeorges, Nithin Jo Varghese, Jyotirmoy Mandal

2603.02513 • 2026-03-03

Measurement of a quantum system using spin-mechanical conversion

Levitated macroscopic particles exhibiting quantum mechanical effects are garnering increased attention as a means for precision sensing and testing quantum mechanics. Defects in diamond, such as t...

A. A. Wood, D. S. Rice, T. Xie, F. H. Cassells, R. M. Goldblatt, T. Delord, G. Hétet, A. M. Martin

2603.02507 • 2026-03-03

NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect

Large Language Models (LLMs) achieve strong performance on natural language tasks but remain unreliable in mathematical reasoning, frequently generating fluent yet logically inconsistent solutions....

Pratibha Zunjare, Michael Hsiao

2603.02504 • 2026-03-03
