AI LLM

Anticipate, Adapt, Act: A Hybrid Framework for Task Planning

Anticipating and adapting to failures is a key capability robots need to collaborate effectively with humans in complex domains. This continues to be a challenge despite the impressive performance ...

Nabanita Dash, Ayush Kaura, Shivam Singh, Ramandeep Singh, Snehasis Banerjee, Mohan Sridharan, K....

2602.19518 • 2026-02-23

AI LLM

Classroom Final Exam: An Instructor-Tested Reasoning Benchmark

We introduce CFE (Classroom Final Exam), a multimodal benchmark for evaluating the reasoning capabilities of large language models across more than 20 STEM domains. C...

Chongyang Gao, Diji Yang, Shuyan Zhou, Xichen Yan, Luchuan Song, Shuo Li, Kezhen Chen

2602.19517 • 2026-02-23

AI LLM

Pixel2Phys: Distilling Governing Laws from Visual Dynamics

Discovering physical laws directly from high-dimensional visual data is a long-standing human pursuit but remains a formidable challenge for machines, representing a fundamental goal of scientific ...

Ruikun Li, Jun Yao, Yingfan Hua, Shixiang Tang, Biqing Qi, Bin Liu, Wanli Ouyang, Yan Lu

2602.19516 • 2026-02-23

AI LLM

Security Risks of AI Agents Hiring Humans: An Empirical Marketplace Study

Autonomous AI agents can now programmatically hire human workers through marketplaces using REST APIs and Model Context Protocol (MCP) integrations. This creates an attack surface analogous to CAPT...

Pulak Mehta

2602.19514 • 2026-02-23

AI LLM

Real-time Win Probability and Latent Player Ability via STATS X in Team Sports

This study proposes a statistically grounded framework for real-time win probability evaluation and player assessment in score-based team sports, based on minute-by-minute cumulative box-score data...

Yasutaka Shimizu, Atsushi Yamanobe

2602.19513 • 2026-02-23

AI LLM

Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference

Large Language Models (LLMs) face a persistent trade-off between inference cost and reasoning capability. While "Oracle" models (e.g., Llama-3-70B) achieve state-of-the-art accuracy, they are prohi...

Arindam Khaled

2602.19509 • 2026-02-23

AI LLM

Conversational AI for Automated Patient Questionnaire Completion: Development Insights and Design Principles

Collecting patient-reported outcome measures (PROMs) is essential for clinical care and research, yet traditional form-based approaches are often tedious for patients and burdensome for clinicians....

David Fraile Navarro, Mor Peleg

2602.19507 • 2026-02-23

AI LLM

Test-Time Computing for Referring Multimodal Large Language Models

We propose ControlMLLM++, a novel test-time adaptation framework that injects learnable visual prompts into frozen multimodal large language models (MLLMs) to enable fine-grained region-based visua...

Mingrui Wu, Hao Chen, Jiayi Ji, Xiaoshuai Sun, Zhiyuan Liu, Liujuan Cao, Ming-Ming Cheng, Rongron...

2602.19505 • 2026-02-23

TESTING

MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models

Recent advancements in Unified Multimodal Models (UMMs) have enabled remarkable image understanding and generation capabilities. However, while models like Gemini-2.5-Flash-Image show emerging abil...

Mingrui Wu, Hang Liu, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

2602.19497 • 2026-02-23

TESTING

FuzzySQL: Uncovering Hidden Vulnerabilities in DBMS Special Features with LLM-Driven Fuzzing

Traditional database fuzzing techniques primarily focus on syntactic correctness and general SQL structures, leaving critical yet obscure DBMS features, such as system-level modes (e.g., GTID), pro...

Yongxin Chen, Zhiyuan Jiang, Chao Zhang, Haoran Xu, Shenglin Xu, Jianping Tang, Zheming Li, Peida...

2602.19490 • 2026-02-23

TESTING

Kaon decay constraints on vector bosons coupled to non-conserved currents

We study rare three- and four-body kaon decays as a probe of light vector and axial-vector bosons coupled to non-conserved currents. We find that searches for $K_L \to \pi^0 \pi^0 (X \to e^+e^-)$ decays...

Matheus Hostert, Maxim Pospelov, Adrian Thompson

2602.19479 • 2026-02-23

TESTING

Physics-Aware, Shannon-Optimal Compression via Arithmetic Coding for Distributional Fidelity

Assessing whether two datasets are distributionally consistent has become a central theme in modern scientific analysis, particularly as generative artificial intelligence is increasingly used to p...

Cristiano Fanelli

2602.19476 • 2026-02-23

TESTING

Zero Variance Portfolio

When the number of assets is larger than the sample size, the minimum variance portfolio interpolates the training data, delivering pathological zero in-sample variance. We show that if the weights...

Jinyuan Chang, Yi Ding, Zhentao Shi, Bo Zhang

2602.19462 • 2026-02-23

TESTING

Optimal Error Estimates of a new Multiphysic Finite Element Method for Nonlinear Poroelasticity model with Hencky-Mises Stress Tensor

In this paper, we develop a new multiphysics finite element method for a nonlinear poroelastic model with Hencky-Mises stress tensor. By introducing some new notations, we reformulate the original ...

Yanan He, Zhihao Ge

2602.19457 • 2026-02-23

TESTING

HD-TTA: Hypothesis-Driven Test-Time Adaptation for Safer Brain Tumor Segmentation

Standard Test-Time Adaptation (TTA) methods typically treat inference as a blind optimization task, applying generic objectives to all or filtered test samples. In safety-critical medical segmentat...

Kartik Jhawar, Lipo Wang

2602.19454 • 2026-02-23

TESTING

Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments

Trusted Execution Environments (TEEs) (e.g., Intel SGX and Arm TrustZone) aim to protect sensitive computation from a compromised operating system, yet real deployments remain vulnerable to microarc...

Kunal Mukherjee

2602.19450 • 2026-02-23

TESTING

OptiRepair: Closed-Loop Diagnosis and Repair of Supply Chain Optimization Models with LLM Agents

Problem Definition. Supply chain optimization models frequently become infeasible because of modeling errors. Diagnosis and repair require scarce OR expertise: analysts must interpret solver diagno...

Ruicheng Ao, David Simchi-Levi, Xinshang Wang

2602.19439 • 2026-02-23

TESTING

A unified SPH framework for shell-related interactions

A unified Smoothed Particle Hydrodynamics (SPH) framework is proposed to simulate interaction dynamics involving thin shells modeled by a reduced-dimensional, single-layer particle discretization, ...

Dong Wu, Shuaihao Zhang, Weiyi Kong, Xiangyu Hu

2602.19429 • 2026-02-23

TESTING

How Robust are Robustness Checks?

Robustness checks are routine in empirical work, but there is no standard statistical procedure to formally measure what one can learn from them. I propose a "robustness radius" measure to quantify...

Brenda Prallon

2602.19384 • 2026-02-22

TESTING

On the Variability of Source Code in Maven Package Rebuilds

Rebuilding packages from open source is a common practice to improve the security of software supply chains, and is now done at an industrial scale. The basic principle is to acquire the source cod...

Jens Dietrich, Behnaz Hassanshahi

2602.19383 • 2026-02-22
