Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

Personal Information Parroting in Language Models

Modern language models (LM) are trained on large scrapes of the Web, containing millions of personal information (PI) instances, many of which LMs memorize, increasing privacy risks. In this work, ...

Nishant Subramani, Kshitish Ghate, Mona Diab

2602.20580 2026-02-24
AI LLM

Efficient and Explainable End-to-End Autonomous Driving via Masked Vision-Language-Action Diffusion

Large Language Models (LLMs) and Vision-Language Models (VLMs) have emerged as promising candidates for end-to-end autonomous driving. However, these models typically face challenges in inference l...

Jiaru Zhang, Manav Gagvani, Can Cui, Juntong Peng, Ruqi Zhang, Ziran Wang

2602.20577 2026-02-24
TESTING

GATES: Self-Distillation under Privileged Context with Consensus Gating

We study self-distillation in settings where supervision is unreliable: there are no ground truth labels, verifiable rewards, or external graders to evaluate answers. We focus on document-grounded ...

Alex Stein, Furong Huang, Tom Goldstein

2602.20574 2026-02-24
AI LLM

CausalReasoningBenchmark: A Real-World Benchmark for Disentangled Evaluation of Causal Identification and Estimation

Many benchmarks for automated causal inference evaluate a system's performance based on a single numerical output, such as an Average Treatment Effect (ATE). This approach conflates two distinct st...

Ayush Sawarni, Jiyuan Tan, Vasilis Syrgkanis

2602.20571 2026-02-24
AI LLM

AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents

We present AIForge-Doc, the first dedicated benchmark targeting exclusively diffusion-model-based inpainting in financial and form documents with pixel-level annotation. Existing document forgery d...

Jiaqi Wu, Yuchen Zhou, Muduo Xu, Zisheng Liang, Simiao Ren, Jiayu Xue, Meige Yang, Siying Chen, J...

2602.20569 2026-02-24
AI LLM

From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production

Large language models (LLMs) are promising backbones for generative recommender systems, yet a key challenge remains underexplored: verbalization, i.e., converting structured user interaction logs ...

Yucheng Shi, Ying Li, Yu Wang, Yesu Feng, Arjun Rao, Rein Houthooft, Shradha Sehgal, Jin Wang, Ha...

2602.20558 2026-02-24
AI LLM

CAD-Prompted SAM3: Geometry-Conditioned Instance Segmentation for Industrial Objects

Verbal-prompted segmentation is inherently limited by the expressiveness of natural language and struggles with uncommon, instance-specific, or difficult-to-describe objects: scenarios frequently e...

Zhenran Tang, Rohan Nagabhirava, Changliu Liu

2602.20551 2026-02-24
AI LLM

What Drives Students' Use of AI Chatbots? Technology Acceptance in Conversational AI

Conversational AI tools have been rapidly adopted by students and are becoming part of their learning routines. To understand what drives this adoption, we draw on the Technology Acceptance Model (...

Griffin Pitts, Sanaz Motamedi

2602.20547 2026-02-24
TESTING

Beyond Human Performance: A Vision-Language Multi-Agent Approach for Quality Control in Pharmaceutical Manufacturing

Colony-forming unit (CFU) detection is critical in pharmaceutical manufacturing, serving as a key component of Environmental Monitoring programs and ensuring compliance with stringent quality stand...

Subhra Jyoti Mandal, Lara Rachidi, Puneet Jain, Matthieu Duvinage, Sander W. Timmer

2602.20543 2026-02-24
AI LLM

Generative AI and Machine Learning Collaboration for Container Dwell Time Prediction via Data Standardization

Import container dwell time (ICDT) prediction is a key task for improving productivity in container terminals, as accurate predictions enable the reduction of container re-handling operations by ya...

Minseop Kim, Takhyeong Kim, Taekhyun Park, Hanbyeol Park, Hyerim Bae

2602.20540 2026-02-24
TESTING

Range Emulator: A Compact Paraxial Optical System to Emulate Long-Distance Monochromatic Laser Propagation

Emulating long-distance light propagation on a laboratory scale is essential for the ground-based testing of intersatellite optical systems. To address this challenge, we propose and analyze a nove...

Subaru Shibai, Kiwamu Izumi

2602.20538 2026-02-24
AI LLM

Actor-Curator: Co-adaptive Curriculum Learning via Policy-Improvement Bandits for RL Post-Training

Post-training large foundation models with reinforcement learning typically relies on massive and heterogeneous datasets, making effective curriculum learning both critical and challenging. In this...

Zhengyao Gu, Jonathan Light, Raul Astudillo, Ziyu Ye, Langzhou He, Henry Peng Zou, Wei Cheng, San...

2602.20532 2026-02-24
AI LLM

Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning

The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to tok...

Justin Lovelace, Christian Belardi, Sofian Zalouk, Adhitya Polavaram, Srivatsa Kundurthy, Kilian ...

2602.20528 2026-02-24
AI LLM

Precise Measurement of Matter-Antimatter Asymmetry with Entangled Hyperon Antihyperon Pairs

A search for $CP$ violation with an entangled system of $\Xi^{-}\bar{\Xi}^{+}$ pairs is performed, using $(10{,}087\pm44)\times10^{6}$ $J/\psi$ events collected with the BESIII experiment. A nine-dimensional h...

BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso,...

2602.20524 2026-02-24
AI LLM

Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination

Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts. Imitation learning has emerged as one of...

Rakshit Trivedi, Kartik Sharma, David C Parkes

2602.20517 2026-02-24
AI LLM

FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill

In long-context large language model (LLM) inference, the prefill stage dominates computation due to self-attention over the complete input context. Sparse attention significantly reduces self-atte...

Rakshith Jayanth, Viktor Prasanna

2602.20515 2026-02-24
AI LLM

From Performance to Purpose: A Sociotechnical Taxonomy for Evaluating Large Language Model Utility

As large language models (LLMs) continue to improve at completing discrete tasks, they are being integrated into increasingly complex and diverse real-world systems. However, task-level success alo...

Gavin Levinson, Keith Feldman

2602.20513 2026-02-24
TESTING

Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models

What does it mean for a visual system to truly understand affordance? We argue that this understanding hinges on two complementary capacities: geometric perception, which identifies the structural ...

Qing Zhang, Xuesong Li, Jing Zhang

2602.20501 2026-02-24
TESTING

Fast Algorithms for Exact Confidence Intervals in Randomized Experiments with Binary Outcomes

We construct exact confidence intervals for the average treatment effect in randomized experiments with binary outcomes using sequences of randomization tests. Our approach does not rely on large-s...

Peng Zhang

2602.20498 2026-02-24
TESTING

Machine-learning cosmological parameters by eROSITA data

Context: We present the first Cosmological Parameter inferences from eROSITA X-ray observations of galaxy clusters using a Machine Learning algorithm. Methods: We train a Random Forest using mock c...

Fucheng Zhong, Nicola R. Napolitano, Johan Comparat, Klaus Dolag, Caroline Heneka, Zhiqi Huang, X...

2602.20483 2026-02-24