Papers
Research papers from arXiv and related sources
Personal Information Parroting in Language Models
Modern language models (LMs) are trained on large scrapes of the Web, containing millions of personal information (PI) instances, many of which LMs memorize, increasing privacy risks. In this work, ...
Nishant Subramani, Kshitish Ghate, Mona Diab
Efficient and Explainable End-to-End Autonomous Driving via Masked Vision-Language-Action Diffusion
Large Language Models (LLMs) and Vision-Language Models (VLMs) have emerged as promising candidates for end-to-end autonomous driving. However, these models typically face challenges in inference l...
Jiaru Zhang, Manav Gagvani, Can Cui, Juntong Peng, Ruqi Zhang, Ziran Wang
GATES: Self-Distillation under Privileged Context with Consensus Gating
We study self-distillation in settings where supervision is unreliable: there are no ground truth labels, verifiable rewards, or external graders to evaluate answers. We focus on document-grounded ...
Alex Stein, Furong Huang, Tom Goldstein
CausalReasoningBenchmark: A Real-World Benchmark for Disentangled Evaluation of Causal Identification and Estimation
Many benchmarks for automated causal inference evaluate a system's performance based on a single numerical output, such as an Average Treatment Effect (ATE). This approach conflates two distinct st...
Ayush Sawarni, Jiyuan Tan, Vasilis Syrgkanis
AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents
We present AIForge-Doc, the first dedicated benchmark targeting exclusively diffusion-model-based inpainting in financial and form documents with pixel-level annotation. Existing document forgery d...
Jiaqi Wu, Yuchen Zhou, Muduo Xu, Zisheng Liang, Simiao Ren, Jiayu Xue, Meige Yang, Siying Chen, J...
From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production
Large language models (LLMs) are promising backbones for generative recommender systems, yet a key challenge remains underexplored: verbalization, i.e., converting structured user interaction logs ...
Yucheng Shi, Ying Li, Yu Wang, Yesu Feng, Arjun Rao, Rein Houthooft, Shradha Sehgal, Jin Wang, Ha...
CAD-Prompted SAM3: Geometry-Conditioned Instance Segmentation for Industrial Objects
Verbal-prompted segmentation is inherently limited by the expressiveness of natural language and struggles with uncommon, instance-specific, or difficult-to-describe objects: scenarios frequently e...
Zhenran Tang, Rohan Nagabhirava, Changliu Liu
What Drives Students' Use of AI Chatbots? Technology Acceptance in Conversational AI
Conversational AI tools have been rapidly adopted by students and are becoming part of their learning routines. To understand what drives this adoption, we draw on the Technology Acceptance Model (...
Griffin Pitts, Sanaz Motamedi
Beyond Human Performance: A Vision-Language Multi-Agent Approach for Quality Control in Pharmaceutical Manufacturing
Colony-forming unit (CFU) detection is critical in pharmaceutical manufacturing, serving as a key component of Environmental Monitoring programs and ensuring compliance with stringent quality stand...
Subhra Jyoti Mandal, Lara Rachidi, Puneet Jain, Matthieu Duvinage, Sander W. Timmer
Generative AI and Machine Learning Collaboration for Container Dwell Time Prediction via Data Standardization
Import container dwell time (ICDT) prediction is a key task for improving productivity in container terminals, as accurate predictions enable the reduction of container re-handling operations by ya...
Minseop Kim, Takhyeong Kim, Taekhyun Park, Hanbyeol Park, Hyerim Bae
Range Emulator: A Compact Paraxial Optical System to Emulate Long-Distance Monochromatic Laser Propagation
Emulating long-distance light propagation on a laboratory scale is essential for the ground-based testing of intersatellite optical systems. To address this challenge, we propose and analyze a nove...
Subaru Shibai, Kiwamu Izumi
Actor-Curator: Co-adaptive Curriculum Learning via Policy-Improvement Bandits for RL Post-Training
Post-training large foundation models with reinforcement learning typically relies on massive and heterogeneous datasets, making effective curriculum learning both critical and challenging. In this...
Zhengyao Gu, Jonathan Light, Raul Astudillo, Ziyu Ye, Langzhou He, Henry Peng Zou, Wei Cheng, San...
Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning
The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to tok...
Justin Lovelace, Christian Belardi, Sofian Zalouk, Adhitya Polavaram, Srivatsa Kundurthy, Kilian ...
Precise Measurement of Matter-Antimatter Asymmetry with Entangled Hyperon Antihyperon Pairs
A search for $CP$ violation with an entangled system of $Ξ^-$-$\bar{Ξ}^+$ pairs is performed, using $(10{,}087\pm44)\times10^{6}$ $J/ψ$ events collected with the BESIII experiment. A nine-dimensional h...
BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso,...
Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination
Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts. Imitation learning has emerged as one of...
Rakshit Trivedi, Kartik Sharma, David C Parkes
FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill
In long-context large language model (LLM) inference, the prefill stage dominates computation due to self-attention over the complete input context. Sparse attention significantly reduces self-atte...
Rakshith Jayanth, Viktor Prasanna
From Performance to Purpose: A Sociotechnical Taxonomy for Evaluating Large Language Model Utility
As large language models (LLMs) continue to improve at completing discrete tasks, they are being integrated into increasingly complex and diverse real-world systems. However, task-level success alo...
Gavin Levinson, Keith Feldman
Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models
What does it mean for a visual system to truly understand affordance? We argue that this understanding hinges on two complementary capacities: geometric perception, which identifies the structural ...
Qing Zhang, Xuesong Li, Jing Zhang
Fast Algorithms for Exact Confidence Intervals in Randomized Experiments with Binary Outcomes
We construct exact confidence intervals for the average treatment effect in randomized experiments with binary outcomes using sequences of randomization tests. Our approach does not rely on large-s...
Peng Zhang
Machine-learning cosmological parameters by eROSITA data
Context: We present the first Cosmological Parameter inferences from eROSITA X-ray observations of galaxy clusters using a Machine Learning algorithm. Methods: We train a Random Forest using mock c...
Fucheng Zhong, Nicola R. Napolitano, Johan Comparat, Klaus Dolag, Caroline Heneka, Zhiqi Huang, X...