AI LLM

An LLM-driven Scenario Generation Pipeline Using an Extended Scenic DSL for Autonomous Driving Safety Validation

Real-world crash reports, which combine textual summaries and sketches, are valuable for scenario-based testing of autonomous driving systems (ADS). However, current methods cannot effectively tran...

Fida Khandaker Safa, Yupeng Jiang, Xi Zheng

2602.20644 • 2026-02-24

Grounding LLMs in Scientific Discovery via Embodied Actions

Large Language Models (LLMs) have shown significant potential in scientific discovery but struggle to bridge the gap between theoretical reasoning and verifiable physical simulation. Existing solut...

Bo Zhang, Jinfeng Zhou, Yuxuan Chen, Jianing Yin, Minlie Huang, Hongning Wang

2602.20639 • 2026-02-24

AI Combines, Humans Socialise: A SECI-based Experience Report on Business Simulation Games

Background. Business Simulation Games (BSG) are widely used to foster experiential learning in complex managerial and organisational contexts by exposing students to decision-making under uncertain...

Nordine Benkeltoum

2602.20633 • 2026-02-24

QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs

As Large Language Models (LLMs) saturate elementary benchmarks, the research frontier has shifted from generation to the reliability of automated evaluation. We demonstrate that standard "LLM-as-a-...

Santiago Gonzalez, Alireza Amiri Bavandpour, Peter Ye, Edward Zhang, Ruslans Aleksejevs, Todor An...

2602.20629 • 2026-02-24

When can we trust untrusted monitoring? A safety case sketch across collusion strategies

AIs are increasingly being deployed with greater autonomy and capabilities, which increases the risk that a misaligned AI may be able to cause catastrophic harm. Untrusted monitoring -- using one u...

Nelson Gardner-Challis, Jonathan Bostock, Georgiy Kozhevnikov, Morgan Sinclaire, Joan Velja, Ales...

2602.20628 • 2026-02-24

Physics-based phenomenological characterization of cross-modal bias in multimodal models

The term 'algorithmic fairness' is used to evaluate whether AI models operate fairly in both comparative (where fairness is understood as formal equality, such as "treat like cases as like") and no...

Hyeongmo Kim, Sohyun Kang, Yerin Choi, Seungyeon Ji, Junhyuk Woo, Hyunsuk Chung, Soyeon Caren Han...

2602.20624 • 2026-02-24

RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces

The proliferation of AI-generated content has facilitated sophisticated face manipulation, severely undermining visual integrity and posing unprecedented challenges to intellectual property. In res...

Haonan An, Xiaohui Ye, Guang Hua, Yihang Tao, Hangcheng Cao, Xiangyu Yu, Yuguang Fang

2602.20618 • 2026-02-24

Amortized Bayesian inference for actigraph time sheet data from mobile devices

Mobile data technologies use "actigraphs" to furnish information on health variables as a function of a subject's movement. The advent of wearable devices and related technologies has propelled t...

Daniel Zhou, Sudipto Banerjee

2602.20611 • 2026-02-24

SpecMind: Cognitively Inspired, Interactive Multi-Turn Framework for Postcondition Inference

Specifications are vital for ensuring program correctness, yet writing them manually remains challenging and time-intensive. Recent large language model (LLM)-based methods have shown successes in ...

Cuong Chi Le, Minh V. T Pham, Tung Vu Duy, Cuong Duc Van, Huy N. Phan, Hoang N. Phan, Tien N. Nguyen

2602.20610 • 2026-02-24

OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services

Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities enabling prompt leakage attacks. Prior studies ide...

Longxiang Wang, Xiang Zheng, Xuhao Zhang, Yao Zhang, Ye Wu, Cong Wang

2602.20595 • 2026-02-24

Personal Information Parroting in Language Models

Modern language models (LMs) are trained on large scrapes of the Web, containing millions of personal information (PI) instances, many of which LMs memorize, increasing privacy risks. In this work, ...

Nishant Subramani, Kshitish Ghate, Mona Diab

2602.20580 • 2026-02-24

Efficient and Explainable End-to-End Autonomous Driving via Masked Vision-Language-Action Diffusion

Large Language Models (LLMs) and Vision-Language Models (VLMs) have emerged as promising candidates for end-to-end autonomous driving. However, these models typically face challenges in inference l...

Jiaru Zhang, Manav Gagvani, Can Cui, Juntong Peng, Ruqi Zhang, Ziran Wang

2602.20577 • 2026-02-24

CausalReasoningBenchmark: A Real-World Benchmark for Disentangled Evaluation of Causal Identification and Estimation

Many benchmarks for automated causal inference evaluate a system's performance based on a single numerical output, such as an Average Treatment Effect (ATE). This approach conflates two distinct st...

Ayush Sawarni, Jiyuan Tan, Vasilis Syrgkanis

2602.20571 • 2026-02-24

AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents

We present AIForge-Doc, the first dedicated benchmark targeting exclusively diffusion-model-based inpainting in financial and form documents with pixel-level annotation. Existing document forgery d...

Jiaqi Wu, Yuchen Zhou, Muduo Xu, Zisheng Liang, Simiao Ren, Jiayu Xue, Meige Yang, Siying Chen, J...

2602.20569 • 2026-02-24

From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production

Large language models (LLMs) are promising backbones for generative recommender systems, yet a key challenge remains underexplored: verbalization, i.e., converting structured user interaction logs ...

Yucheng Shi, Ying Li, Yu Wang, Yesu Feng, Arjun Rao, Rein Houthooft, Shradha Sehgal, Jin Wang, Ha...

2602.20558 • 2026-02-24

CAD-Prompted SAM3: Geometry-Conditioned Instance Segmentation for Industrial Objects

Verbal-prompted segmentation is inherently limited by the expressiveness of natural language and struggles with uncommon, instance-specific, or difficult-to-describe objects: scenarios frequently e...

Zhenran Tang, Rohan Nagabhirava, Changliu Liu

2602.20551 • 2026-02-24

What Drives Students' Use of AI Chatbots? Technology Acceptance in Conversational AI

Conversational AI tools have been rapidly adopted by students and are becoming part of their learning routines. To understand what drives this adoption, we draw on the Technology Acceptance Model (...

Griffin Pitts, Sanaz Motamedi

2602.20547 • 2026-02-24

Generative AI and Machine Learning Collaboration for Container Dwell Time Prediction via Data Standardization

Import container dwell time (ICDT) prediction is a key task for improving productivity in container terminals, as accurate predictions enable the reduction of container re-handling operations by ya...

Minseop Kim, Takhyeong Kim, Taekhyun Park, Hanbyeol Park, Hyerim Bae

2602.20540 • 2026-02-24

Actor-Curator: Co-adaptive Curriculum Learning via Policy-Improvement Bandits for RL Post-Training

Post-training large foundation models with reinforcement learning typically relies on massive and heterogeneous datasets, making effective curriculum learning both critical and challenging. In this...

Zhengyao Gu, Jonathan Light, Raul Astudillo, Ziyu Ye, Langzhou He, Henry Peng Zou, Wei Cheng, San...

2602.20532 • 2026-02-24

Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning

The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to tok...

Justin Lovelace, Christian Belardi, Sofian Zalouk, Adhitya Polavaram, Srivatsa Kundurthy, Kilian ...

2602.20528 • 2026-02-24
