Personal Assistant Web

AI LLM

Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini

While large language models have accelerated software development through "vibe coding", prototyping intelligent Extended Reality (XR) experiences remains inaccessible due to the friction of comple...

Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongyi Zhou, Xingyue Chen...

2603.24591 • 2026-03-25

AI LLM

Comparing Developer and LLM Biases in Code Evaluation

As LLMs are increasingly used as judges in code applications, they should be evaluated in realistic interactive settings that capture partial context and ambiguous intent. We present TRACE (Tool fo...

Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donahue, Ameet Talwalkar,...

2603.24586 • 2026-03-25

AI LLM

The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence

Agentic artificial intelligence (AI) in organizations is a sequential decision problem constrained by reliability and oversight cost. When deterministic workflows are replaced by stochastic policie...

Biplab Pal, Santanu Bhattacharya

2603.24582 • 2026-03-25

AI LLM

Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA

Retrieval-augmented generation (RAG) systems are increasingly used to analyze complex policy documents, but achieving sufficient reliability for expert usage remains challenging in domains characte...

Saahil Mathur, Ryan David Rittner, Vedant Ajit Thakur, Daniel Stuart Schiff, Tunazzina Islam

2603.24580 • 2026-03-25

AI LLM

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Hallucination remains a critical bottleneck for large language models (LLMs), undermining their reliability in real-world applications, especially in Retrieval-Augmented Generation (RAG) systems. W...

Zhuo Li, Yupeng Zhang, Pengyu Cheng, Jiajun Song, Mengyu Zhou, Hao Li, Shujie Hu, Yu Qin, Erchao ...

2603.24579 • 2026-03-25

AI LLM

Anti-I2V: Safeguarding your photos from malicious image-to-video generation

Advances in diffusion-based video generation models, while significantly improving human animation, pose threats of misuse through the creation of fake videos from a specific person's photo and te...

Duc Vu, Anh Nguyen, Chi Tran, Anh Tran

2603.24570 • 2026-03-25

AI LLM

Boosting LLMs for Mutation Generation

LLM-based mutation testing is a promising testing technology, but existing approaches typically rely on a fixed set of mutations as few-shot examples or none at all. This can result in generic low-...

Bo Wang, Ming Deng, Mingda Chen, Chengran Yang, Youfang Lin, Mark Harman, Mike Papadakis, Jie M. ...

2603.24560 • 2026-03-25

AI LLM

Evaluating Chunking Strategies For Retrieval-Augmented Generation in Oil and Gas Enterprise Documents

Retrieval-Augmented Generation (RAG) has emerged as a framework to address the constraints of Large Language Models (LLMs). Yet, its effectiveness fundamentally hinges on document chunking - an oft...

Samuel Taiwo, Mohd Amaluddin Yusoff

2603.24556 • 2026-03-25

AI LLM

Analysing the Safety Pitfalls of Steering Vectors

Activation steering has emerged as a powerful tool to shape LLM behavior without the need for weight updates. While its inherent brittleness and unreliability are well-documented, its safety implic...

Yuxiao Li, Alina Fastowski, Efstratios Zaradoukas, Bardh Prenkaj, Gjergji Kasneci

2603.24543 • 2026-03-25

AI LLM

Robust Multilingual Text-to-Pictogram Mapping for Scalable Reading Rehabilitation

Reading comprehension presents a significant challenge for children with Special Educational Needs and Disabilities (SEND), often requiring intensive one-on-one reading support. To assist therapist...

Soufiane Jhilal, Martina Galletti

2603.24536 • 2026-03-25

AI LLM

No Single Metric Tells the Whole Story: A Multi-Dimensional Evaluation Framework for Uncertainty Attributions

Research on explainable AI (XAI) has frequently focused on explaining model predictions. More recently, methods have been proposed to explain prediction uncertainty by attributing it to input featu...

Emily Schiller, Teodor Chiaburu, Marco Zullich, Luca Longo

2603.24524 • 2026-03-25

AI LLM

TuneShift-KD: Knowledge Distillation and Transfer for Fine-tuned Models

To embed domain-specific or specialized knowledge into pre-trained foundation models, fine-tuning using techniques such as parameter-efficient fine-tuning (e.g., LoRA) is a common practice. However,...

Yushi Guan, Jeanine Ohene-Agyei, Daniel Kwan, Jean Sebastien Dandurand, Yifei Zhang, Nandita Vija...

2603.24518 • 2026-03-25

AI LLM

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with...

Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haich...

2603.24517 • 2026-03-25

AI LLM

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

LLM agents like Claude Code can not only write code but also be used for autonomous AI research and engineering \citep{rank2026posttrainbench, novikov2025alphaevolve}. We show that an autores...

Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, Maksym A...

2603.24511 • 2026-03-25

AI LLM

Video-Only ToM: Enhancing Theory of Mind in Multimodal Large Language Models

As large language models (LLMs) continue to advance, there is increasing interest in their ability to infer human mental states and demonstrate a human-like Theory of Mind (ToM). Most existing ToM ...

Siqi Liu, Xinyang Li, Bochao Zou, Junbao Zhuo, Huimin Ma, Jiansheng Chen

2603.24484 • 2026-03-25

AI LLM

Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA

Miscalibrated confidence scores are a practical obstacle to deploying AI in clinical settings. A model that is always overconfident offers no useful signal for deferral. We present a multi-agent fr...

John Ray B. Martinez

2603.24481 • 2026-03-25

AI LLM

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can...

Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dohyung Kim, Jiwon Jeon, Dongsheng Li, Yuqin...

2603.24472 • 2026-03-25

AI LLM

Counting Without Numbers & Finding Without Words

Every year, 10 million pets enter shelters, separated from their families. Despite desperate searches by both guardians and lost animals, 70% never reunite, not because matches do not exist, but be...

Badri Narayana Patro

2603.24470 • 2026-03-25

AI LLM

Mechanic: Sorrifier-Driven Formal Decomposition Workflow for Automated Theorem Proving

Recent advances in large language models (LLMs) and LLM-based agents have substantially improved the capabilities of automated theorem proving. However, for problems requiring complex mathematical ...

Ruichen Qiu, Yichuan Cao, Junqi Liu, Dakai Guo, Xiao-Shan Gao, Lihong Zhi, Ruyong Feng

2603.24465 • 2026-03-25

AI LLM

Unleashing Vision-Language Semantics for Deepfake Video Detection

Recent Deepfake Video Detection (DFD) studies have demonstrated that pre-trained Vision-Language Models (VLMs) such as CLIP exhibit strong generalization capabilities in detecting artifacts across ...

Jiawen Zhu, Yunqi Miao, Xueyi Zhang, Jiankang Deng, Guansong Pang

2603.24454 • 2026-03-25
