Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

TrustFed: Enabling Trustworthy Medical AI under Data Privacy Constraints

Protecting patient privacy remains a fundamental barrier to scaling machine learning across healthcare institutions, where centralizing sensitive data is often infeasible due to ethical, legal, and...

Vagish Kumar, Syed Bahauddin Alam, Souvik Chakraborty

2603.21656 2026-03-23
AI LLM

Are AI-assisted Development Tools Immune to Prompt Injection?

Prompt injection is listed as the number-one vulnerability class in the OWASP Top 10 for LLM Applications that can subvert LLM guardrails, disclose sensitive data, and trigger unauthorized tool use...

Charoes Huang, Xin Huang, Amin Milani Fard

2603.21642 2026-03-23
AI LLM

Auditing MCP Servers for Over-Privileged Tool Capabilities

The Model Context Protocol (MCP) has emerged as a standard for connecting Large Language Models (LLMs) to external tools and data. However, MCP servers often expose privileged capabilities, such as...

Charoes Huang, Xin Huang, Amin Milani Fard

2603.21641 2026-03-23
AI LLM

Engineering Distributed Governance for Regional Prosperity: A Socio-Technical Framework for Mitigating Under-Vibrancy via Human Data Engines

Most research in urban informatics and tourism focuses on mitigating overtourism in dense global cities. However, for regions experiencing demographic decline and structural stagnation, the primary...

Amil Khanzada, Takuji Takemoto

2603.21639 2026-03-23
AI LLM

Silicon Bureaucracy and AI Test-Oriented Education: Contamination Sensitivity and Score Confidence in LLM Benchmarks

Public benchmarks increasingly govern how large language models (LLMs) are ranked, selected, and deployed. We frame this benchmark-centered regime as Silicon Bureaucracy and AI Test-Oriented Educat...

Yiliang Song, Hongjun An, Jiangan Chen, Xuanchen Yan, Huan Song, Jiawei Shao, Xuelong Li

2603.21636 2026-03-23
AI LLM

EnterpriseLab: A Full-Stack Platform for developing and deploying agents in Enterprises

Deploying AI agents in enterprise environments requires balancing capability with data sovereignty and cost constraints. While small language models offer privacy-preserving alternatives to frontie...

Ankush Agarwal, Harsh Vishwakarma, Suraj Nagaje, Chaitanya Devaguptapu

2603.21630 2026-03-23
AI LLM

Efficient Zero-Shot AI-Generated Image Detection

The rapid progress of text-to-image models has made AI-generated images increasingly realistic, posing significant challenges for accurate detection of generated content. While training-based detec...

Ryosuke Sonoda, Ramya Srinivasan

2603.21619 2026-03-23
AI LLM

INTRYGUE: Induction-Aware Entropy Gating for Reliable RAG Uncertainty Estimation

While retrieval-augmented generation (RAG) significantly improves the factual reliability of LLMs, it does not eliminate hallucinations, so robust uncertainty quantification (UQ) remains essential....

Alexandra Bazarova, Andrei Volodichev, Daria Kotova, Alexey Zaytsev

2603.21607 2026-03-23
AI LLM

Riemannian Geometry Speaks Louder Than Words: From Graph Foundation Model to Next-Generation Graph Intelligence

Graphs provide a natural description of the complex relationships among objects, and play a pivotal role in communications, transportation, social computing, the life sciences, etc. Currently, ther...

Philip S. Yu, Li Sun

2603.21601 2026-03-23
AI LLM

A Multidisciplinary AI Board for Multimodal Dementia Characterization and Risk Assessment

Modern clinical practice increasingly depends on reasoning over heterogeneous, evolving, and incomplete patient data. Although recent advances in multimodal foundation models have improved performa...

Sheng Liu, Long Chen, Zeyun Zhao, Qinglin Gou, Qingyue Wei, Arjun Masurkar, Kevin M. Spiegler, Ph...

2603.21597 2026-03-23
AI LLM

Overview of TREC 2025 Biomedical Generative Retrieval (BioGen) Track

Recent advances in large language models (LLMs) have made significant progress across multiple biomedical tasks, including biomedical question answering, lay-language summarization of the biomedica...

Deepak Gupta, Dina Demner-Fushman, William Hersh, Steven Bedrick, Kirk Roberts

2603.21582 2026-03-23
AI LLM

Mind over Space: Can Multimodal Large Language Models Mentally Navigate?

Despite the widespread adoption of MLLMs in embodied agents, their capabilities remain largely confined to reactive planning from immediate observations, consistently failing in spatial reasoning a...

Qihui Zhu, Shouwei Ruan, Xiao Yang, Hao Jiang, Yao Huang, Shiji Zhao, Hanwei Fan, Hang Su, Xingxi...

2603.21577 2026-03-23
AI LLM

PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection

Long-context LLM inference is bottlenecked not by compute but by the O(n) memory bandwidth cost of scanning the KV cache at every decode step, a wall that no amount of arithmetic scaling can brea...

Hyoseok Park, Yeonsang Park

2603.21576 2026-03-23
AI LLM

DATASHI: A Parallel English-Tashlhiyt Corpus for Orthography Normalization and Low-Resource Language Processing

DATASHI is a new parallel English-Tashlhiyt corpus that fills a critical gap in computational resources for Amazigh languages. It contains 5,000 sentence pairs, including a 1,500-sentence subset wi...

Nasser-Eddine Monir, Zakaria Baou

2603.21571 2026-03-23
AI LLM

Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy

Large language models can rewrite text to embed hidden payloads while preserving surface-level meaning, a capability that opens covert channels between cooperating AI systems and poses challenges f...

Andrii Shportko

2603.21567 2026-03-23
AI LLM

CataractSAM-2: A Domain-Adapted Model for Anterior Segment Surgery Segmentation and Scalable Ground-Truth Annotation

We present CataractSAM-2, a domain-adapted extension of Meta's Segment Anything Model 2, designed for real-time semantic segmentation of cataract ophthalmic surgery videos with high accuracy. Posit...

Mohammad Eslami, Dhanvinkumar Ganeshkumar, Saber Kazeminasab, Michael G. Morley, Michael V. Bolan...

2603.21566 2026-03-23
AI LLM

Counterfactual Credit Policy Optimization for Multi-Agent Collaboration

Collaborative multi-agent large language models (LLMs) can solve complex reasoning tasks by decomposing roles and aggregating diverse hypotheses. Yet, reinforcement learning (RL) for such systems i...

Zhongyi Li, Wan Tian, Yikun Ban, Jinju Chen, Huiming Zhang, Yang Liu, Fuzhen Zhuang

2603.21563 2026-03-23
AI LLM

Exploring Multimodal Prompts For Unsupervised Continuous Anomaly Detection

Unsupervised Continuous Anomaly Detection (UCAD) is gaining attention for effectively addressing the catastrophic forgetting and heavy computational burden issues in traditional Unsupervised Anomal...

Mingle Zhou, Jiahui Liu, Jin Wan, Gang Li, Min Li

2603.21562 2026-03-23
AI LLM

AI In Cybersecurity Education -- Scalable Agentic CTF Design Principles and Educational Outcomes

Large language models are rapidly changing how learners acquire and demonstrate cybersecurity skills. However, when human-AI collaboration is allowed, educators still lack validated competition de...

Haoran Xi, Minghao Shao, Kimberly Milner, Venkata Sai Charan Putrevu, Nanda Rani, Meet Udeshi, Pr...

2603.21551 2026-03-23
AI LLM

LLM-Based Test Case Generation in DBMS through Monte Carlo Tree Search

Database Management Systems (DBMSs) are fundamental infrastructure for modern data-driven applications, where thorough testing with high-quality SQL test cases is essential for ensuring system reli...

Yujia Chen, Yingli Zhou, Fangyuan Zhang, Cuiyun Gao

2603.21530 2026-03-23