Papers
Research papers from arXiv and related sources
TrustFed: Enabling Trustworthy Medical AI under Data Privacy Constraints
Protecting patient privacy remains a fundamental barrier to scaling machine learning across healthcare institutions, where centralizing sensitive data is often infeasible due to ethical, legal, and...
Vagish Kumar, Syed Bahauddin Alam, Souvik Chakraborty
Are AI-assisted Development Tools Immune to Prompt Injection?
Prompt injection is listed as the number-one vulnerability class in the OWASP Top 10 for LLM Applications that can subvert LLM guardrails, disclose sensitive data, and trigger unauthorized tool use...
Charoes Huang, Xin Huang, Amin Milani Fard
Auditing MCP Servers for Over-Privileged Tool Capabilities
The Model Context Protocol (MCP) has emerged as a standard for connecting Large Language Models (LLMs) to external tools and data. However, MCP servers often expose privileged capabilities, such as...
Charoes Huang, Xin Huang, Amin Milani Fard
Engineering Distributed Governance for Regional Prosperity: A Socio-Technical Framework for Mitigating Under-Vibrancy via Human Data Engines
Most research in urban informatics and tourism focuses on mitigating overtourism in dense global cities. However, for regions experiencing demographic decline and structural stagnation, the primary...
Amil Khanzada, Takuji Takemoto
Silicon Bureaucracy and AI Test-Oriented Education: Contamination Sensitivity and Score Confidence in LLM Benchmarks
Public benchmarks increasingly govern how large language models (LLMs) are ranked, selected, and deployed. We frame this benchmark-centered regime as Silicon Bureaucracy and AI Test-Oriented Educat...
Yiliang Song, Hongjun An, Jiangan Chen, Xuanchen Yan, Huan Song, Jiawei Shao, Xuelong Li
EnterpriseLab: A Full-Stack Platform for developing and deploying agents in Enterprises
Deploying AI agents in enterprise environments requires balancing capability with data sovereignty and cost constraints. While small language models offer privacy-preserving alternatives to frontie...
Ankush Agarwal, Harsh Vishwakarma, Suraj Nagaje, Chaitanya Devaguptapu
Efficient Zero-Shot AI-Generated Image Detection
The rapid progress of text-to-image models has made AI-generated images increasingly realistic, posing significant challenges for accurate detection of generated content. While training-based detec...
Ryosuke Sonoda, Ramya Srinivasan
INTRYGUE: Induction-Aware Entropy Gating for Reliable RAG Uncertainty Estimation
While retrieval-augmented generation (RAG) significantly improves the factual reliability of LLMs, it does not eliminate hallucinations, so robust uncertainty quantification (UQ) remains essential....
Alexandra Bazarova, Andrei Volodichev, Daria Kotova, Alexey Zaytsev
Riemannian Geometry Speaks Louder Than Words: From Graph Foundation Model to Next-Generation Graph Intelligence
Graphs provide a natural description of the complex relationships among objects, and play a pivotal role in communications, transportation, social computing, the life sciences, etc. Currently, ther...
Philip S. Yu, Li Sun
A Multidisciplinary AI Board for Multimodal Dementia Characterization and Risk Assessment
Modern clinical practice increasingly depends on reasoning over heterogeneous, evolving, and incomplete patient data. Although recent advances in multimodal foundation models have improved performa...
Sheng Liu, Long Chen, Zeyun Zhao, Qinglin Gou, Qingyue Wei, Arjun Masurkar, Kevin M. Spiegler, Ph...
Overview of TREC 2025 Biomedical Generative Retrieval (BioGen) Track
Recent advances in large language models (LLMs) have made significant progress across multiple biomedical tasks, including biomedical question answering, lay-language summarization of the biomedica...
Deepak Gupta, Dina Demner-Fushman, William Hersh, Steven Bedrick, Kirk Roberts
Mind over Space: Can Multimodal Large Language Models Mentally Navigate?
Despite the widespread adoption of MLLMs in embodied agents, their capabilities remain largely confined to reactive planning from immediate observations, consistently failing in spatial reasoning a...
Qihui Zhu, Shouwei Ruan, Xiao Yang, Hao Jiang, Yao Huang, Shiji Zhao, Hanwei Fan, Hang Su, Xingxi...
PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection
Long-context LLM inference is bottlenecked not by compute but by the O(n) memory bandwidth cost of scanning the KV cache at every decode step -- a wall that no amount of arithmetic scaling can brea...
Hyoseok Park, Yeonsang Park
DATASHI: A Parallel English-Tashlhiyt Corpus for Orthography Normalization and Low-Resource Language Processing
DATASHI is a new parallel English-Tashlhiyt corpus that fills a critical gap in computational resources for Amazigh languages. It contains 5,000 sentence pairs, including a 1,500-sentence subset wi...
Nasser-Eddine Monir, Zakaria Baou
Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy
Large language models can rewrite text to embed hidden payloads while preserving surface-level meaning, a capability that opens covert channels between cooperating AI systems and poses challenges f...
Andrii Shportko
CataractSAM-2: A Domain-Adapted Model for Anterior Segment Surgery Segmentation and Scalable Ground-Truth Annotation
We present CataractSAM-2, a domain-adapted extension of Meta's Segment Anything Model 2, designed for real-time semantic segmentation of cataract ophthalmic surgery videos with high accuracy. Posit...
Mohammad Eslami, Dhanvinkumar Ganeshkumar, Saber Kazeminasab, Michael G. Morley, Michael V. Bolan...
Counterfactual Credit Policy Optimization for Multi-Agent Collaboration
Collaborative multi-agent large language models (LLMs) can solve complex reasoning tasks by decomposing roles and aggregating diverse hypotheses. Yet, reinforcement learning (RL) for such systems i...
Zhongyi Li, Wan Tian, Yikun Ban, Jinju Chen, Huiming Zhang, Yang Liu, Fuzhen Zhuang
Exploring Multimodal Prompts For Unsupervised Continuous Anomaly Detection
Unsupervised Continuous Anomaly Detection (UCAD) is gaining attention for effectively addressing the catastrophic forgetting and heavy computational burden issues in traditional Unsupervised Anomal...
Mingle Zhou, Jiahui Liu, Jin Wan, Gang Li, Min Li
AI In Cybersecurity Education -- Scalable Agentic CTF Design Principles and Educational Outcomes
Large language models are rapidly changing how learners acquire and demonstrate cybersecurity skills. However, when human--AI collaboration is allowed, educators still lack validated competition de...
Haoran Xi, Minghao Shao, Kimberly Milner, Venkata Sai Charan Putrevu, Nanda Rani, Meet Udeshi, Pr...
LLM-Based Test Case Generation in DBMS through Monte Carlo Tree Search
Database Management Systems (DBMSs) are fundamental infrastructure for modern data-driven applications, where thorough testing with high-quality SQL test cases is essential for ensuring system reli...
Yujia Chen, Yingli Zhou, Fangyuan Zhang, Cuiyun Gao