Research

Papers

Research papers from arXiv and related sources

Total: 4513 · AI/LLM: 2483 · Testing: 2030
AI LLM

AI Psychosis: Does Conversational AI Amplify Delusion-Related Language?

Conversational AI systems are increasingly used for personal reflection and emotional disclosure, raising concerns about their effects on vulnerable users. Recent anecdotal reports suggest that pro...

Soorya Ram Shimgekar, Vipin Gunda, Jiwon Kim, Violeta J. Rodriguez, Hari Sundaram, Koustuv Saha

2603.19574 2026-03-20
AI LLM

AI as Relational Translator: Rethinking Belonging and Mutual Legibility in Cross-Cultural Contexts

Against rising global loneliness, AI companions promise connection, yet accumulating evidence suggests that, for some users and contexts, intensive companion-style use can correlate with increased ...

Yao Xiao, Rafael A. Calvo

2603.19568 2026-03-20
AI LLM

Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

Adversarial vulnerability in vision and hallucination in large language models are conventionally viewed as separate problems, each addressed with modality-specific patches. This study first reveal...

Dong-Xiao Zhang, Hu Lou, Jun-Jie Zhang, Jun Zhu, Deyu Meng

2603.19562 2026-03-20
AI LLM

TextReasoningBench: Does Reasoning Really Improve Text Classification in Large Language Models?

Eliciting explicit, step-by-step reasoning traces from large language models (LLMs) has emerged as a dominant paradigm for enhancing model capabilities. Although such reasoning strategies were orig...

Xinyu Guo, Yazhou Zhang, Jing Qin

2603.19558 2026-03-20
AI LLM

SpecZoo: An AI-Powered Platform for Spectral Analysis and Visualization in Science and Education

Astronomical spectra, which encode rich astrophysical and chemical information, are fundamental to understanding celestial objects and universal laws. The advent of large-scale spectroscopic survey...

Yuan-Hao Pu, Guo-Hong Lei, Yang Xu, Xun-Zhou Chen, Hai-Jun Tian

2603.19555 2026-03-20
AI LLM

Plagiarism or Productivity? Students' Moral Disengagement and Behavioral Intentions to Use ChatGPT in Academic Writing

This study examined how moral disengagement influences Filipino college students' intention to use ChatGPT in academic writing. The model tested five mechanisms: moral justification, euphemistic la...

John Paul P. Miranda, Rhiziel P. Manalese, Mark Anthony A. Castro, Renen Paul M. Viado, Vernon Gr...

2603.19549 2026-03-20
AI LLM

FDARxBench: Benchmarking Regulatory and Clinical Reasoning on FDA Generic Drug Assessment

We introduce an expert-curated, real-world benchmark for evaluating document-grounded question-answering (QA) motivated by generic drug assessment, using the U.S. Food and Drug Administration (FDA)...

Betty Xiong, Jillian Fisher, Benjamin Newman, Meng Hu, Shivangi Gupta, Yejin Choi, Lanyan Fang, R...

2603.19539 2026-03-20
AI LLM

EvidenceRL: Reinforcing Evidence Consistency for Trustworthy Language Models

Large Language Models (LLMs) are fluent but prone to hallucinations, producing answers that appear plausible yet are unsupported by available evidence. This failure is especially problematic in hig...

J. Ben Tamo, Yuxing Lu, Benoit L. Marteau, Micky C. Nnamdi, May D. Wang

2603.19532 2026-03-20
AI LLM

Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI's Sora 2

Generative video models are increasingly capable of producing complex depictions of mental health experiences, yet little is known about how these systems represent conditions like depression. This...

Matthew Flathers, Griffin Smith, Julian Herpertz, Zhitong Zhou, John Torous

2603.19527 2026-03-19
AI LLM

Inducing Sustained Creativity and Diversity in Large Language Models

We address a not-widely-recognized subset of exploratory search, where a user sets out on a typically long "search quest" for the perfect wedding dress, overlooked research topic, killer company id...

Queenie Luo, Gary King, Michael Puett, Michael D. Smith

2603.19519 2026-03-19
AI LLM

ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models

Large language models (LLMs) with advanced cognitive capabilities are emerging as agents for various reasoning and planning tasks. Traditional evaluations often focus on specific reasoning or plann...

Tianlong Wang, Pinqiao Wang, Weili Shi, Sheng Li

2603.19515 2026-03-19
AI LLM

Learning to Disprove: Formal Counterexample Generation with Large Language Models

Mathematical reasoning demands two critical, complementary skills: constructing rigorous proofs for true statements and discovering counterexamples that disprove false ones. However, current AI eff...

Zenan Li, Zhaoyu Li, Kaiyu Yang, Xiaoxing Ma, Zhendong Su

2603.19514 2026-03-19
AI LLM

FedAgain: A Trust-Based and Robust Federated Learning Strategy for an Automated Kidney Stone Identification in Ureteroscopy

The reliability of artificial intelligence (AI) in medical imaging critically depends on its robustness to heterogeneous and corrupted images acquired with diverse devices across different hospital...

Ivan Reyes-Amezcua, Francisco Lopez-Tiro, Clément Larose, Christian Daul, Andres Mendez-Vazquez, ...

2603.19512 2026-03-19
AI LLM

AI-Ready Control System for the Fermilab Accelerator Complex

Reliable, high-intensity operation of the Fermilab Accelerator Complex is critical to the success of the Long-Baseline Neutrino Facility and Deep Underground Neutrino Experiment. We describe the re...

Tia Miceli, Erik Gottschalk, Donovan Tooke, Evan Milton, Robert Santucci, Hayden Hoschouer, Micha...

2603.19507 2026-03-19
AI LLM

Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks

More scientists are now using AI, but prior studies have examined only how they use it 'at the desk' for computer-based work. However, given that scientific work often happens 'beyond the desk' at ...

Irene Hou, Alexander Qin, Lauren Cheng, Philip J. Guo

2603.19504 2026-03-19
AI LLM

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

Off-policy problems such as policy staleness and training-inference mismatch have become a major bottleneck for training stability and further exploration in LLM RL. To enhance inference efficienc...

Chenlu Ye, Xuanchang Zhang, Yifan Hao, Zhou Yu, Ziji Zhang, Abhinav Gullapalli, Hao Chen, Jing Hu...

2603.19470 2026-03-19
AI LLM

A Framework for Formalizing LLM Agent Security

Security in LLM agents is inherently contextual. For example, the same action taken by an agent may represent legitimate behavior or a security violation depending on whose instruction led to the a...

Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang, Neil Gong, Chenguang Wang, Dawn Song

2603.19469 2026-03-19
AI LLM

Global Convergence of Multiplicative Updates for the Matrix Mechanism: A Collaborative Proof with Gemini 3

We analyze a fixed-point iteration $v \leftarrow \varphi(v)$ arising in the optimization of a regularized nuclear-norm objective involving the Hadamard product structure, posed in [denisov] in the c...

Keith Rush

2603.19465 2026-03-19
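The abstract above centers on a fixed-point iteration $v \leftarrow \varphi(v)$. As a minimal generic sketch only: the paper's actual map $\varphi$ is not reproduced here, so a hypothetical contraction stands in for it.

```python
def iterate_fixed_point(phi, v0, tol=1e-10, max_iter=1000):
    """Repeatedly apply v <- phi(v) until successive iterates agree within tol."""
    v = v0
    for _ in range(max_iter):
        v_next = phi(v)
        if abs(v_next - v) < tol:
            return v_next
        v = v_next
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical contraction standing in for the paper's phi; its fixed point is 2.
phi = lambda v: 0.5 * v + 1.0
result = iterate_fixed_point(phi, 0.0)
```

For a contraction like this, convergence to the unique fixed point is guaranteed by the Banach fixed-point theorem; the paper's global-convergence analysis concerns its specific multiplicative-update map, not this stand-in.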
AI LLM

Can LLMs Prove Robotic Path Planning Optimality? A Benchmark for Research-Level Algorithm Verification

Robotic path planning problems are often NP-hard, and practical solutions typically rely on approximation algorithms with provable performance guarantees for general cases. While designing such alg...

Zhengbang Yang, Md. Tasin Tazwar, Minghan Wei, Zhuangdi Zhu

2603.19464 2026-03-19
AI LLM

Hyperagents

Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed,...

Jenny Zhang, Bingchen Zhao, Wannan Yang, Jakob Foerster, Jeff Clune, Minqi Jiang, Sam Devlin, Tat...

2603.19461 2026-03-19