Papers
Research papers from arXiv and related sources
AI Psychosis: Does Conversational AI Amplify Delusion-Related Language?
Conversational AI systems are increasingly used for personal reflection and emotional disclosure, raising concerns about their effects on vulnerable users. Recent anecdotal reports suggest that pro...
Soorya Ram Shimgekar, Vipin Gunda, Jiwon Kim, Violeta J. Rodriguez, Hari Sundaram, Koustuv Saha
AI as Relational Translator: Rethinking Belonging and Mutual Legibility in Cross-Cultural Contexts
Against rising global loneliness, AI companions promise connection, yet accumulating evidence suggests that, for some users and contexts, intensive companion-style use can correlate with increased ...
Yao Xiao, Rafael A. Calvo
Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination
Adversarial vulnerability in vision and hallucination in large language models are conventionally viewed as separate problems, each addressed with modality-specific patches. This study first reveal...
Dong-Xiao Zhang, Hu Lou, Jun-Jie Zhang, Jun Zhu, Deyu Meng
TextReasoningBench: Does Reasoning Really Improve Text Classification in Large Language Models?
Eliciting explicit, step-by-step reasoning traces from large language models (LLMs) has emerged as a dominant paradigm for enhancing model capabilities. Although such reasoning strategies were orig...
Xinyu Guo, Yazhou Zhang, Jing Qin
SpecZoo: An AI-Powered Platform for Spectral Analysis and Visualization in Science and Education
Astronomical spectra, which encode rich astrophysical and chemical information, are fundamental to understanding celestial objects and universal laws. The advent of large-scale spectroscopic survey...
Yuan-Hao Pu, Guo-Hong Lei, Yang Xu, Xun-Zhou Chen, Hai-Jun Tian
Plagiarism or Productivity? Students' Moral Disengagement and Behavioral Intentions to Use ChatGPT in Academic Writing
This study examined how moral disengagement influences Filipino college students' intention to use ChatGPT in academic writing. The model tested five mechanisms: moral justification, euphemistic la...
John Paul P. Miranda, Rhiziel P. Manalese, Mark Anthony A. Castro, Renen Paul M. Viado, Vernon Gr...
FDARxBench: Benchmarking Regulatory and Clinical Reasoning on FDA Generic Drug Assessment
We introduce an expert curated, real-world benchmark for evaluating document-grounded question-answering (QA) motivated by generic drug assessment, using the U.S. Food and Drug Administration (FDA)...
Betty Xiong, Jillian Fisher, Benjamin Newman, Meng Hu, Shivangi Gupta, Yejin Choi, Lanyan Fang, R...
EvidenceRL: Reinforcing Evidence Consistency for Trustworthy Language Models
Large Language Models (LLMs) are fluent but prone to hallucinations, producing answers that appear plausible yet are unsupported by available evidence. This failure is especially problematic in hig...
J. Ben Tamo, Yuxing Lu, Benoit L. Marteau, Micky C. Nnamdi, May D. Wang
Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI's Sora 2
Generative video models are increasingly capable of producing complex depictions of mental health experiences, yet little is known about how these systems represent conditions like depression. This...
Matthew Flathers, Griffin Smith, Julian Herpertz, Zhitong Zhou, John Torous
Inducing Sustained Creativity and Diversity in Large Language Models
We address a not-widely-recognized subset of exploratory search, where a user sets out on a typically long "search quest" for the perfect wedding dress, overlooked research topic, killer company id...
Queenie Luo, Gary King, Michael Puett, Michael D. Smith
ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models
Large language models (LLMs) with advanced cognitive capabilities are emerging as agents for various reasoning and planning tasks. Traditional evaluations often focus on specific reasoning or plann...
Tianlong Wang, Pinqiao Wang, Weili Shi, Sheng Li
Learning to Disprove: Formal Counterexample Generation with Large Language Models
Mathematical reasoning demands two critical, complementary skills: constructing rigorous proofs for true statements and discovering counterexamples that disprove false ones. However, current AI eff...
Zenan Li, Zhaoyu Li, Kaiyu Yang, Xiaoxing Ma, Zhendong Su
FedAgain: A Trust-Based and Robust Federated Learning Strategy for an Automated Kidney Stone Identification in Ureteroscopy
The reliability of artificial intelligence (AI) in medical imaging critically depends on its robustness to heterogeneous and corrupted images acquired with diverse devices across different hospital...
Ivan Reyes-Amezcua, Francisco Lopez-Tiro, Clément Larose, Christian Daul, Andres Mendez-Vazquez, ...
AI-Ready Control System for the Fermilab Accelerator Complex
Reliable, high-intensity operation of the Fermilab Accelerator Complex is critical to the success of the Long-Baseline Neutrino Facility and Deep Underground Neutrino Experiment. We describe the re...
Tia Miceli, Erik Gottschalk, Donovan Tooke, Evan Milton, Robert Santucci, Hayden Hoschouer, Micha...
Beyond the Desk: Barriers and Future Opportunities for AI to Assist Scientists in Embodied Physical Tasks
More scientists are now using AI, but prior studies have examined only how they use it 'at the desk' for computer-based work. However, given that scientific work often happens 'beyond the desk' at ...
Irene Hou, Alexander Qin, Lauren Cheng, Philip J. Guo
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
Off-policy problems such as policy staleness and training-inference mismatch have become a major bottleneck for training stability and further exploration in LLM RL. To enhance inference efficienc...
Chenlu Ye, Xuanchang Zhang, Yifan Hao, Zhou Yu, Ziji Zhang, Abhinav Gullapalli, Hao Chen, Jing Hu...
A Framework for Formalizing LLM Agent Security
Security in LLM agents is inherently contextual. For example, the same action taken by an agent may represent legitimate behavior or a security violation depending on whose instruction led to the a...
Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang, Neil Gong, Chenguang Wang, Dawn Song
Global Convergence of Multiplicative Updates for the Matrix Mechanism: A Collaborative Proof with Gemini 3
We analyze a fixed-point iteration $v \leftarrow φ(v)$ arising in the optimization of a regularized nuclear norm objective involving the Hadamard product structure, posed in [Denisov] in the c...
Keith Rush
Can LLMs Prove Robotic Path Planning Optimality? A Benchmark for Research-Level Algorithm Verification
Robotic path planning problems are often NP-hard, and practical solutions typically rely on approximation algorithms with provable performance guarantees for general cases. While designing such alg...
Zhengbang Yang, Md. Tasin Tazwar, Minghan Wei, Zhuangdi Zhu
Hyperagents
Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed,...
Jenny Zhang, Bingchen Zhao, Wannan Yang, Jakob Foerster, Jeff Clune, Minqi Jiang, Sam Devlin, Tat...