Papers
Research papers from arXiv and related sources
Good-Enough LLM Obfuscation (GELO)
Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states, threatening prompt privacy ...
Anatoly Belikov, Ilya Fedotov
The "Gold Rush" in AI and Robotics Patenting Activity. Do innovation systems have a role?
This paper studies patenting trends in artificial intelligence (AI) and robotics from 1980 to 2019. We introduce a novel distinction between traditional robotics and robotics embedding AI functiona...
Giovanni Guidetti, Riccardo Leoncini, Mariele Macaluso
AegisUI: Behavioral Anomaly Detection for Structured User Interface Protocols in AI Agent Systems
AI agents that build user interfaces on the fly, assembling buttons, forms, and data displays from structured protocol payloads, are becoming common in production systems. The trouble is that a paylo...
Mohd Safwan Uddin, Saba Hajira
Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
As Large Language Models (LLMs) evolve from chatbots to agentic assistants, they are increasingly observed to exhibit risky behaviors when subjected to survival pressure, such as the threat of bein...
Yida Lu, Jianwei Fang, Xuyang Shao, Zixuan Chen, Shiyao Cui, Shanshan Bian, Guangyao Su, Pei Ke, ...
S5-SHB Agent: Society 5.0 enabled Multi-model Agentic Blockchain Framework for Smart Home
The smart home is a key application domain within the Society 5.0 vision for a human-centered society. As smart home ecosystems expand with heterogeneous IoT protocols, diverse devices, and evolvin...
Janani Rangila, Akila Siriweera, Incheon Paik, Keitaro Naruse, Isuru Jayanada, Vishmika Devindi
RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform
Building software repositories typically requires significant manual effort. Recent advances in large language model (LLM) agents have accelerated automation in software engineering (SWE). We intro...
Kenan Li, Rongzhi Li, Linghao Zhang, Qirui Jin, Liao Zhu, Xiaosong Huang, Geng Zhang, Yikai Zhang...
Measuring the Fragility of Trust: Devising Credibility Index via Explanation Stability (CIES) for Business Decision Support Systems
Explainable Artificial Intelligence (XAI) methods (SHAP, LIME) are increasingly adopted to interpret models in high-stakes businesses. However, the credibility of these explanations, their stabilit...
Alin-Gabriel Vaduva, Simona-Vasilica Oprea, Adela Bara
Haptics in Cognition: Disruptor or Enabler of Memory?
This exploratory pilot study investigates the impact of haptic perception, specifically tactile sensitivity (touch) and kinaesthetic intensity (movement), on learning, operationalized as informat...
Bibeg Limbu, Irene-Angelica Chounta
BioLLMAgent: A Hybrid Framework with Enhanced Structural Interpretability for Simulating Human Decision-Making in Computational Psychiatry
Computational psychiatry faces a fundamental trade-off: traditional reinforcement learning (RL) models offer interpretability but lack behavioral realism, while large language model (LLM) agents ge...
Zuo Fei, Kezhi Wang, Xiaomin Chen, Yizhou Huang
Tell2Adapt: A Unified Framework for Source Free Unsupervised Domain Adaptation via Vision Foundation Model
Source Free Unsupervised Domain Adaptation (SFUDA) is critical for deploying deep learning models across diverse clinical settings. However, existing methods are typically designed for low-gap, spe...
Yulong Shi, Shijie Li, Ziyi Li, Lin Qi
Observational and Thermodynamic aspects of one-dimensional Dark Energy EoS parametrization models
We examine the observational viability and physical implications of the Gong-Zhang (GZ) dark-energy equation-of-state parametrizations using exclusively late-time cosmological probes. Two one-dime...
Anirban Chatterjee, Yungui Gong
Poisoning the Inner Prediction Logic of Graph Neural Networks for Clean-Label Backdoor Attacks
Graph Neural Networks (GNNs) have achieved remarkable results in various tasks. Recent studies reveal that graph backdoor attacks can poison the GNN model to predict test nodes with triggers attach...
Yuxiang Zhang, Bin Ma, Enyan Dai
ThaiSafetyBench: Assessing Language Model Safety in Thai Cultural Contexts
The safety evaluation of large language models (LLMs) remains largely centered on English, leaving non-English languages and culturally grounded risks underexplored. In this work, we investigate LL...
Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul, Pakhapoom Sarapat
Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis
Can targeted user training unlock the productive potential of generative artificial intelligence (GenAI) in professional settings? We investigate this question using a randomized study involving 16...
Benjamin M. Chen, Hong Bao
Think, Then Verify: A Hypothesis-Verification Multi-Agent Framework for Long Video Understanding
Long video understanding is challenging due to dense visual redundancy, long-range temporal dependencies, and the tendency of chain-of-thought and retrieval-based agents to accumulate semantic drif...
Zheng Wang, Haoran Chen, Haoxuan Qin, Zhipeng Wei, Tianwen Qian, Cong Bai
3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a transformative paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs), yet its potential in 3D s...
Xiongkun Linghu, Jiangyong Huang, Baoxiong Jia, Siyuan Huang
VRM: Teaching Reward Models to Understand Authentic Human Preferences
Large Language Models (LLMs) have achieved remarkable success across diverse natural language tasks, yet the reward models employed for aligning LLMs often encounter challenges of reward hacking, w...
Biao Liu, Ning Xu, Junming Yang, Hao Xu, Xin Geng
Functionality-Oriented LLM Merging on the Fisher-Rao Manifold
Weight-space merging aims to combine multiple fine-tuned LLMs into a single model without retraining, yet most existing approaches remain fundamentally parameter-space heuristics. This creates thre...
Jiayu Wang, Zuojun Ye, Wenpeng Yin
MPCEval: A Benchmark for Multi-Party Conversation Generation
Multi-party conversation generation, which underpins applications such as smart reply and collaborative assistants, is an increasingly important capability of generative AI, yet its evaluation remains a critical bottleneck. Co...
Minxing Zhang, Yi Yang, Zhuofan Jia, Xuan Yang, Jian Pei, Yuchen Zang, Xingwang Deng, Xianglong Chen
When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
Preference alignment is an essential step in adapting large language models (LLMs) to human values, but existing approaches typically depend on costly human annotations or large-scale API-based mod...
Amirabbas Afzali, Myeongho Jeon, Maria Brbic