Papers
Research papers from arXiv and related sources
Bridging Physically Based Rendering and Diffusion Models with Stochastic Differential Equation
Diffusion-based image generators excel at producing realistic content from text or image conditions, but they offer only limited explicit control over low-level, physically grounded shading and mat...
Junwei Shu, Wenjie Liu, Changgu Chen, Hantang Liu, Yang Li, Changbo Wang
Deep Reinforcement Learning Based Block Coordinate Descent for Downlink Weighted Sum-rate Maximization on AI-Native Wireless Networks
This paper introduces a deep reinforcement learning-based block coordinate descent (DRL-based BCD) algorithm to address the nonconvex weighted sum-rate maximization (WSRM) problem with a total powe...
Siya Chen, Chee Wei Tan, H. Vincent Poor
CleanStyle: Plug-and-Play Style Conditioning Purification for Text-to-Image Stylization
Style transfer in diffusion models enables controllable visual generation by injecting the style of a reference image. However, recent encoder-based methods, while efficient and tuning-free, often ...
Xiaoman Feng, Mingkun Lei, Yang Wang, Dingwen Fu, Chi Zhang
AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs
The integration of external data services (e.g., Model Context Protocol, MCP) has made large language model-based agents increasingly powerful for complex task execution. However, this advancement ...
Che Wang, Jiaming Zhang, Ziqi Zhang, Zijie Wang, Yinghui Wang, Jianbo Gao, Tao Wei, Zhong Chen, W...
PackMonitor: Enabling Zero Package Hallucinations Through Decoding-Time Monitoring
As Large Language Models (LLMs) are increasingly integrated into software development workflows, their trustworthiness has become a critical concern. However, in dependency recommendation scenarios...
Xiting Liu, Yuetong Liu, Yitong Zhang, Jia Li, Shi-Min Hu
Counterfactual Simulation Training for Chain-of-Thought Faithfulness
Inspecting Chain-of-Thought reasoning is among the most common means of understanding why an LLM produced its output. But well-known problems with CoT faithfulness severely limit what insights can ...
Peter Hase, Christopher Potts
Onboard-Targeted Segmentation of Straylight in Space Camera Sensors
This study details an artificial intelligence (AI)-based methodology for the semantic segmentation of space camera faults. Specifically, we address the segmentation of straylight effects induced by...
Riccardo Gallon, Fabian Schiemenz, Alessandra Menicucci, Eberhard Gill
ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction
Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typic...
Che Wang, Fuyao Zhang, Jiaming Zhang, Ziqi Zhang, Yinghui Wang, Longtao Huang, Jianbo Gao, Zhong ...
PromptCD: Test-Time Behavior Enhancement via Polarity-Prompt Contrastive Decoding
Reliable AI systems require large language models (LLMs) to exhibit behaviors aligned with human preferences and values. However, most existing alignment approaches operate at training time and rel...
Baolong Bi, Yuyao Ge, Shenghua Liu, Yuchen He, Siqian Tong, Lizhe Chen, Lingrui Mei, Zehao Li, Yi...
Agile V: A Compliance-Ready Framework for AI-Augmented Engineering -- From Concept to Audit-Ready Delivery
Current AI-assisted engineering workflows lack a built-in mechanism to maintain task-level verification and regulatory traceability at machine-speed delivery. Agile V addresses this gap by embeddin...
Christopher Koch, Joshua Andreas Wellbrock
Grid-Mind: An LLM-Orchestrated Multi-Fidelity Agent for Automated Connection Impact Assessment
Large language models (LLMs) have demonstrated remarkable tool-use capabilities, yet their application to power system operations remains largely unexplored. This paper presents Grid-Mind, a domain...
Mohamed Shamseldein
UrbanFM: Scaling Urban Spatio-Temporal Foundation Models
Urban systems, as dynamic complex systems, continuously generate spatio-temporal data streams that encode the fundamental laws of human mobility and city evolution. While AI for Science has witness...
Wei Chen, Yuqian Wu, Junle Chen, Xiaofang Zhou, Yuxuan Liang
PRECTR-V2: Unified Relevance-CTR Framework with Cross-User Preference Mining, Exposure Bias Correction, and LLM-Distilled Encoder Optimization
In search systems, effectively coordinating the two core objectives of search relevance matching and click-through rate (CTR) prediction is crucial for discovering users' interests and enhancing pl...
Shuzhi Cao, Rong Chen, Ailong He, Shuguang Han, Jufeng Chen
BBQ-to-Image: Numeric Bounding Box and Qolor Control in Large-Scale Text-to-Image Models
Text-to-image models have rapidly advanced in realism and controllability, with recent approaches leveraging long, detailed captions to support fine-grained generation. However, a fundamental param...
Eliran Kachlon, Alexander Visheratin, Nimrod Sarid, Tal Hacham, Eyal Gutflaish, Saar Huberman, He...
Autonomous Laboratory Agent via Customized Domain-Specific Language Model and Modular AI Interface
We introduce a system architecture that addresses a fundamental challenge in deploying language-model agents for autonomous control of scientific instrumentation: ensuring reliability in safety-cri...
Zhuo Diao, Kouma Matsumoto, Linfeng Hou, Hayato Yamashita, Masayuki Abe
AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?
Custom Storyboard Generation (CSG) aims to produce high-quality, multi-character consistent storytelling. Current approaches based on static diffusion models, whether used in a one-shot manner or w...
Hailong Yan, Shice Liu, Tao Wang, Xiangtao Zhang, Yijie Zhong, Jinwei Chen, Le Zhang, Bo Li
ICSSPulse: A Modular LLM-Assisted Platform for Industrial Control System Penetration Testing
It is well established that industrial control systems comprise the operational backbone of modern critical infrastructures, yet their increasing connectivity exposes them to cyber threats that are...
Michail Takaronis, Athanasia Kollarou, Vyron Kampourakis, Vasileios Gkioulos, Sokratis Katsikas
TOM: A Ternary Read-only Memory Accelerator for LLM-powered Edge Intelligence
The deployment of Large Language Models (LLMs) for real-time intelligence on edge devices is rapidly growing. However, conventional hardware architectures face a fundamental memory wall challenge, ...
Hongyi Guan, Yijia Zhang, Wenqiang Wang, Yizhao Gao, Shijie Cao, Chen Zhang, Ningyi Xu
Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training
Overlapping communication with computation is crucial for distributed large-model training, yet optimizing it, especially when computation becomes the bottleneck, remains challenging. We present La...
Guanbin Xu, ZhenGuo Xu, Yuzhe Li, Youhui Bai, Ping Gong, Chaoyi Ruan, Cheng Li
CARE: An Explainable Computational Framework for Assessing Client-Perceived Therapeutic Alliance Using Large Language Models
Client perceptions of the therapeutic alliance are critical for counseling effectiveness. Accurately capturing these perceptions remains challenging, as traditional post-session questionnaires are ...
Anqi Li, Chenxiao Wang, Yu Lu, Renjun Xu, Lizhi Ma, Zhenzhong Lan