Papers
Research papers from arXiv and related sources
PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses
Prompt design is a primary control interface for large language models (LLMs), yet standard evaluations largely reduce performance to answer correctness, obscuring why a prompt succeeds or fails an...
Minki Hong, Eunsoo Lee, Sohyun Park, Jihie Kim
Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs
The alignment of large language models (LLMs) has progressed substantially in single-agent settings through paradigms such as RLHF and Constitutional AI, with recent work exploring scalable alterna...
Panatchakorn Anantaprayoon, Nataliia Babina, Nima Asgharbeygi, Jad Tarifi
Aligning Large Language Models with Searcher Preferences
The paradigm shift from item-centric ranking to answer-centric synthesis is redefining the role of search engines. While recent industrial progress has applied generative techniques to closed-set i...
Wei Wu, Peilun Zhou, Liyi Chen, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu, Yao Hu, Hui Xiong
G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
We study timestamped speaker-attributed ASR for long-form, multi-party speech with overlap, where chunk-wise inference must preserve meeting-level speaker identity consistency while producing time-...
Jing Peng, Ziyi Chen, Haoyu Li, Yucheng Wang, Duo Ma, Mengtian Li, Yunfan Du, Dezhu Xu, Kai Yu, S...
Spatio-Temporal Forecasting of Retaining Wall Deformation: Mitigating Error Accumulation via Multi-Resolution ConvLSTM Stacking Ensemble
This study proposes a multi-resolution Convolutional Long Short-Term Memory (ConvLSTM) ensemble framework that leverages diverse temporal input resolutions to mitigate error accumulation and improv...
Jihoon Kim, Heejung Youn
Machinagogy: Experiments in Staging Teaching Dramas with LLMs
This paper describes an AI tutoring system built upon two psycho-social theoretic constructs: Hegelian recognition and Freudian psychodynamics. Two related interventions are proposed: recognition-e...
Liam Magee
Unlearning the Unpromptable: Prompt-free Instance Unlearning in Diffusion Models
Machine unlearning aims to remove specific outputs from trained models, often at the concept level, such as forgetting all occurrences of a particular celebrity or filtering content via text prompt...
Kyungryeol Lee, Kyeonghyun Lee, Seongmin Hong, Byung Hyun Lee, Se Young Chun
The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
Large language models trained on natural language exhibit pronounced anisotropy: a small number of directions concentrate disproportionate energy, while the remaining dimensions form a broad semant...
Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fanqi Yu, Ruijun Huang, Fang Dong, Xin Zha...
3D Spectrum Awareness for Radio Dynamic Zones Using Kriging and Matrix Completion
Radio Dynamic Zones (RDZs) are geographically defined areas specifically allocated for testing new wireless technologies. It is essential to safeguard the regular spectrum users outside the zones f...
Mushfiqur Rahman, Sung Joon Maeng, Ismail Guvenc, Chau-Wai Wong
CSST-PSFNet: A Point Spread Function Reconstruction Model for the CSST Based on Deep Learning
This paper presents CSST-PSFNet, a deep learning method for high-fidelity point spread function (PSF) reconstruction developed for the Chinese Space Station Survey Telescope (CSST). The model integ...
Peipei Wang, Peng Wei, Chao Liu, Rui Wang, Feng Wang, Xin Zhang
World2Act: Latent Action Post-Training via Skill-Compositional World Models
World Models (WMs) have emerged as a promising approach for post-training Vision-Language-Action (VLA) policies to improve robustness and generalization under environmental changes. However, most W...
An Dinh Vuong, Tuan Van Vo, Abdullah Sohail, Haoran Ding, Liang Ma, Xiaodan Liang, Anqing Duan, I...
FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System
We present FireRedASR2S, a state-of-the-art industrial-grade all-in-one automatic speech recognition (ASR) system. It integrates four modules in a unified pipeline: ASR, Voice Activity Detection (V...
Kaituo Xu, Yan Jia, Kai Huang, Junjie Chen, Wenpeng Li, Kun Liu, Feng-Long Xie, Xu Tang, Yao Hu
Designing Service Systems from Textual Evidence
Designing service systems requires selecting among alternative configurations -- choosing the best chatbot variant, the optimal routing policy, or the most effective quality control procedure. In m...
Ruicheng Ao, Hongyu Chen, Siyang Gao, Hanwei Li, David Simchi-Levi
Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities
Despite the growing demand for eliciting uncertainty from large language models (LLMs), empirical evidence suggests that LLM behavior is not always adequately captured by the elicitation techniques...
Anita Yang, Krikamol Muandet, Michele Caprio, Siu Lun Chau, Masaki Adachi
Don't Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw
Code agents powered by large language models can execute shell commands on behalf of users, introducing severe security vulnerabilities. This paper presents a two-phase security analysis of the Ope...
Zhengyang Shan, Jiayun Xin, Yue Zhang, Minghui Xu
Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability
Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretical...
Xinyan Jiang, Ninghao Liu, Di Wang, Lijie Hu
Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning
Sparse autoencoders can localize where concepts live in language models, but not how they interact during multi-step reasoning. We propose Causal Concept Graphs (CCG): a directed acyclic graph over...
Md Muntaqim Meherab, Noor Islam S. Mohammad, Faiza Feroz
Reactive Writers: How Co-Writing with AI Changes How We Engage with Ideas
Emerging experimental evidence shows that writing with AI assistance can change both the views people express in writing and the opinions they hold afterwards. Yet, we lack substantive understandin...
Advait Bhat, Marianne Aubin Le Quéré, Mor Naaman, Maurice Jakesch
Speech Codec Probing from Semantic and Phonetic Perspectives
Speech tokenizers are essential for connecting speech to large language models (LLMs) in multimodal systems. These tokenizers are expected to preserve both semantic and acoustic information for dow...
Xuan Shi, Chang Zeng, Tiantian Feng, Shih-Heng Wang, Jianbo Ma, Shrikanth Narayanan
Dynamic Knowledge Fusion for Multi-Domain Dialogue State Tracking
The performance of task-oriented dialogue models is strongly tied to how well they track dialogue states, which record and update user information across multi-turn interactions. However, current...
Haoxiang Su, Ruiyu Fang, Liting Jiang, Xiaomeng Huang, Shuangyong Song