Research

Papers

Research papers from arXiv and related sources

Total: 4694 · AI/LLM: 2583 · Testing: 2111
AI LLM

PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses

Prompt design is a primary control interface for large language models (LLMs), yet standard evaluations largely reduce performance to answer correctness, obscuring why a prompt succeeds or fails an...

Minki Hong, Eunsoo Lee, Sohyun Park, Jihie Kim

2603.10477 2026-03-11
AI LLM

Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs

The alignment of large language models (LLMs) has progressed substantially in single-agent settings through paradigms such as RLHF and Constitutional AI, with recent work exploring scalable alterna...

Panatchakorn Anantaprayoon, Nataliia Babina, Nima Asgharbeygi, Jad Tarifi

2603.10476 2026-03-11
AI LLM

Aligning Large Language Models with Searcher Preferences

The paradigm shift from item-centric ranking to answer-centric synthesis is redefining the role of search engines. While recent industrial progress has applied generative techniques to closed-set i...

Wei Wu, Peilun Zhou, Liyi Chen, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu, Yao Hu, Hui Xiong

2603.10473 2026-03-11
AI LLM

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

We study timestamped speaker-attributed ASR for long-form, multi-party speech with overlap, where chunk-wise inference must preserve meeting-level speaker identity consistency while producing time-...

Jing Peng, Ziyi Chen, Haoyu Li, Yucheng Wang, Duo Ma, Mengtian Li, Yunfan Du, Dezhu Xu, Kai Yu, S...

2603.10468 2026-03-11
AI LLM

Spatio-Temporal Forecasting of Retaining Wall Deformation: Mitigating Error Accumulation via Multi-Resolution ConvLSTM Stacking Ensemble

This study proposes a multi-resolution Convolutional Long Short-Term Memory (ConvLSTM) ensemble framework that leverages diverse temporal input resolutions to mitigate error accumulation and improv...

Jihoon Kim, Heejung Youn

2603.10453 2026-03-11
AI LLM

Machinagogy: Experiments in Staging Teaching Dramas with LLMs

This paper describes an AI tutoring system built upon two psycho-social theoretic constructs: Hegelian recognition and Freudian psychodynamics. Two related interventions are proposed: recognition-e...

Liam Magee

2603.10450 2026-03-11
AI LLM

Unlearning the Unpromptable: Prompt-free Instance Unlearning in Diffusion Models

Machine unlearning aims to remove specific outputs from trained models, often at the concept level, such as forgetting all occurrences of a particular celebrity or filtering content via text prompt...

Kyungryeol Lee, Kyeonghyun Lee, Seongmin Hong, Byung Hyun Lee, Se Young Chun

2603.10445 2026-03-11
AI LLM

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

Large language models trained on natural language exhibit pronounced anisotropy: a small number of directions concentrate disproportionate energy, while the remaining dimensions form a broad semant...

Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fanqi Yu, Ruijun Huang, Fang Dong, Xin Zha...

2603.10444 2026-03-11
TESTING

3D Spectrum Awareness for Radio Dynamic Zones Using Kriging and Matrix Completion

Radio Dynamic Zones (RDZs) are geographically defined areas specifically allocated for testing new wireless technologies. It is essential to safeguard the regular spectrum users outside the zones f...

Mushfiqur Rahman, Sung Joon Maeng, Ismail Guvenc, Chau-Wai Wong

2603.10443 2026-03-11
TESTING

CSST-PSFNet: A Point Spread Function Reconstruction Model for the CSST Based on Deep Learning

This paper presents CSST-PSFNet, a deep learning method for high-fidelity point spread function (PSF) reconstruction developed for the Chinese Space Station Survey Telescope (CSST). The model integ...

Peipei Wang, Peng Wei, Chao Liu, Rui Wang, Feng Wang, Xin Zhang

2603.10424 2026-03-11
AI LLM

World2Act: Latent Action Post-Training via Skill-Compositional World Models

World Models (WMs) have emerged as a promising approach for post-training Vision-Language-Action (VLA) policies to improve robustness and generalization under environmental changes. However, most W...

An Dinh Vuong, Tuan Van Vo, Abdullah Sohail, Haoran Ding, Liang Ma, Xiaodan Liang, Anqing Duan, I...

2603.10422 2026-03-11
AI LLM

FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System

We present FireRedASR2S, a state-of-the-art industrial-grade all-in-one automatic speech recognition (ASR) system. It integrates four modules in a unified pipeline: ASR, Voice Activity Detection (V...

Kaituo Xu, Yan Jia, Kai Huang, Junjie Chen, Wenpeng Li, Kun Liu, Feng-Long Xie, Xu Tang, Yao Hu

2603.10420 2026-03-11
AI LLM

Designing Service Systems from Textual Evidence

Designing service systems requires selecting among alternative configurations -- choosing the best chatbot variant, the optimal routing policy, or the most effective quality control procedure. In m...

Ruicheng Ao, Hongyu Chen, Siyang Gao, Hanwei Li, David Simchi-Levi

2603.10400 2026-03-11
AI LLM

Verbalizing LLM's Higher-order Uncertainty via Imprecise Probabilities

Despite the growing demand for eliciting uncertainty from large language models (LLMs), empirical evidence suggests that LLM behavior is not always adequately captured by the elicitation techniques...

Anita Yang, Krikamol Muandet, Michele Caprio, Siu Lun Chau, Masaki Adachi

2603.10396 2026-03-11
AI LLM

Don't Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw

Code agents powered by large language models can execute shell commands on behalf of users, introducing severe security vulnerabilities. This paper presents a two-phase security analysis of the Ope...

Zhengyang Shan, Jiayun Xin, Yue Zhang, Minghui Xu

2603.10387 2026-03-11
AI LLM

Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability

Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretical...

Xinyan Jiang, Ninghao Liu, Di Wang, Lijie Hu

2603.10384 2026-03-11
AI LLM

Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning

Sparse autoencoders can localize where concepts live in language models, but not how they interact during multi-step reasoning. We propose Causal Concept Graphs (CCG): a directed acyclic graph over...

Md Muntaqim Meherab, Noor Islam S. Mohammad, Faiza Feroz

2603.10377 2026-03-11
AI LLM

Reactive Writers: How Co-Writing with AI Changes How We Engage with Ideas

Emerging experimental evidence shows that writing with AI assistance can change both the views people express in writing and the opinions they hold afterwards. Yet, we lack substantive understandin...

Advait Bhat, Marianne Aubin Le Quéré, Mor Naaman, Maurice Jakesch

2603.10374 2026-03-11
AI LLM

Speech Codec Probing from Semantic and Phonetic Perspectives

Speech tokenizers are essential for connecting speech to large language models (LLMs) in multimodal systems. These tokenizers are expected to preserve both semantic and acoustic information for dow...

Xuan Shi, Chang Zeng, Tiantian Feng, Shih-Heng Wang, Jianbo Ma, Shrikanth Narayanan

2603.10371 2026-03-11
AI LLM

Dynamic Knowledge Fusion for Multi-Domain Dialogue State Tracking

The performance of task-oriented dialogue models is strongly tied to how well they track dialogue states, which record and update user information across multi-turn interactions. However, current...

Haoxiang Su, Ruiyu Fang, Liting Jiang, Xiaomeng Huang, Shuangyong Song

2603.10367 2026-03-11