Papers
Research papers from arXiv and related sources
Alignment Reduces Expressed but Not Encoded Gender Bias: A Unified Framework and Study
During training, Large Language Models (LLMs) learn social regularities that can lead to gender bias in downstream applications. Most mitigation efforts focus on reducing bias in generated outputs,...
Nour Bouchouchi, Thibault Laugel, Xavier Renard, Christophe Marsala, Marie-Jeanne Lesot, Marcin D...
The Alignment Tax: Response Homogenization in Aligned LLMs and Its Implications for Uncertainty Estimation
RLHF-aligned language models exhibit response homogenization: on TruthfulQA (n=790), 40-79% of questions produce a single semantic cluster across 10 i.i.d. samples. On affected questions, sampling-...
Mingyi Liu
How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools for Romanian
Open-source text-to-speech (TTS) frameworks have emerged as highly adaptable platforms for developing speech synthesis systems across a wide range of languages. However, their applicability is not ...
Teodora Răgman, Adrian Bogdan Stânea, Horia Cucu, Adriana Stan
Granular Ball Guided Stable Latent Domain Discovery for Domain-General Crowd Counting
Single-source domain generalization for crowd counting remains highly challenging because a single labeled source domain often contains heterogeneous latent domains, while test data may exhibit sev...
Fan Chen, Shuyin Xia, Yi Wang, Xinbo Gao
Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization
Recently, reinforcement learning (RL) has become an important approach for improving the capabilities of large language models (LLMs). In particular, reinforcement learning from verifiable rewards...
Fei Bai, Zhipeng Chen, Chuan Hao, Ming Yang, Ran Tao, Bryan Dai, Wayne Xin Zhao, Jian Yang, Hongt...
Predicting Grain Growth Evolution Under Complex Thermal Profiles with Deep Learning through Thermal Descriptor Modulation
Predicting microstructure evolution during thermomechanical treatment is essential for determining the final mechanical properties of a material, yet conventional simulations based on Partial Diffe...
Pungponhavoan Tep, Marc Bernacki
LGTM: Training-Free Light-Guided Text-to-Image Diffusion Model via Initial Noise Manipulation
Diffusion models have demonstrated high-quality performance in conditional text-to-image generation, particularly with structural cues such as edges, layouts, and depth. However, lighting condition...
Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser, Ko Watanabe, Riku Takahashi, Andreas Dengel
LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale
Benchmarks such as MMLU suggest flagship language models approach factuality saturation, with scores above 90%. We show this picture is incomplete. LLMpedia generates encyclopedic articles ...
Muhammed Saeed, Simon Razniewski
When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm
Recently, multimodal large language models (MLLMs) have emerged as a unified paradigm for language and image generation. Compared with diffusion models, MLLMs possess a much stronger capability for...
Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin, Chao Shen, Michael Backes, Yun Shen, Yang Zhang
PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation
We present PosterIQ, a design-driven benchmark for poster understanding and generation, annotated across composition structure, typographic hierarchy, and semantic intent. It includes 7,765 image-a...
Yuheng Feng, Wen Zhang, Haodong Duan, Xingxing Zou
ConceptKT: A Benchmark for Concept-Level Deficiency Prediction in Knowledge Tracing
Knowledge Tracing (KT) is a critical technique for modeling student knowledge to support personalized learning. However, most KT systems focus on binary correctness prediction and cannot diagnose t...
Yu-Chen Kang, Yu-Chien Tang, An-Zi Yen
Enhanced Mycelium of Thought (EMoT): A Bio-Inspired Hierarchical Reasoning Architecture with Strategic Dormancy and Mnemonic Encoding
Current prompting paradigms for large language models (LLMs), including Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT), follow linear or tree-structured reasoning paths that lack persistent memo...
Florian Odi Stummer
SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation
Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tas...
Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu
Hierarchical Spatial-Temporal Graph-Enhanced Model for Map-Matching
The integration of GNSS data into portable devices has led to the generation of vast amounts of trajectory data, which is crucial for applications such as map-matching. To tackle the limitations of...
Anjun Gao, Zhenglin Wan, Pingfu Chao, Shunyu Yao
FinToolSyn: A forward synthesis Framework for Financial Tool-Use Dialogue Data with Dynamic Tool Retrieval
Tool-use capabilities are vital for Large Language Models (LLMs) in finance, a domain characterized by massive investment targets and data-intensive inquiries. However, existing data synthesis meth...
Caishuang Huang, Yang Qiao, Rongyu Zhang, Junjie Ye, Pu Lu, Wenxi Wu, Meng Zhou, Xiku Du, Tao Gui...
Human Factors in Detecting AI-Generated Portraits: Age, Sex, Device, and Confidence
Generative AI now produces photorealistic portraits that circulate widely in social and newslike contexts. Human ability to distinguish real from synthetic faces is time-sensitive because image gen...
Sunwhi Kim, Sunyul Kim
Minimal Sufficient Representations for Self-interpretable Deep Neural Networks
Deep neural networks (DNNs) achieve remarkable predictive performance but remain difficult to interpret, largely due to overparameterization that obscures the minimal structure required for interpr...
Zhiyao Tan, Liu Li, Huazhen Lin
From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs
Contextual automatic speech recognition (ASR) with Speech-LLMs is typically trained with oracle conversation history, but relies on error-prone history at inference, causing a train-test mismatch i...
Xiaoyong Guo, Nanjie Li, Zijie Zeng, Kai Wang, Hao Huang, Haihua Xu, Wei Shi
Decompose and Transfer: CoT-Prompting Enhanced Alignment for Open-Vocabulary Temporal Action Detection
Open-Vocabulary Temporal Action Detection (OV-TAD) aims to classify and localize action segments in untrimmed videos for unseen categories. Previous methods rely solely on global alignment between ...
Sa Zhu, Wanqian Zhang, Lin Wang, Xiaohua Chen, Chenxu Cui, Jinchao Zhang, Bo Li
Blind Quality Enhancement for G-PCC Compressed Dynamic Point Clouds
Point cloud compression often introduces noticeable reconstruction artifacts, which makes quality enhancement necessary. Existing approaches typically assume prior knowledge of the distortion level...
Tian Guo, Hui Yuan, Chang Sun, Wei Zhang, Raouf Hamzaoui, Sam Kwong