Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

Alignment Reduces Expressed but Not Encoded Gender Bias: A Unified Framework and Study

During training, Large Language Models (LLMs) learn social regularities that can lead to gender bias in downstream applications. Most mitigation efforts focus on reducing bias in generated outputs,...

Nour Bouchouchi, Thibault Laugel, Xavier Renard, Christophe Marsala, Marie-Jeanne Lesot, Marcin D...

2603.24125 2026-03-25
AI LLM

The Alignment Tax: Response Homogenization in Aligned LLMs and Its Implications for Uncertainty Estimation

RLHF-aligned language models exhibit response homogenization: on TruthfulQA (n=790), 40-79% of questions produce a single semantic cluster across 10 i.i.d. samples. On affected questions, sampling-...

Mingyi Liu

2603.24124 2026-03-25
TESTING

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools for Romanian

Open-source text-to-speech (TTS) frameworks have emerged as highly adaptable platforms for developing speech synthesis systems across a wide range of languages. However, their applicability is not ...

Teodora Răgman, Adrian Bogdan Stânea, Horia Cucu, Adriana Stan

2603.24116 2026-03-25
TESTING

Granular Ball Guided Stable Latent Domain Discovery for Domain-General Crowd Counting

Single-source domain generalization for crowd counting remains highly challenging because a single labeled source domain often contains heterogeneous latent domains, while test data may exhibit sev...

Fan Chen, Shuyin Xia, Yi Wang, Xinbo Gao

2603.24106 2026-03-25
AI LLM

Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization

Recently, reinforcement learning (RL) has become an important approach for improving the capabilities of large language models (LLMs). In particular, reinforcement learning from verifiable rewards...

Fei Bai, Zhipeng Chen, Chuan Hao, Ming Yang, Ran Tao, Bryan Dai, Wayne Xin Zhao, Jian Yang, Hongt...

2603.24093 2026-03-25
TESTING

Predicting Grain Growth Evolution Under Complex Thermal Profiles with Deep Learning through Thermal Descriptor Modulation

Predicting microstructure evolution during thermomechanical treatment is essential for determining the final mechanical properties of a material, yet conventional simulations based on Partial Diffe...

Pungponhavoan Tep, Marc Bernacki

2603.24090 2026-03-25
AI LLM

LGTM: Training-Free Light-Guided Text-to-Image Diffusion Model via Initial Noise Manipulation

Diffusion models have demonstrated high-quality performance in conditional text-to-image generation, particularly with structural cues such as edges, layouts, and depth. However, lighting condition...

Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser, Ko Watanabe, Riku Takahashi, Andreas Dengel

2603.24086 2026-03-25
AI LLM

LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale

Benchmarks such as MMLU suggest flagship language models approach factuality saturation, with scores above 90%. We show this picture is incomplete. LLMpedia generates encyclopedic articles ...

Muhammed Saeed, Simon Razniewski

2603.24080 2026-03-25
AI LLM

When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm

Recently, multimodal large language models (MLLMs) have emerged as a unified paradigm for language and image generation. Compared with diffusion models, MLLMs possess a much stronger capability for...

Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin, Chao Shen, Michael Backes, Yun Shen, Yang Zhang

2603.24079 2026-03-25
AI LLM

PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation

We present PosterIQ, a design-driven benchmark for poster understanding and generation, annotated across composition structure, typographic hierarchy, and semantic intent. It includes 7,765 image-a...

Yuheng Feng, Wen Zhang, Haodong Duan, Xingxing Zou

2603.24078 2026-03-25
AI LLM

ConceptKT: A Benchmark for Concept-Level Deficiency Prediction in Knowledge Tracing

Knowledge Tracing (KT) is a critical technique for modeling student knowledge to support personalized learning. However, most KT systems focus on binary correctness prediction and cannot diagnose t...

Yu-Chen Kang, Yu-Chien Tang, An-Zi Yen

2603.24073 2026-03-25
AI LLM

Enhanced Mycelium of Thought (EMoT): A Bio-Inspired Hierarchical Reasoning Architecture with Strategic Dormancy and Mnemonic Encoding

Current prompting paradigms for large language models (LLMs), including Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT), follow linear or tree-structured reasoning paths that lack persistent memo...

Florian Odi Stummer

2603.24065 2026-03-25
AI LLM

SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation

Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tas...

Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu

2603.24060 2026-03-25
TESTING

Hierarchical Spatial-Temporal Graph-Enhanced Model for Map-Matching

The integration of GNSS data into portable devices has led to the generation of vast amounts of trajectory data, which is crucial for applications such as map-matching. To tackle the limitations of...

Anjun Gao, Zhenglin Wan, Pingfu Chao, Shunyu Yao

2603.24054 2026-03-25
AI LLM

FinToolSyn: A forward synthesis Framework for Financial Tool-Use Dialogue Data with Dynamic Tool Retrieval

Tool-use capabilities are vital for Large Language Models (LLMs) in finance, a domain characterized by massive investment targets and data-intensive inquiries. However, existing data synthesis meth...

Caishuang Huang, Yang Qiao, Rongyu Zhang, Junjie Ye, Pu Lu, Wenxi Wu, Meng Zhou, Xiku Du, Tao Gui...

2603.24051 2026-03-25
AI LLM

Human Factors in Detecting AI-Generated Portraits: Age, Sex, Device, and Confidence

Generative AI now produces photorealistic portraits that circulate widely in social and newslike contexts. Human ability to distinguish real from synthetic faces is time-sensitive because image gen...

Sunwhi Kim, Sunyul Kim

2603.24048 2026-03-25
TESTING

Minimal Sufficient Representations for Self-interpretable Deep Neural Networks

Deep neural networks (DNNs) achieve remarkable predictive performance but remain difficult to interpret, largely due to overparameterization that obscures the minimal structure required for interpr...

Zhiyao Tan, Liu Li, Huazhen Lin

2603.24041 2026-03-25
AI LLM

From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs

Contextual automatic speech recognition (ASR) with Speech-LLMs is typically trained with oracle conversation history, but relies on error-prone history at inference, causing a train-test mismatch i...

Xiaoyong Guo, Nanjie Li, Zijie Zeng, Kai Wang, Hao Huang, Haihua Xu, Wei Shi

2603.24034 2026-03-25
AI LLM

Decompose and Transfer: CoT-Prompting Enhanced Alignment for Open-Vocabulary Temporal Action Detection

Open-Vocabulary Temporal Action Detection (OV-TAD) aims to classify and localize action segments in untrimmed videos for unseen categories. Previous methods rely solely on global alignment between ...

Sa Zhu, Wanqian Zhang, Lin Wang, Xiaohua Chen, Chenxu Cui, Jinchao Zhang, Bo Li

2603.24030 2026-03-25
TESTING

Blind Quality Enhancement for G-PCC Compressed Dynamic Point Clouds

Point cloud compression often introduces noticeable reconstruction artifacts, which makes quality enhancement necessary. Existing approaches typically assume prior knowledge of the distortion level...

Tian Guo, Hui Yuan, Chang Sun, Wei Zhang, Raouf Hamzaoui, Sam Kwong

2603.24026 2026-03-25