Research

Papers

Research papers from arXiv and related sources

Total: 4513; AI/LLM: 2483; Testing: 2030
AI LLM

Good Arguments Against the People Pleasers: How Reasoning Mitigates (Yet Masks) LLM Sycophancy

Alignment techniques often inadvertently induce sycophancy in LLMs. While prior studies examined this behaviour in direct-answer settings, the role of Chain-of-Thought (CoT) reasoning remains under-...

Zhaoxin Feng, Zheng Chen, Jianfei Ma, Yip Tin Po, Emmanuele Chersoni, Bo Li

2603.16643 2026-03-17
AI LLM

When AI Navigates the Fog of War

Can AI reason about a war before its trajectory becomes historically obvious? Analyzing this capability is difficult because retrospective geopolitical prediction is heavily confounded by training-...

Ming Li, Xirui Li, Tianyi Zhou

2603.16642 2026-03-17
AI LLM

FlowComposer: Composable Flows for Compositional Zero-Shot Learning

Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by recombining primitives learned from seen pairs. Recent CZSL methods built on vision-language models...

Zhenqi He, Lin Li, Long Chen

2603.16641 2026-03-17
AI LLM

Why We Need to Destroy the Illusion of Speaking to A Human: Critical Reflections On Ethics at the Front-End for LLMs

Conversation with chatbots based on Large Language Models (LLMs) such as ChatGPT has become one of the major forms of interaction with Artificial Intelligence (AI) in everyday life. What makes this...

Sarah Diefenbach, Daniel Ullrich

2603.16633 2026-03-17
AI LLM

CoEmpaTeam: Enhancing Cognitive Empathy using LLM-based Avatars and Dynamic Role Play in Virtual Reality

Cognitive empathy, the ability to understand others' perspectives, is essential for effective communication, reducing biases, and constructive negotiation. However, this skill is declining in a per...

Dehui Kong, Martin Feick, Shi Liu, Alexander Maedche

2603.16614 2026-03-17
AI LLM

Retrieval-Augmented Sketch-Guided 3D Building Generation

In the early design stage of Japanese detached houses, the lack of a unified design representation among clients, sales representatives, and designers leads to design drift and inefficient feedback...

Zhengyang Wang, Nuttapong Rochanavibhata, Yuxiao Ren, Xusheng Du, Ye Zhang, Haoran Xie

2603.16612 2026-03-17
AI LLM

Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech

Cross-lingual sentence encoders typically cover only a few hundred languages and often trade downstream quality for stronger alignment, limiting their adoption. We introduce OmniSONAR, a new family...

Omnilingual SONAR Team, João Maria Janeiro, Pere-Lluís Huguet Cabot, Ioannis Tsiamas, Yen Meng, ...

2603.16606 2026-03-17
AI LLM

Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models

Generative reward models (GRMs) for vision-language models (VLMs) often evaluate outputs via a three-stage pipeline: rubric generation, criterion-based scoring, and a final verdict. However, the in...

Weijie Qiu, Dai Guan, Junxin Wang, Zhihang Li, Yongbo Gai, Mengyu Zhou, Erchao Zhao, Xiaoxi Jiang...

2603.16600 2026-03-17
AI LLM

BATQuant: Outlier-resilient MXFP4 Quantization via Learnable Block-wise Optimization

Microscaling floating-point (MXFP) formats have emerged as a promising standard for deploying Multi-modal Large Language Models (MLLMs) and Large Language Models (LLMs) on modern accelerator archit...

Ji-Fu Li, Manyi Zhang, Xiaobo Xia, Han Bao, Haoli Bai, Zhenhua Dong, Xianzhi Yu

2603.16590 2026-03-17
AI LLM

Runtime Governance for AI Agents: Policies on Paths

AI agents -- systems that plan, reason, and act using large language models -- produce non-deterministic, path-dependent behavior that cannot be fully governed at design time, where with governed w...

Maurits Kaptein, Vassilis-Javed Khan, Andriy Podstavnychy

2603.16586 2026-03-17
AI LLM

When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective

Although outcome-based reinforcement learning (RL) significantly advances the mathematical reasoning capabilities of Large Language Models (LLMs), its reliance on computationally expensive ground-t...

Zelin Zhang, Fei Cheng, Chenhui Chu

2603.16578 2026-03-17
AI LLM

REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models

Recent progress in image generation models (IGMs) enables high-fidelity content creation but also amplifies risks, including the reproduction of copyrighted content and the generation of offensive ...

Yong Zou, Haoran Li, Fanxiao Li, Shenyang Wei, Yunyun Dong, Li Tang, Wei Zhou, Renyang Liu

2603.16576 2026-03-17
AI LLM

Malicious Or Not: Adding Repository Context to Agent Skill Classification

Agent skills extend local AI agents, such as Claude Code or Open Claw, with additional functionality, and their popularity has led to the emergence of dedicated skill marketplaces, similar to app s...

Florian Holzbauer, David Schmidt, Gabriel Gegenhuber, Sebastian Schrittwieser, Johanna Ullrich

2603.16572 2026-03-17
AI LLM

Characterizing Delusional Spirals through Human-LLM Chat Logs

As large language models (LLMs) have proliferated, disturbing anecdotal reports of negative psychological effects, such as delusions, self-harm, and "AI psychosis," have emerged in global media a...

Jared Moore, Ashish Mehta, William Agnew, Jacy Reese Anthis, Ryan Louie, Yifan Mai, Peggy Yin, My...

2603.16567 2026-03-17
AI LLM

VideoMatGen: PBR Materials through Joint Generative Modeling

We present a method for generating physically-based materials for 3D shapes based on a video diffusion transformer architecture. Our method is conditioned on input geometry and a text description, ...

Jon Hasselgren, Zheng Zeng, Milos Hasan, Jacob Munkberg

2603.16566 2026-03-17
AI LLM

BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs

Large language models (LLMs) increasingly store user preferences in persistent memory to support personalization across interactions. However, in third-party communication settings governed by soci...

Sangyeon Yoon, Sunkyoung Kim, Hyesoo Hong, Wonje Jeung, Yongil Kim, Wooseok Seo, Heuiyeen Yeen, A...

2603.16557 2026-03-17
AI LLM

EmoLLM: Appraisal-Grounded Cognitive-Emotional Co-Reasoning in Large Language Models

Large language models (LLMs) demonstrate strong cognitive intelligence (IQ), yet many real-world interactions also require emotional intelligence (EQ) to produce responses that are both factually r...

Yifei Zhang, Mingyang Li, Henry Gao, Liang Zhao

2603.16553 2026-03-17
AI LLM

CompDiff: Hierarchical Compositional Diffusion for Fair and Zero-Shot Intersectional Medical Image Generation

Generative models are increasingly used to augment medical imaging datasets for fairer AI. Yet a key assumption often goes unexamined: that generators themselves produce equally high-quality images...

Mahmoud Ibrahim, Bart Elen, Chang Sun, Gokhan Ertaylan, Michel Dumontier

2603.16551 2026-03-17
AI LLM

DanceHA: A Multi-Agent Framework for Document-Level Aspect-Based Sentiment Analysis

Aspect-Based Sentiment Intensity Analysis (ABSIA) has garnered increasing attention, though research largely focuses on domain-specific, sentence-level settings. In contrast, document-level ABSIA--...

Lei Wang, Min Huang, Eduard Dragut

2603.16546 2026-03-17
AI LLM

How often do Answers Change? Estimating Recency Requirements in Question Answering

Large language models (LLMs) often rely on outdated knowledge when answering time-sensitive questions, leading to confident yet incorrect responses. Without explicit signals indicating whether up-t...

Bhawna Piryani, Zehra Mert, Adam Jatowt

2603.16544 2026-03-17