Papers

Research papers from arXiv and related sources

Total: 4513 · AI/LLM: 2483 · Testing: 2030
AI LLM

LAD: Learning Advantage Distribution for Reasoning

Current reinforcement learning objectives for large-model reasoning primarily focus on maximizing expected rewards. This paradigm can lead to overfitting to dominant reward signals, while neglectin...

Wendi Li, Sharon Li

2602.20132 2026-02-23
AI LLM

To Reason or Not to: Selective Chain-of-Thought in Medical Question Answering

Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy. Methods: We propose Sele...

Zaifu Zhan, Min Zeng, Shuang Zhou, Yiran Song, Xiaoyi Chen, Yu Hou, Yifan Wu, Yang Ruan, Rui Zhang

2602.20130 2026-02-23
AI LLM

NanoKnow: How to Know What Your Language Model Knows

How do large language models (LLMs) know what they know? Answering this question has been difficult because pre-training data is often a "black box" -- unknown or inaccessible. The recent release o...

Lingwei Gu, Nour Jedidi, Jimmy Lin

2602.20122 2026-02-23
TESTING

Improving the Power of Bonferroni Adjustments under Joint Normality and Exchangeability

Bonferroni's correction is a popular tool to address multiplicity but is notorious for its low power when tests are dependent. This paper proposes a practical modification of Bonferroni's correctio...

Caleb Hiltunen, Yeonwoo Rho

2602.20118 2026-02-23
AI LLM

Benchmarking Unlearning for Vision Transformers

Research in machine unlearning (MU) has gained strong momentum: MU is now widely regarded as a critical capability for building safe and fair AI. In parallel, research into transformer architecture...

Kairan Zhao, Iurie Luca, Peter Triantafillou

2602.20114 2026-02-23
AI LLM

Align When They Want, Complement When They Need! Human-Centered Ensembles for Adaptive Human-AI Collaboration

In human-AI decision making, designing AI that complements human expertise has been a natural strategy to enhance human-AI collaboration, yet it often comes at the cost of decreased AI performance ...

Hasan Amin, Ming Yin, Rajiv Khanna

2602.20104 2026-02-23
AI LLM

BarrierSteer: LLM Safety via Learning Barrier Steering

Despite the state-of-the-art performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a major obstacle to ...

Thanh Q. Tran, Arun Verma, Kiwan Wong, Bryan Kian Hsiang Low, Daniela Rus, Wei Xiao

2602.20102 2026-02-23
AI LLM

Transcending the Annotation Bottleneck: AI-Powered Discovery in Biology and Medicine

The dependence on expert annotation has long constituted the primary rate-limiting step in the application of artificial intelligence to biomedicine. While supervised learning drove the initial wav...

Soumick Chatterjee

2602.20100 2026-02-23
AI LLM

CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching

As large language models (LLMs) witness increasing deployment in complex, high-stakes decision-making scenarios, it becomes imperative to ground their reasoning in causality rather than spurious co...

Yuzhe Wang, Yaochen Zhu, Jundong Li

2602.20094 2026-02-23
TESTING

ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation

Sequential recommendation increasingly employs latent multi-step reasoning to enhance test-time computation. Despite empirical gains, existing approaches largely drive intermediate reasoning states...

Kun Yang, Yuxuan Zhu, Yazhe Chen, Siyao Zheng, Bangyang Hong, Kangle Wu, Yabo Ni, Anxiang Zeng, C...

2602.20093 2026-02-23
AI LLM

How Retrieved Context Shapes Internal Representations in RAG

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by conditioning generation on retrieved external documents, but the effect of retrieved context is often non-trivial. In r...

Samuel Yeh, Sharon Li

2602.20091 2026-02-23
AI LLM

Do Large Language Models Understand Data Visualization Principles?

Data visualization principles, derived from decades of research in design and perception, ensure proper visual communication. While prior work has shown that large language models (LLMs) can genera...

Martin Sinnona, Valentin Bonas, Viviana Siless, Emmanuel Iarussi

2602.20084 2026-02-23
AI LLM

Machine-Generated, Machine-Checked Proofs for a Verified Compiler (Experience Report)

We report on using an agentic coding assistant (Claude Code, powered by Claude Opus 4.6) to mechanize a substantial Rocq correctness proof from scratch, with human guidance but without human proof ...

Zoe Paraskevopoulou

2602.20082 2026-02-23
AI LLM

The Digital Gorilla: Rebalancing Power in the Age of AI

Contemporary artificial intelligence (AI) policy suffers from a basic categorical error. Existing frameworks rely on analogizing AI to inherited technology types -- such as products, platforms, or ...

M. Alejandra Parra-Orlandoni, Roxanne A. Schnyder, Christopher J. Mallet

2602.20080 2026-02-23
TESTING

Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning

Scaling cooperative multi-agent reinforcement learning (MARL) is fundamentally limited by cross-agent noise: when agents share a common reward, the actions of all $N$ agents jointly determine each ...

Shan Yang, Yang Liu

2602.20078 2026-02-23
TESTING

Computational Social Choice: Research & Development

Computational social choice (COMSOC) studies principled ways to aggregate conflicting individual preferences into collective decisions. In this paper, we call for an increased effort towards Comput...

Dorothea Baumeister, Ratip Emin Berker, Niclas Boehmer, Sylvain Bouveret, Andreas Darmann, Piotr ...

2602.20074 2026-02-23
AI LLM

HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images

Accurate heat-demand maps play a crucial role in decarbonizing space heating, yet most municipalities lack detailed building-level data needed to calculate them. We introduce HeatPrompt, a zero-sho...

Kundan Thota, Xuanhao Mu, Thorsten Schlachter, Veit Hagenmeyer

2602.20066 2026-02-23
TESTING

Conservation laws, fluxes, and symmetries: lessons from a perturbative approach for self-organized turbulence

Some turbulent flows self-organize into large-scale structures, rather than breaking up into ever-smaller scales. Underpinning this phenomenon is the existence of two sign-definite quantities which...

Anna Frishman, Sébastien Gomé, Anton Svirsky

2602.20067 2026-02-23
AI LLM

Multilingual Large Language Models do not comprehend all natural languages to equal degrees

Large Language Models (LLMs) play a critical role in how humans access information. While their core use relies on comprehending written requests, our understanding of this ability is currently lim...

Natalia Moskvina, Raquel Montero, Masaya Yoshida, Ferdy Hubers, Paolo Morosi, Walid Irhaymi, Jin ...

2602.20065 2026-02-23
AI LLM

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

A conversation with a large language model (LLM) is a sequence of prompts and responses, with each response generated from the preceding conversation. AI agents build such conversations automatical...

Zac Garby, Andrew D. Gordon, David Sands

2602.20064 2026-02-23