Research

Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
AI LLM

CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while maintaining competitive performance. However, Hidd...

Alexandre Le Mercier, Thomas Demeester, Chris Develder

2603.12206 2026-03-12
AI LLM

Long-Context Encoder Models for Polish Language Understanding

While decoder-only Large Language Models (LLMs) have recently dominated the NLP landscape, encoder-only architectures remain a cost-effective and parameter-efficient standard for discriminative tas...

Sławomir Dadas, Rafał Poświata, Marek Kozłowski, Małgorzata Grębowiec, Michał Perełkiewicz, Paweł...

2603.12191 2026-03-12
TESTING

Shifted-geodesic approximation for spinning-body gravitational wave fluxes

We present a shifted-geodesic framework for computing gravitational-wave fluxes from spinning test bodies moving on bound orbits of Kerr black holes. The method provides a simple and efficient mean...

Lisa V. Drummond, Scott A. Hughes, Viktor Skoupý, Philip Lynch, Gabriel Andres Piovano

2603.12189 2026-03-12
TESTING

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Multimodal agents offer a promising path to automating complex document-intensive workflows. Yet, a critical question remains: do these agents demonstrate genuine strategic reasoning, or merely sto...

Łukasz Borchmann, Jordy Van Landeghem, Michał Turski, Shreyansh Padarha, Ryan Othniel Kearns, Ada...

2603.12180 2026-03-12
AI LLM

BehaviorVLM: Unified Finetuning-Free Behavioral Understanding with Vision-Language Reasoning

Understanding freely moving animal behavior is central to neuroscience, where pose estimation and behavioral understanding form the foundation for linking neural activity to natural actions. Yet bo...

Jingyang Ke, Weihan Li, Amartya Pradhan, Jeffrey Markowitz, Anqi Wu

2603.12176 2026-03-12
AI LLM

QAQ: Bidirectional Semantic Coherence for Selecting High-Quality Synthetic Code Instructions

Synthetic data has become essential for training code generation models, yet it introduces significant noise and hallucinations that are difficult to detect with current metrics. Existing data sele...

Jiayin Lei, Ming Ma, Yunxi Duan, Chenxi Li, Tianming Yang

2603.12165 2026-03-12
AI LLM

Investigating student perceptions of creativity and generative AI in computational physics

Generative Artificial Intelligence (Gen-AI) is rapidly becoming more integrated into today's classrooms at all levels of education. In higher education, Gen-AI is often seen as a resource for stude...

Pachi Her, Patti Hamerski

2603.12154 2026-03-12
AI LLM

LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

The rapid advancement of large language models (LLMs) has accelerated progress toward universal AI assistants. However, existing benchmarks for personalized assistants remain misaligned with real-w...

Feiyu Duan, Xuanjing Huang, Zhongyu Wei

2603.12152 2026-03-12
AI LLM

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL

While scaling laws guide compute allocation for LLM pre-training, analogous prescriptions for reinforcement learning (RL) post-training of large language models (LLMs) remain poorly understood. We ...

Zhoujun Cheng, Yutao Xie, Yuxiao Qu, Amrith Setlur, Shibo Hao, Varad Pimpalkhute, Tongtong Liang,...

2603.12151 2026-03-12
TESTING

Linking Perception, Confidence and Accuracy in MLLMs

Recent advances in Multi-modal Large Language Models (MLLMs) have predominantly focused on enhancing visual perception to improve accuracy. However, a critical question remains unexplored: Do model...

Yuetian Du, Yucheng Wang, Rongyu Zhang, Zhijie Xu, Boyu Yang, Ming Kong, Jie Liu, Qiang Zhu

2603.12149 2026-03-12
TESTING

Automatic Generation of High-Performance RL Environments

Translating complex reinforcement learning (RL) environments into high-performance implementations has traditionally required months of specialized engineering. We present a reusable recipe - a gen...

Seth Karten, Rahul Dev Appapogu, Chi Jin

2603.12145 2026-03-12
TESTING

TopoBench: Benchmarking LLMs on Hard Topological Reasoning

Solving topological grid puzzles requires reasoning over global spatial invariants such as connectivity, loop closure, and region symmetry and remains challenging for even the most powerful large l...

Mayug Maniparambil, Nils Hoehing, Janak Kapuriya, Arjun Karuvally, Ellen Rushe, Anthony Ventresqu...

2603.12133 2026-03-12
AI LLM

Increasing intelligence in AI agents can worsen collective outcomes

When resources are scarce, will a population of AI agents coordinate in harmony, or descend into tribal chaos? Diverse decision-making AI from different developers is entering everyday devices -- f...

Neil F. Johnson

2603.12129 2026-03-12
TESTING

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforwa...

Tae-Eun Song

2603.12123 2026-03-12
TESTING

CRAFT: A Tendon-Driven Hand with Hybrid Hard-Soft Compliance

We introduce CRAFT hand, a tendon-driven anthropomorphic hand with hybrid hard-soft compliance for contact-rich manipulation. The design is based on a simple idea: contact is not uniform across the...

Leo Lin, Shivansh Patel, Jay Moon, Svetlana Lazebnik, Unnat Jain

2603.12120 2026-03-12
AI LLM

Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models

Any-to-Any models are an emerging class of multimodal models that accept combinations of multimodal data (e.g., text, image, video, audio) as input and generate them as output. Serving these models...

Jae-Won Chung, Jeff J. Ma, Jisang Ahn, Yizhuo Liang, Akshay Jajoo, Myungjin Lee, Mosharaf Chowdhury

2603.12118 2026-03-12
TESTING

SommBench: Assessing Sommelier Expertise of Language Models

With the rapid advances of large language models, it becomes increasingly important to systematically evaluate their multilingual and multicultural capabilities. Previous cultural evaluation benchm...

William Brach, Tomas Bedej, Jacob Nielsen, Jacob Pichna, Juraj Bedej, Eemeli Saarensilta, Julie D...

2603.12117 2026-03-12
AI LLM

On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents

Reinforcement learning (RL) with outcome-based rewards has achieved significant success in training large language model (LLM) agents for complex reasoning tasks. However, in active reasoning where...

Deyu Zou, Yongqiang Chen, Fan Feng, Mufei Li, Pan Li, Yu Gong, James Cheng

2603.12109 2026-03-12
AI LLM

To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times

Large Language Models (LLMs) have recently been shown to produce estimates of psycholinguistic norms, such as valence, arousal, or concreteness, for words and multiword expressions, that correlate ...

Thomas Hikaru Clark, Carlos Arriaga, Javier Conde, Gonzalo Martínez, Pedro Reviriego

2603.12105 2026-03-12
AI LLM

Human-Centred LLM Privacy Audits: Findings and Frictions

Large language models (LLMs) learn statistical associations from massive training corpora and user interactions, and deployed systems can surface or infer information about individuals. Yet people ...

Dimitri Staufer, Kirsten Morehouse, David Hartmann, Bettina Berendt

2603.12094 2026-03-12