Research

Papers

Research papers from arXiv and related sources

Total: 4513 AI/LLM: 2483 Testing: 2030
AI LLM

Not All Features Are Created Equal: A Mechanistic Study of Vision-Language-Action Models

Vision-Language-Action (VLA) models combine perception, language, and motor control in a single architecture, yet how they translate multimodal inputs into actions remains poorly understood. We app...

Bryce Grant, Xijia Zhao, Peng Wang

2603.19233 2026-03-19
AI LLM

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals ...

Yogesh Agrawal, Aniruddha Dutta, Md Mahadi Hasan, Santu Karmaker, Aritra Dutta

2603.19225 2026-03-19
AI LLM

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly availabl...

Ziyin Zhang, Zihan Liao, Hang Yu, Peng Di, Rui Wang

2603.19223 2026-03-19
AI LLM

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical...

Zhuolin Yang, Zihan Liu, Yang Chen, Wenliang Dai, Boxin Wang, Sheng-Chieh Lin, Chankyu Lee, Yangy...

2603.19220 2026-03-19
AI LLM

LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

Recent advancements in omnimodal large language models (OmniLLMs) have significantly improved the comprehension of audio and video inputs. However, current evaluations primarily focus on short audi...

Keda Tao, Yuhua Zheng, Jia Xu, Wenjie Du, Kele Shao, Hesong Wang, Xueyi Chen, Xin Jin, Junhan Zhu...

2603.19217 2026-03-19
AI LLM

$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal Equivalence

Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three specia...

Dimitri Kanevsky, Julian Salazar, Matt Harvey

2603.19215 2026-03-19
AI LLM

Constitutive vs. Corrective: A Causal Taxonomy of Human Runtime Involvement in AI Systems

As AI systems increasingly permeate high-stakes decision-making, the terminology regarding human involvement - Human-in-the-Loop (HITL), Human-on-the-Loop (HOTL), and Human Oversight - has become v...

Kevin Baum, Johann Laux

2603.19213 2026-03-19
AI LLM

Tinted Frames: Question Framing Blinds Vision-Language Models

Vision-Language Models (VLMs) have been shown to be blind, often underutilizing their visual inputs even on tasks that require visual reasoning. In this work, we demonstrate that VLMs are selective...

Wan-Cyuan Fan, Jiayun Luo, Declan Kutscher, Leonid Sigal, Ritwik Gupta

2603.19203 2026-03-19
AI LLM

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how ...

Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang, Zhehuai Chen, Sung-Feng Huang, Chih-Kai Yang, Yi-Cheng...

2603.19195 2026-03-19
AI LLM

Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- su...

Zou Qiang

2603.19182 2026-03-19
AI LLM

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity t...

Edward Lin, Sahil Modi, Siva Kumar Sastry Hari, Qijing Huang, Zhifan Ye, Nestor Qin, Fengzhe Zhou...

2603.19173 2026-03-19
AI LLM

Evaluating Counterfactual Strategic Reasoning in Large Language Models

We evaluate Large Language Models (LLMs) in repeated game-theoretic settings to assess whether strategic performance reflects genuine reasoning or reliance on memorized patterns. We consider two ca...

Dimitrios Georgousis, Maria Lymperaiou, Angeliki Dimitriou, Giorgos Filandrianos, Giorgos Stamou

2603.19167 2026-03-19
AI LLM

cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization

Combinatorial optimization problems arise in logistics, scheduling, and resource allocation, yet existing approaches face a fundamental trade-off among generality, performance, and usability. We pr...

Yuyang Liu

2603.19163 2026-03-19
AI LLM

Adaptive Auxiliary Prompt Blending for Target-Faithful Diffusion Generation

Diffusion-based text-to-image (T2I) models have made remarkable progress in generating photorealistic and semantically rich images. However, when the target concepts lie in low-density regions of t...

Kwanyoung Lee, SeungJu Cha, Yebin Ahn, Hyunwoo Oh, Sungho Koh, Dong-Jin Kim

2603.19158 2026-03-19
AI LLM

ADAPT: Attention Driven Adaptive Prompt Scheduling and InTerpolating Orthogonal Complements for Rare Concepts Generation

Generating rare compositional concepts in text-to-image synthesis remains a challenge for diffusion models, particularly for attributes that are uncommon in the training data. While recent approach...

Kwanyoung Lee, Hyunwoo Oh, SeungJu Cha, Sungho Koh, Dong-Jin Kim

2603.19157 2026-03-19
AI LLM

UGID: Unified Graph Isomorphism for Debiasing Large Language Models

Large language models (LLMs) exhibit pronounced social biases. Output-level or data-optimization--based debiasing methods cannot fully resolve these biases, and many prior works have shown that bia...

Zikang Ding, Junchi Yao, Junhao Li, Yi Zhang, Wenbo Jiang, Hongbo Liu, Lijie Hu

2603.19144 2026-03-19
AI LLM

Implicit Patterns in LLM-Based Binary Analysis

Binary vulnerability analysis is increasingly performed by LLM-based agents in an iterative, multi-pass manner, with the model as the core decision-maker. However, how such systems organize explora...

Qiang Li, XiangRui Zhang, Haining Wang

2603.19138 2026-03-19
AI LLM

A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference

Recent advancements and widespread adoption of Large Language Models (LLMs) in both industry and academia have catalyzed significant demand for LLM serving. However, traditional cloud services incu...

Yida Zhang, Zhiyong Gao, Shuaibing Yue, Jie Li, Rui Wang

2603.19133 2026-03-19
AI LLM

From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models

Vision-Language-Action (VLA) models have recently enabled embodied agents to perform increasingly complex tasks by jointly reasoning over visual, linguistic, and motor modalities. However, we find ...

Zhuofan Li, Hongkun Yang, Zhenyang Chen, Yangxuan Chen, Yingyan, Lin, Chaojian Li

2603.19131 2026-03-19
AI LLM

On Optimizing Multimodal Jailbreaks for Spoken Language Models

As Spoken Language Models (SLMs) integrate speech and text modalities, they inherit the safety vulnerabilities of their LLM backbone and an expanded attack surface. SLMs have been previously shown ...

Aravind Krishnan, Karolina Stańczak, Dietrich Klakow

2603.19127 2026-03-19