Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

VAUQ: Vision-Aware Uncertainty Quantification for LVLM Self-Evaluation

Large Vision-Language Models (LVLMs) frequently hallucinate, limiting their safe deployment in real-world applications. Existing LLM self-evaluation methods rely on a model's ability to estimate th...

Seongheon Park, Changdae Oh, Hyeong Kyu Choi, Xuefeng Du, Sharon Li

2602.21054 2026-02-24
AI LLM

PaperTrail: A Claim-Evidence Interface for Grounding Provenance in LLM-based Scholarly Q&A

Large language models (LLMs) are increasingly used in scholarly question-answering (QA) systems to help researchers synthesize vast amounts of literature. However, these systems often produce subtl...

Anna Martin-Boyle, Cara A. C. Leckey, Martha C. Brown, Harmanpreet Kaur

2602.21045 2026-02-24
AI LLM

LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification

Evaluations of large language models (LLMs) primarily emphasize convergent logical reasoning, where success is defined by producing a single correct proof. However, many real-world reasoning proble...

Yanrui Wu, Lingling Zhang, Xinyu Zhang, Jiayu Chang, Pengyu Li, Xu Jiang, Jingtao Hu, Jun Liu

2602.21044 2026-02-24
TESTING

Is Multi-Distribution Learning as Easy as PAC Learning: Sharp Rates with Bounded Label Noise

Towards understanding the statistical complexity of learning from heterogeneous sources, we study the problem of multi-distribution learning. Given $k$ data sources, the goal is to output a classif...

Rafael Hanashiro, Abhishek Shetty, Patrick Jaillet

2602.21039 2026-02-24
TESTING

Empirically Calibrated Conditional Independence Tests

Conditional independence tests (CIT) are widely used for causal discovery and feature selection. Even with false discovery rate (FDR) control procedures, they often fail to provide frequentist guar...

Milleno Pan, Antoine de Mathelin, Wesley Tansey

2602.21036 2026-02-24
AI LLM

From Perception to Action: An Interactive Benchmark for Vision Reasoning

Understanding the physical structure is essential for real-world applications such as embodied agents, interactive design, and long-horizon manipulation. Yet, prevailing Vision-Language Model (VLM)...

Yuhao Wu, Maojia Song, Yihuai Lan, Lei Wang, Zhiqiang Hu, Yao Xiao, Heng Zhou, Weihua Zheng, Dyla...

2602.21015 2026-02-24
AI LLM

International AI Safety Report 2026

The International AI Safety Report 2026 synthesises the current scientific evidence on the capabilities, emerging risks, and safety of general-purpose AI systems. The report series was mandated by ...

Yoshua Bengio, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Ben Bucknall, Malcolm Murray,...

2602.21012 2026-02-24
TESTING

HiSAC: Hierarchical Sparse Activation Compression for Ultra-long Sequence Modeling in Recommenders

Modern recommender systems leverage ultra-long user behavior sequences to capture dynamic preferences, but end-to-end modeling is infeasible in production due to latency and memory constraints. Whi...

Kun Yuan, Junyu Bi, Daixuan Cheng, Changfa Wu, Shuwen Xiao, Binbin Cao, Jian Wu, Yuning Jiang

2602.21009 2026-02-24
TESTING

HINORA II: Testing the Existence of the Council of Giants in ΛCDM simulations

The discovery of the galaxy ring known as the Council of Giants (CoG) highlights the need to explain such structures in the Local Universe. In the first paper of this series we presented HINORA - a...

Edward Olex, Alexander Knebe, Noam I. Libeskind, Stefan Gottlöber, Dmitry I. Makarov

2602.21008 2026-02-24
TESTING

Constraints on dynamically-formed massive black holes in Little Red Dots from X-ray non-detections

The existence of massive, compact galaxies (Little Red Dots, LRDs) at $z \sim 2$ challenges early structure formation models, suggesting rapid stellar and black hole (BH) assembly. While LRDs are e...

M. Liempi, D. R. G. Schleicher, M. A. Latif, R. Schneider, F. Flammini Dotti, A. Escala, M. C. Ve...

2602.21002 2026-02-24
AI LLM

VII: Visual Instruction Injection for Jailbreaking Image-to-Video Generation Models

Image-to-Video (I2V) generation models, which condition video generation on reference images, have shown emerging visual instruction-following capability, allowing certain visual cues in reference ...

Bowen Zheng, Yongli Xiang, Ziming Hong, Zerong Lin, Chaojian Yu, Tongliang Liu, Xinge You

2602.20999 2026-02-24
TESTING

Characterization-free classification and identification of the environment between two quantum players

Classifying the causal structure of quantum channels is essential for verifying quantum networks and certifying quantum resources. We introduce a characterization-free protocol enabling two isolate...

Masahito Hayashi, Longyang Cao, Baichu Yu, Yuan-Yuan Zhao

2602.20997 2026-02-24
AI LLM

Generative Pseudo-Labeling for Pre-Ranking with LLMs

Pre-ranking is a critical stage in industrial recommendation systems, tasked with efficiently scoring thousands of recalled items for downstream ranking. A key challenge is the train-serving discre...

Junyu Bi, Xinting Niu, Daixuan Cheng, Kun Yuan, Tao Wang, Binbin Cao, Jian Wu, Yuning Jiang

2602.20995 2026-02-24
TESTING

Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models

Scaling multimodal alignment between video and audio is challenging, particularly due to limited data and the mismatch between text descriptions and frame-level video information. In this work, we ...

Christian Simon, Masato Ishii, Wei-Yao Wang, Koichi Saito, Akio Hayakawa, Dongseok Shim, Zhi Zhon...

2602.20981 2026-02-24
AI LLM

Toward an Agentic Infused Software Ecosystem

Fully leveraging the capabilities of AI agents in software development requires a rethinking of the software ecosystem itself. To this end, this paper outlines the creation of an Agentic Infused So...

Mark Marron

2602.20979 2026-02-24
AI LLM

Evaluating Proactive Risk Awareness of Large Language Models

As large language models (LLMs) are increasingly embedded in everyday decision-making, their safety responsibilities extend beyond reacting to explicit harmful intent toward anticipating unintended...

Xuan Luo, Yubin Chen, Zhiyu Hou, Linpu Yu, Geng Tu, Jing Li, Ruifeng Xu

2602.20976 2026-02-24
AI LLM

Linear Reasoning vs. Proof by Cases: Obstacles for Large Language Models in FOL Problem Solving

To comprehensively evaluate the mathematical reasoning capabilities of Large Language Models (LLMs), researchers have introduced abundant mathematical reasoning datasets. However, most existing dat...

Yuliang Ji, Fuchen Shen, Jian Wu, Qiujie Xie, Yue Zhang

2602.20973 2026-02-24
AI LLM

Are Multimodal Large Language Models Good Annotators for Image Tagging?

Image tagging, a fundamental vision task, traditionally relies on human-annotated datasets to train multi-label classifiers, which incurs significant labor and costs. While Multimodal Large Languag...

Ming-Kun Xie, Jia-Hao Xiao, Zhiqiang Kou, Zhongnian Li, Gang Niu, Masashi Sugiyama

2602.20972 2026-02-24
TESTING

Does Order Matter: Connecting The Law of Robustness to Robust Generalization

Bubeck and Sellke (2021) pose as an open problem the connection between the law of robustness and robust generalization. The law of robustness states that overparameterization is necessary for mode...

Himadri Mandal, Vishnu Varadarajan, Jaee Ponde, Aritra Das, Mihir More, Debayan Gupta

2602.20971 2026-02-24
AI LLM

Blackbird Language Matrices: A Framework to Investigate the Linguistic Competence of Language Models

This article describes a novel language task, the Blackbird Language Matrices (BLM) task, inspired by intelligence tests, and illustrates the BLM datasets, their construction and benchmarking, and ...

Paola Merlo, Chunyang Jiang, Giuseppe Samo, Vivi Nastase

2602.20966 2026-02-24