Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Unified models capable of interleaved generation have emerged as a promising paradigm, with the community increasingly converging on autoregressive modeling for text and flow matching for image gen...

Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan...

2603.23500 2026-03-24
AI LLM

Failure of contextual invariance in gender inference with large language models

Standard evaluation practices assume that large language model (LLM) outputs are stable under contextually equivalent formulations of a task. Here, we test this assumption in the setting of gender ...

Sagar Kumar, Ariel Flint, Luca Maria Aiello, Andrea Baronchelli

2603.23485 2026-03-24
AI LLM

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Agentic multimodal large language models (MLLMs) (e.g., OpenAI o3 and Gemini Agentic Vision) achieve remarkable reasoning capabilities through iterative visual tool invocation. However, the cascade...

Haoyu Huang, Jinfa Huang, Zhongwei Wan, Xiawu Zheng, Rongrong Ji, Jiebo Luo

2603.23483 2026-03-24
AI LLM

ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains

Requirements engineering is a vital, yet labor-intensive, stage in the software development process. This article introduces ReqFusion: an AI-enhanced system that automates the extraction, classifi...

Muhammad Khalid, Manuel Oriol, Yilmaz Uygun

2603.23482 2026-03-24
AI LLM

Evidence of political bias in search engines and language models before major elections

Search engines (SEs) and large language models (LLMs) are central to political information access, yet their algorithmic decisions and potential underlying biases remain underexplored. We developed...

Íris Damião, Paulo Almeida, João Franco, Nuno Santos, Pedro C. Magalhães, Joana Gonçalves-Sá

2603.23474 2026-03-24
AI LLM

Regulating AI Agents

AI agents -- systems that can independently take actions to pursue complex goals with only limited human oversight -- have entered the mainstream. These systems are now being widely used to produce...

Kathrin Gardhouse, Amin Oueslati, Noam Kolt

2603.23471 2026-03-24
AI LLM

ConceptCoder: Improve Code Reasoning via Concept Learning

Large language models (LLMs) have shown promising results for software engineering applications, but still struggle with code reasoning tasks such as vulnerability detection (VD). We introduce Conc...

Md Mahbubur Rahman, Hengbo Tong, Wei Le

2603.23470 2026-03-24
AI LLM

CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection

AI-driven cybersecurity systems often fail under cross-environment deployment due to fragmented, event-centric telemetry representations. We introduce the Canonical Security Telemetry Substrate (CS...

Abdul Rahman

2603.23459 2026-03-24
AI LLM

DetPO: In-Context Learning with Multi-Modal LLMs for Few-Shot Object Detection

Multi-Modal LLMs (MLLMs) demonstrate strong visual grounding capabilities on popular object detection benchmarks like OdinW-13 and RefCOCO. However, state-of-the-art models still struggle to genera...

Gautam Rajendrakumar Gare, Neehar Peri, Matvei Popov, Shruti Jain, John Galeotti, Deva Ramanan

2603.23455 2026-03-24
AI LLM

Code Review Agent Benchmark

Software engineering agents have shown significant promise in writing code. As AI agents permeate code writing, and generate huge volumes of code automatically -- the matter of code quality comes f...

Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf, Haifeng Ruan, Ridwan Shariffdeen, Abhik Roychoud...

2603.23448 2026-03-24
AI LLM

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding

While multi-modality large language models excel in object-centric or indoor scenarios, scaling them to 3D city-scale environments remains a formidable challenge. To bridge this gap, we propose 3DC...

Yiping Chen, Jinpeng Li, Wenyu Ke, Yang Luo, Jie Ouyang, Zhongjie He, Li Liu, Hongchao Fan, Hao Wu

2603.23447 2026-03-24
AI LLM

Evaluating LLM-Based Test Generation Under Software Evolution

Large Language Models (LLMs) are increasingly used for automated unit test generation. However, it remains unclear whether these tests reflect genuine reasoning about program behavior or simply rep...

Sabaat Haroon, Mohammad Taha Khan, Muhammad Ali Gulzar

2603.23443 2026-03-24
AI LLM

Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

Machine learning models often need to adapt to new data after deployment due to structured or unstructured real-world dynamics. The Continual Learning (CL) framework enables continuous model adapta...

Connor Mclaughlin, Nigel Lee, Lili Su

2603.23436 2026-03-24
AI LLM

Mecha-nudges for Machines

Nudges are subtle changes to the way choices are presented to human decision-makers (e.g., opt-in vs. opt-out by default) that shift behavior without restricting options or changing incentives. As ...

Giulio Frey, Kawin Ethayarajh

2603.23433 2026-03-24
AI LLM

Bilevel Autoresearch: Meta-Autoresearching Itself

If autoresearch is itself a form of research, then autoresearch can be applied to research itself. We take this idea literally: we use an autoresearch loop to optimize the autoresearch loop. Every ...

Yaonan Qu, Meng Lu

2603.23420 2026-03-24
AI LLM

Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback

Human decision-making is strongly influenced by cognitive biases, particularly under conditions of uncertainty and risk. While prior work has examined bias in single-step decisions with immediate o...

Teerthaa Parakh, Karen M. Feigh

2603.23419 2026-03-24
AI LLM

Integrating GenAI in Filmmaking: From Co-Creativity to Distributed Creativity

The integration of Generative AI (GenAI) into audio-visual production is often presented as a radical break from past traditions. However, through a sociomaterial and historical lens, this paper ar...

Pierluigi Masai, Lorenzo Carta, Mateusz Miroslaw Lis

2603.23415 2026-03-24
AI LLM

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling

Scaling reinforcement learning (RL) has shown strong promise for enhancing the reasoning abilities of large language models (LLMs), particularly in tasks requiring long chain-of-thought generation....

Yiqi Zhang, Huiqiang Jiang, Xufang Luo, Zhihe Yang, Chengruidong Zhang, Yifei Shen, Dongsheng Li,...

2603.23414 2026-03-24
AI LLM

Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies

While large language models simulate social behaviors, their capacity for stable stance formation and identity negotiation during complex interventions remains unclear. To overcome the limitations ...

Hanzhong Zhang, Siyang Song, Jindong Wang

2603.23406 2026-03-24
AI LLM

Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning

Existing Multimodal Large Language Models (MLLMs) struggle with 3D spatial reasoning, as they fail to construct structured abstractions of the 3D environment depicted in video inputs. To bridge thi...

Jiacheng Hua, Yishu Yin, Yuhang Wu, Tai Wang, Yifei Huang, Miao Liu

2603.23404 2026-03-24