Papers
Research papers from arXiv and related sources
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation
Unified models capable of interleaved generation have emerged as a promising paradigm, with the community increasingly converging on autoregressive modeling for text and flow matching for image gen...
Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan...
Failure of contextual invariance in gender inference with large language models
Standard evaluation practices assume that large language model (LLM) outputs are stable under contextually equivalent formulations of a task. Here, we test this assumption in the setting of gender ...
Sagar Kumar, Ariel Flint, Luca Maria Aiello, Andrea Baronchelli
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
Agentic multimodal large language models (MLLMs) (e.g., OpenAI o3 and Gemini Agentic Vision) achieve remarkable reasoning capabilities through iterative visual tool invocation. However, the cascade...
Haoyu Huang, Jinfa Huang, Zhongwei Wan, Xiawu Zheng, Rongrong Ji, Jiebo Luo
ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains
Requirements engineering is a vital, yet labor-intensive, stage in the software development process. This article introduces ReqFusion: an AI-enhanced system that automates the extraction, classifi...
Muhammad Khalid, Manuel Oriol, Yilmaz Uygun
Evidence of political bias in search engines and language models before major elections
Search engines (SEs) and large language models (LLMs) are central to political information access, yet their algorithmic decisions and potential underlying biases remain underexplored. We developed...
Íris Damião, Paulo Almeida, João Franco, Nuno Santos, Pedro C. Magalhães, Joana Gonçalves-Sá
Regulating AI Agents
AI agents -- systems that can independently take actions to pursue complex goals with only limited human oversight -- have entered the mainstream. These systems are now being widely used to produce...
Kathrin Gardhouse, Amin Oueslati, Noam Kolt
ConceptCoder: Improve Code Reasoning via Concept Learning
Large language models (LLMs) have shown promising results for software engineering applications, but still struggle with code reasoning tasks such as vulnerability detection (VD). We introduce Conc...
Md Mahbubur Rahman, Hengbo Tong, Wei Le
CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection
AI-driven cybersecurity systems often fail under cross-environment deployment due to fragmented, event-centric telemetry representations. We introduce the Canonical Security Telemetry Substrate (CS...
Abdul Rahman
DetPO: In-Context Learning with Multi-Modal LLMs for Few-Shot Object Detection
Multi-Modal LLMs (MLLMs) demonstrate strong visual grounding capabilities on popular object detection benchmarks like OdinW-13 and RefCOCO. However, state-of-the-art models still struggle to genera...
Gautam Rajendrakumar Gare, Neehar Peri, Matvei Popov, Shruti Jain, John Galeotti, Deva Ramanan
Code Review Agent Benchmark
Software engineering agents have shown significant promise in writing code. As AI agents permeate code writing, and generate huge volumes of code automatically -- the matter of code quality comes f...
Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf, Haifeng Ruan, Ridwan Shariffdeen, Abhik Roychoud...
3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding
While multi-modality large language models excel in object-centric or indoor scenarios, scaling them to 3D city-scale environments remains a formidable challenge. To bridge this gap, we propose 3DC...
Yiping Chen, Jinpeng Li, Wenyu Ke, Yang Luo, Jie Ouyang, Zhongjie He, Li Liu, Hongchao Fan, Hao Wu
Evaluating LLM-Based Test Generation Under Software Evolution
Large Language Models (LLMs) are increasingly used for automated unit test generation. However, it remains unclear whether these tests reflect genuine reasoning about program behavior or simply rep...
Sabaat Haroon, Mohammad Taha Khan, Muhammad Ali Gulzar
Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning
Machine learning models often need to adapt to new data after deployment due to structured or unstructured real-world dynamics. The Continual Learning (CL) framework enables continuous model adapta...
Connor Mclaughlin, Nigel Lee, Lili Su
Mecha-nudges for Machines
Nudges are subtle changes to the way choices are presented to human decision-makers (e.g., opt-in vs. opt-out by default) that shift behavior without restricting options or changing incentives. As ...
Giulio Frey, Kawin Ethayarajh
Bilevel Autoresearch: Meta-Autoresearching Itself
If autoresearch is itself a form of research, then autoresearch can be applied to research itself. We take this idea literally: we use an autoresearch loop to optimize the autoresearch loop. Every ...
Yaonan Qu, Meng Lu
Biased Error Attribution in Multi-Agent Human-AI Systems Under Delayed Feedback
Human decision-making is strongly influenced by cognitive biases, particularly under conditions of uncertainty and risk. While prior work has examined bias in single-step decisions with immediate o...
Teerthaa Parakh, Karen M. Feigh
Integrating GenAI in Filmmaking: From Co-Creativity to Distributed Creativity
The integration of Generative AI (GenAI) into audio-visual production is often presented as a radical break from past traditions. However, through a sociomaterial and historical lens, this paper ar...
Pierluigi Masai, Lorenzo Carta, Mateusz Miroslaw Lis
SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
Scaling reinforcement learning (RL) has shown strong promise for enhancing the reasoning abilities of large language models (LLMs), particularly in tasks requiring long chain-of-thought generation....
Yiqi Zhang, Huiqiang Jiang, Xufang Luo, Zhihe Yang, Chengruidong Zhang, Yifei Shen, Dongsheng Li,...
Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies
While large language models simulate social behaviors, their capacity for stable stance formation and identity negotiation during complex interventions remains unclear. To overcome the limitations ...
Hanzhong Zhang, Siyang Song, Jindong Wang
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning
Existing Multimodal Large Language Models (MLLMs) struggle with 3D spatial reasoning, as they fail to construct structured abstractions of the 3D environment depicted in video inputs. To bridge thi...
Jiacheng Hua, Yishu Yin, Yuhang Wu, Tai Wang, Yifei Huang, Miao Liu