Papers

Research papers from arXiv and related sources

Total: 4513; AI/LLM: 2483; Testing: 2030
AI LLM

Sensi: Learn One Thing at a Time -- Curriculum-Based Test-Time Learning for LLM Game Agents

Large language model (LLM) agents deployed in unknown environments must learn task structure at test time, but current approaches require thousands of interactions to form useful hypotheses. We pre...

Mohsen Arjmandi

2603.17683 2026-03-18
AI LLM

WeatherReasonSeg: A Benchmark for Weather-Aware Reasoning Segmentation in Visual Language Models

Existing vision-language models (VLMs) have demonstrated impressive performance in reasoning-based segmentation. However, current benchmarks are primarily constructed from high-quality images captu...

Wanjun Du, Zifeng Yuan, Tingting Chen, Fucai Ke, Beibei Lin, Shunli Zhang

2603.17680 2026-03-18
AI LLM

Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards

LLM agents are increasingly relevant to research domains such as vulnerability discovery. Yet, the strongest systems remain closed and cloud-only, making them resource-intensive, difficult to repro...

Philipp Normann, Andreas Happe, Jürgen Cito, Daniel Arp

2603.17673 2026-03-18
AI LLM

AgentVLN: Towards Agentic Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) requires an embodied agent to ground complex natural-language instructions into long-horizon navigation in unseen environments. While Vision-Language Models (VL...

Zihao Xin, Wentong Li, Yixuan Jiang, Ziyuan Huang, Bin Wang, Piji Li, Jianke Zhu, Jie Qin, Shengj...

2603.17670 2026-03-18
AI LLM

Halo: Domain-Aware Query Optimization for Long-Context Question Answering

Long-context question answering (QA) over lengthy documents is critical for applications such as financial analysis, legal review, and scientific research. Current approaches, such as processing en...

Pramod Chunduri, Francisco Romero, Ali Payani, Kexin Rong, Joy Arulraj

2603.17668 2026-03-18
AI LLM

From Symbol to Meaning: Ontological and Philosophical Reflections on Large Language Models in Information Systems Engineering

The advent of Large Language Models (LLMs) represents a turning point in the theoretical foundations of Information Systems Engineering. Beyond their technical significance, LLMs challenge the onto...

José Palazzo Moreira de Oliveira

2603.17659 2026-03-18
AI LLM

Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment

Grounding natural language questions to functionally relevant regions in 3D objects -- termed language-driven 3D affordance grounding -- is essential for embodied intelligence and human-AI interact...

Dongqiang Gou, Xuming He

2603.17647 2026-03-18
AI LLM

Who's Sense is This? Possibility for Impacting Human Insights in AI-assisted Sensemaking

Sensemaking is an important preceding step for activities like consensus building and decision-making. When groups of people make sense of large amounts of information, their understanding graduall...

Zhuoyi Cheng, Steven Houben

2603.17643 2026-03-18
AI LLM

VeriGrey: Greybox Agent Validation

Agentic AI has been a topic of great interest recently. A Large Language Model (LLM) agent involves one or more LLMs in the back-end. In the front end, it conducts autonomous decision-making by com...

Yuntong Zhang, Sungmin Kang, Ruijie Meng, Marcel Böhme, Abhik Roychoudhury

2603.17639 2026-03-18
AI LLM

A Multi-Agent System for Building-Age Cohort Mapping to Support Urban Energy Planning

Determining the age distribution of the urban building stock is crucial for sustainable municipal heat planning and upgrade prioritization. However, existing approaches often rely on datasets gathe...

Kundan Thota, Thorsten Schlachter, Veit Hagenmeyer

2603.17626 2026-03-18
AI LLM

Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis

Understanding whether large language models (LLMs) capture structured meaning requires examining how they represent concept relationships. In this work, we study three models of increasing scale: P...

Andor Diera, Ansgar Scherp

2603.17624 2026-03-18
AI LLM

Complementary Reinforcement Learning

Reinforcement Learning (RL) has emerged as a powerful paradigm for training LLM-based agents, yet remains limited by low sample efficiency, stemming not only from sparse outcome feedback but also f...

Dilxat Muhtar, Jiashun Liu, Wei Gao, Weixun Wang, Shaopan Xiong, Ju Huang, Siran Yang, Wenbo Su, ...

2603.17621 2026-03-18
AI LLM

VeriAgent: A Tool-Integrated Multi-Agent System with Evolving Memory for PPA-Aware RTL Code Generation

LLMs have recently demonstrated strong capabilities in automatic RTL code generation, achieving high syntactic and functional correctness. However, most methods focus on functional correctness whil...

Yaoxiang Wang, Qi Shi, ShangZhan Li, Qingguo Hu, Xinyu Yin, Bo Guo, Xu Han, Maosong Sun, Jinsong Su

2603.17613 2026-03-18
AI LLM

Modeling Changing Scientific Concepts with Complex Networks: A Case Study on the Chemical Revolution

While context embeddings produced by LLMs can be used to estimate conceptual change, these representations are often not interpretable nor time-aware. Moreover, bias augmentation in historical data...

Sofía Aguilar-Valdez, Stefania Degaetano-Ortlieb

2603.17594 2026-03-18
AI LLM

A Contextual Help Browser Extension to Assist Digital Illiterate Internet Users

This paper describes the design, implementation, and evaluation of a browser extension that provides contextual help to users who hover over technological acronyms and abbreviations on web pages. T...

Christos Koutsiaris

2603.17592 2026-03-18
AI LLM

From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation

Large language models (LLMs) are currently applied to scientific paper evaluation by assigning an absolute score to each paper independently. However, since score scales vary across conferences, ti...

Pujun Zheng, Jiacheng Yao, Jinquan Zheng, Chenyang Gu, Guoxiu He, Jiawei Liu, Yong Huang, Tianrui...

2603.17588 2026-03-18
AI LLM

Negation is Not Semantic: Diagnosing Dense Retrieval Failure Modes for Trade-offs in Contradiction-Aware Biomedical QA

Large Language Models (LLMs) have demonstrated strong capabilities in biomedical question answering, yet their tendency to generate plausible but unverified claims poses serious risks in clinical s...

Soumya Ranjan Sahoo, Gagan N., Sanand Sasidharan, Divya Bharti

2603.17580 2026-03-18
AI LLM

LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation

Precise localization and delineation of brain tumors using Magnetic Resonance Imaging (MRI) are essential for planning therapy and guiding surgical decisions. However, most existing approaches rely...

Mohammad Robaitul Islam Bhuiyan, Sheethal Bhat, Melika Qahqaie, Tri-Thien Nguyen, Paula Andrea Pé...

2603.17576 2026-03-18
AI LLM

KA2L: A Knowledge-Aware Active Learning Framework for LLMs

Fine-tuning large language models (LLMs) with high-quality knowledge has been shown to enhance their performance effectively. However, there is a paucity of research on the depth of domain-specific...

Haoxuan Yin, Bojian Liu, Chen Tang, Yangfan Wang, Lian Yan, Jingchi Jiang

2603.17566 2026-03-18
AI LLM

In Trust We Survive: Emergent Trust Learning

We introduce Emergent Trust Learning (ETL), a lightweight, trust-based control algorithm that can be plugged into existing AI agents. It enables these to reach cooperation in competitive game envir...

Qianpu Chen, Giulio Barbero, Mike Preuss, Derya Soydaner

2603.17564 2026-03-18