Papers
Research papers from arXiv and related sources
Sensi: Learn One Thing at a Time -- Curriculum-Based Test-Time Learning for LLM Game Agents
Large language model (LLM) agents deployed in unknown environments must learn task structure at test time, but current approaches require thousands of interactions to form useful hypotheses. We pre...
Mohsen Arjmandi
WeatherReasonSeg: A Benchmark for Weather-Aware Reasoning Segmentation in Visual Language Models
Existing vision-language models (VLMs) have demonstrated impressive performance in reasoning-based segmentation. However, current benchmarks are primarily constructed from high-quality images captu...
Wanjun Du, Zifeng Yuan, Tingting Chen, Fucai Ke, Beibei Lin, Shunli Zhang
Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards
LLM agents are increasingly relevant to research domains such as vulnerability discovery. Yet, the strongest systems remain closed and cloud-only, making them resource-intensive, difficult to repro...
Philipp Normann, Andreas Happe, Jürgen Cito, Daniel Arp
AgentVLN: Towards Agentic Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires an embodied agent to ground complex natural-language instructions into long-horizon navigation in unseen environments. While Vision-Language Models (VL...
Zihao Xin, Wentong Li, Yixuan Jiang, Ziyuan Huang, Bin Wang, Piji Li, Jianke Zhu, Jie Qin, Shengj...
Halo: Domain-Aware Query Optimization for Long-Context Question Answering
Long-context question answering (QA) over lengthy documents is critical for applications such as financial analysis, legal review, and scientific research. Current approaches, such as processing en...
Pramod Chunduri, Francisco Romero, Ali Payani, Kexin Rong, Joy Arulraj
From Symbol to Meaning: Ontological and Philosophical Reflections on Large Language Models in Information Systems Engineering
The advent of Large Language Models (LLMs) represents a turning point in the theoretical foundations of Information Systems Engineering. Beyond their technical significance, LLMs challenge the onto...
José Palazzo Moreira de Oliveira
Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment
Grounding natural language questions to functionally relevant regions in 3D objects -- termed language-driven 3D affordance grounding -- is essential for embodied intelligence and human-AI interact...
Dongqiang Gou, Xuming He
Who's Sense is This? Possibility for Impacting Human Insights in AI-assisted Sensemaking
Sensemaking is an important preceding step for activities like consensus building and decision-making. When groups of people make sense of large amounts of information, their understanding graduall...
Zhuoyi Cheng, Steven Houben
VeriGrey: Greybox Agent Validation
Agentic AI has been a topic of great interest recently. A Large Language Model (LLM) agent involves one or more LLMs in the back-end. In the front end, it conducts autonomous decision-making by com...
Yuntong Zhang, Sungmin Kang, Ruijie Meng, Marcel Böhme, Abhik Roychoudhury
A Multi-Agent System for Building-Age Cohort Mapping to Support Urban Energy Planning
Determining the age distribution of the urban building stock is crucial for sustainable municipal heat planning and upgrade prioritization. However, existing approaches often rely on datasets gathe...
Kundan Thota, Thorsten Schlachter, Veit Hagenmeyer
Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis
Understanding whether large language models (LLMs) capture structured meaning requires examining how they represent concept relationships. In this work, we study three models of increasing scale: P...
Andor Diera, Ansgar Scherp
Complementary Reinforcement Learning
Reinforcement Learning (RL) has emerged as a powerful paradigm for training LLM-based agents, yet remains limited by low sample efficiency, stemming not only from sparse outcome feedback but also f...
Dilxat Muhtar, Jiashun Liu, Wei Gao, Weixun Wang, Shaopan Xiong, Ju Huang, Siran Yang, Wenbo Su, ...
VeriAgent: A Tool-Integrated Multi-Agent System with Evolving Memory for PPA-Aware RTL Code Generation
LLMs have recently demonstrated strong capabilities in automatic RTL code generation, achieving high syntactic and functional correctness. However, most methods focus on functional correctness whil...
Yaoxiang Wang, Qi Shi, ShangZhan Li, Qingguo Hu, Xinyu Yin, Bo Guo, Xu Han, Maosong Sun, Jinsong Su
Modeling Changing Scientific Concepts with Complex Networks: A Case Study on the Chemical Revolution
While context embeddings produced by LLMs can be used to estimate conceptual change, these representations are often not interpretable nor time-aware. Moreover, bias augmentation in historical data...
Sofía Aguilar-Valdez, Stefania Degaetano-Ortlieb
A Contextual Help Browser Extension to Assist Digital Illiterate Internet Users
This paper describes the design, implementation, and evaluation of a browser extension that provides contextual help to users who hover over technological acronyms and abbreviations on web pages. T...
Christos Koutsiaris
From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation
Large language models (LLMs) are currently applied to scientific paper evaluation by assigning an absolute score to each paper independently. However, since score scales vary across conferences, ti...
Pujun Zheng, Jiacheng Yao, Jinquan Zheng, Chenyang Gu, Guoxiu He, Jiawei Liu, Yong Huang, Tianrui...
Negation is Not Semantic: Diagnosing Dense Retrieval Failure Modes for Trade-offs in Contradiction-Aware Biomedical QA
Large Language Models (LLMs) have demonstrated strong capabilities in biomedical question answering, yet their tendency to generate plausible but unverified claims poses serious risks in clinical s...
Soumya Ranjan Sahoo, Gagan N., Sanand Sasidharan, Divya Bharti
LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation
Precise localization and delineation of brain tumors using Magnetic Resonance Imaging (MRI) are essential for planning therapy and guiding surgical decisions. However, most existing approaches rely...
Mohammad Robaitul Islam Bhuiyan, Sheethal Bhat, Melika Qahqaie, Tri-Thien Nguyen, Paula Andrea Pé...
KA2L: A Knowledge-Aware Active Learning Framework for LLMs
Fine-tuning large language models (LLMs) with high-quality knowledge has been shown to enhance their performance effectively. However, there is a paucity of research on the depth of domain-specific...
Haoxuan Yin, Bojian Liu, Chen Tang, Yangfan Wang, Lian Yan, Jingchi Jiang
In Trust We Survive: Emergent Trust Learning
We introduce Emergent Trust Learning (ETL), a lightweight, trust-based control algorithm that can be plugged into existing AI agents. It enables these to reach cooperation in competitive game envir...
Qianpu Chen, Giulio Barbero, Mike Preuss, Derya Soydaner