Research

Papers

Research papers from arXiv and related sources

Total: 4694 AI/LLM: 2583 Testing: 2111
AI LLM

CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?

Analyzing Open Source Intelligence (OSINT) from large volumes of data is critical for drafting and publishing comprehensive CTI reports. This process usually follows a three-stage workflow: triag...

Xiangsen Chen, Xuan Feng, Shuo Chen, Matthieu Maitre, Sudipto Rakshit, Diana Duvieilh, Ashley Pic...

2603.09452 2026-03-10
AI LLM

A Guideline-Aware AI Agent for Zero-Shot Target Volume Auto-Delineation

Delineating the clinical target volume (CTV) in radiotherapy involves complex margins constrained by tumor location and anatomical barriers. While deep learning models automate this process, their ...

Yoon Jo Kim, Wonyoung Cho, Jongmin Lee, Han Joo Chae, Hyunki Park, Sang Hoon Seo, Noh Jae Myung, ...

2603.09448 2026-03-10
AI LLM

AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems

The rapid rollout of AI in heterogeneous public and societal sectors has subsequently escalated the need for compliance with regulatory standards and frameworks. The EU AI Act has emerged as a land...

Athanasios Davvetas, Michael Papademas, Xenia Ziouvelou, Vangelis Karkaletsis

2603.09435 2026-03-10
AI LLM

Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs

Large Language Models (LLMs) are increasingly deployed across diverse real-world applications and user communities. As such, it is crucial that these models remain both morally grounded and knowled...

Saugata Purkayastha, Pranav Kushare, Pragya Paramita Pal, Sukannya Purkayastha

2603.09434 2026-03-10
AI LLM

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health

Large Language Models (LLMs) excel in Natural Language Processing (NLP) tasks, but they often propagate biases embedded in their training data, which is potentially impactful in sensitive domains l...

Trung Hieu Ngo, Adrien Bazoge, Solen Quiniou, Pierre-Antoine Gourraud, Emmanuel Morin

2603.09416 2026-03-10
AI LLM

PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue

Document Layout Analysis (DLA) is crucial for document artificial intelligence and has recently received increasing attention, resulting in an influx of large-scale public DLA datasets. Existing wo...

Zirui Zhang, Yaping Zhang, Lu Xiang, Yang Zhao, Feifei Zhai, Yu Zhou, Chengqing Zong

2603.09414 2026-03-10
AI LLM

LLM as a Meta-Judge: Synthetic Data for NLP Evaluation Metric Validation

Validating evaluation metrics for NLG typically relies on expensive and time-consuming human annotations, which predominantly exist only for English datasets. We propose LLM as a Meta-Judge...

Lukáš Eigler, Jindřich Libovický, David Hurych

2603.09403 2026-03-10
AI LLM

Reward Prediction with Factorized World States

Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could introduce biases inh...

Yijun Shen, Delong Chen, Xianming Hu, Jiaming Mi, Hongbo Zhao, Kai Zhang, Pascale Fung

2603.09400 2026-03-10
AI LLM

Quantifying and extending the coverage of spatial categorization data sets

Variation in spatial categorization across languages is often studied by eliciting human labels for the relations depicted in a set of scenes known as the Topological Relations Picture Series (TRPS...

Wanchun Li, Alexandra Carstensen, Yang Xu, Terry Regier, Charles Kemp

2603.09373 2026-03-10
AI LLM

Democratising Clinical AI through Dataset Condensation for Classical Clinical Models

Dataset condensation (DC) learns a compact synthetic dataset that enables models to match the performance of full-data training, prioritising utility over distributional fidelity. While typically e...

Anshul Thakur, Soheila Molaei, Pafue Christy Nganjimi, Joshua Fieggen, Andrew A. S. Soltan, Danie...

2603.09356 2026-03-10
AI LLM

The Virtuous Cycle: AI-Powered Vector Search and Vector Search-Augmented AI

Modern AI and vector search are rapidly converging, forming a promising research frontier in intelligent information systems. On one hand, advances in AI have substantially improved the semantic ac...

Jiuqi Wei, Quanqing Xu, Chuanhui Yang

2603.09347 2026-03-10
AI LLM

TaSR-RAG: Taxonomy-guided Structured Reasoning for Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) helps large language models (LLMs) answer knowledge-intensive and time-sensitive questions by conditioning generation on external evidence. However, most RAG sy...

Jiashuo Sun, Yixuan Xie, Jimeng Shi, Shaowen Wang, Jiawei Han

2603.09341 2026-03-10
AI LLM

Beyond Scaling: Assessing Strategic Reasoning and Rapid Decision-Making Capability of LLMs in Zero-sum Environments

Large Language Models (LLMs) have achieved strong performance on static reasoning benchmarks, yet their effectiveness as interactive agents operating in adversarial, time-sensitive environments rem...

Yang Li, Xing Chen, Yutao Liu, Gege Qi, Yanxian BI, Zizhe Wang, Yunjian Zhang, Yao Zhu

2603.09337 2026-03-10
AI LLM

Can ChatGPT Generate Realistic Synthetic System Requirement Specifications? Results of a Case Study

System requirement specifications (SyRSs) are central, natural-language (NL) artifacts. Access to real SyRS for research purposes is highly valuable but limited by proprietary restrictions or confi...

Alex R. Mattukat, Florian M. Braun, Horst Lichter

2603.09335 2026-03-10
AI LLM

Reading the Mood Behind Words: Integrating Prosody-Derived Emotional Context into Socially Responsive VR Agents

In VR interactions with embodied conversational agents, users' emotional intent is often conveyed more by how something is said than by what is said. However, most VR agent pipelines rely on speech...

SangYeop Jeong, Yeongseo Na, Seung Gyu Jeong, Jin-Woo Jeong, Seong-Eun Kim

2603.09324 2026-03-10
AI LLM

Curveball Steering: The Right Direction To Steer Isn't Always Linear

Activation steering is a widely used approach for controlling large language model (LLM) behavior by intervening on internal representations. Existing methods largely rely on the Linear Representat...

Shivam Raval, Hae Jin Song, Linlin Wu, Abir Harrasse, Jeff Phillips, Amirali Abdullah

2603.09313 2026-03-10
AI LLM

Rescaling Confidence: What Scale Design Reveals About LLM Metacognition

Verbalized confidence, in which LLMs report a numerical certainty score, is widely used to estimate uncertainty in black-box settings, yet the confidence scale itself (typically 0–100) is rarely e...

Yuyang Dai

2603.09309 2026-03-10
AI LLM

Investor risk profiles of large language models

This paper investigates how large language models (LLMs) form and express investor risk profiles, a critical component of retail investment advising. We examine three LLMs (GPT, Gemini, and Llama) ...

Hanyong Cho, Geumil Bae, Jang Ho Kim

2603.09303 2026-03-10
AI LLM

Constructing a Portfolio Optimization Benchmark Framework for Evaluating Large Language Models

This study introduces a benchmark framework for evaluating the financial decision-making capabilities of large language models (LLMs) through portfolio optimization problems with mathematically exp...

Hanyong Cho, Jang Ho Kim

2603.09301 2026-03-10
AI LLM

TA-Mem: Tool-Augmented Autonomous Memory Retrieval for LLM in Long-Term Conversational QA

Large Language Models (LLMs) have exhibited strong reasoning ability in text-based contexts across various domains, yet the limited context window poses challenges for these models on long-range in...

Mengwei Yuan, Jianan Liu, Jing Yang, Xianyou Li, Weiran Yan, Yichao Wu, Penghao Liang

2603.09297 2026-03-10