Papers

Research papers from arXiv and related sources

Total: 4694, AI/LLM: 2583, Testing: 2111
TESTING

Spatial Colour Mixing Illusions as a Perception Stress Test for Vision-Language Models

Vision-language models (VLMs) achieve strong benchmark results, yet can exhibit systematic perceptual weaknesses: structured, large changes to pixel values can cause confident yet nonsensical predi...

Nicoleta-Nina Basoc, Adrian Cosma, Emilian Radoi

2603.06141 2026-03-06
AI LLM

Partial Policy Gradients for RL in LLMs

Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for modeling policy structure in policy gradients. The key idea is to...

Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai

2603.06138 2026-03-06
AI LLM

Making Implicit Premises Explicit in Logical Understanding of Enthymemes

Real-world arguments in text and dialogues are normally enthymemes (i.e. some of their premises and/or claims are implicit). Natural language processing (NLP) methods for handling enthymemes can po...

Xuyao Feng, Anthony Hunter

2603.06114 2026-03-06
TESTING

Untangling dust emission and CIB anisotropies with the Scattering Transform Statistics

The template-fit approach is often used to separate the Galactic dust emission and the cosmic infrared background (CIB) anisotropies in low $\text{HI}$ column density regions with an underlying assumpt...

Srijita Sinha, Tuhin Ghosh, Erwan Allys, François Boulanger, Jean-Marc Delouis

2603.06110 2026-03-06
TESTING

Real-World Fault Detection for C-Extended Python Projects with Automated Unit Test Generation

Many popular Python libraries use C-extensions for performance-critical operations, allowing users to combine the best of both worlds: the simplicity and versatility of Python and the performance...

Lucas Berg, Lukas Krodinger, Stephan Lukasczyk, Annibale Panichella, Gordon Fraser, Wim Vanhoof, ...

2603.06107 2026-03-06
TESTING

Variability Study and Searching for QPOs with day-like periods in the blazar S5 0716+714 with TESS

Using an unprecedented cadence of 30 minutes provided by the Transiting Exoplanet Survey Satellite (TESS), we have examined the optical light curves (LCs) of the blazar S5 0716+714 obtained from it...

Shubham Kishore, Alok C. Gupta, Paul J. Wiita

2603.06099 2026-03-06
TESTING

Enhancing Neural Video Compression of Static Scenes with Positive-Incentive Noise

Static scene videos, such as surveillance feeds and videotelephony streams, constitute a dominant share of storage consumption and network traffic. However, both traditional standardized codecs and...

Cheng Yuan, Zhenyu Jia, Jiawei Shao, Xuelong Li

2603.06095 2026-03-06
AI LLM

Experiences Build Characters: The Linguistic Origins and Functional Impact of LLM Personality

Human problem-solving is enriched by a diversity of styles and personality traits, yet the development of Large Language Models (LLMs) has largely prioritized uniform performance benchmarks that fa...

Xi Wang, Mengdie Zhuang, Jiqun Liu

2603.06088 2026-03-06
AI LLM

Lyapunov Probes for Hallucination Detection in Large Foundation Models

We address hallucination detection in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) by framing the problem through the lens of dynamical systems stability theory. Rather...

Bozhi Luan, Gen Li, Yalan Qin, Jifeng Guo, Yun Zhou, Faguo Wu, Hongwei Zheng, Wenjun Wu, Zhaoxin Fan

2603.06081 2026-03-06
AI LLM

Distributed Semantic Alignment over Interference Channels: A Game-Theoretic Approach

Semantic communication acts as a key enabler for effective task execution in AI-driven systems, prioritizing the extraction of the underlying meaning before transmission. However, when devices rely...

Giuseppe Di Poce, Mattia Merluzzi, Emilio Calvanese Strinati, Paolo Di Lorenzo

2603.06077 2026-03-06
TESTING

Aggregative Semantics for Quantitative Bipolar Argumentation Frameworks

Formal argumentation is being used increasingly in artificial intelligence as an effective and understandable way to model potentially conflicting pieces of information, called arguments, and ident...

Yann Munro, Isabelle Bloch, Marie-Jeanne Lesot

2603.06067 2026-03-06
AI LLM

Evaluating Austrian A-Level German Essays with Large Language Models for Automated Essay Scoring

Automated Essay Scoring (AES) has been explored for decades with the goal to support teachers by reducing grading workload and mitigating subjective biases. While early systems relied on handcrafte...

Jonas Kubesch, Lena Huber, Clemens Havas

2603.06066 2026-03-06
AI LLM

ChatShopBuddy: Towards Reliable Conversational Shopping Agents via Reinforcement Learning

Conversational shopping agents represent a critical consumer-facing application of Large Language Model (LLM)-powered agents, yet how to effectively apply post-training Reinforcement Learning (RL) ...

Yiruo Cheng, Kelong Mao, Tianhao Li, Jiejun Tan, Ji-Rong Wen, Zhicheng Dou

2603.06065 2026-03-06
AI LLM

Agentic LLM Planning via Step-Wise PDDL Simulation: An Empirical Characterisation

Task planning, the problem of sequencing actions to reach a goal from an initial state, is a core capability requirement for autonomous robotic systems. Whether large language models (LLMs) can ser...

Kai Göbel, Pierrick Lorang, Patrik Zips, Tobias Glück

2603.06064 2026-03-06
TESTING

RODEO: RObotic DEcentralized Organization

Robots are improving their autonomy with minimal human supervision. However, auditable actions, transparent decision processes, and new human-robot interaction models are still missing requirements...

Milan Groshev, Eduardo Castelló Ferrer

2603.06058 2026-03-06
AI LLM

A LINDDUN-based Privacy Threat Modeling Framework for GenAI

As generative AI (GenAI) systems become increasingly prevalent across various technological stacks, the question of how such systems handle sensitive and personal data flows becomes increasingly im...

Qianying Liao, Jonah Bellemans, Laurens Sion, Xue Jiang, Dmitrii Usynin, Xuebing Zhou, Dimitri Va...

2603.06051 2026-03-06
AI LLM

Pre-AI Baseline: Developer IDE Satisfaction and Tool Autonomy in 2022

To quantify the impact of AI on software development, the community requires a robust pre-AI baseline. This study analyzes valid satisfaction data from 1,155 software developers collected in July 2...

Nikola Balić

2603.06050 2026-03-06
AI LLM

Detecting Semantic Alignments between Textual Specifications and Domain Models

Context: Having domain models derived from textual specifications has proven to be very useful in the early phases of software engineering. However, creating correct domain models and establishing ...

Shwetali Shimangaud, Lola Burgueño, Rijul Saini, Jörg Kienzle

2603.06037 2026-03-06
TESTING

Ensemble Learning with Sparse Hypercolumns

Directly inspired by findings in biological vision, high-dimensional hypercolumns are feature vectors built by concatenating multi-scale activations of convolutional neural networks for a single im...

Julia Dietlmeier, Vayangi Ganepola, Oluwabukola G. Adegboro, Mayug Maniparambil, Claudia Mazo, No...

2603.06036 2026-03-06
TESTING

Occlusion-Aware SORT: Observing Occlusion for Robust Multi-Object Tracking

Multi-object tracking (MOT) involves analyzing object trajectories and counting the number of objects in video sequences. However, 2D MOT faces challenges due to positional cost confusion arising f...

Chunjiang Li, Jianbo Ma, Li Shen, Yanru Chen, Liangyin Chen

2603.06034 2026-03-06