Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030

TESTING

Semantics for 2D Rasterization

Rasterization is the process of determining the color of every pixel drawn by an application. Powerful rasterization libraries like Skia, CoreGraphics, and Direct2D put exceptional effort into draw...

Bhargav Kulkarni, Henry Whiting, Pavel Panchekha

2603.23696 2026-03-24
TESTING

Assessment Design in the AI Era: A Method for Identifying Items Functioning Differentially for Humans and Chatbots

The rapid adoption of large language models (LLMs) in education raises profound challenges for assessment design. To adapt assessments to the presence of LLM-based tools, it is crucial to character...

Licol Zeinfeld, Alona Strugatski, Ziva Bar-Dov, Ron Blonder, Shelley Rap, Giora Alexandron

2603.23682 2026-03-24
TESTING

Prototype Fusion: A Training-Free Multi-Layer Approach to OOD Detection

Deep learning models are increasingly deployed in safety-critical applications, where reliable out-of-distribution (OOD) detection is essential to ensure robustness. Existing methods predominantly ...

Shreen Gul, Mohamed Elmahallawy, Ardhendu Tripathy, Sanjay Madria

2603.23677 2026-03-24
TESTING

PerturbationDrive: A Framework for Perturbation-Based Testing of ADAS

Advanced driver assistance systems (ADAS) often rely on deep neural networks to interpret driving images and support vehicle control. Although reliable under nominal conditions, these systems remai...

Hannes Leonhard, Stefano Carlo Lambertenghi, Andrea Stocco

2603.23661 2026-03-24
TESTING

Ethio-ASR: Joint Multilingual Speech Recognition and Language Identification for Ethiopian Languages

We present Ethio-ASR, a suite of multilingual CTC-based automatic speech recognition (ASR) models jointly trained on five Ethiopian languages: Amharic, Tigrinya, Oromo, Sidaama, and Wolaytta. These...

Badr M. Abdullah, Israel Abebe Azime, Atnafu Lambebo Tonja, Jesujoba O. Alabi, Abel Mulat Alemu, ...

2603.23654 2026-03-24
TESTING

Foundation Model Embeddings Meet Blended Emotions: A Multimodal Fusion Approach for the BLEMORE Challenge

We present our system for the BLEMORE Challenge at FG 2026 on blended emotion recognition with relative salience prediction. Our approach combines six encoder families through late probability fusi...

Masoumeh Chapariniya, Aref Farhadipour, Sarah Ebling, Volker Dellwo, Teodora Vukovic

2603.23650 2026-03-24
TESTING

QuickQudits: A Framework for Efficient Simulation of Noisy Qudit Clifford Circuits via an Extended Stabilizer Tableau Formalism

We present a comprehensive and self-contained framework for the efficient classical simulation of Clifford circuits acting on $d$-dimensional qudits, including realistic Pauli/Weyl noise via stocha...

Nina Brandl, Mykyta Cherniak, Johannes Kofler, Richard Kueng

2603.23641 2026-03-24
TESTING

Testing Dark Energy with Black Hole Ringdown

We show that dynamical dark energy theories can imprint $O(1)$ modifications on the quasi-normal mode (QNM) spectrum characterising black hole ringdown. The time dependence of dynamical dark energy...

Laurens Smulders, Johannes Noller, Sergi Sirera

2603.23634 2026-03-24
TESTING

Detect–Repair–Verify for LLM-Generated Code: A Multi-Language, Multi-Granularity Empirical Study

Large language models can generate runnable software artifacts, but their security remains difficult to evaluate end to end. This study examines that problem through a Detect–Repair–Verify (DRV) ...

Cheng Cheng

2603.23633 2026-03-24
TESTING

OccAny: Generalized Unconstrained Urban 3D Occupancy

Relying on in-domain annotations and precise sensor-rig priors, existing 3D occupancy prediction methods are limited in both scalability and out-of-domain generalization. While recent visual geomet...

Anh-Quan Cao, Tuan-Hung Vu

2603.23502 2026-03-24
TESTING

MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage

Vision Language Models (VLMs) are increasingly used for tasks like medical report generation and visual question answering. However, fluent diagnostic text does not guarantee safe visual understand...

Ufaq Khan, Umair Nawaz, L D M S S Teja, Numaan Saeed, Muhammad Bilal, Yutong Xie, Mohammad Yaqub,...

2603.23501 2026-03-24
AI LLM

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Unified models capable of interleaved generation have emerged as a promising paradigm, with the community increasingly converging on autoregressive modeling for text and flow matching for image gen...

Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan...

2603.23500 2026-03-24
TESTING

Estimating Flow Velocity and Vehicle Angle-of-Attack from Non-invasive Piezoelectric Structural Measurements Using Deep Learning

Accurate estimation of aerodynamic state variables such as freestream velocity and angle of attack (AoA) is important for aerodynamic load prediction, flight control, and model validation. This wor...

Chandler B. Smith, S. Hales Swift, Andrew Steyer, Ihab El-Kady

2603.23496 2026-03-24
AI LLM

Failure of contextual invariance in gender inference with large language models

Standard evaluation practices assume that large language model (LLM) outputs are stable under contextually equivalent formulations of a task. Here, we test this assumption in the setting of gender ...

Sagar Kumar, Ariel Flint, Luca Maria Aiello, Andrea Baronchelli

2603.23485 2026-03-24
AI LLM

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Agentic multimodal large language models (MLLMs) (e.g., OpenAI o3 and Gemini Agentic Vision) achieve remarkable reasoning capabilities through iterative visual tool invocation. However, the cascade...

Haoyu Huang, Jinfa Huang, Zhongwei Wan, Xiawu Zheng, Rongrong Ji, Jiebo Luo

2603.23483 2026-03-24
AI LLM

ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains

Requirements engineering is a vital, yet labor-intensive, stage in the software development process. This article introduces ReqFusion: an AI-enhanced system that automates the extraction, classifi...

Muhammad Khalid, Manuel Oriol, Yilmaz Uygun

2603.23482 2026-03-24
AI LLM

Evidence of political bias in search engines and language models before major elections

Search engines (SEs) and large language models (LLMs) are central to political information access, yet their algorithmic decisions and potential underlying biases remain underexplored. We developed...

Íris Damião, Paulo Almeida, João Franco, Nuno Santos, Pedro C. Magalhães, Joana Gonçalves-Sá

2603.23474 2026-03-24
AI LLM

Regulating AI Agents

AI agents, systems that can independently take actions to pursue complex goals with only limited human oversight, have entered the mainstream. These systems are now being widely used to produce...

Kathrin Gardhouse, Amin Oueslati, Noam Kolt

2603.23471 2026-03-24
AI LLM

ConceptCoder: Improve Code Reasoning via Concept Learning

Large language models (LLMs) have shown promising results for software engineering applications, but still struggle with code reasoning tasks such as vulnerability detection (VD). We introduce Conc...

Md Mahbubur Rahman, Hengbo Tong, Wei Le

2603.23470 2026-03-24
AI LLM

CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection

AI-driven cybersecurity systems often fail under cross-environment deployment due to fragmented, event-centric telemetry representations. We introduce the Canonical Security Telemetry Substrate (CS...

Abdul Rahman

2603.23459 2026-03-24