Papers

Research papers from arXiv and related sources

Total: 4513 (AI/LLM: 2483, Testing: 2030)
AI LLM

Improving Generalization on Cybersecurity Tasks with Multi-Modal Contrastive Learning

The use of ML in cybersecurity has long been impaired by generalization issues: Models that work well in controlled scenarios fail to maintain performance in production. The root cause often lies i...

Jianan Huang, Rodolfo V. Valentim, Luca Vassio, Matteo Boffa, Marco Mellia, Idilio Drago, Dario R...

2603.20181 2026-03-20
AI LLM

AI Agents Can Already Autonomously Perform Experimental High Energy Physics

Large language model-based AI agents are now able to autonomously execute substantial portions of a high energy physics (HEP) analysis pipeline with minimal expert-curated input. Given access to a ...

Eric A. Moreno, Samuel Bright-Thonney, Andrzej Novak, Dolores Garcia, Philip Harris

2603.20179 2026-03-20
AI LLM

Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation

Recent work on chain-of-thought (CoT) faithfulness reports single aggregate numbers (e.g., DeepSeek-R1 acknowledges hints 39% of the time), implying that faithfulness is an objective, measurable pr...

Richard J. Young

2603.20172 2026-03-20
AI LLM

Learning Dynamic Belief Graphs for Theory-of-mind Reasoning

Theory of Mind (ToM) reasoning with Large Language Models (LLMs) requires inferring how people's implicit, evolving beliefs shape what they seek and how they act under uncertainty -- especially in ...

Ruxiao Chen, Xilei Zhao, Thomas J. Cova, Frank A. Drews, Susu Xu

2603.20170 2026-03-20
AI LLM

Audio Avatar Fingerprinting: An Approach for Authorized Use of Voice Cloning in the Era of Synthetic Audio

With the advancements in AI speech synthesis, it is easier than ever before to generate realistic audio in a target voice. One only needs a few seconds of reference audio from the target, quite lit...

Candice R. Gerstner

2603.20165 2026-03-20
TESTING

SPT-3G D1: Maps of the millimeter-wave sky from 2019 and 2020 observations of the SPT-3G Main field

Maps of the sky in millimeter wavelengths contain rich information on cosmology through anisotropies of the cosmic microwave background (CMB). Creating multifrequency sky maps of anisotropies in th...

W. Quan, E. Camphuis, C. Daley, N. Huang, Y. Omori, F. Guidi, E. Anderes, A. J. Anderson, B. Ansa...

2603.20163 2026-03-20
AI LLM

Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, the truthfulness of their outputs is not guaranteed, and their tendency toward overconfidence f...

Qi Cao, Andrew Gambardella, Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa

2603.20161 2026-03-20
AI LLM

Design-OS: A Specification-Driven Framework for Engineering System Design with a Control-Systems Design Case

Engineering system design -- whether mechatronic, control, or embedded -- often proceeds in an ad hoc manner, with requirements left implicit and traceability from intent to parameters largely abse...

H. Sinan Bank, Daniel R. Herber, Thomas H. Bradley

2603.20151 2026-03-20
TESTING

HortiMulti: A Multi-Sensor Dataset for Localisation and Mapping in Horticultural Polytunnels

Agricultural robotics is gaining increasing relevance in both research and real-world deployment. As these systems are expected to operate autonomously in more complex tasks, the availability of re...

Shuoyuan Xu, Zhipeng Zhong, Tiago Barros, Matthew Coombes, Cristiano Premebida, Hao Wu, Cunjia Liu

2603.20150 2026-03-20
TESTING

Enhancing Hyperspace Analogue to Language (HAL) Representations via Attention-Based Pooling for Text Classification

The Hyperspace Analogue to Language (HAL) model relies on global word co-occurrence matrices to construct distributional semantic representations. While these representations capture lexical relati...

Ali Sakour, Zoalfekar Sakour

2603.20149 2026-03-20
AI LLM

Can Large Multimodal Models Inspect Buildings? A Hierarchical Benchmark for Structural Pathology Reasoning

Automated building facade inspection is a critical component of urban resilience and smart city maintenance. Traditionally, this field has relied on specialized discriminative models (e.g., YOLO, M...

Hui Zhong, Yichun Gao, Luyan Liu, Hai Yang, Wang Wang, Haowei Zhang, Xinhu Zheng

2603.20148 2026-03-20
TESTING

AGILE: A Comprehensive Workflow for Humanoid Loco-Manipulation Learning

Recent advances in reinforcement learning (RL) have enabled impressive humanoid behaviors in simulation, yet transferring these results to new robots remains challenging. In many real deployments, ...

Huihua Zhao, Rafael Cathomen, Lionel Gulich, Wei Liu, Efe Arda Ongan, Michael Lin, Shalin Jain, S...

2603.20147 2026-03-20
TESTING

A New Multi-Constraint Potential Field Source Surface (PFSS) Extrapolation Model

The Potential Field Source Surface (PFSS) model is the most used approach for extrapolating the global coronal magnetic field, offering efficiency and strong performance at large scales. However, P...

C. Antonio, I. Chifu, R. Gafeira, J. J. G. Lima

2603.20142 2026-03-20
TESTING

Classifier-Based Nonparametric Sequential Hypothesis Testing

We consider the problem of constructing sequential power-one tests where the null and alternative classes are specified indirectly through historical or offline data. More specifically, given an of...

Chia-Yu Hsu, Shubhanshu Shekhar

2603.20135 2026-03-20
AI LLM

Reasoning Gets Harder for LLMs Inside A Dialogue

Large Language Models (LLMs) achieve strong performance on many reasoning benchmarks, yet these evaluations typically focus on isolated tasks that differ from real-world usage in task-oriented dial...

Ivan Kartáč, Mateusz Lango, Ondřej Dušek

2603.20133 2026-03-20
AI LLM

Revisiting Gene Ontology Knowledge Discovery with Hierarchical Feature Selection and Virtual Study Group of AI Agents

Large language models have achieved great success in multiple challenging tasks, and their capacity can be further boosted by the emerging agentic AI techniques. This new computing paradigm has alr...

Cen Wan, Alex A. Freitas

2603.20132 2026-03-20
AI LLM

An Agentic Multi-Agent Architecture for Cybersecurity Risk Management

Getting a real cybersecurity risk assessment for a small organization is expensive -- a NIST CSF-aligned engagement runs $15,000 on the low end, takes weeks, and depends on practitioners who are ge...

Ravish Gupta, Saket Kumar, Shreeya Sharma, Maulik Dang, Abhishek Aggarwal

2603.20131 2026-03-20
TESTING

Numerically stable equations for the orbital evolution of compact object binaries

The orbital and eccentricity evolution for compact object binaries through gravitational wave emission first derived by Peters and Mathews are used extensively throughout the gravitational wave com...

Max M. Briel, Jeff J. Andrews

2603.20124 2026-03-20
AI LLM

Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

Large Language Models (LLMs) have been widely deployed, especially through free Web-based applications that expose them to diverse user-generated inputs, including those from long-tail distribution...

Wenjing Hong, Zhonghua Rong, Li Wang, Feng Chang, Jian Zhu, Ke Tang, Zexuan Zhu, Yew-Soon Ong

2603.20122 2026-03-20
TESTING

BioDCASE 2026 Challenge Baseline for Cross-Domain Mosquito Species Classification

Mosquito-borne diseases affect more than one billion people each year and cause close to one million deaths. Traditional surveillance methods rely on traps and manual identification that are slow, ...

Yuanbo Hou, Vanja Zdravkovic, Marianne Sinka, Yunpeng Li, Wenwu Wang, Mark D. Plumbley, Kathy Wil...

2603.20118 2026-03-20