Research

Papers

Research papers from arXiv and related sources

Total: 4513 AI/LLM: 2483 Testing: 2030
AI LLM

MuSteerNet: Human Reaction Generation from Videos via Observation-Reaction Mutual Steering

Video-driven human reaction generation aims to synthesize 3D human motions that directly react to observed video sequences, which is crucial for building human-like interactive AI systems. However,...

Yuan Zhou, Yongzhi Li, Yanqi Dai, Xingyu Zhu, Yi Tan, Qingshan Xu, Beier Zhu, Richang Hong, Hanwa...

2603.20187 2026-03-20
AI LLM

IndoorR2X: Indoor Robot-to-Everything Coordination with LLM-Driven Planning

Although robot-to-robot (R2R) communication improves indoor scene understanding beyond what a single robot can achieve, R2R alone cannot overcome partial observability without substantial explorati...

Fan Yang, Soumya Teotia, Shaunak A. Mehta, Prajit KrisshnaKumar, Quanting Xie, Jun Liu, Yueqi Son...

2603.20182 2026-03-20
AI LLM

Improving Generalization on Cybersecurity Tasks with Multi-Modal Contrastive Learning

The use of ML in cybersecurity has long been impaired by generalization issues: Models that work well in controlled scenarios fail to maintain performance in production. The root cause often lies i...

Jianan Huang, Rodolfo V. Valentim, Luca Vassio, Matteo Boffa, Marco Mellia, Idilio Drago, Dario R...

2603.20181 2026-03-20
AI LLM

AI Agents Can Already Autonomously Perform Experimental High Energy Physics

Large language model-based AI agents are now able to autonomously execute substantial portions of a high energy physics (HEP) analysis pipeline with minimal expert-curated input. Given access to a ...

Eric A. Moreno, Samuel Bright-Thonney, Andrzej Novak, Dolores Garcia, Philip Harris

2603.20179 2026-03-20
AI LLM

Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation

Recent work on chain-of-thought (CoT) faithfulness reports single aggregate numbers (e.g., DeepSeek-R1 acknowledges hints 39% of the time), implying that faithfulness is an objective, measurable pr...

Richard J. Young

2603.20172 2026-03-20
AI LLM

Learning Dynamic Belief Graphs for Theory-of-mind Reasoning

Theory of Mind (ToM) reasoning with Large Language Models (LLMs) requires inferring how people's implicit, evolving beliefs shape what they seek and how they act under uncertainty -- especially in ...

Ruxiao Chen, Xilei Zhao, Thomas J. Cova, Frank A. Drews, Susu Xu

2603.20170 2026-03-20
AI LLM

Audio Avatar Fingerprinting: An Approach for Authorized Use of Voice Cloning in the Era of Synthetic Audio

With the advancements in AI speech synthesis, it is easier than ever before to generate realistic audio in a target voice. One only needs a few seconds of reference audio from the target, quite lit...

Candice R. Gerstner

2603.20165 2026-03-20
AI LLM

Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, the truthfulness of their outputs is not guaranteed, and their tendency toward overconfidence f...

Qi Cao, Andrew Gambardella, Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa

2603.20161 2026-03-20
AI LLM

Design-OS: A Specification-Driven Framework for Engineering System Design with a Control-Systems Design Case

Engineering system design -- whether mechatronic, control, or embedded -- often proceeds in an ad hoc manner, with requirements left implicit and traceability from intent to parameters largely abse...

H. Sinan Bank, Daniel R. Herber, Thomas H. Bradley

2603.20151 2026-03-20
AI LLM

Can Large Multimodal Models Inspect Buildings? A Hierarchical Benchmark for Structural Pathology Reasoning

Automated building facade inspection is a critical component of urban resilience and smart city maintenance. Traditionally, this field has relied on specialized discriminative models (e.g., YOLO, M...

Hui Zhong, Yichun Gao, Luyan Liu, Hai Yang, Wang Wang, Haowei Zhang, Xinhu Zheng

2603.20148 2026-03-20
AI LLM

Reasoning Gets Harder for LLMs Inside A Dialogue

Large Language Models (LLMs) achieve strong performance on many reasoning benchmarks, yet these evaluations typically focus on isolated tasks that differ from real-world usage in task-oriented dial...

Ivan Kartáč, Mateusz Lango, Ondřej Dušek

2603.20133 2026-03-20
AI LLM

Revisiting Gene Ontology Knowledge Discovery with Hierarchical Feature Selection and Virtual Study Group of AI Agents

Large language models have achieved great success in multiple challenging tasks, and their capacity can be further boosted by the emerging agentic AI techniques. This new computing paradigm has alr...

Cen Wan, Alex A. Freitas

2603.20132 2026-03-20
AI LLM

An Agentic Multi-Agent Architecture for Cybersecurity Risk Management

Getting a real cybersecurity risk assessment for a small organization is expensive -- a NIST CSF-aligned engagement runs $15,000 on the low end, takes weeks, and depends on practitioners who are ge...

Ravish Gupta, Saket Kumar, Shreeya Sharma, Maulik Dang, Abhishek Aggarwal

2603.20131 2026-03-20
AI LLM

Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

Large Language Models (LLMs) have been widely deployed, especially through free Web-based applications that expose them to diverse user-generated inputs, including those from long-tail distribution...

Wenjing Hong, Zhonghua Rong, Li Wang, Feng Chang, Jian Zhu, Ke Tang, Zexuan Zhu, Yew-Soon Ong

2603.20122 2026-03-20
AI LLM

Current LLMs still cannot 'talk much' about grammar modules: Evidence from syntax

We aim to examine the extent to which Large Language Models (LLMs) can 'talk much' about grammar modules, providing evidence from syntax core properties translated by ChatGPT into Arabic. We collec...

Mohammed Q. Shormani

2603.20114 2026-03-20
AI LLM

GO-GenZip: Goal-Oriented Generative Sampling and Hybrid Compression

Current network data telemetry pipelines consist of massive streams of fine-grained Key Performance Indicators (KPIs) from multiple distributed sources towards central aggregators, making data stor...

Pietro Talli, Qi Liao, Alessandro Lieto, Parijat Bhattacharjee, Federico Chiariotti, Andrea Zanella

2603.20109 2026-03-20
AI LLM

The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $λ$-Calculus

LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs) address this by externalising the prompt and...

Amartya Roy, Rasul Tutunov, Xiaotong Ji, Matthieu Zimmer, Haitham Bou-Ammar

2603.20105 2026-03-20
AI LLM

Pitfalls in Evaluating Interpretability Agents

Automated interpretability systems aim to reduce the need for human labor and scale analysis to increasingly large models and diverse tasks. Recent efforts toward this goal leverage large language ...

Tal Haklay, Nikhil Prakash, Sana Pandey, Antonio Torralba, Aaron Mueller, Jacob Andreas, Tamar Ro...

2603.20101 2026-03-20
AI LLM

LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain

Large manufacturing companies face challenges in information retrieval due to data silos maintained by different departments, leading to inconsistencies and misalignment across databases. This pape...

Antonio De Santis, Marco Balduini, Matteo Belcao, Andrea Proia, Marco Brambilla, Emanuele Della V...

2603.20094 2026-03-20
AI LLM

Beyond Accuracy: Towards a Robust Evaluation Methodology for AI Systems for Language Education

The rapid adoption of large language models in AI-powered language education has created an urgent need for evaluations that assess pedagogical effectiveness, particularly in language learning--one...

James Edgell, Wm. Matthew Kennedy, Isaac Pattis, Ben Knight, Danielle Carvalho, Elizabeth Wonnacott

2603.20088 2026-03-20