Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

Agentic Code Reasoning

Can LLM agents explore codebases and reason about code semantics without executing the code? We study this capability, which we call agentic code reasoning, and introduce semi-formal reasoning: a s...

Shubham Ugare, Satish Chandra

2603.01896 2026-03-02
AI LLM

VietSuperSpeech: A Large-Scale Vietnamese Conversational Speech Dataset for ASR Fine-Tuning in Chatbot, Customer Support, and Call Center Applications

We introduce VietSuperSpeech, a large-scale Vietnamese automatic speech recognition (ASR) dataset of 52,023 audio-text pairs totaling 267.39 hours, with a distinctive focus on casual conversational...

Loan Do, Thanh Ngoc Nguyen, Thanh Pham, Vinh Do, Hien Nguyen, Charlotte Nguyen

2603.01894 2026-03-02
AI LLM

Diagnosing Generalization Failures from Representational Geometry Markers

Generalization, the ability to perform well beyond the training context, is a hallmark of biological and artificial intelligence, yet anticipating unseen failures remains a central challenge. Conve...

Chi-Ning Chou, Artem Kirsanov, Yao-Yuan Yang, SueYeon Chung

2603.01879 2026-03-02
AI LLM

CTForensics: A Comprehensive Dataset and Method for AI-Generated CT Image Detection

With the rapid development of generative AI in medical imaging, synthetic Computed Tomography (CT) images have demonstrated great potential in applications such as data augmentation and clinical di...

Yiheng Li, Zichang Tan, Guoqing Xu, Yijun Ye, Yang Yang, Zhen Lei

2603.01878 2026-03-02
AI LLM

KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models

Knowledge distillation (KD) is an essential technique to compress large language models (LLMs) into smaller ones. However, despite the distinct roles of the student model and the teacher model in K...

Songming Zhang, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen, Jinan Xu

2603.01875 2026-03-02
AI LLM

Phishing the Phishers with SpecularNet: Hierarchical Graph Autoencoding for Reference-Free Web Phishing Detection

Phishing remains the most pervasive threat to the Web, enabling large-scale credential theft and financial fraud through deceptive webpages. While recent reference-based and generative-AI-driven ph...

Tailai Song, Pedro Casas, Michela Meo

2603.01874 2026-03-02
AI LLM

Guaranteed Image Classification via Goal-oriented Joint Semantic Source and Channel Coding

To enable critical applications such as remote diagnostics, image classification must be guaranteed under bandwidth constraints and unreliable wireless channels through joint source and channel cod...

Wenchao Wu, Min Qiu, Yansha Deng, Jinhong Yuan

2603.01872 2026-03-02
AI LLM

Sovereign AI-based Public Services are Viable and Affordable

The rapid expansion of AI-based remote services has intensified debates about the long-term implications of growing structural concentration in infrastructure and expertise. As AI capabilities beco...

António Branco, Luís Gomes, Rodrigo Santos, Eduardo Santos, João Silva, Nuno Marques, Madalena Ro...

2603.01869 2026-03-02
AI LLM

CyclicJudge: Mitigating Judge Bias Efficiently in LLM-based Evaluation

LLM-as-judge evaluation has become standard practice for open-ended model assessment; however, judges exhibit systematic biases that cannot be eliminated by increasing the number of scenarios or ge...

Ziyi Zhu, Olivier Tieleman, Alexey Bukhtiyarov, Jinghong Chen

2603.01865 2026-03-02
AI LLM

Let the Agent Search: Autonomous Exploration Beats Rigid Workflows in Temporal Question Answering

Temporal Knowledge Graph Question Answering (TKGQA) demands multi-hop reasoning under temporal constraints. Prior approaches based on large language models (LLMs) typically rely on rigid, hand-craf...

Xufei Lv, Jiahui Yang, Yifu Gao, Linbo Qiao, Houde Liu

2603.01853 2026-03-02
AI LLM

Constrained Particle Seeking: Solving Diffusion Inverse Problems with Just Forward Passes

Diffusion models have gained prominence as powerful generative tools for solving inverse problems due to their ability to model complex data distributions. However, existing methods typically rely ...

Hongkun Dou, Zike Chen, Zeyu Li, Hongjue Li, Lijun Yang, Yue Deng

2603.01837 2026-03-02
AI LLM

Probing Materials Knowledge in LLMs: From Latent Embeddings to Reliable Predictions

Large language models are increasingly applied to materials science, yet fundamental questions remain about their reliability and knowledge encoding. Evaluating 25 LLMs across four materials scienc...

Vineeth Venugopal, Soroush Mahjoubi, Elsa Olivetti

2603.01834 2026-03-02
AI LLM

OpenAutoNLU: Open Source AutoML Library for NLU

OpenAutoNLU is an open-source automated machine learning library for natural language understanding (NLU) tasks, covering both text classification and named entity recognition (NER). Unlike existin...

Grigory Arshinov, Aleksandr Boriskin, Sergey Senichev, Ayaz Zaripov, Daria Galimzianova, Daniil K...

2603.01824 2026-03-02
AI LLM

Emerging Human-like Strategies for Semantic Memory Foraging in Large Language Models

Both humans and Large Language Models (LLMs) store a vast repository of semantic memories. In humans, efficient and strategic access to this memory store is a critical foundation for a variety of c...

Eric Lacosse, Mariana Duarte, Peter M. Todd, Daniel C. McNamee

2603.01822 2026-03-02
AI LLM

Voices, Faces, and Feelings: Multi-modal Emotion-Cognition Captioning for Mental Health Understanding

Emotional and cognitive factors are essential for understanding mental health disorders. However, existing methods often treat multi-modal data as classification tasks, limiting interpretability es...

Zhiyuan Zhou, Yanrong Guo, Shijie Hao

2603.01816 2026-03-02
AI LLM

Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition

Implementing new features across an entire codebase presents a formidable challenge for Large Language Models (LLMs). This proactive task requires a deep understanding of the global system architec...

Mingwei Liu, Zhenxi Chen, Zheng Pei, Zihao Wang, Yanlin Wang, Zibin Zheng

2603.01814 2026-03-02
AI LLM

SSMG-Nav: Enhancing Lifelong Object Navigation with Semantic Skeleton Memory Graph

Navigating to out-of-sight targets from human instructions in unfamiliar environments is a core capability for service robots. Despite substantial progress, most approaches underutilize reusable, p...

Haochen Niu, Lantao Zhang, Xingwu Ji, Rendong Ying, Peilin Liu, Fei Wen

2603.01813 2026-03-02
AI LLM

Non-verbal Real-time Human-AI Interaction in Constrained Robotic Environments

We study the ongoing debate regarding the statistical fidelity of AI-generated data compared to human-generated data in the context of non-verbal communication using full body motion. Concretely, w...

Dragos Costea, Alina Marcu, Cristina Lazar, Marius Leordeanu

2603.01804 2026-03-02
AI LLM

ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs

Large language models (LLMs) have advanced to encompass extensive knowledge across diverse domains. Yet controlling what an LLM should not know is important for ensuring alignment and thus safe use...

Xunlei Chen, Jinyu Guo, Yuang Li, Zhaokun Wang, Yi Gong, Jie Zou, Jiwei Wei, Wenhong Tian

2603.01792 2026-03-02
AI LLM

Can LLMs Hack Enterprise Networks? -- Replicated Computational Results (RCR) Report

This is the Replicated Computational Results (RCR) Report for the paper "Can LLMs Hack Enterprise Networks?" The paper empirically investigates the efficacy and effectiveness of different LLMs for...

Andreas Happe, Jürgen Cito

2603.01789 2026-03-02