Papers
Research papers from arXiv and related sources
Agentic Code Reasoning
Can LLM agents explore codebases and reason about code semantics without executing the code? We study this capability, which we call agentic code reasoning, and introduce semi-formal reasoning: a s...
Shubham Ugare, Satish Chandra
VietSuperSpeech: A Large-Scale Vietnamese Conversational Speech Dataset for ASR Fine-Tuning in Chatbot, Customer Support, and Call Center Applications
We introduce VietSuperSpeech, a large-scale Vietnamese automatic speech recognition (ASR) dataset of 52,023 audio-text pairs totaling 267.39 hours, with a distinctive focus on casual conversational...
Loan Do, Thanh Ngoc Nguyen, Thanh Pham, Vinh Do, Hien Nguyen, Charlotte Nguyen
Diagnosing Generalization Failures from Representational Geometry Markers
Generalization, the ability to perform well beyond the training context, is a hallmark of biological and artificial intelligence, yet anticipating unseen failures remains a central challenge. Conve...
Chi-Ning Chou, Artem Kirsanov, Yao-Yuan Yang, SueYeon Chung
CTForensics: A Comprehensive Dataset and Method for AI-Generated CT Image Detection
With the rapid development of generative AI in medical imaging, synthetic Computed Tomography (CT) images have demonstrated great potential in applications such as data augmentation and clinical di...
Yiheng Li, Zichang Tan, Guoqing Xu, Yijun Ye, Yang Yang, Zhen Lei
KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models
Knowledge distillation (KD) is an essential technique to compress large language models (LLMs) into smaller ones. However, despite the distinct roles of the student model and the teacher model in K...
Songming Zhang, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen, Jinan Xu
Phishing the Phishers with SpecularNet: Hierarchical Graph Autoencoding for Reference-Free Web Phishing Detection
Phishing remains the most pervasive threat to the Web, enabling large-scale credential theft and financial fraud through deceptive webpages. While recent reference-based and generative-AI-driven ph...
Tailai Song, Pedro Casas, Michela Meo
Guaranteed Image Classification via Goal-oriented Joint Semantic Source and Channel Coding
To enable critical applications such as remote diagnostics, image classification must be guaranteed under bandwidth constraints and unreliable wireless channels through joint source and channel cod...
Wenchao Wu, Min Qiu, Yansha Deng, Jinhong Yuan
Sovereign AI-based Public Services are Viable and Affordable
The rapid expansion of AI-based remote services has intensified debates about the long-term implications of growing structural concentration in infrastructure and expertise. As AI capabilities beco...
António Branco, Luís Gomes, Rodrigo Santos, Eduardo Santos, João Silva, Nuno Marques, Madalena Ro...
CyclicJudge: Mitigating Judge Bias Efficiently in LLM-based Evaluation
LLM-as-judge evaluation has become standard practice for open-ended model assessment; however, judges exhibit systematic biases that cannot be eliminated by increasing the number of scenarios or ge...
Ziyi Zhu, Olivier Tieleman, Alexey Bukhtiyarov, Jinghong Chen
Let the Agent Search: Autonomous Exploration Beats Rigid Workflows in Temporal Question Answering
Temporal Knowledge Graph Question Answering (TKGQA) demands multi-hop reasoning under temporal constraints. Prior approaches based on large language models (LLMs) typically rely on rigid, hand-craf...
Xufei Lv, Jiahui Yang, Yifu Gao, Linbo Qiao, Houde Liu
Constrained Particle Seeking: Solving Diffusion Inverse Problems with Just Forward Passes
Diffusion models have gained prominence as powerful generative tools for solving inverse problems due to their ability to model complex data distributions. However, existing methods typically rely ...
Hongkun Dou, Zike Chen, Zeyu Li, Hongjue Li, Lijun Yang, Yue Deng
Probing Materials Knowledge in LLMs: From Latent Embeddings to Reliable Predictions
Large language models are increasingly applied to materials science, yet fundamental questions remain about their reliability and knowledge encoding. Evaluating 25 LLMs across four materials scienc...
Vineeth Venugopal, Soroush Mahjoubi, Elsa Olivetti
OpenAutoNLU: Open Source AutoML Library for NLU
OpenAutoNLU is an open-source automated machine learning library for natural language understanding (NLU) tasks, covering both text classification and named entity recognition (NER). Unlike existin...
Grigory Arshinov, Aleksandr Boriskin, Sergey Senichev, Ayaz Zaripov, Daria Galimzianova, Daniil K...
Emerging Human-like Strategies for Semantic Memory Foraging in Large Language Models
Both humans and Large Language Models (LLMs) store a vast repository of semantic memories. In humans, efficient and strategic access to this memory store is a critical foundation for a variety of c...
Eric Lacosse, Mariana Duarte, Peter M. Todd, Daniel C. McNamee
Voices, Faces, and Feelings: Multi-modal Emotion-Cognition Captioning for Mental Health Understanding
Emotional and cognitive factors are essential for understanding mental health disorders. However, existing methods often treat multi-modal data as classification tasks, limiting interpretability es...
Zhiyuan Zhou, Yanrong Guo, Shijie Hao
Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition
Implementing new features across an entire codebase presents a formidable challenge for Large Language Models (LLMs). This proactive task requires a deep understanding of the global system architec...
Mingwei Liu, Zhenxi Chen, Zheng Pei, Zihao Wang, Yanlin Wang, Zibin Zheng
SSMG-Nav: Enhancing Lifelong Object Navigation with Semantic Skeleton Memory Graph
Navigating to out-of-sight targets from human instructions in unfamiliar environments is a core capability for service robots. Despite substantial progress, most approaches underutilize reusable, p...
Haochen Niu, Lantao Zhang, Xingwu Ji, Rendong Ying, Peilin Liu, Fei Wen
Non-verbal Real-time Human-AI Interaction in Constrained Robotic Environments
We study the ongoing debate regarding the statistical fidelity of AI-generated data compared to human-generated data in the context of non-verbal communication using full body motion. Concretely, w...
Dragos Costea, Alina Marcu, Cristina Lazar, Marius Leordeanu
ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs
Large language models (LLMs) have advanced to encompass extensive knowledge across diverse domains. Yet controlling what an LLM should not know is important for ensuring alignment and thus safe use...
Xunlei Chen, Jinyu Guo, Yuang Li, Zhaokun Wang, Yi Gong, Jie Zou, Jiwei Wei, Wenhong Tian
Can LLMs Hack Enterprise Networks? -- Replicated Computational Results (RCR) Report
This is the Replicated Computational Results (RCR) Report for the paper "Can LLMs Hack Enterprise Networks?" The paper empirically investigates the efficacy and effectiveness of different LLMs for...
Andreas Happe, Jürgen Cito