Papers
Research papers from arXiv and related sources
Robustness and Reasoning Fidelity of Large Language Models in Long-Context Code Question Answering
Large language models (LLMs) increasingly assist software engineering tasks that require reasoning over long code contexts, yet their robustness under varying input conditions remains unclear. We c...
Kishan Maharaj, Nandakishore Menon, Ashita Saxena, Srikanth Tamilselvam
When LLM Judges Inflate Scores: Exploring Overrating in Relevance Assessment
Human relevance assessment is time-consuming and cognitively intensive, limiting the scalability of Information Retrieval evaluation. This has led to growing interest in using large language models...
Chuting Yu, Hang Li, Joel Mackenzie, Teerapong Leelanupab
SimulatorCoder: DNN Accelerator Simulator Code Generation and Optimization via Large Language Models
This paper presents SimulatorCoder, an agent powered by large language models (LLMs), designed to generate and optimize deep neural network (DNN) accelerator simulators based on natural language de...
Yuhuan Xia, Tun Li, Hongji Zhou, Xianfa Zhou, Chong Chen, Ruiyu Zhang
Powering Up Zeroth-Order Training via Subspace Gradient Orthogonalization
Zeroth-order (ZO) optimization provides a gradient-free alternative to first-order (FO) methods by estimating gradients via finite differences of function evaluations, and has recently emerged as a...
Yicheng Lang, Changsheng Wang, Yihua Zhang, Mingyi Hong, Zheng Zhang, Wotao Yin, Sijia Liu
Integrable cellular automata on finite fields of order $2^n$
This paper explores cellular automata (CA) constructed from Yang-Baxter maps over finite fields $F_{2^n}$. We define $R$-matrices using a map $f$ on $F_{2^n}$ and establish necessary and sufficient...
Aoi Araoka, Tetsuji Tokihiro
The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI
As Large Language Models (LLMs) transition from standalone chat interfaces to foundational reasoning layers in multi-agent systems and recursive evaluation loops (LLM-as-a-judge), the detection of ...
Dusan Bosnjakovic
Epistemology of Generative AI: The Geometry of Knowing
Generative AI presents an unprecedented challenge to our understanding of knowledge and its production. Unlike previous technological transformations, where engineering understanding preceded or ac...
Ilya Levin
Toward Trustworthy Evaluation of Sustainability Rating Methodologies: A Human-AI Collaborative Framework for Benchmark Dataset Construction
Sustainability or ESG rating agencies use company disclosures and external data to produce scores or ratings that assess the environmental, social, and governance performance of a company. However,...
Xiaoran Cai, Wang Yang, Xiyu Ren, Chekun Law, Rohit Sharma, Peng Qi
Operationalization of Machine Learning with Serverless Architecture: An Industrial Operationalization of Machine Learning with Serverless Architecture: An Industrial Implementation for Harmonized System Code Prediction
This paper presents a serverless MLOps framework orchestrating the complete ML lifecycle from data ingestion, training, deployment, monitoring, and retraining to using event-driven pipelines and ma...
Sai Vineeth Kandappareddigari, Santhoshkumar Jagadish, Gauri Verma, Ilhuicamina Contreras, Christ...
AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation
Large language model(LLM)-driven multi-agent systems(MAS) coordinate specialized agents through predefined interaction topologies and have shown promise for complex tasks such as competition-level ...
Siyu Wang, Ruotian Lu, Zhihao Yang, Yuchao Wang, Yanzhou Zhang, Lei Xu, Qimin Xu, Guojun Yin, Cai...
AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing
Despite recent breakthroughs, audio foundation models struggle in processing complex multi-source acoustic scenes. We refer to this challenging domain as audio stories, which can have multiple spea...
William Chen, Prem Seetharaman, Rithesh Kumar, Oriol Nieto, Shinji Watanabe, Justin Salamon, Zeyu...
Agentic Wireless Communication for 6G: Intent-Aware and Continuously Evolving Physical-Layer Intelligence
As 6G wireless systems evolve, growing functional complexity and diverse service demands are driving a shift from rule-based control to intent-driven autonomous intelligence. User requirements are ...
Zhaoyang Li, Xingzhi Jin, Junyu Pan, Qianqian Yang, Zhiguo Shi
FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
Parameter-efficient fine-tuning techniques such as low-rank adaptation (LoRA) enable large language models (LLMs) to adapt to downstream tasks efficiently. Federated learning (FL) further facilitat...
Chuiyang Meng, Ming Tang, Vincent W. S. Wong
What to Cut? Predicting Unnecessary Methods in Agentic Code Generation
Agentic Coding, powered by autonomous agents such as GitHub Copilot and Cursor, enables developers to generate code, tests, and pull requests from natural language instructions alone. While this ac...
Kan Watanabe, Tatsuya Shirai, Yutaro Kashiwa, Hajimu Iida
Synergizing Transport-Based Generative Models and Latent Geometry for Stochastic Closure Modeling
Diffusion models recently developed for generative AI tasks can produce high-quality samples while still maintaining diversity among samples to promote mode coverage, providing a promising path for...
Xinghao Dong, Huchen Yang, Jin-long Wu
How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses
The rapid adoption of large language models has led to the emergence of AI coding agents that autonomously create pull requests on GitHub. However, how these agents differ in their pull request des...
Kan Watanabe, Rikuto Tsuchida, Takahiro Monno, Bin Huang, Kazuma Yamasaki, Youmei Fan, Kazumasa S...
Rememo: A Research-through-Design Inquiry Towards an AI-in-the-loop Therapist's Tool for Dementia Reminiscence
Reminiscence therapy (RT) is a common non-pharmacological intervention in dementia care. Recent technology-mediated interventions have largely focused on people with dementia through solutions that...
Celeste Seah, Yoke Chuan Lee, Jung-Joo Lee, Ching-Chiuan Yen, Clement Zheng
BankMathBench: A Benchmark for Numerical Reasoning in Banking Scenarios
Large language models (LLMs)-based chatbots are increasingly being adopted in the financial domain, particularly in digital banking, to handle customer inquiries about products such as deposits, sa...
Yunseung Lee, Subin Kim, Youngjun Kwak, Jaegul Choo
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We in...
Yunseok Han, Yejoon Lee, Jaeyoung Do
Large Language Models Persuade Without Planning Theory of Mind
A growing body of work attempts to evaluate the theory of mind (ToM) abilities of humans and large language models (LLMs) using static, non-interactive question-and-answer benchmarks. However, theo...
Jared Moore, Rasmus Overmark, Ned Cooper, Beba Cibralic, Nick Haber, Cameron R. Jones