Papers
Research papers from arXiv and related sources
FireBench: Evaluating Instruction Following in Enterprise and API-Driven LLM Applications
Instruction following is critical for LLMs deployed in enterprise and API-driven settings, where strict adherence to output formats, content constraints, and procedural requirements is essential fo...
Yunfan Zhang, Yijie Bei, Jetashree Ravi, Pawel Garbacki
HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents
Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and...
Yilin Jiang, Fei Tan, Xuanyu Yin, Jing Leng, Aimin Zhou
SinhaLegal: A Benchmark Corpus for Information Extraction and Analysis in Sinhala Legislative Texts
SinhaLegal introduces a Sinhala legislative text corpus containing approximately 2 million words across 1,206 legal documents. The dataset includes two types of legal documents: 1,065 Acts dated fr...
Minduli Lasandi, Nevidu Jayatilleke
On Multi-Step Theorem Prediction via Non-Parametric Structural Priors
Multi-step theorem prediction is a central challenge in automated reasoning. Existing neural-symbolic approaches rely heavily on supervised parametric models, which exhibit limited generalization t...
Junbo Zhao, Ting Zhang, Can Li, Wei He, Jingdong Wang, Hua Huang
Why Is RLHF Alignment Shallow? A Gradient Analysis
Why is safety alignment in LLMs shallow? We prove that gradient-based alignment inherently concentrates on positions where harm is decided and vanishes beyond. Using a martingale decomposition of s...
Robin Young
Design Behaviour Codes (DBCs): A Taxonomy-Driven Layered Governance Benchmark for Large Language Models
We introduce the Dynamic Behavioral Constraint (DBC) benchmark, the first empirical framework for evaluating the efficacy of a structured, 150-control behavioral governance layer, the MDBC (Madan D...
G. Madan Mohan, Veena Kiran Nambiar, Kiranmayee Janardhan
From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models
Pre-training data detection for LLMs is essential for addressing copyright concerns and mitigating benchmark contamination. Existing methods mainly focus on the likelihood-based statistical feature...
Ruiqi Zhang, Lingxiang Wang, Hainan Zhang, Zhiming Zheng, Yanyan Lan
VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment
Aligning Large Language Models (LLMs) with nuanced human values remains a critical challenge, as existing methods like Reinforcement Learning from Human Feedback (RLHF) often handle only coarse-gra...
Jiawei Chen, Tianzhuo Yang, Guoxi Zhang, Jiaming Ji, Yaodong Yang, Juntao Dai
Autoscoring Anticlimax: A Meta-analytic Understanding of AI's Short-answer Shortcomings and Wording Weaknesses
Automated short-answer scoring lags other LLM applications. We meta-analyze 890 culminating results across a systematic review of LLM short-answer scoring studies, modeling the traditional effect s...
Michael Hardy
EchoGuard: An Agentic Framework with Knowledge-Graph Memory for Detecting Manipulative Communication in Longitudinal Dialogue
Manipulative communication, such as gaslighting, guilt-tripping, and emotional coercion, is often difficult for individuals to recognize. Existing agentic AI systems lack the structured, longitudin...
Ratna Kandala, Niva Manchanda, Akshata Kishore Moharir, Ananth Kandala
Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents
Persistent conversational AI systems face a choice between passing full conversation histories to a long-context large language model (LLM) and maintaining a dedicated memory system that extracts a...
Natchanon Pollertlam, Witchayut Kornsuwannawit
SparkTales: Facilitating Cross-Language Collaborative Storytelling through Coordinator-AI Collaboration
Cross-language collaborative storytelling plays a vital role in children's language learning and cultural development, fostering both expressive ability and intercultural awareness. Yet, in practic...
Wenxin Zhao, Peng Zhang, Hansu Gu, Haoxuan Zhou, Xiaojie Huo, Lin Wang, Wen Zheng, Tun Lu, Ning Gu
MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models...
Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao
Beyond Linear LLM Invocation: An Efficient and Effective Semantic Filter Paradigm
Large language models (LLMs) are increasingly used for semantic query processing over large corpora. A set of semantic operators derived from relational algebra has been proposed to provide a unifi...
Nan Hou, Kangfei Zhao, Jiadong Xie, Jeffrey Xu Yu
Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator
Large language models (LLMs) have been widely deployed for online generative services, where numerous LLM instances jointly handle workloads with fluctuating request arrival rates and variable requ...
Cong Li, Yihan Yin, Chenhao Xue, Zhao Wang, Fujun Bai, Yixin Guo, Xiping Jiang, Qiang Wu, Yuan Xi...
SELDON: Supernova Explosions Learned by Deep ODE Networks
The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory's Legacy Survey of Space and Time comes online, overwhelming the tradi...
Jiezhong Wu, Jack O'Brien, Jennifer Li, M. S. Krafczyk, Ved G. Shah, Amanda R. Wasserman, Daniel ...
A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
WebGIS development requires rigor, yet agentic AI frequently fails due to five large language model (LLM) limitations: context constraints, cross-session forgetting, stochasticity, instruction fail...
Boyuan Guan, Wencong Cui, Levente Juhasz
Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization
As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce ...
Furkan Mumcu, Yasin Yilmaz
LLM-supported 3D Modeling Tool for Radio Radiance Field Reconstruction
Accurate channel estimation is essential for massive multiple-input multiple-output (MIMO) technologies in next-generation wireless communications. Recently, the radio radiance field (RRF) has emer...
Chengling Xu, Huiwen Zhang, Haijian Sun, Feng Ye
Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks
Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-stream architecture opens an underexplored atta...
Haoyu Liu, Dingcheng Li, Lukas Rutishauser, Zeyu Zheng