Papers
Research papers from arXiv and related sources
FireBench: Evaluating Instruction Following in Enterprise and API-Driven LLM Applications
Instruction following is critical for LLMs deployed in enterprise and API-driven settings, where strict adherence to output formats, content constraints, and procedural requirements is essential fo...
Yunfan Zhang, Yijie Bei, Jetashree Ravi, Pawel Garbacki
HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents
Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and...
Yilin Jiang, Fei Tan, Xuanyu Yin, Jing Leng, Aimin Zhou
SinhaLegal: A Benchmark Corpus for Information Extraction and Analysis in Sinhala Legislative Texts
SinhaLegal introduces a Sinhala legislative text corpus containing approximately 2 million words across 1,206 legal documents. The dataset includes two types of legal documents: 1,065 Acts dated fr...
Minduli Lasandi, Nevidu Jayatilleke
On Multi-Step Theorem Prediction via Non-Parametric Structural Priors
Multi-step theorem prediction is a central challenge in automated reasoning. Existing neural-symbolic approaches rely heavily on supervised parametric models, which exhibit limited generalization t...
Junbo Zhao, Ting Zhang, Can Li, Wei He, Jingdong Wang, Hua Huang
Why Is RLHF Alignment Shallow? A Gradient Analysis
Why is safety alignment in LLMs shallow? We prove that gradient-based alignment inherently concentrates on positions where harm is decided and vanishes beyond. Using a martingale decomposition of s...
Robin Young
Design Behaviour Codes (DBCs): A Taxonomy-Driven Layered Governance Benchmark for Large Language Models
We introduce the Dynamic Behavioral Constraint (DBC) benchmark, the first empirical framework for evaluating the efficacy of a structured, 150-control behavioral governance layer, the MDBC (Madan D...
G. Madan Mohan, Veena Kiran Nambiar, Kiranmayee Janardhan
From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models
Pre-training data detection for LLMs is essential for addressing copyright concerns and mitigating benchmark contamination. Existing methods mainly focus on the likelihood-based statistical feature...
Ruiqi Zhang, Lingxiang Wang, Hainan Zhang, Zhiming Zheng, Yanyan Lan
VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment
Aligning Large Language Models (LLMs) with nuanced human values remains a critical challenge, as existing methods like Reinforcement Learning from Human Feedback (RLHF) often handle only coarse-gra...
Jiawei Chen, Tianzhuo Yang, Guoxi Zhang, Jiaming Ji, Yaodong Yang, Juntao Dai
Autoscoring Anticlimax: A Meta-analytic Understanding of AI's Short-answer Shortcomings and Wording Weaknesses
Automated short-answer scoring lags other LLM applications. We meta-analyze 890 culminating results across a systematic review of LLM short-answer scoring studies, modeling the traditional effect s...
Michael Hardy
LLM-Grounded Explainability for Port Congestion Prediction via Temporal Graph Attention Networks
Port congestion at major maritime hubs disrupts global supply chains, yet existing prediction systems typically prioritize forecasting accuracy without providing operationally interpretable explana...
Zhiming Xue, Yujue Wang
EchoGuard: An Agentic Framework with Knowledge-Graph Memory for Detecting Manipulative Communication in Longitudinal Dialogue
Manipulative communication, such as gaslighting, guilt-tripping, and emotional coercion, is often difficult for individuals to recognize. Existing agentic AI systems lack the structured, longitudin...
Ratna Kandala, Niva Manchanda, Akshata Kishore Moharir, Ananth Kandala
Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents
Persistent conversational AI systems face a choice between passing full conversation histories to a long-context large language model (LLM) and maintaining a dedicated memory system that extracts a...
Natchanon Pollertlam, Witchayut Kornsuwannawit
Detection of GNSS Interference Using Reflected Signal Observations from the LEO Satellite Constellation
Radio Frequency Interference (RFI) is a growing concern for Global Navigation Satellite System (GNSS) reliability. The Cyclone GNSS (CYGNSS) constellation, designed for ocean wind retrieval via GNS...
Ji-Hyeon Shin, Pyo-Woong Son
SparkTales: Facilitating Cross-Language Collaborative Storytelling through Coordinator-AI Collaboration
Cross-language collaborative storytelling plays a vital role in children's language learning and cultural development, fostering both expressive ability and intercultural awareness. Yet, in practic...
Wenxin Zhao, Peng Zhang, Hansu Gu, Haoxuan Zhou, Xiaojie Huo, Lin Wang, Wen Zheng, Tun Lu, Ning Gu
Can LLMs Synthesize Court-Ready Statistical Evidence? Evaluating AI-Assisted Sentencing Bias Analysis for California Racial Justice Act Claims
Resentencing in California remains a complex legal challenge despite legislative reforms like the Racial Justice Act (2020), which allows defendants to challenge convictions based on statistical ev...
Aparna Komarla
MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models...
Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao
Beyond Linear LLM Invocation: An Efficient and Effective Semantic Filter Paradigm
Large language models (LLMs) are increasingly used for semantic query processing over large corpora. A set of semantic operators derived from relational algebra has been proposed to provide a unifi...
Nan Hou, Kangfei Zhao, Jiadong Xie, Jeffrey Xu Yu
Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator
Large language models (LLMs) have been widely deployed for online generative services, where numerous LLM instances jointly handle workloads with fluctuating request arrival rates and variable requ...
Cong Li, Yihan Yin, Chenhao Xue, Zhao Wang, Fujun Bai, Yixin Guo, Xiping Jiang, Qiang Wu, Yuan Xi...
Could the interaction of jet and SN ejecta be the cause of X-ray knots observed in a radio galaxy?
We investigate the interaction between relativistic jets and supernova (SN) ejecta as a potential origin of X-ray knots in radio galaxies, employing knot A in M 87 as a test case. By modeling the d...
Jia-Chun He, Xiao-Na Sun, Hao-Qiang Zhang, Yun-Feng Liang, Hai-Ming Zhang, Da-Bin Lin, En-Wei Liang
MOOSEnger -- a Domain-Specific AI Agent for the MOOSE Ecosystem
MOOSEnger is a tool-enabled AI agent tailored to the Multiphysics Object-Oriented Simulation Environment (MOOSE). MOOSE cases are specified in HIT ".i" input files; the large object catalog and str...
Mengnan Li, Jason Miller, Zachary Prince, Alexander Lindsay, Cody Permann