Papers
Research papers from arXiv and related sources
Current LLMs still cannot 'talk much' about grammar modules: Evidence from syntax
We aim to examine the extent to which Large Language Models (LLMs) can 'talk much' about grammar modules, providing evidence from syntax core properties translated by ChatGPT into Arabic. We collec...
Mohammed Q. Shormani
Cislunar State and Uncertainty Propagation via the Modified Generalized Equinoctial Orbital Elements
The complex cislunar dynamical environment poses challenges for spacecraft navigation and Space Domain Awareness (SDA) operations, where the knowledge of current and future spacecraft states is ess...
Maaninee Gupta, Kyle J. DeMars
GO-GenZip: Goal-Oriented Generative Sampling and Hybrid Compression
Current network data telemetry pipelines consist of massive streams of fine-grained Key Performance Indicators (KPIs) from multiple distributed sources towards central aggregators, making data stor...
Pietro Talli, Qi Liao, Alessandro Lieto, Parijat Bhattacharjee, Federico Chiariotti, Andrea Zanella
Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competition
Forecasting plays a crucial role in modern safety-critical applications, such as space operations. However, the increasing use of deep forecasting models introduces a new security risk of trojan ho...
Krzysztof Kotowski, Ramez Shendy, Jakub Nalepa, Agata Kaczmarek, Dawid Płudowski, Piotr Wilczyńsk...
Sharing The Secret: Distributed Privacy-Preserving Monitoring
In traditional runtime verification, a system is typically observed by a monolithic monitor. Enforcing privacy in such settings is computationally expensive, as it necessitates heavy cryptographic ...
Mahyar Karimi, K. S. Thejaswini, Roderick Bloem, Thomas A. Henzinger
The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $\lambda$-Calculus
LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs) address this by externalising the prompt and...
Amartya Roy, Rasul Tutunov, Xiaotong Ji, Matthieu Zimmer, Haitham Bou-Ammar
Pitfalls in Evaluating Interpretability Agents
Automated interpretability systems aim to reduce the need for human labor and scale analysis to increasingly large models and diverse tasks. Recent efforts toward this goal leverage large language ...
Tal Haklay, Nikhil Prakash, Sana Pandey, Antonio Torralba, Aaron Mueller, Jacob Andreas, Tamar Ro...
LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain
Large manufacturing companies face challenges in information retrieval due to data silos maintained by different departments, leading to inconsistencies and misalignment across databases. This pape...
Antonio De Santis, Marco Balduini, Matteo Belcao, Andrea Proia, Marco Brambilla, Emanuele Della V...
Beyond Accuracy: Towards a Robust Evaluation Methodology for AI Systems for Language Education
The rapid adoption of large language models in AI-powered language education has created an urgent need for evaluations that assess pedagogical effectiveness, particularly in language learning--one...
James Edgell, Wm. Matthew Kennedy, Isaac Pattis, Ben Knight, Danielle Carvalho, Elizabeth Wonnacott
Inference in high-dimensional logistic regression under tensor network dependence
We investigate the problem of statistical inference for logistic regression with high-dimensional covariates in settings where dependence among individuals is induced by an underlying Markov random...
Josh Miles, Sohom Bhattacharya
Agentic Harness for Real-World Compilers
Compilers are critical to modern computing, yet fixing compiler bugs is difficult. While recent large language model (LLM) advancements enable automated bug repair, compiler bugs pose unique challe...
Yingwei Zheng, Cong Li, Shaohua Li, Yuqun Zhang, Zhendong Su
The End of Rented Discovery: How AI Search Redistributes Power Between Hotels and Intermediaries
When a traveler asks an AI search engine to recommend a hotel, which sources get cited -- and does query framing matter? We audit 1,357 grounding citations from Google Gemini across 156 hotel queri...
Peiying Zhu, Sidi Chang
From School AI Readiness to Student AI Literacy: A National Multilevel Mediation Analysis of Institutional Capacity and Teacher Capability
Artificial intelligence (AI) is increasingly embedded in vocational education systems, yet empirical evidence linking institutional AI readiness to student learning outcomes remains limited. This s...
Xiu Guan, Mingmin Zheng, Dragan Gašević, Wenxin Guo, Yingqun Liu, Xibin Han, Danijela Gasevic, Ru...
Feasible Deviations from Unitarity with Vector-Like Quark Singlets
We deduce pertinent relations between the elements of the CKM matrix, and find that not all of these are totally compatible with experiment and/or the assumption of the $3 \times 3$ unitarity. We i...
Francisco Albergaria, Francisco J. Botella, G. C. Branco, José Filipe Bastos, J. I. Silva-Marcos
Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs
Reinforcement Learning (RL) with rubric-based rewards has recently shown remarkable progress in enhancing general reasoning capabilities of Large Language Models (LLMs), yet still suffers from inef...
Wenjian Zhang, Kongcheng Zhang, Jiaxin Qi, Baisheng Lai, Jianqiang Huang
LoASR-Bench: Evaluating Large Speech Language Models on Low-Resource Automatic Speech Recognition Across Language Families
Large language models (LLMs) have driven substantial advances in speech language models (SpeechLMs), yielding strong performance in automatic speech recognition (ASR) under high-resource conditions...
Jianan Chen, Xiaoxue Gao, Tatsuya Kawahara, Nancy F. Chen
CoverageBench: Evaluating Information Coverage across Tasks and Domains
We wish to measure the information coverage of an ad hoc retrieval algorithm, that is, how much of the range of available relevant information is covered by the search results. Information coverage...
Saron Samuel, Andrew Yates, Dawn Lawrie, Ian Soboroff, Trevor Adriaanse, Benjamin Van Durme, Euge...
Orchestrating Human-AI Software Delivery: A Retrospective Longitudinal Field Study of Three Software Modernization Programs
Evidence on AI in software engineering still leans heavily toward individual task completion, while evidence on team-level delivery remains scarce. We report a retrospective longitudinal field stud...
Maximiliano Armesto, Christophe Kolb
Detached Skip-Links and $R$-Probe: Decoupling Feature Aggregation from Gradient Propagation for MLLM OCR
Multimodal large language models (MLLMs) excel at high-level reasoning yet fail on OCR tasks where fine-grained visual details are compromised or misaligned. We identify an overlooked optimization ...
Ziye Yuan, Ruchang Yao, Chengxin Zheng, Yusheng Zhao, Daxiang Dong, Ming Zhang
RouterKGQA: Specialized--General Model Routing for Constraint-Aware Knowledge Graph Question Answering
Knowledge graph question answering (KGQA) is a promising approach for mitigating LLM hallucination by grounding reasoning in structured and verifiable knowledge graphs. Existing approaches fall int...
Bo Yuan, Hexuan Deng, Xuebo Liu, Min Zhang