Papers
Research papers from arXiv and related sources
DATASHI: A Parallel English-Tashlhiyt Corpus for Orthography Normalization and Low-Resource Language Processing
DATASHI is a new parallel English-Tashlhiyt corpus that fills a critical gap in computational resources for Amazigh languages. It contains 5,000 sentence pairs, including a 1,500-sentence subset wi...
Nasser-Eddine Monir, Zakaria Baou
Kolmogorov Complexity Bounds for LLM Steganography and a Perplexity-Based Detection Proxy
Large language models can rewrite text to embed hidden payloads while preserving surface-level meaning, a capability that opens covert channels between cooperating AI systems and poses challenges f...
Andrii Shportko
CataractSAM-2: A Domain-Adapted Model for Anterior Segment Surgery Segmentation and Scalable Ground-Truth Annotation
We present CataractSAM-2, a domain-adapted extension of Meta's Segment Anything Model 2, designed for real-time semantic segmentation of cataract ophthalmic surgery videos with high accuracy. Posit...
Mohammad Eslami, Dhanvinkumar Ganeshkumar, Saber Kazeminasab, Michael G. Morley, Michael V. Bolan...
Counterfactual Credit Policy Optimization for Multi-Agent Collaboration
Collaborative multi-agent large language models (LLMs) can solve complex reasoning tasks by decomposing roles and aggregating diverse hypotheses. Yet, reinforcement learning (RL) for such systems i...
Zhongyi Li, Wan Tian, Yikun Ban, Jinju Chen, Huiming Zhang, Yang Liu, Fuzhen Zhuang
Exploring Multimodal Prompts For Unsupervised Continuous Anomaly Detection
Unsupervised Continuous Anomaly Detection (UCAD) is gaining attention for effectively addressing the catastrophic forgetting and heavy computational burden issues in traditional Unsupervised Anomal...
Mingle Zhou, Jiahui Liu, Jin Wan, Gang Li, Min Li
Stabilizing Iterative Self-Training with Verified Reasoning via Symbolic Recursive Self-Alignment
Recursive self-improvement--where a model iteratively trains on its own outputs--promises sustained capability growth but faces a fundamental obstacle: recursive drift. As models train on self-gene...
Xinyu Zhang
On the series expansion of the secondary zeta function about $s=1$ and its coefficients
The secondary zeta function is defined as a generalized zeta series over the imaginary parts of the non-trivial zeros, assuming the Riemann Hypothesis (RH). This function admits Laurent series expansion at the double pole at ...
Artur Kawalec
AI In Cybersecurity Education -- Scalable Agentic CTF Design Principles and Educational Outcomes
Large language models are rapidly changing how learners acquire and demonstrate cybersecurity skills. However, when human--AI collaboration is allowed, educators still lack validated competition de...
Haoran Xi, Minghao Shao, Kimberly Milner, Venkata Sai Charan Putrevu, Nanda Rani, Meet Udeshi, Pr...
PROBE: Diagnosing Residual Concept Capacity in Erased Text-to-Video Diffusion Models
Concept erasure techniques for text-to-video (T2V) diffusion models report substantial suppression of sensitive content, yet current evaluation is limited to checking whether the target concept is ...
Yiwei Xie, Zheng Zhang, Ping Liu
On the series expansion of the prime zeta function about $s=1$ and its coefficients
In this article, we derive a series expansion of the prime zeta function about the $s=1$ logarithmic singularity and prove a general formula for its expansion coefficients, which is similar to the St...
Artur Kawalec
LLM-Based Test Case Generation in DBMS through Monte Carlo Tree Search
Database Management Systems (DBMSs) are fundamental infrastructure for modern data-driven applications, where thorough testing with high-quality SQL test cases is essential for ensuring system reli...
Yujia Chen, Yingli Zhou, Fangyuan Zhang, Cuiyun Gao
VIGIL: Part-Grounded Structured Reasoning for Generalizable Deepfake Detection
Multimodal large language models (MLLMs) offer a promising path toward interpretable deepfake detection by generating textual explanations. However, the reasoning process of current MLLM-based meth...
Xinghan Li, Junhao Xu, Jingjing Chen
BOxCrete: A Bayesian Optimization Open-Source AI Model for Concrete Strength Forecasting and Mix Optimization
Modern concrete must simultaneously satisfy evolving demands for mechanical performance, workability, durability, and sustainability, making mix designs increasingly complex. Recent studies leverag...
Bayezid Baten, M. Ayyan Iqbal, Sebastian Ament, Julius Kusuma, Nishant Garg
SafePilot: A Framework for Assuring LLM-enabled Cyber-Physical Systems
Large Language Models (LLMs), deep learning architectures with typically over 10 billion parameters, have recently begun to be integrated into various cyber-physical systems (CPS) such as robotics,...
Weizhe Xu, Mengyu Liu, Fanxin Kong
Evaluating Power Flow Manifold from Local Data around a Single Operating Point via Geodesics
The widespread adoption of renewable energy poses a challenge in maintaining a feasible operating point in highly variable scenarios. This paper demonstrates that, within a feasible region of a pow...
Qirui Zheng, Dan Wu, Franz-Erich Wolter, Sijia Geng
Learning Inflation Narratives from Reddit: How Lightweight LLMs Reveal Forward-Looking Economic Signals
Public perceptions and expectations of inflation shape household spending, wage bargaining, and policy support, making them key determinants of macroeconomic outcomes. However, current measures rel...
Ryuichi Saito, Sho Tsugawa
Rydberg Atomic Receivers for Net-Zero 6G Wireless Communication and Sensing: Progress, Experiments, and Sustainable Prospects
Against the backdrop of the global drive to advance the green transformation of the information and communications technology (ICT) industry and leverage technological innovation to facilitate the ...
Yi Tao, Zhen Gao, Zhiao Zhu, De Mi, Zhonghuai Wu, Zijian Zhang, Fusang Zhang, Dezhi Zheng, Sheng ...
Effective Strategies for Asynchronous Software Engineering Agents
AI agents have become increasingly capable at isolated software engineering (SWE) tasks such as resolving issues on GitHub. Yet long-horizon tasks involving multiple interdependent subtasks still p...
Jiayi Geng, Graham Neubig
TagLLM: A Fine-Grained Tag Generation Approach for Note Recommendation
Large Language Models (LLMs) have shown promising potential in E-commerce community recommendation. While LLMs and Multimodal LLMs (MLLMs) are widely used to encode notes into implicit embeddings, ...
Zhijian Chen, Likai Wang, Lei Chen, Yaguang Dou, Jialiang Shi, Tian Qi, Dongdong Hao, Mengying Lu...
Beyond Correlation: Refutation-Validated Aspect-Based Sentiment Analysis for Explainable Energy Market Returns
This paper proposes a refutation-validated framework for aspect-based sentiment analysis in financial markets, addressing the limitations of correlational studies that cannot distinguish genuine as...
Wihan van der Heever, Keane Ong, Ranjan Satapathy, Erik Cambria