Papers
Research papers from arXiv and related sources
PaReGTA: An LLM-based EHR Data Encoding Approach to Capture Temporal Information
Temporal information in structured electronic health records (EHRs) is often lost in sparse one-hot or count-based representations, while sequence models can be costly and data-hungry. We propose P...
Kihyuk Yoon, Lingchao Mao, Catherine Chong, Todd J. Schwedt, Chia-Chun Chiang, Jing Li
Curiosity Over Hype: Modeling Motivation Language to Understand Early Outcomes in a Selective Quantum Track
We study whether latent motivation signals in short Spanish admission responses predict engagement and performance in an early quantum computing pathway run by QuantumHub Peru. We analyze N=241 app...
Daniella Alexandra Crysti Vargas Saldana, Freddy Herrera Cueva
Hardware-Accelerated Geometrical Simulation of Biological and Engineered In-Air Ultrasonic Systems
The deployment of in-air acoustic sensors for industrial monitoring and autonomous robotics has grown significantly, often drawing inspiration from biological echolocation. However, developing and ...
Wouter Jansen, Jan Steckel
KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge
Large Language Models (LLMs) possess a remarkable capacity to generate persuasive and intelligible language. However, coherence does not equate to truthfulness, as the responses often contain subtl...
Alex Robertson, Huizhi Liang, Mahbub Gani, Rohit Kumar, Srijith Rajamohan
Evaluating the Impact of Data Anonymization on Image Retrieval
With the growing importance of privacy regulations such as the General Data Protection Regulation, anonymizing visual data is becoming increasingly relevant across institutions. However, anonymizat...
Marvin Chen, Manuel Eberhardinger, Johannes Maucher
Cooperation After the Algorithm: Designing Human-AI Coexistence Beyond the Illusion of Collaboration
Generative artificial intelligence systems increasingly participate in research, law, education, media, and governance. Their fluent and adaptive outputs create an experience of collaboration. Howe...
Tatia Codreanu
Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding
We present Nacrith, a lossless compression system that combines a 135M-parameter transformer language model (SmolLM2-135M) with an ensemble of lightweight online predictors and a 32-bit arithmetic ...
Roberto Tacconelli
PedaCo-Gen: Scaffolding Pedagogical Agency in Human-AI Collaborative Video Authoring
While advancements in Text-to-Video (T2V) generative AI offer a promising path toward democratizing content creation, current models are often optimized for visual fidelity rather than instructiona...
Injun Baek, Yearim Kim, Nojun Kwak
Rules or Weights? Comparing User Understanding of Explainable AI Techniques with the Cognitive XAI-Adaptive Model
Rules and Weights are popular XAI techniques for explaining AI decisions. Yet it remains unclear how to choose between them, because there is no cognitive framework for comparing their interpretability. In an ...
Louth Bin Rawshan, Zhuoyu Wang, Brian Y. Lim
Seeing Clearly, Reasoning Confidently: Plug-and-Play Remedies for Vision Language Model Blindness
Vision language models (VLMs) have achieved remarkable success in broad visual understanding, yet they remain challenged by object-centric reasoning on rare objects due to the scarcity of such inst...
Xin Hu, Haomiao Ni, Yunbei Zhang, Jihun Hamm, Zechen Li, Zhengming Ding
Workflow-Level Design Principles for Trustworthy GenAI in Automotive System Engineering
The adoption of large language models in safety-critical system engineering is constrained by trustworthiness, traceability, and alignment with established verification practices. We propose workfl...
Chih-Hong Cheng, Brian Hsuan-Cheng Liao, Adam Molin, Hasan Esen
Anatomy of Unlearning: The Dual Impact of Fact Salience and Model Fine-Tuning
Machine Unlearning (MU) enables Large Language Models (LLMs) to remove unsafe or outdated information. However, existing work assumes that all facts are equally forgettable and largely ignores whet...
Borisiuk Anna, Andrey Savchenko, Alexander Panchenko, Elena Tutubalina
RAID: Retrieval-Augmented Anomaly Detection
Unsupervised Anomaly Detection (UAD) aims to identify abnormal regions by establishing correspondences between test images and normal templates. Existing methods primarily rely on image reconstruct...
Mingxiu Cai, Zhe Zhang, Gaochang Wu, Tianyou Chai, Xiatian Zhu
Learning Mutual View Information Graph for Adaptive Adversarial Collaborative Perception
Collaborative perception (CP) enables data sharing among connected and autonomous vehicles (CAVs) to enhance driving safety. However, CP systems are vulnerable to adversarial attacks where maliciou...
Yihang Tao, Senkang Hu, Haonan An, Zhengru Fang, Hangcheng Cao, Yuguang Fang
ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?
We introduce ISO-Bench, a benchmark for coding agents to test their capabilities on real-world inference optimization tasks. These tasks were taken from vLLM and SGLang, two of the most popular LLM...
Ayush Nangia, Shikhar Mishra, Aman Gokrani, Paras Chopra
Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks
Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet systematically identifying high-potential SMEs remains an open challenge. We intro...
Yijiashun Qi, Hanzhe Guo, Yijiazhen Qi
Metaorder modelling and identification from public data
Market-order flow in financial markets exhibits long-range correlations, a widely known stylised fact. A popular hypothesis for this stylised fact comes from the Lillo-...
Ezra Goliath, Tim Gebbie
Co-Optimization of Network Topology and Variable Impedance Devices under Dynamic Line Ratings in Power Transmission Systems
Power system operators are increasingly deploying Grid Enhancing Technologies (GETs) to mitigate operational challenges such as line and transformer congestion, and voltage violations. These techno...
Junseon Park, Hyeongon Park, Rahul K. Gupta
Interpolation-Driven Machine Learning Approaches for Plume Shine Dose Estimation: A Comparison of XGBoost, Random Forest, and TabNet
Despite the success of machine learning (ML) in surrogate modeling, its use in radiation dose assessment is limited by safety-critical constraints, scarce training-ready data, and challenges in sel...
Biswajit Sadhu, Kalpak Gupte, Trijit Sadhu, S. Anand
Goal-Oriented Influence-Maximizing Data Acquisition for Learning and Optimization
Active data acquisition is central to many learning and optimization tasks in deep neural networks, yet remains challenging because most approaches rely on predictive uncertainty estimates that are...
Weichi Yao, Bianca Dumitrascu, Bryan R. Goldsmith, Yixin Wang