Papers
Research papers from arXiv and related sources
The Sound of Death: Deep Learning Reveals Vascular Damage from Carotid Ultrasound
Cardiovascular diseases (CVDs) remain the leading cause of mortality worldwide, yet early risk detection is often limited by available diagnostics. Carotid ultrasound, a non-invasive and widely acc...
Christoph Balada, Aida Romano-Martinez, Payal Varshney, Vincent ten Cate, Katharina Geschke, Jona...
Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation
The rapid advancement of Large Language Models (LLMs) has established standardized evaluation benchmarks as the primary instrument for model comparison. Yet, their reliability is increasingly quest...
Bogdan Kostić, Conor Fallon, Julian Risch, Alexander Löser
Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE
Open datasets play a crucial role in three research domains that intersect data science and education: learning analytics, educational data mining, and artificial intelligence in education. Researc...
Valdemar Švábenský, Brendan Flanagan, Erwin Daniel López Zapata, Atsushi Shimada
MedClarify: An information-seeking AI agent for medical diagnosis with case-specific follow-up questions
Large language models (LLMs) are increasingly used for diagnostic tasks in medicine. In clinical practice, the correct diagnosis can rarely be immediately inferred from the initial patient presenta...
Hui Min Wong, Philip Heesen, Pascal Janetzky, Martin Bendszus, Stefan Feuerriegel
Human attribution of empathic behaviour to AI systems
Artificial intelligence systems increasingly generate text intended to provide social and emotional support. Understanding how users perceive empathic qualities in such content is therefore critica...
Jonas Festor, Ivo Snels, Bennett Kleinberg
Non-Invasive Anemia Detection: A Multichannel PPG-Based Hemoglobin Estimation with Explainable Artificial Intelligence
Anemia is a prevalent hematological disorder that requires frequent hemoglobin monitoring for early diagnosis and effective management. Conventional hemoglobin assessment relies on invasive blood s...
Garima Sahu, Poorva Verma, Nachiket Tapas
Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective
While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the su...
Yukun Chen, Xinyu Zhang, Jialong Tang, Yu Wan, Baosong Yang, Yiming Li, Zhan Qin, Kui Ren
Federated Latent Space Alignment for Multi-user Semantic Communications
Semantic communication aims to convey meaning for effective task execution, but differing latent representations in AI-native devices can cause semantic mismatches that hinder mutual understanding....
Giuseppe Di Poce, Mario Edoardo Pandolfo, Emilio Calvanese Strinati, Paolo Di Lorenzo
On the Reliability of User-Centric Evaluation of Conversational Recommender Systems
User-centric evaluation has become a key paradigm for assessing Conversational Recommender Systems (CRS), aiming to capture subjective qualities such as satisfaction, trust, and rapport. To enable ...
Michael Müller, Amir Reza Mohammadi, Andreas Peintner, Beatriz Barroso Gstrein, Günther Specht, E...
Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study
Human self-report questionnaires are increasingly used in NLP to benchmark and audit large language models (LLMs), from persona consistency to safety and bias assessments. Yet these instruments pre...
Kensuke Okada, Yui Furukawa, Kyosuke Bunji
EA-Swin: An Embedding-Agnostic Swin Transformer for AI-Generated Video Detection
Recent advances in foundation video generators such as Sora2, Veo3, and other commercial systems have produced highly realistic synthetic videos, exposing the limitations of existing detection meth...
Hung Mai, Loi Dinh, Duc Hai Nguyen, Dat Do, Luong Doan, Khanh Nguyen Quoc, Huan Vu, Phong Ho, Nae...
On the Concept of Violence: A Comparative Study of Human and AI Judgments
Background: What counts as violence is neither self-evident nor universally agreed upon. While physical aggression is prototypical, contemporary societies increasingly debate whether exclusion, hum...
Mariachiara Stellato, Francesco Lancia, Chiara Galeazzi, Nico Curti
Inferring Height from Earth Embeddings: First insights using Google AlphaEarth
This study investigates whether the geospatial and multimodal features encoded in Earth Embeddings can effectively guide deep learning (DL) regression models for regional surface height ma...
Alireza Hamoudzadeh, Valeria Belloni, Roberta Ravanelli
Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic Web
The Web is evolving from a medium that humans browse to an environment where software agents act on behalf of users. Advances in large language models (LLMs) make natural language a practical inter...
Linxi Jiang, Rui Xi, Zhijie Liu, Shuo Chen, Zhiqiang Lin, Suman Nath
Tri-Resonant Leptogenesis in a Non-Holomorphic Modular A$_4$ Scotogenic Model
We investigate low-scale baryogenesis via tri-resonant leptogenesis within the scotogenic model with a scalar dark matter embedded in non-holomorphic modular $A_4$ symmetry framework. The ...
Tapender, Surender Verma
Disjunction Composition of BDD Transition Systems for Model-Based Testing
We introduce a compositional approach to model-based test generation in Behavior-Driven Development (BDD). BDD is an agile methodology in which system behavior is specified through textual scenario...
Tannaz Zameni, Petra van den Bos, Arend Rensink
All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting
To evaluate whether LLMs can accurately predict future events, we need the ability to backtest them on events that have already resolved. This requires models to reason only with informati...
Zeyu Zhang, Ryan Chen, Bradly C. Stadie
Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy
The black-box nature of Large Language Models necessitates novel evaluation frameworks that transcend surface-level performance metrics. This study investigates the internal neural representations ...
Bianca Raimondi, Maurizio Gabbrielli
Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs
As large language models (LLMs) continue to grow in size, fewer users are able to host and run models locally. This has led to increased use of third-party hosting services. However, in this settin...
Arka Pal, Louai Zahran, William Gvozdjak, Akilesh Potti, Micah Goldblum
Decoding the Human Factor: High Fidelity Behavioral Prediction for Strategic Foresight
Predicting human decision-making in high-stakes environments remains a central challenge for artificial intelligence. While large language models (LLMs) demonstrate strong general reasoning, they o...
Ben Yellin, Ehud Ezra, Mark Foreman, Shula Grinapol