Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

The Sound of Death: Deep Learning Reveals Vascular Damage from Carotid Ultrasound

Cardiovascular diseases (CVDs) remain the leading cause of mortality worldwide, yet early risk detection is often limited by available diagnostics. Carotid ultrasound, a non-invasive and widely acc...

Christoph Balada, Aida Romano-Martinez, Payal Varshney, Vincent ten Cate, Katharina Geschke, Jona...

2602.17321 2026-02-19
AI LLM

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

The rapid advancement of Large Language Models (LLMs) has established standardized evaluation benchmarks as the primary instrument for model comparison. Yet, their reliability is increasingly quest...

Bogdan Kostić, Conor Fallon, Julian Risch, Alexander Löser

2602.17316 2026-02-19
AI LLM

Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE

Open datasets play a crucial role in three research domains that intersect data science and education: learning analytics, educational data mining, and artificial intelligence in education. Researc...

Valdemar Švábenský, Brendan Flanagan, Erwin Daniel López Zapata, Atsushi Shimada

2602.17314 2026-02-19
AI LLM

MedClarify: An information-seeking AI agent for medical diagnosis with case-specific follow-up questions

Large language models (LLMs) are increasingly used for diagnostic tasks in medicine. In clinical practice, the correct diagnosis can rarely be immediately inferred from the initial patient presenta...

Hui Min Wong, Philip Heesen, Pascal Janetzky, Martin Bendszus, Stefan Feuerriegel

2602.17308 2026-02-19
AI LLM

Human attribution of empathic behaviour to AI systems

Artificial intelligence systems increasingly generate text intended to provide social and emotional support. Understanding how users perceive empathic qualities in such content is therefore critica...

Jonas Festor, Ivo Snels, Bennett Kleinberg

2602.17293 2026-02-19
TESTING

Non-Invasive Anemia Detection: A Multichannel PPG-Based Hemoglobin Estimation with Explainable Artificial Intelligence

Anemia is a prevalent hematological disorder that requires frequent hemoglobin monitoring for early diagnosis and effective management. Conventional hemoglobin assessment relies on invasive blood s...

Garima Sahu, Poorva Verma, Nachiket Tapas

2602.17290 2026-02-19
AI LLM

Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the su...

Yukun Chen, Xinyu Zhang, Jialong Tang, Yu Wan, Baosong Yang, Yiming Li, Zhan Qin, Kui Ren

2602.17283 2026-02-19
AI LLM

Federated Latent Space Alignment for Multi-user Semantic Communications

Semantic communication aims to convey meaning for effective task execution, but differing latent representations in AI-native devices can cause semantic mismatches that hinder mutual understanding....

Giuseppe Di Poce, Mario Edoardo Pandolfo, Emilio Calvanese Strinati, Paolo Di Lorenzo

2602.17271 2026-02-19
AI LLM

On the Reliability of User-Centric Evaluation of Conversational Recommender Systems

User-centric evaluation has become a key paradigm for assessing Conversational Recommender Systems (CRS), aiming to capture subjective qualities such as satisfaction, trust, and rapport. To enable ...

Michael Müller, Amir Reza Mohammadi, Andreas Peintner, Beatriz Barroso Gstrein, Günther Specht, E...

2602.17264 2026-02-19
AI LLM

Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

Human self-report questionnaires are increasingly used in NLP to benchmark and audit large language models (LLMs), from persona consistency to safety and bias assessments. Yet these instruments pre...

Kensuke Okada, Yui Furukawa, Kyosuke Bunji

2602.17262 2026-02-19
AI LLM

EA-Swin: An Embedding-Agnostic Swin Transformer for AI-Generated Video Detection

Recent advances in foundation video generators such as Sora2, Veo3, and other commercial systems have produced highly realistic synthetic videos, exposing the limitations of existing detection meth...

Hung Mai, Loi Dinh, Duc Hai Nguyen, Dat Do, Luong Doan, Khanh Nguyen Quoc, Huan Vu, Phong Ho, Nae...

2602.17260 2026-02-19
AI LLM

On the Concept of Violence: A Comparative Study of Human and AI Judgments

Background: What counts as violence is neither self-evident nor universally agreed upon. While physical aggression is prototypical, contemporary societies increasingly debate whether exclusion, hum...

Mariachiara Stellato, Francesco Lancia, Chiara Galeazzi, Nico Curti

2602.17256 2026-02-19
TESTING

Inferring Height from Earth Embeddings: First insights using Google AlphaEarth

This study investigates whether the geospatial and multimodal features encoded in Earth Embeddings can effectively guide deep learning (DL) regression models for regional surface height ma...

Alireza Hamoudzadeh, Valeria Belloni, Roberta Ravanelli

2602.17250 2026-02-19
AI LLM

Web Verbs: Typed Abstractions for Reliable Task Composition on the Agentic Web

The Web is evolving from a medium that humans browse to an environment where software agents act on behalf of users. Advances in large language models (LLMs) make natural language a practical inter...

Linxi Jiang, Rui Xi, Zhijie Liu, Shuo Chen, Zhiqiang Lin, Suman Nath

2602.17245 2026-02-19
TESTING

Tri-Resonant Leptogenesis in a Non-Holomorphic Modular A$_4$ Scotogenic Model

We investigate low-scale baryogenesis via tri-resonant leptogenesis within the scotogenic model with a scalar dark matter embedded in non-holomorphic modular $A_4$ symmetry framework. The ...

Tapender, Surender Verma

2602.17243 2026-02-19
TESTING

Disjunction Composition of BDD Transition Systems for Model-Based Testing

We introduce a compositional approach to model-based test generation in Behavior-Driven Development (BDD). BDD is an agile methodology in which system behavior is specified through textual scenario...

Tannaz Zameni, Petra van den Bos, Arend Rensink

2602.17237 2026-02-19
AI LLM

All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting

To evaluate whether LLMs can accurately predict future events, we need the ability to backtest them on events that have already resolved. This requires models to reason only with informati...

Zeyu Zhang, Ryan Chen, Bradly C. Stadie

2602.17234 2026-02-19
AI LLM

Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy

The black-box nature of Large Language Models necessitates novel evaluation frameworks that transcend surface-level performance metrics. This study investigates the internal neural representations ...

Bianca Raimondi, Maurizio Gabbrielli

2602.17229 2026-02-19
AI LLM

Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs

As large language models (LLMs) continue to grow in size, fewer users are able to host and run models locally. This has led to increased use of third-party hosting services. However, in this settin...

Arka Pal, Louai Zahran, William Gvozdjak, Akilesh Potti, Micah Goldblum

2602.17223 2026-02-19
AI LLM

Decoding the Human Factor: High Fidelity Behavioral Prediction for Strategic Foresight

Predicting human decision-making in high-stakes environments remains a central challenge for artificial intelligence. While large language models (LLMs) demonstrate strong general reasoning, they o...

Ben Yellin, Ehud Ezra, Mark Foreman, Shula Grinapol

2602.17222 2026-02-19