February 23, 2026

Curiosity Over Hype: Modeling Motivation Language to Understand Early Outcomes in a Selective Quantum Track

Authors

Daniella Alexandra Crysti Vargas Saldana, Freddy Herrera Cueva

Abstract

We study whether latent motivation signals in short Spanish-language admission responses predict engagement and performance in an early quantum computing pathway run by QuantumHub Peru. We analyze the open responses of N=241 applicants and link them to outcomes from two selective modules: Module 1 (M1; secondary students; mathematics and computing foundations; n=23) and Module 2 (M2; secondary and early undergraduate students; quantum fundamentals; n=36, including M1 continuers). To ensure baseline comparability, the M2 university entrance exam was matched in difficulty to the M1 final exam. Final grades followed the program's official cohort-specific weightings of attendance, assignments, and exam scores, which we retain to preserve ecological validity. Methodologically, we model the text with Latent Dirichlet Allocation (LDA, k=8) and, as a robustness check, with sentence embeddings from a small multilingual language model, EmbeddingGemma-300M, which we project with UMAP and cluster with HDBSCAN. This combination pairs the transparency of bag-of-words topics with the semantic richness of small language model embeddings. Descriptively, applicants whose responses fall under curiosity/learning topics show higher grades and attendance than those under technology/career-oriented topics; however, inferential tests are underpowered (e.g., linear R² ≈ 0.03; logistic pseudo-R² ≈ 0.04), so effect-size estimates should be viewed as preliminary rather than confirmatory. Embedding-based clustering yields seven clusters, with 11.2% of responses labeled as noise, and modest agreement with LDA (ARI=0.068; NMI=0.163). The results suggest that brief motivation responses encode promising signals that could support early mentoring in rigorous STEM pipelines, while highlighting the need for larger, pre-registered studies.
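The abstract names LDA with k=8 but does not state the inference method, library, priors, or text preprocessing. The sketch below fits LDA by collapsed Gibbs sampling, one standard inference scheme, using only the Python standard library. The function name `lda_gibbs`, the priors `alpha=0.1` and `beta=0.01`, the iteration count, and the toy tokens are illustrative assumptions, not the authors' settings. Each response is assumed to be already tokenized, lowercased, and stripped of stopwords, and its dominant topic is taken as the argmax of its topic shares.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, vocab, phi): theta[d][j] is document d's share of topic j,
    and phi[j][v] is the probability of word vocab[v] under topic j.
    Hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: v for v, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * k for _ in docs]            # tokens in document d assigned to topic j
    nkw = [[0] * n_vocab for _ in range(k)]  # times word v is assigned to topic j
    nk = [0] * k                             # total tokens assigned to topic j
    z = []                                   # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            j = rng.randrange(k)
            z[d].append(j)
            ndk[d][j] += 1
            nkw[j][index[w]] += 1
            nk[j] += 1

    topics = range(k)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, j = index[w], z[d][i]
                # Remove this token from the counts before resampling its topic.
                ndk[d][j] -= 1
                nkw[j][v] -= 1
                nk[j] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                           for t in topics]
                j = rng.choices(topics, weights=weights)[0]
                z[d][i] = j
                ndk[d][j] += 1
                nkw[j][v] += 1
                nk[j] += 1

    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[j][v] + beta) / (nk[j] + n_vocab * beta) for v in range(n_vocab)]
           for j in topics]
    return theta, vocab, phi


# Hypothetical pre-tokenized Spanish responses (stopwords already removed).
docs = [
    ["curiosidad", "aprender", "universo", "aprender"],
    ["curiosidad", "entender", "universo"],
    ["carrera", "tecnologia", "empleo", "futuro"],
    ["tecnologia", "empleo", "carrera"],
]
theta, vocab, phi = lda_gibbs(docs, k=2, iters=100, seed=42)
dominant = [max(range(2), key=lambda j: row[j]) for row in theta]
```

On the real corpus the call would use `k=8`. Library implementations such as scikit-learn's `LatentDirichletAllocation` use variational inference instead, so they will not reproduce these exact assignments.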
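The abstract reports a logistic pseudo-R² of about 0.04 but does not name the variant. Assuming McFadden's definition, pseudo-R² = 1 − LL_model / LL_null, the sketch below computes it for the simplest case: an intercept plus one binary predictor (a hypothetical "curiosity-dominant response" flag) and a binary outcome (a hypothetical "completed the module" flag). In that case the maximum-likelihood fitted probability in each group equals the group's observed outcome rate, so no iterative optimizer is needed. The function name and the toy data are illustrative.

```python
import math


def mcfadden_pseudo_r2(y, x):
    """McFadden pseudo-R^2 for a logistic model: intercept + one binary predictor x.

    With a single binary predictor, the maximum-likelihood fitted probability in
    each group (x == 0, x == 1) equals that group's observed outcome rate.
    """
    if set(x) - {0, 1}:
        raise ValueError("x must be coded 0/1")

    def log_lik(outcomes, p):
        return sum(math.log(p) if yi else math.log(1 - p) for yi in outcomes)

    ll_model = 0.0
    for level in (0, 1):
        outcomes = [yi for yi, xi in zip(y, x) if xi == level]
        if not outcomes:
            raise ValueError(f"no observations with x == {level}")
        rate = sum(outcomes) / len(outcomes)
        if not 0 < rate < 1:
            raise ValueError("MLE does not exist: a group has all-0 or all-1 outcomes")
        ll_model += log_lik(outcomes, rate)

    ll_null = log_lik(y, sum(y) / len(y))  # intercept-only model: one overall rate
    return 1 - ll_model / ll_null


y = [1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical outcome: 1 = completed the module
x = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical predictor: 1 = curiosity-dominant response
r2 = mcfadden_pseudo_r2(y, x)  # ≈ 0.189
```

With several predictors an iterative fit is required; for example, the results object from statsmodels' `Logit(...).fit()` exposes McFadden's value as `prsquared`.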
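The agreement figures (ARI=0.068; NMI=0.163) compare two labelings of the same responses. The sketch below computes both scores from contingency counts, with NMI normalized by the arithmetic mean of the two label entropies (scikit-learn's default for `normalized_mutual_info_score`). The abstract does not say how labels were paired. Here each response's LDA label is assumed to be its dominant topic, and HDBSCAN noise points (label -1) are assumed to be dropped before scoring; keeping noise as its own cluster is an equally plausible choice that changes the numbers. Function names and labels are hypothetical.

```python
import math
from collections import Counter


def _pairs(count):
    """Number of unordered pairs among `count` items, i.e. C(count, 2)."""
    return count * (count - 1) / 2


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two labelings of equal length >= 2")
    n = len(a)
    sum_ij = sum(_pairs(c) for c in Counter(zip(a, b)).values())  # pairs grouped together in both
    sum_a = sum(_pairs(c) for c in Counter(a).values())
    sum_b = sum(_pairs(c) for c in Counter(b).values())
    expected = sum_a * sum_b / _pairs(n)  # chance-expected value of sum_ij
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both all-one-cluster or both all-singletons: identical
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def normalized_mutual_info(a, b):
    """NMI with arithmetic-mean normalization of the two label entropies."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty labelings of equal length")
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in Counter(zip(a, b)).items())
    h_a = -sum(c / n * math.log(c / n) for c in ca.values())
    h_b = -sum(c / n * math.log(c / n) for c in cb.values())
    denom = (h_a + h_b) / 2
    return 1.0 if denom == 0 else mi / denom


lda_topics = [0, 0, 1, 1, 2, 2, 2, 1]   # hypothetical dominant LDA topic per response
hdb_labels = [0, 0, 1, -1, 1, 1, 1, 0]  # hypothetical HDBSCAN labels; -1 marks noise
keep = [i for i, c in enumerate(hdb_labels) if c != -1]
lda_kept = [lda_topics[i] for i in keep]
hdb_kept = [hdb_labels[i] for i in keep]
ari = adjusted_rand_index(lda_kept, hdb_kept)
nmi = normalized_mutual_info(lda_kept, hdb_kept)
```

Both scores are invariant to how cluster IDs are numbered, which matters here because LDA topic indices and HDBSCAN cluster IDs are arbitrary.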

Metadata

arXiv ID: 2602.19659
Provider: ARXIV
Primary Category: physics.ed-ph
Published: 2026-02-23
Fetched: 2026-02-24 04:38
Secondary Category: quant-ph
Journal Reference: Proceedings of the IEEE International Conference on Advanced Learning Technologies (ICALTER), 2025
DOI: 10.1109/ICALTER69698.2025.11355072
Comments: Published in the Proceedings of IEEE ICALTER 2025. 5 pages, 7 figures