Paper
Curiosity Over Hype: Modeling Motivation Language to Understand Early Outcomes in a Selective Quantum Track
Authors
Daniella Alexandra Crysti Vargas Saldana, Freddy Herrera Cueva
Abstract
We study whether latent motivation signals in short Spanish admission responses predict engagement and performance in an early quantum computing pathway run by QuantumHub Peru. We analyze N=241 applicants' open responses and link them to outcomes from two selective modules: Module 1 (secondary; mathematics and computing foundations; n=23) and Module 2 (secondary + early undergraduate; quantum fundamentals; n=36, including M1 continuers). To ensure baseline comparability, the M2 university entrance exam matched the difficulty of the M1 final. Final grades followed the program's official cohort-specific weightings (attendance/assignments/exam), which we retain to preserve ecological validity. Methodologically, we model text with Latent Dirichlet Allocation (LDA, k=8) and, for robustness, with sentence embeddings from a small multilingual language model, EmbeddingGemma-300M, projected via UMAP and clustered with HDBSCAN. This combination leverages the transparency of bag-of-words topics and the semantic richness of small language model embeddings. Descriptively, curiosity/learning topics show higher grades and attendance than technology/career-oriented topics; inferential tests are underpowered (e.g., linear R2 ~ 0.03; logistic pseudo-R2 ~ 0.04) so effect-size estimates should be viewed as preliminary rather than confirmatory. Embedding-based clustering yields seven clusters with 11.2% noise and modest agreement with LDA (ARI=0.068; NMI=0.163). Results suggest that brief motivation responses encode promising signals that could support early mentoring in rigorous STEM pipelines, while highlighting the need for larger, pre-registered studies.
Metadata
Related papers
Fractal universe and quantum gravity made simple
Fabio Briscese, Gianluca Calcagni • 2026-03-25
POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan
Marta Moscati, Muhammad Saad Saeed, Marina Zanoni, Mubashir Noman, Rohan Kuma... • 2026-03-25
LensWalk: Agentic Video Understanding by Planning How You See in Videos
Keliang Li, Yansong Li, Hongze Shen, Mengdi Liu, Hong Chang, Shiguang Shan • 2026-03-25
Orientation Reconstruction of Proteins using Coulomb Explosions
Tomas André, Alfredo Bellisario, Nicusor Timneanu, Carl Caleman • 2026-03-25
The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series
Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mire... • 2026-03-25
Raw Data (Debug)
{
"raw_xml": "<entry>\n <id>http://arxiv.org/abs/2602.19659v1</id>\n <title>Curiosity Over Hype: Modeling Motivation Language to Understand Early Outcomes in a Selective Quantum Track</title>\n <updated>2026-02-23T10:09:05Z</updated>\n <link href='https://arxiv.org/abs/2602.19659v1' rel='alternate' type='text/html'/>\n <link href='https://arxiv.org/pdf/2602.19659v1' rel='related' title='pdf' type='application/pdf'/>\n <summary>We study whether latent motivation signals in short Spanish admission responses predict engagement and performance in an early quantum computing pathway run by QuantumHub Peru. We analyze N=241 applicants' open responses and link them to outcomes from two selective modules: Module 1 (secondary; mathematics and computing foundations; n=23) and Module 2 (secondary + early undergraduate; quantum fundamentals; n=36, including M1 continuers). To ensure baseline comparability, the M2 university entrance exam matched the difficulty of the M1 final. Final grades followed the program's official cohort-specific weightings (attendance/assignments/exam), which we retain to preserve ecological validity. Methodologically, we model text with Latent Dirichlet Allocation (LDA, k=8) and, for robustness, with sentence embeddings from a small multilingual language model, EmbeddingGemma-300M, projected via UMAP and clustered with HDBSCAN. This combination leverages the transparency of bag-of-words topics and the semantic richness of small language model embeddings. Descriptively, curiosity/learning topics show higher grades and attendance than technology/career-oriented topics; inferential tests are underpowered (e.g., linear R2 ~ 0.03; logistic pseudo-R2 ~ 0.04) so effect-size estimates should be viewed as preliminary rather than confirmatory. Embedding-based clustering yields seven clusters with 11.2% noise and modest agreement with LDA (ARI=0.068; NMI=0.163). Results suggest that brief motivation responses encode promising signals that could support early mentoring in rigorous STEM pipelines, while highlighting the need for larger, pre-registered studies.</summary>\n <category scheme='http://arxiv.org/schemas/atom' term='physics.ed-ph'/>\n <category scheme='http://arxiv.org/schemas/atom' term='quant-ph'/>\n <published>2026-02-23T10:09:05Z</published>\n <arxiv:comment>Published in the Proceedings of IEEE ICALTER 2025. 5 pages, 7 figures</arxiv:comment>\n <arxiv:primary_category term='physics.ed-ph'/>\n <arxiv:journal_ref>Proceedings of the IEEE International Conference on Advanced Learning Technologies (ICALTER), 2025</arxiv:journal_ref>\n <author>\n <name>Daniella Alexandra Crysti Vargas Saldana</name>\n </author>\n <author>\n <name>Freddy Herrera Cueva</name>\n </author>\n <arxiv:doi>10.1109/ICALTER69698.2025.11355072</arxiv:doi>\n <link href='https://doi.org/10.1109/ICALTER69698.2025.11355072' rel='related' title='doi'/>\n </entry>"
}