Research

Paper

AI LLM March 12, 2026

To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times

Authors

Thomas Hikaru Clark, Carlos Arriaga, Javier Conde, Gonzalo Martínez, Pedro Reviriego

Abstract

Large Language Models (LLMs) have recently been shown to produce estimates of psycholinguistic norms, such as valence, arousal, or concreteness, for words and multiword expressions, that correlate with human judgments. These estimates are obtained by prompting an LLM, in zero-shot fashion, with a question similar to those used in human studies. Meanwhile, for other norms such as lexical decision time or age of acquisition, LLMs require supervised fine-tuning to obtain results that align with ground-truth values. In this paper, we extend this approach to the previously unstudied features of sentence memorability and reading times, which involve the relationship between multiple words in a sentence-level context. Our results show that via fine-tuning, models can provide estimates that correlate with human-derived norms and exceed the predictive power of interpretable baseline predictors, demonstrating that LLMs contain useful information about sentence-level features. At the same time, our results show very mixed zero-shot and few-shot performance, providing further evidence that care is needed when using LLM-prompting as a proxy for human cognitive measures.

Metadata

arXiv ID: 2603.12105

Provider: ARXIV

Primary Category: cs.CL

Published: 2026-03-12

Fetched: 2026-03-14 05:03

Related papers

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jian... • 2026-03-30

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or • 2026-03-30

Graphilosophy: Graph-Based Digital Humanities Computing with The Four Books

Minh-Thu Do, Quynh-Chau Le-Tran, Duc-Duy Nguyen-Mai, Thien-Trang Nguyen, Khan... • 2026-03-30

ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining

Anuj Diwan, Eunsol Choi, David Harwath • 2026-03-30

RAD-AI: Rethinking Architecture Documentation for AI-Augmented Ecosystems

Oliver Aleksander Larsen, Mahyar T. Moghaddam • 2026-03-30

Raw Data (Debug)

{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.12105v1</id>\n    <title>To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times</title>\n    <updated>2026-03-12T16:10:27Z</updated>\n    <link href='https://arxiv.org/abs/2603.12105v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.12105v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Large Language Models (LLMs) have recently been shown to produce estimates of psycholinguistic norms, such as valence, arousal, or concreteness, for words and multiword expressions, that correlate with human judgments. These estimates are obtained by prompting an LLM, in zero-shot fashion, with a question similar to those used in human studies. Meanwhile, for other norms such as lexical decision time or age of acquisition, LLMs require supervised fine-tuning to obtain results that align with ground-truth values. In this paper, we extend this approach to the previously unstudied features of sentence memorability and reading times, which involve the relationship between multiple words in a sentence-level context. Our results show that via fine-tuning, models can provide estimates that correlate with human-derived norms and exceed the predictive power of interpretable baseline predictors, demonstrating that LLMs contain useful information about sentence-level features. At the same time, our results show very mixed zero-shot and few-shot performance, providing further evidence that care is needed when using LLM-prompting as a proxy for human cognitive measures.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.CL'/>\n    <published>2026-03-12T16:10:27Z</published>\n    <arxiv:primary_category term='cs.CL'/>\n    <author>\n      <name>Thomas Hikaru Clark</name>\n    </author>\n    <author>\n      <name>Carlos Arriaga</name>\n    </author>\n    <author>\n      <name>Javier Conde</name>\n    </author>\n    <author>\n      <name>Gonzalo Martínez</name>\n    </author>\n    <author>\n      <name>Pedro Reviriego</name>\n    </author>\n  </entry>"
}