March 16, 2026

POLAR: A Per-User Association Test in Embedding Space

Authors

Pedro Bento, Arthur Buzelin, Arthur Chagas, Yan Aquino, Victoria Estanislau, Samira Malaquias, Pedro Robles Dutenhefner, Gisele L. Pappa, Virgilio Almeida, Wagner Meira Jr.

Abstract

Most intrinsic association probes operate at the word, sentence, or corpus level, obscuring author-level variation. We present POLAR (Per-user On-axis Lexical Association Report), a per-user lexical association test that runs in the embedding space of a lightly adapted masked language model. Each author is represented by a private, deterministic token; POLAR projects the embeddings of these tokens onto curated lexical axes and reports standardized effects with permutation p-values under Benjamini–Hochberg false-discovery-rate control. On a balanced bot–human Twitter benchmark, POLAR cleanly separates LLM-driven bots from organic accounts; on an extremist forum, it quantifies strong alignment with slur lexicons and reveals a rightward drift over time. The method extends modularly to new attribute sets and provides concise, per-author diagnostics for computational social science. All code is publicly available at https://github.com/pedroaugtb/POLAR-A-Per-User-Association-Test-in-Embedding-Space.
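The abstract names the statistical ingredients (projection onto a lexical axis, a standardized effect, a permutation p-value, Benjamini–Hochberg control) without giving their exact form. The following is a minimal sketch of one plausible per-author, per-axis test, assuming an axis is a pair of curated word lists (hypothetically `pole_a` and `pole_b`) and assuming a WEAT-style statistic: the difference of mean cosine similarities to each pole, divided by the standard deviation of all those cosines. The function names, the statistic, and the toy 3-d vectors are illustrative assumptions, not the authors' implementation, which lives in the linked repository.

```python
import math
import random
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def axis_test(user_vec, pole_a, pole_b, n_perm=1000, seed=0):
    """Standardized effect and one-sided permutation p-value for one axis.

    ASSUMED WEAT-style statistic (not confirmed by the paper):
    (mean cos to pole A - mean cos to pole B) / std of all cosines.
    H1: the author vector sits closer to pole A than to pole B.
    """
    sims = [cosine(user_vec, w) for w in pole_a + pole_b]
    k = len(pole_a)
    scale = stdev(sims)  # pooled std does not change under reshuffling

    def stat(values):
        return (mean(values[:k]) - mean(values[k:])) / scale

    observed = stat(sims)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = sims[:]
        rng.shuffle(shuffled)  # random re-split of the pooled lexicon
        # Small tolerance so float summation order does not hide exact ties.
        if stat(shuffled) >= observed - 1e-12:
            exceed += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH procedure: True where the null is rejected at FDR alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0  # largest rank k (1-based) with p_(k) <= k * alpha / m
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= cutoff
    return rejected


if __name__ == "__main__":
    # Toy 3-d vectors standing in for word and author-token embeddings.
    pole_a = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.2, 0.1],
              [0.8, 0.1, 0.0], [0.95, 0.05, 0.05], [1.0, 0.0, 0.2]]
    pole_b = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.2],
              [0.1, 0.8, 0.1], [0.05, 0.95, 0.05], [0.2, 1.0, 0.0]]
    users = {"user_1": [0.9, 0.1, 0.05], "user_2": [0.1, 0.95, 0.0]}
    results = {u: axis_test(v, pole_a, pole_b) for u, v in users.items()}
    flags = benjamini_hochberg([p for _, p in results.values()])
    for (user, (d, p)), sig in zip(results.items(), flags):
        print(f"{user}: effect={d:+.2f} p={p:.3f} significant={sig}")
```

Because the scale is the standard deviation of the pooled cosines, it is identical for every reshuffle, so permuting the standardized effect yields the same p-value as permuting the raw difference of means. Benjamini–Hochberg is applied once over all author-by-axis p-values, which is what makes per-author reporting feasible without a flood of false positives.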

Metadata

arXiv ID: 2603.15950

Provider: ARXIV

Primary Category: cs.CL

Published: 2026-03-16

Fetched: 2026-03-18 06:02

Related papers

Fractal universe and quantum gravity made simple

Fabio Briscese, Gianluca Calcagni • 2026-03-25

POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan

Marta Moscati, Muhammad Saad Saeed, Marina Zanoni, Mubashir Noman, Rohan Kuma... • 2026-03-25

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Keliang Li, Yansong Li, Hongze Shen, Mengdi Liu, Hong Chang, Shiguang Shan • 2026-03-25

Orientation Reconstruction of Proteins using Coulomb Explosions

Tomas André, Alfredo Bellisario, Nicusor Timneanu, Carl Caleman • 2026-03-25

The role of spatial context and multitask learning in the detection of organic and conventional farming systems based on Sentinel-2 time series

Jan Hemmerling, Marcel Schwieder, Philippe Rufin, Leon-Friedrich Thomas, Mire... • 2026-03-25

Comments: Accepted paper at ICWSM 2026

Categories: cs.CL (primary), cs.CY, cs.SI