Research

Paper

AI LLM March 06, 2026

Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics

Authors

Srishti Palani, Vidya Setlur

Abstract

Large Language Models (LLMs) are transforming Conversational Visual Analytics (CVA) by enabling data analysis through natural language. However, evaluating LLMs for CVA remains a challenge: requiring programming expertise, overlooking real-world complexity, and lacking interpretable metrics for multi-format (visualizations and text) outputs. Through interviews with 22 CVA developers and 16 end-users, we identified use cases, evaluation criteria and workflows. We present Lexara, a user-centered evaluation toolkit for CVA that operationalizes these insights into: (i) test cases spanning real-world scenarios; (ii) interpretable metrics covering visualization quality (data fidelity, semantic alignment, functional correctness, design clarity) and language quality (factual grounding, analytical reasoning, conversational coherence) using rule-based and LLM-as-a-Judge methods; and (iii) an interactive toolkit enabling experimental setup and multi-format and multi-level exploration of results without programming expertise. We conducted a two-week diary study with six CVA developers, drawn from our initial cohort of 22. Their feedback demonstrated Lexara's effectiveness for guiding appropriate model and prompt selection.

Metadata

arXiv ID: 2603.05832
Provider: ARXIV
Primary Category: cs.HC
Published: 2026-03-06
Fetched: 2026-03-09 06:05

Related papers

Raw Data (Debug)
{
  "raw_xml": "<entry>\n    <id>http://arxiv.org/abs/2603.05832v1</id>\n    <title>Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics</title>\n    <updated>2026-03-06T02:30:55Z</updated>\n    <link href='https://arxiv.org/abs/2603.05832v1' rel='alternate' type='text/html'/>\n    <link href='https://arxiv.org/pdf/2603.05832v1' rel='related' title='pdf' type='application/pdf'/>\n    <summary>Large Language Models (LLMs) are transforming Conversational Visual Analytics (CVA) by enabling data analysis through natural language. However, evaluating LLMs for CVA remains a challenge: requiring programming expertise, overlooking real-world complexity, and lacking interpretable metrics for multi-format (visualizations and text) outputs. Through interviews with 22 CVA developers and 16 end-users, we identified use cases, evaluation criteria and workflows. We present Lexara, a user-centered evaluation toolkit for CVA that operationalizes these insights into: (i) test cases spanning real-world scenarios; (ii) interpretable metrics covering visualization quality (data fidelity, semantic alignment, functional correctness, design clarity) and language quality (factual grounding, analytical reasoning, conversational coherence) using rule-based and LLM-as-a-Judge methods; and (iii) an interactive toolkit enabling experimental setup and multi-format and multi-level exploration of results without programming expertise. We conducted a two-week diary study with six CVA developers, drawn from our initial cohort of 22. Their feedback demonstrated Lexara's effectiveness for guiding appropriate model and prompt selection.</summary>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.HC'/>\n    <category scheme='http://arxiv.org/schemas/atom' term='cs.AI'/>\n    <published>2026-03-06T02:30:55Z</published>\n    <arxiv:primary_category term='cs.HC'/>\n    <author>\n      <name>Srishti Palani</name>\n    </author>\n    <author>\n      <name>Vidya Setlur</name>\n    </author>\n  </entry>"
}