Toward Trustworthy Evaluation of Sustainability Rating Methodologies: A Human-AI Collaborative Framework for Benchmark Dataset Construction

Authors

Xiaoran Cai, Wang Yang, Xiyu Ren, Chekun Law, Rohit Sharma, Peng Qi

Abstract

Sustainability or ESG rating agencies use company disclosures and external data to produce scores or ratings that assess a company's environmental, social, and governance performance. However, sustainability ratings for a single company vary widely across agencies, limiting their comparability, credibility, and relevance to decision-making. To harmonize rating results, we propose a universal human-AI collaboration framework for generating trustworthy benchmark datasets to evaluate sustainability rating methodologies. The framework comprises two complementary parts: STRIDE (Sustainability Trust Rating & Integrity Data Equation), which provides principled criteria and a scoring system to guide the construction of firm-level benchmark datasets using large language models (LLMs), and SR-Delta, a discrepancy-analysis procedure that surfaces insights for potential adjustments. Together, the two parts enable scalable and comparable assessment of sustainability rating methodologies. We call on the broader AI community to adopt AI-powered approaches that strengthen and advance sustainability rating methodologies in support of urgent sustainability agendas.
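
The abstract does not spell out how SR-Delta quantifies disagreement, so the following is only a minimal sketch of the idea in Python: agencies score the same firm on heterogeneous native scales, a STRIDE-style LLM-assisted pass is assumed to have already produced a firm-level benchmark score on a common [0, 1] scale, and the discrepancy is the signed gap between each normalized agency score and that benchmark. Every name, scale, and number below is a hypothetical illustration, not the paper's method.

# Hypothetical sketch of an SR-Delta-style discrepancy analysis.
# Nothing here is taken from the paper; the function names, rating
# scales, and benchmark value are illustrative assumptions.
from statistics import mean

def normalize(score: float, lo: float, hi: float) -> float:
    # Map an agency's raw score onto a common [0, 1] scale.
    return (score - lo) / (hi - lo)

def sr_delta(agency_scores: dict[str, tuple[float, float, float]],
             benchmark: float) -> dict[str, float]:
    # agency_scores maps agency -> (raw_score, scale_min, scale_max);
    # benchmark is a firm-level score already on [0, 1], e.g. from an
    # LLM-assisted STRIDE-style scoring pass. The result is the signed
    # gap per agency: positive means the agency rates the firm higher
    # than the benchmark does.
    return {agency: normalize(raw, lo, hi) - benchmark
            for agency, (raw, lo, hi) in agency_scores.items()}

# Three agencies rating the same firm on different native scales.
scores = {
    "AgencyA": (62.0, 0.0, 100.0),  # 0-100 scale
    "AgencyB": (3.1, 1.0, 5.0),     # 1-5 scale
    "AgencyC": (7.4, 0.0, 10.0),    # 0-10 scale
}
deltas = sr_delta(scores, benchmark=0.58)
print(deltas)                                  # signed per-agency gaps
print(mean(abs(d) for d in deltas.values()))   # overall disagreement

Signed gaps separate agencies that run systematically high from those that run low relative to the benchmark, which is the kind of insight a discrepancy analysis could surface as a candidate methodology adjustment.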

Metadata

arXiv ID: 2602.17106
Provider: ARXIV
Primary Category: cs.AI
Published: 2026-02-19
Fetched: 2026-02-21 18:51
