Paper
Building an Ensemble LLM Semantic Tagger for UN Security Council Resolutions
Authors
Hussein Ghaly
Abstract
This paper introduces a methodology for accurate and efficient semantic tagging of UN Security Council resolutions with LLM-based systems. The main goal is to exploit the variability in LLM performance by building ensemble systems for two tasks: data cleaning and semantic tagging. We introduce two evaluation metrics, Content Preservation Ratio (CPR) and Tag Well-Formedness (TWF), which guard against hallucinations and against additions to or omissions from the input text beyond what the task requires. These metrics allow the best output to be selected from multiple runs of several GPT models. GPT-4.1 achieved the highest scores on both tasks (cleaning: CPR 84.9%; semantic tagging: CPR 99.99% and TWF 99.92%). In terms of cost, smaller models such as GPT-4.1-mini achieved performance comparable to the best model in each task at only 20% of the cost. The metrics ultimately allowed the ensemble to select the optimal output, both cleaned and tagged, from all the models involved across multiple runs. With this ensemble design and these metrics, we create a reliable LLM system for semantic tagging of challenging texts.
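The abstract names the two metrics but does not give their formulas, so the following is only a minimal sketch of how metric-based selection over several candidate outputs might look. Everything in it is an assumption made for illustration: CPR is approximated as a token-level similarity between the source text and the output with its tags stripped, TWF as the share of tags that pair up and nest correctly, and the candidates are ranked by the mean of the two scores. The function names, the XML-like tag syntax, and the sample sentences are hypothetical and are not taken from the paper.

```python
import re
from difflib import SequenceMatcher

# Assumed tag syntax: XML-like <name ...> and </name>. The paper's real tag set is not given here.
TAG_RE = re.compile(r"</?([A-Za-z_][\w.-]*)[^<>]*>")


def strip_tags(text):
    """Remove every tag so that only the textual content remains."""
    return TAG_RE.sub("", text)


def content_preservation_ratio(source, output):
    """Assumed CPR: token-level similarity between the source and the untagged output."""
    src_tokens = source.split()
    out_tokens = strip_tags(output).split()
    if not src_tokens and not out_tokens:
        return 1.0
    return SequenceMatcher(None, src_tokens, out_tokens, autojunk=False).ratio()


def tag_well_formedness(output):
    """Assumed TWF: fraction of tags that belong to a correctly nested open/close pair."""
    stack = []
    matched = 0
    total = 0
    for match in TAG_RE.finditer(output):
        total += 1
        name = match.group(1)
        if not match.group(0).startswith("</"):
            stack.append(name)  # opening tag waits for its closing partner
        elif stack and stack[-1] == name:
            stack.pop()
            matched += 2  # count both the opening and the closing tag
    # Assumption: an output with no tags at all has not performed the tagging task.
    return matched / total if total else 0.0


def select_best(source, candidates):
    """Pick the candidate with the highest mean of the two assumed metrics."""
    def score(output):
        cpr = content_preservation_ratio(source, output)
        twf = tag_well_formedness(output)
        return (cpr + twf) / 2
    return max(candidates, key=score)


# Hypothetical outputs from three runs over the same source sentence.
source = "The Security Council decides to extend the mandate"
candidates = [
    "The <org>Security Council decides to extend the mandate",
    "The <org>Security Council</org> firmly decides to extend the mandate until 2027",
    "The <org>Security Council</org> decides to extend the mandate",
]
print(select_best(source, candidates))
```

In this sketch the first candidate is penalized for an unclosed tag and the second for added words, which mirrors the two failure modes the abstract says the metrics are meant to catch. A different weighting of the two scores, or a different similarity measure, would be equally plausible.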
Metadata
arXiv ID: 2603.05895v1
Primary category: cs.CL
Published: 2026-03-06T04:26:53Z
Updated: 2026-03-06T04:26:53Z
Abstract page: https://arxiv.org/abs/2603.05895v1
PDF: https://arxiv.org/pdf/2603.05895v1