Using a Human-AI Teaming Approach to Create and Curate Scientific Datasets with the SCILIRE System

Authors

Necva Bölücü, Jessica Irons, Changhyun Lee, Brian Jin, Maciej Rybinski, Huichen Yang, Andreas Duenser, Stephen Wan

Abstract

The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. To address this challenge, we introduce SCILIRE, a system for creating datasets from scientific literature. SCILIRE has been designed around Human-AI teaming principles centred on workflows for verifying and curating data. It facilitates an iterative workflow in which researchers can review and correct AI outputs. Furthermore, this interaction is used as a feedback signal to improve future LLM-based inference. We evaluate our design using a combination of intrinsic benchmarking outcomes together with real-world case studies across multiple domains. The results demonstrate that SCILIRE improves extraction fidelity and facilitates efficient dataset creation.
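The abstract does not specify how reviewer corrections are fed back into later LLM inference. As a minimal sketch of one possible mechanism, assuming corrections are reused as few-shot demonstrations in later prompts, the review-and-correct loop could look like the following. Every name here (`Correction`, `FeedbackStore`, `build_prompt`) and the prompt wording are hypothetical illustrations, not SCILIRE's actual API or design.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    """One reviewed record: source passage, AI output, human-approved output."""
    passage: str
    ai_output: dict
    human_output: dict


@dataclass
class FeedbackStore:
    """Hypothetical store of reviewer decisions (an assumption, not SCILIRE's design)."""
    corrections: list = field(default_factory=list)

    def review(self, passage, ai_output, human_output):
        """Record the reviewer's decision; return True if the AI output was changed."""
        self.corrections.append(Correction(passage, ai_output, human_output))
        return ai_output != human_output

    def few_shot_examples(self, k=2):
        """Return up to k of the most recent human-approved records."""
        return self.corrections[-k:] if k > 0 else []


def build_prompt(store, passage, k=2):
    """Assemble an extraction prompt that reuses earlier human-verified records."""
    lines = ["Extract the requested fields as a Python dict."]
    for c in store.few_shot_examples(k):
        lines.append(f"Passage: {c.passage}")
        lines.append(f"Fields: {c.human_output}")
    lines.append(f"Passage: {passage}")
    lines.append("Fields:")
    return "\n".join(lines)


# Illustrative data only; not taken from the paper.
store = FeedbackStore()
changed = store.review(
    "The alloy melted at 1450 K.",
    {"melting_point": "1450"},      # AI output, missing the unit
    {"melting_point": "1450 K"},    # reviewer's corrected value
)
prompt = build_prompt(store, "The sample melted at 900 K.")
```

In this sketch the corrected record appears in the next prompt as a demonstration, so the reviewer's fix (here, adding the unit) can influence later extractions without retraining. Other feedback mechanisms, such as fine-tuning or retrieval over a larger correction history, would fit the abstract's description equally well.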

Metadata

arXiv ID: 2603.12638

Provider: ARXIV

Primary Category: cs.CL

Published: 2026-03-13

Fetched: 2026-03-16 06:01

Additional metadata

Secondary Category: cs.HC

Comment: 17 pages, 9 figures, EACL demo track

Updated: 2026-03-13T04:16:08Z

Abstract page: https://arxiv.org/abs/2603.12638v1

PDF: https://arxiv.org/pdf/2603.12638v1