March 12, 2026

ZTab: Domain-based Zero-shot Annotation for Table Columns

Authors

Ehsan Hoseinzade, Ke Wang

Abstract

This study addresses the challenge of automatically detecting semantic column types in relational tables, a key task in many real-world applications. Zero-shot modeling eliminates the need for user-provided labeled training data, making it ideal for scenarios where data collection is costly or restricted due to privacy concerns. However, existing zero-shot models suffer from poor performance when the number of semantic column types is large, limited understanding of tabular structure, and privacy risks arising from dependence on high-performance closed-source LLMs. We introduce ZTab, a domain-based zero-shot framework that addresses both performance and zero-shot requirements. Given a domain configuration consisting of a set of predefined semantic types and sample table schemas, ZTab generates pseudo-tables for the sample schemas and fine-tunes an annotation LLM on them. ZTab is domain-based zero-shot in that it does not depend on user-specific labeled training data; therefore, no retraining is needed for a test table from a similar domain. We describe three cases of domain-based zero-shot. The domain configuration of ZTab provides a trade-off between the extent of zero-shot and annotation performance: a "universal domain" that contains all semantic types approaches "pure" zero-shot, while a "specialized domain" that contains semantic types for a specific application enables better zero-shot performance within that domain. Source code and datasets are available at https://github.com/hoseinzadeehsan/ZTab
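The pipeline described in the abstract (domain configuration, then pseudo-tables, then fine-tuning data) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `DomainConfig`, `make_pseudo_table`, and `build_training_examples` are hypothetical, and the hand-written `value_pool` is an assumed stand-in for whatever mechanism ZTab actually uses to fill pseudo-tables with values. The fine-tuning step itself is omitted.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    """Hypothetical container: predefined semantic types plus sample table schemas."""
    semantic_types: tuple[str, ...]
    sample_schemas: tuple[tuple[str, ...], ...]


def make_pseudo_table(schema, value_pool, n_rows, rng):
    """Return one pseudo-table as a list of (semantic_type, column_values) pairs."""
    return [(t, [rng.choice(value_pool[t]) for _ in range(n_rows)]) for t in schema]


def build_training_examples(config, value_pool, n_rows=3, seed=0):
    """Turn every sample schema into labeled (column_values, semantic_type) examples."""
    rng = random.Random(seed)
    examples = []
    for schema in config.sample_schemas:
        unknown = set(schema) - set(config.semantic_types)
        if unknown:
            raise ValueError(f"schema uses undeclared types: {sorted(unknown)}")
        for sem_type, values in make_pseudo_table(schema, value_pool, n_rows, rng):
            examples.append((values, sem_type))
    return examples


# Toy "specialized domain" (assumed data, for illustration only).
config = DomainConfig(
    semantic_types=("city", "country", "year"),
    sample_schemas=(("city", "country"), ("country", "year")),
)
value_pool = {
    "city": ["Vancouver", "Tehran", "Lyon"],
    "country": ["Canada", "Iran", "France"],
    "year": ["1999", "2012", "2026"],
}
examples = build_training_examples(config, value_pool, n_rows=3, seed=0)
```

Each entry of `examples` pairs a generated column with its semantic type, which is the kind of labeled data an annotation model could be fine-tuned on without any user-provided labels. Widening `semantic_types` moves the configuration toward the "universal domain" end of the trade-off, while narrowing it corresponds to a "specialized domain".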

Metadata

arXiv ID: 2603.11436
Provider: ARXIV
Primary Category: cs.LG
Published: 2026-03-12
Fetched: 2026-03-13 06:02
