February 23, 2026

Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks

Authors

Yijiashun Qi, Hanzhe Guo, Yijiazhen Qi

Abstract

Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity, yet systematically identifying high-potential SMEs remains an open challenge. We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which SBIR Phase I awardees will advance to Phase II funding using exclusively public data. We construct a heterogeneous graph with 32,268 company nodes, 124 research topic nodes, and 13 government agency nodes connected by approximately 99,000 edges across three semantic relation types. SME-HGT achieves an AUPRC of 0.621 ± 0.003 on a temporally split test set, outperforming an MLP baseline (0.590 ± 0.002) and R-GCN (0.608 ± 0.013) across five random seeds. At a screening depth of 100 companies, SME-HGT attains 89.6% precision with a 2.14× lift over random selection. Our temporal evaluation protocol prevents information leakage, and our reliance on public data ensures reproducibility. These results demonstrate that relational structure among firms, research topics, and funding agencies provides meaningful signal for SME potential assessment, with implications for policymakers and early-stage investors.
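
The screening metrics quoted in the abstract (precision at a depth of 100 companies and lift over random selection) can be illustrated with a minimal sketch. It assumes the common definition of lift as precision@k divided by the base rate of positives, which the abstract does not spell out. Under that assumption, the reported figures would imply a base rate of roughly 0.896 / 2.14 ≈ 0.42. The function names and the toy scores and labels below are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of positives among the k highest-scored items."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Rank item indices by descending model score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Precision@k divided by the base rate (assumed definition of lift)."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    return precision_at_k(scores, labels, k) / base_rate


# Toy data (hypothetical): 8 companies, 3 of which advanced to Phase II.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

p_at_4 = precision_at_k(toy_scores, toy_labels, 4)  # 3 positives in the top 4
lift_4 = lift_at_k(toy_scores, toy_labels, 4)       # 0.75 / 0.375
print(p_at_4, lift_4)
```

On the toy data, three of the four top-ranked companies are positives, so precision@4 is 0.75 against a base rate of 0.375, giving a lift of 2.0.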

Metadata

arXiv ID: 2602.19591
Provider: ARXIV
Primary Category: cs.LG
Published: 2026-02-23
Fetched: 2026-02-24 04:38
