Research

Paper

March 18, 2026

Revisiting Vulnerability Patch Identification on Data in the Wild

Authors

Ivana Clairine Irsan, Ratnadira Widyasari, Ting Zhang, Huihui Huang, Ferdian Thung, Yikun Li, Lwin Khin Shar, Eng Lieh Ouh, Hong Jin Kang, David Lo

Abstract

Attacks can exploit zero-day or one-day vulnerabilities that are not publicly disclosed. To detect these vulnerabilities, security researchers monitor development activities in open-source repositories to identify unreported security patches. The sheer volume of commits makes this task infeasible to accomplish manually. Consequently, security patch detectors are commonly trained and evaluated on security patches linked from vulnerability reports in the National Vulnerability Database (NVD). In this study, we assess the effectiveness of these detectors when applied in the wild. Our results show that models trained on NVD-derived data suffer substantially decreased performance, with decreases in F1-score of up to 90% when tested on in-the-wild security patches, rendering them impractical for real-world use. An analysis comparing security patches identified in the wild with commits linked from NVD reveals that the two can be easily distinguished from each other: security patches associated with NVD differ in their distribution of commit messages, vulnerability types, and composition of changes. These differences suggest that NVD may be unsuitable as the sole source of data for training models to detect security patches. We find that constructing a dataset that combines security patches from NVD with a small subset of manually identified security patches can improve model robustness.
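The abstract quantifies degradation as a relative drop in F1-score. As a brief refresher (this is not the authors' code, and the numbers below are purely illustrative, not taken from the paper), F1 and a relative decrease can be computed as:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_decrease(f1_in_domain: float, f1_in_wild: float) -> float:
    """Fraction of F1 lost when moving from in-domain to in-the-wild data."""
    return (f1_in_domain - f1_in_wild) / f1_in_domain


# Hypothetical example: a detector with F1 = 0.80 on NVD-linked test data
# that falls to F1 = 0.08 on in-the-wild patches has lost 90% of its F1.
drop = relative_decrease(0.80, 0.08)
print(f"{drop:.0%}")  # prints "90%"
```

Under this (assumed) definition, a "decrease of up to 90%" means the in-the-wild F1 can be as low as one tenth of the in-domain F1.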

Metadata

arXiv ID: 2603.17266
Provider: ARXIV
Primary Category: cs.SE
Published: 2026-03-18
Fetched: 2026-03-19 06:01
