ConceptCoder: Improve Code Reasoning via Concept Learning

Authors

Md Mahbubur Rahman, Hengbo Tong, Wei Le

Abstract

Large language models (LLMs) have shown promising results for software engineering applications, but still struggle with code reasoning tasks such as vulnerability detection (VD). We introduce ConceptCoder, a fine-tuning method that simulates human code inspection: models are trained to first recognize code concepts and then perform reasoning on top of these concepts. In prior work, concepts are extracted by multimodal models or LLMs to explain vision and natural language models. Our work is the first to formulate concepts for code. We define code concepts as human-understandable semantic properties of code and train models to learn such concepts. Our evaluation shows that this approach significantly improves VD accuracy, from 66.32 to 72.15 F1 on average over 9 open-source LLMs. ConceptCoder achieves the best VD performance compared to state-of-the-art (SOTA) baselines, including fine-tuned SOTA open-source LLMs and prompted proprietary models such as GPT-5.2 and Claude-Opus-4.5. Our approach also scales: concepts defined from four types of vulnerabilities benefit general vulnerability datasets with 134 CWEs. We further demonstrate that concept-based fine-tuning generalizes beyond VD and improves branch prediction. We release our code and datasets at https://figshare.com/s/1decab8232c653b44f71.
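The abstract describes a two-stage inspection format: the model first names the code concepts it recognizes, then reasons over them to a verdict. The sketch below illustrates what one such fine-tuning sample could look like; the field names, helper function, and concept labels are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a concept-then-reason training sample for
# vulnerability detection, in the spirit of ConceptCoder's two-stage
# format. All names and fields here are illustrative assumptions.

SNIPPET = """\
char buf[16];
strcpy(buf, user_input);   /* no bounds check */
"""

def make_training_sample(code, concepts, reasoning, verdict):
    """Assemble one fine-tuning sample: the target asks the model to
    list recognized code concepts first, then reason to a verdict."""
    concept_lines = "\n".join(f"- {c}" for c in concepts)
    target = (
        "Concepts:\n"
        f"{concept_lines}\n"
        f"Reasoning: {reasoning}\n"
        f"Verdict: {verdict}"
    )
    return {"prompt": f"Inspect the code:\n{code}", "target": target}

sample = make_training_sample(
    SNIPPET,
    ["fixed-size stack buffer", "unchecked copy of external input"],
    "a fixed-size buffer receives unbounded input, so it can overflow.",
    "vulnerable (stack-based buffer overflow)",
)
print(sample["target"])
```

The point of the format is ordering: concept recognition precedes the verdict, so the reasoning is conditioned on explicitly named semantic properties rather than produced in one opaque step.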

Metadata

arXiv ID: 2603.23470
Provider: ARXIV
Primary Category: cs.SE
Published: 2026-03-24
Fetched: 2026-03-25 06:02
