Paper
InCoder-32B: Code Foundation Model for Industrial Scenarios
Authors
Jian Yang, Wei Zhang, Jiajun Wu, Junhang Cheng, Shawn Guo, Haowen Wang, Weicheng Gu, Yaxin Du, Joseph Li, Fanglin Xu, Yizhi Li, Lin Jing, Yuanbo Wang, Yuhan Gao, Ruihao Gong, Chuan Hao, Ran Tao, Aishan Liu, Tuney Zheng, Ganqu Cui, Zhoujun Li, Mingjie Tang, Chenghua Lin, Wayne Xin Zhao, Xianglong Liu, Ming Zhou, Bryan Dai, Weifeng Lv
Abstract
Recent code large language models have made remarkable progress on general programming tasks, yet their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model to unify code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. Built on an efficient architecture, InCoder-32B is trained from scratch in four stages: general code pre-training; annealing on curated industrial code; mid-training that progressively extends the context window from 8K to 128K tokens using synthetic industrial reasoning data; and post-training with execution-grounded verification. We evaluate the model on 14 mainstream general code benchmarks and 9 industrial benchmarks spanning 4 specialized domains. InCoder-32B achieves highly competitive performance on general tasks while establishing strong open-source baselines across the industrial domains.
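The abstract does not spell out how "execution-grounded verification" works; in practice it usually means scoring model outputs by actually running them and feeding pass/fail back as a training or filtering signal. The sketch below is a minimal, hypothetical illustration of such a verifier for plain Python snippets; the function name `execution_reward`, the single-file test harness, and the binary reward are assumptions for illustration, not the authors' pipeline (which targets hardware-oriented domains such as chip design and GPU kernels).

```python
import os
import subprocess
import tempfile


def execution_reward(candidate_code: str, test_code: str, timeout_s: int = 10) -> float:
    """Run a candidate solution against its tests in a subprocess and
    return 1.0 if the tests pass (exit code 0), 0.0 otherwise."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "candidate_test.py")
        with open(path, "w") as f:
            # Concatenate candidate code and tests into one script so the
            # tests can call the candidate's functions directly.
            f.write(candidate_code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                ["python", path],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # hangs and runaway loops count as failures
        return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    solution = "def add(a, b):\n    return a + b\n"
    tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
    print(execution_reward(solution, tests))  # -> 1.0
```

In a post-training loop, a reward like this can gate which generated samples are kept for fine-tuning or serve as the scalar signal for reinforcement learning; domain-specific verifiers (e.g. a simulator for HDL code) would replace the plain Python runner.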
Metadata
arXiv: 2603.16790v1 · cs.SE (primary), cs.AI
Published: 2026-03-17
Abstract page: https://arxiv.org/abs/2603.16790v1
PDF: https://arxiv.org/pdf/2603.16790v1