Paper
When Small Variations Become Big Failures: Reliability Challenges in Compute-in-Memory Neural Accelerators
Authors
Yifan Qin, Jiahao Zheng, Zheyu Yan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi
Abstract
Compute-in-memory (CiM) architectures promise significant improvements in energy efficiency and throughput for deep neural network acceleration by alleviating the von Neumann bottleneck. However, their reliance on emerging non-volatile memory devices introduces device-level non-idealities, such as write variability, conductance drift, and stochastic noise, that fundamentally challenge reliability, predictability, and safety, especially in safety-critical applications. This talk examines the reliability limits of CiM-based neural accelerators and presents a series of techniques that bridge device physics, architecture, and learning algorithms to address these challenges. We first demonstrate that even small device variations can lead to disproportionately large accuracy degradation and catastrophic failures in safety-critical inference workloads, revealing a critical gap between average-case evaluations and worst-case behavior. Building on this insight, we introduce SWIM, a selective write-verify mechanism that strategically applies verification only where it is most impactful, significantly improving reliability while maintaining CiM's efficiency advantages. Finally, we explore a learning-centric solution that improves realistic worst-case performance by training neural networks with right-censored Gaussian noise, aligning training assumptions with hardware-induced variability and enabling robust deployment without excessive hardware overhead. Together, these works highlight the necessity of cross-layer co-design for CiM accelerators and provide a principled path toward dependable, efficient neural inference on emerging memory technologies, paving the way for their adoption in safety- and reliability-critical systems.
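To make the average-case/worst-case gap concrete, the following is a minimal Monte Carlo sketch in PyTorch, not the evaluation methodology used in the talk: it perturbs a trained model's weights with multiplicative Gaussian noise (a simplified stand-in for post-programming conductance variation; sigma, n_chips, and the noise model are all illustrative assumptions) and compares the mean accuracy against the minimum over the sampled "chips".

import copy
import torch

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of `model` over `loader`."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total

@torch.no_grad()
def monte_carlo_variation(model, loader, sigma=0.05, n_chips=100, device="cpu"):
    """Sample `n_chips` noisy weight instances (each mimicking one fabricated
    CiM chip) and return (mean accuracy, worst-case accuracy) across them."""
    accs = []
    for _ in range(n_chips):
        noisy = copy.deepcopy(model).to(device)
        for p in noisy.parameters():
            # Multiplicative Gaussian noise: a simplified stand-in for
            # per-device conductance deviation after a one-shot write.
            p.mul_(1.0 + sigma * torch.randn_like(p))
        accs.append(accuracy(noisy, loader, device))
    accs = torch.tensor(accs)
    return accs.mean().item(), accs.min().item()

Even when the mean accuracy barely moves under such sampling, the minimum over chips can drop sharply; that spread is the gap between average-case evaluation and worst-case behavior the abstract highlights.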
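SWIM's core idea, as summarized above, is to spend the costly write-verify loop only on the weights where verification is most impactful. The sketch below is an illustrative reconstruction rather than the published algorithm: it uses squared gradients from a single forward/backward pass as a sensitivity proxy (SWIM itself is formulated around second-derivative information), and `program`, `read`, `fraction`, and `tol` are hypothetical placeholders for the device programming interface and its tuning knobs.

import torch

def sensitivity_scores(model, loss_fn, x, y):
    """One forward/backward pass; squared gradients serve as a cheap proxy
    for how strongly the loss reacts to perturbing each weight."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return {n: p.grad.pow(2) for n, p in model.named_parameters()
            if p.grad is not None}

def selective_write_verify(weights, scores, program, read, fraction=0.1,
                           tol=0.01, max_iters=10):
    """Program every weight once; run the iterative verify loop only for
    the top `fraction` most sensitive weights. `weights` and `scores` are
    dicts keyed by the same parameter names."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    threshold = flat.quantile(1.0 - fraction)   # cutoff for "sensitive"
    for name, w in weights.items():
        target = w.flatten()
        programmed = program(target)            # one-shot write (noisy)
        sensitive = scores[name].flatten() >= threshold
        for _ in range(max_iters):
            err = read(programmed) - target
            bad = sensitive & (err.abs() > tol)
            if not bad.any():
                break
            # Re-program only the sensitive weights still out of tolerance.
            programmed[bad] = program(target[bad])
        weights[name] = programmed.view_as(w)
    return weights

Verifying only a small, carefully chosen fraction of weights is what lets a scheme like this keep most of the reliability benefit of full write-verify at a fraction of the programming cost.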
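The learning-centric approach injects noise during training whose distribution matches what the hardware can actually produce. A minimal sketch follows, assuming that "right-censored Gaussian noise" means additive Gaussian weight noise whose samples above a cap are clamped to that cap; `sigma` and `cap` are illustrative values, and the injection scheme (perturb the weights for the forward pass, restore them before the optimizer step) is one common choice, not necessarily the talk's.

import torch

def right_censored_gaussian(shape, sigma, cap, device="cpu"):
    """Gaussian noise with a censored right tail: samples above `cap`
    are set to `cap`, so injected perturbations never exceed it."""
    return torch.randn(shape, device=device).mul_(sigma).clamp_(max=cap)

def noisy_training_step(model, loss_fn, opt, x, y, sigma=0.05, cap=0.1):
    """One training step with noise injected into the weights for the
    forward/backward pass only; the optimizer updates the clean weights."""
    noises = []
    with torch.no_grad():
        for p in model.parameters():
            n = right_censored_gaussian(p.shape, sigma, cap, p.device)
            p.add_(n)
            noises.append(n)
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    with torch.no_grad():   # remove the injected noise before the update
        for p, n in zip(model.parameters(), noises):
            p.sub_(n)
    opt.step()
    return loss.item()

The point of censoring is alignment: if the hardware (e.g., via bounded conductance ranges or a verify step) cannot produce arbitrarily large positive deviations, training against an unbounded Gaussian over-penalizes a tail that never occurs, while the censored distribution matches the deviations the deployed model will actually see.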
Metadata
arXiv ID: 2603.03491v1
Published: 2026-03-03
Categories: cs.LG (primary), cs.AR
Comment: 2026 International VLSI Symposium on Technology, Systems and Applications (VLSI TSA)
Links: https://arxiv.org/abs/2603.03491v1 (abstract), https://arxiv.org/pdf/2603.03491v1 (PDF)