AI LLM February 19, 2026

ACOS: Arrays of Cheap Optical Switches

Authors

Daniel Amir, Ori Cohen, Jakob Krebs, Mark Silberstein

Abstract

Machine learning training places immense demands on cluster networks, motivating specialized architectures and co-design with parallelization strategies. Recent designs incorporating optical circuit switches (OCSes) are promising, offering better cost, power efficiency, and long-term bandwidth scaling than packet switches. However, most existing approaches rely on costly high-radix OCSes and/or combine them with packet switches to achieve competitive performance at scale. Unfortunately, high-radix OCSes are both expensive and slow to reconfigure, limiting both scalability and performance. We propose Arrays of Cheap Optical Switches (ACOS), which bring application co-design directly to the structure of the reconfigurable fabric. Using low-radix OCSes as building blocks, ACOS supports the forms of reconfiguration needed in training clusters, including topology selection, workload adaptation, and failure resilience. The cost of ACOS scales with supported topologies and adaptations rather than with port count, breaking past the scalability barriers of current specialized ML networks. We show through simulation that ACOS-based deployments match the performance of fully provisioned packet-switched networks when training state-of-the-art LLMs at scale, while delivering significant cost savings using existing off-the-shelf OCSes, with strong bandwidth scaling and higher cost savings in the future.

Metadata

arXiv ID: 2602.17449
Provider: ARXIV
Primary Category: cs.NI
Published: 2026-02-19
Fetched: 2026-02-21 18:51
Updated: 2026-02-19T15:14:16Z
Comment: 17 pages, 12 figures
Abstract page: https://arxiv.org/abs/2602.17449v1
PDF: https://arxiv.org/pdf/2602.17449v1