AI LLM March 05, 2026

WaterSIC: information-theoretically (near) optimal linear layer quantization

Authors

Egor Lifar, Semyon Savkin, Or Ordentlich, Yury Polyanskiy

Abstract

This paper considers the problem of converting a given dense linear layer to low precision. The tradeoff between compressed length and output discrepancy is analyzed information-theoretically (IT). It is shown that the popular GPTQ algorithm may have an arbitrarily large gap to the IT limit. To close this gap, a novel algorithm, termed "WaterSIC", is proposed and shown to be within a rate gap of 0.255 bits of the IT limit, uniformly over all possible covariance matrices of the input activations. The key innovation of WaterSIC is to allocate different quantization rates to different columns (in-features) of the weight matrix, mimicking the classical IT solution known as "waterfilling". Applying WaterSIC to the Llama and Qwen families of LLMs establishes new state-of-the-art performance at all quantization rates from 1 to 4 bits.
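The waterfilling idea the abstract alludes to can be illustrated with the classical reverse-waterfilling rule from rate-distortion theory: columns whose activations have larger variance receive more bits, via R_i = max(0, ½ log₂(σ_i²/θ)), with the water level θ set so the average rate meets the budget. This is a generic sketch of that classical allocation under an assumed diagonal activation covariance, not the paper's actual WaterSIC algorithm; the function name and bisection search are illustrative choices.

```python
import numpy as np

def waterfill_bits(variances, avg_bits, iters=200, tol=1e-12):
    """Classical reverse-waterfilling bit allocation (illustrative sketch).

    Each column i gets R_i = max(0, 0.5 * log2(var_i / theta)); the water
    level theta is found by bisection so that the mean rate hits avg_bits.
    """
    variances = np.asarray(variances, dtype=float)
    lo, hi = 0.0, variances.max()  # mean rate is monotone decreasing in theta
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        rates = np.maximum(0.0, 0.5 * np.log2(variances / max(theta, tol)))
        if rates.mean() > avg_bits:
            lo = theta  # spending too many bits: raise the water level
        else:
            hi = theta  # under budget: lower the water level
    return rates

# High-variance columns get more bits; low-variance ones may get zero.
rates = waterfill_bits([4.0, 1.0, 0.25], avg_bits=1.0)
```

For variances (4, 1, 0.25) and a 1-bit average budget, the water level settles near 0.25 and the allocation approaches (2, 1, 0) bits, showing how low-variance in-features can be dropped entirely.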

Metadata

arXiv ID: 2603.04956
Provider: ARXIV
Primary Category: cs.LG
Published: 2026-03-05
Fetched: 2026-03-06 14:20
