March 16, 2026

Next-Frame Decoding for Ultra-Low-Bitrate Image Compression with Video Diffusion Priors

Authors

Yunuo Chen, Chuqin Zhou, Jiangchuan Li, Xiaoyue Ling, Bing He, Jincheng Dai, Li Song, Guo Lu

Abstract

We present a novel paradigm for ultra-low-bitrate image compression (ULB-IC) that exploits the "temporal" evolution in generative image compression. Specifically, we define an explicit intermediate state during decoding: a compact anchor frame, which preserves the scene geometry and semantic layout while discarding high-frequency details. We then reinterpret generative decoding as a virtual temporal transition from this anchor to the final reconstructed image. To model this progression, we leverage a pretrained video diffusion model (VDM) as a temporal prior: the anchor frame serves as the initial frame and the original image as the target frame, transforming the decoding process into a next-frame prediction task. In contrast to image diffusion-based ULB-IC models, our decoding proceeds from a visible, semantically faithful anchor, which improves both fidelity and realism for perceptual image compression. Extensive experiments demonstrate that our method achieves superior objective and subjective performance. On the CLIC2020 test set, our method achieves over 50% bitrate savings across LPIPS, DISTS, FID, and KID compared to DiffC, while also delivering a decoding speedup of up to 5×. Code will be released later.
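The two-stage decoding described above — recover a compact anchor frame, then let a video diffusion model predict the full-detail image as the "next frame" — can be sketched as follows. This is a minimal illustration of the paradigm only; every name here (`decode_anchor`, `vdm_predict_next_frame`) is a hypothetical placeholder, not the authors' actual API.

```python
# Hedged sketch of the anchor-then-next-frame decoding paradigm from the
# abstract. Both callables are illustrative stand-ins, not real components.

def decode(bitstream, decode_anchor, vdm_predict_next_frame):
    """Reconstruct an image by treating decoding as next-frame prediction."""
    # Stage 1: decode the compact anchor frame, which preserves scene
    # geometry and semantic layout but discards high-frequency detail.
    anchor = decode_anchor(bitstream)
    # Stage 2: a pretrained video diffusion model, conditioned on the
    # anchor as the initial frame, predicts the target frame -- i.e. the
    # full-detail reconstruction.
    return vdm_predict_next_frame(initial_frame=anchor)
```

Starting from a visible, semantically faithful anchor (rather than pure noise, as in image diffusion-based ULB-IC) is what the abstract credits for the gains in both fidelity and realism.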

Metadata

arXiv ID: 2603.15129
Provider: ARXIV
Primary Category: cs.CV
Published: 2026-03-16
Fetched: 2026-03-17 06:02
