CoVAE: correlated multimodal generative modeling

Authors

Federico Caretti, Guido Sanguinetti

Abstract

Multimodal Variational Autoencoders have emerged as a popular tool to extract effective representations from rich multimodal data. However, such models rely on fusion strategies in latent space that destroy the joint statistical structure of the multimodal data, with profound implications for generation and uncertainty quantification. In this work, we introduce Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities. We test CoVAE on several real and synthetic data sets, demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.
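To make concrete why correlations between modality latents matter for uncertainty quantification, here is a minimal sketch using textbook bivariate Gaussian conditioning. It is not the CoVAE architecture, which the abstract does not specify; the function name `conditional_gaussian`, the scalar latents, and the example values of `rho` are all assumptions chosen for illustration.

```python
import math


def conditional_gaussian(mu1, mu2, sigma1, sigma2, rho, z1):
    """Return the mean and variance of z2 given z1 for a bivariate Gaussian.

    Hypothetical illustration: z1 and z2 stand for scalar latents of two
    modalities, and rho is their correlation. This is the standard Gaussian
    conditioning formula, not the method of the paper.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    mean = mu2 + rho * (sigma2 / sigma1) * (z1 - mu1)
    var = sigma2 ** 2 * (1.0 - rho ** 2)
    return mean, var


# Independent latents (rho = 0): observing z1 tells us nothing about z2.
mean_ind, var_ind = conditional_gaussian(0.0, 0.0, 1.0, 1.0, 0.0, 1.5)

# Correlated latents (rho = 0.8, an assumed value): the conditional mean
# shifts toward the observation and the conditional variance shrinks.
mean_cor, var_cor = conditional_gaussian(0.0, 0.0, 1.0, 1.0, 0.8, 1.5)

print(f"independent: mean={mean_ind:.2f}, var={var_ind:.2f}")
print(f"correlated:  mean={mean_cor:.2f}, var={var_cor:.2f}")
```

In this toy setting, a model that discards the cross-modal correlation would report the same uncertainty about the second modality whether or not the first is observed, which is the kind of loss of joint structure the abstract describes.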

Metadata

arXiv ID: 2603.01965

Provider: ARXIV

Primary Category: cs.LG

Published: 2026-03-02

Fetched: 2026-03-03 04:34

Source record

Categories: cs.LG, q-bio.QM

Updated: 2026-03-02T15:14:59Z

Abstract page: https://arxiv.org/abs/2603.01965v1

PDF: https://arxiv.org/pdf/2603.01965v1