March 02, 2026

From Pixels to Patches: Pooling Strategies for Earth Embeddings

Authors

Isaac Corley, Caleb Robinson, Inbal Becker-Reshef, Juan M. Lavista Ferres

Abstract

As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution. The default choice, mean pooling, discards within-patch variability and can drop accuracy by more than 10% under spatial shift. To evaluate this effect, we introduce EuroSAT-Embed: 81,000 embedding GeoTIFFs derived from three foundation models: AlphaEarth, OlmoEarth, and Tessera. We benchmark 11 training-free and 2 parametric pooling methods under both random and geographically disjoint test splits. Our results show that richer pooling schemes reduce the geographic generalization gap by up to 40% relative to mean pooling and increase accuracy by up to 5% on spatial splits. We recommend Generalized Mean Pooling (GeM) as a drop-in replacement for mean pooling: it improves accuracy without increasing embedding dimensionality. For maximum accuracy, Stats pooling (concatenation of min/max/mean/std pooling) performs best at 4x the embedding size. We further find that pooling effectiveness varies across embedding sources and that higher-dimensional embeddings benefit most from distributional statistics.
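
To make the three pooling schemes named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a patch has already been flattened into an array of shape (num_pixels, dim). The function names, the exponent p=3, the epsilon clamp in GeM, and the 4096 x 64 example shape are all illustrative assumptions; the abstract does not say how the paper sets p or handles negative embedding values.

```python
import numpy as np


def mean_pool(pixels: np.ndarray) -> np.ndarray:
    """Average every embedding dimension over the pixels of a patch."""
    return pixels.mean(axis=0)


def gem_pool(pixels: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized mean per dimension: (mean(x^p))^(1/p).

    Assumption: values are clamped to at least `eps` before the power,
    as is common in image-retrieval GeM. The paper may treat negative
    embedding values differently.
    """
    clamped = np.clip(pixels, eps, None)
    return np.power(np.mean(np.power(clamped, p), axis=0), 1.0 / p)


def stats_pool(pixels: np.ndarray) -> np.ndarray:
    """Concatenate per-dimension min, max, mean, and std (4x the input dim)."""
    return np.concatenate(
        [
            pixels.min(axis=0),
            pixels.max(axis=0),
            pixels.mean(axis=0),
            pixels.std(axis=0),
        ]
    )


# Illustrative patch: 4096 pixel vectors of dimension 64 (assumed shape).
rng = np.random.default_rng(0)
patch = rng.uniform(0.1, 1.0, size=(4096, 64))

print(mean_pool(patch).shape)   # (64,)
print(gem_pool(patch).shape)    # (64,)
print(stats_pool(patch).shape)  # (256,)
```

With p = 1 and positive inputs, this GeM reduces to mean pooling, which is why it can serve as a drop-in replacement at the same output size. Larger p weights high-valued pixels more heavily, while Stats pooling trades a 4x larger vector for explicit information about within-patch spread.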

Metadata

arXiv ID: 2603.02080
Provider: ARXIV
Primary Category: cs.CV
Published: 2026-03-02
Fetched: 2026-03-03 04:34
