
SilLang: Improving Gait Recognition with Silhouette Language Encoding

Authors

Ruiyi Zhan, Guozhen Peng, Canyu Chen, Jian Lei, Annan Li

Abstract

Gait silhouettes, which can be encoded into binary gait codes, are widely adopted to represent the motion patterns of pedestrians. Recent approaches commonly leverage visual backbones to encode gait silhouettes, achieving strong performance. However, they primarily focus on continuous visual features, overlooking the discrete nature of binary silhouettes, which inherently share a discrete encoding space with natural language. Large Language Models (LLMs) have demonstrated an exceptional capability for extracting discriminative features from discrete sequences and modeling long-range dependencies, highlighting their potential to capture temporal motion patterns by identifying subtle variations. Motivated by these observations, we explore bridging binary gait silhouettes and natural language within a shared binary encoding space. However, the encoding spaces of text tokens and binary gait silhouettes remain misaligned, primarily due to differences in token frequency and density. To address this issue, we propose the Contour-Velocity Tokenizer, which encodes binary gait silhouettes while reshaping their distribution to better align with the text token space. We then establish a dual-branch framework, termed the Silhouette Language Model (SilLang), which enhances visual silhouette features by integrating discrete linguistic embeddings derived from LLMs. Implemented on mainstream gait backbones, SilLang consistently improves state-of-the-art methods across SUSTech1K, GREW, and Gait3D.
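The abstract's core observation is that a binary silhouette is already discrete data, so it can be read as a sequence of symbolic tokens, much like text. The paper does not publish its Contour-Velocity Tokenizer, so the sketch below is purely illustrative: the function `silhouette_to_tokens` and its patch-based scheme are assumptions, showing only the generic idea of turning a 0/1 silhouette frame into discrete token IDs by treating each small patch as a binary number.

```python
import numpy as np

def silhouette_to_tokens(frame: np.ndarray, patch: int = 4) -> list[int]:
    """Map a binary silhouette frame (H x W, values 0/1) to discrete token IDs.

    Hypothetical illustration only -- NOT the paper's Contour-Velocity
    Tokenizer. Each non-overlapping patch x patch block is flattened and
    read as one binary number, yielding one integer token per block.
    """
    h, w = frame.shape
    h -= h % patch  # drop rows/columns that do not fill a whole patch
    w -= w % patch
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            bits = frame[i:i + patch, j:j + patch].flatten()
            # interpret the flattened 0/1 patch as a base-2 integer token ID
            tokens.append(int("".join(map(str, bits)), 2))
    return tokens

# toy 4x8 silhouette: a vertical band of foreground pixels in the middle
frame = np.array([[0, 0, 1, 1, 1, 1, 0, 0]] * 4, dtype=int)
print(silhouette_to_tokens(frame))  # → [13107, 52428]
```

A real tokenizer along the lines the paper describes would additionally reshape the token distribution (frequency and density) to align with an LLM's text-token space; this sketch stops at the raw discretization step.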

Metadata

arXiv ID: 2603.23976
Provider: ARXIV
Primary Category: cs.CV
Published: 2026-03-25
Fetched: 2026-03-26 06:02
