HairGPT: A Unified Autoregressive Framework for 3D Realistic Hairstyle Synthesis

Abstract

Realistic hair generation remains difficult because hairstyles couple challenging global topology with high-frequency local detail. HairGPT addresses this problem with a strand-based generative pipeline that decomposes hair into a global density map and local strand geometry, then tokenizes both into a compact discrete representation. Generation proceeds over region-aware hierarchical sequences, allowing the model to predict structure and style progressively from text or image conditions. This design yields stronger control over topology, texture, and local edits, enabling high-fidelity synthesis of rare hairstyles as well as effective adaptation to stylized domains.
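This page does not say how the global density map is computed. The sketch below assumes one plausible reading: a normalized histogram of strand-root positions over a scalp UV grid. The function name `root_density_map` and the grid `resolution` are hypothetical. The paper then tokenizes this map with a learned tokenizer Q_d, which is not reproduced here.

```python
import numpy as np

def root_density_map(root_uv: np.ndarray, resolution: int = 32) -> np.ndarray:
    """Histogram strand roots, given as (u, v) in [0, 1]^2, into a normalized grid.

    Hypothetical helper: assumes the density map counts roots per UV cell.
    """
    # Clip so that roots lying exactly at u == 1.0 or v == 1.0 land in the last cell.
    idx = np.clip((root_uv * resolution).astype(int), 0, resolution - 1)
    density = np.zeros((resolution, resolution), dtype=np.float64)
    # np.add.at accumulates correctly even when several roots share one cell.
    np.add.at(density, (idx[:, 1], idx[:, 0]), 1.0)
    # Normalize so the map sums to 1 (guard against an empty root set).
    return density / max(len(root_uv), 1)

roots = np.array([[0.1, 0.1], [0.1, 0.1], [0.9, 0.9]])
dm = root_density_map(roots, resolution=4)
```

Using `np.add.at` instead of fancy-index assignment matters here, because `density[idx] += 1` would count a shared cell only once.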

Key Contributions

1. Strand-as-language modeling for hair generation

A shift to strand-as-language modeling, which reformulates 3D realistic hair generation as a dual-decoupled autoregressive problem in which strands are the core generative units.

2. Compact strand tokenizer for complex geometry

A guide-strand tokenizer built with multi-head product quantization, compressing complex topology and high-frequency texture into a compact discrete codebook.
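As a rough illustration of what product quantization does, the sketch below splits each strand feature vector into H sub-vectors ("heads") and snaps each one to the nearest codeword in that head's codebook. It is standard nearest-neighbour PQ with fixed random codebooks, not the paper's learned tokenizer. The head count, codebook size, feature dimension, and the names `pq_encode` and `pq_decode` are all assumptions.

```python
import numpy as np

def pq_encode(x: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Encode feature vectors with product quantization.

    x:          (N, D) strand feature vectors.
    codebooks:  (H, K, D // H), one codebook of K codewords per head.
    returns:    (N, H) integer codes, one token per head.
    """
    n, d = x.shape
    h, k, sub = codebooks.shape
    assert d == h * sub, "feature dim must split evenly across heads"
    parts = x.reshape(n, h, sub)  # (N, H, sub)
    # Squared distance from each sub-vector to every codeword of its own head.
    dists = ((parts[:, :, None, :] - codebooks[None, :, :, :]) ** 2).sum(-1)  # (N, H, K)
    return dists.argmin(-1)

def pq_decode(codes: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Reconstruct (N, D) features by concatenating the selected codewords."""
    h = codebooks.shape[0]
    picked = codebooks[np.arange(h)[None, :], codes]  # (N, H, sub)
    return picked.reshape(codes.shape[0], -1)

rng = np.random.default_rng(0)
cb = rng.normal(size=(4, 16, 8))        # hypothetical: 4 heads, 16 codewords, 8 dims each
codes = rng.integers(0, 16, size=(5, 4))
features = pq_decode(codes, cb)          # features that lie exactly on codewords
recovered = pq_encode(features, cb)
```

The appeal for a compact vocabulary: with H heads of K codewords each, a vector is described by only H tokens drawn from small per-head codebooks, yet the joint code space covers K**H combinations.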

3. Region-aware hierarchy for stable generation

A hierarchical strand language with multi-stage training, organizing synthesis into region-aware sequences that improve structural coherence and make training more stable.
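One way to read "region-aware hierarchical sequence" is: global tokens first, then strands grouped by scalp region, each region opened by a marker token and its strands emitted in a fixed order. The sketch below implements that reading. The special-token IDs, the `Strand` record, the precomputed region labels, and the root-based sort order are hypothetical, and the token values are illustrative only (a real vocabulary would offset each token range so that they do not collide).

```python
from dataclasses import dataclass

BOS, EOS = 0, 1
REGION_BASE = 2  # hypothetical: token REGION_BASE + r opens region r

@dataclass
class Strand:
    region: int                     # scalp region index (assumed precomputed)
    root_uv: tuple[float, float]    # (u, v) root position in scalp UV space
    tokens: list[int]               # per-strand tokens, e.g. UV + geometry tokens

def build_sequence(density_tokens: list[int], strands: list[Strand],
                   num_regions: int) -> list[int]:
    """Flatten global tokens, then per-region strand tokens, into one sequence."""
    seq = [BOS, *density_tokens]  # coarse global structure comes first
    for r in range(num_regions):
        members = [s for s in strands if s.region == r]
        if not members:
            continue  # skip empty regions entirely
        seq.append(REGION_BASE + r)
        # Deterministic order inside a region: sort by root v, then u.
        for s in sorted(members, key=lambda s: (s.root_uv[1], s.root_uv[0])):
            seq.extend(s.tokens)
    seq.append(EOS)
    return seq

strands = [
    Strand(1, (0.5, 0.2), [100, 101]),
    Strand(0, (0.3, 0.9), [200]),
    Strand(0, (0.1, 0.1), [300]),
]
seq = build_sequence([50, 51], strands, num_regions=3)
```

A fixed, deterministic ordering matters for autoregressive training: the same hairstyle always maps to the same sequence, so the model is not asked to predict an arbitrary permutation of strands.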

Pipeline Overview

The 3D hairstyle geometry is decomposed into a global density map, which is quantized by tokenizer Q_d, and local strand features. Each strand root is encoded into 2 UV tokens. The strand geometry is further decoupled into a coarse shape and a style residual, which are discretized into 4 tokens by tokenizers Q_c and Q_s. These geometric codes are assembled into a hierarchical sequence and processed by a decoder-only Transformer: the sequence is concatenated with the text and image condition embeddings, and the model autoregressively predicts the target hair tokens under a cross-entropy loss.
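The per-strand split described above can be sketched as two steps: quantize the root position into one token per UV axis, and separate a polyline into a smooth coarse shape plus a residual carrying the high-frequency style. Several details here are assumptions not given on this page: the UV bin count, defining "coarse" as a low-resolution resampling that is linearly re-interpolated, and the helper names. The learned quantizers Q_c and Q_s, which would turn each part into tokens (4 in total), are not reproduced.

```python
import numpy as np

UV_BINS = 256  # hypothetical number of bins per UV axis

def root_to_uv_tokens(u: float, v: float, bins: int = UV_BINS) -> tuple[int, int]:
    """Quantize a root position in [0, 1]^2 into two tokens, one per axis."""
    tu = min(int(u * bins), bins - 1)  # clamp u == 1.0 into the last bin
    tv = min(int(v * bins), bins - 1)
    return tu, tv

def split_coarse_style(strand: np.ndarray, n_coarse: int = 8):
    """Split a (P, 3) polyline into a smooth coarse shape and a style residual.

    Assumed definition: coarse = the strand resampled at n_coarse points and
    linearly re-interpolated back to P points; style = strand - coarse.
    """
    p = strand.shape[0]
    t = np.linspace(0.0, 1.0, p)         # parameter of the original points
    tc = np.linspace(0.0, 1.0, n_coarse)  # parameter of the coarse points
    coarse_pts = np.stack([np.interp(tc, t, strand[:, k]) for k in range(3)], axis=1)
    coarse = np.stack([np.interp(t, tc, coarse_pts[:, k]) for k in range(3)], axis=1)
    return coarse, strand - coarse

# A straight strand has no high-frequency style, so its residual is ~0.
z = np.linspace(0.0, 1.0, 32)
straight = np.stack([np.zeros(32), np.zeros(32), z], axis=1)
coarse, style = split_coarse_style(straight)

# A wavy strand keeps its waves in the residual.
wavy = straight + np.stack([0.05 * np.sin(40 * z), np.zeros(32), np.zeros(32)], axis=1)
coarse_w, style_w = split_coarse_style(wavy)
```

By construction, coarse + style reconstructs the strand exactly, so the decomposition loses nothing; only the subsequent quantization is lossy.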

Representative Results

Image-guided hairstyle synthesis comparison. HairGPT generates extremely high-frequency coils and complex hair topologies that follow the input image, particularly for buns and ponytails. We visualize both the raw guide strands output directly by our model and the dense strands produced by a simple interpolation algorithm; this upsampling step is used solely for visualization and is not the focus of this work.
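The page does not specify the interpolation algorithm used for dense strands. A common, simple choice (assumed here, not confirmed as the authors' method) is to give each dense root the inverse-distance-weighted average shape of its k nearest guide strands. The function name `interpolate_strands`, the value of `k`, and the assumption that all strands share the same point count P are hypothetical.

```python
import numpy as np

def interpolate_strands(guide_roots: np.ndarray, guide_strands: np.ndarray,
                        dense_roots: np.ndarray, k: int = 3,
                        eps: float = 1e-8) -> np.ndarray:
    """Synthesize dense strands by blending nearby guide strands.

    guide_roots:   (G, 3) guide root positions.
    guide_strands: (G, P, 3) guide polylines with point 0 at the root.
    dense_roots:   (D, 3) root positions for the new strands.
    returns:       (D, P, 3) interpolated polylines.
    """
    k = min(k, len(guide_roots))
    # Express each guide as an offset from its own root so shapes can be moved.
    offsets = guide_strands - guide_strands[:, :1, :]
    d = np.linalg.norm(dense_roots[:, None, :] - guide_roots[None, :, :], axis=-1)  # (D, G)
    nn = np.argsort(d, axis=1)[:, :k]                      # k nearest guides per root
    w = 1.0 / (np.take_along_axis(d, nn, axis=1) + eps)    # inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)
    blended = (w[:, :, None, None] * offsets[nn]).sum(axis=1)  # (D, P, 3)
    # Re-attach the blended shape at each dense root.
    return dense_roots[:, None, :] + blended

shape = np.stack([np.zeros(4), np.zeros(4), -np.linspace(0.0, 1.0, 4)], axis=1)
guide_roots = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
guide_strands = guide_roots[:, None, :] + shape[None, :, :]
dense = interpolate_strands(guide_roots, guide_strands, np.array([[0.5, 0.0, 0.0]]), k=2)
```

Because the blend happens in root-relative coordinates, two guides with the same shape produce that same shape at any dense root between them.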

Text-guided hairstyle synthesis comparison. HairGPT produces 3D hairstyles that adhere to fine-grained semantic instructions.

Cross-domain adaptation to stylized characters. Our framework adapts to 2D cartoon inputs via fine-tuning. It generates plausible 3D strand arrangements that faithfully respect the volume and flow of the original anime portraits.

Realistic Avatar Creation. Our model can work together with the 3D face synthesis model DreamFace to produce photorealistic avatars with a unified visual aesthetic.

Editing. Our dual-decoupled representation and vision-language model naturally support diverse editing applications driven by either image or text prompts.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant W2431046, the Central Guided Local Science and Technology Foundation of China under Grant YDZX20253100001001, the MoE Key Lab of Intelligent Perception and Human-Machine Collaboration (ShanghaiTech University), and the Shanghai Frontiers Science Center of Human-centered Artificial Intelligence. This work was also supported by the HPC Platform of ShanghaiTech University.

The authors would also like to thank Heng'an Zhou from ShanghaiTech University for his assistance with the supplementary video, and Zijun Zhao from Deemos Technology Co., Ltd. for helping process part of the raw hairstyle data.

@misc{luo2026hairgpt,
  title={HairGPT: A Unified Autoregressive Framework for 3D Realistic Hairstyle Synthesis},
  author={Luo, Haimin and Ouyang, Min and Xu, Lan and Yu, Jingyi},
  year={2026},
  note={Conditionally accepted by SIGGRAPH 2026 (journal track)}
}