We present a universal prior model for 3D head avatars with explicit hair compositionality.
Existing approaches to building generalizable priors for 3D head avatars often model the head holistically, treating the face and hair as an inseparable entity. This overlooks the inherent compositionality of the human head, making it difficult for the model to naturally disentangle face and hair representations, especially when the dataset is limited. Furthermore, such holistic models struggle to support applications like 3D face and hairstyle swapping in a flexible and controllable manner.
To address these challenges, we introduce a prior model that explicitly accounts for the compositionality of face and hair, learning their latent spaces separately. A key enabler of this approach is our synthetic hairless data creation pipeline, which removes hair from studio-captured datasets using estimated hairless geometry and texture derived from a diffusion prior. By leveraging a paired dataset of hair and hairless captures, we train disentangled prior models for face and hair, incorporating compositionality as an inductive bias to facilitate effective separation.
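At a high level, the pipeline segments hair, inpaints the exposed region under a diffusion prior, and fits hairless geometry and texture to the result, yielding a paired (hair, hairless) capture. The Python sketch below illustrates this flow only; segment_hair, inpaint_with_diffusion_prior, and fit_hairless_geometry are hypothetical placeholders with dummy logic, not our released implementation.

import numpy as np

def segment_hair(image):
    # Placeholder: a real pipeline would use a learned hair segmenter.
    return image.mean(axis=-1) > 0.8  # dummy binary hair mask

def inpaint_with_diffusion_prior(image, mask):
    # Placeholder: a real pipeline would inpaint the hair region with a
    # diffusion model conditioned on the visible face and scalp.
    out = image.copy()
    out[mask] = image[~mask].mean(axis=0)  # dummy fill with mean skin color
    return out

def fit_hairless_geometry(hairless_image):
    # Placeholder: fit head geometry and texture to the hairless image,
    # e.g., with a morphable head model.
    return {"verts": np.zeros((100, 3)), "texture": hairless_image}

def make_hairless_pair(capture):
    mask = segment_hair(capture)
    hairless = inpaint_with_diffusion_prior(capture, mask)
    return {"hair": capture, "hairless": hairless,
            "geometry": fit_hairless_geometry(hairless)}

pair = make_hairless_pair(np.random.rand(512, 512, 3).astype(np.float32))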
Our model's inherent compositionality enables seamless transfer of face and hair components between avatars while preserving identity. Additionally, we demonstrate that our model can be fine-tuned in a few-shot manner using monocular captures to create high-fidelity, hair-compositional 3D head avatars for unseen subjects. These capabilities highlight the practical applicability of our approach in real-world scenarios, paving the way for flexible and expressive 3D avatar generation.
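For intuition, few-shot personalization of this kind typically freezes the shared prior and fits only a per-subject conditioning against the monocular frames. The sketch below shows this generic pattern in PyTorch on dummy data; the single id_code and L1 photometric loss are simplifying assumptions, not our exact fine-tuning procedure.

import torch

# Hypothetical frozen prior: maps a subject code to rendered pixels.
prior = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 3 * 32 * 32))
for p in prior.parameters():
    p.requires_grad_(False)  # keep the universal prior fixed

id_code = torch.zeros(1, 64, requires_grad=True)  # per-subject code to fit
frames = torch.rand(8, 3 * 32 * 32)               # dummy monocular frames

opt = torch.optim.Adam([id_code], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    pred = prior(id_code).expand_as(frames)           # same toy prediction per frame
    loss = torch.nn.functional.l1_loss(pred, frames)  # photometric loss
    loss.backward()                                   # gradients flow to id_code only
    opt.step()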
HairCUP enables hair transfer across avatars by leveraging the compositionality of the model.
Top row: a driving expression and face identity (column 1) composed with three different hair identities (columns 2-4).
Bottom row: a driving expression and hair identity (column 1) transferred onto three different face identities (columns 2-4).
To learn disentangled hair and face priors, we leverage a paired dataset of studio captures and synthetic hairless images.
HairCUP comprises ID-conditioned face/hair hypernetworks and a compositional avatar model. The hypernetworks take UV-unwrapped mean albedo and geometry maps as input and generate multi-scale bias maps that are added to each layer of the face/hair Gaussian decoders. The compositional model consists of a hair motion encoder and a face expression encoder that produce motion and expression codes, which the decoders use to generate Gaussians. During training, face and hair data come from the same subject, with multi-view supervision and segmentation-based disentanglement. At test time, face/hair identity and expression inputs can be mixed across subjects.
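The PyTorch sketch below illustrates this overall structure with toy shapes; HyperNet, GaussianDecoder, and all dimensions are illustrative assumptions rather than the actual architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNet(nn.Module):
    # Maps UV-unwrapped identity maps (mean albedo + geometry, 6 channels)
    # to one additive bias map per decoder layer, at that layer's resolution.
    def __init__(self, ch=8, n_layers=3, base=16):
        super().__init__()
        self.res = [base * 2 ** i for i in range(n_layers)]  # 16, 32, 64
        self.heads = nn.ModuleList(nn.Conv2d(6, ch, 3, padding=1) for _ in range(n_layers))

    def forward(self, id_maps):  # id_maps: (B, 6, 64, 64)
        return [head(F.interpolate(id_maps, size=(r, r), mode="bilinear", align_corners=False))
                for head, r in zip(self.heads, self.res)]

class GaussianDecoder(nn.Module):
    # Decodes an expression/motion code into a UV map of Gaussian parameters;
    # identity enters only through the hypernetwork's additive bias maps.
    def __init__(self, code_dim=32, ch=8, n_layers=3, out_ch=11):
        super().__init__()
        self.ch = ch
        self.stem = nn.Linear(code_dim, ch * 16 * 16)
        self.layers = nn.ModuleList(nn.Conv2d(ch, ch, 3, padding=1) for _ in range(n_layers))
        self.out = nn.Conv2d(ch, out_ch, 1)  # e.g., position(3)+rotation(4)+scale(3)+opacity(1)

    def forward(self, code, biases):
        x = self.stem(code).view(-1, self.ch, 16, 16)
        for i, (layer, bias) in enumerate(zip(self.layers, biases)):
            x = torch.relu(layer(x) + bias)   # inject identity via additive bias
            if i < len(self.layers) - 1:      # upsample between layers (coarse to fine)
                x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.out(x)

# Separate face and hair branches; at test time the inputs can be mixed
# across subjects, e.g., subject A's face with subject B's hairstyle.
face_hyper, hair_hyper = HyperNet(), HyperNet()
face_dec, hair_dec = GaussianDecoder(), GaussianDecoder()

face_id_A = torch.rand(1, 6, 64, 64)  # subject A's face identity maps
hair_id_B = torch.rand(1, 6, 64, 64)  # subject B's hair identity maps
expr_code = torch.rand(1, 32)         # from the face expression encoder
motion_code = torch.rand(1, 32)       # from the hair motion encoder

face_gaussians = face_dec(expr_code, face_hyper(face_id_A))
hair_gaussians = hair_dec(motion_code, hair_hyper(hair_id_B))
# The union of the two Gaussian sets forms the composed avatar.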
@article{kim2025haircup,
  title={HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars},
  author={Kim, Byungjun and Saito, Shunsuke and Nam, Giljoo and Simon, Tomas and Saragih, Jason and Joo, Hanbyul and Li, Junxuan},
  journal={arXiv preprint arXiv:2507.19481},
  year={2025}
}