Foreground-Aware Stylization and Consensus Pseudo-Labeling for Domain Adaptation of First-Person Hand Segmentation


Takehiko Ohkawa1,2  Takuma Yagi1   Atsushi Hashimoto2   Yoshitaka Ushiku2   Yoichi Sato1

1The University of Tokyo   2OMRON SINIC X  

IEEE Access 2021





Abstract

Hand segmentation is a crucial task in first-person vision. Since the appearance of first-person images varies strongly across environments, a pre-trained segmentation model must be adapted to each new domain. We address the appearance gaps of hand regions and backgrounds separately, and propose (i) foreground-aware image stylization and (ii) consensus pseudo-labeling for domain adaptation of hand segmentation. We stylize source images independently for the foreground and background, using target images as the style reference. To resolve the domain shift that stylization alone cannot address, we apply careful pseudo-labeling by taking a consensus between the models trained on the source and stylized source images. We validated our method on domain adaptation of hand segmentation from both real and simulation images, achieving state-of-the-art performance in both settings. We also demonstrated promising results in the challenging multi-target domain adaptation and domain generalization settings.

Overview

Our semi-supervised domain adaptation method for hand segmentation consists of (i) foreground-aware image stylization and (ii) consensus pseudo-labeling. First, we stylize source images in the style of target images, separately for the foreground and background, which effectively alleviates the appearance gaps of first-person images. This yields a style-adapted dataset that pairs target styles with source labels. Next, we prepare two networks for hand segmentation: a reference model R, trained on the source dataset, and a segmentation model M, trained on the style-adapted dataset. To adapt the segmentation model M to the target domain, we generate target pseudo-labels with a consensus scheme between the two networks. Given the two predictions on the same target instance, we take their intersection and accept it as a pseudo-label when the mIoU between the two predictions surpasses a certain threshold. The agreed pseudo-labels are then used to update the segmentation model M.
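
Below is a minimal sketch of the foreground-aware stylization step, assuming (3, H, W) float image tensors and boolean (H, W) hand masks with non-empty foreground and background regions. The AdaIN-like statistic matching in transfer_region, and the names transfer_region and foreground_aware_stylize, are illustrative stand-ins for the stylization network used in the paper, not its implementation.

```python
import torch

def masked_stats(img: torch.Tensor, mask: torch.Tensor):
    """Channel-wise mean and std of a (3, H, W) image over a boolean (H, W) region."""
    pix = img[:, mask]                      # (3, N) pixels inside the region
    return pix.mean(dim=1), pix.std(dim=1)  # each of shape (3,)

def transfer_region(content, c_mask, style, s_mask):
    """Match the colour statistics of one region of `content` to the
    corresponding region of `style` (an AdaIN-like stand-in for the
    stylization network)."""
    c_mean, c_std = masked_stats(content, c_mask)
    s_mean, s_std = masked_stats(style, s_mask)
    view = lambda v: v.view(3, 1, 1)        # broadcast (3,) stats over (3, H, W)
    normed = (content - view(c_mean)) / view(c_std + 1e-5)
    return normed * view(s_std) + view(s_mean)

def foreground_aware_stylize(src_img, src_mask, tgt_img, tgt_mask):
    """Stylize the hand (foreground) and background of a source image
    independently, using the target image as style, then recompose
    with the source hand mask."""
    fg = transfer_region(src_img, src_mask, tgt_img, tgt_mask)
    bg = transfer_region(src_img, ~src_mask, tgt_img, ~tgt_mask)
    return torch.where(src_mask, fg, bg)
```

Source hand masks come from the source labels; the stylized image inherits the source label, which is what makes the style-adapted dataset possible.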
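
The consensus check itself is simple. A sketch under the same assumptions, with boolean hand masks and a hypothetical agreement threshold tau (not the paper's value):

```python
import torch

def consensus_pseudo_label(pred_r: torch.Tensor, pred_m: torch.Tensor,
                           tau: float = 0.7):
    """Return the intersection of the two predictions as a pseudo-label
    when R and M agree well enough, otherwise None.

    pred_r, pred_m: boolean (H, W) hand masks predicted on the same
    target image by the reference model R and the segmentation model M.
    tau is a hypothetical agreement threshold, not the paper's value.
    """
    inter = (pred_r & pred_m).sum().item()
    union = (pred_r | pred_m).sum().item()
    iou = inter / union if union > 0 else 0.0
    if iou < tau:
        return None          # the two models disagree: skip this image
    return pred_r & pred_m   # the agreed region becomes the pseudo-label
```

Accepted pseudo-labels are then treated as ordinary supervision when fine-tuning the segmentation model M on target images.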

Results

We validated our approach in both real-to-real and sim-to-real adaptation settings, where our method significantly outperformed comparison methods.
Source datasets: EGTEA, Ego2Hands, ObMan-Ego
Target datasets: GTEA, EDSH, UTG, YHG
Evaluation metric: mIoU
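
For reference, a sketch of the evaluation metric, assuming mIoU here denotes the hand-class IoU averaged over target images (check the paper for the exact protocol):

```python
def mean_iou(preds, gts):
    """Hand-class IoU averaged over a set of target images.

    preds, gts: iterables of boolean (H, W) masks (predictions and
    ground truth). Empty-vs-empty pairs are scored as IoU = 1.
    """
    ious = []
    for p, g in zip(preds, gts):
        union = (p | g).sum().item()
        inter = (p & g).sum().item()
        ious.append(inter / union if union > 0 else 1.0)
    return sum(ious) / len(ious)
```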


© Takehiko Ohkawa 2021
