This document proposes three methods to improve semantic segmentation using self-supervised depth estimation from unlabeled image sequences:
1. It transfers knowledge from features learned during self-supervised depth estimation to semantic segmentation through multi-task learning.
2. It introduces a new data augmentation technique called DepthMix, which blends images and labels according to the scene geometry obtained from depth estimation, generating fewer artifacts than prior methods.
3. It proposes an automatic data selection method that chooses the most useful unlabeled samples for annotation. Selection is driven by diversity and uncertainty criteria evaluated with depth estimation as a proxy task, so no human annotation is needed inside the active learning loop.
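
To make the second point more concrete, the following is a minimal sketch of a depth-guided mixing step, not the authors' implementation. It assumes the mixing rule is a per-pixel mask that keeps a pixel from image A wherever A's estimated depth is smaller (closer to the camera) than B's, and takes B's pixel otherwise. The function name `depth_mix`, the tolerance `eps`, and the toy arrays are hypothetical and chosen only for illustration.

```python
import numpy as np


def depth_mix(img_a, img_b, lab_a, lab_b, depth_a, depth_b, eps=0.0):
    """Blend two samples using their depth maps (illustrative sketch).

    Assumed rule: a pixel comes from sample A where A is closer to the
    camera than B (depth_a < depth_b + eps), otherwise from sample B.
    Shapes: images (H, W, C), labels (H, W), depths (H, W).
    """
    mask = depth_a < depth_b + eps                        # (H, W) boolean mask
    mixed_img = np.where(mask[..., None], img_a, img_b)   # broadcast over channels
    mixed_lab = np.where(mask, lab_a, lab_b)              # labels follow the same mask
    return mixed_img, mixed_lab, mask


# Toy 2x2 example: in the left column A is closer, in the right column B is closer.
img_a = np.full((2, 2, 3), 10, dtype=np.uint8)
img_b = np.full((2, 2, 3), 200, dtype=np.uint8)
lab_a = np.full((2, 2), 1, dtype=np.int64)
lab_b = np.full((2, 2), 2, dtype=np.int64)
depth_a = np.array([[1.0, 5.0], [1.0, 5.0]])
depth_b = np.array([[3.0, 3.0], [3.0, 3.0]])

mixed_img, mixed_lab, mask = depth_mix(img_a, img_b, lab_a, lab_b, depth_a, depth_b)
print(mixed_lab)
```

Because the same mask is applied to images and labels, the mixed label map stays aligned with the mixed image. Occluding nearer content over farther content is what would plausibly reduce the geometric artifacts of mixing with arbitrary masks.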