Abstract
Cats and dogs, humanity's favoured domestic pets, occupy a large portion of the internet and of our
digital lives. However, while augmented reality technology is becoming pervasive for humans, it has
so far largely left our beloved pets out of the picture due to limited enabling technology. While there
are well-established learning frameworks for human pose estimation, they mostly rely on large datasets of
hand-labelled images, such as Microsoft's COCO (Lin et al., 2014) or Facebook's DensePose (Güler et al.,
2018). Labelling large datasets is time-consuming and expensive, and manually labelling 3D information is
difficult to do consistently. Our solution to these problems is to synthesize highly varied datasets of
animals, together with their corresponding 3D information such as pose. To generalize to various animals and
breeds, as well as to the real-world domain, we leverage domain randomization over traditional dimensions
(background, colour variations and image transforms), together with novel procedural appearance
variations in breed, age and species. We evaluate the validity of our approach on various benchmarks, and
produce several 3D graphical augmentations of real-world cats and dogs using our fully synthetic training pipeline.
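To make the "traditional dimensions" of randomization concrete, the following is a minimal sketch in Python with Pillow, assuming the renderer provides a foreground mask for each synthetic frame; the function name, argument names and jitter ranges are illustrative assumptions, not taken from the paper. The procedural breed, age and species variation happens at render time and is not shown here.

import random
from PIL import Image, ImageEnhance

def randomize_frame(render, mask, backgrounds):
    """Composite a synthetic render over a random background and jitter its appearance.

    render: RGB(A) render of the animal; mask: its foreground mask (PIL, mode "L");
    backgrounds: list of candidate background images. All names are illustrative.
    """
    # 1. Background randomization: paste the rendered animal onto a random background.
    bg = random.choice(backgrounds).resize(render.size).convert("RGB")
    frame = Image.composite(render.convert("RGB"), bg, mask)

    # 2. Colour variation: random brightness / contrast / saturation jitter.
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Contrast, ImageEnhance.Color):
        frame = enhancer(frame).enhance(random.uniform(0.7, 1.3))

    # 3. Image transforms: small random rotation and an occasional horizontal flip.
    #    (Any geometric transform must also be applied to the 2D/3D pose labels.)
    frame = frame.rotate(random.uniform(-15, 15), resample=Image.BILINEAR)
    if random.random() < 0.5:
        frame = frame.transpose(Image.FLIP_LEFT_RIGHT)
    return frame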