Artificial Dummies for Urban Dataset Augmentation
A. Vobecký
D. Hurych
M. Uřičář
P. Pérez
J. Sivic
Augmenting training data with our DummyNet approach. The inputs are the desired pose (keypoints), desired pedestrian appearance (specified by an image), and the target background scene. The output is a scene with the composited pedestrian. The DummyNet data generator (1) augments the training data (2), which leads to improved person detector performance (3). The data generator box (1) displays the inference-time setup.


Existing datasets for training pedestrian detectors in images suffer from limited appearance and pose variation. The most challenging scenarios are rarely included because they are too difficult to capture for safety reasons or are very unlikely to happen. The strict safety requirements of assisted and autonomous driving applications call for very high detection accuracy in these rare situations as well. The ability to generate images of people in arbitrary poses, with arbitrary appearances, embedded in different background scenes with varying illumination and weather conditions, is a crucial component for the development and testing of such applications. The contributions of this paper are three-fold. First, we describe an augmentation method for controlled synthesis of urban scenes containing people, thus producing rare or never-seen situations. This is achieved with a data generator (called DummyNet) with disentangled control of the pose, the appearance, and the target background scene. Second, the proposed generator relies on a novel network architecture and an associated loss that take into account the segmentation of the foreground person and its composition into the background scene. Finally, we demonstrate that the data generated by DummyNet improve the performance of several existing person detectors across various datasets as well as in challenging situations, such as night-time conditions, where only a limited amount of training data is available. In the setup with only day-time data available, we improve the night-time detector by 17% in log-average miss rate over the detector trained with the day-time data only.
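To make the idea of composing a segmented foreground person into a background scene concrete, here is a minimal sketch of mask-based alpha compositing in NumPy. It is not the DummyNet generator or its loss: the function name `composite_person`, its arguments, and the 0.5 mask threshold used for the bounding box are assumptions made for illustration only.

```python
import numpy as np


def composite_person(background, person, mask, top, left):
    """Alpha-blend a person crop into a background scene (illustrative only).

    background: (H, W, 3) float array with values in [0, 1]
    person:     (h, w, 3) float array with values in [0, 1]
    mask:       (h, w) float array in [0, 1], where 1 marks person pixels
    top, left:  position of the crop's upper-left corner in the background
    Returns the composited scene and a bounding box (x0, y0, x1, y1),
    or None for the box when the mask contains no person pixels.
    """
    h, w = mask.shape
    H, W = background.shape[:2]
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("person crop must lie fully inside the background")

    out = background.copy()
    alpha = mask[..., None]  # broadcast the mask over the colour channels
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * person + (1.0 - alpha) * region

    # Derive a tight box from the mask so the composite can serve as a
    # labelled training sample for a detector (threshold is an assumption).
    ys, xs = np.nonzero(mask > 0.5)
    if ys.size == 0:
        return out, None
    box = (int(left + xs.min()), int(top + ys.min()),
           int(left + xs.max()) + 1, int(top + ys.max()) + 1)
    return out, box


# Toy example: a white 4x2 "person" whose top row is masked out.
scene = np.zeros((8, 8, 3))
person = np.ones((4, 2, 3))
mask = np.ones((4, 2))
mask[0, :] = 0.0
composited, box = composite_person(scene, person, mask, top=2, left=3)
```

In the paper, the segmentation and composition are accounted for inside the generator's architecture and loss; the sketch only shows the final blending step and how a detection label could be derived from the mask.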

Explanatory videos

One-minute Introduction

Twenty-minute Talk


Diagram of the data augmentation procedure (inference). Green rectangles mark the inputs to the generator. They include an automatic keypoint generator (a PCA model built from the keypoints of training skeletons). The appearance may be sampled from a latent input or given by an existing person image.
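The keypoint generator mentioned in the caption can be illustrated with a small PCA sketch in NumPy. The caption only states that a PCA model of training skeleton keypoints is used; the number of components, the Gaussian sampling of coefficients, the 17-keypoint layout, and the names `fit_pose_pca` and `sample_pose` are assumptions for this example.

```python
import numpy as np


def fit_pose_pca(skeletons, n_components):
    """Fit a PCA model to skeletons of shape (N, K, 2).

    Returns the mean pose (flattened), the principal components
    (n_components, 2K), and the standard deviation along each component.
    """
    n = skeletons.shape[0]
    flat = skeletons.reshape(n, -1)
    mean = flat.mean(axis=0)
    # SVD of the centred data yields the principal directions in vt.
    _, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
    components = vt[:n_components]
    stddev = s[:n_components] / np.sqrt(max(n - 1, 1))
    return mean, components, stddev


def sample_pose(mean, components, stddev, rng, scale=1.0):
    """Draw a new pose by sampling Gaussian coefficients in PCA space."""
    coeffs = rng.normal(size=stddev.shape) * stddev * scale
    return (mean + coeffs @ components).reshape(-1, 2)


# Toy data standing in for training skeletons: 50 poses, 17 keypoints each.
rng = np.random.default_rng(0)
train_skeletons = rng.normal(size=(50, 17, 2))
mean, components, stddev = fit_pose_pca(train_skeletons, n_components=5)
new_pose = sample_pose(mean, components, stddev, rng)
```

Setting `scale=0.0` returns the mean pose, while larger values produce poses further from the training average, which is one simple way to steer a generator toward rarer poses.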


Paper and Supplementary Material

A. Vobecký, D. Hurych, M. Uřičář, P. Pérez, J. Sivic.
Artificial Dummies for Urban Dataset Augmentation
In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2021.
(hosted on arXiv)

@inproceedings{vobecky2021artificial,
    title={Artificial Dummies for Urban Dataset Augmentation},
    author={Vobeck{\'y}, Anton{\'i}n and Hurych, David and U{\v{r}}i{\v{c}}{\'a}{\v{r}}, Michal and P{\'e}rez, Patrick and Sivic, Josef},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    year={2021}
}


This work was supported by Valeo, the Grant Agency of the Czech Technical University in Prague, grant No. SGS18/205/OHK3/3T/37, and the European Regional Development Fund under the project IMPACT (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000468).