Aim: generate realistic-looking synthetic data for training 3D Human Pose Estimation methods. Instead of rendering 3D models, the authors combine parts of real images.
Input: RGB images with 2D annotations + a query 3D pose.
Output: a synthetic image stitched from patches of the input images, such that it depicts a person in the query 3D pose.
Steps:
- Project the 3D pose onto a random virtual camera to get 2D joint coordinates (see the projection sketch after this list).
- For each joint, find an image in the 2D-annotated dataset whose annotation is locally similar around that joint.
- Based on these similarities, decide for each pixel which of the retrieved images is most relevant.
- For each pixel, take the histogram of the chosen images over a local neighborhood and use it as blending weights to composite the result (see the blending sketch after this list).
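The projection step amounts to placing the 3D skeleton in front of a virtual camera with a random orientation and applying a pinhole projection. A minimal numpy sketch, assuming a root-centred pose in metres; the function name and the focal length, camera distance, and image size are illustrative assumptions, not values from the paper:

```python
import numpy as np

def project_pose(joints_3d, f=1000.0, img_size=(256, 256), distance=4.0, rng=None):
    """Project a (J, 3) root-centred pose onto a virtual camera with a random
    yaw angle; returns (J, 2) pixel coordinates. Hypothetical helper."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi)
    # Random rotation about the vertical axis as a simple stand-in for a random camera.
    R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
    cam = joints_3d @ R.T
    cam[:, 2] += distance                      # push the pose in front of the camera
    uv = f * cam[:, :2] / cam[:, 2:3]          # pinhole projection
    return uv + np.array(img_size) / 2.0       # principal point at the image centre
```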
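The remaining steps (per-joint retrieval, per-pixel image selection, local-histogram blending) can be sketched as below. This is a rough illustration under my own assumptions: `local_similarity`, `stitch`, the `dataset` layout, and the premise that each retrieved image is already registered to the query 2D pose are simplifications, not the paper's exact formulation.

```python
import numpy as np

def local_similarity(query_2d, cand_2d, joint, kin_neighbors):
    """Similarity of a candidate 2D annotation to the query around one joint:
    negative mean distance over the joint and its kinematic neighbours
    (one possible reading of 'locally similar')."""
    idx = [joint] + kin_neighbors[joint]
    return -np.linalg.norm(query_2d[idx] - cand_2d[idx], axis=1).mean()

def stitch(query_2d, dataset, kin_neighbors, H, W, win=7):
    """Hypothetical mosaicing sketch: pick one source image per joint, label each
    pixel with its nearest joint, then blend pixels using the local histogram of
    labels as weights. Assumes dataset[i] = {'pose_2d': (J, 2), 'image': (H, W, 3)}
    with images already aligned to the query pose."""
    J = query_2d.shape[0]
    # Per-joint retrieval of the most locally similar annotated image.
    chosen = [max(range(len(dataset)),
                  key=lambda i: local_similarity(query_2d, dataset[i]['pose_2d'],
                                                 j, kin_neighbors))
              for j in range(J)]
    # Per-pixel label = joint (hence source image) closest to that pixel.
    ys, xs = np.mgrid[0:H, 0:W]
    dist = np.stack([(xs - query_2d[j, 0]) ** 2 + (ys - query_2d[j, 1]) ** 2
                     for j in range(J)])
    label = dist.argmin(axis=0)                       # (H, W) map of winning joints
    # Local histogram of labels -> per-pixel blending weights.
    pad = win // 2
    padded = np.pad(label, pad, mode='edge')
    out = np.zeros((H, W, 3))
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + win, x:x + win]
            hist = np.bincount(patch.ravel(), minlength=J) / patch.size
            for j in np.nonzero(hist)[0]:
                out[y, x] += hist[j] * dataset[chosen[j]]['image'][y, x]
    return out
```

With the per-pixel Python loop this is only practical for small images; it is meant to show the data flow, not to be efficient.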
They also present a pose estimation method trained on this synthetic dataset.