Objective: Transfer visual attributes (color, tone, texture, style, etc.) between two semantically related images, such as a photograph and a sketch.
Inner workings:
Image analogy
An image analogy A:A′::B:B′ is a relation where:
- B′ relates to B in the same way as A′ relates to A
- A and A′ are in pixel-wise correspondence
- B and B′ are in pixel-wise correspondence
In this paper, only a source image A and an example image B′ are given; A′ and B are latent images to be estimated.
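In equation form (a sketch with illustrative notation, not the paper's exact symbols), the two latents are tied to the two given images through a pair of dense pixel mappings, the nearest-neighbor fields discussed below:

```latex
% \phi_{a \to b} and \phi_{b \to a} denote dense pixel mappings (NNFs);
% illustrative notation, not the paper's exact symbols.
A'(p) = B'\!\left(\phi_{a \to b}(p)\right), \qquad B(p) = A\!\left(\phi_{b \to a}(p)\right)
```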

Dense correspondence
To find dense correspondences between the two images, they use features from a pre-trained CNN (VGG-19), keeping the feature maps at the ReLU layers.
The mapping is decomposed into two simpler sub-mappings: first a visual attribute transformation, then a spatial transformation.
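A minimal sketch of the feature extraction step, assuming PyTorch/torchvision; the choice of the relu{1..5}_1 activations is my assumption, the notes only say ReLU features of a pre-trained VGG-19:

```python
import torch
import torchvision.models as models

# Indices of the relu{1..5}_1 activations inside torchvision's
# vgg19().features (layer choice is an assumption of this sketch).
RELU_IDS = {1: "relu1_1", 6: "relu2_1", 11: "relu3_1",
            20: "relu4_1", 29: "relu5_1"}

def extract_features(image):
    """image: (1, 3, H, W) ImageNet-normalized tensor.
    Returns {layer_name: feature map}; deeper layers are coarser
    because of the intervening max-pooling."""
    vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
    feats, x = {}, image
    with torch.no_grad():
        for idx, layer in enumerate(vgg):
            x = layer(x)
            if idx in RELU_IDS:
                feats[RELU_IDS[idx]] = x
            if idx == max(RELU_IDS):
                break
    return feats
```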

Architecture:
The algorithm proceeds as follows (a simplified sketch is given after the list):
- Compute features at each layer for both input images using the pre-trained CNN, and initialize the feature maps of the latent images at the coarsest layer with those of the inputs.
- At the current layer, compute a forward and a reverse nearest-neighbor field (NNF, essentially an offset field mapping each pixel to its best match in the other image).
- Use these NNFs together with the input features at the current layer to reconstruct the feature maps of the latent images.
- Upsample the NNFs and use them to initialize the NNF search at the next (finer) layer.
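A minimal, self-contained NumPy sketch of this coarse-to-fine loop, under heavy simplifications that are mine rather than the authors': a per-pixel exhaustive search stands in for the patch-based randomized PatchMatch, a fixed blend weight `alpha` replaces the paper's layer-dependent weighting, nearest upsampling replaces deconvolution-based reconstruction, and both images are assumed to share the same resolution at every layer:

```python
import numpy as np

def nnf_search(f1, g1, f2, g2):
    """Brute-force forward NNF: for each pixel p pick q minimizing
    ||f1(p) - f2(q)||^2 + ||g1(p) - g2(q)||^2, a per-pixel stand-in
    for the patch-based PatchMatch objective. O((HW)^2) memory, so
    only usable on tiny feature maps."""
    C = f1.shape[0]
    a = np.concatenate([f1, g1]).reshape(2 * C, -1).T   # (HW, 2C)
    b = np.concatenate([f2, g2]).reshape(2 * C, -1).T
    dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return dist.argmin(axis=1)                          # flat q per p

def warp(feat, field):
    """Pull features through a flat NNF: out(p) = feat(field(p))."""
    C, H, W = feat.shape
    return feat.reshape(C, -1)[:, field].reshape(C, H, W)

def upsample2(field, H, W):
    """Nearest 2x upsampling of a flat NNF defined on an (H, W) grid."""
    ys, xs = np.divmod(field.reshape(H, W), W)
    ys = np.repeat(np.repeat(2 * ys, 2, axis=0), 2, axis=1)
    xs = np.repeat(np.repeat(2 * xs, 2, axis=0), 2, axis=1)
    return (ys * (2 * W) + xs).ravel()

def deep_analogy(feats_A, feats_Bp, alpha=0.8):
    """feats_*: lists of (C, H, W) arrays, coarsest layer first, each
    layer doubling the resolution of the previous one. Returns the
    forward (A -> B') and reverse (B' -> A) NNFs at the finest layer."""
    F_Ap, F_B = feats_A[0], feats_Bp[0]   # latents start as the inputs
    for i, (fa, fbp) in enumerate(zip(feats_A, feats_Bp)):
        fwd = nnf_search(fa, F_Ap, F_B, fbp)    # A  -> B'
        rev = nnf_search(F_B, fbp, fa, F_Ap)    # B' -> A
        if i + 1 == len(feats_A):
            return fwd, rev
        H, W = fa.shape[1:]
        # The real algorithm uses the upsampled NNFs to seed PatchMatch
        # at the finer layer; here they only drive the feature warps.
        fwd, rev = upsample2(fwd, H, W), upsample2(rev, H, W)
        fa2, fbp2 = feats_A[i + 1], feats_Bp[i + 1]
        # Latents at the finer layer: blend each input's own features
        # with the other image's features warped through the NNF.
        F_Ap = alpha * fa2 + (1 - alpha) * warp(fbp2, fwd)
        F_B = alpha * fbp2 + (1 - alpha) * warp(fa2, rev)
```

With the `extract_features` sketch above, `feats_A` could be built by taking its outputs in reverse order (relu5_1 down to relu1_1), squeezing the batch dimension, and converting to NumPy.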

Results:
Impressive quality on all types of visual transfer, but very slow! (~3 min per image on a GPU).
