ShortScience.org - Making Science Accessible!

Welcome to ShortScience.org!

Mask R-CNN
He, Kaiming and Gkioxari, Georgia and Dollár, Piotr and Girshick, Ross B.
arXiv e-Print archive - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Léo Paillier 7 years ago

_Objective:_ Image segmentation and pose estimation with an extension of Faster R-CNN.

_Dataset:_ [COCO](http://mscoco.org/) and [Cityscapes](https://www.cityscapes-dataset.com/).


## Inner workings:

The core operator of Faster R-CNN is the _RoIPool_ which performs coarse spatial quantization for feature extraction but introduce misalignment for pixel-pixel comparison which is what segmentation is. The paper introduce a new layer _RoIAlign_ that faithfully preserves exact spatial location.

One important point is that mask and class prediction are decoupled, the segmentation is proposed for each class without competing and the class predictor finally elects the winner.

## Architecture:

Based on Faster R-CNN but with an added mask subnetwork that computes a segmentation mask for each class.

Different feature extractors and proposers are tried, see two examples below:  
[![screen shot 2017-05-22 at 7 25 04 pm](https://cloud.githubusercontent.com/assets/17261080/26320765/659bfd6e-3f24-11e7-9184-393e83e9108d.png)](https://cloud.githubusercontent.com/assets/17261080/26320765/659bfd6e-3f24-11e7-9184-393e83e9108d.png)

## Results:

Runs at about 200ms per frame on a GPU for segmentation (2 days training on a single 8-GPU) and 5 fps for pose estimation.  
Very impressive segmentation and pose estimation:

[![screen shot 2017-05-22 at 7 26 57 pm 1](https://cloud.githubusercontent.com/assets/17261080/26320824/a9a0909c-3f24-11e7-8e06-b2f132aad2d7.png)](https://cloud.githubusercontent.com/assets/17261080/26320824/a9a0909c-3f24-11e7-8e06-b2f132aad2d7.png)

[![screen shot 2017-05-22 at 7 29 26 pm](https://cloud.githubusercontent.com/assets/17261080/26320929/08b71c4a-3f25-11e7-8eb5-959ceb7b6112.png)](https://cloud.githubusercontent.com/assets/17261080/26320929/08b71c4a-3f25-11e7-8eb5-959ceb7b6112.png)

arxiv.org
arxiv-vanity.com
scholar.google.com

BEGAN: Boundary Equilibrium Generative Adversarial Networks
David Berthelot and Thomas Schumm and Luke Metz
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.LG, stat.ML
more

[link] Summary by Léo Paillier 7 years ago

_Objective:_ Improve GANs convergence to more diverse and visually pleasing images at higher resolution using a novel equilibrium method between the discriminator and the generator that also simplifies training procedures.

_Dataset:_ [LFW](http://vis-www.cs.umass.edu/lfw/)

## Inner workings:

They try to match the distribution of the errors (assumed to be normally distributed) instead of matching the distribution of the samples directly. In order to do this they compute the Wasserstein distance between a pixel-wise autoencoder loss distributions of real and generated samples defined as follow:

1.  Autoencoder loss:

[![screen shot 2017-04-24 at 3 46 32 pm](https://cloud.githubusercontent.com/assets/17261080/25340190/429f9788-2905-11e7-88dc-b44567b9cd34.png)](https://cloud.githubusercontent.com/assets/17261080/25340190/429f9788-2905-11e7-88dc-b44567b9cd34.png)

2.  Wasserstein distance for two normal distributions μ1 = N(m1, C1) and μ2 = N(m2, C2)

[![screen shot 2017-04-24 at 3 46 44 pm](https://cloud.githubusercontent.com/assets/17261080/25340191/42b23474-2905-11e7-9810-58d5326bf886.png)](https://cloud.githubusercontent.com/assets/17261080/25340191/42b23474-2905-11e7-9810-58d5326bf886.png)

They also introduce an equilibrium concept to account for the situation when `G` and `D` are not well balanced and the discriminator `D` wins easily. This is controlled by what they call the diversity ratio that balances between auto-encoding real images and discriminating real from generated images. It is defined as follow:  
[![screen shot 2017-04-24 at 3 56 29 pm](https://cloud.githubusercontent.com/assets/17261080/25340609/992c2188-2906-11e7-8c51-498bbd293119.png)](https://cloud.githubusercontent.com/assets/17261080/25340609/992c2188-2906-11e7-8c51-498bbd293119.png)

To maintain this balance they use a standard SGD but they introduce a variable `kt` initially 0 to control how much emphasis is put on the generator `G`. This removes the need to do `x` steps on `D` followed by `y` steps on `G` or to pretrained one of the two.  
[![screen shot 2017-04-24 at 3 59 57 pm](https://cloud.githubusercontent.com/assets/17261080/25340859/4ee06476-2907-11e7-971f-90421449cb51.png)](https://cloud.githubusercontent.com/assets/17261080/25340859/4ee06476-2907-11e7-971f-90421449cb51.png)

Finally they derive a global convergence measure by using the equilibrium concept that can be used to determine when the network has reached its final state or if the model has collapsed:  
[![screen shot 2017-04-24 at 4 04 12 pm](https://cloud.githubusercontent.com/assets/17261080/25340998/b8bf6ad6-2907-11e7-8afa-294cae32c6af.png)](https://cloud.githubusercontent.com/assets/17261080/25340998/b8bf6ad6-2907-11e7-8afa-294cae32c6af.png)

## Architecture:

They tried to keep the architecture simple to really study the impact of their new equilibrium principle and loss. They don't use batch normalization, dropout, transpose convolutions or exponential growth for convolution filters.

[![screen shot 2017-04-24 at 4 09 29 pm](https://cloud.githubusercontent.com/assets/17261080/25341219/6fb7be28-2908-11e7-8774-287c1b7d7684.png)](https://cloud.githubusercontent.com/assets/17261080/25341219/6fb7be28-2908-11e7-8774-287c1b7d7684.png)

## Results:

They trained on images from 32x32 to 256x256, but at higher resolution images tend to lose sharpness. Nevertheless images are very very good!  
[![screen shot 2017-04-24 at 4 20 30 pm](https://cloud.githubusercontent.com/assets/17261080/25341699/f99b0770-2909-11e7-84a0-3ac0436771e5.png)](https://cloud.githubusercontent.com/assets/17261080/25341699/f99b0770-2909-11e7-84a0-3ac0436771e5.png)

arxiv.org
scholar.google.com

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A.
arXiv e-Print archive - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Léo Paillier 7 years ago

_Objective:_ Image-to-image translation to perform visual attribute transfer using unpaired images.

_Dataset:_ [Cityscapes](https://www.cityscapes-dataset.com/), [CMP Facade](http://cmp.felk.cvut.cz/%7Etylecr1/facade/), [UT Zappos50k](http://vision.cs.utexas.edu/projects/finegrained/utzap50k/) and [ImageNet](http://www.image-net.org/).

_Code:_ [CycleGAN](https://github.com/junyanz/CycleGAN)

## Inner-workings:

Basically two GANs for each domain with their respective Generator and Discriminator plus two additional losses (called consistency losses) to make sure that translating to the other domain then back yields an image that is still realistic.  
[![screen shot 2017-06-02 at 10 24 45 am](https://cloud.githubusercontent.com/assets/17261080/26717449/bcd8a9cc-477d-11e7-9137-fd277a0ec04f.png)](https://cloud.githubusercontent.com/assets/17261080/26717449/bcd8a9cc-477d-11e7-9137-fd277a0ec04f.png)

For the consistency los they use a pixel-wise L1 norm:  
[![screen shot 2017-06-02 at 10 31 22 am](https://cloud.githubusercontent.com/assets/17261080/26717733/bc088cdc-477e-11e7-96af-2defa06a1660.png)](https://cloud.githubusercontent.com/assets/17261080/26717733/bc088cdc-477e-11e7-96af-2defa06a1660.png)

## Architecture:

Based on [Perceptual losses for real-time style transfer and super-resolution](https://arxiv.org/pdf/1603.08155.pdf), code available [here](https://github.com/jcjohnson/fast-neural-style).  
Training seems to employ several tricks and then even use a batch of 1.

## Results:

Very impressive and the really key point is that you don't need paired images which makes this trainable on any domain with the same representation behind.  
[![screen shot 2017-06-02 at 10 26 29 am](https://cloud.githubusercontent.com/assets/17261080/26717502/f6d1fb7e-477d-11e7-8174-7bdd621cf1b6.png)](https://cloud.githubusercontent.com/assets/17261080/26717502/f6d1fb7e-477d-11e7-8174-7bdd621cf1b6.png)

arxiv.org
arxiv-vanity.com
scholar.google.com

Understanding deep learning requires rethinking generalization
Chiyuan Zhang and Samy Bengio and Moritz Hardt and Benjamin Recht and Oriol Vinyals
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.LG
more

[link] Summary by Léo Paillier 7 years ago

_Objective:_ Theoretical study of Deep Neural Network, their expressivity and regularizations.

## Results:

The key findings of the article are:

### A. Deep neural networks easily fit random labels.

Both when randomizing labels, replacing images with raw noise or all situations in-between.

1.  The effective capacity of neural networks is sufficient for memorizing the entire data set.
2.  Even optimization on random labels remains easy. In fact, training time increases only by a small constant factor compared with training on the true labels.
3.  Randomizing labels is solely a data transformation, leaving all other properties of the learning problem unchanged.

### B. Explicit regularization may improve generalization performance, but is neither necessary nor by itself sufficient for controlling generalization error.

By explicit regularization they mean batch normalisation, weight decay, dropout, data augmentation, etc.

### C. Generically large neural networks can express any labeling of the training data.

More formally, a very simple two-layer ReLU network with `p = 2n + d` parameters can express any labeling of any sample of size `n` in `d` dimensions.

### D. The optimization algorithm itself is implicitly regularizing the solution.

SGD acts as an implicit regularizer and properties are inherited by models that were trained using SGD.

doi.ieeecomputersociety.org
sci-hub
scholar.google.com

Adversarial Discriminative Domain Adaptation
Tzeng, Eric and Hoffman, Judy and Saenko, Kate and Darrell, Trevor
Conference and Computer Vision and Pattern Recognition - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Léo Paillier 7 years ago

_Objective:_ Define a framework for Adversarial Domain Adaptation and propose a new architecture as state-of-the-art.

 _Dataset:_ MNIST, USPS, SVHN and NYUD.   

## Inner workings:

Subsumes previous work in a generalized framework where designing a new method is now simplified to the space of making three design choices:

*   whether to use a generative or discriminative base model.
*   whether to tie or untie the weights.
*   which adversarial learning objective to use.

[![screen shot 2017-04-18 at 5 10 01 pm](https://cloud.githubusercontent.com/assets/17261080/25138167/15d5e644-245a-11e7-9fb8-636ce4111036.png)](https://cloud.githubusercontent.com/assets/17261080/25138167/15d5e644-245a-11e7-9fb8-636ce4111036.png)

## Architecture:

[![screen shot 2017-04-18 at 5 14 44 pm](https://cloud.githubusercontent.com/assets/17261080/25138526/07848bd0-245b-11e7-94c9-f6ae7ccea76f.png)](https://cloud.githubusercontent.com/assets/17261080/25138526/07848bd0-245b-11e7-94c9-f6ae7ccea76f.png)

## Results:

Interesting as the theoretical framework seem to converge with other papers and their architecture improves on previous papers performance even if it's not a huge improvement.