Summary by Léo Paillier 7 years ago
_Objective:_ Solve the degradation problem where adding layers induces a higher training error.
_Dataset:_ [CIFAR10](https://www.cs.toronto.edu/%7Ekriz/cifar.html), [PASCAL](http://host.robots.ox.ac.uk/pascal/VOC/) and [COCO](http://mscoco.org/).
## Inner-workings:
They argue that it is easier to learn the difference to the identity (the residual) than the actual mapping: the block starts close to the identity and only has to learn the residual mapping.
This makes training easier and thus allows deeper networks.
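A minimal sketch of the idea, assuming a toy two-layer residual function F with ReLU activations (the weight names and shapes are illustrative, not from the paper):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Compute relu(F(x) + x) where F(x) = W2 @ relu(W1 @ x).

    The block only has to learn the residual F: with all weights at
    zero, F(x) = 0 and the block already realizes the identity
    (for non-negative inputs, since the final ReLU passes them through).
    """
    return relu(W2 @ relu(W1 @ x) + x)

# With zero weights the block reduces to the identity on non-negative x:
x = np.array([1.0, 2.0, 3.0])
W = np.zeros((3, 3))
print(residual_block(x, W, W))  # → [1. 2. 3.]
```

This is why depth stops hurting: extra blocks can default to (near-)identity mappings instead of having to relearn the full transformation.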
## Architecture:
They introduce two new building blocks for Residual Networks, depending on the input dimensionality:
![](https://cloud.githubusercontent.com/assets/17261080/26635061/d489dbe2-4618-11e7-911e-68772265ee9f.png)
![](https://cloud.githubusercontent.com/assets/17261080/26635420/f6f22af8-4619-11e7-9639-ed651f8b18bb.png)
These can then be chained to produce networks such as:
![](https://cloud.githubusercontent.com/assets/17261080/26635258/7b64530c-4619-11e7-81c8-5d6be547da77.png)
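Chaining can be sketched as follows; this is a toy dense-layer version, assuming an identity shortcut when dimensions match and a learned projection `Ws` when they change (the paper uses 1×1 convolutions for the projection; all names and sizes here are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def block(x, W1, W2, Ws=None):
    """One building block: relu(F(x) + shortcut(x)).

    The shortcut is the identity when input/output dimensions match,
    or a learned projection Ws when the block changes dimensionality.
    """
    shortcut = x if Ws is None else Ws @ x
    return relu(W2 @ relu(W1 @ x) + shortcut)

def resnet(x, params):
    """Chain blocks; params is a list of (W1, W2, Ws) tuples."""
    for W1, W2, Ws in params:
        x = block(x, W1, W2, Ws)
    return x

# Hypothetical tiny network: two same-width blocks, then one that
# widens from 4 to 8 features via a projection shortcut.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
params = [
    (rng.standard_normal((4, 4)) * 0.1, rng.standard_normal((4, 4)) * 0.1, None),
    (rng.standard_normal((4, 4)) * 0.1, rng.standard_normal((4, 4)) * 0.1, None),
    (rng.standard_normal((8, 4)) * 0.1, rng.standard_normal((8, 8)) * 0.1,
     rng.standard_normal((8, 4)) * 0.1),  # projection shortcut: 4 -> 8
]
print(resnet(x, params).shape)  # (8,)
```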
## Results:
Won 1st place in most of the competition tracks it entered (ILSVRC and COCO 2015): very impressive, and with residual learning, adding layers does increase accuracy.