Summary by Léo Paillier 7 years ago
_Objective:_ Solve the degradation problem where adding layers induces a higher training error.
_Dataset:_ [CIFAR10](https://www.cs.toronto.edu/%7Ekriz/cifar.html), [PASCAL](http://host.robots.ox.ac.uk/pascal/VOC/) and [COCO](http://mscoco.org/).
## Inner-workings:
They argue that it is easier to learn the difference to the identity (the residual) than the actual mapping: the block starts close to the identity and only has to learn the residual mapping.
This makes training easier and thus allows deeper networks.
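A minimal sketch of the idea, assuming a toy two-layer residual function F with ReLU activations (the weight names and shapes are illustrative, not from the paper):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Compute relu(F(x) + x) where F(x) = W2 @ relu(W1 @ x).

    The block only has to learn the residual F: with all weights at
    zero, F(x) = 0 and the block already realizes the identity
    (for non-negative inputs, since the final ReLU passes them through).
    """
    return relu(W2 @ relu(W1 @ x) + x)

# With zero weights the block reduces to the identity on non-negative x:
x = np.array([1.0, 2.0, 3.0])
W = np.zeros((3, 3))
print(residual_block(x, W, W))  # → [1. 2. 3.]
```

This is why depth stops hurting: extra blocks can default to (near-)identity mappings instead of having to relearn the full transformation.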
## Architecture:
They introduce two new building blocks for Residual Networks, depending on the input dimensionality:
![](https://cloud.githubusercontent.com/assets/17261080/26635061/d489dbe2-4618-11e7-911e-68772265ee9f.png)
![](https://cloud.githubusercontent.com/assets/17261080/26635420/f6f22af8-4619-11e7-9639-ed651f8b18bb.png)
These can then be chained to produce networks such as:
![](https://cloud.githubusercontent.com/assets/17261080/26635258/7b64530c-4619-11e7-81c8-5d6be547da77.png)
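Chaining can be sketched as follows; this is a toy dense-layer version, assuming an identity shortcut when dimensions match and a learned projection `Ws` when they change (the paper uses 1×1 convolutions for the projection; all names and sizes here are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def block(x, W1, W2, Ws=None):
    """One building block: relu(F(x) + shortcut(x)).

    The shortcut is the identity when input/output dimensions match,
    or a learned projection Ws when the block changes dimensionality.
    """
    shortcut = x if Ws is None else Ws @ x
    return relu(W2 @ relu(W1 @ x) + shortcut)

def resnet(x, params):
    """Chain blocks; params is a list of (W1, W2, Ws) tuples."""
    for W1, W2, Ws in params:
        x = block(x, W1, W2, Ws)
    return x

# Hypothetical tiny network: two same-width blocks, then one that
# widens from 4 to 8 features via a projection shortcut.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
params = [
    (rng.standard_normal((4, 4)) * 0.1, rng.standard_normal((4, 4)) * 0.1, None),
    (rng.standard_normal((4, 4)) * 0.1, rng.standard_normal((4, 4)) * 0.1, None),
    (rng.standard_normal((8, 4)) * 0.1, rng.standard_normal((8, 8)) * 0.1,
     rng.standard_normal((8, 4)) * 0.1),  # projection shortcut: 4 -> 8
]
print(resnet(x, params).shape)  # (8,)
```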
## Results:
Won 1st place in most of the competition tracks it entered (ILSVRC and COCO 2015): very impressive, and with residual learning, adding layers does increase accuracy.