Problem addressed:
A new type of activation function
Summary:
This paper proposes a new activation function that computes an Lp norm over multiple linear projections of an input vector. The value of p can be learned from training examples and can differ across hidden units. The intuition is that 1) different datasets may have different optimal p-values, so it makes more sense to make p tunable; 2) allowing different units to take different p-values can make the approximation of decision boundaries more efficient and more flexible. The empirical results support these two intuitions, and the method achieves comparable results on three datasets.
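A minimal numpy sketch of a single Lp unit as summarized above (variable names are illustrative, and p is passed as a fixed scalar here, whereas the paper learns it per unit):

    import numpy as np

    def lp_unit(x, W, b, p):
        # Lp unit: the Lp norm of several linear projections of the input.
        # x: (d,) input vector; W: (k, d) projection weights; b: (k,) biases;
        # p: order of the norm (learned per unit in the paper, fixed here).
        z = W @ x + b                              # k projections of the input
        return (np.abs(z) ** p).sum() ** (1.0 / p)

    # Toy usage with two projections of a 3-d input
    rng = np.random.default_rng(0)
    x = rng.normal(size=3)
    W = rng.normal(size=(2, 3))
    b = np.zeros(2)
    print(lp_unit(x, W, b, p=2.0))                 # Euclidean norm of the projections
    print(lp_unit(x, W, b, p=50.0))                # large p approaches the largest projection magnitude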
Novelty:
A generalization of pooling, but applied across channels. When the dot product of the input and weight vector plus the bias is constrained to be non-negative, the $L_\infty$ case is equivalent to a maxout unit.
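A small illustration of that equivalence (hypothetical values): with non-negative projections, the Lp norm tends to the maximum as p grows, which is exactly the maxout response.

    import numpy as np

    z = np.array([0.3, 1.2, 0.0, 0.7])            # non-negative projections w_j^T x + b_j
    for p in (2, 8, 32, 128):
        print(p, (z ** p).sum() ** (1.0 / p))     # approaches max(z) as p -> infinity
    print("maxout:", z.max())                      # the maxout unit over the same projections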
Drawbacks:
Empirical performance is not very impressive, although the results do provide evidence supporting the intuitions above.
Datasets:
MNIST, TFD, Pentomino
Resources:
http://arxiv.org/abs/1311.1780
Presenter:
Yingbo Zhou