Problem addressed:
A new type of activation function
Summary:
This paper proposes a new activation function that computes an Lp norm over multiple linear projections of an input vector. The value of p can be learned from training examples and can differ across hidden units. The intuition is that 1) different datasets may have different optimal p-values, so it makes more sense to make p tunable; 2) allowing different units to take different p-values can make the approximation of decision boundaries more efficient and more flexible. The empirical results support these two intuitions, and the method achieves comparable results on three datasets.
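A minimal numpy sketch of a single Lp unit as summarized above (variable names are illustrative, and p is passed as a fixed scalar here, whereas the paper learns it per unit):

    import numpy as np

    def lp_unit(x, W, b, p):
        # Lp unit: the Lp norm of several linear projections of the input.
        # x: (d,) input vector; W: (k, d) projection weights; b: (k,) biases;
        # p: order of the norm (learned per unit in the paper, fixed here).
        z = W @ x + b                              # k projections of the input
        return (np.abs(z) ** p).sum() ** (1.0 / p)

    # Toy usage with two projections of a 3-d input
    rng = np.random.default_rng(0)
    x = rng.normal(size=3)
    W = rng.normal(size=(2, 3))
    b = np.zeros(2)
    print(lp_unit(x, W, b, p=2.0))                 # Euclidean norm of the projections
    print(lp_unit(x, W, b, p=50.0))                # large p approaches the largest projection magnitude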
Novelty:
A generalization of pooling, but applied across channels. When the dot product of the input and weight vector plus the bias is constrained to be non-negative, the $L_\infty$ case is equivalent to a maxout unit.
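A small illustration of that equivalence (hypothetical values): with non-negative projections, the Lp norm tends to the maximum as p grows, which is exactly the maxout response.

    import numpy as np

    z = np.array([0.3, 1.2, 0.0, 0.7])            # non-negative projections w_j^T x + b_j
    for p in (2, 8, 32, 128):
        print(p, (z ** p).sum() ** (1.0 / p))     # approaches max(z) as p -> infinity
    print("maxout:", z.max())                      # the maxout unit over the same projections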
Drawbacks:
Empirical performance is not very impressive, although the results do provide evidence supporting the intuitions above.
Datasets:
MNIST, TFD, Pentomino
Resources:
http://arxiv.org/abs/1311.1780
Presenter:
Yingbo Zhou