Efficient Character-level Document Classification by Combining Convolution and Recurrent Layers
Xiao, Yijun and Cho, Kyunghyun
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp


Summary by Denny Britz 9 years ago

TLDR; The authors use a CNN to extract features from character-level document representations and feed those features into an RNN to make the final prediction. This model, called ConvRec, has significantly fewer parameters (10-50x) than comparable convolutional models with more layers, yet achieves similar or better performance on large-scale document classification tasks.

Key Points
  • Shortcomings of the word-level approach: words with common roots are treated as distinct, OOV words cannot be handled, and the vocabulary-sized embedding requires many parameters.
  • Character-level ConvNets need many layers to capture long-term dependencies because of their small receptive fields.
  • Network architecture: (1) 8-dim character embedding; (2) ConvNet with 2-5 layers, filter sizes 5 and 3, max-pooling of size 2, and ReLU activations; (3) LSTM with a 128-dim hidden state. Dropout is applied after the conv and recurrent layers (see the model sketch after this list).
  • Training: 96-character alphabet, Adadelta, batch size 128, examples padded and masked to the longest sequence in the batch, gradient norm clipping at 5, early stopping (a training-loop sketch follows the model sketch below).
  • The model tends to outperform larger CNNs on smaller datasets, possibly because the larger models overfit.
  • Adding more convolutional layers or filters doesn't impact model performance much.
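
A minimal sketch of the ConvRec architecture described above, written in PyTorch (the paper predates PyTorch, so this is a re-implementation under assumptions). The vocabulary size (96), 8-dim embedding, filter widths 5 and 3, pool size 2, 128-dim LSTM, and dropout placement follow the summary; the number of conv blocks, filter count, class count, and dropout rate are illustrative choices.

```python
import torch
import torch.nn as nn

class ConvRec(nn.Module):
    def __init__(self, vocab_size=96, embed_dim=8, num_filters=128,
                 num_classes=4, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Two conv/pool blocks (the paper uses 2-5); filter widths 5 then 3.
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, num_filters, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
            nn.Conv1d(num_filters, num_filters, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
            nn.Dropout(dropout),            # dropout after the conv stack
        )
        self.lstm = nn.LSTM(num_filters, 128, batch_first=True)
        self.dropout = nn.Dropout(dropout)  # dropout after the recurrent layer
        self.fc = nn.Linear(128, num_classes)

    def forward(self, chars):
        # chars: (batch, seq_len) tensor of integer character ids
        x = self.embed(chars)               # (batch, seq_len, embed_dim)
        x = self.conv(x.transpose(1, 2))    # Conv1d expects (batch, channels, seq)
        x = x.transpose(1, 2)               # (batch, reduced_seq, num_filters)
        _, (h_n, _) = self.lstm(x)          # final hidden state summarizes the document
        return self.fc(self.dropout(h_n[-1]))
```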
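
And a hedged sketch of the training loop with the reported settings (Adadelta, batch size 128, gradient-norm clipping at 5). `train_loader` is an assumed DataLoader yielding batches of padded character ids and labels; masking and early stopping are omitted for brevity.

```python
model = ConvRec()
optimizer = torch.optim.Adadelta(model.parameters())
criterion = nn.CrossEntropyLoss()

for chars, labels in train_loader:  # batches of 128 padded sequences (assumed loader)
    optimizer.zero_grad()
    loss = criterion(model(chars), labels)
    loss.backward()
    # Clip the global gradient norm at 5, as reported in the summary.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
```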
Notes/Questions
  • Would've been nice to see a graph of the effect of the number of parameters on model performance. How much do additional filters and conv layers help?
  • What about training time? How does it compare?