ShortScience.org - Making Science Accessible!

Welcome to ShortScience.org!

Learning to Compose Words into Sentences with Reinforcement Learning
Yogatama, Dani and Blunsom, Phil and Dyer, Chris and Grefenstette, Edward and Ling, Wang
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

[link] Summary by Marek Rei 7 years ago

The aim is to have the system discover a method for parsing that would benefit a downstream task.

https://i.imgur.com/q57gGCz.png

They construct a neural shift-reduce parser – as it’s moving through the sentence, it can either shift the word to the stack or reduce two words on top of the stack by combining them. A Tree-LSTM is used for composing the nodes recursively. The whole system is trained using reinforcement learning, based on an objective function of the downstream task. The model learns parse rules that are beneficial for that specific task, either without any prior knowledge of parsing or by initially training it to act as a regular parser.

scholar.google.com

Enriching Word Vectors with Subword Information
Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas
Transactions of the Association for Computational Linguistics - 2017 via Local Bibsonomy
Keywords: fasttext, pretrained, wikipedia, word2vec, vectors

[link] Summary by Marek Rei 7 years ago

They extend skip-grams for word embeddings to use character n-grams. Each word is represented as a bag of character n-grams, 3-6 characters long, plus the word itself. Each of these has their own embedding which gets optimised to predict the surrounding context words using skip-gram optimisation. They evaluate on word similarity and analogy tasks, in different languages, and show improvement on most benchmarks.

aclanthology.info
sci-hub
scholar.google.com

Modelling metaphor with attribute-based semantics
Clark, Stephen and Shutova, Ekaterina and Bulat, Luana
Association for Computational Linguistics EACL (2) - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Marek Rei 7 years ago

They propose using attribute-based vectors for detecting metaphorical word pairs.

https://i.imgur.com/dgDjSvu.png

Traditional embeddings (word2vec and count-based) are mapped to attribute vectors, using a supervised system trained on McRae norms. These vectors for a word pair are then given as input to an SVM classifier and trained to detect metaphorical (black humour) vs literal (black dress) word pairs. They show that using the attribute vectors gives higher F score over using the original vector space.

arxiv.org
arxiv-vanity.com
scholar.google.com

Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg and Volodymyr Mnih and Wojciech Marian Czarnecki and Tom Schaul and Joel Z Leibo and David Silver and Koray Kavukcuoglu
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.LG, cs.NE
more

[link] Summary by Marek Rei 7 years ago

They describe a version of reinforcement learning where the system also learns to solve some auxiliary tasks, which helps with the main objective.

https://i.imgur.com/fmTVxvr.png

In addition to normal Q-learning, which predicts the downstream reward, they have the system learning 1) a separate policy for maximally changing the pixels on the screen, 2) maximally activating units in a hidden layer, and 3) predicting the reward at the next step, using biased sampling. They show that this improves learning speed and performance on Atari games and Labyrinth (a Quake-like 3D game).

arxiv.org
arxiv-vanity.com
scholar.google.com

Understanding deep learning requires rethinking generalization
Chiyuan Zhang and Samy Bengio and Moritz Hardt and Benjamin Recht and Oriol Vinyals
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.LG
more

[link] Summary by Marek Rei 7 years ago

The authors investigate the generalisation properties of several well-known image recognition networks.

https://i.imgur.com/km0mrVs.png

They show that these networks are able to overfit to the training set with 100% accuracy even if the labels on the images are random, or if the pixels are randomly generated. Regularisation, such as weight decay and dropout, doesn’t stop overfitting as much as expected, still resulting in ~90% accuracy on random training data. They then argue that these models likely make use of massive memorization, in combination with learning low-complexity patterns, in order to perform well on these tasks.