ShortScience.org - Making Science Accessible!

Learning Latent Dynamics for Planning from Pixels
Danijar Hafner and Timothy Lillicrap and Ian Fischer and Ruben Villegas and David Ha and Honglak Lee and James Davidson
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.LG, cs.AI, stat.ML

[link] Summary by wassname 7 years ago

**Summary**: This paper presents three tricks that make model-based reinforcement learning more reliable on tasks that require walking and balancing. The tricks are 1) planning based on learned features rather than raw pixels, 2) using a recurrent network that mixes probabilistic and deterministic information, and 3) training the model to look forward multiple steps.

**Longer summary**

Imagine playing pool, armed with a tablet that can predict exactly where the ball will bounce, and the next bounce, and so on. That would be a huge advantage to someone learning pool. However, small inaccuracies in the model could mislead you, especially when thinking ahead to the second and third bounce.

The tablet is analogous to the dynamics model in model-based reinforcement learning (RL). Model-based RL promises to solve a lot of the open problems with RL, letting the agent learn with less experience, transfer well, dream, and many other advantages. Despite the promise, dynamics models are hard to get working: even small inaccuracies compound over a multi-step plan, and the models often need to be redesigned for specific tasks.

Enter PlaNet, a clever name, and a net that plans well in a range of environments. To increase the challenge, the model must predict directly from pixels in fairly difficult tasks such as teaching a cheetah to run or balancing a ball in a cup.

How do they do this? Three main tricks.

- Planning in latent space: this means that the planner doesn't need to look at the raw image, but instead works on a summary of it represented by a feature vector.
- Recurrent state space models: They found that probabilistic information helps describe the space of possibilities but makes it harder for their RNN-based model to remember information over multiple steps. However, mixing probabilistic and deterministic information gives it the best of both worlds, and their results show a startling performance increase when both are used, compared to just one.
- Latent overshooting: They train the model to predict more than one step ahead, which helps prevent errors from building up over time.
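
To make the first trick more concrete, here is a minimal sketch of planning in a latent space with the cross-entropy method. Everything in it is an assumption for illustration: `latent_step` is a hand-written, one-dimensional stand-in for a learned transition and reward model, and the hyper-parameters are arbitrary. It is not the paper's model, only the shape of the idea: imagine action sequences in latent space, score them by predicted reward, and refit the sampling distribution to the best ones.

```python
import random
import statistics


def latent_step(z, a):
    """Hypothetical learned model: next latent state and predicted reward."""
    z_next = 0.9 * z + a
    reward = -abs(z_next - 1.0)  # toy reward: keep the latent state near 1.0
    return z_next, reward


def rollout_return(z0, actions):
    """Sum of predicted rewards when the action sequence is imagined from z0."""
    z, total = z0, 0.0
    for a in actions:
        z, r = latent_step(z, a)
        total += r
    return total


def cem_plan(z0, horizon=5, candidates=200, elites=20, iterations=8, seed=0):
    """Cross-entropy method: refit a Gaussian over action sequences to the elites."""
    rng = random.Random(seed)
    means = [0.0] * horizon
    stds = [1.0] * horizon
    for _ in range(iterations):
        population = [
            [rng.gauss(m, s) for m, s in zip(means, stds)]
            for _ in range(candidates)
        ]
        # Rank candidates by imagined return; no real environment step is taken.
        population.sort(key=lambda seq: rollout_return(z0, seq), reverse=True)
        best = population[:elites]
        means = [statistics.fmean([seq[t] for seq in best]) for t in range(horizon)]
        stds = [statistics.pstdev([seq[t] for seq in best]) + 1e-3 for t in range(horizon)]
    return means


plan = cem_plan(z0=0.0)
```

In a receding-horizon setup, only the first action of `plan` would be executed before replanning from the next latent state.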

Overall, this paper shows great results that tackle the shortfalls of model-based RL. I hope the results hold up when tested on different and more complex environments.

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Julian Schrittwieser and Ioannis Antonoglou and Thomas Hubert and Karen Simonyan and Laurent Sifre and Simon Schmitt and Arthur Guez and Edward Lockhart and Demis Hassabis and Thore Graepel and Timothy Lillicrap and David Silver
arXiv e-Print archive - 2019 via Local arXiv
Keywords: cs.LG, stat.ML

[link] Summary by CodyWild 6 years ago

The successes of deep learning on complex strategic games like Chess and Go have been largely driven by the ability to do tree search: that is, simulating sequences of actions in the environment, and then training policy and value functions to more speedily approximate the results that more exhaustive search reveals. However, this relies on having a good simulator that can predict the next state of the world, given your action. In some games, with straightforward rules, this is easy to explicitly code, but in many RL tasks like Atari, and in many contexts in the real world, having a good model of how the world responds to your actions is in fact a major part of the difficulty of RL. 

A response to this within the literature has been systems that learn models of the world from trajectories, and then use those models to do this kind of simulated planning. Historically these have been done by designing models that predict the next observation, given past observations and a passed-in action. This lets you "roll out" observations from actions in a way similar to how a simulator could. However, in high-dimensional observation spaces it takes a lot of model capacity to accurately model the full observation, and many parts of a given observation space will often be irrelevant. 

https://i.imgur.com/wKK8cnj.png

To address this difficulty, the MuZero architecture borrows an approach from Value Prediction Networks: it learns an internal model that predicts transitions between abstract states (which don't need to match the actual observation state of the world) and then predicts a policy, value, and next-step reward from each abstract state. So, we can plan in latent space by simulating transitions from state to state through actions. The training signal for that state representation and transition model comes from being able to accurately predict the reward, the empirical future value at a state (discovered through Monte Carlo rollouts), and the policy action that the rollout search would have taken at that point. If two observations are identical in terms of their implications for these quantities, the transition model doesn't need to differentiate them, making it more straightforward to learn. (Apologies for the long caption in the above screenshot; I feel like it's quite useful for gaining intuition, especially if you're less recently familiar with the MCTS deep learning architectures DeepMind typically uses.)
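
The division of labour described above can be sketched as three small functions: one that encodes an observation into an abstract state, one that advances the abstract state given an action, and one that reads a policy and value off the abstract state. The bodies below are hand-written toy stand-ins (assumptions for illustration), where the real system would use learned networks; only the way they are wired together reflects the description.

```python
def represent(observation):
    """Encode a raw observation as an abstract state (toy: two summary numbers)."""
    return (sum(observation), max(observation))


def dynamics(state, action):
    """Predict the next abstract state and the immediate reward for an action."""
    total, peak = state
    next_state = (total + action, max(peak, action))
    reward = 1.0 if action == 1 else 0.0  # toy reward rule
    return next_state, reward


def predict(state):
    """Predict action probabilities and a value from an abstract state."""
    total, _ = state
    p_one = 0.75 if total >= 0 else 0.25
    policy = {0: 1.0 - p_one, 1: p_one}
    value = float(total)
    return policy, value


def unroll(observation, actions):
    """Imagine a sequence of actions entirely in abstract-state space."""
    state = represent(observation)  # the observation is only used once, here
    outputs = []
    for action in actions:
        policy, value = predict(state)
        state, reward = dynamics(state, action)
        outputs.append((policy, value, reward))
    return outputs


trajectory = unroll(observation=[0, 1, 2], actions=[1, 0, 1])
```

Note that after the first call to `represent`, no observation is ever reconstructed: each step's policy, value, and reward predictions are the only quantities that training would compare against targets.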

https://i.imgur.com/4nepG6o.png

The most impressive empirical aspect of this paper is the fact that it claims (from what I can tell credibly) to be able to perform as well as planning algorithms with access to a real simulator in games like Chess and Go, and as well as model-free models in games like Atari where MFRL has typically been the state of the art (because world models have been difficult to learn). I feel like I've read a lot recently that suggests to me that the distinction between model-free and model-based RL is becoming increasingly blurred, and I'm really curious to see how that trajectory evolves in future.

Deep Forest: Towards An Alternative to Deep Neural Networks
Zhou, Zhi-Hua and Feng, Ji
International Joint Conference on Artificial Intelligence - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by mashayekhi 8 years ago

https://i.imgur.com/QxHktQC.png

The fundamental question the paper sets out to answer is whether deep learning can be realized with prediction models other than neural networks. The authors propose deep forest (gcForest), a realization of deep learning using random forests. The idea is simple and was inspired by representation learning in deep neural networks, which mostly relies on layer-by-layer processing of raw features.

https://i.imgur.com/Wh6xAbP.png

Importance: Deep neural networks (DNNs) have several drawbacks. They need a lot of data to train. They have many hyper-parameters to tune. Moreover, not everyone has access to the GPUs needed to build and train them. Training a DNN is more like an art than a scientific or engineering task. Finally, theoretical analysis of DNNs is extremely difficult. The aim of the paper is to propose a model that addresses these issues while achieving performance competitive with deep neural networks.

Model: The proposed model consists of two parts. The first part is a deep forest ensemble with a cascade structure similar to the layer-by-layer architecture of a DNN. Each level is an ensemble of random forests and, to encourage diversity, a combination of completely-random tree forests and typical random forests is employed (the number of trees in each forest is a hyper-parameter). The estimated class distribution, which is obtained from each forest by k-fold cross-validation, forms a class vector, which is then concatenated with the original feature vector to form the input to the next level of the cascade. The second part is multi-grained scanning for representation learning, where spatial and sequential relationships are captured using a sliding window scan (with various window sizes) over the raw features, similar to the convolutional and recurrent layers in a DNN. Those features are then passed to a completely-random tree forest and a typical random forest in order to generate transformed features. When the transformed feature vectors are too long to be accommodated, feature sampling can be performed.
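
The two data-flow steps described above can be sketched without any forest implementation at all. The snippet below is an illustrative assumption, not the authors' code: `sliding_windows` shows the scanning step on a 1-D feature vector, and `next_level_input` shows how the class vectors from a level's forests are concatenated with the original features. The forest outputs are hard-coded, hypothetical class distributions.

```python
def sliding_windows(features, window):
    """Multi-grained scanning on a 1-D feature vector: every contiguous window."""
    return [features[i:i + window] for i in range(len(features) - window + 1)]


def next_level_input(original_features, class_vectors):
    """Cascade step: concatenate each forest's class vector with the original features."""
    augmented = list(original_features)
    for vector in class_vectors:
        augmented.extend(vector)
    return augmented


raw = [0.2, 0.5, 0.1, 0.9, 0.4]
windows = sliding_windows(raw, window=3)

# Hypothetical class-probability outputs of two forests for a 3-class problem.
forest_outputs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
level_input = next_level_input(raw, forest_outputs)
```

With `d` raw features and window size `w`, scanning yields `d - w + 1` windows, which is why long inputs can produce transformed feature vectors large enough to need sampling.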

Benefits: gcForest has far fewer hyper-parameters than deep neural networks. The number of cascade levels can be determined adaptively, so the model complexity is set automatically: if growing a new level does not improve performance, the growth of the cascade terminates. Its performance is quite robust to hyper-parameter settings, such that in most cases, and across data from different domains, it is able to get excellent performance using the default settings. gcForest achieves performance highly competitive with deep neural networks, while its training time cost is smaller than that of a DNN.
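
The adaptive depth rule amounts to an early-stopping loop over cascade levels. The sketch below assumes a hypothetical `evaluate_level` callback that trains a cascade of the given depth and returns its validation accuracy; the accuracies used here are invented purely to exercise the loop.

```python
def grow_cascade(evaluate_level, max_levels=20, patience=1):
    """Add levels until validation accuracy stops improving.

    Returns the number of levels kept and the accuracy at that depth.
    """
    best_accuracy = float("-inf")
    best_depth = 0
    for depth in range(1, max_levels + 1):
        accuracy = evaluate_level(depth)
        if accuracy > best_accuracy:
            best_accuracy, best_depth = accuracy, depth
        elif depth - best_depth >= patience:
            break  # the new level did not help, so stop growing
    return best_depth, best_accuracy


# Hypothetical validation accuracies per cascade depth.
scores = {1: 0.90, 2: 0.93, 3: 0.95, 4: 0.95, 5: 0.96}
depth, accuracy = grow_cascade(lambda d: scores.get(d, 0.0))
```

With `patience=1` the loop stops at the first level that fails to improve, so the hypothetical gain at depth 5 is never seen; a larger `patience` trades extra training for a chance to find it.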

Experimental results: The authors compared the performance of gcForest and DNNs by fixing one architecture for gcForest and testing various architectures for the DNNs, while keeping some DNN hyper-parameters fixed, such as the activation function, loss function, and dropout rate. They used MNIST (digit image recognition), ORL (face recognition), GTZAN (music classification), sEMG (hand movement recognition), IMDB (movie review sentiment analysis), and some low-dimensional datasets. gcForest got the best results in these experiments, sometimes by a significant margin.

My Opinions: The main goal of the paper is interesting; however, one concern is the amount of effort they put into finding the best neural network configuration for the experiments, given that they themselves mention that finding a good configuration is an art rather than scientific work. For instance, they could have used deep recurrent layers instead of an MLP for the sentiment analysis dataset, which is typically a better option for this task. They reported the time complexity of the method for only one experiment, not all of them. More importantly, the CIFAR-10 result in the supplementary materials shows a big gap between the best deep learning result and the gcForest result, although the authors argue that gcForest can be tuned to get better results. gcForest was also compared to non-deep-learning methods such as random forest and SVM, and showed superior results. It would have been good to have a time complexity comparison for them as well. In my view, the paper is a good starting point for answering the original question; however, the proposed method and the experimental results are not convincing enough.

Github link: https://github.com/kingfengji/gcForest

The Pitfalls of Simplicity Bias in Neural Networks
Shah, Harshay and Tamuly, Kaustav and Raghunathan, Aditi and Jain, Prateek and Netrapalli, Praneeth
arXiv e-Print archive - 2020 via Local Bibsonomy
Keywords: dblp

[link] Summary by CodyWild 5 years ago

This is an interesting paper that makes a fairly radical claim, and I haven't fully decided whether what they find is an interesting-but-rare corner case, or a more fundamental weakness in the design of neural nets. The claim is: neural nets prefer learning simple features, even if there exist complex features that are equally or more predictive, and even if that means learning a classifier with a smaller margin - where margin means "the distance between the decision boundary and the nearest data points". A large-margin classifier is preferable in machine learning because the larger the margin, the larger the perturbation that would have to be made - by an adversary, or just by the random nature of the test set - to trigger misclassification.
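
For a linear classifier the margin has a simple closed form, which makes the definition easy to check numerically. The sketch below uses made-up 2-D points and two hand-picked decision boundaries (both assumptions for illustration); each boundary separates the two groups, but one leaves much less room for perturbation.

```python
import math


def margin(weights, bias, points):
    """Smallest Euclidean distance from any point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return min(
        abs(sum(w * x for w, x in zip(weights, p)) + bias) / norm
        for p in points
    )


# Two points on the positive side (x > 0) and two on the negative side.
points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.5), (-4.0, -1.0)]

wide = margin(weights=(1.0, 0.0), bias=0.0, points=points)     # boundary at x = 0
narrow = margin(weights=(1.0, 0.0), bias=-1.5, points=points)  # boundary at x = 1.5
```

Both boundaries classify these four points identically, yet a shift of just over `narrow` units along the first axis is enough to flip a prediction under the second boundary.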

https://i.imgur.com/PJ6QB6h.png

This paper defines simplicity and complexity in a few ways. In their simulated datasets, a feature is simpler when the decision boundary along that axis requires fewer piecewise linear segments to separate datapoints. (In the example above, note that having multiple alternating blocks still allows the classes to be separated, but requires more piecewise linear segments.) In their datasets that concatenate MNIST and CIFAR images, the MNIST component represents the simple feature.

The authors then test which models use which features by training a model with access to all of the features - simple and complex - and then testing examples where one set of features is sampled in alignment with the label, and one set of features is sampled randomly. If the features being sampled randomly are being used by the model, perturbing them like this should decrease the test performance of the model. For the simulated datasets, a fully connected network was used; for the MNIST/CIFAR concatenation, a variety of different image classification convolutional architectures were tried. 
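
The logic of this randomization test can be shown on a toy problem. In the sketch below, the dataset, the slab layout, and the `model` function are all assumptions made for illustration: the "simple" feature is linearly separable, the "complex" feature needs several alternating slabs, and `model` is a stand-in for a trained network that happens to rely only on the simple feature. Shuffling a feature across examples decouples it from the label.

```python
import random


def make_dataset(n, rng):
    """Each example has a linearly separable 'simple' feature and a multi-slab 'complex' one."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        simple = rng.uniform(0.1, 1.0) * (1 if label else -1)
        # Label 1 lives in slabs [-1, -0.5) and [0, 0.5); label 0 in the other two.
        slab_start = rng.choice([-1.0, 0.0] if label else [-0.5, 0.5])
        complex_ = slab_start + rng.uniform(0.0, 0.5)
        data.append((simple, complex_, label))
    return data


def model(simple, complex_):
    """Stand-in for a trained network that relies only on the simple feature."""
    return 1 if simple > 0 else 0


def accuracy(data):
    return sum(model(s, c) == y for s, c, y in data) / len(data)


def randomize(data, index, rng):
    """Decouple one feature from the label by shuffling it across examples."""
    column = [row[index] for row in data]
    rng.shuffle(column)
    return [
        tuple(column[i] if j == index else v for j, v in enumerate(row))
        for i, row in enumerate(data)
    ]


rng = random.Random(0)
test_set = make_dataset(2000, rng)
acc_clean = accuracy(test_set)
acc_simple_randomized = accuracy(randomize(test_set, 0, rng))
acc_complex_randomized = accuracy(randomize(test_set, 1, rng))
```

Accuracy falls to roughly chance when the simple feature is randomized and is untouched when the complex one is, which is the signature the test looks for in a model that ignores the complex feature.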

The paper finds that neural networks will prefer to use the simpler feature to the complete exclusion of more complex features, even if the complex feature is slightly more predictive (can achieve 100% vs 95% separation). The authors go on to argue that this phenomenon, which they call Extreme Simplicity Bias, or Extreme SB, might actually explain some of the observed pathologies in neural nets, like relying on spurious features or being subject to adversarial perturbations. They claim that spurious features - like background color or texture - will tend to be simpler, and that their theory explains networks' reliance on them. Additionally, relying completely or predominantly on single features means that a perturbation along just that feature can substantially hurt performance, as opposed to a network using multiple features, all of which must be perturbed to hurt performance an equivalent amount.

As I mentioned earlier, I feel like I'd need more evidence before I was strongly convinced by the claims made in this paper, but they are interestingly provocative. On a broader level, I think a lot of the difficulty in articulating why we expect simpler features to perform well comes from imprecision in how we think and talk about the idea: we think of complex features as inherently brittle and high-dimensional, but this paper makes me wonder how well our existing definitions of simplicity actually match those intuitions.

Learning to Compose Domain-Specific Transformations for Data Augmentation.
Alexander J. Ratner and Henry R. Ehrenberg and Zeshan Hussain and Jared Dunnmon and Christopher Ré
Neural Information Processing Systems Conference - 2017 via Local dblp

[link] Summary by David Stutz 7 years ago

Ratner et al. train a generative adversarial network to learn domain-specific sequences of transformations useful for data augmentation. In particular, as indicated in Figure 1, the generator learns to predict sequences of user-specified transformations, and the classifier is intended to distinguish the original images from the transformed ones. For training, the authors use reinforcement learning because the transformations are not necessarily differentiable. Not requiring differentiability makes the proposed method very convenient to use.

https://i.imgur.com/hHQkhIk.png
Figure 1: High-level illustration of the proposed method for learning data augmentation.
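
To illustrate what "sequences of user-specified transformations" means in practice, here is a minimal sketch of the sampling-and-composing step only. The three transformations, the flat-list "image", and the fixed sampling probabilities are assumptions for illustration; in the paper the generator that chooses the sequence is learned, whereas `sample_sequence` below simply draws from fixed weights.

```python
import random


# Hypothetical user-specified transformations acting on a flat list of pixel values.
def flip(image):
    return image[::-1]


def brighten(image):
    return [min(1.0, p + 0.1) for p in image]


def shift(image):
    return image[1:] + image[:1]


TRANSFORMS = [flip, brighten, shift]


def sample_sequence(probabilities, length, rng):
    """Stand-in generator: draw each step's transformation from fixed probabilities."""
    return rng.choices(TRANSFORMS, weights=probabilities, k=length)


def apply_sequence(image, sequence):
    """Compose the sampled transformations in order."""
    for transform in sequence:
        image = transform(image)
    return image


rng = random.Random(0)
sequence = sample_sequence([0.2, 0.5, 0.3], length=4, rng=rng)
augmented = apply_sequence([0.0, 0.25, 0.5, 0.75], sequence)
```

Because the transformations are ordinary functions with no gradient requirement, anything a user can write as a function could be added to `TRANSFORMS`, which is the convenience the summary points to.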

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).