Problem
Given a video of robot motion, predict future frames of the motion.
Dataset
- The authors assembled a new dataset of 59,000 robot interactions involving pushing motions.
- Human3.6M: video, depth, and mocap data. Actions include sitting, purchasing, waiting...
Approach
- Use LSTMs to "remember" previous frames.
- Predict 10 transformations of the previous frame (each model represents the transformations differently).
- Predict a mask to determine which transformation is applied to which pixel.
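The mask step can be sketched as a per-pixel blend of the transformed frames. This is a minimal illustration, not the authors' implementation: the function names `softmax` and `composite`, the array shapes, and the choice to normalize the masks with a softmax over the transformation axis are all assumptions made for the example.

```python
import numpy as np

def softmax(logits, axis=0):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def composite(transformed, mask_logits):
    """Blend K candidate frames into one using per-pixel masks (hypothetical helper).

    transformed: (K, H, W, C) array, the previous frame under each transformation.
    mask_logits: (K, H, W) array of unnormalized mask scores.
    """
    masks = softmax(mask_logits, axis=0)                 # (K, H, W), sums to 1 over K
    return (masks[..., None] * transformed).sum(axis=0)  # (H, W, C)

# Toy inputs: 10 transformations of a 4x4 RGB frame (random stand-ins).
rng = np.random.default_rng(0)
K, H, W, C = 10, 4, 4, 3
transformed = rng.random((K, H, W, C))
mask_logits = rng.normal(size=(K, H, W))
next_frame = composite(transformed, mask_logits)
```

Because the masks sum to 1 at every pixel, each output pixel is a convex combination of the corresponding pixels in the 10 transformed frames. A mask that puts nearly all of its weight on one transformation simply copies that transformation's pixel.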
The authors propose three models based on this approach:
- Dynamic Neural Advection
- Convolutional Dynamic Neural Advection
- Spatial Transformer Predictors
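To make "transformation" concrete, here is a rough sketch of one possible representation: a small normalized kernel applied to the whole previous frame, in the spirit of the convolutional variant. These notes do not spell out how each model represents its transformations, so this reading is an assumption, as are the helper name `apply_kernel`, the 5x5 kernel size, and the edge padding.

```python
import numpy as np

def apply_kernel(frame, kernel):
    """Apply one normalized k x k kernel to every channel of a frame (hypothetical helper).

    frame:  (H, W, C) array.
    kernel: (k, k) non-negative array with odd k; it is normalized to sum to 1.
    """
    k = kernel.shape[0]
    pad = k // 2
    kernel = kernel / kernel.sum()
    # Edge padding keeps the output the same size as the input (an assumption).
    padded = np.pad(frame, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    H, W, _ = frame.shape
    out = np.zeros_like(frame, dtype=float)
    # Each output pixel is a weighted sum of its k x k neighborhood.
    for dy in range(k):
        for dx in range(k):
            out += kernel[dy, dx] * padded[dy:dy + H, dx:dx + W, :]
    return out

frame = np.arange(4 * 5, dtype=float).reshape(4, 5, 1)

identity = np.zeros((5, 5))
identity[2, 2] = 1.0      # all weight on the center: frame is unchanged

shift_left = np.zeros((5, 5))
shift_left[2, 3] = 1.0    # all weight on the right neighbor: content moves one pixel left

same = apply_kernel(frame, identity)
moved = apply_kernel(frame, shift_left)
```

Under this reading, a kernel with all its weight on one offset moves the image rigidly, while a spread-out kernel expresses uncertainty about where pixels go.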