ShortScience.org - Making Science Accessible!

Deep multi-scale video prediction beyond mean square error
Mathieu, Michaël and Couprie, Camille and LeCun, Yann
arXiv e-Print archive - 2015 via Local Bibsonomy
Keywords: dblp

Summary by Kirill Pevzner 8 years ago

Predict the next frames of a video using three complementary strategies proposed by the authors:
1. Multi-scale CNN
2. GAN (adversarial training)
3. Image gradient difference loss


Datasets:
-----------
* UCF101
* Sports1M


GAN
------
Generator:
   * Input: several frames of video from the dataset
   * Output: the next frame of the video

Discriminator:
   * Input: the input frames followed by a final frame
   * Output: whether the final frame comes from the dataset or was generated
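
The two roles above can be sketched as a pair of binary cross-entropy objectives. This is a minimal sketch rather than the authors' code: the function names are hypothetical, and the discriminator score is assumed to be a probability in (0, 1) that the final frame is real.

```python
import math

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def discriminator_loss(score_real, score_fake):
    """D should output 1 for a dataset frame and 0 for a generated frame."""
    return bce(score_real, 1) + bce(score_fake, 0)

def generator_adversarial_loss(score_fake):
    """G is rewarded when D scores its generated frame as real."""
    return bce(score_fake, 1)
```

With these definitions, a discriminator that separates real from generated frames well gets a low `discriminator_loss`, while the generator's loss falls as its frames become harder to tell apart from dataset frames.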

Problem: Predictions are still blurry at the edges of moving objects.
Solution: Image gradient difference loss
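
A gradient difference loss can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: it assumes the loss compares the absolute intensity differences between neighbouring pixels in the target and predicted frames, with `alpha` as a hypothetical name for the exponent, and it works on plain 2D lists of grayscale values.

```python
def gradient_difference_loss(target, predicted, alpha=1.0):
    """Sum of differences between absolute image gradients of two frames.

    `target` and `predicted` are equal-sized 2D lists of pixel intensities.
    """
    rows, cols = len(target), len(target[0])
    loss = 0.0
    for i in range(rows):
        for j in range(cols):
            if i > 0:  # gradient towards the pixel above
                g_t = abs(target[i][j] - target[i - 1][j])
                g_p = abs(predicted[i][j] - predicted[i - 1][j])
                loss += abs(g_t - g_p) ** alpha
            if j > 0:  # gradient towards the pixel to the left
                g_t = abs(target[i][j] - target[i][j - 1])
                g_p = abs(predicted[i][j] - predicted[i][j - 1])
                loss += abs(g_t - g_p) ** alpha
    return loss

# A sharp vertical edge and a smoothed version of the same edge.
sharp = [[0, 0, 1, 1], [0, 0, 1, 1]]
blurry = [[0.0, 0.25, 0.75, 1.0], [0.0, 0.25, 0.75, 1.0]]
```

Calling `gradient_difference_loss(sharp, blurry)` gives a positive penalty because the smoothed edge has weaker gradients than the sharp one, whereas identical frames give zero.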

Unsupervised Learning for Physical Interaction through Video Prediction
Finn, Chelsea and Goodfellow, Ian J. and Levine, Sergey
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

Summary by Kirill Pevzner 8 years ago

Problem
----------
Given a video of robot motion, predict future frames of the motion.

Dataset
-----------
1. The authors assembled a new dataset of 59,000 robot interactions involving pushing motions.
2. Human3.6M - video, depth, and mocap data. Actions include sitting, purchasing, waiting...

Approach
------------
* Use LSTMs to "remember" previous frames.
* Predict 10 transformations of the previous frame (each model represents the transformations differently).
* Predict a mask to determine which transformation is applied to which pixel.
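
The masking step can be sketched as a per-pixel weighted blend of the candidate transformed images. This is a simplified illustration, not the authors' code: the function names are hypothetical, images are plain 2D lists of grayscale values, and normalising the masks with a softmax across candidates is an assumption made here.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def composite(transformed, mask_logits):
    """Blend K candidate images using per-pixel softmax masks.

    `transformed[k][i][j]` is pixel (i, j) of the k-th transformed image and
    `mask_logits[k][i][j]` is the unnormalised mask score for that pixel.
    """
    num = len(transformed)
    rows, cols = len(transformed[0]), len(transformed[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            weights = softmax([mask_logits[k][i][j] for k in range(num)])
            out[i][j] = sum(weights[k] * transformed[k][i][j] for k in range(num))
    return out
```

When one mask score dominates at a pixel, the output copies that pixel from the corresponding transformed image. When scores are equal, the candidates are averaged.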

The authors propose three models based on this approach:
1. Dynamic Neural Advection
2. Convolutional Dynamic Neural Advection
3. Spatial Transformer Predictors

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Zhang, Han and Xu, Tao and Li, Hongsheng and Zhang, Shaoting and Huang, Xiaolei and Wang, Xiaogang and Metaxas, Dimitris N.
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

Summary by Kirill Pevzner 8 years ago

Problem
------------
Text to image


Contributions
-----------------
* Images are more photo-realistic and higher resolution than those of previous methods
* Stacked generative model


Approach
-------------
Two-stage process:
1. Text-to-image: generates a low-resolution image with primitive shapes and colors.
2. Low-to-high-res: using the low-resolution image and the text, generates a high-resolution image, adding details and sharpening the edges.

https://pbs.twimg.com/media/Cziw6bfWgAAh3Yg.jpg


Datasets
--------------
* CUB - Birds
* Oxford-102 - Flowers


Results
--------
https://cdn-images-1.medium.com/max/1012/1*sIphVx4tqaXJxtnZNt3JWA.png


Criticism / Questions
-------------------
* Is it possible the resulting images are replicas of images in the original dataset? To what extent does the model "hallucinate" new images?

Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
Bogo, Federica and Kanazawa, Angjoo and Lassner, Christoph and Gehler, Peter V. and Romero, Javier and Black, Michael J.
European Conference on Computer Vision - 2016 via Local Bibsonomy
Keywords: dblp

Summary by Kirill Pevzner 8 years ago

Problem
----------
Given an unconstrained image, estimate:
1. 3D pose of the human skeleton
2. 3D body mesh


Contributions
-----------
1. Full body mesh extraction from a single image
2. Improvement over the state of the art


Datasets
-------------
1. Leeds Sports
2. HumanEva
3. Human3.6M


Approach
----------------
Consider the problem both bottom-up and top-down.
1. Bottom-up: the DeepCut CNN predicts 2D joint positions in the image.
2. Top-down: a Skinned Multi-Person Linear model (SMPL) is fitted so that its 3D joints, projected into the image, match the detected 2D joints.
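
The quantity driving the top-down fit can be sketched as a reprojection error between projected 3D joints and detected 2D joints. This is a minimal sketch under stated assumptions: a simple pinhole camera, a plain confidence-weighted squared error, and hypothetical function and parameter names. The paper's full objective may include further terms, such as pose and shape priors, which are omitted here.

```python
def project(point, focal, center):
    """Perspective-project a 3D point (x, y, z) with z > 0 onto the image."""
    x, y, z = point
    return (focal * x / z + center[0], focal * y / z + center[1])

def reprojection_error(joints_3d, joints_2d, confidences, focal, center):
    """Confidence-weighted squared distance between projected and detected joints."""
    total = 0.0
    for p3, p2, w in zip(joints_3d, joints_2d, confidences):
        u, v = project(p3, focal, center)
        total += w * ((u - p2[0]) ** 2 + (v - p2[1]) ** 2)
    return total
```

An optimiser would adjust the body model's pose and shape, and therefore `joints_3d`, to reduce this error. Weighting by detection confidence lets uncertain 2D joints influence the fit less.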

Skeleton-aided Articulated Motion Generation
Yichao Yan and Jingwei Xu and Bingbing Ni and Xiaokang Yang
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.CV

Summary by Kirill Pevzner 8 years ago

Problem
---------------
Video generation of human motion given:
1. Single appearance reference image
2. Skeleton motion sequence


Datasets
-----------
* KTH - grayscale human actions
* Human3.6M - color multiview human actions


Approach
---------------
Conditional GANs.
The authors try both a Stack GAN and a Siamese GAN.
The latter provides better results.

https://preview.ibb.co/ighxQQ/Skeleton_aided_Articulated_Motion_Generation.png

Questions
----------------
Isn't a full sequence of human skeleton motion more than just a "hint"?