A neural network that learns to predict future video frames in a compressed representation space rather than raw pixels.