1) The paper introduces the Vector Quantised-Variational AutoEncoder (VQ-VAE), which uses discrete rather than continuous latent codes, obtained by vector quantization of the encoder outputs against a learned codebook (see the first sketch after this list). Because the latents are discrete, the prior over them can be learned from the data rather than kept static as in a standard VAE.
2) VQ-VAEs are trained with a loss that pulls the encoder outputs and the codebook embeddings toward each other, alongside the usual reconstruction term (see the loss sketch below). After training, an autoregressive prior is fit over the discrete codes, so new samples can be generated by sampling that prior rather than relying on reconstruction alone.
3) Experiments show VQ-VAEs can generate images, video, and speech that retain semantic content, while achieving log-likelihoods comparable to those of continuous latent-variable models. The discrete latent space captures long-term dependencies without supervision, for example discovering phoneme-like units in raw speech.
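To make point 1 concrete, here is a minimal sketch of the quantization step: continuous encoder outputs are mapped to discrete codes by nearest-neighbour lookup in a codebook. The shapes, the codebook size K, and the names (`codebook`, `z_e`) are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch (assumed names/shapes), not the authors' implementation:
# discrete latents via nearest-neighbour lookup in a learned codebook.
import torch

K, D = 512, 64                       # codebook size and embedding width (assumed)
codebook = torch.randn(K, D)         # learned embedding table e_1, ..., e_K
z_e = torch.randn(32, D)             # continuous encoder outputs for 32 latent positions

dists = torch.cdist(z_e, codebook)   # (32, K) distances to every codebook entry
codes = dists.argmin(dim=1)          # discrete latent codes, one index per position
z_q = codebook[codes]                # quantized latents passed to the decoder
```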
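And for point 2, a sketch of the training objective under the same assumptions: an MSE reconstruction term, a codebook term, and a β-weighted commitment term, with a straight-through estimator (`z_e + (z_q - z_e).detach()`) copying decoder gradients past the non-differentiable quantization back to the encoder. The `decoder` argument is a placeholder callable; β = 0.25 follows the paper's reported setting.

```python
# Illustrative sketch of the VQ-VAE loss terms (not the authors' code).
import torch
import torch.nn.functional as F

def vq_vae_loss(x, z_e, codebook, decoder, beta=0.25):
    """Reconstruction + codebook + commitment loss for one batch."""
    # Nearest-neighbour quantization, as in the previous sketch.
    codes = torch.cdist(z_e, codebook).argmin(dim=1)
    z_q = codebook[codes]

    # Straight-through estimator: forward pass uses z_q, gradients flow to z_e.
    z_q_st = z_e + (z_q - z_e).detach()
    x_recon = decoder(z_q_st)

    recon = F.mse_loss(x_recon, x)                 # reconstruction term
    codebook_term = F.mse_loss(z_q, z_e.detach())  # pulls embeddings toward encoder outputs
    commitment = F.mse_loss(z_e, z_q.detach())     # keeps encoder committed to the codebook
    return recon + codebook_term + beta * commitment
```

In practice `codebook` and `decoder` would be trainable modules (e.g. an embedding table and a decoder network); once they are trained, a separate autoregressive model fit over the discrete codes serves as the prior that is sampled to generate new data.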