Variational Autoencoder

Disclaimer: This post assumes the reader has knowledge of Autoencoders and the basic ideas of Machine Learning.

The journey from Autoencoders to Variational Autoencoders:

An autoencoder provides another representation of your data.
As an analogy, consider condensing steam into water by lowering its temperature. The water occupies much less space than the steam, yet both are representations of the same matter, each with properties useful for different applications.

In the case of autoencoders, you feed your data X directly into the encoder. The encoder is trained to update its parameters to yield another (usually lower-dimensional) representation of X, called h. (h need not always be a low-dimensional representation; there exist overcomplete autoencoders where the dimension of h is higher than that of the original data, but I digress.) Finally, h is fed into the decoder of the autoencoder, whose parameters are updated to produce a reconstruction X* of the original data.
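To make this concrete, here is a minimal sketch of such an encoder/decoder pair. It assumes PyTorch and flattened 28x28 images; the layer sizes and names are illustrative choices, not prescribed by the autoencoder idea itself.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=32):
        super().__init__()
        # Encoder: maps the data X to a (here lower-dimensional) representation h
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, hidden_dim),
        )
        # Decoder: maps h back to a reconstruction X* of the input
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)       # abstraction
        x_star = self.decoder(h)  # reconstruction
        return x_star, h
```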

The first step is Abstraction: given data (often high-dimensional, such as images of beaches), the encoder outputs an abstract representation of it in a lower-dimensional space.
This is very useful when you have a large amount of data and limited storage space.
The second step is Reconstruction: given the hidden representation from the previous step, the decoder reconstructs the original data that was fed into the encoder.
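Training ties the two steps together by pushing the reconstruction X* toward the original X. Continuing the sketch above (the optimizer, loss, and batch here are illustrative stand-ins):

```python
model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)    # a stand-in batch of flattened images
x_star, h = model(x)       # h is the abstraction, x_star the reconstruction
loss = loss_fn(x_star, x)  # penalize poor reconstructions
optimizer.zero_grad()
loss.backward()
optimizer.step()
```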

However, is it possible to do Generation with an autoencoder?
Once the training of the autoencoder is complete, the decoder can be used to generate samples in the data (X) space given samples from the representation (h) space.
But do you see the problem with this approach?

h still lives in a high-dimensional space, and there are countless vectors in that space to sample from as input to the decoder.
The question is: how many of these vectors will actually produce a valid X sample?
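To see the issue concretely, imagine naively decoding random points in the h space, reusing the model sketched earlier (the dimensions here match that illustrative model):

```python
# Decode arbitrary points from the h space. Most of them will not
# correspond to any valid image, because random vectors are unlikely
# to land in the region of h space the encoder maps real data to.
z = torch.randn(16, 32)         # 16 random 32-dimensional h vectors
with torch.no_grad():
    samples = model.decoder(z)  # decoded outputs -- mostly not valid images
```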


How can we sample an h that will generate a valid image from the X space?
The answer is to sample an h that has a high likelihood of generating a valid image from the data space.
This calls for a probabilistic approach to the autoencoder in order to perform Generation.
Formally, we need to sample h from the distribution P(h|X) rather than choosing one at random.
Given a valid image, what is the probability distribution over the hidden representations? In other words, which h's have higher probability given the X's?
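As a peek ahead at where this is going: a probabilistic encoder outputs a distribution over h rather than a single point, and we sample from that distribution. The sketch below (reusing the imports above) uses the standard Gaussian parameterization and reparameterization trick from VAEs; the layer sizes are again illustrative.

```python
class ProbabilisticEncoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, hidden_dim)       # mean of the distribution over h
        self.log_var = nn.Linear(128, hidden_dim)  # log-variance of that distribution

    def forward(self, x):
        features = self.body(x)
        mu, log_var = self.mu(features), self.log_var(features)
        # Sample h ~ N(mu, sigma^2) via the reparameterization trick,
        # keeping the sampling step differentiable for training.
        sigma = torch.exp(0.5 * log_var)
        h = mu + sigma * torch.randn_like(sigma)
        return h, mu, log_var
```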
