Autoencoders for dimensionality reduction and summarization

Autoencoders? PCA? How do they relate?

An autoencoder is an unsupervised learning neural network that extracts patterns from data and converts them into an efficient representation of the important information within the data. It outputs a reconstruction that (hopefully!) looks very similar to the input, yet is not simply a copy of the input but rather an effective summary of it. The first part of an autoencoder is the encoder, the portion of the network that converts the input into an internal representation of the data. The second part is the decoder, which takes that internal representation and expands it to recreate the input. Like an accordion, this neural network first squeezes the data down to a compact pattern and then expands it back out to look like its input again.
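
To make the accordion picture concrete, here is a tiny NumPy sketch of the shapes involved. The weights are random placeholders rather than trained parameters, so it only illustrates the squeeze-then-expand flow, not a real autoencoder.

# Illustrative shapes only: random weights, no training.
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 20)        # 5 samples, 20 input features

W_enc = np.random.randn(20, 3)    # encoder: squeeze 20 features down to 3
W_dec = np.random.randn(3, 20)    # decoder: expand the 3 codes back to 20

codes = np.tanh(X @ W_enc)        # internal representation, shape (5, 3)
reconstruction = codes @ W_dec    # attempt to recreate the input, shape (5, 20)

print(codes.shape, reconstruction.shape)  # (5, 3) (5, 20)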

PCA, on the other hand, is a method for reducing the dimensionality of a dataset by projecting the feature space onto its highest-variance directions, so that the projection hopefully retains the most important information in the data. In practice PCA is computed with singular value decomposition (SVD): the principal components are the top singular vectors, which correspond to the eigenvectors of the data's covariance matrix, and the data are projected onto them.
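
For comparison, here is what the PCA route looks like with scikit-learn on a hypothetical feature matrix X; the choice of 3 components is just an example.

# A hedged PCA example on made-up data; swap in your own X and component count.
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
X = np.random.randn(100, 20)       # 100 samples, 20 features

pca = PCA(n_components=3)          # keep the 3 highest-variance directions
X_reduced = pca.fit_transform(X)   # project onto the principal components

print(X_reduced.shape)                      # (100, 3)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained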

PCA is a linear projection of the data down to a smaller feature space, so it may miss important non-linear interactions that contain useful information. Autoencoders are a step up from plain PCA because they give us the flexibility to learn many more patterns, non-linear ones included, and encode them in a smaller representation.

The code

I tweaked a simple autoencoder example, again found in this github repo, which contains plenty more TensorFlow goodies to play with. The full code and the helper modules I've written to make this script run are in my github. After running the autoencoder.py script (it will save the model for you in the models directory), you should rebuild the graph and define the variables used for the model:


# initializing variables (TensorFlow 1.x)
import tensorflow as tf
from functools import partial

n_inputs = X_train.shape[1]   # X_train comes from your own train/test split
n_hidden1 = 300
n_hidden2 = 100               # the bottleneck that holds the internal representation
n_hidden3 = n_hidden1
n_outputs = n_inputs
learning_rate = 0.01
l2_reg = 0.0001

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
he_init = tf.contrib.layers.variance_scaling_initializer()  # He initialization
l2_regularizer = tf.contrib.layers.l2_regularizer(l2_reg)
my_dense_layer = partial(tf.layers.dense, activation=tf.nn.elu,
                         kernel_initializer=he_init, kernel_regularizer=l2_regularizer)

hidden1 = my_dense_layer(X, n_hidden1)
hidden2 = my_dense_layer(hidden1, n_hidden2)   # the encoding we extract later
hidden3 = my_dense_layer(hidden2, n_hidden3)
outputs = my_dense_layer(hidden3, n_outputs, activation=None)

reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))  # MSE between input and reconstruction
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([reconstruction_loss] + reg_losses)
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
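
For reference, the training-and-saving step that autoencoder.py performs might look roughly like the sketch below. The epoch count, batch size, and checkpoint filename here are assumptions on my part, so match them to whatever the script in the repo actually uses.

# Hypothetical sketch of a training loop like the one in autoencoder.py;
# n_epochs, batch_size, and the checkpoint path are assumed values.
import numpy as np

n_epochs = 20
batch_size = 150
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        permutation = np.random.permutation(len(X_train))  # shuffle each epoch
        for start in range(0, len(X_train), batch_size):
            batch = X_train[permutation[start:start + batch_size]]
            sess.run(training_op, feed_dict={X: batch})
        print(epoch, "train MSE:", reconstruction_loss.eval(feed_dict={X: X_train}))
    saver.save(sess, "my_model_all_layers.ckpt")
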
Then you should restore the model like so:

#Restoring best autoencoder model and transforming train/test data
saver = tf.train.Saver()  # saver built over the graph defined above
with tf.Session() as sess:
    saver.restore(sess, "my_model_all_layers.ckpt")  # load the trained weights into hidden1..hidden3
    encode_train = hidden2.eval(feed_dict={X: X_train})
    encode_test = hidden2.eval(feed_dict={X: X_test})
I'm evaluating the second hidden layer on my training set X_train so that I can transform the dataset into the internal representation the model learned earlier. This simple autoencoder has only three hidden layers, so the middle layer holds that internal representation of the data, here a 100-dimensional encoding of each row.
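
From here, encode_train and encode_test can be used like any other feature matrices. As a hypothetical example (y_train and y_test are assumed label arrays that aren't part of this post), you could fit a classifier on the 100-dimensional encodings:

# Hypothetical downstream use of the encoded features;
# y_train and y_test are assumed label arrays not defined above.
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000)
clf.fit(encode_train, y_train)          # train on the 100-dim encodings
print(clf.score(encode_test, y_test))   # accuracy on the encoded test set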