Deep Learning with TensorFlow

Experimenting with a basic Multilayer Perceptron model on the small Mercedes-Benz Kaggle contest


Playing with TensorFlow

Deep learning has advanced rapidly since 2006, when the term "Deep Learning" was popularized thanks to Geoffrey Hinton's work on Deep Belief Networks. In 2013, DeepMind taught a machine to play Atari games using just raw pixels, and a few years later their AlphaGo model beat the world champion at Go. In late 2015, Google made deep learning more accessible when it open-sourced TensorFlow. Two years later, I'm playing with the API myself and training a very simple model: the multilayer perceptron. Perceptrons go all the way back to the late 50's. A perceptron approximates a single neuron with n binary inputs: it computes a weighted sum of those inputs and fires an output when the sum crosses a threshold, much like a neuron fires a response to a strong enough signal. Perceptrons fell out of fashion when it was shown that a single one cannot compute an XOR gate, until it was discovered that you can chain multiple perceptrons together in layers to compute more complex functions. Below is a tweaked Awesome TensorFlow tutorial that trains a multilayer perceptron regression model on that tiny Mercedes-Benz Kaggle dataset I used in another post.
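Before diving into the TensorFlow code, here is the weighted-sum-plus-threshold idea sketched in plain numpy. The gate weights below are my own illustrative choices, not anything from the tutorial; the point is that one perceptron handles AND or OR, but only by chaining perceptrons do you get XOR:

```python
import numpy as np

def perceptron(inputs, weights, bias, threshold=0.0):
    """A single perceptron: weighted sum of binary inputs, then a hard threshold."""
    total = np.dot(inputs, weights) + bias
    return 1 if total > threshold else 0

# A single perceptron can model AND or OR with the right weights...
and_gate = lambda a, b: perceptron(np.array([a, b]), np.array([1.0, 1.0]), -1.5)
or_gate = lambda a, b: perceptron(np.array([a, b]), np.array([1.0, 1.0]), -0.5)
nand_gate = lambda a, b: perceptron(np.array([a, b]), np.array([-1.0, -1.0]), 1.5)

# ...but no single perceptron computes XOR. Chaining them does:
# XOR(a, b) = AND(OR(a, b), NAND(a, b))
xor_gate = lambda a, b: and_gate(or_gate(a, b), nand_gate(a, b))
```

That two-layer XOR construction is, in miniature, the same trick the multilayer perceptron below uses: stacking simple units to compute functions no single unit can.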

import tensorflow as tf
import numpy as np
from sklearn.metrics import r2_score 
from helper_funcs import group_list
from data_prep import DataPrep

#Loading and Transforming data
prep = DataPrep(dummy_pipe=True)
X_train, y_train, X_val, y_val, test, test_id = prep.load_data('data/train.csv', 'data/test.csv')
X_train, X_val, _ = prep.transform(X_train, X_val, test)
#Don't need these for this example
del test
del test_id

# Parameters
learning_rate = 0.1
training_epochs = 50000
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 8 # 1st layer number of neurons
n_hidden_2 = 8 # 2nd layer number of neurons
n_input = X_train.shape[1] # number of input features
n_classes = 1 # single continuous output for regression

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
pred = multilayer_perceptron(x, weights, biases)

cost = tf.reduce_mean(tf.pow(pred - y, 2)) / 2 # halved mean squared error
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(X_train.shape[0] / batch_size)
        # Initializing generators
        train_gen = group_list(X_train, batch_size)
        y_train_gen = group_list(y_train, batch_size)
        # Loop over all batches
        try:
            for i in range(total_batch):
                batch_x, batch_y = next(train_gen), next(y_train_gen).reshape(-1, 1)
                # Run optimization op (backprop) and cost op (to get loss value)
                _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
                # Compute average loss
                avg_cost += c / total_batch
        except StopIteration:
            pass
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model: score R^2 on the validation set. (The argmax-based accuracy
    # from the original classification tutorial doesn't apply here, since a
    # single-output regression has nothing to argmax over.)
    val_preds = pred.eval(feed_dict={x: X_val}, session=sess)
    print("Validation R^2:", r2_score(y_val.reshape(-1, 1), val_preds))
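The group_list helper imported from helper_funcs isn't shown in this post, so here is only a guess at what it does, based on how the training loop consumes it: a generator that yields successive batch_size-sized chunks of an array.

```python
import numpy as np

def group_list(data, batch_size):
    """Yield successive batch_size-sized chunks of an array.

    Hypothetical reconstruction of the helper_funcs import above; the
    real implementation may differ (e.g. it might shuffle first).
    """
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

# e.g. group_list(np.arange(10), 3) yields chunks of sizes 3, 3, 3, 1,
# which is why the training loop catches StopIteration on the last batch.
```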

You can find the full code on my GitHub if you want to implement it yourself. The model performs relatively alright (the highest R^2 score I've gotten so far is ~0.41), since I'm working with a very small dataset and should be using other techniques to regularize the model harder, but it's a fun start.
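One of those regularization techniques would be dropout: randomly zeroing out hidden activations during training so the network can't lean too hard on any one neuron. This sketch shows the idea in plain numpy ("inverted" dropout, where surviving activations are rescaled so the expected value is unchanged); in the TF graph above it would instead apply to layer_1 and layer_2 during training only. All names here are mine, not from the post.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: keep each activation with probability keep_prob,
    zero it otherwise, and rescale survivors by 1/keep_prob so the
    expected activation is unchanged between train and test time."""
    mask = rng.binomial(1, keep_prob, size=activations.shape)
    return activations * mask / keep_prob
```

At test time you simply skip the mask entirely; because of the 1/keep_prob rescaling, no further correction is needed.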