I am trying to build a simple neural network with TensorFlow. The goal is to find the center of a rectangle in a 32 pixel x 32 pixel image. The rectangle is described by five vectors. The first vector is the position vector, the other four are direction vectors and make up the rectangle edges. Each vector has two values (x and y).
The corresponding input for this image would be (2,5)(0,4)(6,0)(0,-4)(-6,0). The center (and therefore the desired output) is located at (5,7).
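For reference, the center follows from these vectors as the position plus half the sum of the two adjacent edge vectors; a quick sanity check in plain Python (this is not part of Rectangle_Records, just an illustration of the arithmetic):

import numpy as np

pos = np.array([2, 5])      # position vector (first corner)
edge_a = np.array([0, 4])   # first edge direction
edge_b = np.array([6, 0])   # second edge direction

center = pos + 0.5 * (edge_a + edge_b)
print(center)  # [5. 7.]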
The code I came up with looks like the following:
import tensorflow as tf
import numpy as np
import Rectangle_Records
def init_weights(shape):
    """ Weight initialization """
    weights = tf.random_normal(shape, stddev=0.1)
    return tf.Variable(weights)

def forwardprop(x, w_1, w_2):
    """ Forward-propagation """
    h = tf.nn.sigmoid(tf.matmul(x, w_1))
    y_predict = tf.matmul(h, w_2)
    return y_predict

def main():
    x_size = 10
    y_size = 2
    h_1_size = 256

    # Prepare input data
    input_data = Rectangle_Records.DataSet()

    x = tf.placeholder(tf.float32, shape=[None, x_size])
    y_label = tf.placeholder(tf.float32, shape=[None, y_size])

    # Weight initializations
    w_1 = init_weights((x_size, h_1_size))
    w_2 = init_weights((h_1_size, y_size))

    # Forward propagation
    y_predict = forwardprop(x, w_1, w_2)

    # Backward propagation
    cost = tf.reduce_mean(tf.square(y_predict - y_label))
    updates = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

    # Run
    sess = tf.Session()
    init = tf.global_variables_initializer()
    sess.run(init)

    for i in range(200):
        batch = input_data.next_batch(10)
        sess.run(updates, feed_dict={x: batch[0], y_label: batch[1]})

    sess.close()

if __name__ == "__main__":
    main()
Sadly, the network won't learn properly. The result is too far off. For example, [[ 3.74561882 , 3.70766664]] when it should be around [[ 3. , 7.]]. What am I doing wrong?
- are you tracking and plotting R^2 or MSE as you go? – Mohammad Athar, Jul 12, 2017 at 15:59
- No, I wouldn't know how. I find this rather difficult with TensorFlow. – Gizmo, Jul 12, 2017 at 16:03
- you can do it manually against a test data set that you've reserved for that purpose. I ask because 3.71 seems close to 3.7 – Mohammad Athar, Jul 12, 2017 at 16:13
- It's 3.74561882 to 3. and 3.70766664 to 7. – Gizmo, Jul 12, 2017 at 16:15
- Can you share the whole code as well as the dataset with me? – Tarun Wadhwa, Jul 12, 2017 at 16:17
4 Answers
The main problem is that your whole training runs for only one epoch; that's not enough training. Try the following changes:
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for j in range(30):
    input_data = Rectangle_Records.DataSet()
    for i in range(200):
        batch = input_data.next_batch(10)
        loss, _ = sess.run([cost, updates], feed_dict={x: batch[0], y_label: batch[1]})

    pred = sess.run(y_predict, feed_dict={x: batch[0]})
    print('Cost:', loss)
    print('pred:', pred)
    print('actual:', batch[1])

sess.close()
Also, change your optimizer to a momentum-based optimizer such as Adam for faster convergence: updates = tf.train.AdamOptimizer(0.01).minimize(cost)
You have forgotten to add bias.
def init_bias(shape):
    """ Bias initialization """
    biases = tf.random_normal(shape)
    return tf.Variable(biases)

def forwardprop(x, w_1, w_2, b_1, b_2):
    """ Forward-propagation """
    h = tf.nn.sigmoid(tf.matmul(x, w_1) + b_1)
    y_predict = tf.matmul(h, w_2) + b_2
    return y_predict
Inside main, change it to this:
w_1 = init_weights((x_size, h_1_size))
w_2 = init_weights((h_1_size, y_size))
b_1 = init_bias((h_1_size,))
b_2 = init_bias((y_size,))
# Forward propagation
y_predict = forwardprop(x, w_1, w_2, b_1, b_2)
This will give you much better accuracy. You can then try adding more layers, different activation functions, etc., as the other answers mention, to improve it further.
There are lots of ways to improve the performance of a neural net. Try one or more of the following:
- add more layers, or more nodes per layer (see the sketch after this list)
- change your activation function (I've found relu to be quite effective)
- use an ensemble of NNs where each NN gets a vote weighted by its R^2 score
- bring in more training data
- perform a grid search to optimize parameters
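For example, a deeper variant of the question's forwardprop with ReLU hidden layers could look like the sketch below. The layer sizes are arbitrary examples, and it assumes the init_weights helper from the question and the init_bias helper from the answer above:

def forwardprop_deep(x, w_1, b_1, w_2, b_2, w_3, b_3):
    """ Two ReLU hidden layers followed by a linear output layer """
    h_1 = tf.nn.relu(tf.matmul(x, w_1) + b_1)
    h_2 = tf.nn.relu(tf.matmul(h_1, w_2) + b_2)
    y_predict = tf.matmul(h_2, w_3) + b_3
    return y_predict

# weights and biases for a 10 -> 128 -> 64 -> 2 network
w_1, b_1 = init_weights((10, 128)), init_bias((128,))
w_2, b_2 = init_weights((128, 64)), init_bias((64,))
w_3, b_3 = init_weights((64, 2)), init_bias((2,))

y_predict = forwardprop_deep(x, w_1, b_1, w_2, b_2, w_3, b_3)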
The problem your network is trying to learn looks so easy that even a single-layer, two-neuron perceptron should be able to solve it. A ReLU activation function could be the best choice, as the problem is linear.
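A minimal sketch of such a model, assuming the same 10-value input and 2-value output as in the question (one weight matrix plus a bias, no hidden layer):

x = tf.placeholder(tf.float32, [None, 10])
y_label = tf.placeholder(tf.float32, [None, 2])

w = tf.Variable(tf.random_normal([10, 2], stddev=0.1))
b = tf.Variable(tf.zeros([2]))

y_predict = tf.matmul(x, w) + b                        # purely linear model
cost = tf.reduce_mean(tf.square(y_predict - y_label))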
200 iterations isn't much. Try more iterations, like 1000 or more. Print the cost value every 100 iterations, for example, or gather the values and plot them at the end to see how the learning progressed.
import matplotlib.pyplot as plt

cost_history = np.zeros(learning_steps)  # pre-allocate one cost value per step
...
for epoch in range(learning_steps):
    ...
    cost_history[epoch] = sess.run(cost, feed_dict={x: batch[0], y_label: batch[1]})

plt.plot(cost_history, 'r', label='Cost fn')
plt.yscale('log')
plt.show()
If the line goes down, it's fine. If it's very rough and doesn't descend, the learning rate might be too large. In your case the learning rate is quite low, and that's why you don't have good results after as few as 200 iterations. Try a larger value instead, like 0.1 or even more. The NN may still converge. And watch the learning curve.
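For example, to compare a few learning rates you could rerun the training with each value and look at the final cost (a sketch that reuses x, y_label, cost and input_data from the question):

for lr in [0.01, 0.1, 0.5]:
    updates = tf.train.GradientDescentOptimizer(lr).minimize(cost)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())  # re-initialize the weights for each run
    for i in range(1000):
        batch = input_data.next_batch(10)
        loss, _ = sess.run([cost, updates], feed_dict={x: batch[0], y_label: batch[1]})
    print('learning rate %.2f -> final cost %.4f' % (lr, loss))
    sess.close()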