
I've gone through some of the CNTK Python tutorials and I'm trying to write an extremely basic one-layer neural network that can compute a logical AND. I have functioning code, but the network isn't learning - in fact, the loss gets worse and worse with each minibatch trained.

import numpy as np
from cntk import Trainer
from cntk.learner import sgd
from cntk import ops
from cntk.utils import get_train_eval_criterion, get_train_loss
input_dimensions = 2
# Define the training set
input_data = np.array([
 [0, 0], 
 [0, 1],
 [1, 0],
 [1, 1]], dtype=np.float32)
# Each index matches with an index in input data
correct_answers = np.array([[0], [0], [0], [1]])
# Create the input layer
net_input = ops.input_variable(2, np.float32)
weights = ops.parameter(shape=(2, 1))
bias = ops.parameter(shape=(1))
network_output = ops.times(net_input, weights) + bias
# Set up training
expected_output = ops.input_variable((1), np.float32)
loss_function = ops.cross_entropy_with_softmax(network_output, expected_output)
eval_error = ops.classification_error(network_output, expected_output)
learner = sgd(network_output.parameters, lr=0.02)
trainer = Trainer(network_output, loss_function, eval_error, [learner])
minibatch_size = 4
num_samples_to_train = 1000
num_minibatches_to_train = int(num_samples_to_train/minibatch_size)
training_progress_output_freq = 20
def print_training_progress(trainer, mb, frequency, verbose=1):
    training_loss, eval_error = "NA", "NA"
    if mb % frequency == 0:
        training_loss = get_train_loss(trainer)
        eval_error = get_train_eval_criterion(trainer)
        if verbose:
            print("Minibatch: {0}, Loss: {1:.4f}, Error: {2:.2f}".format(
                mb, training_loss, eval_error))
    return mb, training_loss, eval_error

for i in range(0, num_minibatches_to_train):
    trainer.train_minibatch({net_input: input_data, expected_output: correct_answers})
    batchsize, loss, error = print_training_progress(trainer, i, training_progress_output_freq, verbose=1)

Sample training output

Minibatch: 0, Loss: -164.9998, Error: 0.75
Minibatch: 20, Loss: -166.0998, Error: 0.75
Minibatch: 40, Loss: -167.1997, Error: 0.75
Minibatch: 60, Loss: -168.2997, Error: 0.75
Minibatch: 80, Loss: -169.3997, Error: 0.75
Minibatch: 100, Loss: -170.4996, Error: 0.75
Minibatch: 120, Loss: -171.5996, Error: 0.75
Minibatch: 140, Loss: -172.6996, Error: 0.75
Minibatch: 160, Loss: -173.7995, Error: 0.75
Minibatch: 180, Loss: -174.8995, Error: 0.75
Minibatch: 200, Loss: -175.9995, Error: 0.75
Minibatch: 220, Loss: -177.0994, Error: 0.75
Minibatch: 240, Loss: -178.1993, Error: 0.75

I'm not really sure what's going on here. The error is stuck at 0.75, which, I think, means the network is performing no better than chance. I'm uncertain whether I've misunderstood a requirement of ANN architecture, or if I'm misusing the library.

Any help would be appreciated.

asked Nov 11, 2016 at 21:35

1 Answer


You are trying to solve a binary classification problem with a softmax as your final layer. Softmax is not the right choice here; it is only effective for multiclass problems (3 or more classes).

For binary classification problems, you should make the following two modifications (sketched after the list):

  • Add a sigmoid layer to your output (this will make your output look like a probability)
  • Use binary_cross_entropy as your criterion (you will have to be on at least this release)
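
A minimal sketch of those two changes, reusing the variable names from the question (where exactly binary_cross_entropy is exposed, e.g. under cntk.ops, depends on the release you are on):

# Squash the linear output into (0, 1) so it can be read as P(AND == 1)
network_output = ops.sigmoid(ops.times(net_input, weights) + bias)
# Binary cross-entropy expects a single probability-like output and a 0/1 label
loss_function = ops.binary_cross_entropy(network_output, expected_output)
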
answered Nov 12, 2016 at 2:19

4 Comments

This sounds right, but I'm getting an error now. I've switched to binary cross entropy and added a sigmoid function around the output layer, but when I run it, I get the error RuntimeError: __v2libuid__Logistic10__v2libname__Logistic10 Logistic operation cannot compute the gradient for its first input. on the train_minibatch call.
Yikes! Do you mind changing your correct_answers to be dtype=np.float32 and seeing if the problem persists?
No luck - I still get the same message.
Well, that's a bug (this piece is brand new and apparently not exposing the existing functionality properly). For now, here's a workaround: switch your loss to squared_error and keep the sigmoid. This seems to work for me (the loss goes down and has a sane value). The error rate will not go down because classification_error currently only works for multiclass problems. Until we fix it, you can write a simple expression instead, such as eval_error = ops.not_equal(ops.greater(network_output, 0.5), expected_output). Sorry for the inconvenience. The answer I gave will probably be correct after the next release ;-)
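
Put together, the workaround from the last comment looks roughly like this (a sketch only, reusing the question's variable names):

# Keep the sigmoid output, but train against squared error as a temporary loss
network_output = ops.sigmoid(ops.times(net_input, weights) + bias)
loss_function = ops.squared_error(network_output, expected_output)
# Hand-rolled error metric: threshold the sigmoid output at 0.5 and compare to the label
eval_error = ops.not_equal(ops.greater(network_output, 0.5), expected_output)
# (as suggested above, correct_answers should also be dtype=np.float32)
learner = sgd(network_output.parameters, lr=0.02)
trainer = Trainer(network_output, loss_function, eval_error, [learner])
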
