
I've gone through some of the CNTK Python tutorials and I'm trying to write an extremely basic one-layer neural network that can compute a logical AND. I have functioning code, but the network isn't learning - in fact, the loss gets worse and worse with each minibatch trained.

import numpy as np
from cntk import Trainer
from cntk.learner import sgd
from cntk import ops
from cntk.utils import get_train_eval_criterion, get_train_loss
input_dimensions = 2
# Define the training set
input_data = np.array([
 [0, 0], 
 [0, 1],
 [1, 0],
 [1, 1]], dtype=np.float32)
# Each index matches with an index in input data
correct_answers = np.array([[0], [0], [0], [1]])
# Create the input layer
net_input = ops.input_variable(2, np.float32)
weights = ops.parameter(shape=(2, 1))
bias = ops.parameter(shape=(1))
network_output = ops.times(net_input, weights) + bias
# Set up training
expected_output = ops.input_variable((1), np.float32)
loss_function = ops.cross_entropy_with_softmax(network_output, expected_output)
eval_error = ops.classification_error(network_output, expected_output)
learner = sgd(network_output.parameters, lr=0.02)
trainer = Trainer(network_output, loss_function, eval_error, [learner])
minibatch_size = 4
num_samples_to_train = 1000
num_minibatches_to_train = int(num_samples_to_train/minibatch_size)
training_progress_output_freq = 20
def print_training_progress(trainer, mb, frequency, verbose=1):
    training_loss, eval_error = "NA", "NA"
    if mb % frequency == 0:
        training_loss = get_train_loss(trainer)
        eval_error = get_train_eval_criterion(trainer)
        if verbose:
            print("Minibatch: {0}, Loss: {1:.4f}, Error: {2:.2f}".format(
                mb, training_loss, eval_error))
    return mb, training_loss, eval_error

for i in range(0, num_minibatches_to_train):
    trainer.train_minibatch({net_input: input_data, expected_output: correct_answers})
    batchsize, loss, error = print_training_progress(trainer, i, training_progress_output_freq, verbose=1)

Sample training output

Minibatch: 0, Loss: -164.9998, Error: 0.75
Minibatch: 20, Loss: -166.0998, Error: 0.75
Minibatch: 40, Loss: -167.1997, Error: 0.75
Minibatch: 60, Loss: -168.2997, Error: 0.75
Minibatch: 80, Loss: -169.3997, Error: 0.75
Minibatch: 100, Loss: -170.4996, Error: 0.75
Minibatch: 120, Loss: -171.5996, Error: 0.75
Minibatch: 140, Loss: -172.6996, Error: 0.75
Minibatch: 160, Loss: -173.7995, Error: 0.75
Minibatch: 180, Loss: -174.8995, Error: 0.75
Minibatch: 200, Loss: -175.9995, Error: 0.75
Minibatch: 220, Loss: -177.0994, Error: 0.75
Minibatch: 240, Loss: -178.1993, Error: 0.75

I'm not really sure what's going on here. The error is stuck at 0.75, which, I think, means the network is performing no better than chance. I'm uncertain whether I've misunderstood a requirement of ANN architecture, or if I'm misusing the library.

Any help would be appreciated.

asked Nov 11, 2016 at 21:35

1 Answer


You are trying to solve a binary classification problem with a softmax as your final layer. Softmax is not the right choice here; it is only effective for multiclass problems (3 or more classes).

For binary classification problems, you should make the following two modifications (sketched after the list):

  • Add a sigmoid layer to your output (this will make your output look like a probability)
  • Use binary_cross_entropy as your criterion (you will have to be on at least this release)
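
A minimal sketch of those two changes, reusing the variable names from the question (where exactly binary_cross_entropy is exposed, e.g. under cntk.ops, depends on the release you are on):

# Squash the linear output into (0, 1) so it can be read as P(AND == 1)
network_output = ops.sigmoid(ops.times(net_input, weights) + bias)
# Binary cross-entropy expects a single probability-like output and a 0/1 label
loss_function = ops.binary_cross_entropy(network_output, expected_output)
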
answered Nov 12, 2016 at 2:19

4 Comments

This sounds right, but I'm getting an error now. I've switched to binary cross entropy and added a sigmoid function around the output layer, but when I run it, I get the error RuntimeError: __v2libuid__Logistic10__v2libname__Logistic10 Logistic operation cannot compute the gradient for its first input. on the train_minibatch call.
Yikes! Do you mind changing your correct_answers to be dtype=np.float32 and seeing if the problem persists?
No luck - I still get the same message.
Well, that's a bug (this piece is brand new and apparently not exposing the existing functionality properly). For now, here's a workaround: switch your loss to squared_error and keep the sigmoid. This seems to work for me (the loss goes down and has a sane value). The error rate will not go down because classification_error currently only works for multiclass problems. Until we fix it, you can write a simple expression instead, such as eval_error = ops.not_equal(ops.greater(network_output, 0.5), expected_output). Sorry for the inconvenience. The answer I gave will probably be correct after the next release ;-)
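
Put together, the workaround from the last comment looks roughly like this (a sketch only, reusing the question's variable names):

# Keep the sigmoid output, but train against squared error as a temporary loss
network_output = ops.sigmoid(ops.times(net_input, weights) + bias)
loss_function = ops.squared_error(network_output, expected_output)
# Hand-rolled error metric: threshold the sigmoid output at 0.5 and compare to the label
eval_error = ops.not_equal(ops.greater(network_output, 0.5), expected_output)
# (as suggested above, correct_answers should also be dtype=np.float32)
learner = sgd(network_output.parameters, lr=0.02)
trainer = Trainer(network_output, loss_function, eval_error, [learner])
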
