
I am using TensorFlow to implement object recognition. I followed this tutorial but used my own dataset: https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html#deep-mnist-for-experts

I used 212 positive samples and 120 negative samples for training. The test set contains 100 positive and 20 negative samples. The training precision is only 32.15%, but the test precision is 83.19%.

I am wondering what makes the test precision higher than the training precision. Is my dataset not large enough, so the data has no statistical meaning? Or is this a general thing? I saw some people say the training precision doesn't mean much, but why is that?

asked Jun 1, 2016 at 20:21
  • Based on your description, the difference in positive/negative ratio can clearly produce such observations (the training set is ~64% positive, the test set ~83% positive)! Are you using sample weights / class weights during training (techniques for working with imbalanced data)? See the weighted-loss sketch after these comments. Commented Jun 1, 2016 at 21:00
  • No, I didn't use those techniques. I changed my ratio to be the same (training: 212 pos, 80 neg; test: 100 pos, 40 neg). The training precision becomes 0.27 and the test precision 0.71... Should I enlarge my negative samples so there are twice as many negatives as positives in both training and test? Commented Jun 2, 2016 at 7:58
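For reference, one way to apply class weights to a softmax cross-entropy loss in TensorFlow is to scale each example's loss by a weight for its true class. This is a minimal sketch, not the tutorial's code: the placeholders, the [negative, positive] column order, and the weight values are assumptions, and argument names vary slightly across TensorFlow versions.

    import tensorflow as tf

    # Stand-ins for the network's unnormalized scores and the one-hot labels;
    # in the tutorial these would be the last layer's logits and y_.
    logits = tf.placeholder(tf.float32, [None, 2])
    y_ = tf.placeholder(tf.float32, [None, 2])  # assumed column order: [negative, positive]

    # Up-weight the under-represented negative class so both classes contribute
    # comparably to the loss; 212/120 is the pos/neg count ratio from the question.
    class_weights = tf.constant([212.0 / 120.0, 1.0])

    per_example_loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_)
    example_weights = tf.reduce_sum(y_ * class_weights, 1)  # weight of each example's true class
    weighted_loss = tf.reduce_mean(per_example_loss * example_weights)

The weighted loss can then be handed to the same optimizer the tutorial uses, in place of the unweighted cross-entropy.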

1 Answer


There are two problems here.

First, precision is not a very good measure of performance when classes are unbalanced.

Second, and more importantly, you have a bad ratio of negatives to positives in the test set. Your test set should come from the same process as the training one, but in your case negatives are ~36% of the training set and only ~17% of the test set. Not very surprisingly, a classifier which simply answers "true" for every single input will get 83% precision on your test set (as positives are 83% of that data).
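To make that concrete, here is a small worked example in plain Python using the counts from the question: a degenerate classifier that answers "positive" for every sample already reaches ~83% precision on the 100/20 test set (and ~64% on the 212/120 training set) without learning anything.

    # Precision = TP / (TP + FP). If every sample is predicted positive,
    # every true positive is a TP and every negative is an FP.
    def precision_if_always_positive(n_pos, n_neg):
        tp, fp = n_pos, n_neg
        return tp / float(tp + fp)

    print(precision_if_always_positive(212, 120))  # training split -> ~0.639
    print(precision_if_always_positive(100, 20))   # test split     -> ~0.833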

Thus it is not a matter of the number of test samples; it is a matter of incorrectly constructed training/test datasets. I can also imagine there are more issues with this split; there is probably a completely different structure in train and in test.
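Not from the original answer, but one common way to build training and test sets that come from the same process is a stratified split, which preserves the class ratio in both parts. A minimal sketch with scikit-learn (0.18+), where X and y are stand-ins for the full dataset:

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Stand-in data: one row per flattened 63*63 image, 0/1 labels,
    # 312 positives and 140 negatives (the question's totals).
    X = np.random.rand(452, 3969)
    y = np.array([1] * 312 + [0] * 140)

    # stratify=y keeps the positive/negative ratio the same in train and test.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)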

answered Jun 1, 2016 at 21:19

5 Comments

I changed my ratio to be the same this time (training: 212 pos, 80 neg; test: 100 pos, 40 neg). The training precision becomes 0.27 and the test precision 0.71... The classifier indeed simply answers "true" for every input. Should I enlarge my negative samples so there are twice as many negatives as positives in both training and test?
Stop using precision and use accuracy. You will see that you are simply overfitting (or underfitting).
Thank you very much, and sorry for the late response. I changed it to accuracy and reconstructed my dataset (training: 212 pos, 534 neg; test: 100 pos, 270 neg), and found that even during training the classifier recognizes every sample as positive after several iterations. The image size is 63*63 = 3969 pixels. Is this underfitting? Would adding more layers help? I am very new to machine learning, thank you!
Do not start learning ML from neural networks. It's like learning physics by starting with quantum mechanics... Start with the basics, understand what is going on, and leave NNs for later. They are really tricky to train, and you won't learn much by just following someone's instructions instead of understanding what is going on.
Thank you for the reminder, but this is a project I have to complete, so... It seems I found the root cause of why the classifier simply labels every test sample as 'True': the positive samples contain the specific patterns I want to detect, while the negative samples are just random background, which can look quite similar to the positives. The algorithm can't find a good way to describe the negative-sample pattern. Since softmax regression assigns the class with the highest probability, my last question is: can I set a probability threshold to filter them? (A thresholding sketch follows below.)
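Regarding that last question: a probability threshold can be applied by comparing the softmax output for the positive class against a cutoff instead of taking the argmax. A minimal sketch; the name y_conv follows the tutorial's softmax output, while the placeholder, the column order, and the 0.9 cutoff are assumptions:

    import tensorflow as tf

    # Stand-in for the tutorial's softmax output y_conv, shape [batch, 2];
    # column 1 is assumed to hold the positive-class probability.
    y_conv = tf.placeholder(tf.float32, [None, 2])

    threshold = 0.9  # arbitrary cutoff; tune it on a held-out validation set
    pos_prob = y_conv[:, 1]
    # Predict positive (1) only when the model is at least `threshold` confident.
    prediction = tf.cast(tf.greater_equal(pos_prob, threshold), tf.int32)

Whether a threshold helps depends on the negatives being separable at all; if the negative class is just random background that resembles the positives, collecting more representative negatives is usually the more robust fix.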
