I am using TensorFlow to implement object recognition. I followed this tutorial but used my own dataset: https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html#deep-mnist-for-experts
I used 212 positive samples and 120 negative samples for training. The test set contains 100 positive and 20 negative samples. The training precision is only 32.15%, but the test precision is 83.19%.
I am wondering what makes the test precision higher than the training precision. Is my dataset not large enough? Does the data lack statistical significance? Or is this normal? I saw some people say the training precision doesn't mean anything, but why is that?
-
Based on your description, the difference in the ratio of positive to negative examples is clearly something that can produce such observations (you train on a ~64% positive ratio but test on ~83%)! Are you using sample weights/class weights during training (techniques for working with unbalanced data)? – sascha, Jun 1, 2016 at 21:00
-
No, I didn't use those techniques. I changed my ratios to be the same (training: 212 pos, 80 neg; test: 100 pos, 40 neg). The training precision becomes 0.27 and the test precision 0.71... Should I enlarge my negative samples to twice the number of positive samples in both training and test? – Liang Li, Jun 2, 2016 at 7:58
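The class-weighting technique mentioned in the comments can be sketched as follows. This is a minimal illustration in plain NumPy (not the asker's actual training code): each sample's loss contribution is scaled inversely to its class frequency, so both classes contribute equally overall.

```python
import numpy as np

def class_weights(labels):
    """Per-sample weights that balance an unbalanced binary dataset.

    weight = n / (2 * class_count), so each class contributes
    half of the total weight regardless of its size.
    """
    labels = np.asarray(labels)
    n = len(labels)
    n_pos = labels.sum()
    n_neg = n - n_pos
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * n_neg)
    return np.where(labels == 1, w_pos, w_neg)

# Counts from the question: 212 positives, 120 negatives.
w = class_weights([1] * 212 + [0] * 120)
# The minority class (negatives) receives the larger weight;
# the weights still sum to n, so the overall loss scale is unchanged.
```

These weights would then be multiplied into the per-sample cross-entropy loss before averaging.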
1 Answer
There are two problems here.
First, precision is not a very good measure of performance when classes are unbalanced.
Second, and more importantly, you have a bad ratio of negatives to positives in the test set. Your test set should come from the same process as the training one, but in your case negatives make up ~36% of the training set and only ~17% of the test set. Not very surprisingly, a classifier which simply answers "true" for every single input will get 83% precision on your test set (as positives are 83% of the whole test data).
Thus it is not a matter of the number of test samples; it is a matter of the incorrect construction of the training/test datasets. I can also imagine there are more issues with this split; the train and test sets may have completely different structure.
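The "always true" argument above can be checked with a few lines of arithmetic, using the sample counts from the question. For a classifier that predicts positive on every input, every positive is a true positive and every negative is a false positive, so precision equals the positive-class fraction of whichever set it is evaluated on:

```python
def all_positive_precision(n_pos, n_neg):
    # Predicting positive for every sample: TP = n_pos, FP = n_neg.
    tp, fp = n_pos, n_neg
    return tp / (tp + fp)

print(all_positive_precision(212, 120))  # training set: ~0.64
print(all_positive_precision(100, 20))   # test set: ~0.83
```

This is exactly why the test precision of 83% carries no information here: a degenerate classifier achieves it for free on this split.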