
I am training an LSTM to predict a time series. I have tried an encoder-decoder, without any dropout. I divided my data into 70% training and 30% validation; the training and validation sets contain around 107 and 47 points respectively. However, the training loss is always greater than the validation loss. Below is the code.

    # Imports assumed from context (not shown in the original post)
    import numpy
    import tensorflow
    from numpy.random import seed
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Bidirectional, LSTM, RepeatVector, TimeDistributed, Dense
    from tensorflow.keras.optimizers import SGD, Adam
    from tensorflow.keras.callbacks import EarlyStopping
    from tqdm.keras import TqdmCallback
    import matplotlib.pyplot as plt

    seed(12346)
    tensorflow.random.set_seed(12346)

    Lrn_Rate = 0.0005
    Momentum = 0.8
    sgd = SGD(lr=Lrn_Rate, decay=1e-6, momentum=Momentum, nesterov=True)
    adam = Adam(lr=Lrn_Rate, beta_1=0.9, beta_2=0.999, amsgrad=False)  # alternative optimizer, unused here
    optimizernme = sgd
    optimizernmestr = 'sgd'
    # Note: this monitors the training loss; monitor='val_loss' is the more common choice
    callbacks = EarlyStopping(monitor='loss', patience=50, restore_best_weights=True)

    # Reshape inputs and targets to (samples, timesteps, features)
    train_X1 = numpy.reshape(train_X1, (train_X1.shape[0], train_X1.shape[1], 1))
    test_X1 = numpy.reshape(test_X1, (test_X1.shape[0], test_X1.shape[1], 1))
    train_Y1 = train_Y1.reshape((train_Y1.shape[0], train_Y1.shape[1], 1))
    test_Y1 = test_Y1.reshape((test_Y1.shape[0], test_Y1.shape[1], 1))

    Hiddenunits = 240  # 90, 120 worked for us uk
    DenseUnits = 100
    n_features = 1
    n_timesteps = look_back  # look_back is defined earlier in the script

    # Encoder
    model = Sequential()
    model.add(Bidirectional(LSTM(Hiddenunits, activation='relu', return_sequences=True),
                            input_shape=(n_timesteps, n_features)))
    model.add(Bidirectional(LSTM(Hiddenunits, activation='relu', return_sequences=False)))
    model.add(RepeatVector(1))
    # Decoder
    model.add(Bidirectional(LSTM(Hiddenunits, activation='relu', return_sequences=True)))
    model.add(Bidirectional(LSTM(Hiddenunits, activation='relu', return_sequences=True)))
    model.add(TimeDistributed(Dense(DenseUnits, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mean_squared_error', optimizer=optimizernme)

    # validation_data must be passed as a keyword argument;
    # batchsize and modelcaption are defined earlier in the script
    history = model.fit(train_X1, train_Y1, validation_data=(test_X1, test_Y1),
                        batch_size=batchsize, epochs=250,
                        callbacks=[callbacks, TqdmCallback(verbose=0)],
                        shuffle=True, verbose=0)

    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss ' + modelcaption)
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

The training loss comes out greater than the validation loss: the training loss is approximately 0.02 while the validation loss is approximately 0.004 (please see the attached picture). I have tried many things, including dropout and adding more hidden units, but nothing solved the problem. Any comments or suggestions are appreciated.

asked Aug 9, 2020 at 23:14
  • You said the issue is that validation loss is greater than the training loss, but your graph shows that the validation loss is lower. Commented Aug 10, 2020 at 2:24
  • Thanks for pointing that out; it was written by mistake. Commented Aug 10, 2020 at 10:23

1 Answer


Better performance on the validation set than on the training set is an indication that your model is overfitting to the training data. It sounds like you have tried the model architecture both with and without a dropout layer (which combats overfitting), but the results were similar; a sketch of where dropout typically goes follows below.
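For reference, here is a minimal sketch of one way to place Dropout layers in an encoder-decoder like the one in the question. The 0.2 rate, the layer sizes, and the timestep count are assumptions for illustration, not values confirmed by the question:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (Bidirectional, LSTM, RepeatVector,
                                         TimeDistributed, Dense, Dropout)

    n_timesteps, n_features = 30, 1  # assumed values for illustration

    model = Sequential([
        Bidirectional(LSTM(240, activation='relu', return_sequences=True),
                      input_shape=(n_timesteps, n_features)),
        Dropout(0.2),  # randomly zeroes 20% of activations, during training only
        Bidirectional(LSTM(240, activation='relu')),
        RepeatVector(1),
        Bidirectional(LSTM(240, activation='relu', return_sequences=True)),
        Dropout(0.2),
        TimeDistributed(Dense(1)),
    ])
    model.compile(loss='mean_squared_error', optimizer='adam')

Note that dropout is disabled when the validation loss is computed, which by itself can make the training loss look worse than the validation loss.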

One possibility is data leakage, which is when information from outside the training set is used in creating the training data. For example, if you normalized or standardized all of the data at once, instead of fitting the scaler on the training set and then applying it to the validation set, then you are implicitly using the validation data in your training, which would lead to the model overfitting.
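As a hedged illustration (the question does not show how the series was scaled, so MinMaxScaler and the stand-in series below are assumptions), the fix is to fit the scaler on the training split only and reuse its statistics on the validation split:

    import numpy
    from sklearn.preprocessing import MinMaxScaler

    series = numpy.arange(154, dtype=float).reshape(-1, 1)  # stand-in for the real series
    split = int(len(series) * 0.7)                          # roughly 107 train / 47 validation points
    train, test = series[:split], series[split:]

    scaler = MinMaxScaler()
    train_scaled = scaler.fit_transform(train)  # statistics computed from train only
    test_scaled = scaler.transform(test)        # reuse train statistics; no leakage

Fitting the scaler on the full series would let the validation set's minimum and maximum influence the training inputs, which is exactly the leakage described above.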

answered Aug 9, 2020 at 23:26

5 Comments

No, overfitting is exactly the opposite, good performance on the training set, bad performance on the test/validation set.
If I am misunderstanding something, then I'll delete my answer. But if the validation loss is always greater than training loss as @Nhqazi says, then doesn't that mean there is good performance on the training set and comparatively worse performance on the test/validation set?
Yes but what happens in this question is exactly the opposite.
@Dr.snoppy why do you think there is bad performance on the test/validation set when the validation loss is less than the training loss? Please see the attached image.
@Nhqazi My point is that this is not overfitting; you cannot claim overfitting because the condition for it does not occur (low training loss, with a validation loss higher than the training loss by a large margin). In your plot, the train and test curves would have to be swapped for this to be a valid overfitting scenario.
