
I am training an LSTM to predict a time series. I have tried an encoder-decoder, without any dropout. I divided my data into 70% training and 30% validation; the training and validation sets contain around 107 and 47 points respectively. However, the training loss is always greater than the validation loss. Below is the code.

    # Imports assumed from context (not shown in the original post)
    import numpy
    import tensorflow
    from numpy.random import seed
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Bidirectional, LSTM, RepeatVector, TimeDistributed, Dense
    from tensorflow.keras.optimizers import SGD, Adam
    from tensorflow.keras.callbacks import EarlyStopping
    from tqdm.keras import TqdmCallback
    import matplotlib.pyplot as plt

    seed(12346)
    tensorflow.random.set_seed(12346)

    Lrn_Rate = 0.0005
    Momentum = 0.8
    sgd = SGD(lr=Lrn_Rate, decay=1e-6, momentum=Momentum, nesterov=True)
    adam = Adam(lr=Lrn_Rate, beta_1=0.9, beta_2=0.999, amsgrad=False)  # alternative optimizer, unused here
    optimizernme = sgd
    optimizernmestr = 'sgd'
    # Note: this monitors the training loss; monitor='val_loss' is the more common choice
    callbacks = EarlyStopping(monitor='loss', patience=50, restore_best_weights=True)

    # Reshape inputs and targets to (samples, timesteps, features)
    train_X1 = numpy.reshape(train_X1, (train_X1.shape[0], train_X1.shape[1], 1))
    test_X1 = numpy.reshape(test_X1, (test_X1.shape[0], test_X1.shape[1], 1))
    train_Y1 = train_Y1.reshape((train_Y1.shape[0], train_Y1.shape[1], 1))
    test_Y1 = test_Y1.reshape((test_Y1.shape[0], test_Y1.shape[1], 1))

    Hiddenunits = 240  # 90, 120 worked for us uk
    DenseUnits = 100
    n_features = 1
    n_timesteps = look_back  # look_back is defined earlier in the script

    # Encoder
    model = Sequential()
    model.add(Bidirectional(LSTM(Hiddenunits, activation='relu', return_sequences=True),
                            input_shape=(n_timesteps, n_features)))
    model.add(Bidirectional(LSTM(Hiddenunits, activation='relu', return_sequences=False)))
    model.add(RepeatVector(1))
    # Decoder
    model.add(Bidirectional(LSTM(Hiddenunits, activation='relu', return_sequences=True)))
    model.add(Bidirectional(LSTM(Hiddenunits, activation='relu', return_sequences=True)))
    model.add(TimeDistributed(Dense(DenseUnits, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mean_squared_error', optimizer=optimizernme)

    # validation_data must be passed as a keyword argument;
    # batchsize and modelcaption are defined earlier in the script
    history = model.fit(train_X1, train_Y1, validation_data=(test_X1, test_Y1),
                        batch_size=batchsize, epochs=250,
                        callbacks=[callbacks, TqdmCallback(verbose=0)],
                        shuffle=True, verbose=0)

    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss ' + modelcaption)
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

The training loss comes out greater than the validation loss: the training loss is approximately 0.02 while the validation loss is approximately 0.004 (please see the attached picture). I have tried many things, including dropout and adding more hidden units, but nothing solved the problem. Any comments or suggestions are appreciated.

asked Aug 9, 2020 at 23:14
  • You said the issue is that validation loss is greater than the training loss, but your graph shows that the validation loss is lower. Commented Aug 10, 2020 at 2:24
  • Thanks for pointing that out; it was written by mistake. Commented Aug 10, 2020 at 10:23

1 Answer


Better performance on the validation set than on the training set is an indication that your model is overfitting to the training data. It sounds like you have tried the model architecture both with and without a dropout layer (which combats overfitting), but the results were similar; a sketch of where dropout typically goes follows below.
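For reference, here is a minimal sketch of one way to place Dropout layers in an encoder-decoder like the one in the question. The 0.2 rate, the layer sizes, and the timestep count are assumptions for illustration, not values confirmed by the question:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (Bidirectional, LSTM, RepeatVector,
                                         TimeDistributed, Dense, Dropout)

    n_timesteps, n_features = 30, 1  # assumed values for illustration

    model = Sequential([
        Bidirectional(LSTM(240, activation='relu', return_sequences=True),
                      input_shape=(n_timesteps, n_features)),
        Dropout(0.2),  # randomly zeroes 20% of activations, during training only
        Bidirectional(LSTM(240, activation='relu')),
        RepeatVector(1),
        Bidirectional(LSTM(240, activation='relu', return_sequences=True)),
        Dropout(0.2),
        TimeDistributed(Dense(1)),
    ])
    model.compile(loss='mean_squared_error', optimizer='adam')

Note that dropout is disabled when the validation loss is computed, which by itself can make the training loss look worse than the validation loss.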

One possibility is data leakage, which is when information from outside the training set is used in creating the training data. For example, if you normalized or standardized all of the data at once, instead of fitting the scaler on the training set and then applying it to the validation set, then you are implicitly using the validation data in your training, which would lead to the model overfitting.
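As a hedged illustration (the question does not show how the series was scaled, so MinMaxScaler and the stand-in series below are assumptions), the fix is to fit the scaler on the training split only and reuse its statistics on the validation split:

    import numpy
    from sklearn.preprocessing import MinMaxScaler

    series = numpy.arange(154, dtype=float).reshape(-1, 1)  # stand-in for the real series
    split = int(len(series) * 0.7)                          # roughly 107 train / 47 validation points
    train, test = series[:split], series[split:]

    scaler = MinMaxScaler()
    train_scaled = scaler.fit_transform(train)  # statistics computed from train only
    test_scaled = scaler.transform(test)        # reuse train statistics; no leakage

Fitting the scaler on the full series would let the validation set's minimum and maximum influence the training inputs, which is exactly the leakage described above.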

answered Aug 9, 2020 at 23:26

5 Comments

No, overfitting is exactly the opposite, good performance on the training set, bad performance on the test/validation set.
If I am misunderstanding something, then I'll delete my answer. But if the validation loss is always greater than training loss as @Nhqazi says, then doesn't that mean there is good performance on the training set and comparatively worse performance on the test/validation set?
Yes but what happens in this question is exactly the opposite.
@Dr.snoppy why do you think there is bad performance on the test/validation set when the validation loss is less than the training loss? Please see the attached image.
@Nhqazi My point is that this is not overfitting; you cannot claim overfitting because the condition for it does not occur (low training loss, with a validation loss higher than the training loss by a large margin). In your plot, the train and test curves would have to be swapped for this to be a valid overfitting scenario.
