
I am training a model and got the plot below. It is trained on audio (about 70K clips of roughly 5-10 s each) with no augmentation. I have tried the following to avoid overfitting:

  • Reduce the complexity of the model by cutting the number of GRU cells and hidden dimensions (a minimal sketch of such a reduced model follows this list).
  • Add dropout in each layer.
  • Train on a larger dataset.
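
For concreteness, here is a minimal, self-contained sketch of a reduced GRU model with dropout of the kind described above; the feature size, hidden size, number of layers, and the 29 output classes are assumptions, since the actual architecture is not shown in the question.

import torch
import torch.nn as nn

class SmallGRUModel(nn.Module):
    # Hypothetical dimensions; the real model is not shown in the question.
    def __init__(self, n_feats=128, hidden=256, n_layers=2, n_classes=29, p_drop=0.3):
        super().__init__()
        # nn.GRU applies `dropout` only between stacked layers (needs num_layers > 1)
        self.gru = nn.GRU(n_feats, hidden, num_layers=n_layers,
                          dropout=p_drop, batch_first=True, bidirectional=True)
        self.drop = nn.Dropout(p_drop)            # extra dropout on the GRU output
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                         # x: (batch, time, n_feats)
        out, _ = self.gru(x)                      # (batch, time, 2 * hidden)
        return self.fc(self.drop(out))            # (batch, time, n_classes)

x = torch.randn(4, 50, 128)                       # dummy batch of spectrogram frames
print(SmallGRUModel()(x).shape)                   # torch.Size([4, 50, 29])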

What I am not sure about is whether my calculation of the training and validation loss is correct. It looks roughly like this; I am using drop_last=True in the DataLoader and the CTC loss criterion.

train_data_len = len(train_loader.dataset)
valid_data_len = len(valid_loader.dataset)
train_losses = []
valid_losses = []

model.train()
for e in range(n_epochs):
    t0 = time.time()
    running_loss = 0.0
    # batch loop
    for batch_idx, _data in enumerate(train_loader, 1):
        # unpack _data into specs, labels, input_lengths, label_lengths
        # and run the forward pass to get `output` ...
        optimizer.zero_grad()   # present in the full code, omitted from the snippet
        loss = criterion(output, labels, input_lengths, label_lengths)  # CTC targets are integer indices, not floats
        loss.backward()
        optimizer.step()
        scheduler.step()
        # accumulate the sum of per-sample losses for this epoch
        running_loss += loss.item() * specs.size(0)

    t_t = time.time() - t0

    ######################
    # validate the model #
    ######################
    with torch.no_grad():
        model.eval()
        tv = time.time()
        running_val_loss = 0.0
        for batch_idx_v, _data in enumerate(valid_loader, 1):
            # forward pass as above ...
            val_loss = criterion(output, labels, input_lengths, label_lengths)
            running_val_loss += val_loss.item() * specs.size(0)

    print("Epoch {}: Training took {:.2f} [s]\tValidation took: {:.2f} [s]\n".format(
        e + 1, t_t, time.time() - tv))

    epoch_train_loss = running_loss / train_data_len
    epoch_val_loss = running_val_loss / valid_data_len
    train_losses.append(epoch_train_loss)
    valid_losses.append(epoch_val_loss)
    print('Epoch: {} Losses\tTraining Loss: {:.6f}\tValidation Loss: {:.6f}'.format(
        e + 1, epoch_train_loss, epoch_val_loss))
    model.train()
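
For reference, a minimal, self-contained sketch of how nn.CTCLoss(blank=28, zero_infinity=True) (the criterion given in the comments below) is typically called; the shapes, the 29-symbol alphabet, and the fixed target length are illustrative assumptions. Note that the targets are integer class indices, not floats.

import torch
import torch.nn as nn

criterion = nn.CTCLoss(blank=28, zero_infinity=True)

T, N, C, S = 50, 4, 29, 10                      # time steps, batch size, classes (28 labels + blank), target length
output = torch.randn(T, N, C).log_softmax(2)    # (T, N, C) log-probabilities from the network
labels = torch.randint(0, 28, (N, S), dtype=torch.long)    # integer targets; never the blank index
input_lengths = torch.full((N,), T, dtype=torch.long)
label_lengths = torch.full((N,), S, dtype=torch.long)

loss = criterion(output, labels, input_lengths, label_lengths)
print(loss.item())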

[plot: training and validation loss curves over epochs]

  • Check whether you missed optimizer.zero_grad() in the training loop. Commented Jun 29, 2020 at 16:30
  • No, I didn't miss it; otherwise the training loss wouldn't decrease. I omitted it from the snippet to keep it simple. Commented Jun 29, 2020 at 16:31
  • Which loss criterion are you using? The way you are using train_data_len and valid_data_len is wrong if you are using drop_last=True in the DataLoader; in that case the denominator should be the number of batches times the batch size. Commented Jun 29, 2020 at 18:09
  • Yes, I am using drop_last=True; otherwise, when the dataset length wasn't a multiple of the batch size, it would have given me an error. criterion = nn.CTCLoss(blank=28, zero_infinity=True) Commented Jun 29, 2020 at 18:16
  • Okay, but the batch size is not equal to len(train_loader.dataset). How big is your batch_size? Print out len(train_loader.dataset) and give me that information too. Commented Jun 29, 2020 at 18:31
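
Regarding the drop_last discussion above, here is a small self-contained sketch (with made-up sizes) of how drop_last=True changes the number of samples that actually go through the loop, and hence the denominator for the epoch-average loss:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(70, 8))                 # hypothetical dataset of 70 samples
loader = DataLoader(dataset, batch_size=16, drop_last=True)

samples_seen = sum(batch[0].size(0) for batch in loader)
print(len(dataset), len(loader) * 16, samples_seen)         # 70 64 64

# Dividing a sum of per-sample losses by len(dataset) (70) instead of the
# 64 samples actually processed makes the reported epoch loss slightly too small.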
