Error training DeepSpeech inside Docker with Ubuntu 20.04 integration on Windows 10 (NVIDIA RTX 3090 GPU)

Question 1

I'm working with Mozilla DeepSpeech in a Docker environment and have encountered an error during training. I'm seeking assistance to resolve this issue.

System Setup:

Docker environment on a Windows 10 PC
Using Ubuntu-20-04 in Docker
NVIDIA GPU RTX 3090 with --gpus all flag enabled
CUDA 10.0 Version 10.0.130 with cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.0
Python 3.7.3

Steps Taken:

Pulled the official DeepSpeech training image for Docker (mozilla/deepspeech-train:v0.9.3), following the exact steps in the DeepSpeech Playbook (https://mozilla.github.io/deepspeech-playbook/ENVIRONMENT.html#contents).
Successfully ran the provided script (./bin/run-ldc93s1.sh) in the Docker environment.
Created a custom training script for my dataset.
Ran into problems with file paths, which I resolved by mounting the WSL 2 directory into the Docker container.
Updated script paths to match the mounted directory.
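
For reference, a bind mount along the following lines is one way to make a WSL 2 directory visible inside the container. This is only a sketch: the host path in HOST_DIR is a placeholder, -it is assumed for an interactive shell, and /DeepSpeechData is taken from the paths that find reports further down. The function prints the command rather than running it.

```shell
# HOST_DIR is a hypothetical WSL 2 path; replace it with the real data directory.
HOST_DIR="/mnt/c/Users/me/DeepSpeechData"
# Mount point inside the container (matches the paths reported by find).
CONTAINER_DIR="/DeepSpeechData"

# Print a docker run command that bind-mounts host dir $1 at container path $2.
build_docker_cmd() {
    printf 'docker run --gpus all -it -v "%s:%s" mozilla/deepspeech-train:v0.9.3\n' "$1" "$2"
}

# Dry run: show the command so the mount can be reviewed before starting the container.
build_docker_cmd "$HOST_DIR" "$CONTAINER_DIR"
```

With a mount like this, files live under /DeepSpeechData inside the container, not under /DeepSpeech, so every path passed to DeepSpeech.py has to use the mount point.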

My Script:

```
root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py \
  --train_files /DeepSpeech/CSV/Training/training.csv \
  --dev_files /DeepSpeech/CSV/Validation/dev.csv \
  --test_files /DeepSpeech/CSV/Test/test.csv \
  --alphabet_config_path /DeepSpeech/data/alphabet.txt \
  --scorer_path /DeepSpeech/deepspeech-0.9.3-models.scorer \
  --checkpoint_dir /DeepSpeech/checkpoints_dir \
  --export_dir /DeepSpeech/CSV/exports_dir \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 200 \
  --noshow_progressbar
```

Issue: When running my custom training script, I encounter the following error:

```
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 949, in main
    early_training_checks()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 934, in early_training_checks
    FLAGS.scorer_path, Config.alphabet)
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 36, in __init__
    raise ValueError('Scorer initialization failed with error code 0x{:X}'.format(err))
ValueError: Scorer initialization failed with error code 0x2005
```

Tried looking for the path:

    root@b11bd0a278ee:/DeepSpeech# ls /DeepSpeech/deepspeech-0.9.3-models.scorer
    ls: cannot access '/DeepSpeech/deepspeech-0.9.3-models.scorer': No such file or directory

Found the path:

    root@b11bd0a278ee:/DeepSpeech# find / -type f \( -name "alphabet.txt" -o -name "*.csv" -o -name "*.scorer" \)
    /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer
    /DeepSpeechData/DeepSpeech/data/alphabet.txt
    /DeepSpeechData/DeepSpeech/CSV/Test/test.csv
    /DeepSpeechData/DeepSpeech/CSV/Training/training.csv
    /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv
    /DeepSpeechData/DeepSpeech/CSV/Model Checkpoints/Model Checkpoints.csv
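
Since the first failure came down to a path that does not exist inside the container, a small pre-flight check can catch this before DeepSpeech.py starts. The sketch below uses a helper named check_paths, which is hypothetical and not part of DeepSpeech, and feeds it the locations reported by find.

```shell
# Print one line per path and return non-zero if any path is missing.
check_paths() {
    missing=0
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ok       $p"
        else
            echo "MISSING  $p"
            missing=1
        fi
    done
    return "$missing"
}

# Paths as reported by find inside the container.
check_paths \
    /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer \
    /DeepSpeechData/DeepSpeech/data/alphabet.txt \
    /DeepSpeechData/DeepSpeech/CSV/Training/training.csv \
    /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv \
    /DeepSpeechData/DeepSpeech/CSV/Test/test.csv \
    || echo "At least one path is missing; fix it before starting training."
```

Run inside the container, this lists every file with either ok or MISSING, so a wrong prefix such as /DeepSpeech instead of /DeepSpeechData/DeepSpeech shows up immediately.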

2nd Try:

    root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py \
    > --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv \
    > --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv \
    > --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv \
    > --alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt \
    > --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer \
    > --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir \
    > --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir \
    > --train_batch_size 1 \
    > --test_batch_size 1 \
    > --n_hidden 100 \
    > --epochs 200 \
    > --noshow_progressbar
    I Loading best validating checkpoint from /DeepSpeechData/DeepSpeech/checkpoints_dir/best_dev-1466475
    I Loading variable from checkpoint: beta1_power
    I Loading variable from checkpoint: beta2_power
    I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
    Traceback (most recent call last):
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
        absl.app.run(main)
      File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
        _run_main(main, args)
      File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
        train()
      File "/DeepSpeech/training/deepspeech_training/train.py", line 529, in train
        load_or_init_graph_for_training(session)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
        _load_or_init_impl(session, methods, allow_drop_layers=True)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 98, in _load_or_init_impl
        return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 71, in _load_checkpoint
        v.load(ckpt.get_tensor(v.op.name), session=session)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
        session.run(self.initializer, {self.initializer.inputs[1]: value})
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
        run_metadata_ptr)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
        (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
    ValueError: Cannot feed value of shape (8192,) for Tensor 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Initializer/Const:0', which has shape '(400,)'
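
The two shapes in this error can be traced back to --n_hidden. The arithmetic below is a rough check, assuming the usual LSTM layout in which the bias vector holds one slice per gate and therefore has 4 * n_hidden entries; the helper name n_hidden_from_bias is made up for illustration.

```shell
# A standard LSTM layer keeps one bias slice per gate (input, forget, cell, output),
# so a bias vector of length N corresponds to n_hidden = N / 4.
n_hidden_from_bias() {
    echo $(( $1 / 4 ))
}

# Shape (8192,) is the value stored in the checkpoint.
echo "checkpoint n_hidden: $(n_hidden_from_bias 8192)"
# Shape (400,) is the tensor built by this run with --n_hidden 100.
echo "graph n_hidden:      $(n_hidden_from_bias 400)"
```

Under that assumption the checkpoint in checkpoints_dir was saved from a model with n_hidden 2048, while this run builds a graph with n_hidden 100, so the two cannot be combined without either matching --n_hidden or pointing --checkpoint_dir at an empty directory.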

3rd Try:

    root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py \
      --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv \
      --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv \
      --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv \
      --alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt \
      --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer \
      --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir \
      --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir \
      --train_batch_size 1 --test_batch_size 1 --n_hidden 2048 --epochs 200 \
      --noshow_progressbar
    I Loading best validating checkpoint from /DeepSpeechData/DeepSpeech/checkpoints_dir/best_dev-1466475
    I Loading variable from checkpoint: beta1_power
    I Loading variable from checkpoint: beta2_power
    I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
    I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
    Traceback (most recent call last):
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
        absl.app.run(main)
      File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
        _run_main(main, args)
      File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
        train()
      File "/DeepSpeech/training/deepspeech_training/train.py", line 529, in train
        load_or_init_graph_for_training(session)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
        _load_or_init_impl(session, methods, allow_drop_layers=True)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 98, in _load_or_init_impl
        return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 71, in _load_checkpoint
        v.load(ckpt.get_tensor(v.op.name), session=session)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
        return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
    tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
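
One possible reading of the missing bias/Adam key is that the checkpoint was saved with the CuDNN RNN implementation, in which case the Adam moment tensors would not be stored under the cudnn_compatible_lstm_cell names. DeepSpeech 0.9.3 defines --train_cudnn and --load_cudnn for that situation, though their exact behavior is worth confirming with --helpfull. The sketch below only prints the 3rd-try command with one of those flags appended, and candidate_cmd is a hypothetical helper.

```shell
DATA=/DeepSpeechData/DeepSpeech

# Print the 3rd-try command with one extra flag ($1) appended.
# Dry run only: nothing is executed.
candidate_cmd() {
    printf '%s \\\n' \
        "python -u DeepSpeech.py" \
        "  --train_files $DATA/CSV/Training/training.csv" \
        "  --dev_files $DATA/CSV/Validation/dev.csv" \
        "  --test_files $DATA/CSV/Test/test.csv" \
        "  --alphabet_config_path $DATA/data/alphabet.txt" \
        "  --scorer_path $DATA/deepspeech-0.9.3-models.scorer" \
        "  --checkpoint_dir $DATA/checkpoints_dir" \
        "  --export_dir $DATA/CSV/exports_dir" \
        "  --train_batch_size 1" \
        "  --test_batch_size 1" \
        "  --n_hidden 2048" \
        "  --epochs 200" \
        "  --noshow_progressbar"
    printf '  %s\n' "$1"
}

# Two candidates to compare; which one applies depends on how the checkpoint was saved.
candidate_cmd --load_cudnn
echo
candidate_cmd --train_cudnn
```

Printing the command first makes it easy to see that --n_hidden stays at 2048 to match the checkpoint and that every flag keeps its leading dashes.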

4th Try:

    root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py \
      --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv \
      --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv \
      --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv \
      --alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt \
      --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer \
      --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir \
      --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir \
      --train_batch_size 1 --test_batch_size 1 --n_hidden 2048 --epochs 200 \
      --noshow_progressbar --use_cudnn_rnn

    FATAL Flags parsing error: Unknown command line flag 'use_cudnn_rnn'
    Pass --helpshort or --helpfull to see help on flags.

5th Try: added the --train_cudnn flag, but the command produced no output:

    root@0123a1149260:/DeepSpeech# python -u DeepSpeech.py \
      --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv \
      --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv \
      --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv \
      alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt \
      --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer \
      --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir \
      --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir \
      --train_batch_size 1 \
      --test_batch_size 1 \
      --n_hidden 100 \
      --epochs 200 \
      --noshow_progressbar --train_cudnn
    root@0123a1149260:/DeepSpeech#
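
When a command returns to the prompt without printing anything, its exit status is the first thing to look at. The wrapper below is a generic sketch and not something DeepSpeech provides; the name run_and_report is made up. It runs whatever command it is given and prints the status afterwards.

```shell
# Run any command, then print its exit status so a silent failure is visible.
run_and_report() {
    "$@"
    status=$?
    echo "exit status: $status"
    return "$status"
}

# Demonstration with harmless commands; in the container, the training command
# from the 5th try would go in their place.
run_and_report true
run_and_report false || echo "non-zero exit: the command failed without printing anything"
```

A status of 0 with no output would point to the script ending early by design, while a non-zero status would point to a crash whose message was lost.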

Question:

What could be causing this error in my setup?
Are there specific considerations or best practices when setting up DeepSpeech training in a Docker environment that I might be missing?

Any insights or suggestions to resolve this error would be greatly appreciated.

Question 2

The error occurs when DeepSpeech tries to load the scorer, which is a specially formatted file that helps "score" how some words should be predicted. Are you using a custom scorer file? How did you create it? What format is it in?

Question 3

I am using the standard scorer file provided by DeepSpeech. It is not a custom scorer file. I downloaded it directly from the DeepSpeech 0.9.3 release on GitHub (github.com/mozilla/DeepSpeech/releases/tag/v0.9.3). The scorer file is named deepspeech-0.9.3-models.scorer. As for the format, since it's directly from the official release, it should be in the format expected by DeepSpeech 0.9.3. I haven't made any modifications to it.

Question 4

OK, looking at this related question on the Mozilla Discourse, can you please double-check the path to the scorer file? It is given as an absolute path. Can you confirm the path is correct by running ls on that absolute path and showing that the file is listed, e.g. ls /DeepSpeech/deepspeech-0.9.3-models.scorer?

Question 5

    root@b11bd0a278ee:/DeepSpeech# ls /DeepSpeech/deepspeech-0.9.3-models.scorer
    ls: cannot access '/DeepSpeech/deepspeech-0.9.3-models.scorer': No such file or directory

Question 6

So I ran this command to check the paths, and it displayed them:

    root@b11bd0a278ee:/DeepSpeech# find / -type f \( -name "alphabet.txt" -o -name "*.csv" -o -name "*.scorer" \)
    /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer
    /DeepSpeechData/DeepSpeech/data/alphabet.txt
    /DeepSpeechData/DeepSpeech/CSV/Test/test.csv
    /DeepSpeechData/DeepSpeech/CSV/Training/training.csv
    /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv
    /DeepSpeechData/DeepSpeech/CSV/Model Checkpoints/Model Checkpoints.csv
