I'm working with Mozilla DeepSpeech in a Docker environment and have encountered an error during training. I'm seeking assistance to resolve this issue.
System Setup:
- Docker environment on a Windows 10 PC
- Using Ubuntu-20-04 in Docker
- NVIDIA GPU RTX 3090 with the `--gpus all` flag enabled
- CUDA 10.0 (version 10.0.130) with cuDNN v7.6.5 (November 5th, 2019) for CUDA 10.0
- Python 3.7.3
Steps Taken:
- Installed the official DeepSpeech training image for Docker (mozilla/deepspeech-train:v0.9.3), following the exact steps in the DeepSpeech Playbook (https://mozilla.github.io/deepspeech-playbook/ENVIRONMENT.html#contents).
- Successfully ran the provided script (`./bin/run-ldc93s1.sh`) in the Docker environment.
- Created a custom training script for my dataset.
- Ran into file path problems, which I resolved by mounting the WSL 2 directory into the Docker container.
- Updated script paths to match the mounted directory.
My Script:
```
root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py \
  --train_files /DeepSpeech/CSV/Training/training.csv \
  --dev_files /DeepSpeech/CSV/Validation/dev.csv \
  --test_files /DeepSpeech/CSV/Test/test.csv \
  --alphabet_config_path /DeepSpeech/data/alphabet.txt \
  --scorer_path /DeepSpeech/deepspeech-0.9.3-models.scorer \
  --checkpoint_dir /DeepSpeech/checkpoints_dir \
  --export_dir /DeepSpeech/CSV/exports_dir \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 200 \
  --noshow_progressbar
```
Issue: When running my custom training script, I encounter the following error:
```
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 949, in main
    early_training_checks()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 934, in early_training_checks
    FLAGS.scorer_path, Config.alphabet)
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 36, in __init__
    raise ValueError('Scorer initialization failed with error code 0x{:X}'.format(err))
ValueError: Scorer initialization failed with error code 0x2005
```
Tried looking for the path:
root@b11bd0a278ee:/DeepSpeech# ls /DeepSpeech/deepspeech-0.9.3-models.scorer
ls: cannot access '/DeepSpeech/deepspeech-0.9.3-models.scorer': No such file or directory
Found the path:
root@b11bd0a278ee:/DeepSpeech# find / -type f \( -name "alphabet.txt" -o -name "*.csv" -o -name "*.scorer" \)
/DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer
/DeepSpeechData/DeepSpeech/data/alphabet.txt
/DeepSpeechData/DeepSpeech/CSV/Test/test.csv
/DeepSpeechData/DeepSpeech/CSV/Training/training.csv
/DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv
/DeepSpeechData/DeepSpeech/CSV/Model Checkpoints/Model Checkpoints.csv
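Since the files turned out to live under `/DeepSpeechData/DeepSpeech` rather than `/DeepSpeech`, a small pre-flight check can catch a wrong path before training starts. This is only a sketch: `missing_paths` is a made-up helper for illustration, not part of DeepSpeech, and it only tests each path with the standard library.

```python
import os


def missing_paths(flag_paths):
    """Return the {flag: path} pairs whose path does not exist."""
    return {flag: path for flag, path in flag_paths.items()
            if not os.path.exists(path)}


# Paths as passed in the first attempt; adjust them to your own mount point.
flag_paths = {
    "--train_files": "/DeepSpeech/CSV/Training/training.csv",
    "--dev_files": "/DeepSpeech/CSV/Validation/dev.csv",
    "--test_files": "/DeepSpeech/CSV/Test/test.csv",
    "--alphabet_config_path": "/DeepSpeech/data/alphabet.txt",
    "--scorer_path": "/DeepSpeech/deepspeech-0.9.3-models.scorer",
}

# Report every flag that points at a file the container cannot see.
for flag, path in missing_paths(flag_paths).items():
    print(f"{flag}: not found: {path}")
```

Run inside the container, this prints one line per flag whose file is not visible there and prints nothing when every path resolves.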
2nd try:
root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py \
> --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv \
> --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv \
> --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv \
> --alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt \
> --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer \
> --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir \
> --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir \
> --train_batch_size 1 \
> --test_batch_size 1 \
> --n_hidden 100 \
> --epochs 200 \
> --noshow_progressbar
I Loading best validating checkpoint from /DeepSpeechData/DeepSpeech/checkpoints_dir/best_dev-1466475
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
    train()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 529, in train
    load_or_init_graph_for_training(session)
  File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 98, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
  File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 71, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (8192,) for Tensor 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Initializer/Const:0', which has shape '(400,)'
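The two shapes in this error look consistent with an `--n_hidden` mismatch between the saved checkpoint and the graph being built. An LSTM cell keeps one bias value per gate for each hidden unit, and it has four gates, so the bias vector has length 4 × n_hidden. The sketch below only does that arithmetic; `n_hidden_from_bias` is a hypothetical helper, not a DeepSpeech function.

```python
def n_hidden_from_bias(bias_len, gates=4):
    """Infer the LSTM hidden size from the length of its bias vector."""
    if bias_len % gates != 0:
        raise ValueError(f"bias length {bias_len} is not a multiple of {gates}")
    return bias_len // gates


# Shape of the value read from the checkpoint: (8192,)
print(n_hidden_from_bias(8192))  # 2048

# Shape the freshly built graph expects: (400,)
print(n_hidden_from_bias(400))   # 100
```

Read this way, the checkpoint in `checkpoints_dir` appears to have been saved with `--n_hidden 2048`, while this run built the graph with `--n_hidden 100`.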
3rd Try:
root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv --alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir --train_batch_size 1 --test_batch_size 1 --n_hidden 2048 --epochs 200 --noshow_progressbar
I Loading best validating checkpoint from /DeepSpeechData/DeepSpeech/checkpoints_dir/best_dev-1466475
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
    train()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 529, in train
    load_or_init_graph_for_training(session)
  File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 98, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
  File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 71, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
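This error says the graph asks for a variable that the checkpoint does not contain. One way to see the gap is to compare the two sets of names. In the sketch below, `keys_missing_from_checkpoint` is a hypothetical helper, and the name lists are short samples copied from the log above rather than the full contents of the checkpoint.

```python
def keys_missing_from_checkpoint(graph_vars, checkpoint_vars):
    """Return graph variable names that have no entry in the checkpoint."""
    available = set(checkpoint_vars)
    return [name for name in graph_vars if name not in available]


LSTM = "cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell"

# Sample names only: the first three loaded without error in the log,
# the fourth is the key the NotFoundError reports.
graph_vars = ["beta1_power", "beta2_power", LSTM + "/bias", LSTM + "/bias/Adam"]
checkpoint_vars = ["beta1_power", "beta2_power", LSTM + "/bias"]

missing = keys_missing_from_checkpoint(graph_vars, checkpoint_vars)
print(missing)
```

Names ending in `/Adam` are optimizer state rather than model weights, so the weights themselves seem to load and only the optimizer slot for this graph layout is absent. With TensorFlow 1.x available, `tf.train.list_variables(checkpoint_dir)` returns the real `(name, shape)` pairs to feed into a comparison like this. DeepSpeech 0.9.x also defines a `--load_cudnn` flag for loading a checkpoint trained with the cuDNN RNN backend into a non-cuDNN graph; whether it applies to this checkpoint is an assumption worth checking against `--helpfull`.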
4th Try:
root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv --alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir --train_batch_size 1 --test_batch_size 1 --n_hidden 2048 --epochs 200 --noshow_progressbar --use_cudnn_rnn
FATAL Flags parsing error: Unknown command line flag 'use_cudnn_rnn'
Pass --helpshort or --helpfull to see help on flags.
5th Try: added the `--train_cudnn` flag, but there was no output:
root@0123a1149260:/DeepSpeech# python -u DeepSpeech.py \ --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv \ --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv \ --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv \ alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt \ --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer \ --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir \ --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir \ --train_batch_size 1 \ --test_batch_size 1 \ --n_hidden 100 \ --epochs 200 \
--noshow_progressbar --train_cudnn
root@0123a1149260:/DeepSpeech#
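One thing worth ruling out, assuming the command really was entered on a single line as shown: a backslash followed by a space does not continue the line, it escapes the space. The sketch below uses Python's `shlex` module, which follows POSIX shell splitting rules, to show what the program would receive in that case. The shortened command string is an illustrative stand-in for the full command.

```python
import shlex

# A shortened, single-line version of the command with "\ " between flags.
one_line = (
    r"python -u DeepSpeech.py "
    r"\ --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv "
    r"\ --epochs 200"
)

tokens = shlex.split(one_line)
for token in tokens:
    # repr() makes the leading space of an escaped token visible.
    print(repr(token))

# Arguments that still look like flags to a parser expecting "--name".
recognized = [t for t in tokens if t.startswith("--")]
print(recognized)
```

In this stand-in, both flag names come out with a leading space (for example `' --train_files'`), so none of them starts with `--`. If the real command was typed the same way, those flags would likely be treated as positional arguments rather than flags. Putting each backslash as the very last character of its line, as in the 2nd try, avoids the problem.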
Questions:
- What could be causing this error in my setup?
- Are there specific considerations or best practices when setting up DeepSpeech training in a Docker environment that I might be missing?
Any insights or suggestions to resolve this error would be greatly appreciated.
path to the scorer file. It is given as an absolute path. Can you confirm that the path is correct by running `ls` on that absolute path and showing that the file is listed, e.g. `ls /DeepSpeech/deepspeech-0.9.3-models.scorer`?