How can I fix the KeyError: ' ' problem?

Hi, I am using TAO OCRNet from the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt Docker image to train a model that recognizes a line of text in documents.
Here is my config file: configs.txt (702 Bytes)
Char_list: char_list.txt (679 Bytes)
Some examples from my dataset:

178001.jpg	Tôi không thể làm điều đó!
124430.jpg	Tôi có một tấm thẻ điểm ghi trên mỗi căn nhà.

I have a problem with the training process. Although I added the space character to char_list.txt, I think the preprocessing step drops the space character when encoding the labels. How can I overcome this problem? I am looking forward to hearing from you. Here is the error:

Epoch 0:   0%|                                                                                                     | 0/77026 [00:00<?, ?it/s]
' '
Error executing job with overrides: ['results_dir=/workspace/results_32x384']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 254, in run_and_report
    assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_pytorch/cv/ocrnet/scripts/train.py>", line 3, in <module>
  File "<frozen cv.ocrnet.scripts.train>", line 136, in <module>
  File "<frozen core.hydra.hydra_runner>", line 107, in wrapper
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 296, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "<frozen cv.ocrnet.scripts.train>", line 132, in main
  File "<frozen cv.ocrnet.scripts.train>", line 121, in main
  File "<frozen cv.ocrnet.scripts.train>", line 102, in run_experiment
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage
    self._run_train()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1200, in _run_train
    self.fit_loop.run()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 214, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 200, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 247, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 357, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1342, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/module.py", line 1661, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/optimizer.py", line 169, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/strategy.py", line 234, in optimizer_step
    return self.precision_plugin.optimizer_step(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 121, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/optim/optimizer.py", line 198, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/optim/optimizer.py", line 29, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/optim/adadelta.py", line 133, in step
    loss = closure()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 107, in _wrap_closure
    closure_result = closure()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 147, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 133, in closure
    step_output = self._step_fn()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 406, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1480, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/strategy.py", line 378, in training_step
    return self.model.training_step(*args, **kwargs)
  File "<frozen cv.ocrnet.model.pl_ocrnet>", line 162, in training_step
  File "<frozen cv.ocrnet.utils.utils>", line 67, in encode
  File "<frozen cv.ocrnet.utils.utils>", line 65, in <listcomp>
KeyError: ' '
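
As far as I can tell, the encoder is doing a plain character-to-index lookup and the space character is missing from the mapping. A minimal sketch that reproduces the same error (a hypothetical mapping, not the actual TAO code):

# Hypothetical character-to-index mapping without the space character.
char_list = ["a", "b", "c"]
char_to_idx = {c: i for i, c in enumerate(char_list)}

label = "a b"
indices = [char_to_idx[c] for c in label]  # raises KeyError: ' ' on the space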

If there are spaces in your original ground truth, please replace them with "[space]".
For example,
B JT 304 E → change the ground truth to B[space]JT[space]304[space]E, then use the attention decoding method to retrain the model. Also, the character list file should contain [space]. For example, A,B,C,D,xxx,a,b,c,d,xxx,[space]
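
For example, a minimal sketch (file names are hypothetical) that rewrites the tab-separated label file this way:

# Replace every literal space inside a label with the "[space]" token,
# keeping the tab between image name and label intact.
with open("gt_old.txt", "r", encoding="utf-8") as fin, \
     open("gt_new.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        image_name, _, label = line.rstrip("\n").partition("\t")
        fout.write(f"{image_name}\t{label.replace(' ', '[space]')}\n")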

Thank you for replying. The char_list.txt file includes some special characters like [, ], -, … I wonder whether the individual characters of [space] (i.e. s, p, a, c, e, [, ]) are affected by these special characters when a sequence like B[space]JT[space]304[space]E is encoded character by character.
An example:

# Naive character-by-character encoding: "[space]" is split into its
# individual characters, so '[', 's', 'p', 'a', 'c', 'e', ']' are looked up separately.
char_list = ["[space]", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "s", "p", "[", "]"]
seq = "abc[space]edf"
t = [char_list.index(c) for c in seq]
print(t)

Results: [1, 2, 3, 18, 16, 17, 1, 3, 5, 19, 5, 4, 6]

No, it will not be affected.
BTW, there is a similar topic in Using OCRNet from Python script - #65 by foreverneilyoung
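
For illustration only (an assumption about the idea, not the actual TAO implementation), an encoder that matches the whole "[space]" token before falling back to single characters is not confused by '[' or ']' in the character list:

import re

char_list = ["[space]", "a", "b", "c", "d", "e", "f", "[", "]"]
token_to_idx = {tok: i for i, tok in enumerate(char_list)}

def encode(label):
    # Match the multi-character "[space]" token first, then single characters.
    tokens = re.findall(r"\[space\]|.", label)
    return [token_to_idx[tok] for tok in tokens]

print(encode("abc[space]def"))  # [1, 2, 3, 0, 4, 5, 6]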

Thank you for your support. I have a different question about text recognition. The trained model performs very well on my custom printed-text data, but it does not recognize handwritten text well. Which models can I build for handwritten text recognition in documents? Could you give me some suggestions, please?
I use the configs.yaml below.

results_dir: /workspace/results_32x384
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 384
  input_height: 32
  input_channel: 3
dataset:
  train_dataset_dir: [/workspace/dataset/train]
  val_dataset_dir: /workspace/dataset/val
  character_list_file: /workspace/char_list.txt
  max_label_length: 6000
  batch_size: 100
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [3]
  optim:
    name: "adadelta"
    lr: 0.1
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1

For handwritten text recognition, it is suggested to add a handwriting OCR dataset and retrain the model.

Thank you for your help. But I have a problem exporting the .onnx file to a .engine file.
I use the docker image nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy
My command: ocrnet gen_trt_engine -e /workspace/configs/configs_ocr_reg.yaml gen_trt_engine.onnx_file=/workspace/results_32x384_retrain_v2/export/best_accuracy.onnx gen_trt_engine.trt_engine=/workspace/results_32x384_retrain_v2/export/best_accuracy.engine gen_trt_engine.tensorrt.min_batch_size=1 gen_trt_engine.tensorrt.opt_batch_size=1 gen_trt_engine.tensorrt.max_batch_size=1 gen_trt_engine.tensorrt.data_type=fp32

Error:

python /usr/local/lib/python3.8/dist-packages/nvidia_tao_deploy/cv/ocrnet/scripts/gen_trt_engine.py  --config-path /workspace/configs --config-name configs_ocr_reg.yaml gen_trt_engine.onnx_file=/workspace/results_32x384_retrain_v2/export/best_accuracy.onnx gen_trt_engine.trt_engine=/workspace/results_32x384_retrain_v2/export/best_accuracy.engine gen_trt_engine.tensorrt.min_batch_size=1 gen_trt_engine.tensorrt.opt_batch_size=1 gen_trt_engine.tensorrt.max_batch_size=8 gen_trt_engine.tensorrt.data_type=fp32
sys:1: UserWarning:
'configs_ocr_reg.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
<frozen cv.common.hydra.hydra_runner>:99: UserWarning:
'configs_ocr_reg.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Log file already exists at /workspace/results_32x384/status.json
Starting ocrnet gen_trt_engine.
[07/09/2025-09:49:44] [TRT] [I] [MemUsageChange] Init CUDA: CPU +318, GPU +0, now: CPU 358, GPU 22438 (MiB)
[07/09/2025-09:49:51] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +431, GPU +110, now: CPU 844, GPU 22548 (MiB)
[07/09/2025-09:49:51] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
Parsing ONNX model
List inputs:
Input 0 -> input.
[3, 32, 384].
0.
[07/09/2025-09:49:51] [TRT] [W] The NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION flag has been deprecated and has no effect. Please do not use this flag when creating the network.
[07/09/2025-09:49:51] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/09/2025-09:49:52] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[07/09/2025-09:49:52] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
Network Description
Input 'input' with shape (-1, 3, 32, 384) and dtype DataType.FLOAT
Output 'output_id' with shape (-1, 97) and dtype DataType.INT32
Output 'output_prob' with shape (-1, 97) and dtype DataType.FLOAT
Output '769' with shape (-1, 97) and dtype DataType.INT32
dynamic batch size handling
[07/09/2025-09:49:58] [TRT] [E] 1: [wrapper.cpp::CublasWrapper::85] Error Code 1: Cublas (Could not initialize cublas. Please check CUDA installation.)
__enter__
Error executing job with overrides: ['gen_trt_engine.onnx_file=/workspace/results_32x384_retrain_v2/export/best_accuracy.onnx', 'gen_trt_engine.trt_engine=/workspace/results_32x384_retrain_v2/export/best_accuracy.engine', 'gen_trt_engine.tensorrt.min_batch_size=1', 'gen_trt_engine.tensorrt.opt_batch_size=1', 'gen_trt_engine.tensorrt.max_batch_size=8', 'gen_trt_engine.tensorrt.data_type=fp32']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 254, in run_and_report
    assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_deploy/cv/ocrnet/scripts/gen_trt_engine.py>", line 3, in <module>
  File "<frozen cv.ocrnet.scripts.gen_trt_engine>", line 79, in <module>
  File "<frozen cv.common.hydra.hydra_runner>", line 99, in wrapper
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 296, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "<frozen cv.common.decorators>", line 63, in _func
  File "<frozen cv.common.decorators>", line 48, in _func
  File "<frozen cv.ocrnet.scripts.gen_trt_engine>", line 75, in main
  File "<frozen cv.ocrnet.engine_builder>", line 154, in create_engine
AttributeError: __enter__
Sending telemetry data.
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: HTTPSConnectionPool(host='telemetry.metropolis.nvidia.com', port=443): Max retries exceeded with url: /api/v1/telemetry (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))
Execution status: FAIL
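
For reference, my understanding is that the engine-build step this command performs is roughly equivalent to the following TensorRT Python API calls (only a sketch, not the TAO implementation; paths are placeholders, input name and shapes are taken from the log above):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
# Explicit-batch network, as required by the ONNX parser.
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("best_accuracy.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
# Dynamic batch profile for the 'input' tensor (3 x 32 x 384): min/opt/max batch sizes.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 32, 384), (1, 3, 32, 384), (8, 3, 32, 384))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open("best_accuracy.engine", "wb") as f:  # placeholder path
    f.write(serialized_engine)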

Could you give me some suggestions for this? I’m looking forward to hearing from you!