How can I fix the KeyError: ' ' problem?

Hi, I am training TAO OCRNet in the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt Docker container to recognize lines of text in documents.
Here is my config file: configs.txt (702 Bytes)
Character list: char_list.txt (679 Bytes)
An example from my dataset:

178001.jpg	Tôi không thể làm điều đó!
124430.jpg	Tôi có một tấm thẻ điểm ghi trên mỗi căn nhà.

I have a problem with the training process. Although I added the space character to char_list.txt, I think the preprocessing step removes it when the labels are encoded, which leads to the error below. How can I overcome this problem? I look forward to hearing from you.

Epoch 0:   0%|                                                                                                     | 0/77026 [00:00<?, ?it/s]
' '
Error executing job with overrides: ['results_dir=/workspace/results_32x384']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 254, in run_and_report
    assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "</usr/local/lib/python3.8/dist-packages/nvidia_tao_pytorch/cv/ocrnet/scripts/train.py>", line 3, in <module>
  File "<frozen cv.ocrnet.scripts.train>", line 136, in <module>
  File "<frozen core.hydra.hydra_runner>", line 107, in wrapper
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 296, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "<frozen cv.ocrnet.scripts.train>", line 132, in main
  File "<frozen cv.ocrnet.scripts.train>", line 121, in main
  File "<frozen cv.ocrnet.scripts.train>", line 102, in run_experiment
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage
    self._run_train()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1200, in _run_train
    self.fit_loop.run()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 214, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 200, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 247, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 357, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1342, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/module.py", line 1661, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/optimizer.py", line 169, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/strategy.py", line 234, in optimizer_step
    return self.precision_plugin.optimizer_step(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 121, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/optim/optimizer.py", line 198, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/optim/optimizer.py", line 29, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/optim/adadelta.py", line 133, in step
    loss = closure()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 107, in _wrap_closure
    closure_result = closure()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 147, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 133, in closure
    step_output = self._step_fn()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 406, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1480, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/strategy.py", line 378, in training_step
    return self.model.training_step(*args, **kwargs)
  File "<frozen cv.ocrnet.model.pl_ocrnet>", line 162, in training_step
  File "<frozen cv.ocrnet.utils.utils>", line 67, in encode
  File "<frozen cv.ocrnet.utils.utils>", line 65, in <listcomp>
KeyError: ' '
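
A KeyError raised inside encode means a label contains a character that the encoder's character dictionary does not know. A minimal sketch for checking this before training is shown here; the function name missing_characters and the tab-separated "image name, label" line format (taken from the dataset example above) are assumptions, and this is not TAO's own code.

```python
def missing_characters(label_lines, charset):
    """Return the sorted label characters that are absent from charset."""
    missing = set()
    for line in label_lines:
        # Each line is assumed to be "<image name><TAB><label>"; skip anything else.
        if "\t" not in line:
            continue
        _, label = line.rstrip("\n").split("\t", 1)
        missing.update(ch for ch in label if ch not in charset)
    return sorted(missing)


# Hypothetical label line and a character set that lacks the space character.
labels = ["img_001.jpg\tB JT 304 E\n"]
charset = set("BJT304E")
print(missing_characters(labels, charset))  # [' ']
```

Running a check like this over the real label file and the characters actually loaded from char_list.txt shows which characters would trigger the lookup failure.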

If there are spaces in your original ground truth, please replace each of them with “[space]”.
For example,
B JT 304 E → change the ground truth to B[space]JT[space]304[space]E, then retrain the model using Attention decoding. In addition, the character list file should contain [space]. For example: A,B,C,D,xxx,a,b,c,d,xxx,[space]
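
The label rewrite described in this reply can be sketched in a few lines of Python. This assumes each label line is "image name, tab, label" as in the dataset example earlier in the thread; the function name replace_spaces and the image name are hypothetical.

```python
def replace_spaces(label_line, token="[space]"):
    """Replace spaces in the label part of a tab-separated label line with token."""
    # partition keeps the image name untouched, even if it contains spaces.
    name, sep, label = label_line.rstrip("\n").partition("\t")
    return name + sep + label.replace(" ", token)


# Hypothetical image name; the label is the example from the reply.
print(replace_spaces("img_001.jpg\tB JT 304 E"))
```

The printed line keeps the image name and the tab, and the label becomes B[space]JT[space]304[space]E. A line without a tab is returned unchanged.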

Thank you for replying. My char_list.txt file includes some special characters such as [, ], -, … I wonder whether the individual characters of [space] (that is, [, s, p, a, c, e, ]) are affected by these special characters when each character of a sequence like B[space]JT[space]304[space]E is encoded.
An example:

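# "[space]" is one entry, but "[", "s", "p", "a", "c", "e", "]" are also individual entries.
char_list = ["[space]", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "s", "p", "[", "]"]
seq = "abc[space]edf"
# Character-by-character lookup: "[space]" is split into seven separate characters.
t = [char_list.index(c) for c in seq]
print(t)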

Results: [1, 2, 3, 18, 16, 17, 1, 3, 5, 19, 5, 4, 6]

No, it will not have any impact.
By the way, there is a similar topic: Using OCRNet from Python script - #65 by foreverneilyoung
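
For comparison with the character-by-character lookup in the question, here is a minimal sketch of an encoder that treats a multi-character entry such as "[space]" as a single token. It is only an illustration: TAO's actual encoder lives in a compiled module and is not shown in this thread, and the longest-match rule used here is an assumption.

```python
import re


def encode(seq, char_list):
    """Map seq to indices, matching multi-character entries like "[space]" as one token."""
    # Try longer entries first so "[space]" wins over a bare "[".
    entries = sorted(char_list, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in entries))
    index = {c: i for i, c in enumerate(char_list)}
    # Note: finditer silently skips characters that match no entry.
    return [index[m.group(0)] for m in pattern.finditer(seq)]


char_list = ["[space]", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k",
             "l", "m", "n", "o", "s", "p", "[", "]"]
print(encode("abc[space]edf", char_list))  # [1, 2, 3, 0, 5, 4, 6]
```

With this rule the sequence maps to seven indices, with 0 standing for the whole "[space]" token, instead of the thirteen indices produced by the per-character lookup.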

Thank you for your support. I have another question about text recognition. The trained model works very well on my custom printed-text data, but it does not recognize handwritten text well. Which models can I build for handwritten text recognition in documents? Could you give me some suggestions, please?
I use the configs.yaml below.

results_dir: /workspace/results_32x384
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: ResNet
  feature_channel: 512
  sequence: BiLSTM
  hidden_size: 256
  prediction: CTC
  quantize: False
  input_width: 384
  input_height: 32
  input_channel: 3
dataset:
  train_dataset_dir: [/workspace/dataset/train]
  val_dataset_dir: /workspace/dataset/val
  character_list_file: /workspace/char_list.txt
  max_label_length: 6000
  batch_size: 100
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [3]
  optim:
    name: "adadelta"
    lr: 0.1
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1

For handwritten text recognition, we suggest adding a handwriting OCR dataset to the training data and retraining the model.