MMSegmentation: A Workflow-Oriented Guide

吨吨不打野

License: CC 4.0 BY-SA

Category columns: 意外接触的一些知识 # OpenMMLab-AI实战营第二期. Tags: mmseg

This is an original article by the author; do not repost it without the author's permission.

Article link: https://blog.csdn.net/Castlehe/article/details/134993220

1. Preparation

1.0 ❌ Inference only

If you only plan to run inference with mmseg and never train, the installation can be somewhat simpler:

conda install pytorch torchvision -c pytorch
pip install -U openmim
mim install mmengine

# Do not install mmcv-lite: mmcv still fails with
# ModuleNotFoundError: No module named 'mmcv._ext'
# pip install mmcv-lite
#
# Also note: mmcv-lite and the full mmcv cannot be installed side by side.
# If mmcv-lite is already installed, remove it completely before installing the full mmcv.
# If the error persists, create a fresh environment and install mmcv there.

pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.13/index.html

pip install "mmsegmentation>=1.0.0"
pip install ftfy # mmseg imports this library

1.1 Install the mmseg library and clone the repo

See: Get started: Install and Run MMSeg

conda create --name openmmlab python=3.8 -y
conda activate openmmlab

conda install pytorch torchvision -c pytorch

pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"

git clone -b main https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation
pip install -v -e .

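# If you use mmsegmentation as a third-party dependency, install it with
pip install "mmsegmentation>=1.0.0"
# Later, to use the locally modified code directly, I uninstalled the pip package
~/huangs$ pip uninstall mmsegmentation
>  Successfully uninstalled mmsegmentation-1.2.1

Note:

Using the pip-installed mmsegmentation package and using the cloned mmsegmentation repo folder directly are two different workflows.
Be clear about whether your from mmseg import xxx resolves to the code in the repo folder or to the installed mmseg library.
- This matters most when you have customized the repo folder and your changes do not take effect.
If you are unsure how Python's current working directory, relative paths and module search path interact, it is easy to mix the two up.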
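
To check which copy of a package an import would actually resolve to, a small helper like the one below may help. It is a minimal sketch using only the standard library; the function name module_origin is made up for illustration and is not part of mmseg.

```python
import importlib.util


def module_origin(name):
    """Return the file path a top-level import of `name` would load, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the current sys.path
    return spec.origin


if __name__ == "__main__":
    # With an editable install (pip install -v -e .) the path should point into
    # the cloned repo; with a regular pip install it points into site-packages.
    print(module_origin("mmseg"))
```

If the printed path is not the one you expect, the other copy is shadowing it on the search path.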

1.2 Verify the installation

Download the config file and checkpoint needed for the test:

conda activate openmmlab
cd mmsegmentation
mim download mmsegmentation --config pspnet_r50-d8_4xb2-40k_cityscapes-512x1024 --dest .

You can run the ready-made demo script directly:

python demo/image_demo.py demo/demo.png configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --device cuda:0 --out-file result.jpg

Alternatively, create a script check_mmseg.py in the top-level mmsegmentation folder and paste in the following code:

from mmseg.apis import inference_model, init_model, show_result_pyplot

config_file = 'pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py'
checkpoint_file = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'
model = init_model(config_file, checkpoint_file, device='cuda:0')

img = 'demo/demo.png'  
result = inference_model(model, img)

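# If VS Code is connected to a remote server, show=True blocks here,
# so write the visualization to a file instead of opening a window.
# show_result_pyplot(model, img, result, show=True)
show_result_pyplot(model, img, result, show=False, out_file='result.jpg', opacity=0.5)
# If you have a video file, you can try this as well (requires: import mmcv)
# video = mmcv.VideoReader('video.mp4')
# for frame in video:
#    result = inference_model(model, frame)
#    show_result_pyplot(model, frame, result, wait_time=1)

Run it from the command line:

python check_mmseg.py
Or select the matching Python environment in your editor and click the Run button.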

Error:

import ftfy
ModuleNotFoundError: No module named 'ftfy'

Fix:

pip install ftfy

Error:

  File "/home/ubuntu/.local/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 2, in <module>
    from .active_rotated_filter import active_rotated_filter
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
    ext_module = ext_loader.load_ext(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory

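Fix: the traceback shows mmcv looking for libcudart.so.10.1 (CUDA 10.1), so first check the installed mmcv version and the CUDA version PyTorch was built with: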

> pip show mmcv
Name: mmcv
Version: 2.0.0
Summary: OpenMMLab Computer Vision Foundation
Home-page: https://github.com/open-mmlab/mmcv
Author: MMCV Contributors
Author-email: openmmlab@gmail.com
License: UNKNOWN
Location: /home/ubuntu/.local/lib/python3.8/site-packages
Requires: addict, mmengine, numpy, packaging, Pillow, pyyaml, yapf
Required-by: 


> python -c 'import torch;print(torch.__version__);print(torch.version.cuda)'
1.13.0+cu117
11.7

On the page https://mmcv.readthedocs.io/en/latest/get_started/installation.html, select the options matching your environment and run the generated command:
>  pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
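
The index URL in the command above follows a visible pattern (CUDA tag, then torch major.minor). The sketch below builds it from the two version strings printed earlier. The helper name mmcv_index_url is hypothetical, and both the URL pattern and the "cpu" folder name are assumptions inferred from the commands in this post; the installation page remains the authoritative source.

```python
def mmcv_index_url(torch_version, cuda_version):
    """Build an mmcv wheel index URL for a torch / CUDA pair (assumed pattern).

    torch_version: torch.__version__, e.g. "1.13.0+cu117"
    cuda_version:  torch.version.cuda, e.g. "11.7" (None means a CPU-only build)
    """
    # Keep only major.minor of torch and drop a local suffix such as "+cu117".
    major, minor = torch_version.split("+")[0].split(".")[:2]
    # "11.7" -> "cu117"; CPU-only builds are assumed to live under "cpu".
    cuda_tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cuda_tag}/torch{major}.{minor}/index.html"
    )


print(mmcv_index_url("1.13.0+cu117", "11.7"))
```

For the environment shown above this reproduces the URL used in the pip command.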

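1.3 Prepare the data

1.3.1 Settle on the data format

Look through mmsegmentation/docs/en/user_guides/2_dataset_prepare.md for a dataset you might use yourself, or one from the same research area, and study its conversion script.

For example, I followed mmsegmentation/tools/dataset_converters/stare.py and arranged my dataset as follows: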

.
├── mmseg_vessel
│   ├── annotations
│   │   ├── training
│   │   └── validation
│   └── images
│       ├── training
│       └── validation

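Things to note:

In mmsegmentation/tools/dataset_converters/stare.py,
- the masks of that dataset are three-channel RGB images, although they only contain black background [0,0,0] and white foreground [255,255,255];
- so each mask has to be converted to a single-channel image, e.g. 0 for background and 1 for foreground.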
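
The conversion described above can be sketched with NumPy as follows. This is an illustrative example, not the code from stare.py; the function name rgb_mask_to_label and the threshold of 128 are assumptions.

```python
import numpy as np


def rgb_mask_to_label(mask_rgb, threshold=128):
    """Convert an (H, W, 3) black/white mask into an (H, W) uint8 label map.

    Pixels whose first channel is >= threshold become 1 (foreground),
    everything else becomes 0 (background).
    """
    mask_rgb = np.asarray(mask_rgb)
    return (mask_rgb[..., 0] >= threshold).astype(np.uint8)


demo = np.zeros((2, 2, 3), dtype=np.uint8)
demo[0, 1] = [255, 255, 255]  # one white (foreground) pixel
label = rgb_mask_to_label(demo)
print(label)
```

The resulting array contains only 0 and 1 and can be saved as a single-channel PNG.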

mmsegmentation/mmseg/datasets/basesegdataset.py contains a detailed comment describing the expected layout:

 ├── data
    │   ├── my_dataset
    │   │   ├── img_dir
    │   │   │   ├── train
    │   │   │   │   ├── xxx{img_suffix}
    │   │   │   │   ├── yyy{img_suffix}
    │   │   │   │   ├── zzz{img_suffix}
    │   │   │   ├── val
    │   │   ├── ann_dir
    │   │   │   ├── train
    │   │   │   │   ├── xxx{seg_map_suffix}
    │   │   │   │   ├── yyy{seg_map_suffix}
    │   │   │   │   ├── zzz{seg_map_suffix}
    │   │   │   ├── val

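So, following the layout above:

- Split the data into training and validation sets yourself.
- Each label must share its image's base name; the suffixes may differ. For example, with img_suffix='.png' and seg_map_suffix='.ah.png', the image im0001.png pairs with the mask im0001.ah.png.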
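
Before training, it can be worth checking that every image has a matching mask under this naming rule. The sketch below does that on plain file-name lists; mask_name_for and find_unpaired are hypothetical helper names, not mmseg functions.

```python
def mask_name_for(image_name, img_suffix=".png", seg_map_suffix=".ah.png"):
    """Map an image file name to its expected mask file name."""
    if not image_name.endswith(img_suffix):
        raise ValueError(f"{image_name!r} does not end with {img_suffix!r}")
    stem = image_name[: -len(img_suffix)]
    return stem + seg_map_suffix


def find_unpaired(image_names, mask_names, **suffixes):
    """Return the images whose expected mask is missing from mask_names."""
    available = set(mask_names)
    return [n for n in image_names if mask_name_for(n, **suffixes) not in available]


print(mask_name_for("im0001.png"))
print(find_unpaired(["im0001.png", "im0002.png"], ["im0001.ah.png"]))
```

In practice the two lists would come from listing the images/training and annotations/training folders.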

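1.3.2 Define your own dataset class: persistent (as a file)

There are two ways to define a custom dataset class. The first is as a file; the other is to write it directly in the code you run, which is described in the next section.

See the section "1.1 持久化运行（用文件定义）" of OpenMMLab-AI实战营第二期——5-2. MMSegmentation代码课, or the official documentation.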

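1.3.3 Define your own dataset class: at runtime ✅

I prefer this approach: there is no need to edit files inside the repo, which is convenient.
If you work with the same dataset for a long time, though, defining it in a file is the better choice.

See the section "1.2 运行时生效（直接运行时定义一个class）" of OpenMMLab-AI实战营第二期——5-2. MMSegmentation代码课.

Here is an example: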

from mmseg.datasets.basesegdataset import BaseSegDataset
from mmseg.registry import DATASETS

# Custom dataset class
@DATASETS.register_module()
class WatermelonDataset(BaseSegDataset):
    METAINFO = dict(
        classes=('red', 'green', 'white','seed-black', 'seed-white','tabBlue'),
        palette=[[214, 39, 40], [44, 160, 44], [255, 255, 255],[0, 0, 0],
         [255, 255, 255],[31, 119, 180],])

    def __init__(self,
                 img_suffix='.jpg',
                 seg_map_suffix='.png',
                 reduce_zero_label=False,
                 **kwargs) -> None:
        super().__init__(
            img_suffix=img_suffix,
            seg_map_suffix=seg_map_suffix,
            reduce_zero_label=reduce_zero_label,
            **kwargs)

from mmengine.runner import Runner
from mmseg.utils import register_all_modules

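# Register all modules in mmseg into the registries.
# Do not init the default scope here because it will be init in the runner.
# Note: WatermelonDataset itself is registered by the @DATASETS.register_module() decorator above.
register_all_modules(init_default_scope=False)

def train():
    cfg = xxx  # placeholder: load your Config here

# Once the custom dataset class is registered, referring to it by name
# in config.py no longer raises an error.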

See also: mmsegmentation/demo/MMSegmentation_Tutorial.ipynb

Search that page for @DATASETS.register_module() to find an example of a custom dataset.

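❓1.4 Setting the reduce_zero_label parameter

According to "Functionality of reduce_zero_label", or the Chinese version in mmsegmentation/docs/zh_cn/notes/faq.md:

The dataset parameter reduce_zero_label is a boolean that defaults to False. Setting it to True ignores the class with label 0 in the dataset.

Concretely, pixels labelled 0 are relabelled 255, every other label is decremented by 1, and the decode head uses 255 as its ignore index, so those pixels do not contribute to the loss. The logic is implemented as follows: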

if self.reduce_zero_label:
    # avoid relying on integer underflow (0 - 1 wrapping around)
    gt_semantic_seg[gt_semantic_seg == 0] = 255
    gt_semantic_seg = gt_semantic_seg - 1
    gt_semantic_seg[gt_semantic_seg == 254] = 255

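Whether your custom dataset class needs reduce_zero_label depends on the labels. Taking Potsdam as the example, there are two cases:
- The Potsdam dataset has six classes: 0 impervious surfaces (roads and other ground), 1 buildings, 2 low vegetation, 3 trees, 4 cars, 5 clutter. It ships two kinds of RGB labels: one with black pixels along the image borders and one without.
- For the labels with black borders, the dataset_converters.py script maps the black border to label 0 and the real classes to 1 impervious surfaces, 2 buildings, 3 low vegetation, 4 trees, 5 cars, 6 clutter. In that case set reduce_zero_label=True in the dataset class potsdam.py.
- For the labels without black borders, the mask only contains 0-5, so use reduce_zero_label=False.
- Choose according to your own data.
In short: if class 0 is a background class you do not need, set reduce_zero_label=True; if class 0 is background but you still want it as a class, keep reduce_zero_label=False.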
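
To see the effect on actual label values, the three lines of logic quoted above can be replayed on a small NumPy array. This is a standalone sketch; the function name apply_reduce_zero_label is made up, and the input mimics the Potsdam case with labels 0-6 plus an already ignored pixel.

```python
import numpy as np


def apply_reduce_zero_label(gt, ignore_index=255):
    """Replay the reduce_zero_label logic on a uint8 label map."""
    gt = np.array(gt, dtype=np.uint8)  # copy so the input is left untouched
    gt[gt == 0] = ignore_index         # old class 0 becomes "ignore"
    gt = gt - 1                        # shift the remaining classes down by one
    gt[gt == ignore_index - 1] = ignore_index  # 255 - 1 = 254 -> back to 255
    return gt


print(apply_reduce_zero_label([0, 1, 2, 6, 255]))
```

Label 0 and the already ignored 255 both end up as 255, while labels 1, 2 and 6 become 0, 1 and 5.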

Reference:

MMSegmentation系列之训练与推理自己的数据集（三）

1.5 Converting three-channel RGB masks to label maps

To keep this post short, this is covered in OpenMMLab-AI实战营第二期——相关3. RGB语义分割标注图像转为Gray格式的mask.

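2. Config files

There are again two ways to modify a config:

- Edit the xxconfig.py file directly (persistent)
- Modify it in code (at runtime)

2.1 Edit the config.py file: persistent ✅

I prefer this one: you see the whole config, open it in an editor and change it there, whereas changing items one by one in code is slow.

① Pick the base config

Browse by network: https://github.com/open-mmlab/mmsegmentation/tree/main/configs
- e.g. mmsegmentation/configs/unet/

② Save the full config file

MMEngine configs use inheritance, so the config file you find usually just references other files. To see the fully resolved config:

# Note: run this script with mmsegmentation as the working directory
from mmengine import Config
# The config you want to start from
raw_config_path = "./configs/pspnet/pspnet_r18-d8_4xb4-80k_potsdam-512x512.py"
cfg = Config.fromfile(raw_config_path)
print(cfg.pretty_text)

# Where to save the full config
config_file_path = 'project/project_name/config/pspnet-watermelon_20230618.py'
cfg.dump(config_file_path)

③ Edit the saved config file directly

2.2 Modify config entries in code: at runtime

This is also very common; the mmsegmentation tutorials and many online courses use it, for example: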

from mmengine import Config
cfg = Config.fromfile('./configs/deeplabv3plus/deeplabv3plus_r101-d8_4xb4-160k_ade20k-512x512.py')
dataset_cfg = Config.fromfile('./configs/_base_/datasets/ZihaoDataset_pipeline.py')
cfg.merge_from_dict(dataset_cfg)

cfg.crop_size = (512, 512)
cfg.model.data_preprocessor.size = cfg.crop_size

# For single-GPU training, change SyncBN to BN
cfg.norm_cfg = dict(type='BN', requires_grad=True)
cfg.model.backbone.norm_cfg = cfg.norm_cfg
cfg.model.decode_head.norm_cfg = cfg.norm_cfg
cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg

# Number of classes for the decode/auxiliary heads (NUM_CLASS must be defined beforehand)
cfg.model.decode_head.num_classes = NUM_CLASS
cfg.model.auxiliary_head.num_classes = NUM_CLASS

# Training batch size
cfg.train_dataloader.batch_size = 4

# Output directory
cfg.work_dir = './work_dirs/ZihaoDataset-DeepLabV3plus'

# Checkpointing and logging
cfg.train_cfg.max_iters = 20000 # number of training iterations
cfg.train_cfg.val_interval = 500 # validation interval
cfg.default_hooks.logger.interval = 100 # logging interval
cfg.default_hooks.checkpoint.interval = 2500 # checkpoint saving interval
cfg.default_hooks.checkpoint.max_keep_ckpts = 1 # maximum number of checkpoints to keep
cfg.default_hooks.checkpoint.save_best = 'mIoU' # also keep the checkpoint with the best metric

# Random seed
cfg['randomness'] = dict(seed=0)

# Print the full config
print(cfg.pretty_text)

# Save
cfg.dump('Zihao-Configs/ZihaoDataset_DeepLabV3plus_20230818.py')

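2.3 Common config items

2.3.1 early stop

Support for this hook arrived with OpenMMLab 2.0. Usage looks like this (the first example monitors an mmdetection metric, the second is for mmsegmentation):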

default_hooks = dict(
    early_stopping=dict(
        type="EarlyStoppingHook",
        monitor="coco/bbox_mAP",
        patience=10,
        min_delta=0.005),
    checkpoint=dict(
        type="CheckpointHook",
        interval=interval,
        save_begin=100,
        max_keep_ckpts=max_keep_ckpts,
        save_best=save_best)
)

default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50, log_metric_by_epoch=False),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=4000),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='SegVisualizationHook'),
    early_stop=dict(type='EarlyStoppingHook', monitor="mDice", patience=20))  # early stop for mmsegmentation

For the EarlyStoppingHook class itself, see its definition and documentation.

The value of the monitor field (the evaluation metric being watched) depends on the library:

- mmdetection: see mmdetection/mmdet/evaluation/metrics/coco_metric.py, where line 68 has default_prefix: Optional[str] = 'coco'
- mmpretrain: see mmpretrain.evaluation
- mmsegmentation: see mmsegmentation/mmseg/evaluation/metrics/iou_metric.py
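
The roles of patience and min_delta are easier to see in a toy version of the stopping rule. The class below is only a sketch of the general idea, written from scratch; it is not the EarlyStoppingHook implementation, and the name EarlyStopper and the greater_is_better flag are assumptions for illustration.

```python
class EarlyStopper:
    """Stop when the monitored metric has not improved for `patience` checks."""

    def __init__(self, patience=10, min_delta=0.005, greater_is_better=True):
        self.patience = patience
        self.min_delta = min_delta
        self.greater_is_better = greater_is_better
        self.best = None
        self.bad_checks = 0

    def step(self, value):
        """Record one validation result; return True if training should stop."""
        if self.best is None:
            self.best = value
            return False
        gain = value - self.best if self.greater_is_better else self.best - value
        if gain > self.min_delta:
            self.best = value      # a real improvement resets the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1   # too small a change counts as no improvement
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2, min_delta=0.01)
history = [0.70, 0.75, 0.752, 0.751, 0.80]
stopped_at = next((i for i, v in enumerate(history) if stopper.step(v)), None)
print(stopped_at)
```

With these numbers the run stops at index 3: the two values after 0.75 improve by less than min_delta, so the later 0.80 is never reached.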

References:

OpenMMLab 2.0: new architecture, algorithm, and ecology, 2022.9.22
✅Early stopping with mmdetection #2062

2.3.2 tensorboard and wandb

The snippets most often found online may already be outdated, for example:

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(type='TensorboardLoggerHook') # enables TensorBoard logging (this line is usually commented out by default)
    ])

Source: MMSeg Tensorboard 开关 and 写给 MMSegmentation 工具箱新手的避坑指南

2.3.3 save best

default_hooks = dict(checkpoint=dict(type='CheckpointHook', save_best='auto'))

Reference: mmengine-hook-checkpointhook

2.3.4 Threshold for turning a probability map into a mask (whether the last layer uses regular CE with softmax or binary cross-entropy with sigmoid)

According to "How to handle binary segmentation task", this parameter is configured in decode_head.

mmsegmentation/mmseg/models/decode_heads/decode_head.py contains:

 threshold (float): Threshold for binary segmentation in the case of
            `num_classes==1`. Default: None.
 if out_channels is None:
            if num_classes == 2:
                warnings.warn('For binary segmentation, we suggest using'
                              '`out_channels = 1` to define the output'
                              'channels of segmentor, and use `threshold`'
                              'to convert `seg_logits` into a prediction'
                              'applying a threshold')
            out_channels = num_classes

        if out_channels != num_classes and out_channels != 1:
            raise ValueError(
                'out_channels should be equal to num_classes,'
                'except binary segmentation set out_channels == 1 and'
                f'num_classes == 2, but got out_channels={out_channels}'
                f'and num_classes={num_classes}')

        if out_channels == 1 and threshold is None:
            threshold = 0.3
            warnings.warn('threshold is not defined for binary, and defaults'
                          'to 0.3')

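class ASPPHead(BaseDecodeHead)  # the ASPPHead used in my network inherits from the BaseDecodeHead above

In other words, threshold is only used for binary segmentation with sigmoid and a single output channel.
For ASPPHead, however, the config has no threshold entry.
That said, you can take the raw network output, apply sigmoid yourself and then choose a threshold, which keeps this step under your own control, so there is no need to keep worrying that the parameter is not exposed.

With softmax, the class with the highest probability is selected and threshold is not used.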
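
The two post-processing paths can be compared side by side in NumPy. This is a sketch of the idea only, not mmseg's own post-processing code; logits_to_mask is a hypothetical helper, and the default threshold of 0.3 simply mirrors the fallback value in the code quoted above.

```python
import numpy as np


def logits_to_mask(seg_logits, threshold=0.3):
    """Turn (C, H, W) raw logits into an (H, W) label map.

    C == 1: sigmoid, then compare against `threshold` (binary case).
    C >= 2: argmax over the channel axis (softmax does not change the argmax).
    """
    seg_logits = np.asarray(seg_logits, dtype=np.float64)
    if seg_logits.shape[0] == 1:
        prob = 1.0 / (1.0 + np.exp(-seg_logits[0]))
        return (prob > threshold).astype(np.uint8)
    return seg_logits.argmax(axis=0).astype(np.uint8)


binary = np.array([[[-2.0, 0.0, 2.0]]])           # shape (1, 1, 3)
multi = np.array([[[0.1, 2.0]], [[1.5, 0.3]]])    # shape (2, 1, 2)
print(logits_to_mask(binary))   # sigmoid gives about 0.12, 0.5, 0.88
print(logits_to_mask(multi))
```

Only the single-channel branch depends on threshold, which is why the parameter is irrelevant for softmax-style heads.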

In addition, mmsegmentation/mmseg/models/losses/cross_entropy_loss.py contains:

        if self.use_sigmoid:
            self.cls_criterion = binary_cross_entropy
        elif self.use_mask:
            self.cls_criterion = mask_cross_entropy
        else:
            self.cls_criterion = cross_entropy

So setting use_sigmoid=True in the loss of decode_head means BCE is used. (With multiple classes, each class is treated as its own binary classification.)

 decode_head=dict(
        type='ASPPHead',
        in_channels=64,
        in_index=4,
        channels=16,
        dilations=(1, 12, 24, 36),
        dropout_ratio=0.1,
        num_classes=5, # the decoder now predicts 5 classes
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=[
            dict(
                type='CrossEntropyLoss', loss_name='loss_ce', use_sigmoid=True,loss_weight=1.0), 
            dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)
        ]),

There is no need to convert the dataset's single-channel masks into multi-channel semantic masks.

❓2.4 Data augmentation

mmsegmentation/docs/en/advanced_guides/transforms.md explains this fairly thoroughly.

2.4.1 Adjusting brightness / PhotoMetricDistortion

On the page mmcv-Docs > Module code > mmcv.image.photometric, search for adjust_brightness.

In mmsegmentation/docs/en/advanced_guides/transforms.md, search for PhotoMetricDistortion.

See also mmsegmentation-Docs > Basic Concepts > Data Transforms.

2.4.2 gamma and CLAHE

On the page mmcv-Docs > Module code > mmcv.image.photometric, search for def clahe(img, clip_limit=40.0, tile_grid_size=(8, 8)):

References:

深度学习数据增强之亮度变换
Data augmentation in semantic segmentation
- Albumentations offers many augmentations, including some for image segmentation that PyTorch does not provide

❓ 2.4.3 RandomRotate

2.5 Evaluation metrics

Modify the config as follows:

test_evaluator = dict(type='IoUMetric', iou_metrics=['mDice'])
becomes (just add the metrics you want to the list):
test_evaluator = dict(type='IoUMetric', iou_metrics=['mDice','mFscore'])

For comparison, the corresponding code in the master branch:

// tools/test.py#L41-L46
 parser.add_argument(
     '--eval',
     type=str,
     nargs='+',
     help='evaluation metrics, which depends on the dataset, e.g., "mIoU"'
     ' for generic datasets, and "cityscapes" for Cityscapes')
"""
Just add --eval mIoU or --eval mDice or other metric in ./tools/test.py
"""
// https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/core/evaluation/metrics.py#L360
 allowed_metrics = ['mIoU', 'mDice', 'mFscore']
 ....
 mIoU
ret_metrics['IoU'] = iou
ret_metrics['Acc'] = acc
 ...
 mDice
ret_metrics['Dice'] = dice
ret_metrics['Acc'] = acc

mFscore
ret_metrics['Fscore'] = f_value
ret_metrics['Precision'] = precision
ret_metrics['Recall'] = recall
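
The keys listed above all derive from per-class pixel counts: the intersection, the predicted area and the label area. The NumPy sketch below computes them from two label maps using the standard definitions; it is not the mmseg implementation, and per_class_metrics is a hypothetical name. A class that never occurs would produce a division by zero (NaN), which this sketch does not handle.

```python
import numpy as np


def per_class_metrics(pred, label, num_classes, ignore_index=255):
    """Compute per-class IoU, Dice, precision and recall from two label maps."""
    pred = np.asarray(pred).ravel()
    label = np.asarray(label).ravel()
    keep = label != ignore_index          # drop ignored pixels first
    pred, label = pred[keep], label[keep]

    classes = np.arange(num_classes)
    intersect = np.array([np.sum((pred == c) & (label == c)) for c in classes])
    pred_area = np.array([np.sum(pred == c) for c in classes])
    label_area = np.array([np.sum(label == c) for c in classes])
    union = pred_area + label_area - intersect

    return {
        "IoU": intersect / union,
        "Dice": 2 * intersect / (pred_area + label_area),
        "Precision": intersect / pred_area,
        "Recall": intersect / label_area,
    }


m = per_class_metrics(pred=[0, 0, 1, 1], label=[0, 1, 1, 1], num_classes=2)
print({k: v.round(3).tolist() for k, v in m.items()})
```

Averaging each array over the classes gives the "m" variants such as mIoU and mDice.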

References:

How can I compute the average recall for a dataset? #1678
https://github.com/open-mmlab/mmsegmentation/blob/master/tools/test.py#L41-L46
In https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/core/evaluation/metrics.py, search for metrics=[ to find the supported metrics:
- metrics=['mIoU'], metrics=['mDice'], metrics=['mFscore']
https://github.com/open-mmlab/mmsegmentation/blob/main/docs/en/advanced_guides/add_metrics.md

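2.6 Difference between load_from and resume

~~resume-from~~ (renamed to resume in the 0.x to 1.x migration)
Loads both the model weights and the optimizer state; training continues from the point stored in the checkpoint. Typically used to recover a training run that was interrupted.
resume = False # Whether to resume from existed model.
load_from # Load checkpoint from file.
Loads only the model weights and training starts again from epoch 0. Typically used for fine-tuning.
pretrained # The pretrained backbone to be loaded (this parameter lives inside model; it only affects model loading and has nothing to do with the optimizer)

resume and load_from are used together.
The exact rules are described in Runtime settings.
In practice, my experience is:

- You need both resume=True and load_from pointing at the last checkpoint.
- Otherwise, with pretrained commented out, resume=True and load_from=None, the parameters are trained from scratch.

The 0.x to 1.x migration guide of mmsegmentation documents how the parameter names changed.

Usually only pretrained is needed.

2.7 TTA config and test.py failing to save images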

2.8 Showing the IMAGES tab in TensorBoard

2.8.1 Showing inference results from the val stage

Without extra configuration, TensorBoard only shows the data augmentation code as text.

default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50, log_metric_by_epoch=False),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=2000),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='SegVisualizationHook', draw=True, interval=1))

# The dict above configures SegVisualizationHook (a hook); below is SegLocalVisualizer.
# draw=True above: visualize ground truth and prediction of segmentation during model testing and evaluation

# vis_backends and visualizer below mainly control scalars in TensorBoard and do not cover images
vis_backends = [dict(type='LocalVisBackend'),
                dict(type='TensorboardVisBackend')]
visualizer = dict(
    type='SegLocalVisualizer', vis_backends=vis_backends, name='visualizer')

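After setting draw=True in visualization=dict(type='SegVisualizationHook', draw=True, interval=1), the IMAGES tab appears in TensorBoard once training has finished (that is, after a val stage). It shows:

- results for both the validation and the test set, named with a val_ or test_ prefix plus the file name (if your test and validation sets are identical, the results are simply duplicated);
- the same output as running python tools/test.py directly.

The test results are also saved under work_dir/vis_data/vis_image/, for example test_31_training.png_0.png. The val results are not saved there, because val runs many times during training.

2.8.2 ❓Showing the batch fed to the network

To keep this post short, this is covered in a separate blog post.

❓2.9 Testing on the whole image or on patches

mmsegmentation-Docs > Train & Test > Tutorial 1: Learn about Configs contains: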

 test_cfg=dict(mode='whole'))  # The test mode, options are 'whole' and 'slide'. 'whole': whole image fully-convolutional test. 'slide': sliding crop window on the image.

test_cfg=dict(mode='slide', crop_size=(128, 128), stride=(85, 85)),
test_cfg=dict(mode='whole')
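
To get a feel for what mode='slide' means for a given image size, the crop positions can be enumerated in plain Python. The sketch below follows the usual sliding-window scheme in which the last window along each axis is shifted back to stay inside the image; treat the exact grid formula as an assumption about how mmseg tiles the image, and slide_windows as a hypothetical helper name.

```python
def slide_windows(img_size, crop_size, stride):
    """List (y1, x1, y2, x2) crops covering an image of size (h, w)."""
    h_img, w_img = img_size
    h_crop, w_crop = crop_size
    h_stride, w_stride = stride
    h_grids = max(h_img - h_crop + h_stride - 1, 0) // h_stride + 1
    w_grids = max(w_img - w_crop + w_stride - 1, 0) // w_stride + 1
    windows = []
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y2 = min(h_idx * h_stride + h_crop, h_img)
            x2 = min(w_idx * w_stride + w_crop, w_img)
            # Shift the last window back so every crop keeps the full size.
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            windows.append((y1, x1, y2, x2))
    return windows


wins = slide_windows((300, 300), crop_size=(128, 128), stride=(85, 85))
print(len(wins), wins[0], wins[-1])
```

For a 300x300 image with the crop_size and stride from the config line above, this yields 16 overlapping windows; predictions in overlapping regions are then typically averaged.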

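❓2.10 times in RepeatDataset

❓2.11 Optimizer

2.12 Data Preprocessor

Reference:

数据预处理器（Data Preprocessor）

2.13 Training on inputs converted to grayscale

A slightly hacky trick:

- Use RGB2Gray in the pipeline and set its number of output channels to 3; according to the code, the gray channel is then copied 3 times.
- The network still receives a 3-channel input.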
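
The effect of this trick can be reproduced with NumPy: compute one gray channel as a weighted sum and repeat it. This is a sketch of the idea rather than the RGB2Gray transform itself; the weights (0.299, 0.587, 0.114) are the common luminance coefficients and are assumed here, and rgb_to_gray is a hypothetical name.

```python
import numpy as np


def rgb_to_gray(img, out_channels=3, weights=(0.299, 0.587, 0.114)):
    """Weighted sum over the channel axis, repeated out_channels times."""
    img = np.asarray(img, dtype=np.float32)
    w = np.asarray(weights, dtype=np.float32).reshape(1, 1, -1)
    gray = (img * w).sum(axis=2, keepdims=True)      # (H, W, 1)
    return np.repeat(gray, out_channels, axis=2)     # (H, W, out_channels)


img = np.random.default_rng(0).integers(0, 256, size=(4, 5, 3))
gray3 = rgb_to_gray(img)
print(gray3.shape)
print(np.allclose(gray3[..., 0], gray3[..., 1]))
```

All three output channels are identical, so a pretrained 3-channel backbone can be used unchanged.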


2.14 Adding val_loss (validation-set metrics) to TensorBoard

According to the following pages:

- mmdetection使用tensorboard可视化训练集与验证集指标参数
- mmseg 0.x to 1.x interface migration: https://mmsegmentation.readthedocs.io/zh-cn/latest/migration/interface.html#id8
- Validation Loss During Training #2002

the old setting

workflow = [('train', 1), ('val', 1)]

no longer applies. The migration guide states: "Changes to workflow: the workflow feature has been removed."

See also Logging validation loss without library code hacks #1396 and:

Fix validation loss logging (#1494)

The gist is that val_step and train_step logged under the same names, so the val loss could not be printed separately. The fix below addresses that, but it too applies to 0.x only.

 def val_step(self, data_batch, optimizer=None, **kwargs):
        """The iteration step during validation.

        This method shares the same signature as :func:`train_step`, but used
        during val epochs. Note that the evaluation after training epochs is
        not implemented with this method, but an evaluation hook.
        """
        losses = self(**data_batch)
        loss, log_vars = self._parse_losses(losses)
        log_vars_ = dict()
        for loss_name, loss_value in log_vars.items():
            k = loss_name + '_val'
            log_vars_[k] = loss_value

        outputs = dict(
            loss=loss,
            log_vars=log_vars_,
            num_samples=len(data_batch['img_metas']))

        return outputs

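According to the data flow overview, with the default ValLoop the loss is not printed; the model outputs are passed straight to the Evaluator, which computes the evaluation metrics.

This explains what I ran into earlier: dice_loss ignored the background class, yet the val results printed by the Evaluator still included the Dice metric.

ValLoop and the related loops are implemented in mmengine/mmengine/runner/loops.py.

mmsegmentation/docs/zh_cn/advanced_guides/models.md says:

We usually define the neural network in a deep learning task as the model, and this model is the core of the algorithm. MMEngine abstracts a unified model, BaseModel, to standardize training, testing and other processes. All models implemented in MMSegmentation inherit from BaseModel, and in MMSegmentation we implement the forward pass and add some functionality for semantic segmentation algorithms.
https://github.com/open-mmlab/mmengine/blob/main/mmengine/model/base_model/base_model.py#L16
- base_model defines val_step and train_step, as well as test_step
https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/segmentors/base.py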

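2.15 softmax vs. sigmoid in CE for binary and multi-class segmentation

According to How to handle binary segmentation task (or the Chinese note mmsegmentation的二值分割注意事项自用笔记) and the code in mmsegmentation/mmseg/models/decode_heads/decode_head.py, binary segmentation can be set up in two ways: the first uses out_channels=2 with use_sigmoid=False, the second uses out_channels=1 with use_sigmoid=True.

If you follow the second approach (out_channels=1, use_sigmoid=True) and additionally set ignore_index=0, you get the following error: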

  File "/home/ubuntu/huangs/VesselSeg/mmsegmentation/mmseg/models/losses/dice_loss.py", line 178, in forward
    loss = self.loss_weight * dice_loss(
  File "/home/ubuntu/huangs/VesselSeg/mmsegmentation/mmseg/models/losses/dice_loss.py", line 73, in dice_loss
    assert pred.shape[1] != 0  # if the ignored index is the only class

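So if you do binary segmentation and need ignore_index=0, only the first approach works: out_channels=2 with use_sigmoid=False.

3. Running

If you work with the mmsegmentation GitHub repo rather than only the pip-installed mmsegmentation as a third-party library, two issues come up:

- The import path issue mentioned in 1.1; section 3.1 covers it in detail.
- How to fit your own project into the mmsegmentation repo layout; section 3.2 covers it in detail.
- Where to put your own datasets, configs and the files generated during training.

3.1 Current working directory and the module search path

To use the scripts of the mmsegmentation repo correctly from outside the mmsegmentation directory, add the repo to the module search path.

Adding it at runtime: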

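import os
import sys
# '~' is not expanded by Python itself, so expand it explicitly
sys.path.append(os.path.expanduser('~/XXX/mmsegmentation'))
# Adding the repo to the search path makes Python look for mmseg inside mmsegmentation,
# so code such as
#   from mmseg.utils import register_all_modules
# no longer fails because mmseg cannot be found.
#
# You can print sys.path:
#   ['~/XXX/mmsegmentation/projects/vesselSeg/code', ...., '~/XXX/mmsegmentation' ,...]
# By default the first entry is the directory containing the running script.
# Make sure the `~/XXX/mmsegmentation` entry you added shows up.

According to mmsegmentation/projects/example_project/README.md, you can also add it beforehand in bash: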

cd mmsegmentation
export PYTHONPATH=`pwd`:$PYTHONPATH

Reference: the section "3.6 报错 ImportError: cannot import name 'InverseFormLoss' from 'mmseg.models.losses'" of OpenMMLab【超级视客营】——支持InverseForm Loss(MMSegmentation的第三个PR)

3.2 Project layout

This is the layout I personally use:

.
├── CITATION.cff
├── configs
├── dataset-index.yml
├── data    // ✅ newly created
├── demo
├── docker
├── docs
├── LICENSE
├── MANIFEST.in
├── mmseg
├── model-index.yml
├── projects  // ✅ emptied and recreated
├── README.md
├── README_zh-CN.md
├── requirements
├── requirements.txt
├── resources
├── setup.cfg
├── setup.py
├── tests
└── tools

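After cloning the mmsegmentation project:

- Create a data folder and place the data as described in 1.3.1. The benefit is that most existing config files under mmsegmentation/configs assume the datasets live in the data folder.

- Empty the projects folder and put your own projects there. The layout can look like this:

.
├── AV_Seg  // project name
├── pretrained  // where the pretrained models are stored
│   └── deeplabv3_unet_s5-d16_ce-1.0-dice-3.0_128x128_40k_stare_20211210_201825-21db614c.pth
└── vesselSeg // another project
    ├── code // code needed by the project
    │   ├── train.py
    │   └── XXX
    ├── config // full config files used by the project
    │   └── unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py
    ├── experiments
    └── README.md // description of the project

This is only for reference; it is what I use myself.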

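3.3 Running on specific GPUs

# What I used to see most often:
export CUDA_VISIBLE_DEVICES=0,1 # GPUs 0 and 1
export CUDA_VISIBLE_DEVICES="" # empty means CPU only
# These stay in effect for the whole shell session

# Nowadays this form seems more common:
CUDA_VISIBLE_DEVICES=3 python ../tools/train.py
# Multiple GPUs are written the same way:
CUDA_VISIBLE_DEVICES=1,3 python ../tools/train.py
# These only apply to that one command, i.e. for the duration of training

3.4 Training and testing commands

mmsegmentation provides two simple scripts, tools/train.py and tools/test.py.

Adapt the relevant parts of these scripts and you have your own entry points.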

The most basic invocations:

# Training
CUDA_VISIBLE_DEVICES=0 python ./projects/vesselSeg/code/train.py \
./projects/vesselSeg/config/unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py \
--work-dir ./projects/vesselSeg/experiments/unet_deeplabv3_604
# The first argument is the config file; it is a required positional argument, so no --config flag
# --work-dir: where files generated during training are stored


# Testing (note the space before each trailing backslash)
CUDA_VISIBLE_DEVICES=0 python tools/test.py \
./projects/vesselSeg/config/unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py \
/content/drive/MyDrive/OpenMMLab/workdir/PSPNet/iter_1000.pth \
--work-dir ./projects/vesselSeg/experiments/unet_deeplabv3_604

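3.5 Inferring the probability map

I have touched on this before:

- openMMLabCampusLearn/Exercise_2/Exercise_2.ipynb analyses the data format returned by the inference call (scroll to the bottom of the page).
- Section 3.4 (inference) of openMMLabCampusLearn/Exercise_4/Exercise_4.ipynb also takes the returned data structure apart.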

from mmseg.apis import inference_model, init_model, show_result_pyplot
config_file = './projects/vesselSeg/config/unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py'
checkpoint_file = './projects/vesselSeg/experiments/unet_deeplabv3_604/iter_40000.pth'

model = init_model(config_file, checkpoint_file, device='cuda:1')
img = '/home/ubuntu/huangs/VesselSeg/Datasets/0323_img/36_青光眼_OS.jpg'

result = inference_model(model, img)

type(result)
> mmseg.structures.seg_data_sample.SegDataSample

result.seg_logits.data.shape
> torch.Size([2, 3072, 4096])

Reference:

mmsegmentation-Docs > Basic Concepts > Structures

3.6 Visualization: the batch fed to the network after data augmentation

Note that mmsegmentation does not yet support showing the input batch in TensorBoard's IMAGES tab (see 2.8.2).

However, the OpenMMLab libraries ship some handy scripts for quickly inspecting the batches fed to the network, namely tools/analysis_tools/browse_dataset.py.

Error:
KeyError: 'AV62Dataset is not in the mmseg::dataset registry. Please check whether the value of `AV62Dataset` is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'
So to use the scripts under tools, the custom dataset still has to be defined in a file rather than registered at runtime.

Correct usage:

cd ~/mmsegmentation # make sure you are in the mmsegmentation directory

export PYTHONPATH=`pwd`:$PYTHONPATH # add it to the module search path

# Typical command line
 python ./tools/analysis_tools/browse_dataset.py ./projects/AV_Seg/config/unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py \
 --output-dir ./projects/AV_Seg/infer 

> [                                                  ] 9/1840000, 0.5 task/s, elapsed: 19s, ETA: 3943510s

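This is still not very convenient, though. The training set here has 46 images in total; results are not saved per batch as one large image but file by file, overwriting each other.

You also need the modifications below to see the transformed img and mask side by side (concatenated rather than overlaid):

# mmsegmentation/tools/analysis_tools/browse_dataset.py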

visualizer = VISUALIZERS.build(cfg.visualizer)
visualizer.alpha = 1.0  # add this line: the mask is then drawn fully opaque (no blending with the image underneath), and the image is shown at 100% too
visualizer.dataset_meta = dataset.metainfo

visualizer.add_datasample(
    name=osp.basename(img_path),
    image=img,
    data_sample=data_sample,
    draw_gt=True,
    draw_pred=False,
    wait_time=args.show_interval,
    out_file=out_file,
    withLabels=False, # set withLabels to False, otherwise the output image carries label text
    show=not args.not_show)

# mmsegmentation/mmseg/visualization/local_visualizer.py
# replace mask with np.ascontiguousarray(mask) in all three places
 mask = cv2.rectangle(np.ascontiguousarray(mask), loc,
                       (loc[0] + label_width + baseline,
                        loc[1] + label_height + baseline),
                       classes_color, -1)

def _draw_sem_seg(self,
...
	return color_seg, image  # also return image; otherwise browse_dataset only shows the mask, not the image

def add_datasample
        if gt_img_data is not None and pred_img_data is not None:
            drawn_img = np.concatenate((gt_img_data, pred_img_data), axis=1)
        elif gt_img_data is not None: 
            drawn_img = gt_img_data  # change this line to the one below
            drawn_img = np.concatenate((gt_img_data, img), axis=1)

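The result shows the mask and the image side by side. It is admittedly not very convenient, but it will do for now.

Because I had deleted the git metadata earlier, these modifications are hard to record as a diff.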

4. Some structural details of mmsegmentation

4.1 Data type of the model output

Converting to a probability map with softmax:

import numpy as np
from scipy.special import softmax

result = inference_model(model, img)
# result: the mmseg.structures.seg_data_sample.SegDataSample returned directly by inference_model
seg_logits = result.seg_logits.data.cpu().numpy()
probability = softmax(seg_logits, axis=0)
# Take channel 1 and convert it to a single-channel image that can be shown or saved
probability = (np.clip(probability[1], 0, 1) * 255).astype(np.uint8)

SegDataSample holds three fields of type PixelData:

- gt_sem_seg: the annotation (ground-truth mask)
- ~~pred_instances~~: the predicted mask. The mmsegmentation-structures documentation is out of date, so go by the code: the prediction is called pred_sem_seg, not pred_instances.
- seg_logits: the raw network output, before any activation function or normalization ("The raw (non-normalized) predicted result.")

PixelData is defined here:

- mmsegmentation/mmseg/models/segmentors/base.py has from mmengine.structures import PixelData
- mmengine/mmengine/structures/pixel_data.py

class PixelData(BaseDataElement):
    """Data structure for pixel-level annotations or predictions.

    All data items in ``data_fields`` of ``PixelData`` meet the following
    requirements:

    - They all have 3 dimensions in orders of channel, height, and width.
    - They should have the same height and width.

    Examples:
        >>> metainfo = dict(
        ...     img_id=random.randint(0, 100),
        ...     img_shape=(random.randint(400, 600), random.randint(400, 600)))
        >>> image = np.random.randint(0, 255, (4, 20, 40))
        >>> featmap = torch.randint(0, 255, (10, 20, 40))
        >>> pixel_data = PixelData(metainfo=metainfo,
        ...                        image=image,
        ...                        featmap=featmap)
        >>> print(pixel_data.shape)
        (20, 40)

        >>> # slice
        >>> slice_data = pixel_data[10:20, 20:40]
        >>> assert slice_data.shape == (10, 20)
        >>> slice_data = pixel_data[10, 20]
        >>> assert slice_data.shape == (1, 1)

        >>> # set
        >>> pixel_data.map3 = torch.randint(0, 255, (20, 40))
        >>> assert tuple(pixel_data.map3.shape) == (1, 20, 40)
        >>> with self.assertRaises(AssertionError):
        ...     # The dimension must be 3 or 2
        ...     pixel_data.map2 = torch.randint(0, 255, (1, 3, 20, 40))
    """

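About the softmax above: suppose this is a multi-class problem with 3 classes. Then result.seg_logits has shape (classes_num, height, width), say (3,4,5).
In that case:

probability = softmax(seg_logits, axis=0)
# axis=0 normalizes over the class axis: the output keeps the shape (3, 4, 5) and the
# 3 values at each pixel sum to 1. Selecting one class, e.g. probability[1], gives a
# 4*5 matrix, i.e. exactly the image height and width (the mask).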
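
The shapes involved can be verified with random logits. The sketch below defines its own NumPy softmax so that it runs without scipy; the (3, 4, 5) shape matches the example above, and the random values are placeholders rather than real model output.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along `axis`."""
    x = np.asarray(x, dtype=np.float64)
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


seg_logits = np.random.default_rng(0).normal(size=(3, 4, 5))  # (classes, H, W)
prob = softmax(seg_logits, axis=0)

print(prob.shape)                            # same shape as the logits
print(np.allclose(prob.sum(axis=0), 1.0))    # each pixel sums to 1 over classes
print(prob[1].shape)                         # one class gives an (H, W) map
print(prob.argmax(axis=0).shape)             # predicted label map, also (H, W)
```

Indexing a single class or taking the argmax over axis 0 is what reduces the result to the image-sized (4, 5) map.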

For how to think about axes, see numpy中关于数组维度的理解——dim和axis.

References:

How to implement the Softmax function in Python?
- https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.softmax.html

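4.2 mmcv reads images as BGR

Note that mmcv is not OpenCV. Taking imread as an example:

The default backend is
imread_backend = 'cv2'
so reading images behaves much like OpenCV, including the BGR channel order.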

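5. Saving your own work with git

5.1 Remove mmsegmentation's .git metadata

A freshly cloned mmsegmentation comes with its own .git metadata. To keep it from interfering, simply delete the .git folder:

~/mmsegmentation$ rm -rf .git

After deleting it, the changes listed in VS Code's Source Control tab disappear.

If they do not, list all files in the directory with ll (including hidden ones) and check whether the .git folder is really gone.

5.2 Create your own git repo

After that, create a new git repo and link it to your own project folder.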

git init
git add *
git commit -m "init commit"
git branch -M main
git config --local user.name "huangshan"
git config --local user.email "hs8023hfp@gmail.com"
git remote add origin https://github.com/CastleDream/mmsegProject.git


git push -u origin main
fatal: unable to access 'https://github.com/CastleDream/mmsegProject.git/': GnuTLS recv error (-110): The TLS connection was non-properly terminated.

# After this error I retried with sudo; according to the reference it might be a permission problem
sudo git push --set-upstream origin main

# It may fail again; with luck and a good connection it prompts for a username and password
Username for 'https://github.com': 
Password for 'https://huangshan@github.com': 
# This still ends in failure, because password authentication is no longer accepted; an SSH key or a token is required

5.3 Set up SSH on a new machine

Follow the section "🌸5. ssh使用" of git传输时使用的两种协议ssh和http的区别 to set up an SSH key for the new machine:

Your identification has been saved in /home/ubuntu/.ssh/id_ed25519
Your public key has been saved in /home/ubuntu/.ssh/id_ed25519.pub

If you hit this error along the way:

~/mmsegmentation/projects$ git push
fatal: The current branch main has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin main

To have this happen automatically for branches without a tracking
upstream, see 'push.autoSetupRemote' in 'git help config'.

~/mmsegmentation/projects$ git push --set-upstream origin main
fatal: unable to access 'https://github.com/XXX/XXX.git/': GnuTLS recv error (-110): The TLS connection was non-properly terminated.

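You can try:

 sudo apt-get install build-essential fakeroot dpkg-dev
 # Updating git is enough; this was a bug in older versions
 sudo apt install git
 # After updating, run git push --set-upstream origin main again
 # It then asks you to open a web page and enter your username, password, and the email verification code, after which the push goes through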


5.4 Confirm the GitHub account is configured locally

Once configured, you can confirm that the GitHub account you set is the repo-local one

~/mmsegmentation/projects$ git config --local user.name
yourname
~/mmsegmentation/projects$  git config --local user.email
youremail@gmail.com

~/mmsegmentation/projects$ git config --global user.email
another_email
~/mmsegmentation/projects$ git config --global user.name
anothername

5.5 Confirm the linked remote is correct

git remote -v
origin  https://github.com/XX/XXX.git (fetch)
origin  https://github.com/XX/XXX.git (push)

# The same old problem as before: the remote was added with an HTTPS URL, so push still asks for a username and password
~/mmsegmentation/projects$ git push
Username for 'https://github.com': huangshan
Password for 'https://huangshan@github.com': 
remote: Support for password authentication was removed on August 13, 2021.
remote: Please see https://docs.github.com/en/get-started/getting-started-with-git/about-remote-repositories#cloning-with-https-urls for information on currently recommended modes of authentication.
fatal: Authentication failed for 'https://github.com/CastleDream/mmsegProject.git/'

# Switch the remote URL from HTTPS to the SSH form (git@github.com:...), and push works normally
~/mmsegmentation/projects$ git remote set-url origin git@github.com:CastleDream/mmsegProject
~/mmsegmentation/projects$ git push
Enumerating objects: 32, done.
Counting objects: 100% (32/32), done.
Delta compression using up to 72 threads
Compressing objects: 100% (18/18), done.
Writing objects: 100% (19/19), 387.21 KiB | 2.16 MiB/s, done.
Total 19 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 5 local objects.
To github.com:CastleDream/mmsegProject
   27b16ca..37d21a3  main -> main

Reference: GitHub still asks for a username and password after a token has been set up
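
The rewrite from the HTTPS form to the SSH form is mechanical, so it can be sketched as a small helper. The function name https_to_ssh and the regular expression are my own illustration, and the sketch only handles github.com URLs of the shape owner/repo.

```python
import re

def https_to_ssh(url):
    """Convert https://github.com/<owner>/<repo>[.git] to git@github.com:<owner>/<repo>.git."""
    match = re.fullmatch(r"https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?", url)
    if match is None:
        raise ValueError(f"not a GitHub HTTPS remote: {url}")
    owner, repo = match.groups()
    return f"git@github.com:{owner}/{repo}.git"

print(https_to_ssh("https://github.com/CastleDream/mmsegProject.git"))
```

The printed string is what you would pass to git remote set-url origin.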

5.6 Make the project repo a submodule of mmsegmentation

This way the two projects can be committed independently.

5.6.1 Configuring the main project (mmsegmentation)

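# Clone the project over SSH
git clone git@github.com:CastleDream/mmsegmentation.git  # only the main branch is checked out by default
cd mmsegmentation

# Set a repo-local user name and email
git config --local user.name yourname
git config --local user.email youremail

# Add the original repository as the upstream remote
git remote add upstream git@github.com:open-mmlab/mmsegmentation
# Inspect the configured remotes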
git remote -v
origin  git@github.com:CastleDream/mmsegmentation.git (fetch)
origin  git@github.com:CastleDream/mmsegmentation.git (push)
upstream        git@github.com:open-mmlab/mmsegmentation (fetch)
upstream        git@github.com:open-mmlab/mmsegmentation (push)	

# List local branches
git branch
* main

# List all branches (local and remote)
git branch -a
* main
  remotes/origin/AddLabel2Visualize
  remotes/origin/CastleDream/ADD_BDD100K
  remotes/origin/HEAD -> origin/main
  remotes/origin/SupportInverseFormLoss
  remotes/origin/main

# Fetch everything new from upstream
git fetch upstream

# Create a new branch from dev-1.x; git branch then shows that you are already on the new MediWorks branch
git checkout -b MediWorks upstream/dev-1.x

# Remove projects and other unneeded files
git rm -rf projects

Reference: OpenMMLab: the BDD100K dataset (my first PR to MMSegmentation)

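5.6.2 Configuring the sub-project (projects)

Compared with the usual tutorials online, one thing is different here: I set up the submodule on a specific branch of a repo, hence the -b option (in git, -b names the branch of the submodule repository to track):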

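$ git submodule add -b MediWorks git@github.com:CastleDream/mmsegProject.git projects
# Put the sub-project in the projects folder

# If you forgot the -b branch, or got the sub-project path wrong, you can undo with
$ git rm --cached projects

After adding the submodule, a new `.gitmodules` file appears with the following content: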
[submodule "projects"]
	path = projects
	url = git@github.com:CastleDream/mmsegProject.git
	branch = MediWorks

# Confirm the association is correct
$ git status
On branch MediWorks
Your branch is up to date with 'upstream/dev-1.x'.
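
To double-check what ended up in `.gitmodules` without reading it by eye, the file can be parsed as an INI-style document. This is only a sketch: the text below is the content shown above pasted into a string, and parse_gitmodules is a hypothetical helper built on Python's configparser, not something git provides.

```python
import configparser

GITMODULES_TEXT = """\
[submodule "projects"]
    path = projects
    url = git@github.com:CastleDream/mmsegProject.git
    branch = MediWorks
"""

def parse_gitmodules(text):
    """Map each submodule name to its settings (path, url, branch, ...)."""
    # Strip indentation so configparser never treats a key as a continuation line.
    flat = "\n".join(line.strip() for line in text.splitlines())
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(flat)
    modules = {}
    for section in parser.sections():
        # Section names look like: submodule "projects"
        name = section.split('"')[1] if '"' in section else section
        modules[name] = dict(parser[section])
    return modules

modules = parse_gitmodules(GITMODULES_TEXT)
print(modules["projects"]["branch"])
```

For a real checkout, read the file from disk and pass its text to the same function.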


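5.6.3 Follow-up operations

Updating the main repo and the sub-project separately:

To update the main repo later, switch to the main repo directory and pull there
To update the projects repo, switch to the projects directory and pull there
Even though projects is updated from inside the main repo's folder, the submodule reference shown on the main repo's remote does not update by itself. This feels slightly odd, but it makes sense
- The submodule inside the main repo works like a soft link pinned to the commit that was current when the submodule was created. To pick up what is on GitHub/the remote, additionally run git pull --recurse-submodules in the main repo; moving the pinned commit itself requires committing the updated submodule in the main repo
Later I found that if you modify the submodule inside the main repo, the changes are not recorded by the main repo by default (perhaps because a submodule may come from someone else's repository)
- The correct order for updating a submodule:
- a separate local clone of the submodule → the submodule's original remote → update the contents of the submodule folder in the local main repo.
So it did not feel very convenient. In the end I dropped the submodule and copied everything directly into the mmsegmentation folder, which is easier to modify.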

# 1. Remove the submodule folder from the index (the files stay on disk)
$ git rm --cached projects
rm 'projects'
# 2. Delete the .gitmodules file
# 3. Delete the submodule section from .git/config
cd .git
vi config
[submodule "projects"]
        url = git@github.com:CastleDream/mmsegProject.git
        active = true

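Reference: Unable to find current origin/master revision in submodule path

# Once the submodule is configured, you can commit
git add *
git commit -m "update submodule"
# For the first push, add -u to git push to set the upstream branch. Here origin is still your own remote, not upstream; changes reach upstream through PRs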
git push -u origin MediWorks

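After the push, the submodule folder looks different from ordinary folders.
Operations on the submodule are kept separate from those on the main repo:

A commit made in any subdirectory of the main repo other than the submodule is recorded in the main repo.
Only commits made inside the submodule directory are recorded in the submodule's repository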

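Update the submodules in the project (to the commit recorded by the main repo)
$ git submodule update

Update the submodule to the latest commit of its remote
$ git submodule update --remote

Cloning a repository that contains submodules

git clone  XXX
# The submodule folder is empty at this point, so you also need
git submodule update --init --recursive  # recursively initialize and download the submodule contents
git pull  # or cd into the submodule directory and pull there

Or, when cloning for the first time, do it in one step:
git clone --recursive <project url>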


X. Errors

X.1 Errors related to using/installing libraries

1. (tensorboard) ImportError: cannot import name 'notf' from 'tensorboard.compat'

Rule of thumb:
when tensorboard raises an error, try installing tensorflow first; a missing tensorflow is a likely cause.

Reference: ImportError: cannot import name 'notf' from 'tensorboard.compat'

Error:

ImportError: Please run "pip install future tensorboard" to install the dependencies to use torch.utils.tensorboard (applicable to PyTorch 1.1 or higher)

However, tensorboard was already installed, and I had also run "pip install future tensorboard" as requested. The underlying cause was that tensorflow was not installed
pip install tensorflow fixes it

Error:

  File "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages/tensorboard/compat/__init__.py", line 45, in tf
    import tensorflow
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 438, in <module>
    _ll.load_library(_main_dir)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 154, in load_library
    py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: /home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages/tensorflow/core/kernels/libtfkernel_sobol_op.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringB5cxx11ERKNS_15OpKernelContextEb

According to tensorflow/text/issues/385 and tensorflow/issues/47212, the fix is to switch versions

pip install -U tensorflow # went from 2.4 to 2.13
# but the numpy version on this machine is quite new, which raises
File "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'object'.
`np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. 
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

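Later I noticed that the tensorflow raising the error was the one in the .local folder, not the one in the openmmlab conda environment. Deleting that tensorflow package from .local fixed it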

2. from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.

 File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 553, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

Reference: TypeError: Descriptors cannot not be created directly

Run:

pip install "protobuf==3.20.*"
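
Before pinning, it can help to check whether the installed protobuf is actually newer than the 3.20.x series named in the error message. The helpers below are a sketch of that comparison with made-up version strings; parse_version and exceeds_3_20 are my own names, and pre-release suffixes such as rc1 are not handled.

```python
def parse_version(text):
    """Turn '3.20.3' into (3, 20, 3); non-numeric suffixes are not handled."""
    return tuple(int(part) for part in text.split("."))

def exceeds_3_20(version_text):
    """True when the version is newer than the 3.20.x series."""
    return parse_version(version_text)[:2] > (3, 20)

for candidate in ("3.19.6", "3.20.3", "4.21.0"):
    print(candidate, exceeds_3_20(candidate))
```

On Python 3.8 or later, the installed version string can be read with importlib.metadata.version("protobuf") and passed to exceeds_3_20.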

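X.2 The conda environment is active, but packages in `.local` take precedence (do not install with mim)

X.2.1 Problem description and useful information

With conda activate openmmlab already done, running a script fails with: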

Traceback (most recent call last):
  File "./projects/vesselSeg/code/train.py", line 168, in <module>
    train()
  File "./projects/vesselSeg/code/train.py", line 154, in train
    runner = Runner.from_cfg(cfg)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/runner/runner.py", line 439, in from_cfg
    runner = cls(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/runner/runner.py", line 395, in __init__
    self.visualizer.add_config(self.cfg)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/dist/utils.py", line 366, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/visualization/visualizer.py", line 1068, in add_config
    vis_backend.add_config(config, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/visualization/vis_backend.py", line 56, in wrapper
    obj._init_env()  # type: ignore
  File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/visualization/vis_backend.py", line 537, in _init_env
    raise ImportError(
ImportError: Please run "pip install future tensorboard" to install the dependencies to use torch.utils.tensorboard (applicable to PyTorch 1.1 or higher)

The packages used should be the ones in /home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages, but here the ones in /home/ubuntu/.local/lib/python3.8/site-packages are being used.
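
A quick way to see where a package resolves, without triggering the failing code path, is to ask the import system for its location. This is a minimal sketch: module_origin and is_inside are hypothetical helper names, the paths are copied from the traceback above, and json stands in for mmengine so the snippet runs anywhere.

```python
import importlib.util
from pathlib import PurePosixPath

def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

def is_inside(path, directory):
    """True when `path` lies somewhere below `directory` (POSIX-style paths)."""
    return PurePosixPath(directory) in PurePosixPath(path).parents

# Paths copied from the traceback above.
user_site = "/home/ubuntu/.local/lib/python3.8/site-packages"
loaded_from = user_site + "/mmengine/runner/runner.py"
print(is_inside(loaded_from, user_site))

# Where a module resolves on this machine; json is used because it ships with Python.
print(module_origin("json"))
```

In the affected environment, module_origin("mmengine") together with site.getusersitepackages() would tell you whether the copy in .local is the one being picked up.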

Following "How to prevent anaconda environment from reading libraries installed in local", check the search path:

(openmmlab) (base) ubuntu@ubun:~/XX/mmsegmentation$ python -m site
sys.path = [
    '/home/ubuntu/XX/mmsegmentation',
    '/home/ubuntu/anaconda3/envs/openmmlab/lib/python38.zip',
    '/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8',
    '/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/lib-dynload',
    '/home/ubuntu/.local/lib/python3.8/site-packages',
    '/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages',  
    # the .local site-packages comes before the openmmlab site-packages, so packages are looked up in .local first
]
USER_BASE: '/home/ubuntu/.local' (exists)
USER_SITE: '/home/ubuntu/.local/lib/python3.8/site-packages' (exists)
ENABLE_USER_SITE: True

# With -s the user site is dropped and .local no longer shows up, but this did not seem to help much in practice
(openmmlab) (base) ubuntu@ubun:~/XX/mmsegmentation$ python -s -m site
sys.path = [
    '/home/ubuntu/XX/mmsegmentation',
    '/home/ubuntu/anaconda3/envs/openmmlab/lib/python38.zip',
    '/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8',
    '/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/lib-dynload',
    '/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages',
]
USER_BASE: '/home/ubuntu/.local' (exists)
USER_SITE: '/home/ubuntu/.local/lib/python3.8/site-packages' (exists)
ENABLE_USER_SITE: False

For a more detailed discussion on isolating conda environments from .local, see: Isolating conda environments from ~/.local #7173
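
Python offers two standard switches for ignoring the user site: the -s command-line flag used above and the PYTHONNOUSERSITE environment variable. The sketch below starts a child interpreter with each of them and reads back sys.flags.no_user_site; the helper name user_site_disabled is my own, and whether either switch is practical for a given training launch command is a separate question.

```python
import os
import subprocess
import sys

def user_site_disabled(extra_args=(), extra_env=None):
    """Start a child interpreter and report whether it ignores the user site."""
    env = dict(os.environ)
    env.pop("PYTHONNOUSERSITE", None)  # start from a known state
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", "import sys; print(sys.flags.no_user_site)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "1"

print(user_site_disabled(extra_args=["-s"]))
print(user_site_disabled(extra_env={"PYTHONNOUSERSITE": "1"}))
```

Exporting PYTHONNOUSERSITE=1 in the shell before launching a script applies the same setting to every Python process started from that shell.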

You can also check which interpreter is in use:

(openmmlab) (base) ubuntu@ubun:~/XX$ which python
/home/ubuntu/anaconda3/envs/openmmlab/bin/python 

(openmmlab) (base) ubuntu@ubun:~/XX$ conda deactivate

(base) (base) ubuntu@ubun:~/XX$ conda deactivate
# conda deactivate exits the conda environment

ubuntu@ubun:~/XX$ which python
/home/ubuntu/anaconda3/bin/python
# In theory this should show /usr/bin/python
# but it still points to the python of anaconda's default base environment

So something seems to be off with the server environment.

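X.2.2 Brute-force fix

The only fix I could think of at this point: delete the packages you do not want picked up from .local, so that the ones in the conda env are used instead

cd /home/ubuntu/.local/lib/python3.8/site-packages/
rm -rf XXX  # XXX is the package to delete

Later I found the cause was the mim installation: mim seems to have installed the packages into .local. After deleting mmengine from .local, running the code fails because
mmengine cannot be found.

Simply run pip install mmengine inside the corresponding conda environment, then check with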

pip show mmengine
Name: mmengine
Version: 0.10.1
Summary: Engine of OpenMMLab projects
Home-page: https://github.com/open-mmlab/mmengine
Author: MMEngine Authors
Author-email: openmmlab@gmail.com
License: UNKNOWN
Location: /home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages
Requires: addict, matplotlib, numpy, opencv-python, pyyaml, rich, termcolor, yapf
Required-by: mmcv

and you can see that it is now installed in the right location
