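Table of contents
- 1. Preparation
- 2. Config files
- 2.1 Editing the config.py file (persistent) ✅
- 2.2 Modifying config fields in code (runtime)
- 2.3 Common config options
- ❓2.4 Data augmentation
- 2.5 Evaluation metrics
- 2.6 Difference between load_from and resume
- 2.7 TTA config and test.py failing to save images
- 2.8 Showing the IMAGES tab in TensorBoard
- ❓2.9 Testing on the whole image or on patches
- ❓2.10 The times field of RepeatDataset
- ❓2.11 Optimizer
- 2.12 Data Preprocessor
- 2.13 Training on grayscale input
- 2.14 Adding val_loss (validation metrics) to TensorBoard
- 2.15 softmax vs. sigmoid in CE for binary and multi-class segmentation
- 3. Running
- 4. Notes on mmsegmentation internals
- 5. Saving your own work with git
- X. Errors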
1. Preparation
1.0 ❌ Inference only
If you only plan to run inference with mmseg and not train, installation can in principle be simpler:
conda install pytorch torchvision -c pytorch
pip install -U openmim
mim install mmengine
pip install mmcv-lite # do NOT install this variant: mmcv still fails with ModuleNotFoundError: No module named 'mmcv._ext'
# Note: mmcv-lite and the full mmcv must not be installed side by side.
# If mmcv-lite is already installed, uninstall it completely before installing the full mmcv.
# If that still does not help, create a fresh environment and install mmcv there.
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.13/index.html
pip install "mmsegmentation>=1.0.0"
pip install ftfy # mmseg imports this library
References:
- ImportError: DLL load failed while importing _ext: The specified procedure could not be found. #1594
- https://mmcv.readthedocs.io/zh-cn/latest/get_started/installation.html
- https://github.com/open-mmlab/mmsegmentation/blob/main/docs/zh_cn/get_started.md#installation
- https://github.com/open-mmlab/mmdetection/issues/10664
- https://download.openmmlab.com/mmcv/dist/cu118/torch2.0/index.html
1.1 Installing mmseg and cloning the repo
See: Get started: Install and Run MMSeg
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
conda install pytorch torchvision -c pytorch
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
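git clone -b main https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation
pip install -v -e .
# Alternatively, if mmsegmentation is only used as a third-party dependency, install it with pip:
pip install "mmsegmentation>=1.0.0"
# Later I uninstalled the pip package so that my local code changes are picked up directly
~/huangs$ pip uninstall mmsegmentation
> Successfully uninstalled mmsegmentation-1.2.1
Note:
- Using the pip-installed mmsegmentation package and working directly from the cloned mmsegmentation folder are two different usage modes.
- Be clear about whether from mmseg import xxx resolves to the code in the cloned repo folder or to the installed mmseg package. This matters most when you have customized the code in the repo folder and the changes do not take effect.
- If you are unsure how the current working directory, relative paths, and the module search path interact, it is easy to mix the two up and get errors.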
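To see which copy of a package Python would actually import, you can ask the import machinery where it finds the module. The snippet below is a minimal sketch using only the standard library; the helper name module_origin is my own, and "json" is used as a stand-in so the snippet runs anywhere (replace it with "mmseg" in your own environment).

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Stand-in module; use "mmseg" to check whether the repo folder or site-packages wins.
    print(module_origin("json"))
```

If the printed path points into site-packages while you expected the cloned repo (or the other way round), that is the mix-up described above.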
1.2 Verifying the installation
Download the config file and checkpoint needed for the test:
conda activate openmmlab
cd mmsegmentation
mim download mmsegmentation --config pspnet_r50-d8_4xb2-40k_cityscapes-512x1024 --dest .
You can run the demo script that ships with the repo:
python demo/image_demo.py demo/demo.png configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --device cuda:0 --out-file result.jpg
Alternatively, create a script named check_mmseg.py in the top-level mmsegmentation folder and paste in the following code:
from mmseg.apis import inference_model, init_model, show_result_pyplot
config_file = 'pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py'
checkpoint_file = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'
# build the model from the config and load the checkpoint
model = init_model(config_file, checkpoint_file, device='cuda:0')
img = 'demo/demo.png'
result = inference_model(model, img)
# When connected to a remote server through VS Code, show=True blocks,
# so only write the result to a file here
# show_result_pyplot(model, img, result, show=True)
show_result_pyplot(model, img, result, show=False, out_file='result.jpg', opacity=0.5)
# If you have a video file, you can try it as well (requires: import mmcv)
# video = mmcv.VideoReader('video.mp4')
# for frame in video:
#     result = inference_model(model, frame)
#     show_result_pyplot(model, frame, result, wait_time=1)
Run it from the command line:
python check_mmseg.py
or select the matching Python environment in your editor and click the run button.
Error:
import ftfy
ModuleNotFoundError: No module named 'ftfy'
Fix:
pip install ftfy
Error:
File "/home/ubuntu/.local/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 2, in <module>
from .active_rotated_filter import active_rotated_filter
File "/home/ubuntu/.local/lib/python3.8/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
ext_module = ext_loader.load_ext(
File "/home/ubuntu/.local/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
ext = importlib.import_module('mmcv.' + name)
File "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libcudart.so.10.1: cannot open shared object file: No such file or directory
Fix: first check the installed mmcv version and the CUDA version PyTorch was built with:
> pip show mmcv
Name: mmcv
Version: 2.0.0
Summary: OpenMMLab Computer Vision Foundation
Home-page: https://github.com/open-mmlab/mmcv
Author: MMCV Contributors
Author-email: openmmlab@gmail.com
License: UNKNOWN
Location: /home/ubuntu/.local/lib/python3.8/site-packages
Requires: addict, mmengine, numpy, packaging, Pillow, pyyaml, yapf
Required-by:
> python -c 'import torch;print(torch.__version__);print(torch.version.cuda)'
1.13.0+cu117
11.7
The missing libcudart.so.10.1 suggests that the installed mmcv binary was built against CUDA 10.1, while PyTorch here uses CUDA 11.7. On the page <https://mmcv.readthedocs.io/en/latest/get_started/installation.html>, select the matching environment and run the command it generates:
> pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
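The index URLs used in this post all follow the same pattern (CUDA tag, then the torch major.minor version). As a rough sketch, the URL can be derived from the two values printed above. The function name is my own, and the pattern is only inferred from the URLs in this post; not every CUDA/torch combination necessarily has prebuilt wheels, so the installation page remains the authoritative source.

```python
def mmcv_index_url(torch_version: str, cuda_version: str) -> str:
    """Build a wheel index URL from torch.__version__ and torch.version.cuda."""
    # "1.13.0+cu117" -> ("1", "13"): drop the local build tag, keep major.minor
    major, minor = torch_version.split("+")[0].split(".")[:2]
    # "11.7" -> "cu117"
    cuda_tag = "cu" + cuda_version.replace(".", "")
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cuda_tag}/torch{major}.{minor}/index.html"
    )


if __name__ == "__main__":
    print(mmcv_index_url("1.13.0+cu117", "11.7"))
```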
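1.3 Preparing the data
1.3.1 Settling the data format
Look through mmsegmentation/docs/en/user_guides/2_dataset_prepare.md for a dataset you might use yourself, or one from the same research area, and study its preprocessing script.
For example, I followed mmsegmentation/tools/dataset_converters/stare.py and arranged my dataset like this: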
.
├── mmseg_vessel
│ ├── annotations
│ │ ├── training
│ │ └── validation
│ └── images
│ ├── training
│ └── validation
Points to note:
- In mmsegmentation/tools/dataset_converters/stare.py:
  - The masks of that dataset are 3-channel RGB images that contain only black background [0,0,0] and white foreground [255,255,255].
  - The masks therefore have to be converted to single-channel images, e.g. 0 for background and 1 for foreground.
- mmsegmentation/mmseg/datasets/basesegdataset.py documents the expected layout in detail, so arrange the data accordingly:
├── data
│   ├── my_dataset
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{img_suffix}
│   │   │   │   ├── yyy{img_suffix}
│   │   │   │   ├── zzz{img_suffix}
│   │   │   ├── val
│   │   ├── ann_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{seg_map_suffix}
│   │   │   │   ├── yyy{seg_map_suffix}
│   │   │   │   ├── zzz{seg_map_suffix}
│   │   │   ├── val
- Split the data into training and validation sets yourself.
- An image and its label must share the same base name; the suffixes may differ. For example, with img_suffix='.png', seg_map_suffix='.ah.png', an image is named like im0001.png and its mask like im0001.ah.png.
References:
- mmsegmentation/configs/unet/README.md
- Add 4 retinal vessel segmentation benchmark (#315)
- OpenMMLab AI Bootcamp, 2nd session: 5-2. MMSegmentation code lesson
1.3.2 Defining your own dataset class: persistent (as a file)
There are two ways to define a custom dataset class. The first is to add it as a file in the repo; the second is to write it directly in the script you run, as described in 1.3.3.
See section "1.1 Persistent (defined in a file)" of "OpenMMLab AI Bootcamp, 2nd session: 5-2. MMSegmentation code lesson",
or the official docs:
- English: https://mmsegmentation.readthedocs.io/en/main/advanced_guides/add_datasets.html
- Chinese: https://mmsegmentation.readthedocs.io/zh-cn/latest/advanced_guides/add_datasets.html
- https://mmsegmentation.readthedocs.io/zh-cn/latest/advanced_guides/datasets.html
1.3.3 Defining your own dataset class: at runtime ✅
I prefer this approach: nothing inside the repo folders has to be edited, which is convenient.
If you work with the same dataset over a long period, however, defining it as a file is the better choice.
See section "1.2 Effective at runtime (define a class directly in the script)" of "OpenMMLab AI Bootcamp, 2nd session: 5-2. MMSegmentation code lesson".
An example:
from mmengine.runner import Runner
from mmseg.datasets.basesegdataset import BaseSegDataset
from mmseg.registry import DATASETS
from mmseg.utils import register_all_modules

# Custom dataset class; the decorator registers it in the DATASETS registry
@DATASETS.register_module()
class WatermelonDataset(BaseSegDataset):
    METAINFO = dict(
        classes=('red', 'green', 'white', 'seed-black', 'seed-white', 'tabBlue'),
        palette=[[214, 39, 40], [44, 160, 44], [255, 255, 255], [0, 0, 0],
                 [255, 255, 255], [31, 119, 180]])

    def __init__(self,
                 img_suffix='.jpg',
                 seg_map_suffix='.png',
                 reduce_zero_label=False,
                 **kwargs) -> None:
        super().__init__(
            img_suffix=img_suffix,
            seg_map_suffix=seg_map_suffix,
            reduce_zero_label=reduce_zero_label,
            **kwargs)

# register all modules in mmseg into the registries
# do not init the default scope here because it will be init in the runner
register_all_modules(init_default_scope=False)

def train():
    cfg = ...  # placeholder: load your config here
    # With WatermelonDataset registered above, the config can refer to it
    # by name without raising a "not in the registry" error.
See also mmsegmentation/demo/MMSegmentation_Tutorial.ipynb: search the page for @DATASETS.register_module() to find an example of a custom dataset.
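❓1.4 Setting the reduce_zero_label parameter
According to "Functionality of reduce_zero_label", or the Chinese version in mmsegmentation/docs/zh_cn/notes/faq.md:
- reduce_zero_label is a boolean parameter of the dataset class and defaults to False. Setting it to True ignores the class with label 0 in the dataset.
  - Concretely, pixels with label 0 are relabeled to 255, every other label is decreased by 1, and the decode head uses 255 as the ignore index, so those pixels do not contribute to the loss.
- The logic is implemented as follows:
if self.reduce_zero_label:
    # avoid using underflow conversion
    gt_semantic_seg[gt_semantic_seg == 0] = 255
    gt_semantic_seg = gt_semantic_seg - 1
    gt_semantic_seg[gt_semantic_seg == 254] = 255
- Whether your custom dataset class needs reduce_zero_label depends on the labels. Take the Potsdam dataset, which has 6 classes: 0 impervious surfaces (roads and other ground), 1 building, 2 low vegetation, 3 tree, 4 car, 5 clutter. It ships two kinds of RGB labels: one with black pixels along the image borders and one without.
  - For labels with black borders, the dataset_converters.py script maps the black border to label 0 and the remaining classes to 1 impervious surfaces, 2 building, 3 low vegetation, 4 tree, 5 car, 6 clutter. In that case set reduce_zero_label=True in the custom dataset class potsdam.py.
  - For labels without black borders, the mask contains only 0-5, so use reduce_zero_label=False.
  - Decide based on your own data.
- In short: if class 0 is a background class you do not need, set reduce_zero_label=True; if class 0 is background but you still need it as a class, keep reduce_zero_label=False.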
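To make the relabeling concrete, here is a small sketch that applies the same three steps to a toy mask with NumPy. The function name and the example mask are my own; only the three relabeling lines follow the logic quoted above.

```python
import numpy as np


def reduce_zero_label(mask: np.ndarray) -> np.ndarray:
    """Apply the reduce_zero_label relabeling to a uint8 label mask."""
    out = mask.copy()
    out[out == 0] = 255    # label 0 becomes the ignore index
    out = out - 1          # shift all labels down by one (255 becomes 254)
    out[out == 254] = 255  # restore the ignore index
    return out


# Toy mask: 0 is the unwanted border/background, 1..6 are real classes
mask = np.array([[0, 1, 2], [3, 0, 6]], dtype=np.uint8)
reduced = reduce_zero_label(mask)
print(reduced)
```

Setting label 0 to 255 before subtracting is what avoids the uint8 underflow (0 - 1) mentioned in the code comment.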
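1.5 Converting 3-channel RGB masks to label maps
To keep this post short, this is covered in a separate post: "OpenMMLab AI Bootcamp, 2nd session: related topics 3. Converting RGB semantic segmentation annotations to gray-format masks".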
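2. Config files
There are again two ways to modify a config:
- edit the xxconfig.py file directly (persistent)
- modify it in code (runtime)
2.1 Editing the config.py file (persistent) ✅
I prefer this approach: the file is right in front of you in an editor, whereas changing fields one by one in code is slow.
① Choose the base config you want to build on
② Save the full config file
- MMEngine configs use inheritance, so the config file you find usually just references other files. To see the fully resolved config:
# Note: run this script from the mmsegmentation root directory (the paths are relative to it)
from mmengine import Config
# the config you want to build on
raw_config_path = "./configs/pspnet/pspnet_r18-d8_4xb4-80k_potsdam-512x512.py"
cfg = Config.fromfile(raw_config_path)
print(cfg.pretty_text)
# where to save the fully resolved config
config_file_path = 'project/project_name/config/pspnet-watermelon_20230618.py'
cfg.dump(config_file_path)
③ Edit the saved config file directly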
2.2 Modifying config fields in code (runtime)
This is also very common; the mmsegmentation tutorials and several online courses mostly use it, for example:
from mmengine import Config
cfg = Config.fromfile('./configs/deeplabv3plus/deeplabv3plus_r101-d8_4xb4-160k_ade20k-512x512.py')
dataset_cfg = Config.fromfile('./configs/_base_/datasets/ZihaoDataset_pipeline.py')
cfg.merge_from_dict(dataset_cfg)
cfg.crop_size = (512, 512)
cfg.model.data_preprocessor.size = cfg.crop_size
# For single-GPU training, replace SyncBN with BN
cfg.norm_cfg = dict(type='BN', requires_grad=True)
cfg.model.backbone.norm_cfg = cfg.norm_cfg
cfg.model.decode_head.norm_cfg = cfg.norm_cfg
cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg
# Set the decode/auxiliary heads to the number of classes (NUM_CLASS must be defined beforehand)
cfg.model.decode_head.num_classes = NUM_CLASS
cfg.model.auxiliary_head.num_classes = NUM_CLASS
# Training batch size
cfg.train_dataloader.batch_size = 4
# Output directory
cfg.work_dir = './work_dirs/ZihaoDataset-DeepLabV3plus'
# Checkpointing and logging
cfg.train_cfg.max_iters = 20000 # number of training iterations
cfg.train_cfg.val_interval = 500 # evaluation interval
cfg.default_hooks.logger.interval = 100 # logging interval
cfg.default_hooks.checkpoint.interval = 2500 # checkpoint saving interval
cfg.default_hooks.checkpoint.max_keep_ckpts = 1 # maximum number of checkpoints to keep
cfg.default_hooks.checkpoint.save_best = 'mIoU' # keep the checkpoint with the best metric
# Random seed
cfg['randomness'] = dict(seed=0)
# Print the full config
print(cfg.pretty_text)
# Save it
cfg.dump('Zihao-Configs/ZihaoDataset_DeepLabV3plus_20230818.py')
References:
- MMSegmentation_Tutorials/20230816/【F2】语义分割算法-DeepLabV3+.ipynb
- mmsegmentation/demo/MMSegmentation_Tutorial.ipynb
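2.3 Common config options
2.3.1 Early stopping
Support for this hook arrived with OpenMMLab 2.0. Usage looks like the following (the first snippet monitors an mmdetection metric, the second an mmsegmentation metric):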
default_hooks = dict(
early_stopping=dict(
type="EarlyStoppingHook",
monitor="coco/bbox_mAP",
patience=10,
min_delta=0.005),
checkpoint=dict(
type="CheckpointHook",
interval=interval,
save_begin=100,
max_keep_ckpts=max_keep_ckpts,
save_best=save_best)
)
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=50, log_metric_by_epoch=False),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=4000),
sampler_seed=dict(type='DistSamplerSeedHook'),
visualization=dict(type='SegVisualizationHook'),
early_stop=dict(type='EarlyStoppingHook', monitor='mDice', patience=20))  # early stopping for mmsegmentation
For EarlyStoppingHook, see the class definition and its documentation:
- https://mmengine.readthedocs.io/zh-cn/stable/api/generated/mmengine.hooks.EarlyStoppingHook.html
- https://mmengine.readthedocs.io/zh-cn/stable/_modules/mmengine/hooks/early_stopping_hook.html
The value of the monitor field (the metric to watch) depends on the library:
- mmdetection: see mmdetection/mmdet/evaluation/metrics/coco_metric.py; line 68 has default_prefix: Optional[str] = 'coco'
- mmpretrain: see mmpretrain.evaluation
- mmsegmentation: see mmsegmentation/mmseg/evaluation/metrics/iou_metric.py
References:
- OpenMMLab 2.0: new architecture, algorithm, and ecology, 2022.9.22
- ✅Early stopping with mmdetection #2062
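To get a feeling for what patience and min_delta mean, here is a minimal, library-independent sketch of the usual early-stopping rule for a "higher is better" metric such as mDice. This is not the mmengine implementation; the class EarlyStopper and the score history are made up for illustration.

```python
class EarlyStopper:
    """Stop when the monitored score has not improved by more than min_delta for `patience` checks."""

    def __init__(self, patience: int, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_rounds = 0

    def step(self, score: float) -> bool:
        """Record one validation score; return True when training should stop."""
        if self.best is None or score > self.best + self.min_delta:
            self.best = score
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


stopper = EarlyStopper(patience=2, min_delta=0.005)
history = [0.70, 0.72, 0.721, 0.722, 0.80]  # hypothetical validation scores
stopped_at = None
for i, score in enumerate(history):
    if stopper.step(score):
        stopped_at = i
        break
print(stopped_at)
```

With these numbers the two small gains after 0.72 do not exceed min_delta, so training stops before the later jump to 0.80 is ever seen; that is the trade-off to keep in mind when picking patience.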
2.3.2 TensorBoard and wandb
The snippets commonly found online may be outdated, for example:
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(type='TensorboardLoggerHook')  # enables TensorBoard logging (usually commented out by default)
    ])
Source: "MMSeg TensorBoard switch" and "A pitfall guide for newcomers to the MMSegmentation toolbox"
The current form should be:
visualizer = dict(
    type='SegLocalVisualizer',
    # vis_backends=[dict(type='LocalVisBackend')],
    vis_backends=[dict(type='LocalVisBackend'),
                  dict(type='TensorboardVisBackend')],
    name='visualizer')
Note that the following packages need to be installed:
pip install tensorflow
pip install tensorboardX
pip install future tensorboard
References:
- Official mmsegmentation docs, Visualization (essentially inherited from mmengine)
- Official mmengine docs, visualize_training_log (includes the wandb setup)
2.3.3 save best
default_hooks = dict(checkpoint=dict(type='CheckpointHook', save_best='auto'))
Reference: mmengine-hook-checkpointhook
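2.3.4 Threshold for turning a probability map into a mask (final output layer: regular CE with softmax, or binary cross-entropy with sigmoid)
According to "How to handle binary segmentation task", this parameter is configured in decode_head.
mmsegmentation/mmseg/models/decode_heads/decode_head.py contains: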
threshold (float): Threshold for binary segmentation in the case of
    `num_classes==1`. Default: None.

if out_channels is None:
    if num_classes == 2:
        warnings.warn('For binary segmentation, we suggest using'
                      '`out_channels = 1` to define the output'
                      'channels of segmentor, and use `threshold`'
                      'to convert `seg_logits` into a prediction'
                      'applying a threshold')
    out_channels = num_classes

if out_channels != num_classes and out_channels != 1:
    raise ValueError(
        'out_channels should be equal to num_classes,'
        'except binary segmentation set out_channels == 1 and'
        f'num_classes == 2, but got out_channels={out_channels}'
        f'and num_classes={num_classes}')

if out_channels == 1 and threshold is None:
    threshold = 0.3
    warnings.warn('threshold is not defined for binary, and defaults'
                  'to 0.3')

class ASPPHead(BaseDecodeHead)  # the ASPPHead used in my current network inherits from the BaseDecodeHead above
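In other words, the threshold parameter is only used for binary segmentation with sigmoid and a single output channel.
For ASPPHead, however, the config does not set this threshold parameter.
Since you can take the raw network output, apply sigmoid yourself, and then pick a threshold, this stays under your own control, so there is no need to keep worrying that the parameter is not exposed.
With softmax, the class with the highest output probability is selected and threshold is not used.
In addition, mmsegmentation/mmseg/models/losses/cross_entropy_loss.py contains:
if self.use_sigmoid:
    self.cls_criterion = binary_cross_entropy
elif self.use_mask:
    self.cls_criterion = mask_cross_entropy
else:
    self.cls_criterion = cross_entropy
So setting use_sigmoid=True in decode_head means BCE is used (for multiple classes, each class is treated as its own binary problem).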
decode_head=dict(
    type='ASPPHead',
    in_channels=64,
    in_index=4,
    channels=16,
    dilations=(1, 12, 24, 36),
    dropout_ratio=0.1,
    num_classes=5,  # the decoder now predicts 5 classes
    norm_cfg=dict(type='SyncBN', requires_grad=True),
    align_corners=False,
    loss_decode=[
        dict(
            type='CrossEntropyLoss', loss_name='loss_ce', use_sigmoid=True, loss_weight=1.0),
        dict(type='DiceLoss', loss_name='loss_dice', loss_weight=3.0)
    ]),
The single-channel masks of the dataset do not need to be converted to multi-channel semantic masks.
References:
- mmsegmentation/mmseg/models/decode_heads/aspp_head.py
- Training Tricks(Different Learning Rate for Backbone and Heads) cannot work with cross entropy loss(set use_sigmoid=True) #314
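The two ways of turning logits into a label can be compared on a single pixel. The sketch below uses plain Python; the helper names are my own, the 0.3 default mirrors the default threshold quoted above, and the strict ">" comparison is an assumption about how the threshold is applied.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_binary(logit: float, threshold: float = 0.3) -> int:
    """out_channels=1: foreground (1) if sigmoid(logit) exceeds the threshold."""
    return int(sigmoid(logit) > threshold)


def predict_multiclass(logits) -> int:
    """out_channels=num_classes: pick the class with the highest softmax probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


print(predict_binary(0.0), predict_binary(0.0, threshold=0.5))
print(predict_multiclass([0.1, 2.0, -1.0]))
```

The same logit of 0.0 (probability 0.5) counts as foreground with the 0.3 default but not with a threshold of 0.5, which is why it is worth checking the threshold when predictions look too "thick" or too "thin".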
❓2.4 Data augmentation
mmsegmentation/docs/en/advanced_guides/transforms.md explains this fairly thoroughly,
but much of it is inherited from https://mmcv.readthedocs.io/en/latest/api/transforms.html
2.4.1 Adjusting brightness / PhotoMetricDistortion
On the page mmcv-Docs > Module code > mmcv.image.photometric, search for adjust_brightness.
In mmsegmentation/docs/en/advanced_guides/transforms.md, search for PhotoMetricDistortion;
see also mmsegmentation-Docs > Basic Concepts > Data Transforms.
2.4.2 gamma and CLAHE
On the page mmcv-Docs > Module code > mmcv.image.photometric, search for def clahe(img, clip_limit=40.0, tile_grid_size=(8, 8)):
References:
- Brightness transforms for deep-learning data augmentation
- Data augmentation in semantic segmentation
- Albumentations offers many augmentations, including some for image segmentation that PyTorch lacks
❓ 2.4.3 RandomRotate
References:
- https://github.com/search?q=repo%3Aopen-mmlab%2Fmmsegmentation+RandomRotate&type=code
- https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/datasets/transforms/transforms.py
- [Bug Fix] Add missing transforms in __init__.py #260
- mmsegmentation data pipeline operations (part 7)
2.5 Evaluation metrics
Change the config from:
test_evaluator = dict(type='IoUMetric', iou_metrics=['mDice'])
to the following (just add the metrics you want to the list):
test_evaluator = dict(type='IoUMetric', iou_metrics=['mDice','mFscore'])
For comparison, the corresponding code in the 0.x (master) branch looks like this:
# tools/test.py#L41-L46
parser.add_argument(
    '--eval',
    type=str,
    nargs='+',
    help='evaluation metrics, which depends on the dataset, e.g., "mIoU"'
    ' for generic datasets, and "cityscapes" for Cityscapes')
# Just add --eval mIoU or --eval mDice or other metric in ./tools/test.py
# https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/core/evaluation/metrics.py#L360
allowed_metrics = ['mIoU', 'mDice', 'mFscore']
....
mIoU
ret_metrics['IoU'] = iou
ret_metrics['Acc'] = acc
...
mDice
ret_metrics['Dice'] = dice
ret_metrics['Acc'] = acc
mFscore
ret_metrics['Fscore'] = f_value
ret_metrics['Precision'] = precision
ret_metrics['Recall'] = recall
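How these per-class values relate to each other is easiest to see from the pixel counts. The sketch below computes them for one class from three numbers: the intersection, the predicted area, and the labeled area. The function and the example counts are my own; the formulas are the standard definitions, with Acc taken as intersection over labeled area, which is how I read the metric code.

```python
def class_metrics(intersect: float, pred_area: float, label_area: float, beta: float = 1.0) -> dict:
    """Per-class segmentation metrics from pixel counts."""
    union = pred_area + label_area - intersect
    precision = intersect / pred_area
    recall = intersect / label_area
    fscore = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {
        "IoU": intersect / union,
        "Dice": 2 * intersect / (pred_area + label_area),
        "Acc": recall,  # per-class accuracy equals recall under this definition
        "Precision": precision,
        "Recall": recall,
        "Fscore": fscore,
    }


# Hypothetical counts: 30 pixels correct, 40 predicted, 50 labeled
m = class_metrics(intersect=30, pred_area=40, label_area=50)
print(m)
```

With beta=1 the Fscore equals Dice, so requesting both mDice and mFscore mainly adds Precision and Recall to the report.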
References:
- How can I compute the average recall for a dataset? #1678
- https://github.com/open-mmlab/mmsegmentation/blob/master/tools/test.py#L41-L46
- https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/core/evaluation/metrics.py: search for metrics=[ ; the supported metrics are metrics=['mIoU'], metrics=['mDice'], metrics=['mFscore']
- https://github.com/open-mmlab/mmsegmentation/blob/main/docs/en/advanced_guides/add_metrics.md
2.6 Difference between load_from and resume
- resume-from (renamed to resume in the 0.x to 1.x migration): loads the model weights and the optimizer state, and training continues from the epoch stored in the given checkpoint. It is typically used to recover a training run that was interrupted.
  - resume = False # Whether to resume from existed model.
- load-from # Load checkpoint from file. Loads only the model weights, and training starts again from epoch 0. Typically used for fine-tuning.
- pretrained # The pretrained backbone to be loaded (this parameter lives inside model; it only affects model loading and does not involve the optimizer)
resume and load-from are used together.
The exact rules are given in the Runtime settings section of the docs.
In my experience:
- you need to set resume=True and also point load_from at the last checkpoint;
- otherwise, with pretrained commented out, resume=True and load_from=None, the parameters are trained from scratch.
The 0.x to 1.x migration guide of mmsegmentation shows how these parameter names changed.
Usually only pretrained is needed.
References:
- What's the difference between 'load_from' and 'resume_from' in configs? #200
- https://mmdetection.readthedocs.io/en/v2.2.0/getting_started.html#train-with-multiple-gpus
- https://mmsegmentation.readthedocs.io/en/latest/user_guides/1_config.html
- https://mmsegmentation.readthedocs.io/en/latest/migration/interface.html#train-launch
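The combinations of the two fields can be summarized as a small decision table. The function below is only a sketch of how I understand the rules in the MMEngine documentation; it does not call any library code, and the returned strings are my own wording. The case resume=True with load_from=None falls back to training from scratch when the work_dir contains no checkpoint, which would match the experience described above.

```python
from typing import Optional


def start_mode(load_from: Optional[str], resume: bool) -> str:
    """Describe how training starts for a given (load_from, resume) pair."""
    if resume and load_from:
        return "resume weights, optimizer and iteration from load_from"
    if resume:
        return "auto-resume from the latest checkpoint in work_dir, else start from scratch"
    if load_from:
        return "load model weights only, training starts at iteration 0"
    return "train from scratch (apart from any pretrained backbone)"


for args in [("last.pth", True), (None, True), ("last.pth", False), (None, False)]:
    print(args, "->", start_mode(*args))
```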
2.7 TTA config and test.py failing to save images
References:
- https://mmsegmentation.readthedocs.io/zh-cn/main/migration/interface.html
- Inconsistent evaluation results #2594
- History for mmsegmentation/tools/test.py on main
2.8 Showing the IMAGES tab in TensorBoard
2.8.1 Showing inference results from the val stage
Without extra configuration, TensorBoard only shows the data augmentation code as text.
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=50, log_metric_by_epoch=False),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=2000),
sampler_seed=dict(type='DistSamplerSeedHook'),
visualization=dict(type='SegVisualizationHook', draw=True, interval=1))
# Above is SegVisualizationHook (a hook); below is SegLocalVisualizer.
# draw=True above: visualize ground truth and prediction of segmentation during model testing and evaluation.
# vis_backends and visualizer below are mainly for visualizing scalars in TensorBoard and do not cover images.
vis_backends = [dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend')]
visualizer = dict(
type='SegLocalVisualizer', vis_backends=vis_backends, name='visualizer')
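Once draw=True is set in visualization=dict(type='SegVisualizationHook', draw=True, interval=1) above, the IMAGES tab appears in TensorBoard after training has finished (having gone through the val stage). It shows:
- results for the validation and test sets, named with a test_ or val_ prefix plus the file name (if your test and validation sets are identical, the results are duplicated); this is the same as running the python tools/test.py script directly
- in addition, the test results are saved under work_dir/vis_data/vis_image/, e.g. test_31_training.png_0.png (val results are not saved, because val runs many times during training)
References:
- mmsegmentation/docs/zh_cn/user_guides/visualization.md, section on visualizing data samples during model testing or validation
- https://mmsegmentation.readthedocs.io/en/main/user_guides/visualization.html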
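2.8.2 ❓ Showing the batches fed to the network
To keep this post short, this is covered in a separate blog post.
❓2.9 Testing on the whole image or on patches
mmsegmentation-Docs > Train & Test > Tutorial 1: Learn about Configs contains: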
test_cfg=dict(mode='whole')) # The test mode, options are 'whole' and 'slide'. 'whole': whole image fully-convolutional test. 'slide': sliding crop window on the image.
test_cfg=dict(mode='slide', crop_size=(128, 128), stride=(85, 85)),
test_cfg=dict(mode='whole')
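For mode='slide' it helps to see which windows a given crop_size and stride produce. The sketch below enumerates the windows the way I understand sliding-window inference to work: windows start every stride pixels, and the last row/column is shifted back so it stays inside the image. The function name is my own, and this is an illustration rather than the library code.

```python
def slide_windows(h_img: int, w_img: int, crop_size, stride):
    """Return (y1, x1, y2, x2) boxes covering an h_img x w_img image."""
    h_crop, w_crop = crop_size
    h_stride, w_stride = stride
    h_grids = max(h_img - h_crop + h_stride - 1, 0) // h_stride + 1
    w_grids = max(w_img - w_crop + w_stride - 1, 0) // w_stride + 1
    boxes = []
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y2 = min(h_idx * h_stride + h_crop, h_img)
            x2 = min(w_idx * w_stride + w_crop, w_img)
            # shift the window back so it keeps the full crop size when possible
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            boxes.append((y1, x1, y2, x2))
    return boxes


boxes = slide_windows(256, 256, crop_size=(128, 128), stride=(85, 85))
print(len(boxes), boxes[0], boxes[-1])
```

Overlapping windows are predicted separately and their logits averaged where they overlap, so a smaller stride costs more compute but smooths the seams between patches.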
❓2.10 The times field of RepeatDataset
References:
- Dataset Wrapper
- https://mmengine.readthedocs.io/en/v0.6.0/api/generated/mmengine.dataset.RepeatDataset.html#mmengine.dataset.RepeatDataset
- https://github.com/open-mmlab/mmengine/blob/main/mmengine/dataset/dataset_wrapper.py#L195
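As far as I understand the wrapper, times simply makes a small dataset look times as long, so that one "epoch" contains many passes over the same samples. The toy class below imitates that idea; it is not the mmengine class, and the name RepeatedView is made up. The progress bar in section 3.6 shows 1840000 samples for 46 training images, which would be consistent with times=40000 (my inference, the value is not stated in the config shown here).

```python
class RepeatedView:
    """A read-only view that repeats a sequence `times` times."""

    def __init__(self, items, times: int):
        self.items = items
        self.times = times

    def __len__(self) -> int:
        return self.times * len(self.items)

    def __getitem__(self, idx: int):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # every index maps back onto the original samples
        return self.items[idx % len(self.items)]


view = RepeatedView(list(range(46)), times=40000)
print(len(view), view[47])
```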
❓2.11 Optimizer
- https://mmsegmentation.readthedocs.io/en/main/advanced_guides/engine.html#configuring-hyperparameters-for-different-layers-of-the-model-network
- https://mmsegmentation.readthedocs.io/en/main/migration/interface.html#optimizer-and-schedule-settings
2.12 Data Preprocessor
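2.13 Training on grayscale input
An unorthodox trick:
- use RGB2Gray in the pipeline and set its number of output channels to 3; according to the code, the gray channel is then copied/repeated 3 times
- the network still receives 3-channel input
References:
- How to configure segformer to handle single channel image(gray image) #3370
- class RGB2Gray(object)
- [Feature] Add Rgb2Gray transform #227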
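What "convert to gray and repeat to 3 channels" does to an image can be sketched with NumPy. The function below is my own illustration, not the transform's source; the luminance weights (0.299, 0.587, 0.114) are the common defaults and are an assumption here.

```python
import numpy as np


def rgb_to_gray(img: np.ndarray, out_channels: int = 3,
                weights=(0.299, 0.587, 0.114)) -> np.ndarray:
    """Weighted sum over the channel axis, then repeat the gray channel."""
    w = np.array(weights, dtype=np.float32).reshape(1, 1, -1)
    gray = (img.astype(np.float32) * w).sum(axis=2, keepdims=True)  # (H, W, 1)
    return np.repeat(gray, out_channels, axis=2)                    # (H, W, out_channels)


img = np.full((2, 2, 3), 100, dtype=np.uint8)  # toy image, all channels equal to 100
out = rgb_to_gray(img)
print(out.shape)
```

The network input keeps its 3-channel shape, so pretrained weights and the data preprocessor settings can stay unchanged; the three channels just carry identical information.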
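2.14 Adding val_loss (validation metrics) to TensorBoard
According to the following pages:
- Visualizing training and validation metrics with TensorBoard in mmdetection
- mmseg 0.x to 1.x interface migration: https://mmsegmentation.readthedocs.io/zh-cn/latest/migration/interface.html#id8
- Validation Loss During Training #2002
the old setting
workflow = [('train', 1), ('val', 1)]
no longer applies. Changes to workflow: the workflow feature has been removed.
According to "Logging validation loss without library code hacks #1396":
the gist is that val_step and train_step clash, so the val loss and related information cannot be logged, hence the modification below. Note that this modification also applies to the 0.x versions only.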
def val_step(self, data_batch, optimizer=None, **kwargs):
    """The iteration step during validation.
    This method shares the same signature as :func:`train_step`, but used
    during val epochs. Note that the evaluation after training epochs is
    not implemented with this method, but an evaluation hook.
    """
    losses = self(**data_batch)
    loss, log_vars = self._parse_losses(losses)
    log_vars_ = dict()
    for loss_name, loss_value in log_vars.items():
        k = loss_name + '_val'
        log_vars_[k] = loss_value
    outputs = dict(
        loss=loss,
        log_vars=log_vars_,
        num_samples=len(data_batch['img_metas']))
    return outputs
According to the data flow overview in the docs:
with the default ValLoop, the loss is not logged; the outputs are passed straight to the Evaluator, which computes the evaluation metrics.
This explains what I ran into earlier: dice_loss ignored the background class, yet the val results reported by the Evaluator still included the Dice metric.
ValLoop and related loops are implemented in mmengine/mmengine/runner/loops.py.
mmsegmentation/docs/zh_cn/advanced_guides/models.md says:
- We usually call the neural network in a deep learning task the model; it is the core of the algorithm. MMEngine abstracts a unified BaseModel to standardize training, testing and other processes. All models implemented in MMSegmentation inherit from BaseModel, and MMSegmentation implements the forward pass and adds some functionality for semantic segmentation algorithms.
- https://github.com/open-mmlab/mmengine/blob/main/mmengine/model/base_model/base_model.py#L16
  - base_model defines val_step and train_step, as well as test_step
- https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/segmentors/base.py
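2.15 softmax vs. sigmoid in CE for binary and multi-class segmentation
Based on "How to handle binary segmentation task", or the Chinese note "Personal notes on binary segmentation in mmsegmentation",
and the code in mmsegmentation/mmseg/models/decode_heads/decode_head.py:
For binary segmentation, if you follow the second option recommended there and set out_channels=1 and use_sigmoid=True, then additionally setting ignore_index=0 raises the following error: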
File "/home/ubuntu/huangs/VesselSeg/mmsegmentation/mmseg/models/losses/dice_loss.py", line 178, in forward
loss = self.loss_weight * dice_loss(
File "/home/ubuntu/huangs/VesselSeg/mmsegmentation/mmseg/models/losses/dice_loss.py", line 73, in dice_loss
assert pred.shape[1] != 0 # if the ignored index is the only class
So if you do binary segmentation and need ignore_index=0, you can only use the first option, i.e. out_channels=2 and use_sigmoid=False.
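3. Running
If you work from the mmsegmentation GitHub repo rather than only the pip-installed third-party package, you have to deal with:
- the run path issue mentioned in "1.1 Installing mmseg and cloning the repo"; section 3.1 covers it in detail
- how to integrate your own project into the repo's file structure; section 3.2 covers it in detail
  - where to put your own datasets, configs, and the files generated during training
3.1 Current working directory and module search path
To use the scripts of the mmsegmentation repo correctly from outside the mmsegmentation directory, add the repo to the module search path.
- Add it at runtime: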
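import os
import sys
# '~' is not expanded by Python automatically, so expand it explicitly
sys.path.append(os.path.expanduser('~/XXX/mmsegmentation'))
# With the repo on the search path, Python looks for mmseg under mmsegmentation, so
#     from mmseg.utils import register_all_modules
# no longer fails because mmseg cannot be found.
# Print sys.path to check, e.g.:
#     ['~/XXX/mmsegmentation/projects/vesselSeg/code', ...., '~/XXX/mmsegmentation', ...]
# By default the first entry is the folder that contains the running script.
# Make sure the ~/XXX/mmsegmentation entry you added is in the list.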
- Alternatively, following mmsegmentation/projects/example_project/README.md, add it beforehand in bash:
cd mmsegmentation
export PYTHONPATH=`pwd`:$PYTHONPATH
Reference: section "3.6 Error: ImportError: cannot import name 'InverseFormLoss' from 'mmseg.models.losses'" of "OpenMMLab Super Vision Camp: supporting InverseForm Loss (the third MMSegmentation PR)"
3.2 Project structure
This is the layout I personally use:
.
├── CITATION.cff
├── configs
├── dataset-index.yml
├── data // ✅ newly created
├── demo
├── docker
├── docs
├── LICENSE
├── MANIFEST.in
├── mmseg
├── model-index.yml
├── projects // ✅ emptied and recreated
├── README.md
├── README_zh-CN.md
├── requirements
├── requirements.txt
├── resources
├── setup.cfg
├── setup.py
├── tests
└── tools
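After cloning the mmsegmentation project:
- Create a data folder and place the data as described in "1.3.1 Settling the data format". The benefit is that most existing config files under mmsegmentation/configs are written on the assumption that the datasets live in the data folder.
- Empty the projects folder and put your own projects there. A possible layout (for reference only; this is what I use):
.
├── AV_Seg // project name
├── pretrained // where the pretrained models are stored
│   └── deeplabv3_unet_s5-d16_ce-1.0-dice-3.0_128x128_40k_stare_20211210_201825-21db614c.pth
└── vesselSeg // another project
    ├── code // code needed by the project
    │   ├── train.py
    │   └── XXX
    ├── config // the full config files I use
    │   └── unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py
    ├── experiments
    └── README.md // description of this project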
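3.3 Running on specific GPUs
# The form I used to see:
export CUDA_VISIBLE_DEVICES=0,1 # use GPU 0 and GPU 1
export CUDA_VISIBLE_DEVICES="" # run on CPU
# These stay in effect for the whole shell session.
# Nowadays the inline form seems more common:
CUDA_VISIBLE_DEVICES=3 python ../tools/train.py
# Multiple GPUs are written the same way, e.g. CUDA_VISIBLE_DEVICES=1,3
# This form only applies for the duration of that training command.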
3.4 Training and test commands
mmsegmentation provides two simple scripts:
- https://github.com/open-mmlab/mmsegmentation/blob/main/tools/test.py
- https://github.com/open-mmlab/mmsegmentation/blob/main/tools/train.py
You can adapt the relevant parts of these scripts for your own use.
The most basic invocations:
# Training command
# The first argument is the config file; it is a required positional argument and takes no --config flag
# --work-dir: where the files generated during training are stored
CUDA_VISIBLE_DEVICES=0 python ./projects/vesselSeg/code/train.py \
    ./projects/vesselSeg/config/unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py \
    --work-dir ./projects/vesselSeg/experiments/unet_deeplabv3_604
# Test command
CUDA_VISIBLE_DEVICES=0 python tools/test.py \
    ./projects/vesselSeg/config/unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py \
    /content/drive/MyDrive/OpenMMLab/workdir/PSPNet/iter_1000.pth \
    --work-dir ./projects/vesselSeg/experiments/unet_deeplabv3_604
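3.5 Inferring probability maps
Related operations were covered before:
- openMMLabCampusLearn/Exercise_2/Exercise_2.ipynb also analyzes the data format returned by the API calls (scroll to the bottom of the page)
- section 3.4 (inference) of openMMLabCampusLearn/Exercise_4/Exercise_4.ipynb also takes apart the data structure returned by inference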
from mmseg.apis import inference_model, init_model, show_result_pyplot
config_file = './projects/vesselSeg/config/unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py'
checkpoint_file = './projects/vesselSeg/experiments/unet_deeplabv3_604/iter_40000.pth'
model = init_model(config_file, checkpoint_file, device='cuda:1')
img = '/home/ubuntu/huangs/VesselSeg/Datasets/0323_img/36_青光眼_OS.jpg'
result = inference_model(model, img)
type(result)
> mmseg.structures.seg_data_sample.SegDataSample
result.seg_logits.data.shape
> torch.Size([2, 3072, 4096])
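3.6 Visualization: the batches fed to the network after data augmentation
Note that mmsegmentation does not yet support what section 2.8.2 asks for, i.e. showing the input batches in TensorBoard's IMAGES tab.
However, the OpenMMLab libraries ship helper scripts for quickly inspecting the batches fed to the network, namely tools/analysis_tools/browse_dataset.py.
Error:
KeyError: 'AV62Dataset is not in the mmseg::dataset registry. Please check whether the value of `AV62Dataset` is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'
So to use the scripts under tools, the custom dataset has to be defined in a file inside the repo rather than registered at runtime.
Correct usage:
cd ~/mmsegmentation # make sure you are in the mmsegmentation directory
export PYTHONPATH=`pwd`:$PYTHONPATH # add it to the module search path
# typical command line
python ./tools/analysis_tools/browse_dataset.py ./projects/AV_Seg/config/unet-s5-d16_deeplabv3_4xb4-ce-1.0-dice-3.0-40k_stare-128x128.py \
    --output-dir ./projects/AV_Seg/infer
> [ ] 9/1840000, 0.5 task/s, elapsed: 19s, ETA: 3943510s
This is still not very convenient: the training set has 46 images in total, and the results are not saved as one large image per batch but overwritten file by file.
The following changes are also needed to see the transformed img and mask (side by side rather than overlaid):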
# mmsegmentation/tools/analysis_tools/browse_dataset.py
visualizer = VISUALIZERS.build(cfg.visualizer)
visualizer.alpha = 1.0  # added: the saved mask is fully opaque (no blending with the image), and the image is at 100% too
visualizer.dataset_meta = dataset.metainfo
visualizer.add_datasample(
    name=osp.basename(img_path),
    image=img,
    data_sample=data_sample,
    draw_gt=True,
    draw_pred=False,
    wait_time=args.show_interval,
    out_file=out_file,
    withLabels=False,  # set withLabels to False, otherwise the output image contains label text
    show=not args.not_show)
# mmsegmentation/mmseg/visualization/local_visualizer.py
# replace mask with np.ascontiguousarray(mask) in all three places, e.g.:
mask = cv2.rectangle(np.ascontiguousarray(mask), loc,
                     (loc[0] + label_width + baseline,
                      loc[1] + label_height + baseline),
                     classes_color, -1)
def _draw_sem_seg(self,
    ...
    return color_seg, image  # also return image; otherwise browse_dataset only shows the mask, not the image
def add_datasample
    if gt_img_data is not None and pred_img_data is not None:
        drawn_img = np.concatenate((gt_img_data, pred_img_data), axis=1)
    elif gt_img_data is not None:
        drawn_img = gt_img_data  # original line; change it to the line below
        drawn_img = np.concatenate((gt_img_data, img), axis=1)
This is how the result ends up being displayed; it is not very convenient to read, but good enough for now.
Because I had deleted the .git folder earlier, recording these modifications is cumbersome.
References:
- https://github.com/open-mmlab/mmsegmentation/pull/2649
- https://github.com/open-mmlab/mmsegmentation/blob/main/tools/analysis_tools/browse_dataset.py
- OpenCV Error When Browser Dataset #3405
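4. Notes on mmsegmentation internals
4.1 Data type of the model output
Converting the output to a probability map with softmax:
import numpy as np
from scipy.special import softmax  # assumption: any softmax that accepts an axis argument works here
result = inference_model(model, img)
# result: the mmseg.structures.seg_data_sample.SegDataSample returned directly by inference_model
seg_logits = result.seg_logits.data.cpu().numpy()
probability = softmax(seg_logits, axis=0)
# take channel 1 and convert it to a single-channel image that can be shown or saved
probability = (np.clip(probability[1], 0, 1) * 255).astype(np.uint8)
SegDataSample contains three fields of type PixelData:
- gt_sem_seg: the annotation (ground-truth mask)
- pred_sem_seg: the predicted mask. The mmsegmentation-structures docs are outdated here, so go by the code: the prediction is called pred_sem_seg, not pred_instances
- seg_logits: the direct network output, before any activation function or other normalization: "The raw (non-normalized) predicted result."
PixelData is defined as follows:
- mmsegmentation/mmseg/models/segmentors/base.py has from mmengine.structures import PixelData
- mmengine/mmengine/structures/pixel_data.py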
class PixelData(BaseDataElement):
"""Data structure for pixel-level annotations or predictions.
All data items in ``data_fields`` of ``PixelData`` meet the following
requirements:
- They all have 3 dimensions in orders of channel, height, and width.
- They should have the same height and width.
Examples:
>>> metainfo = dict(
... img_id=random.randint(0, 100),
... img_shape=(random.randint(400, 600), random.randint(400, 600)))
>>> image = np.random.randint(0, 255, (4, 20, 40))
>>> featmap = torch.randint(0, 255, (10, 20, 40))
>>> pixel_data = PixelData(metainfo=metainfo,
... image=image,
... featmap=featmap)
>>> print(pixel_data.shape)
(20, 40)
>>> # slice
>>> slice_data = pixel_data[10:20, 20:40]
>>> assert slice_data.shape == (10, 20)
>>> slice_data = pixel_data[10, 20]
>>> assert slice_data.shape == (1, 1)
>>> # set
>>> pixel_data.map3 = torch.randint(0, 255, (20, 40))
>>> assert tuple(pixel_data.map3.shape) == (1, 20, 40)
>>> with self.assertRaises(AssertionError):
... # The dimension must be 3 or 2
... pixel_data.map2 = torch.randint(0, 255, (1, 3, 20, 40))
"""
References:
- https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/structures/seg_data_sample.py
- mmsegmentation-structures
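About the softmax here: suppose this is a multi-class problem with, say, 3 classes. The output result.seg_logits then has shape (classes_num, height, width); assume it is (3, 4, 5).
Then
probability = softmax(seg_logits, axis=0)
# axis=0 normalizes across the class axis, so the output is still (3, 4, 5) and the 3 values at each pixel sum to 1.
# Selecting one class (e.g. probability[1]) or taking argmax over axis 0 gives a 4*5 matrix, matching the image height and width, i.e. the mask.
For more on how axes work, see: numpy中关于数组维度的理解——dim和axis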
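The shapes involved can be checked with a small NumPy-only sketch. The logits below are random placeholders rather than real model output, and softmax is written out by hand so the example does not depend on SciPy or mmseg.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seg_logits = rng.normal(size=(3, 4, 5))  # made-up logits: (classes_num, height, width)

probability = softmax(seg_logits, axis=0)  # per-pixel class probabilities
mask = probability.argmax(axis=0)          # per-pixel class index

print(probability.shape)  # (3, 4, 5)
print(mask.shape)         # (4, 5)
```

Each pixel's three probabilities sum to 1, and the argmax over the class axis collapses them into a single (height, width) label map.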
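4.2 mmcv reads images as BGR
Note that mmcv is not OpenCV. Taking imread as an example, the default backend is
imread_backend = 'cv2'
so reading images and similar operations behave much the same as in OpenCV, including the BGR channel order.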
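When an RGB array is needed downstream (for plotting, for example), the channel axis can be reversed. The sketch below uses a made-up two-pixel array in place of a real image read from disk.

```python
import numpy as np

# A made-up 1x2 image in BGR order: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis swaps B and R while leaving G in place.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]
print(rgb[0, 1].tolist())  # [255, 0, 0]
```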
References:
- https://github.com/open-mmlab/mmcv/blob/main/mmcv/image/io.py
- https://github.com/open-mmlab/mmengine/blob/main/mmengine/fileio/file_client.py#L285
- https://mmcv-jm.readthedocs.io/en/latest/image.html
- https://mmcv.readthedocs.io/en/latest/understand_mmcv/data_process.html
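5. Saving your own work with git
5.1 Remove mmsegmentation's .git metadata
A freshly cloned mmsegmentation comes with its own .git metadata. To keep it from interfering with your own work, you can simply delete the .git folder: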
~/mmsegmentation$ rm -rf .git
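After deletion, the changes listed in the Source Control tab of VS Code disappear.
If they do not, run ll to list all files in the directory (including hidden ones) and check whether the .git folder is really gone.
5.2 Create your own git repo
After that, create a new git repo and link it to your own project folder.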
git init
git add *
git commit -m "init commit"
git branch -M main
git config --local user.name "huangshan"
git config --local user.email "hs8023hfp@gmail.com"
git remote add origin https://github.com/CastleDream/mmsegProject.git
git push -u origin main
fatal: unable to access 'https://github.com/CastleDream/mmsegProject.git/': GnuTLS recv error (-110): The TLS connection was non-properly terminated.
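# After this error I retried with sudo; according to the references, it may be a permissions problem
sudo git push --set-upstream origin main
# It may fail again; with luck and a good connection, it prompts for a username and password
Username for 'https://github.com':
Password for 'https://huangshan@github.com':
# This still ends in failure, because GitHub no longer accepts account passwords here; an SSH key (or a token) is required
5.3 Configure SSH on a new machine
Follow section "🌸5. ssh使用" of the post "git传输时使用的两种协议ssh和http的区别" to generate an SSH key for the new machine: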
Your identification has been saved in /home/ubuntu/.ssh/id_ed25519
Your public key has been saved in /home/ubuntu/.ssh/id_ed25519.pub
If you hit this error along the way:
~/mmsegmentation/projects$ git push
fatal: The current branch main has no upstream branch.
To push the current branch and set the remote as upstream, use
git push --set-upstream origin main
To have this happen automatically for branches without a tracking
upstream, see 'push.autoSetupRemote' in 'git help config'.
~/mmsegmentation/projects$ git push --set-upstream origin main
fatal: unable to access 'https://github.com/XXX/XXX.git/': GnuTLS recv error (-110): The TLS connection was non-properly terminated.
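try:
sudo apt-get install build-essential fakeroot dpkg-dev
# Updating git is enough; this was a bug in older versions
sudo apt install git
# After updating, run git push --set-upstream origin main again
# It then asks you to open a web page and enter the username, password, and email verification code, after which the push works normally
References: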
- git报错:The TLS connection was non-properly terminated.
- ✅GnuTLS recv error (-110): The TLS connection was non-properly terminated
- GnuTLS recv error (-110): The TLS connection was non-properly terminated #3994
5.4 Confirm the configured GitHub account is local
Once configured, check that the GitHub account you set is the repository-local one:
~/mmsegmentation/projects$ git config --local user.name
yourname
~/mmsegmentation/projects$ git config --local user.email
youremail@gmail.com
~/mmsegmentation/projects$ git config --global user.name
anothername
~/mmsegmentation/projects$ git config --global user.email
another_email
5.5 Verify that the linked remote is correct
git remote -v
origin https://github.com/XX/XXX.git (fetch)
origin https://github.com/XX/XX.git (push)
# The same old problem as before: the remote was added with the HTTPS URL, so push still asks for a username and password
~/mmsegmentation/projects$ git push
Username for 'https://github.com': huangshan
Password for 'https://huangshan@github.com':
remote: Support for password authentication was removed on August 13, 2021.
remote: Please see https://docs.github.com/en/get-started/getting-started-with-git/about-remote-repositories#cloning-with-https-urls for information on currently recommended modes of authentication.
fatal: Authentication failed for 'https://github.com/CastleDream/mmsegProject.git/'
# Switch the remote from the HTTPS URL to the SSH URL (git@github.com:...); after that, push works normally
~/mmsegmentation/projects$ git remote set-url origin git@github.com:CastleDream/mmsegProject
~/mmsegmentation/projects$ git push
Enumerating objects: 32, done.
Counting objects: 100% (32/32), done.
Delta compression using up to 72 threads
Compressing objects: 100% (18/18), done.
Writing objects: 100% (19/19), 387.21 KiB | 2.16 MiB/s, done.
Total 19 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 5 local objects.
To github.com:CastleDream/mmsegProject
27b16ca..37d21a3 main -> main
Reference: github设置token后依然提示要输入用户名和密码
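The HTTPS-to-SSH rewrite of the remote URL is mechanical, so it can be sketched as a small helper. The function name https_to_ssh is hypothetical, and the pattern only covers plain https://github.com/<owner>/<repo> URLs; it is an illustration of the URL shapes involved, not a replacement for git remote set-url.

```python
import re

def https_to_ssh(url):
    """Convert https://github.com/<owner>/<repo>[.git] to git@github.com:<owner>/<repo>.git."""
    match = re.fullmatch(r"https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?", url)
    if match is None:
        raise ValueError(f"not a GitHub HTTPS URL: {url}")
    owner, repo = match.groups()
    return f"git@github.com:{owner}/{repo}.git"

print(https_to_ssh("https://github.com/CastleDream/mmsegProject.git"))
# git@github.com:CastleDream/mmsegProject.git
```

The resulting string is what would be passed to git remote set-url origin.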
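5.6 Set up the project as a submodule of mmsegmentation
This way the two projects can be committed independently.
5.6.1 Main project: configuring mmsegmentation
# Clone the project over SSH
git clone git@github.com:CastleDream/mmsegmentation.git  # by default only the main branch is checked out locally
# Set a repository-local user name and email
git config --local user.name yourname
git config --local user.email youremail
# Add the original repository as the upstream remote
git remote add upstream git@github.com:open-mmlab/mmsegmentation
# Show the configured remotes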
git remote -v
origin git@github.com:CastleDream/mmsegmentation.git (fetch)
origin git@github.com:CastleDream/mmsegmentation.git (push)
upstream git@github.com:open-mmlab/mmsegmentation (fetch)
upstream git@github.com:open-mmlab/mmsegmentation (push)
# List local branches
git branch
* main
# List all branches (local and remote)
git branch -a
* main
remotes/origin/AddLabel2Visualize
remotes/origin/CastleDream/ADD_BDD100K
remotes/origin/HEAD -> origin/main
remotes/origin/SupportInverseFormLoss
remotes/origin/main
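# Fetch everything new from upstream
git fetch upstream
# Create a new branch based on dev-1.x; checking with git branch afterwards shows that you are already on the newly created MediWorks branch
git checkout -b MediWorks upstream/dev-1.x
# Remove projects and other unneeded files
git rm -rf projects
Reference: OpenMMLab——BDD100K数据集(MMSegmentation的第一个PR)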
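5.6.2 Configuring the projects subproject
One difference from the usual tutorials: here the submodule is set up against a specific branch of a repo, hence
$ git submodule add -b MediWorks git@github.com:CastleDream/mmsegProject.git projects
# Put the subproject in the projects folder
# If you forgot -b <branch> or mistyped the path where the subproject is stored, you can undo with
$ git rm --cached projects
Using a submodule adds a `.gitmodules` file with the following content: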
[submodule "projects"]
path = projects
url = git@github.com:CastleDream/mmsegProject.git
branch = MediWorks
# Confirm the association is correct
$ git status
On branch MediWorks
Your branch is up to date with 'upstream/dev-1.x'.
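Because .gitmodules uses an INI-like layout, its entries can be inspected with Python's standard configparser. This is a sketch under that assumption only: configparser is not a full git-config parser, and the text below is the file content quoted above embedded as a string rather than read from disk.

```python
import configparser

GITMODULES_TEXT = """\
[submodule "projects"]
path = projects
url = git@github.com:CastleDream/mmsegProject.git
branch = MediWorks
"""

parser = configparser.ConfigParser()
parser.read_string(GITMODULES_TEXT)

# Each section is one submodule; print where it lives and which branch it tracks.
for section in parser.sections():
    entry = parser[section]
    print(section, entry["path"], entry["url"], entry.get("branch"))
```

A missing branch key (the case where -b was forgotten) shows up as None in the last column.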
References:
- https://stackoverflow.com/questions/1777854/how-can-i-specify-a-branch-tag-when-adding-a-git-submodule
- How can I specify a branch/tag when adding a Git submodule?
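5.6.3 Follow-up operations
The main repository and the subproject are updated separately:
- To update the main repository, switch to its directory and pull
- To update the projects repository, switch to the projects directory and pull
- Even though projects is updated inside the main repository's folder, what the remote shows for it under the main repository does not update by itself. This feels slightly odd, but it is understandable
- In effect, the submodule inside the main repository is just a soft link pinned to the commit that existed when the submodule was created. To update that pinned commit on GitHub/the remote, you additionally need to run in the main repository
git pull --recurse-submodules
- I later found that if you modify the submodule repository inside the main repository, the changes are not recorded by default (possibly because a submodule may come from someone else's repository)
- The correct order for updating a submodule:
- another local clone of the submodule → the submodule's original remote repository → update the contents of the submodule folder in the local main repository.
- So it does not feel very convenient. Never mind: drop the submodule, copy everything over directly, and keep it all under the mmsegmentation folder, which is easier to modify.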
# 1. Remove the submodule folder from the index
$ git rm --cached projects
rm 'projects'
# 2. Delete the .gitmodules file
# 3. Delete the submodule entry from .git/config
cd .git
vi config
[submodule "projects"]
url = git@github.com:CastleDream/mmsegProject.git
active = true
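Reference: Unable to find current origin/master revision in submodule path
# Once the submodule is configured, you can commit
git add *
git commit -m "update submodule"
# On the first push, add -u to git push to set the upstream branch. origin here is still your own remote, not upstream; interact with upstream through PRs
git push -u origin MediWorks
After that, the submodule folder looks a bit different from ordinary folders.
Operations on the submodule are separate from those on the main repository:
- A commit made in any subdirectory of the main repository other than the submodule is recorded in the main repository.
- Only commits made inside the submodule directory are recorded in the submodule repository.
Update the submodule to the commit recorded in the main project
$ git submodule update
Update the submodule to the latest version of its remote
$ git submodule update --remote
Clone a repository that contains submodules
git clone XXX
# The submodule folder obtained this way is empty, so you also need
git submodule update --init --recursive  # recursively initialize and download the submodule contents
git pull  # or cd into the submodule directory and pull there
If this is the first clone, you can instead run
git clone --recursive <project url>
References: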
- ✅Git: submodule 子模块简明教程
- GitHub Desktop下添加Submodule
- https://github.blog/2016-02-01-working-with-submodules/
- https://git-scm.com/docs/git-submodule
- https://www.git-scm.com/book/en/v2/Git-Tools-Submodules
X. Errors
X.1 Errors related to using/installing libraries
1. (tensorboard) ImportError: cannot import name ‘notf’ from ‘tensorboard.compat’
Rule of thumb:
When tensorboard raises an error, try installing tensorflow first; the cause is often that tensorflow is missing.
Reference: ImportError: cannot import name ‘notf’ from ‘tensorboard.compat’
Error:
ImportError: Please run "pip install future tensorboard" to install the dependencies to use torch.utils.tensorboard (applicable to PyTorch 1.1 or higher)
tensorboard was in fact already installed, and "pip install future tensorboard" had been run as instructed. The real cause was that tensorflow was not installed; running
pip install tensorflow
fixes it.
Error:
File "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages/tensorboard/compat/__init__.py", line 45, in tf
import tensorflow
File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 438, in <module>
_ll.load_library(_main_dir)
File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 154, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: /home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages/tensorflow/core/kernels/libtfkernel_sobol_op.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringB5cxx11ERKNS_15OpKernelContextEb
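According to tensorflow/text/issues/385 and tensorflow/issues/47212, the fix is to switch versions:
pip install -U tensorflow  # went from 2.4 to 2.13
# But the numpy version on this machine is quite new, which leads to this error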
File "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'object'.
`np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
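It later turned out that the tensorflow raising the error lived in the .local folder, not in the openmmlab conda environment. Deleting that copy of tensorflow from .local solved the problem.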
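A quick way to see which copy of a package Python would pick up, without reading a traceback, is to ask the import system where the module lives. The helpers module_location and is_under are hypothetical names for this sketch; it relies only on the standard library, and json is used as a module that is sure to exist.

```python
import importlib.util
from pathlib import Path

def module_location(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return spec.origin

def is_under(path, root):
    """True if path lies inside the directory root."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False

print(module_location("json"))
```

Calling module_location("tensorflow") inside the activated environment and testing the result with is_under against the .local site-packages path would reveal whether the stray copy is the one being imported.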
2. from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
File "/home/ubuntu/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 36, in <module>
_descriptor.FieldDescriptor(
File "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 553, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
Reference: TypeError: Descriptors cannot not be created directly
Fix by running:
pip install protobuf==3.20.*
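X.2 The conda environment is active, but libraries in .local take precedence (do not install with mim)
X.2.1 Problem description and useful information
Running a script after conda activate openmmlab raised this error: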
Traceback (most recent call last):
File "./projects/vesselSeg/code/train.py", line 168, in <module>
train()
File "./projects/vesselSeg/code/train.py", line 154, in train
runner = Runner.from_cfg(cfg)
File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/runner/runner.py", line 439, in from_cfg
runner = cls(
File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/runner/runner.py", line 395, in __init__
self.visualizer.add_config(self.cfg)
File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/dist/utils.py", line 366, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/visualization/visualizer.py", line 1068, in add_config
vis_backend.add_config(config, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/visualization/vis_backend.py", line 56, in wrapper
obj._init_env() # type: ignore
File "/home/ubuntu/.local/lib/python3.8/site-packages/mmengine/visualization/vis_backend.py", line 537, in _init_env
raise ImportError(
ImportError: Please run "pip install future tensorboard" to install the dependencies to use torch.utils.tensorboard (applicable to PyTorch 1.1 or higher)
The packages that should be used are those in /home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages, but the ones actually used are in /home/ubuntu/.local/lib/python3.8/site-packages.
Following How to prevent anaconda environment from reading libraries installed in local:
(openmmlab) (base) ubuntu@ubun:~/XX/mmsegmentation$ python -m site
sys.path = [
'/home/ubuntu/XX/mmsegmentation',
'/home/ubuntu/anaconda3/envs/openmmlab/lib/python38.zip',
'/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8',
'/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/lib-dynload',
'/home/ubuntu/.local/lib/python3.8/site-packages',
'/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages',
# .local's site-packages comes before openmmlab's site-packages, so packages are looked up in .local first
]
USER_BASE: '/home/ubuntu/.local' (exists)
USER_SITE: '/home/ubuntu/.local/lib/python3.8/site-packages' (exists)
ENABLE_USER_SITE: True
# With python -s, .local no longer appears in sys.path, but this did not seem to help much in practice
(openmmlab) (base) ubuntu@ubun:~/XX/mmsegmentation$ python -s -m site
sys.path = [
'/home/ubuntu/XX/mmsegmentation',
'/home/ubuntu/anaconda3/envs/openmmlab/lib/python38.zip',
'/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8',
'/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/lib-dynload',
'/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages',
]
USER_BASE: '/home/ubuntu/.local' (exists)
USER_SITE: '/home/ubuntu/.local/lib/python3.8/site-packages' (exists)
ENABLE_USER_SITE: False
For a more detailed discussion of isolating conda environments from .local, see: Isolating conda environments from ~/.local #7173
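The ordering problem visible in the python -m site output can be expressed as a simple check on the search path. The function user_site_shadows_env is a hypothetical helper for illustration; the paths are copied from the listings above, and in a live session sys.path would be passed in instead.

```python
def user_site_shadows_env(sys_path, env_site, user_site):
    """Return True if user_site is searched before env_site in sys_path."""
    if user_site not in sys_path or env_site not in sys_path:
        return False
    return sys_path.index(user_site) < sys_path.index(env_site)

ENV_SITE = "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages"
USER_SITE = "/home/ubuntu/.local/lib/python3.8/site-packages"

default_path = [
    "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8",
    USER_SITE,
    ENV_SITE,
]
no_user_site_path = [
    "/home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8",
    ENV_SITE,
]

print(user_site_shadows_env(default_path, ENV_SITE, USER_SITE))       # True
print(user_site_shadows_env(no_user_site_path, ENV_SITE, USER_SITE))  # False
```

The -s flag has an environment-variable counterpart, PYTHONNOUSERSITE, which also keeps the user site-packages directory out of sys.path; whether it suits a given setup is worth testing rather than assuming.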
You can check which interpreter is active:
(openmmlab) (base) ubuntu@ubun:~/XX$ which python
/home/ubuntu/anaconda3/envs/openmmlab/bin/python
(openmmlab) (base) ubuntu@ubun:~/XX$ conda deactivate
(base) (base) ubuntu@ubun:~/XX$ conda deactivate
# conda deactivate exits the conda environment
ubuntu@ubun:~/XX$ which python
/home/ubuntu/anaconda3/bin/python
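# In principle this should show /usr/bin/python
# but it still points to the python of anaconda's default base environment
So something is probably off with the server environment.
X.2.2 Brute-force fix
The only fix I could think of at the time was to delete the packages that should not be picked up from .local, so that the ones in the conda env get used instead
cd /home/ubuntu/.local/lib/python3.8/site-packages/
rm -rf XXX  # the library to remove
It later turned out that MIM was the cause: MIM seems to install packages into .local. After deleting mmengine from .local, running the code fails because mmengine cannot be found.
Running pip install mmengine inside the corresponding conda environment fixes this. Then check with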
pip show mmengine
Name: mmengine
Version: 0.10.1
Summary: Engine of OpenMMLab projects
Home-page: https://github.com/open-mmlab/mmengine
Author: MMEngine Authors
Author-email: openmmlab@gmail.com
License: UNKNOWN
Location: /home/ubuntu/anaconda3/envs/openmmlab/lib/python3.8/site-packages
Requires: addict, matplotlib, numpy, opencv-python, pyyaml, rich, termcolor, yapf
Required-by: mmcv
and you can see that it is installed in the right location.
References: