[Analysis] Using cache to boost model training performance

ViatorSun

Last modified 2023-04-23 13:53:46


Licensed under CC 4.0 BY-SA

Category: Pytorch. Tags: Pytorch, cache, disk, ram

First published 2023-04-23 13:42:43

Article link: https://blog.csdn.net/ViatorSun/article/details/130198476


The experiments use YOLOv5 7.0 with the m model configuration. To keep testing simple, the coco2017 val dataset is used as the example.

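As the results show, training with the ram cache brings no obvious speedup. The likely reason is that the disk itself reads fast enough to keep up with the GPU's throughput. The ram cache also uses considerably more memory. If you have enough memory, you can try training with the ram cache; if memory is limited, consider switching to a faster SSD as the data disk.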
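The trade-off between the two modes can be illustrated with a toy example. The sketch below is not YOLOv5's dataloader; `CachedFileDataset`, `make_dummy_files`, and `count_reads` are hypothetical names, and small binary files stand in for images. It only shows the general idea: a ram cache reads each file once and then holds everything in memory, while the uncached path goes back to disk on every access.

```python
import os
import tempfile


class CachedFileDataset:
    """Toy dataset: reads raw bytes from disk, optionally keeping them in RAM."""

    def __init__(self, paths, cache_in_ram=False):
        self.paths = list(paths)
        self.disk_reads = 0  # counts how often the disk is actually touched
        # With RAM caching, every file is read once up front and kept in memory.
        self.cache = [self._read(p) for p in self.paths] if cache_in_ram else None

    def _read(self, path):
        self.disk_reads += 1
        with open(path, "rb") as f:
            return f.read()

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        if self.cache is not None:
            return self.cache[i]  # served from memory, no disk access
        return self._read(self.paths[i])  # re-read from disk on every access


def make_dummy_files(directory, n=4, size=1024):
    """Create n small files standing in for images (hypothetical helper)."""
    paths = []
    for i in range(n):
        path = os.path.join(directory, f"img_{i}.bin")
        with open(path, "wb") as f:
            f.write(bytes([i]) * size)
        paths.append(path)
    return paths


def count_reads(cache_in_ram, epochs=3):
    """Iterate over the toy dataset for several epochs and count disk reads."""
    with tempfile.TemporaryDirectory() as d:
        ds = CachedFileDataset(make_dummy_files(d), cache_in_ram=cache_in_ram)
        for _ in range(epochs):
            for i in range(len(ds)):
                ds[i]
        return ds.disk_reads


print("disk reads with RAM cache:", count_reads(cache_in_ram=True))
print("disk reads without cache: ", count_reads(cache_in_ram=False))
```

The cached variant touches the disk once per file, while the uncached variant touches it once per file per epoch. Whether that saving shortens training depends on whether disk reads were the bottleneck in the first place.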

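In addition, when the system is running many other processes, they compete for CPU time and disk read bandwidth. This slows down data delivery to the GPU and hurts compute performance, so training with the ram cache is still recommended.

""" The author's machine configuration """

CPU			: 	Intel 13700k
GPU			:	Nvidia 4090
Disk		:	ZHITAI TiPro7000
RAM			:	Kingston FURY D5 6000 EXPO 16G x 4
Motherboard	:	ASUS ROG STRIX Z690-G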

Experiment	batch size	RAM usage	Epoch time (mm:ss)
ram	16	28.7GB	28:29
disk	16	7.2GB	28:17
ram	auto	29.5GB	20:24
disk	auto	8.3GB	20:29
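To put the table in perspective, the short sketch below converts the reported times to seconds and computes the relative difference between ram and disk for each batch size. The timings are copied from the table; the helper names `to_seconds` and `relative_diff_percent` are made up for this illustration.

```python
def to_seconds(mmss):
    """Convert an 'mm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_diff_percent(ram, disk):
    """Relative time difference of ram vs. disk, in percent (positive = ram slower)."""
    r, d = to_seconds(ram), to_seconds(disk)
    return (r - d) / d * 100


# Timings copied from the table above.
results = {
    "16": {"ram": "28:29", "disk": "28:17"},
    "auto": {"ram": "20:24", "disk": "20:29"},
}

for batch_size, t in results.items():
    diff = relative_diff_percent(t["ram"], t["disk"])
    print(f"batch size {batch_size}: ram vs. disk = {diff:+.2f}%")
```

Both differences are below 1% and point in opposite directions, which matches the observation that the ram cache gives no obvious speedup on this machine.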

Moving the data to ram
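The article does not show the exact command that was used, so the following is only a sketch of how such a run could be launched. It assumes the standard YOLOv5 `train.py` script with its `--cache ram` / `--cache disk` option and `--batch-size -1` for automatic batch sizing. The dataset file `coco.yaml` and the helper `build_train_cmd` are assumptions made for illustration, and the epoch count, image size, and worker count are taken from the logs further down.

```python
import shlex


def build_train_cmd(cache, batch_size=16, weights="yolov5m.pt",
                    data="coco.yaml", epochs=100, imgsz=640, workers=2):
    """Build a YOLOv5 train.py command line (hypothetical helper).

    cache:      "ram" or "disk"
    batch_size: positive integer, or -1 for YOLOv5's automatic batch size
    data:       dataset YAML; "coco.yaml" is an assumption, not from the article
    """
    if cache not in ("ram", "disk"):
        raise ValueError("cache must be 'ram' or 'disk'")
    return [
        "python", "train.py",
        "--weights", weights,
        "--data", data,
        "--epochs", str(epochs),
        "--img", str(imgsz),
        "--batch-size", str(batch_size),
        "--workers", str(workers),
        "--cache", cache,
    ]


# Print the four variants compared in this article; nothing is executed here.
for cache in ("ram", "disk"):
    for batch_size in (16, -1):
        print(shlex.join(build_train_cmd(cache, batch_size)))
```

A printed command can be pasted into a shell inside a YOLOv5 checkout, or the argument list can be passed to `subprocess.run`.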

Experimental results

batch size 16, YOLOv5-7.0 m model
# From ram
Transferred 481/481 items from yolov5m.pt
AMP: checks passed 
optimizer: SGD(lr=0.01) with parameter groups 79 weight(decay=0.0), 82 weight(decay=0.0005), 82 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), 
				CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning G:\coco2017\labels\train2017.cache... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 00:00
train: 95.1GB RAM required, 42.9/63.7GB available, not caching images 
val: Scanning G:\coco2017\labels\val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 00:00
val: Caching images (4.1GB ram): 100%|██████████| 5000/5000 00:01

AutoAnchor: 4.45 anchors/target, 0.995 Best Possible Recall (BPR). Current anchors are a good fit to dataset 
Plotting labels to runs\train\exp5\labels.jpg... 
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs\train\exp5
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       0/99      5.72G    0.03863    0.05964    0.01552        206        640: 100%|██████████| 7393/7393 28:29
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 157/157 00:28
                   all       5000      36335       0.69      0.562      0.606      0.415


#---------------------------------------------------------------------------------------------------------------------------------

# From Disk
Transferred 481/481 items from yolov5m.pt
AMP: checks passed 
optimizer: SGD(lr=0.01) with parameter groups 79 weight(decay=0.0), 82 weight(decay=0.0005), 82 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), 
				CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning G:\coco2017\labels\train2017.cache... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 00:00
val: Scanning G:\coco2017\labels\val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 00:00

AutoAnchor: 4.45 anchors/target, 0.995 Best Possible Recall (BPR). Current anchors are a good fit to dataset 
Plotting labels to runs\train\exp4\labels.jpg... 
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs\train\exp4
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       0/99      5.72G    0.03863    0.05964    0.01552        206        640: 100%|██████████| 7393/7393 28:17
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 157/157 00:27
                   all       5000      36335       0.69      0.562      0.606      0.415
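One detail in the "From ram" log is worth noting: the line `train: 95.1GB RAM required, 42.9/63.7GB available, not caching images` indicates that the training images did not fit into memory in that run, and only the val images (4.1GB) were cached. The sketch below shows the kind of feasibility check such a message implies. It is not YOLOv5's actual code; the function names, the 10% safety margin, and the sample image size are assumptions.

```python
GIB = 1 << 30  # bytes per GiB


def estimate_required_gib(sample_nbytes, n_images):
    """Extrapolate the total cache size from the byte sizes of a few sampled images."""
    mean_bytes = sum(sample_nbytes) / len(sample_nbytes)
    return mean_bytes * n_images / GIB


def can_cache_in_ram(required_gib, available_gib, safety_margin=0.1):
    """Cache only if the estimate, plus a safety margin, fits into free memory."""
    return required_gib * (1 + safety_margin) < available_gib


# Figures taken from the log above.
print("train set fits:", can_cache_in_ram(95.1, 42.9))
print("val set fits:  ", can_cache_in_ram(4.1, 42.9))

# Hypothetical sample: three decoded 640x480 RGB images, extrapolated to 1000 images.
sample = [640 * 480 * 3] * 3
print(f"estimate for 1000 images: {estimate_required_gib(sample, 1000):.2f} GiB")
```

With the logged figures, the training set fails the check and the val set passes it, which is consistent with the messages in the log.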