I. Basic training tutorial:
1. GitHub repository: https://github.com/smallcorgi/Faster-RCNN_TF.git
2. Tutorial:
II. Problems encountered during training:
1. Create the environment (use conda create here; conda env create expects an environment file):
source activate base
conda create --name faster_rcnn python=2.7
pip install
2. Install TensorFlow
2.1 Check the local CUDA and cuDNN versions
cat /usr/local/cuda/version.txt
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
2.2 Choose the matching tensorflow-gpu version: https://www.jianshu.com/p/1d6a508a8d04
2.3 Command for installing the corresponding tensorflow-gpu: https://blog.csdn.net/z349177893/article/details/101535203
3. Upload the data to the server: rz
4. Generate the train.txt, val.txt, trainval.txt and test.txt files (adapted from https://blog.csdn.net/hitzijiyingcai/article/details/81636455, thanks to the author)
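The cuDNN query in 2.1 can also be done from Python, which is handy when the version has to be compared against a compatibility table in a script. This is only a minimal sketch: parse_cudnn_version is a hypothetical helper name, and the header path is assumed to be the same one used in the cat command above.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h, or None."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7"
        match = re.search(r"^\s*#define\s+%s\s+(\d+)" % key, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    path = "/usr/local/cuda/include/cudnn.h"  # assumed: same path as in 2.1
    try:
        with open(path) as f:
            print(parse_cudnn_version(f.read()))
    except IOError:
        print("cudnn.h not found at %s" % path)
```

If the function returns None, the header probably does not define the version macros at that location, and the install path should be checked by hand.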
# -*- coding: utf-8 -*-
# Split the annotation files into trainval/test, then split trainval into train/val.
import os
import random

trainval_percent = 0.5  # fraction of all samples assigned to trainval (the rest go to test)
train_percent = 0.5     # fraction of trainval assigned to train (the rest go to val)
xmlfilepath = '/home/hqd/桌面/VOC2010/Annotations'
txtsavepath = '/home/hqd/桌面/VOC2010/ImageSets/Main'

total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
indices = range(num)  # renamed from `list` to avoid shadowing the built-in
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open(txtsavepath + '/trainval.txt', 'w')
ftest = open(txtsavepath + '/test.txt', 'w')
ftrain = open(txtsavepath + '/train.txt', 'w')
fval = open(txtsavepath + '/val.txt', 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'  # strip the '.xml' extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
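After running the split script it may be worth confirming that the four files are consistent before starting a long training run. The following is a minimal sketch, not part of the referenced script: check_splits and read_ids are hypothetical helper names, and main_dir is assumed to be the ImageSets/Main directory used above.

```python
import os


def read_ids(path):
    """Read one image ID per line, ignoring blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def check_splits(main_dir):
    """Return the size of each split; raise ValueError if the splits are inconsistent."""
    ids = {}
    for name in ("train", "val", "trainval", "test"):
        ids[name] = read_ids(os.path.join(main_dir, name + ".txt"))
    train, val = set(ids["train"]), set(ids["val"])
    trainval, test = set(ids["trainval"]), set(ids["test"])
    if train & val:
        raise ValueError("train and val overlap")
    if trainval & test:
        raise ValueError("trainval and test overlap")
    if (train | val) != trainval:
        raise ValueError("train + val does not equal trainval")
    return dict((name, len(values)) for name, values in ids.items())
```

Calling check_splits(txtsavepath) returns a dict such as {'train': ..., 'val': ..., 'trainval': ..., 'test': ...} with the number of IDs per file, which can be compared against the percentages set in the script.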
4. _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE error
Add -D_GLIBCXX_USE_CXX11_ABI=0 to the g++ command in lib/make.sh:
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS -D_GLIBCXX_USE_CXX11_ABI=0 \
-lcudart -L $CUDA_PATH/lib64
5. _tkinter.TclError: no display name and no $DISPLAY environment variable
5.1 The backend must be selected before pyplot is imported. Change:
import matplotlib.pyplot as plt
import matplotlib
matplotlib.use('Agg')
to:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
5.2 Install Xming: https://blog.csdn.net/qq_24036403/article/details/86535401
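The import-order fix in 5.1 can be made conditional, so the same script still opens windows on a desktop but falls back to the non-interactive Agg backend on a headless server. This is a sketch under assumptions: pick_backend and save_demo_plot are hypothetical helper names, and matplotlib is assumed to be installed in the active environment.

```python
import os


def pick_backend(environ=None):
    """Return 'Agg' when no X display is set, otherwise None (keep matplotlib's default)."""
    environ = os.environ if environ is None else environ
    return None if environ.get("DISPLAY") else "Agg"


def save_demo_plot(path):
    """Draw a small line plot and write it to `path` without opening a window."""
    import matplotlib
    backend = pick_backend()
    if backend:
        matplotlib.use(backend)  # must run before pyplot is imported
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    fig.savefig(path)
    plt.close(fig)
```

With Agg selected, results have to be written with savefig rather than shown with plt.show(), since no window can be opened.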
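6. The command as given in the tutorial: ./experiments/scripts/faster_rcnn_end2end.sh $DEVICE $DEVICE_ID VGG16 pascal_voc
must be entered with actual values in place of the placeholders, in the same positional form used in item 7: ./experiments/scripts/faster_rcnn_end2end.sh gpu 0 VGG16 pascal_voc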
6. raise ValueError("Object arrays cannot be loaded when "
The installed numpy version is too new; switch to an older one:
pip install numpy==1.16.2
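The downgrade works because, starting with NumPy 1.16.3, np.load defaults to allow_pickle=False. An alternative to pinning the version is to pass allow_pickle=True at the call that loads the file. The sketch below illustrates both points; allow_pickle_defaults_to_false and load_object_array are hypothetical helper names, and which np.load call in the repository needs the change is not confirmed here.

```python
def allow_pickle_defaults_to_false(version):
    """Return True if this NumPy version string is 1.16.3 or newer."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= (1, 16, 3)


def load_object_array(path):
    """Load a .npy file that stores pickled Python objects (trusted files only)."""
    import numpy as np
    # allow_pickle=True restores the pre-1.16.3 behaviour for this one call.
    return np.load(path, allow_pickle=True)
```

Unpickling can execute arbitrary code, so allow_pickle=True should only be used for files from a trusted source such as the pretrained weights downloaded for this project.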
7. Run training in the background: nohup ./experiments/scripts/faster_rcnn_end2end.sh gpu 0 VGG16 pascal_voc > test.log 2>&1 &
Follow the log in real time: tail -f test.log (or print the whole log once with cat test.log)
Run the demo: python ./tools/demo.py --model /share2/home/qingdong/sharedir/Faster-RCNN_TF/tools//model/VGGnet_fast_rcnn_iter_70000.ckpt
8. Error when running make:
python setup.py build_ext --inplace
File "setup.py", line 84
print extra_postargs
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(extra_postargs)?
Makefile:2: recipe for target 'all' failed
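The likely cause is that the right Python environment is not active: print extra_postargs is Python 2 syntax, so setup.py is being run by Python 3. Switch to the Python 2.7 environment that has TensorFlow installed, then run make again.
9. Error when using your own dataset: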
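A quick way to confirm which interpreter make will pick up is to run a short check with the same python command before building. This is a minimal sketch; is_python2 is a hypothetical helper name.

```python
import sys


def is_python2(version_info=None):
    """Return True if the given (or current) interpreter is Python 2."""
    version_info = sys.version_info if version_info is None else version_info
    return version_info[0] == 2


if __name__ == "__main__":
    if is_python2():
        print("Python 2 is active; setup.py should parse.")
    else:
        print("Python %d.%d is active; activate the Python 2.7 environment first."
              % (sys.version_info[0], sys.version_info[1]))
```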
Preparing training data...
Traceback (most recent call last):
File "./tools/train_net.py", line 83, in <module>
roidb = get_training_roidb(imdb)
File "/home/user/work/faster_rcnn/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 212, in get_training_roidb
rdl_roidb.prepare_roidb(imdb)
File "/home/user/work/faster_rcnn/Faster-RCNN_TF/tools/../lib/roi_data_layer/roidb.py", line 27, in prepare_roidb
roidb[i]['image'] = imdb.image_path_at(i)
IndexError: list index out of range
Solution: delete the .pkl files under fast-rcnn-master/data/cache/ (or rename them as a backup), then start training again.
Problem 3: https://www.jianshu.com/p/08c1faa38358
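The rename-as-backup option from item 9 can be scripted so that stale cache files are moved aside in one step. This is a sketch under assumptions: backup_cache is a hypothetical helper name, and the cache directory passed in is whatever data/cache path your checkout uses.

```python
import os


def backup_cache(cache_dir, suffix=".bak"):
    """Rename every .pkl file in cache_dir to <name>.pkl<suffix>; return the new paths."""
    renamed = []
    for name in sorted(os.listdir(cache_dir)):
        if name.endswith(".pkl"):
            src = os.path.join(cache_dir, name)
            dst = src + suffix
            os.rename(src, dst)  # the cache is regenerated on the next training run
            renamed.append(dst)
    return renamed
```

For example, backup_cache("data/cache") run from the repository root would leave files ending in .pkl.bak, which can be deleted once training on the new dataset works.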