Deep Learning --- Stanford CS231n Programming Assignment (Assignment 1, Q2: SVM Classifier)

Article link: https://blog.csdn.net/daduzimama/article/details/139152673

Stanford CS231n Programming Assignment: the SVM Classifier

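Preface:

Deep learning, like learning of any kind, is a discipline where "traveling a thousand miles" beats "reading ten thousand books". Over the past couple of days I only studied some of the basic theory from Stanford's cs231n. The further I went, the more it felt like there was not much to it, yet the number of places where I was lost kept growing too. Yesterday I happened to come across the corresponding assignments on the course website. The problems and the code are extremely well designed! Only while doing the assignments did I truly appreciate the saying "what you learn on paper always feels shallow; to really understand something you must practice it yourself." Below are my notes on the assignment, written both as a record and to share.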

The assignment code can be downloaded here:

Assignment 1

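The official site explains the setup in detail, and I have also written my own setup guide:

Deep Learning --- Stanford CS231n Programming Assignment (How to Install Google Colab in Chrome) - CSDN Blog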

1, What does this assignment ask us to do?

Todo list:

Below I go through each item alongside the code.

2, Assignment coding --- the CIFAR-10 dataset

2,1, Setting up the environment: sync Google Drive with Google Colab

# This mounts your Google Drive to the Colab VM.
from google.colab import drive
drive.mount('/content/drive')

# TODO: Enter the foldername in your Drive where you have saved the unzipped
# assignment folder, e.g. 'cs231n/assignments/assignment1/'
FOLDERNAME = 'google colab/cs231/assignments/assignment1/'
assert FOLDERNAME is not None, "[!] Enter the foldername."

# Now that we've mounted your Drive, this ensures that
# the Python interpreter of the Colab VM can load
# python files from within it.
import sys
sys.path.append('/content/drive/My Drive/{}'.format(FOLDERNAME))

# This downloads the CIFAR-10 dataset to your Drive
# if it doesn't already exist.
%cd /content/drive/My\ Drive/$FOLDERNAME/cs231n/datasets/
!bash get_datasets.sh
%cd /content/drive/My\ Drive/$FOLDERNAME

For debugging later on, I added ipdb here.

!pip install ipdb
import ipdb

This cell loads two commonly used libraries, numpy and matplotlib. Worth noting is that it also imports a function named "load_CIFAR10" from the file "data_utils.py" in the "cs231n" directory.

# Run some setup code for this notebook.
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

2,2, Load Data

# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

# Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
try:
   del X_train, y_train
   del X_test, y_test
   print('Clear previously loaded data.')
except:
   pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

This cell loads the CIFAR-10 data (stored in 'cs231n/datasets/cifar-10-batches-py') using the "load_CIFAR10" function mentioned above.

That folder contains a readme, which points to a website.

CIFAR-10 and CIFAR-100 datasets

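The website describes the CIFAR-10 dataset:

First, the dataset was collected by Alex Krizhevsky (the lead author of the well-known AlexNet), Vinod Nair, and Geoffrey Hinton.

CIFAR-10 contains 60000 small 32x32-pixel color images in 10 classes, with 6000 images per class. Of these, 50000 are set aside for training the model (the training set). The other 10000 are used to test the trained model (the test set); it is made up of 1000 randomly selected images from each class, 10*1000=10000 in total.

Judging from the output, the shapes of the arrays stored in X_train, y_train, X_test, and y_test match the official description: 50000 samples for training and 10000 for testing. X_train and X_test hold the color images, while y_train and y_test hold the class of each image (an integer from 0 to 9). (The print statements below are my own addition.)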
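As a quick illustration of the shape and label checks described above, here is a minimal sketch. It uses small synthetic arrays (X_fake and y_fake are made-up names, not part of the assignment) so that it runs without the dataset; with the real data the first dimension would be 50000 rather than 20.

```python
import numpy as np

# Synthetic stand-ins for X_train / y_train (assumption: 20 fake images
# instead of the real 50000, so the sketch runs without the dataset).
rng = np.random.default_rng(0)
X_fake = rng.integers(0, 256, size=(20, 32, 32, 3)).astype("float64")
y_fake = np.repeat(np.arange(10), 2)  # two images per class, labels 0..9

print("data shape:  ", X_fake.shape)  # one 32x32 RGB image per entry
print("labels shape:", y_fake.shape)  # one integer label per image

# Count how many images carry each label.
labels, counts = np.unique(y_fake, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

Running the same np.unique call on the real y_train is a cheap way to confirm that the labels really are the integers 0 to 9.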

2,3, Data visualization

# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
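# Iterate over classes: y is the index into classes and cls is the entry itself, e.g. y=0, cls='plane'.
for y, cls in enumerate(classes):
    # Find the positions of all entries in y_train equal to y and store them in idxs
    idxs = np.flatnonzero(y_train == y)
    # Randomly pick 7 of those positions (without replacement) and store them back in idxs
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    # Iterate over idxs: i is the position within idxs and idx is the value stored there, e.g. i=0, idx=35628.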
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

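This code is provided by the course staff, so I only give a rough walkthrough here. It does two things, and the upper part is about selecting images.

The idea is to first find every position in y_train that holds a given class. For example, in my debug session, when y=0 (so cls="plane"), np.flatnonzero collected the positions in y_train of the 5000 images with label = y = 0 and returned them as idxs.

Next, 7 of the 5000 "class 0" images recorded in idxs are picked at random.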
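The two selection steps can be tried on a toy label array. This is only a sketch: y_toy is a made-up stand-in for y_train with 12 labels and 3 classes, and 3 samples are drawn instead of 7.

```python
import numpy as np

# Toy label array standing in for y_train (assumption: 12 labels, 3 classes).
y_toy = np.array([0, 2, 1, 0, 0, 1, 2, 0, 1, 2, 0, 1])

# Positions where the label equals class 0.
idxs = np.flatnonzero(y_toy == 0)
print(idxs.tolist())  # [0, 3, 4, 7, 10]

# Draw 3 of those positions without replacement, so none repeats.
np.random.seed(0)
picked = np.random.choice(idxs, 3, replace=False)
print(picked.tolist())
```

Because replace=False, the picked positions are always distinct, which is why the same image never appears twice in one column of the figure.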

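That completes the first part; what remains is the plotting. The code loops over the 7 images recorded in "idxs" in the previous step and draws them in a single figure using subplot. In plt.subplot(samples_per_class, num_classes, plt_idx), samples_per_class (the number of samples per class) is the number of rows, num_classes (the number of classes) is the number of columns, and plt_idx is the position of the subplot.

So in this example there are 10 classes with 7 samples each, 70 subplots in total, arranged in a grid of 7 rows by 10 columns.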
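To see how plt_idx = i * num_classes + y + 1 lands each image in that grid, here is a small sketch without any plotting. The helper subplot_index is a hypothetical name introduced only for this illustration.

```python
num_classes = 10       # columns: one per class
samples_per_class = 7  # rows: one per sample


def subplot_index(i, y):
    """1-based subplot position for sample row i and class column y."""
    return i * num_classes + y + 1


# plt.subplot numbers the cells row by row, starting at 1 in the top-left.
print(subplot_index(0, 0))  # 1: first row, first column
print(subplot_index(0, 9))  # 10: first row, last column
print(subplot_index(1, 0))  # 11: second row, first column
print(subplot_index(6, 9))  # 70: last row, last column
```

Each class y therefore fills one column from top to bottom, which is why the title is only set when i == 0.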

This gives the result below (every run may look different, because the samples are drawn at random):

2,4, Splitting the dataset

# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

#ipdb.set_trace()

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

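# Subset of samples used for development
# We will also make a development set, which is a small subset of
# the training set.
'''
np.random.choice draws random samples from a given 1-D array (or from
np.arange(n) when given an int n). The replace argument controls whether
the same element may be drawn more than once.
With replace=True the sampling is with replacement; with replace=False it is without replacement.

In detail:
replace=True: repeats are allowed. Each draw picks a random element, and that element can be picked again later,
so the same element may show up several times.
replace=False: repeats are not allowed. Each draw picks a random element, which is then removed from later draws,
so every element is picked at most once.
'''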
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Development data shape: ', X_dev.shape)
print('Development labels shape: ', y_dev.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

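From the 50000 original training samples, samples 49000~49999 become the validation set X_val, y_val, and samples 0~48999 become the training set X_train, y_train. In addition, 500 samples are drawn at random from those 49000 training samples as the development set X_dev, y_dev (as far as I can tell, the steps that follow mostly work with X_dev, y_dev). Finally, the first 1000 of the 10000 test samples are used as the test set X_test, y_test.

2,5, Data preprocessing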
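The same split can be checked on a toy array. This is a sketch under stated assumptions: the sizes 8 / 2 / 3 stand in for 49000 / 1000 / 500, and X_all, X_tr, and dev_mask are made-up names so that nothing in the notebook is overwritten.

```python
import numpy as np

# Toy sizes (assumptions) standing in for 49000 / 1000 / 500.
num_training, num_validation, num_dev = 8, 2, 3
X_all = np.arange(10) * 10  # ten fake "images": 0, 10, ..., 90

# Validation: the samples right after the training range.
X_val = X_all[range(num_training, num_training + num_validation)]
# Training: the first num_training samples.
X_tr = X_all[range(num_training)]

# Development: a random subset of the training samples, no repeats.
np.random.seed(0)
dev_mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_tr[dev_mask]

print(X_tr.tolist())   # [0, 10, 20, 30, 40, 50, 60, 70]
print(X_val.tolist())  # [80, 90]
print(X_dev.tolist())  # three distinct training samples
```

The training and validation sets do not overlap, while the development set is drawn from inside the training set, which matches the description above.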

# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)

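Remember from the course slides that an image has to be stretched out into a vector before any computation is done on it?

The first preprocessing step here is to flatten every image in every split into a vector.

For example, after flattening, X_train has 49000 rows and 3072 columns: each 32x32x3 image has been unrolled into one row vector (32*32*3 = 3072), and there are 49000 such rows.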
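The flattening step can be sketched on a handful of fake images. X_small and X_rows are made-up names used only for this illustration, with 4 images standing in for the 49000 in X_train.

```python
import numpy as np

# Four fake 32x32 RGB images (assumption: stand-ins for X_train).
X_small = np.arange(4 * 32 * 32 * 3, dtype=np.float64).reshape(4, 32, 32, 3)

# Keep the first axis (one row per image); -1 lets NumPy infer 32*32*3.
X_rows = np.reshape(X_small, (X_small.shape[0], -1))
print(X_rows.shape)  # (4, 3072)

# Row 0 holds image 0 flattened in C order, so the pixel values match.
print(np.array_equal(X_rows[0], X_small[0].ravel()))  # True
```

Passing -1 for the second dimension avoids hard-coding 3072, so the same line works for every split regardless of how many images it holds.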

# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data

mean_image = np.mean(X_train, axis=0)
# axis=0: average down each column, i.e. over all images for each pixel position.
#axis=1&#x