CS231n Assignment: Q1-4 Two-Layer Neural Network exercise (unfinished)

不想你离开。 Posted on 2020-02-07 08:01:25

two_layer_net

Implementing a Neural Network

In this exercise we will develop a neural network with fully-connected layers to perform classification, and test it out on the CIFAR-10 dataset.

# A bit of setup

import numpy as np
import matplotlib.pyplot as plt

from cs231n.classifiers.neural_net import TwoLayerNet

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

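We will use the class TwoLayerNet in the file cs231n/classifiers/neural_net.py to represent instances of our network. The network parameters are stored in the instance variable self.params, where the keys are string parameter names and the values are numpy arrays.
Below, we initialize toy data and a toy model that we will use to develop your implementation.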

# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.

input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5

def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)
    y = np.array([0, 1, 2, 2, 1])
    return X, y

net = init_toy_model()
X, y = init_toy_data()

Forward pass: compute scores

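Open the file cs231n/classifiers/neural_net.py and look at the method TwoLayerNet.loss. This function is very similar to the loss functions you have written for the SVM and Softmax exercises: it takes the data and weights and computes the class scores, the loss, and the gradients on the parameters.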

    def loss(self, X, y=None, reg=0.0):
        """
        Compute the loss and gradients for a two layer fully connected neural
        network.

        Inputs:
        - X: Input data of shape (N, D). Each X[i] is a training sample.
        - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
          an integer in the range 0 <= y[i] < C. This parameter is optional; if it
          is not passed then we only return scores, and if it is passed then we
          instead return the loss and gradients.
        - reg: Regularization strength.

        Returns:
        If y is None, return a matrix scores of shape (N, C) where scores[i, c] is
        the score for class c on input X[i].

        If y is not None, instead return a tuple of:
        - loss: Loss (data loss and regularization loss) for this batch of training
          samples.
        - grads: Dictionary mapping parameter names to gradients of those parameters
          with respect to the loss function; has the same keys as self.params.
        """
        # Unpack variables from the params dictionary
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        N, D = X.shape

        # Compute the forward pass
        scores = None
        #############################################################################
        # TODO: Perform the forward pass, computing the class scores for the input. #
        # Store the result in the scores variable, which should be an array of      #
        # shape (N, C).                                                             #
        #############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Architecture: input -> fully connected -> ReLU -> fully connected -> softmax
        h1 = X.dot(W1) + b1  # (N, D) x (D, H) = (N, H), here 5x10
        relu_h1 = np.maximum(0, h1)  # (N, H), here 5x10
        scores = relu_h1.dot(W2) + b2  # (N, H) x (H, C) = (N, C), here 5x3

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # If the targets are not given then jump out, we're done
        if y is None:
            return scores

        # Compute the loss
        loss = None
        #############################################################################
        # TODO: Finish the forward pass, and compute the loss. This should include  #
        # both the data loss and L2 regularization for W1 and W2. Store the result  #
        # in the variable loss, which should be a scalar. Use the Softmax           #
        # classifier loss.                                                          #
        #############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        exp_scores = np.exp(scores)  # e^scores, (N, C), here 5x3
        sum_scores = np.sum(exp_scores, axis=1, keepdims=True)  # row sums, (N, 1)

        probability_scores = exp_scores / sum_scores  # softmax probabilities, (N, C)
        loss_matrix = -np.log(probability_scores[np.arange(N), y])  # per-sample loss, (N,)
        loss = np.sum(loss_matrix)  # scalar

        loss /= N
        loss += reg * np.sum(W1 * W1) + reg * np.sum(W2 * W2)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Backward pass: compute gradients
        grads = {}
        #############################################################################
        # TODO: Compute the backward pass, computing the derivatives of the weights #
        # and biases. Store the results in the grads dictionary. For example,       #
        # grads['W1'] should store the gradient on W1, and be a matrix of same size #
        #############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # delta3 = dloss / dscores, (N, C), here 5x3
        delta3 = probability_scores.copy()  # start from the softmax probabilities
        delta3[np.arange(N), y] -= 1  # subtract 1 at the correct class
        grads['W2'] = relu_h1.T.dot(delta3)  # (H, N) x (N, C) = (H, C), here 10x3
        grads['W2'] /= N
        # The loss adds reg * sum(W2 * W2), so its gradient is 2 * reg * W2.
        grads['W2'] += 2 * reg * W2

        grads['b2'] = np.ones(N).dot(delta3) / N  # (N,) x (N, C) = (C,)

        drelu_dh1 = np.zeros_like(h1)  # (N, H), derivative of ReLU
        drelu_dh1[h1 > 0] = 1  # 1 where the pre-activation is positive, 0 elsewhere
        delta2 = delta3.dot(W2.T) * drelu_dh1  # (N, H)

        grads['W1'] = X.T.dot(delta2) / N + 2 * reg * W1  # (D, H)
        grads['b1'] = np.ones(N).dot(delta2) / N  # (H,)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        return loss, grads

Implement the first part of the forward pass, which uses the weights and biases to compute the scores for all inputs.

scores = net.loss(X)
print('Your scores:')
print(scores)
print()
print('correct scores:')
correct_scores = np.asarray([
  [-0.81233741, -1.27654624, -0.70335995],
  [-0.17129677, -1.18803311, -0.47310444],
  [-0.51590475, -1.01354314, -0.8504215 ],
  [-0.15419291, -0.48629638, -0.52901952],
  [-0.00618733, -0.12435261, -0.15226949]])
print(correct_scores)
print()

# The difference should be very small. We get < 1e-7
print('Difference between your scores and correct scores:')
print(np.sum(np.abs(scores - correct_scores)))

Output:

Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

Difference between your scores and correct scores:
3.6802720496109664e-08
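
The shape bookkeeping of the forward pass can be checked in isolation. The following is a minimal sketch, assuming the same toy sizes as above (N=5, D=4, H=10, C=3); the helper name `forward_scores` and the random generator are illustrative and not part of the assignment code, so the values will not match the notebook's scores, only the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H, C = 5, 4, 10, 3  # toy sizes: inputs, features, hidden units, classes

X = 10 * rng.standard_normal((N, D))
W1 = 1e-1 * rng.standard_normal((D, H))
b1 = np.zeros(H)
W2 = 1e-1 * rng.standard_normal((H, C))
b2 = np.zeros(C)

def forward_scores(X, W1, b1, W2, b2):
    """Hypothetical helper: affine -> ReLU -> affine, returns (N, C) scores."""
    h1 = X.dot(W1) + b1          # (N, D) x (D, H) -> (N, H); b1 broadcasts over rows
    relu_h1 = np.maximum(0, h1)  # elementwise ReLU keeps the shape (N, H)
    return relu_h1.dot(W2) + b2  # (N, H) x (H, C) -> (N, C)

scores = forward_scores(X, W1, b1, W2, b2)
print(scores.shape)  # (5, 3)
```

The biases are 1-D arrays, so NumPy broadcasting adds them to every row without an explicit tile.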

Forward pass: compute loss

In the same function, implement the second part that computes the data and regularization loss.

loss, _ = net.loss(X, y, reg=0.05)
correct_loss = 1.30378789133

# should be very small, we get < 1e-12
print('Difference between your loss and correct loss:')
print(np.sum(np.abs(loss - correct_loss)))

Output:

Difference between your loss and correct loss:
1.794120407794253e-13
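
Exponentiating raw scores works on the toy data, but it can overflow once scores grow large. A common remedy is to subtract each row's maximum before exponentiating, which leaves the softmax probabilities unchanged. The sketch below is one possible way to write the data loss this way; the function name `softmax_loss` is an assumption for illustration, and it covers the data loss only, not the regularization term.

```python
import numpy as np

def softmax_loss(scores, y):
    """Mean cross-entropy data loss, shifted by the row max for stability."""
    shifted = scores - np.max(scores, axis=1, keepdims=True)  # largest entry per row becomes 0
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    n = scores.shape[0]
    return -np.mean(log_probs[np.arange(n), y])

small = np.array([[1.0, 2.0, 3.0], [1.0, 0.0, -1.0]])
y = np.array([2, 0])

# Naive version, as a reference on well-behaved inputs.
probs = np.exp(small) / np.exp(small).sum(axis=1, keepdims=True)
naive = -np.mean(np.log(probs[np.arange(2), y]))

print(np.isclose(softmax_loss(small, y), naive))     # True
print(np.isfinite(softmax_loss(small + 1000.0, y)))  # True, where np.exp(1003.0) alone would overflow
```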

Backward pass

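Implement the rest of the function. This will compute the gradient of the loss with respect to the variables W1, b1, W2, and b2. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check: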

from cs231n.gradient_check import eval_numerical_gradient

# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.

loss, grads = net.loss(X, y, reg=0.05)

# these should all be less than 1e-8 or so
for param_name in grads:
    f = lambda W: net.loss(X, y, reg=0.05)[0]
    param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
    print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))

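Output:

W2 max relative error: 3.333333e-01
b2 max relative error: 3.865039e-11
W1 max relative error: 8.002490e-01
b1 max relative error: 2.738423e-09

In this recorded output b1 and b2 pass, but W1 and W2 are far above the 1e-8 target. That pattern points at the regularization term, which only touches the weights: the loss adds reg * np.sum(W * W), whose gradient is 2 * reg * W, and this run was recorded with a backward pass that added only reg * W. Wherever the data gradient is zero, the relative error is then exactly |2 - 1| / (2 + 1) = 1/3, which matches the W2 figure.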
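
The factor in the regularization gradient can be verified numerically without the network. This is a minimal sketch of a centered-difference checker, assuming a scalar-valued function of one array; `numerical_gradient` is a hypothetical stand-in written for this example and is not the course's `eval_numerical_gradient`.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for ix in np.ndindex(*x.shape):
        old = x[ix]
        x[ix] = old + h
        f_plus = f(x)
        x[ix] = old - h
        f_minus = f(x)
        x[ix] = old  # restore the entry before moving on
        grad[ix] = (f_plus - f_minus) / (2 * h)
    return grad

reg = 0.05
W = np.random.default_rng(0).standard_normal((4, 3))
reg_loss = lambda W: reg * np.sum(W * W)  # same form as the L2 term in the loss

num = numerical_gradient(reg_loss, W)
print(np.allclose(num, 2 * reg * W, atol=1e-7))  # True: the gradient is 2 * reg * W
print(np.allclose(num, reg * W, atol=1e-4))      # False: reg * W is off by a factor of 2
```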

Train the network

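To train the network we will use stochastic gradient descent (SGD), similar to the SVM and Softmax classifiers. Look at the function TwoLayerNet.train and fill in the missing sections to implement the training procedure. This should be very similar to the training procedure you used for the SVM and Softmax classifiers. You will also have to implement TwoLayerNet.predict, because the training process periodically performs prediction to keep track of accuracy while the network trains.

Open the file cs231n/classifiers/neural_net.py and implement the methods TwoLayerNet.train and TwoLayerNet.predict.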

    def train(self, X, y, X_val, y_val,
              learning_rate=1e-3, learning_rate_decay=0.95,
              reg=5e-6, num_iters=100,
              batch_size=200, verbose=False):
        """
        Train this neural network using stochastic gradient descent.

        Inputs:
        - X: A numpy array of shape (N, D) giving training data.
        - y: A numpy array of shape (N,) giving training labels; y[i] = c means that
          X[i] has label c, where 0 <= c < C.
        - X_val: A numpy array of shape (N_val, D) giving validation data.
        - y_val: A numpy array of shape (N_val,) giving validation labels.
        - learning_rate: Scalar giving learning rate for optimization.
        - learning_rate_decay: Scalar giving factor used to decay the learning rate
          after each epoch.
        - reg: Scalar giving regularization strength.
        - num_iters: Number of steps to take when optimizing.
        - batch_size: Number of training examples to use per step.
        - verbose: boolean; if true print progress during optimization.
        """
        num_train = X.shape[0]
        iterations_per_epoch = max(num_train // batch_size, 1)  # integer division keeps the epoch check exact

        # Use SGD to optimize the parameters in self.model
        loss_history = []
        train_acc_history = []
        val_acc_history = []

        for it in range(num_iters):
            X_batch = None
            y_batch = None

            #########################################################################
            # TODO: Create a random minibatch of training data and labels, storing  #
            # them in X_batch and y_batch respectively.                             #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            i = np.random.choice(a=num_train, size=batch_size)  # batch_size random indices in [0, num_train), sampled with replacement
            X_batch = X[i, :]  # training samples at those indices
            y_batch = y[i]  # matching labels

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # Compute loss and gradients using the current minibatch
            loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
            loss_history.append(loss)

            #########################################################################
            # TODO: Use the gradients in the grads dictionary to update the         #
            # parameters of the network (stored in the dictionary self.params)      #
            # using stochastic gradient descent. You'll need to use the gradients   #
            # stored in the grads dictionary defined above.                         #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            self.params['W1'] -= learning_rate * grads['W1']
            self.params['b1'] -= learning_rate * grads['b1']
            self.params['W2'] -= learning_rate * grads['W2']
            self.params['b2'] -= learning_rate * grads['b2']

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

            # Every epoch, check train and val accuracy and decay learning rate.
            if it % iterations_per_epoch == 0:
                # Check accuracy
                train_acc = (self.predict(X_batch) == y_batch).mean()
                val_acc = (self.predict(X_val) == y_val).mean()
                train_acc_history.append(train_acc)
                val_acc_history.append(val_acc)

                # Decay learning rate
                learning_rate *= learning_rate_decay

        return {
          'loss_history': loss_history,
          'train_acc_history': train_acc_history,
          'val_acc_history': val_acc_history,
        }

    def predict(self, X):
        """
        Use the trained weights of this two-layer network to predict labels for
        data points. For each data point we predict scores for each of the C
        classes, and assign each data point to the class with the highest score.

        Inputs:
        - X: A numpy array of shape (N, D) giving N D-dimensional data points to
          classify.

        Returns:
        - y_pred: A numpy array of shape (N,) giving predicted labels for each of
          the elements of X. For all i, y_pred[i] = c means that X[i] is predicted
          to have class c, where 0 <= c < C.
        """
        y_pred = None

        ###########################################################################
        # TODO: Implement this function; it should be VERY simple!                #
        ###########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        scores = self.loss(X)  # no y argument, so this returns the scores, (N, C)
        y_pred = np.argmax(scores, axis=1)  # index of the highest score per row, (N,)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        return y_pred
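
The two pieces filled in above, minibatch sampling and the parameter update, can be tried on their own. The sketch below assumes synthetic data and uses NumPy's `Generator.choice`; the helper `sgd_step` is a hypothetical name that generalizes the four explicit update lines to any parameter dictionary. Note that `np.random.choice` samples with replacement by default, which is what lets the toy run draw a batch of 200 from only 5 samples; `replace=False`, shown here, requires batch_size <= num_train.

```python
import numpy as np

rng = np.random.default_rng(0)
num_train, batch_size = 1000, 200
X = rng.standard_normal((num_train, 4))  # synthetic data, for illustration only
y = rng.integers(0, 3, size=num_train)

# Draw distinct indices, then index data and labels with the same array
# so that each row of X_batch stays paired with its label.
idx = rng.choice(num_train, size=batch_size, replace=False)
X_batch, y_batch = X[idx], y[idx]
print(X_batch.shape, y_batch.shape)  # (200, 4) (200,)

def sgd_step(params, grads, learning_rate):
    """Hypothetical helper: vanilla SGD update applied in place to every parameter."""
    for name in params:
        params[name] -= learning_rate * grads[name]

params = {'W': np.ones(2)}
grads = {'W': np.full(2, 0.5)}
sgd_step(params, grads, learning_rate=0.1)
print(params['W'])  # [0.95 0.95]
```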

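Once you have implemented the method, run the code below to train a two-layer network on toy data. You should achieve a training loss of less than 0.02.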

net = init_toy_model()
stats = net.train(X, y, X, y,
            learning_rate=1e-1, reg=5e-6,
            num_iters=100, verbose=False)

print('Final training loss: ', stats['loss_history'][-1])

# plot the loss history
plt.plot(stats['loss_history'])
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()

Output:

Final training loss:  0.01714908583773327

[Figure: Training Loss history (training loss vs. iteration)]

from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    
    # Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
    try:
       del X_train, y_train
       del X_test, y_test
       print('Clear previously loaded data.')
    except:
       pass

    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
        
    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Output:

Train data shape:  (49000, 3072)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3072)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3072)
Test labels shape:  (1000,)

Train a network

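To train our network we will use SGD. In addition, we will adjust the learning rate with an exponential learning rate schedule as optimization proceeds; after each epoch, we will reduce the learning rate by multiplying it by a decay rate.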
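
To see how much the schedule actually changes the learning rate for the settings used in this section, the decay logic can be replayed without any training. This sketch mirrors the epoch check in the train loop shown earlier (which also decays at iteration 0) and assumes integer iterations per epoch.

```python
num_train, batch_size, num_iters = 49000, 200, 1000
learning_rate, learning_rate_decay = 1e-4, 0.95
iterations_per_epoch = max(num_train // batch_size, 1)

num_decays = 0
for it in range(num_iters):
    # Same condition as in train(): fires at it = 0 and then once per epoch.
    if it % iterations_per_epoch == 0:
        learning_rate *= learning_rate_decay
        num_decays += 1

print(iterations_per_epoch, num_decays)  # 245 5
print(learning_rate)                     # 1e-4 * 0.95**5, roughly 7.7e-05
```

With only about four epochs in 1000 iterations, the decay lowers the learning rate by less than a quarter, so the initial learning rate matters far more than the decay factor in this run.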

input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
            num_iters=1000, batch_size=200,
            learning_rate=1e-4, learning_rate_decay=0.95,
            reg=0.25, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: ', val_acc)

Output:

iteration 0 / 1000: loss 2.302954
iteration 100 / 1000: loss 2.302550
iteration 200 / 1000: loss 2.297603
iteration 300 / 1000: loss 2.259162
iteration 400 / 1000: loss 2.203468
iteration 500 / 1000: loss 2.117619
iteration 600 / 1000: loss 2.050917
iteration 700 / 1000: loss 1.987152
iteration 800 / 1000: loss 2.005458
iteration 900 / 1000: loss 1.950105
Validation accuracy:  0.287
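
A quick sanity check on the log above: with small random initial weights the softmax output is close to uniform, so the initial data loss should be close to log(C). For C = 10 that is about 2.3026, which agrees with the logged loss of 2.302954 at iteration 0 (the small excess is consistent with the regularization term and slightly non-uniform scores). A minimal sketch of the calculation, using all-zero scores as an idealized stand-in for the initial network output:

```python
import numpy as np

num_classes = 10
scores = np.zeros((4, num_classes))  # idealized: every class gets the same score
labels = np.array([0, 3, 5, 9])      # arbitrary labels; the result does not depend on them

probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # each entry is 1/10
loss = -np.log(probs[np.arange(4), labels]).mean()

print(np.isclose(loss, np.log(num_classes)))  # True
```

By the same reasoning, chance-level accuracy on 10 classes is 0.1, so the 0.287 validation accuracy shows the network has learned something, just not much.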

Debug the training

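With the default parameters provided above, you should get a validation accuracy of about 0.29 on the validation set.

One strategy for getting insight into what is wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.

Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first-layer weights typically show some visible structure when visualized.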

# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend()
plt.show()

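[Figure: Loss history (loss vs. iteration) and Classification accuracy history (train and val accuracy vs. epoch)]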

from cs231n.vis_utils import visualize_grid

# Visualize the weights of the network

def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()

show_net_weights(net)

[Figure: grid visualization of the first-layer weights W1]

Tune your hyperparameters

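What's wrong? Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which seems to suggest that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.

Tuning. Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using neural networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.

Approximate results. You should aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.

Experiment: Your goal in this exercise is to get as good a result on CIFAR-10 as you can with a fully connected neural network (52% could serve as a reference). Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, adding dropout, or adding features to the solver).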

Explain your hyperparameter tuning process below.
Your Answer: omitted…

best_net = None # store the best model into this 

#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained  #
# model in best_net.                                                            #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the  #
# ones we used above; these visualizations will have significant qualitative    #
# differences from the ones we saw above for the poorly tuned network.          #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to  #
# write code to sweep through possible combinations of hyperparameters          #
# automatically like we did on the previous exercises.                          #
#################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

pass

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
test_acc = (best_net.predict(X_test) == y_test).mean()
print('Test accuracy: ', test_acc)
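
The tuning cell above is left unfinished in this post (best_net is still None, so the test-accuracy line will fail until a model is stored). One possible shape for an automatic sweep is sketched below. Everything here is an assumption for illustration: `grid_search` is a hypothetical helper, the grid values are arbitrary, and `fake_train_and_eval` returns a made-up score purely to exercise the loop; it says nothing about CIFAR-10 performance.

```python
import itertools

def grid_search(train_and_eval, grid):
    """Try every combination in grid; return (best_score, best_config, best_model)."""
    best_score, best_config, best_model = float('-inf'), None, None
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        model, val_acc = train_and_eval(**config)
        if val_acc > best_score:
            best_score, best_config, best_model = val_acc, config, model
    return best_score, best_config, best_model

# With the notebook's objects in scope, train_and_eval could look like:
#   def train_and_eval(hidden_size, learning_rate, reg):
#       net = TwoLayerNet(input_size, hidden_size, num_classes)
#       net.train(X_train, y_train, X_val, y_val, num_iters=1000, batch_size=200,
#                 learning_rate=learning_rate, reg=reg)
#       return net, (net.predict(X_val) == y_val).mean()

def fake_train_and_eval(hidden_size, learning_rate, reg):
    """Synthetic stand-in: returns a dummy model and an invented score."""
    return {'hidden_size': hidden_size}, hidden_size / 1000.0 - reg

grid = {'hidden_size': [50, 100], 'learning_rate': [1e-3], 'reg': [0.1, 0.5]}
best_score, best_config, best_model = grid_search(fake_train_and_eval, grid)
print(best_config)
```

Selecting on validation accuracy, rather than test accuracy, keeps the test set untouched until the final evaluation.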

Inline Question

Now that you have trained a neural network classifier, you may find that your testing accuracy is much lower than the training accuracy. In what ways can we decrease this gap? Select all that apply.

1. Train on a larger dataset.
2. Add more hidden units.
3. Increase the regularization strength.
4. None of the above.
Your Answer:
Your Explanation:
