04 Logistic Regression with a Neural Network Mindset - Neural Networks and Deep Learning [Deep Learning Specialization Series]

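These are my notes on the exercise "Logistic Regression with a Neural Network mindset" from Neural Networks and Deep Learning, the first course of the Deep Learning Specialization.

In "02 Neural Networks - Neural Networks and Deep Learning [Deep Learning Specialization Series]" we covered most of the theory behind neural networks. In this programming exercise we build a simple logistic regression classifier that recognizes cats, both to review that theory and to see how it is implemented in code.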

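Overview

This exercise uses a dataset in h5 format that contains labeled training and test data. Training and prediction are completed in the following seven steps:

Data loading
Data processing
Parameter initialization
Implementing the logistic regression function (forward propagation)
Implementing the loss/cost function (forward propagation)
Implementing gradient descent (back propagation)
Prediction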

Figure: flowchart of the neural network

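1. Data loading

Data in h5 format is read with the h5py library. For a brief introduction, see my previous post, "h5py - HDF5 for Python: A Simple Introduction".

First, build a load_dataset() function to load the data. It reads the h5 data files with h5py.File(), applies some light processing to the training and test data, and returns train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, and classes.

Reading the h5 file through a relative path raised the following error: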

OSError: Unable to open file (unable to open file: name = 'datasets/mytestfile.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

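I have not found a cleaner fix yet; my workaround is to pass the file's absolute path to h5py.File(). The errno = 2 in the message means that no file exists at the resolved location. A relative path is resolved against the current working directory, so this error typically appears when the script is started from a directory other than the one that contains datasets/.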
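One way to avoid depending on the working directory is to build the path from a known base directory. The sketch below is an illustration, not part of the original exercise: `resolve_dataset_path` is a hypothetical helper. In a script, the base directory would typically be `Path(__file__).resolve().parent`.

```python
from pathlib import Path


def resolve_dataset_path(relative_path, base_dir):
    """Return an absolute path for relative_path, anchored at base_dir.

    Hypothetical helper: base_dir is the directory the relative path is
    meant to be relative to (for a script, usually its own folder).
    """
    path = (Path(base_dir) / relative_path).resolve()
    if not path.is_file():
        # Fail early with the fully resolved location, which is easier to
        # debug than errno 2 on a relative name.
        raise FileNotFoundError(f"Dataset not found at {path}")
    return path
```

The returned path can then be handed to h5py, for example as `h5py.File(str(path), "r")`.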

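The loaded arrays have the following shapes: 209 training images and 50 test images, each 64 × 64 pixels with 3 color channels, with the labels stored as row vectors: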

train_set_x_orig shape: (209, 64, 64, 3)
train_set_y_orig shape: (1, 209)
test_set_x_orig shape: (50, 64, 64, 3)
test_set_y_orig shape: (1, 50)

Loading the data:
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
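The dataset sizes used in later steps can be read directly off the array shapes. The sketch below uses zero-filled placeholder arrays with the shapes of the training and test images instead of the real h5 file, so it only illustrates the indexing; the names `m_train`, `m_test`, and `num_px` are my own labels.

```python
import numpy as np

# Placeholder arrays with the same shapes as the real dataset (assumption:
# the actual pixel values are irrelevant for this shape bookkeeping).
train_set_x_orig = np.zeros((209, 64, 64, 3), dtype=np.uint8)
test_set_x_orig = np.zeros((50, 64, 64, 3), dtype=np.uint8)

m_train = train_set_x_orig.shape[0]   # number of training examples
m_test = test_set_x_orig.shape[0]     # number of test examples
num_px = train_set_x_orig.shape[1]    # height (= width) of each image

print(m_train, m_test, num_px)        # 209 50 64
print(num_px * num_px * 3)            # 12288 values per image
```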

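2. Data processing

Data processing has two parts:

Vectorization: each image is a three-dimensional array of shape (num_px, num_px, 3) and is reshaped into a column vector of shape (num_px * num_px * 3, 1). Stacking the columns gives a training matrix of shape (12288, 209).
Standardization: pixel values range from 0 to 255, so dividing by 255 rescales them to the range 0 to 1.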

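# Vectorization: flatten each image into one column, giving shape (num_px * num_px * 3, m)
train_set_x_orig_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_orig_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
# Standardization: rescale pixel values from 0-255 to 0-1
train_set_x = train_set_x_orig_flatten / 255
test_set_x = test_set_x_orig_flatten / 255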
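To see what the reshape does, here is a small check on a toy batch of two 2 × 2 RGB images. The toy array is an assumption made for illustration; the same call applies to the real (209, 64, 64, 3) data.

```python
import numpy as np

# Toy batch: 2 images, each 2 x 2 pixels with 3 channels.
images = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

# Flatten each image to a row, then transpose so each image is a column.
flat = images.reshape(images.shape[0], -1).T
scaled = flat / 255

print(flat.shape)                                     # (12, 2)
print(np.array_equal(flat[:, 0], images[0].ravel()))  # True
```

Reshaping straight to `(-1, m)` without the transpose would produce the same shape but would mix pixels from different images within a column, which is why the flatten-then-transpose form is used.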

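3. Parameter initialization

Initialize w as a vector of zeros and b as 0. Pay attention to the dimension of w: it must match the number of features in the training data, that is, dim = num_px * num_px * 3, so that w has shape (dim, 1).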

import numpy as np

def initialize_with_zeros(dim):
    # w is a column vector with one weight per input feature; b is a scalar bias
    w = np.zeros((dim, 1))
    b = 0.0

    return w, b

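4. Implementing the logistic regression function

This is where forward propagation begins.

The logistic regression function consists of two parts:

Linear function: $z^{(i)} = w^T x^{(i)} + b$
Activation function: $\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)})$

First implement the sigmoid activation function: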

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s

Then compute the activations for all examples at once:
A = sigmoid(np.dot(w.T, X) + b)
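As a quick sanity check of the forward pass: with w and b initialized to zero, every z is 0, so every activation is exactly 0.5. The sketch below uses random placeholder data, and the sizes (4 features, 3 examples) are arbitrary.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


rng = np.random.default_rng(0)
X = rng.random((4, 3))        # 4 features, 3 examples (placeholder data)
w = np.zeros((4, 1))          # one weight per feature
b = 0.0

A = sigmoid(np.dot(w.T, X) + b)

print(A.shape)   # (1, 3): one activation per example
print(A)         # [[0.5 0.5 0.5]]
```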

5. Loss/cost function

The cost function is computed from the activations A and the labels Y:
$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$

The corresponding code is:
cost = (-1/m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
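Two properties of this formula are worth checking numerically. When every activation is 0.5, as happens at zero initialization, the cost equals ln 2 ≈ 0.693 regardless of the labels. And because log(0) is undefined, an activation of exactly 0 or 1 would make the cost infinite or NaN. The sketch below is not part of the original exercise: `compute_cost` and its `eps` clipping are my own additions.

```python
import numpy as np


def compute_cost(A, Y, eps=1e-12):
    """Cross-entropy cost; A is clipped to [eps, 1 - eps] to avoid log(0)."""
    m = Y.shape[1]
    A = np.clip(A, eps, 1 - eps)
    return float((-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)))


Y = np.array([[1, 0, 1, 0]])
A_half = np.full((1, 4), 0.5)

print(np.isclose(compute_cost(A_half, Y), np.log(2)))   # True
print(np.isfinite(compute_cost(Y.astype(float), Y)))    # True, thanks to clipping
```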

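6. Gradient descent

This part is the back propagation of the neural network and has two steps:

Compute the gradients dw and db of the parameters
Update the parameters w and b

dw and db are computed from the activations A, the training data X, and the labels Y:

$dw = \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T$
$db = \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})$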
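
For reference, both gradients follow from the chain rule applied to a single example. This is the standard derivation for a sigmoid output with the cross-entropy loss; $\mathcal{L}^{(i)}$ denotes the loss of example $i$, a symbol introduced only for this derivation.

$\mathcal{L}^{(i)} = -\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]$
$\frac{\partial \mathcal{L}^{(i)}}{\partial a^{(i)}} = -\frac{y^{(i)}}{a^{(i)}} + \frac{1-y^{(i)}}{1-a^{(i)}}, \qquad \frac{\partial a^{(i)}}{\partial z^{(i)}} = a^{(i)}(1-a^{(i)})$
$\frac{\partial \mathcal{L}^{(i)}}{\partial z^{(i)}} = \frac{\partial \mathcal{L}^{(i)}}{\partial a^{(i)}} \cdot \frac{\partial a^{(i)}}{\partial z^{(i)}} = a^{(i)} - y^{(i)}$
$\frac{\partial z^{(i)}}{\partial w} = x^{(i)}, \qquad \frac{\partial z^{(i)}}{\partial b} = 1$
$\frac{\partial J}{\partial w} = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}(a^{(i)}-y^{(i)}) = \frac{1}{m}X(A-Y)^T, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(a^{(i)}-y^{(i)})$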

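Forward and backward propagation are implemented together in one function. Its inputs are the parameters w and b and the training data X and Y; its outputs are the gradients and the value of the cost function.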

def propagate(w, b, X, Y):
    m = X.shape[1]

    # Forward propagation
    # Compute activation
    A = sigmoid(np.dot(w.T, X) + b)

    # Compute cost
    cost = (-1/m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward propagation
    dw = (1/m) * np.dot(X, (A - Y).T)
    db = (1/m) * np.sum(A - Y)

    cost = np.squeeze(cost)

    grads = {"dw": dw,
             "db": db}

    return grads, cost
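
A common way to gain confidence in gradient formulas like these is to compare them with finite differences. The sketch below is not part of the original exercise: it repeats `sigmoid` and a condensed `propagate` so that it runs on its own, and `numerical_grad_w`, the step size, and the random toy data are my own choices.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def propagate(w, b, X, Y):
    # Condensed copy of propagate(), repeated so this block is self-contained.
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = (1 / m) * np.dot(X, (A - Y).T)
    db = (1 / m) * np.sum(A - Y)
    return {"dw": dw, "db": db}, float(cost)


def numerical_grad_w(w, b, X, Y, h=1e-6):
    """Central-difference estimate of dJ/dw, one component at a time."""
    grad = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j, 0] = h
        cost_plus = propagate(w + step, b, X, Y)[1]
        cost_minus = propagate(w - step, b, X, Y)[1]
        grad[j, 0] = (cost_plus - cost_minus) / (2 * h)
    return grad


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                     # 3 features, 5 examples
Y = (rng.random((1, 5)) > 0.5).astype(float)    # random 0/1 labels
w = rng.normal(size=(3, 1))
b = 0.1

grads, _ = propagate(w, b, X, Y)
max_diff = np.max(np.abs(grads["dw"] - numerical_grad_w(w, b, X, Y)))
print(max_diff < 1e-6)   # True if the analytic and numerical gradients agree
```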

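Finally, the parameters are optimized by introducing a learning rate $\alpha$ and a number of iterations, applying the updates $w = w - \alpha\, dw$ and $b = b - \alpha\, db$ in each iteration.

The optimization function returns the parameters, the gradients, and the recorded costs.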

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    costs = []

    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)

        dw = grads["dw"]
        db = grads["db"]

        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the cost
        if i % 100 == 0:
            costs.append(cost)

        # Print cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" %(i, cost))


    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
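
To watch the update rule at work without the cat dataset, the loop can be run on a tiny, linearly separable toy problem. Everything in this sketch is an assumption made for illustration: the four one-dimensional data points, the learning rate of 0.5, and the 200 iterations. It inlines the forward and backward pass so that it runs on its own.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Toy data: one feature, four examples; the label is 1 when x > 0.
X = np.array([[-2.0, -1.0, 1.0, 2.0]])
Y = np.array([[0.0, 0.0, 1.0, 1.0]])
m = X.shape[1]

w = np.zeros((1, 1))
b = 0.0
learning_rate = 0.5
costs = []

for i in range(200):
    # Forward pass
    A = sigmoid(np.dot(w.T, X) + b)
    cost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    # Backward pass
    dw = (1 / m) * np.dot(X, (A - Y).T)
    db = (1 / m) * np.sum(A - Y)
    # Gradient descent update
    w = w - learning_rate * dw
    b = b - learning_rate * db
    # Record the cost every 50 iterations
    if i % 50 == 0:
        costs.append(float(cost))

print(costs[0] > costs[-1])   # True: the cost decreases as training proceeds
```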

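7. Prediction

Once the optimized parameters have been obtained, they can be used for prediction. The inputs are the parameters w and b and the data X; an example is assigned class 1 when its activation exceeds 0.5, and class 0 otherwise.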

def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    A = sigmoid(np.dot(w.T, X) + b)

    for i in range(A.shape[1]):
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0

    return Y_prediction
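
The thresholding loop can also be written as a single vectorized comparison. The sketch below is an alternative formulation, not the original code; `predict_vectorized` is a hypothetical name, and it is compared against a loop of the same form on random placeholder activations.

```python
import numpy as np


def predict_vectorized(A):
    """Map activations of shape (1, m) to 0/1 predictions with a 0.5 threshold."""
    return (A > 0.5).astype(float)


rng = np.random.default_rng(1)
A = rng.random((1, 6))            # placeholder activations in [0, 1)

# Loop-based thresholding, written the same way as in predict()
Y_loop = np.zeros((1, A.shape[1]))
for i in range(A.shape[1]):
    Y_loop[0, i] = 1 if A[0, i] > 0.5 else 0

print(np.array_equal(predict_vectorized(A), Y_loop))   # True
```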

Comparing the predictions with the labels gives the accuracy of the model:

Y_predict_test = predict(w, b, X_test)
print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_predict_test - Y_test)) * 100))
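
Because predictions and labels are both 0 or 1, the mean absolute difference is simply the fraction of misclassified examples, so this expression equals the percentage of matching entries. A small worked example with made-up predictions:

```python
import numpy as np

Y_true = np.array([[1, 0, 0, 1]])
Y_pred = np.array([[1.0, 0.0, 1.0, 1.0]])   # made-up predictions, one mistake

acc_abs = 100 - np.mean(np.abs(Y_pred - Y_true)) * 100   # mean-absolute-difference form
acc_eq = np.mean(Y_pred == Y_true) * 100                 # fraction of matching entries

print(acc_abs, acc_eq)   # 75.0 75.0
```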

Summary

Finally, combining the steps above gives the complete model.

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    # initialize parameters with zeros 
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent 
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    
    # Predict test/train set examples 
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

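One more step remains: feeding your own picture into the model. Only the predict() function is needed for this, but before the picture is passed to it, it has to be preprocessed (the exercise uses SciPy for this) into the format the algorithm expects. I will cover this part in a later update.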
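
Once a picture has been loaded and resized to num_px × num_px, the remaining preprocessing mirrors step 2. The sketch below assumes the picture is already available as a uint8 NumPy array of shape (num_px, num_px, 3); loading and resizing the file are left out, and `prepare_image` is a hypothetical helper rather than code from the course.

```python
import numpy as np


def prepare_image(image, num_px=64):
    """Flatten and rescale one RGB image into a (num_px * num_px * 3, 1) column."""
    image = np.asarray(image)
    if image.shape != (num_px, num_px, 3):
        raise ValueError(f"expected shape {(num_px, num_px, 3)}, got {image.shape}")
    return image.reshape(1, -1).T / 255


# Placeholder "image" filled with random pixel values.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

x = prepare_image(fake_image)
print(x.shape)   # (12288, 1)
```

The resulting column can then be passed to predict(w, b, x) together with the trained parameters.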

Source: CSDN

Author: puran1218

Link: https://blog.csdn.net/puran1218/article/details/104702763
