Deeplearning.ai Study Notes -- Basic Concepts

Corresponds to Neural Networks and Deep Learning, Week 2, Assignment 1.

Normalization

Many people without a math background are unsure what normalization is actually for. In short, a vector encodes both a direction and a length; after normalization the vector's length becomes 1 while its direction is unchanged, so the vector represents only the direction of the feature. One benefit is that gradient descent converges faster on normalized data.

By default, the norm here is the 2-norm, i.e. the square root of the sum of squares. Normalization turns $x$ into $\frac{x}{\| x\|}$.
For example, if $$x =
\begin{bmatrix}
0 & 3 & 4 \\
2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$ then $$\| x\| = \text{np.linalg.norm(x, axis = 1, keepdims = True)} = \begin{bmatrix}
5 \\
\sqrt{56} \\
\end{bmatrix}\tag{4} $$ and $$ x_{normalized} = \frac{x}{\| x\|} = \begin{bmatrix}
0 & \frac{3}{5} & \frac{4}{5} \\
\frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$
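To see equation (4) concretely, here is a minimal NumPy check (the variable names are just for illustration):

import numpy as np

x = np.array([[0., 3., 4.],
              [2., 6., 4.]])

# Row-wise 2-norms, kept as a column vector (keepdims=True) so broadcasting works later.
x_norm = np.linalg.norm(x, axis=1, keepdims=True)
print(x_norm)   # [[5.        ]
                #  [7.48331477]]   (sqrt(56) ≈ 7.4833)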

import numpy as np

# GRADED FUNCTION: normalizeRows
def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    Argument:
    x -- A numpy matrix of shape (n, m)
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm as the 2-norm of each row. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = np.linalg.norm(x, ord=2, axis=1, keepdims=True)
    # Divide x by its norm (the column of norms broadcasts across each row).
    x = x / x_norm
    ### END CODE HERE ###
    return x
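Calling normalizeRows on the matrix from equation (3) should reproduce equation (5); a quick check (assuming the import above):

x = np.array([[0., 3., 4.],
              [2., 6., 4.]])
print(normalizeRows(x))
# [[0.         0.6        0.8       ]
#  [0.26726124 0.80178373 0.53452248]]   i.e. 3/5, 4/5 and 2/√56, 6/√56, 4/√56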

Sigmoid

The sigmoid function is the first activation function to master in machine learning. It typically appears in the final layer of binary classification models.

  • The function: $sigmoid(x) = \frac{1}{1+e^{-x}}$ (a minimal NumPy sketch follows this list)

  • Applying sigmoid element-wise to a vector:

$$\text{For } x \in \mathbb{R}^n \text{, } sigmoid(x) = sigmoid\begin{pmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n \\
\end{pmatrix} = \begin{pmatrix}
\frac{1}{1+e^{-x_1}} \\
\frac{1}{1+e^{-x_2}} \\
\vdots \\
\frac{1}{1+e^{-x_n}} \\
\end{pmatrix}\tag{1} $$

  • The derivative of sigmoid:
    $$sigmoid_{derivative}(x) = \sigma'(x) = \sigma(x)(1 - \sigma(x)) = s(1-s)\tag{2}$$
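Here is a minimal NumPy sketch of the function itself, matching formula (1); the helper name sigmoid is mine, and it works element-wise on scalars and arrays alike:

import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)), applied element-wise
    return 1 / (1 + np.exp(-x))

print(sigmoid(np.array([1, 2, 3])))   # ≈ [0.73105858 0.88079708 0.95257413]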

By the chain rule, computing the derivative of the cost ultimately requires the derivative of the activation function.

# GRADED FUNCTION: sigmoid_derivative
def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.
    Arguments:
    x -- A scalar or numpy array
    Return:
    ds -- Your computed gradient.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    # s = sigmoid(x)
    s = 1 / (1 + np.exp(-x))
    # Derivative of sigmoid: s * (1 - s), see equation (2).
    ds = s * (1 - s)
    ### END CODE HERE ###
    return ds
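A quick sanity check (using np and sigmoid_derivative from above): at x = 0, σ(0) = 0.5, so by equation (2) the derivative is 0.5 × 0.5 = 0.25.

print(sigmoid_derivative(0))                    # 0.25
print(sigmoid_derivative(np.array([1, 2, 3])))  # ≈ [0.19661193 0.10499359 0.04517666]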

Softmax

Softmax (the normalized exponential function) maps all values into (0, 1) so that they sum to 1. Softmax is a generalization of the logistic regression sigmoid from binary to multi-class classification.

  • $ \text{for } x \in \mathbb{R}^{1\times n} \text{, } softmax(x) = softmax(\begin{bmatrix}
    x_1 &
    x_2 &
    \dots &
    x_n
    \end{bmatrix}) = \begin{bmatrix}
    \frac{e^{x_1}}{\sum_{j}e^{x_j}} &
    \frac{e^{x_2}}{\sum_{j}e^{x_j}} &
    \dots &
    \frac{e^{x_n}}{\sum_{j}e^{x_j}}
    \end{bmatrix} $

  • $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{, $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$ $$softmax(x) = softmax\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn}
    \end{bmatrix} = \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
    \end{bmatrix} = \begin{pmatrix}
    softmax\text{(first row of x)} \\
    softmax\text{(second row of x)} \\
    \vdots \\
    softmax\text{(last row of x)} \\
    \end{pmatrix} $$

# GRADED FUNCTION: softmax
def softmax(x):
    """Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (n, m).
    Argument:
    x -- A numpy matrix of shape (n, m)
    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n, m)
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp / x_sum
    ### END CODE HERE ###
    return s
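For example (the input values are arbitrary; np as above), every row of the output sums to 1:

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])
s = softmax(x)
print(np.sum(s, axis=1))   # [1. 1.]  -- each row is a probability distribution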

L1 and L2 Loss (Loss Functions)

When plugged into logistic regression, these losses lead to a non-convex optimization problem, which is why they are not commonly used there in practice.

  • L1 loss is defined as:
    $$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^m|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}$$
# GRADED FUNCTION: L1
def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    Returns:
    loss -- the value of the L1 loss function defined above
    """
    ### START CODE HERE ### (≈ 1 line of code)
    # Sum of absolute differences, see equation (6).
    loss = np.sum(np.abs(y - yhat))
    ### END CODE HERE ###
    return loss
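A small example with hypothetical predictions (np as above):

yhat = np.array([.9, .2, .1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print(L1(yhat, y))   # 0.1 + 0.2 + 0.1 + 0.6 + 0.1 = 1.1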
  • L2 loss is defined as $$\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^m(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}$$
# GRADED FUNCTION: L2
def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    Returns:
    loss -- the value of the L2 loss function defined above
    """
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.dot(y - yhat, y - yhat)  # equivalently np.sum((y - yhat) * (y - yhat)); see the note below
    ### END CODE HERE ###
    return loss
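Using the same hypothetical vectors as in the L1 example above:

print(L2(yhat, y))   # 0.01 + 0.04 + 0.01 + 0.36 + 0.01 = 0.43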

Note on np.dot(A, B): for two 2-D arrays it computes the true matrix product, exactly as defined in linear algebra; for two 1-D arrays it computes their inner product.
Element-wise product: np.multiply(), or the * operator.
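A small illustration of the difference (arbitrary 1-D arrays):

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))        # 32 -> inner product: 1*4 + 2*5 + 3*6
print(np.multiply(a, b))   # [ 4 10 18] -> element-wise, same as a * b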