L2 normalization and L1/L2 regularization

L2 Normalization

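This second kind of normalization rescales each sample to unit norm, so that every sample has a norm of 1. The main variants are L1 normalization (using the L1 norm) and L2 normalization (using the L2 norm).

The core idea is to compute the p-norm of each sample and then divide every element of that sample by the norm. After this step, the p-norm of each processed sample (for example the l1-norm or l2-norm) equals 1.
The p-norm is computed as:
$||X||_p=(|x_1|^p+|x_2|^p+...+|x_n|^p)^{1/p}$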
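The formula above can be sketched in a few lines of plain Python. The helper names `p_norm` and `normalize` are made up for this illustration and are not part of any library; the zero-vector handling is also just one possible choice.

```python
def p_norm(sample, p):
    """Return the p-norm of a sequence of numbers (p >= 1)."""
    return sum(abs(v) ** p for v in sample) ** (1.0 / p)


def normalize(sample, p=2):
    """Divide every element by the sample's p-norm.

    A zero vector has no direction to preserve, so it is returned unchanged.
    """
    norm = p_norm(sample, p)
    if norm == 0:
        return list(sample)
    return [v / norm for v in sample]


sample = [3.0, -4.0]
l1_scaled = normalize(sample, p=1)   # divides by |3| + |-4| = 7
l2_scaled = normalize(sample, p=2)   # divides by sqrt(9 + 16) = 5
print(l1_scaled, p_norm(l1_scaled, 1))
print(l2_scaled, p_norm(l2_scaled, 2))
```

In both cases the norm of the rescaled sample comes out as 1 (up to floating-point rounding), which is exactly the property described above.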
The TensorFlow function that implements this for the L2 norm is:

tf.nn.l2_normalize(x, 
                  dim, 
                  epsilon=1e-12, 
                  name=None)

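In the signature above:
x is the input tensor;
dim is the dimension along which to L2-normalize; it can be 0, 1, or [0, 1] (later TensorFlow releases name this argument axis and deprecate dim);
epsilon is a lower bound on the squared norm: the divisor is sqrt(max(sum(x**2), epsilon)), which avoids division by zero.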
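To make the role of epsilon concrete, here is a minimal pure-Python sketch for a single vector. It assumes the documented behaviour `x / sqrt(max(sum(x**2), epsilon))`; the function name `l2_normalize_1d` is hypothetical and this is not TensorFlow's actual implementation.

```python
import math


def l2_normalize_1d(x, epsilon=1e-12):
    """Sketch of x / sqrt(max(sum(x**2), epsilon)) for one vector."""
    squared_sum = sum(v * v for v in x)
    # epsilon only matters when the squared norm is smaller than it
    divisor = math.sqrt(max(squared_sum, epsilon))
    return [v / divisor for v in x]


normal_case = l2_normalize_1d([3.0, 4.0])   # squared norm 25 is far above epsilon
zero_case = l2_normalize_1d([0.0, 0.0])     # divisor falls back to sqrt(epsilon)
print(normal_case)
print(zero_case)
```

For ordinary inputs epsilon has no effect; for an all-zero vector it keeps the result finite instead of raising a division-by-zero error.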
Here is an example:

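# -*- coding: utf-8 -*-
# TensorFlow 1.x style: graph mode evaluated inside a tf.Session
import tensorflow as tf

input_data = tf.constant([[1.0, 2, 3], [4.0, 5, 6], [7.0, 8, 9]])
# dim=0: each column is divided by its own L2 norm
output_1 = tf.nn.l2_normalize(input_data, dim=0, epsilon=1e-10, name='nn_l2_norm')
# dim=1: each row is divided by its own L2 norm
output_2 = tf.nn.l2_normalize(input_data, dim=1, epsilon=1e-10, name='nn_l2_norm')
# dim=[0, 1]: the whole matrix is divided by a single L2 norm
output_3 = tf.nn.l2_normalize(input_data, dim=[0, 1], epsilon=1e-10, name='nn_l2_norm')

with tf.Session() as sess:
    print(output_1.eval())
    print(output_2.eval())
    print(output_3.eval())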

'''output:
  [[0.12309149 0.20739034 0.26726127]
 [0.49236596 0.51847583 0.53452253]
 [0.86164045 0.82956135 0.80178374]]

 
[[0.26726124 0.5345225  0.8017837 ]
 [0.45584232 0.5698029  0.6837635 ]
 [0.5025707  0.5743665  0.64616233]]

 
[[0.05923489 0.11846977 0.17770466]
 [0.23693955 0.29617444 0.35540932]
 [0.4146442  0.4738791  0.53311396]]
'''

dim=0 applies L2 normalization column by column:
$norm(1) = \sqrt{1^2+4^2+7^2}=\sqrt{66}$
$norm(2) = \sqrt{2^2+5^2+8^2}=\sqrt{93}$
$norm(3) = \sqrt{3^2+6^2+9^2}=\sqrt{126}$

[[1./norm(1), 2./norm(2), 3./norm(3)]
 [4./norm(1), 5./norm(2), 6./norm(3)]
 [7./norm(1), 8./norm(2), 9./norm(3)]]
=
[[0.12309149 0.20739034 0.26726127]
 [0.49236596 0.51847583 0.53452253]
 [0.86164045 0.82956135 0.80178374]]

dim=1 applies L2 normalization row by row:
$norm(1) = \sqrt{1^2+2^2+3^2}=\sqrt{14}$
$norm(2) = \sqrt{4^2+5^2+6^2}=\sqrt{77}$
$norm(3) = \sqrt{7^2+8^2+9^2}=\sqrt{194}$

[[1./norm(1), 2./norm(1), 3./norm(1)]
 [4./norm(2), 5./norm(2), 6./norm(2)]
 [7./norm(3), 8./norm(3), 9./norm(3)]]
=
[[0.26726124 0.5345225  0.8017837 ]
 [0.45584232 0.5698029  0.6837635 ]
 [0.5025707  0.5743665  0.64616233]]

dim=[0, 1] applies L2 normalization over rows and columns together, so the whole matrix is divided by a single norm:

$norm=\sqrt{1^2+2^2+3^2+4^2+5^2+6^2+7^2+8^2+9^2}=\sqrt{285}\approx 16.8819$

[[1./norm, 2./norm, 3./norm]
 [4./norm, 5./norm, 6./norm]
 [7./norm, 8./norm, 9./norm]]
=
[[0.05923489 0.11846977 0.17770466]
 [0.23693955 0.29617444 0.35540932]
 [0.4146442  0.4738791  0.53311396]]
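The three cases can be checked by hand without TensorFlow. The following is a small pure-Python sketch that recomputes the same quantities for the 3x3 example; the variable names are chosen only for this illustration, and the results are in double precision, so the last digits may differ slightly from the float32 output printed by TensorFlow.

```python
import math

data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]

# dim=0: one norm per column, each entry divided by its column's norm
col_norms = [math.sqrt(sum(row[j] ** 2 for row in data)) for j in range(3)]
by_column = [[row[j] / col_norms[j] for j in range(3)] for row in data]

# dim=1: one norm per row, each entry divided by its row's norm
row_norms = [math.sqrt(sum(v ** 2 for v in row)) for row in data]
by_row = [[v / row_norms[i] for v in row] for i, row in enumerate(data)]

# dim=[0, 1]: a single norm over all nine entries
total_norm = math.sqrt(sum(v ** 2 for row in data for v in row))
by_all = [[v / total_norm for v in row] for row in data]

for name, result in (("dim=0", by_column), ("dim=1", by_row), ("dim=[0, 1]", by_all)):
    print(name)
    for row in result:
        print(["%.8f" % v for v in row])
```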

L1 and L2 regularization

L1 regularization

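Linear models are commonly used for regression and classification. To keep a model from overfitting, L1 and L2 regularization are used to reduce its complexity. Many articles on regularized linear regression point out that L1 lowers complexity by making the parameters sparse (reducing the number of nonzero parameters), while L2 lowers it by shrinking the magnitude of the parameter values.
The loss function with L1 regularization is: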
$L(w)=E_D(w)+\frac{\lambda}{n}\sum_i^n|w_i|$
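In the expression above, $E_D(w)$ is the unregularized loss and $L$ is the loss with the regularization term added.
The gradient of $L(w)$ is:
$\frac{\partial L(w)}{\partial w}=\frac{\partial E_D(w)}{\partial w}+\frac{\partial}{\partial w}\left(\frac{\lambda}{n}\sum_i^n|w_i|\right)$
The weight update, with learning rate $\eta$, is:
$w'=w-\eta\left(\frac{\partial E_D(w)}{\partial w}+\frac{\partial}{\partial w}\left(\frac{\lambda}{n}\sum_i^n|w_i|\right)\right)$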

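Now suppose every w is greater than 0. Then $|w_i|=w_i$, the derivative of the penalty with respect to each weight is $\frac{\lambda}{n}$, and the update becomes:
$w'=w-\eta\frac{\partial E_D(w)}{\partial w}-\eta\frac{\lambda}{n}$

Because w > 0, subtracting the constant $\eta\frac{\lambda}{n}$ at every step pushes w toward 0. If instead w < 0, the penalty term changes sign and the same effect occurs. In other words, when w is positive the update makes it smaller, and when w is negative the update makes it larger. L1 regularization therefore tends to drive parameters to 0, which makes the features sparse.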
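The effect of the penalty term alone can be seen in a tiny numerical sketch. It follows the L1 loss given earlier, with its $\frac{\lambda}{n}$ factor, and uses sign(w) as the derivative of |w|. The function names and the values of eta, lam, and n are illustrative assumptions, and the data gradient is set to 0 so that only the regularization term acts.

```python
def sign(value):
    """Subgradient of |value|: +1, -1, or 0 at exactly zero."""
    return (value > 0) - (value < 0)


def l1_step(w, data_grad, eta, lam, n):
    """One update w' = w - eta * (dE_D/dw + (lam / n) * sign(w))."""
    return w - eta * (data_grad + (lam / n) * sign(w))


eta, lam, n = 0.1, 1.0, 1       # illustrative values only
w_pos, w_neg = 0.5, -0.5
for _ in range(3):
    # data_grad = 0 isolates the effect of the L1 penalty
    w_pos = l1_step(w_pos, 0.0, eta, lam, n)
    w_neg = l1_step(w_neg, 0.0, eta, lam, n)
print(w_pos, w_neg)
```

Both weights move toward 0 by the same fixed amount eta * lam / n per step, regardless of how large or small they currently are. That constant-size pull is what lets L1 push weights all the way to zero.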

L2 regularization

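The loss function with L2 regularization, written in the same notation as the L1 case and with the conventional factor of $\frac{1}{2}$ that cancels on differentiation, is:
$L(w)=E_D(w)+\frac{\lambda}{2n}\sum_i^n w_i^2$
The weight update is then:
$w'=w-\eta\frac{\partial E_D(w)}{\partial w}-\eta\frac{\lambda}{n}w$
Compared with the update that has no regularization term, this update contains the extra term $\eta\frac{\lambda}{n}w$. As w approaches 0 this term also shrinks, so the parameter decreases very slowly. L2 regularization therefore shrinks the parameters into a small range without making them exactly 0.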
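A short sketch makes this slow decay visible. It assumes the common form of the L2 penalty, $\frac{\lambda}{2n}\sum_i w_i^2$, whose derivative with respect to a weight is $\frac{\lambda}{n}w$. The function name and the values of eta, lam, and n are illustrative assumptions, and the data gradient is set to 0 so that only the regularization term acts.

```python
def l2_step(w, data_grad, eta, lam, n):
    """One update w' = w - eta * (dE_D/dw + (lam / n) * w)."""
    return w - eta * (data_grad + (lam / n) * w)


eta, lam, n = 0.1, 1.0, 1       # illustrative values only
w = 0.5
history = [w]
for _ in range(50):
    # data_grad = 0 isolates the effect of the L2 penalty
    w = l2_step(w, 0.0, eta, lam, n)
    history.append(w)
print(history[1], history[10], history[50])
```

With these values each step multiplies w by 1 - eta * lam / n = 0.9, so the weight keeps getting smaller but the size of each step shrinks along with it, and w stays strictly positive. This contrasts with the fixed-size step of the L1 penalty.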

Source: https://blog.csdn.net/qq_39068872/article/details/100977808
