# Multilayer Perceptron - MLP

2022-08-06 18:08:46

# MLP

The multilayer perceptron (MLP) is a relatively simple neural network model and the usual starting point for learning neural networks.

## Neuron

The basic unit of a neural network is the neuron, commonly called a node or unit. As shown in the figure above, each unit receives input from other units or from an external source and computes its output according to its weights.

## Structure

The MLP structure is shown in the figure above (from *Dive into Deep Learning* by Mu Li). It consists of three parts:

• Input layer: accepts the data, performs no computation, and passes the data on to the next layer.
• Hidden layer: its nodes compute on the information passed from the input layer and forward the result to the output layer. There is at least one hidden layer.
• Output layer: performs the final computation and produces the result.

This model can be seen as an upgraded version of linear and logistic regression; it can solve a wider range of regression problems.

The overall model is connected together like an intricate network; each node is equivalent to a linear regression function with its own weights and bias.
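The per-node computation described above can be sketched in numpy. All shapes and values here are illustrative assumptions, not from the text; the point is only that each layer is a batch of "linear regression" nodes (no activation yet):

```python
import numpy as np

def layer(x, w, b):
    """Each column of w is one node's weights: a linear function of the inputs."""
    return x @ w + b

# Hypothetical 2-layer MLP: 3 inputs -> 4 hidden nodes -> 1 output node
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(size=(3, 4)), np.zeros(4)
W_o, b_o = rng.normal(size=(4, 1)), np.zeros(1)

x = np.array([1.0, 2.0, 3.0])
hidden = layer(x, W_h, b_h)        # 4 hidden-node outputs
output = layer(hidden, W_o, b_o)   # 1 output-node result
```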

## Backpropagation

Compute the error at the output nodes, propagate the error backward from the output layer to the input layer, and update the weights of each layer along the way.
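As a minimal sketch of that error-propagation loop, here is one hand-written backward pass for a single-hidden-layer network with a squared-error loss. All data and parameter values are invented for illustration, and bias gradients are omitted for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy sample, target, and parameters (illustrative values)
x = np.array([[0.5, -1.0]])
y = np.array([[1.0]])
W_h, b_h = np.array([[0.1, 0.4], [-0.3, 0.2]]), np.zeros((1, 2))
W_o, b_o = np.array([[0.7], [-0.5]]), np.zeros((1, 1))

# Forward pass
H = sigmoid(x @ W_h + b_h)
O = H @ W_o + b_o
loss = 0.5 * np.sum((O - y) ** 2)

# Backward pass: push the output error back layer by layer
dO = O - y                  # error at the output nodes
dW_o = H.T @ dO             # gradient for the output-layer weights
dH = dO @ W_o.T             # error propagated to the hidden layer
dZ = dH * H * (1 - H)       # through the sigmoid's derivative
dW_h = x.T @ dZ             # gradient for the hidden-layer weights

# Gradient-descent update of each layer's weights
lr = 0.1
W_o -= lr * dW_o
W_h -= lr * dW_h
```

After the update, running the forward pass again gives a lower loss on this sample.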

## Activation Function

Why must the hidden layer have an activation function?

First, an example to illustrate:

Suppose we build a multilayer perceptron with a single hidden layer. Let the input be $X$, the hidden-layer output be $H$ with weights $W_{h}$ and bias $b_{h}$, and the output layer be $O$ with weights $W_{o}$ and bias $b_{o}$.

Although the hidden units are not directly observed, the computation can be written in matrix form:

$$\begin{aligned} \boldsymbol{H} &=\boldsymbol{X} \boldsymbol{W}_{h}+\boldsymbol{b}_{h} \\ \boldsymbol{O} &=\boldsymbol{H} \boldsymbol{W}_{o}+\boldsymbol{b}_{o} \end{aligned}$$

Combining the two formulas gives:

$\boldsymbol{O}=\left(\boldsymbol{X} \boldsymbol{W}_{h}+\boldsymbol{b}_{h}\right) \boldsymbol{W}_{o}+\boldsymbol{b}_{o}=\boldsymbol{X} \boldsymbol{W}_{h} \boldsymbol{W}_{o}+\boldsymbol{b}_{h} \boldsymbol{W}_{o}+\boldsymbol{b}_{o}$

So although there is a hidden layer, the model is still a linear regression model, just with more complicated weights; adding more hidden layers does not change this, because the composition of linear maps is itself linear.
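This collapse can be checked numerically. The sketch below (with arbitrary random shapes) shows that two stacked linear layers equal one linear layer with combined weight $W_h W_o$ and bias $b_h W_o + b_o$, exactly as in the formula above:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
W_h, b_h = rng.normal(size=(3, 4)), rng.normal(size=4)
W_o, b_o = rng.normal(size=(4, 2)), rng.normal(size=2)

# Two stacked linear layers with no activation in between...
O = (X @ W_h + b_h) @ W_o + b_o

# ...are exactly one linear layer with combined weights and bias
W = W_h @ W_o
b = b_h @ W_o + b_o
O_combined = X @ W + b
```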

This raises the question of why hidden layers are needed at all.

Why use an activation function?

• As the formula above shows, without an activation function each layer's output is a linear function of the previous layer's input; no matter how many layers the network has, the output is a linear combination of the input.
• An activation function introduces nonlinearity into the neurons, allowing the neural network to approximate arbitrary nonlinear functions, so it can be applied to many more nonlinear models.

1. Sigmoid

Advantages:

• The sigmoid maps its output into (0, 1); it is monotone and continuous with a bounded output range, which makes optimization stable, so it can be used in the output layer.
• Its derivative is easy to compute.

Drawbacks:

• Due to its soft saturation, it easily causes vanishing gradients, which leads to training problems.
• Its output is not zero-centered.

$f(x)=\frac{1}{1+e^{-x}}$

2. tanh

Advantages:

• Converges faster than the sigmoid.
• Unlike the sigmoid, its output is zero-centered.

Drawbacks:

• It still has the sigmoid's biggest problem: vanishing gradients caused by saturation.

$\tanh (x)=\frac{1-e^{-2 x}}{1+e^{-2 x}}$

3. ReLU

Advantages:

• Compared with sigmoid and tanh, ReLU converges quickly under SGD.
• Sigmoid and tanh involve many expensive operations (e.g., exponentials); ReLU is much simpler to implement.
• It effectively alleviates the vanishing-gradient problem.
• It performs well even without unsupervised pre-training.
• It gives the neural network the ability to form sparse representations.

Drawbacks:

• As training proceeds, neurons may die and their weights can no longer be updated. Once this happens, the gradient flowing through the neuron is zero from that point on; that is, the ReLU neuron has irreversibly died during training.

$y=\left\{\begin{array}{ll} 0 & (x \leq 0) \\ x & (x>0) \end{array}\right.$

4. Mish

$f(x)=x \cdot \tanh(\operatorname{softplus}(x))$
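The four activation functions above can be written directly from their formulas. A small numpy sketch (straightforward transcriptions, not optimized or numerically hardened implementations):

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^-x), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = (1 - e^-2x) / (1 + e^-2x), output in (-1, 1), zero-centered
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

def relu(x):
    # y = 0 for x <= 0, y = x for x > 0
    return np.maximum(0.0, x)

def softplus(x):
    # smooth approximation of ReLU: log(1 + e^x)
    return np.log1p(np.exp(x))

def mish(x):
    # f(x) = x * tanh(softplus(x))
    return x * np.tanh(softplus(x))
```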

## Summary

• Prefer common, proven MLP designs.
• Start with a single hidden layer and increase the number of hidden layers from there to find the best configuration.