
Multilayer Perceptron - MLP

2022-08-06 18:08:46 · Dumpling adults



The MLP is a relatively simple neural network model, and it is the basic model for getting started with neural networks.


The basic unit of a neural network is the neuron, commonly called a node or unit. As shown in the figure above, each unit receives input from other units or from external sources and computes its output according to the weights.


Here x1 and x2 are the inputs to this node, with weights w1 and w2 respectively. There is also a bias input b (bias). The function f is called the activation function; it is usually nonlinear, and its primary role is to enable the unit to learn nonlinear functions.
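The single unit described above can be sketched in a few lines. This is a minimal illustration; the sigmoid used for f and the concrete input values are arbitrary assumptions, not part of the original figure:

```python
import math

def neuron(x1, x2, w1, w2, b):
    """One unit: weighted sum of inputs plus bias, passed through an activation f."""
    z = w1 * x1 + w2 * x2 + b
    return 1.0 / (1.0 + math.exp(-z))  # f = sigmoid, as an example

print(neuron(1.0, 2.0, 0.5, -0.3, 0.1))  # z = 0, so the output is 0.5
```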



The structure of an MLP is shown above (from Li Mu's Dive into Deep Learning). It consists of three parts:

  • Input layer: receives the data, performs no computation, and passes the data on to the next layer.
  • Hidden layer: its nodes compute on the information passed from the input layer and pass the result to the output layer. There is at least one hidden layer.
  • Output layer: performs the final computation and produces the result.
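The three layers above can be sketched as a forward pass in NumPy. This is a minimal illustration, not a trained model; all layer sizes and the ReLU choice for the hidden layer are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))        # input layer: 4 samples, 3 features
W_h = rng.normal(size=(3, 5))      # weights into a hidden layer of 5 units
b_h = np.zeros(5)
W_o = rng.normal(size=(5, 2))      # weights into an output layer of 2 units
b_o = np.zeros(2)

H = np.maximum(0, X @ W_h + b_h)   # hidden layer: linear map + ReLU activation
O = H @ W_o + b_o                  # output layer: plain linear map

print(O.shape)  # (4, 2): one output vector per input sample
```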

This model can be seen as an upgraded version of linear regression and logistic regression: it can solve a wider range of regression problems.

The overall model is connected together like an intricate network, where each node is equivalent to a linear regression function with its own weights and bias.


In fact, the backpropagation process of an MLP is the same as that of linear regression and logistic regression; the model just has more parameters and can no longer be written down directly as a single function expression.


Here 1 denotes the bias input, and w4, w5, and w6 are the weights.

Compute the error at the output nodes, backpropagate the error from the output layer toward the input layer, and update the weights of each layer.
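A minimal from-scratch sketch of this forward/backward loop, assuming a single ReLU hidden layer, a mean-squared-error loss, and plain gradient descent. All sizes, data, and hyperparameters are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))            # 8 samples, 3 features
y = rng.normal(size=(8, 1))            # regression targets
W_h, b_h = rng.normal(size=(3, 4)), np.zeros(4)
W_o, b_o = rng.normal(size=(4, 1)), np.zeros(1)
lr, losses = 0.01, []

for _ in range(200):
    # forward pass
    Z = X @ W_h + b_h
    H = np.maximum(0, Z)               # hidden layer with ReLU activation
    O = H @ W_o + b_o
    losses.append(((O - y) ** 2).mean())
    # backward pass: propagate the output error layer by layer
    dO = 2 * (O - y) / len(X)          # gradient of the MSE loss w.r.t. O
    dW_o, db_o = H.T @ dO, dO.sum(axis=0)
    dH = dO @ W_o.T
    dZ = dH * (Z > 0)                  # ReLU gradient: 1 where Z > 0, else 0
    dW_h, db_h = X.T @ dZ, dZ.sum(axis=0)
    # gradient-descent updates for every layer's weights
    W_o -= lr * dW_o; b_o -= lr * db_o
    W_h -= lr * dW_h; b_h -= lr * db_h

print(losses[0], losses[-1])           # the loss decreases over training
```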


Why must the hidden layer have an activation function?

Let's start with an example:

Suppose we build a multilayer perceptron with a single hidden layer. Let the hidden-layer output be $\boldsymbol{H}$, with weight matrix $\boldsymbol{W}_{h}$ and bias $\boldsymbol{b}_{h}$; let the output layer be $\boldsymbol{O}$ and the input be $\boldsymbol{X}$.

Although the individual hidden units are not specified, the computation between the layers can be expressed with matrices:

$$\begin{aligned} \boldsymbol{H} &=\boldsymbol{X} \boldsymbol{W}_{h}+\boldsymbol{b}_{h} \\ \boldsymbol{O} &=\boldsymbol{H} \boldsymbol{W}_{o}+\boldsymbol{b}_{o} \end{aligned}$$

Substituting the first formula into the second gives:

$$\boldsymbol{O}=\left(\boldsymbol{X} \boldsymbol{W}_{h}+\boldsymbol{b}_{h}\right) \boldsymbol{W}_{o}+\boldsymbol{b}_{o}=\boldsymbol{X} \boldsymbol{W}_{h} \boldsymbol{W}_{o}+\boldsymbol{b}_{h} \boldsymbol{W}_{o}+\boldsymbol{b}_{o}$$

So although there is a hidden layer, this is still a linear regression model; the weights have just become a little more complicated. Even adding more hidden layers still yields a linear regression model.
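This collapse is easy to verify numerically. A small sketch, where all shapes and the random data are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W_h, b_h = rng.normal(size=(3, 4)), rng.normal(size=4)
W_o, b_o = rng.normal(size=(4, 2)), rng.normal(size=2)

# two stacked linear layers with no activation in between
O_two_layers = (X @ W_h + b_h) @ W_o + b_o

# the equivalent single linear layer: W = W_h W_o, b = b_h W_o + b_o
W, b = W_h @ W_o, b_h @ W_o + b_o
O_one_layer = X @ W + b

print(np.allclose(O_two_layers, O_one_layer))  # True
```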

This leads to the question of why hidden layers are needed.

The meaning of the hidden layer is to take the features of the input data and map them into another dimensional space, exposing more abstract features that are easier to separate linearly.

Put simply: we take the information from the input layer, make a linear division, pass the information further down, and repeat the linear-division process, continuously processing the information step by step to obtain the result.

Why do you need to use an activation function?

  • As the formulas above show, without an activation function each layer's output is a linear function of the previous layer's input, so no matter how many layers the network has, the output is just a linear combination of the input;
  • With an activation function, nonlinearity is introduced into the neurons, so the neural network can approximate any nonlinear function arbitrarily well and can therefore be applied to many more nonlinear models.


  1. The sigmoid function


Advantages:

  • The output of the sigmoid function maps to (0, 1); it is monotonic and continuous with a bounded output range, optimization is stable, and it can be used in the output layer.
  • Its derivative is easy to compute.


Disadvantages:

  • Because of its soft saturation, it is prone to vanishing gradients, which causes problems during training.
  • Its output is not zero-centered.
$$f(x)=\frac{1}{1+e^{-x}}$$
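A small sketch of the sigmoid and its derivative, showing both the easy derivative and the saturation that causes vanishing gradients:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # the derivative is simply f(x) * (1 - f(x))

print(sigmoid(0.0))        # 0.5
print(sigmoid_grad(0.0))   # 0.25, the maximum of the gradient
print(sigmoid_grad(10.0))  # ~4.5e-5: saturated, so the gradient vanishes
```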


  2. The tanh function


Advantages:

  • It converges faster than the sigmoid function.
  • Unlike the sigmoid function, its output is zero-centered.


Disadvantages:

  • It still does not fix the sigmoid function's biggest problem: vanishing gradients caused by saturation.
$$\tanh (x)=\frac{1-e^{-2 x}}{1+e^{-2 x}}$$
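A quick check that the formula above matches the standard tanh and that the output is zero-centered (the test values are arbitrary):

```python
import math

def tanh(x):
    # written out from the formula above; equivalent to math.tanh
    return (1 - math.exp(-2 * x)) / (1 + math.exp(-2 * x))

print(tanh(0.0))              # 0.0: the output is centered at zero
print(tanh(-2.0), tanh(2.0))  # symmetric around the origin
```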


  3. ReLU


Advantages:

  • Compared with sigmoid and tanh, ReLU converges quickly under SGD.
  • Sigmoid and tanh involve many expensive operations (such as exponentials), while ReLU is much simpler to implement.
  • It effectively alleviates the vanishing-gradient problem.
  • It performs well even without unsupervised pre-training.
  • It gives the neural network a sparse representation.


Disadvantages:

  • As training proceeds, neurons may die, leaving weights that can no longer be updated. When this happens, the gradient flowing through the neuron will be zero forever from that point on; in other words, the ReLU neuron has died irreversibly during training.
$$y=\left\{\begin{array}{ll} 0 & (x \leq 0) \\ x & (x>0) \end{array}\right.$$
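The piecewise definition and its gradient in a few lines; the zero gradient for x ≤ 0 is exactly why a dead ReLU neuron can never recover:

```python
def relu(x):
    return x if x > 0 else 0.0

def relu_grad(x):
    # zero gradient for x <= 0: a "dead" unit receives no weight updates
    return 1.0 if x > 0 else 0.0

print(relu(3.0), relu(-3.0))            # 3.0 0.0
print(relu_grad(3.0), relu_grad(-3.0))  # 1.0 0.0
```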


  4. Mish

Comparing the outputs of ReLU, Swish, and Mish in the figure above, you can see that Mish appears smoother than ReLU and Swish.


$$f(x)=x \cdot \tanh (\operatorname{softplus}(x))$$
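A minimal sketch of Mish from the formula above, using softplus(x) = ln(1 + e^x):

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

def mish(x):
    return x * math.tanh(softplus(x))

print(mish(0.0))   # 0.0
print(mish(-5.0))  # a small negative value: smoother than ReLU's hard zero
```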



Overall, the MLP is really not difficult, but as an entry-level neural network model it is mostly something to understand. As for how to design the hidden layers and choose their number, I don't have a reliable method; I can only recommend:

  • Start from the designs of common, existing MLP models.
  • Try the number of hidden layers starting from a single one, and keep whatever works best.
