
GMM: Gaussian Mixture Model

2022-01-27 05:03:07 Vegetable sheep


Gaussian Mixture Model (GMM)

The Gaussian Mixture Model (GMM) is a common generative model.

Mixture Model

A mixture model is a probability model used to represent an overall distribution that contains K component (sub-) distributions. In other words, a mixture model describes the probability distribution of the observed data in the population as a mixture of K component distributions. To compute the probability of an observation under the overall distribution, the mixture model does not require the observed data to reveal which component each observation came from.

Gaussian Mixture Model

Recall that the multivariate Gaussian distribution has the following probability density function:

p(x\mid\theta)=\dfrac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\dfrac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)
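As a quick sanity check of this formula, here is a minimal sketch (assuming NumPy and SciPy are available; the 2-D mean and covariance values below are made up for illustration) that evaluates the density both directly and via scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-D parameters (values made up for this example)
mu = np.array([0.0, 1.0])                    # mean vector
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])               # covariance matrix
x = np.array([0.5, 0.5])

# Density from the closed-form formula above
n = len(x)
diff = x - mu
pdf_manual = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) \
    / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

# Same value via SciPy, as a sanity check
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

print(pdf_manual, pdf_scipy)  # the two numbers should agree
```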

The Gaussian Mixture Model (GMM) is an extension of the single Gaussian probability density function; a GMM can smoothly approximate a density of almost any shape.

A Gaussian mixture model can be regarded as a model composed of K single Gaussian components; which of the K components generated each sample is the latent (hidden) variable of the mixture model.

Accordingly, the probability density function of a Gaussian mixture model is a weighted sum of its K single Gaussian probability density functions.

Suppose our sample data X consists of N samples x_1, x_2, ..., x_N. Let α_k denote the weight of the k-th single Gaussian component and G(x|θ_k) the probability density function of a single Gaussian. Then:

p(x\mid\theta)=\sum_{k=1}^{K}\alpha_k G(x\mid\theta_k)
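A minimal sketch of this weighted sum (the function name gmm_pdf and the two-component parameters below are made up for illustration; each component G(x|θ_k) is evaluated with scipy.stats.multivariate_normal):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, alphas, mus, Sigmas):
    """GMM density: sum_k alpha_k * G(x | mu_k, Sigma_k)."""
    return sum(
        a * multivariate_normal(mean=m, cov=S).pdf(x)
        for a, m, S in zip(alphas, mus, Sigmas)
    )

# Two illustrative 2-D components (values made up)
alphas = [0.4, 0.6]                                   # weights alpha_k, sum to 1
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]    # means mu_k
Sigmas = [np.eye(2), 2.0 * np.eye(2)]                 # covariances Sigma_k

print(gmm_pdf(np.array([1.0, 1.0]), alphas, mus, Sigmas))
```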

Clearly, the parameters of a GMM form a set θ:

\theta=(\tilde{\mu}_k,\tilde{\Sigma}_k,\tilde{\alpha}_k)

Note that the value of K has to be fixed in advance, and this choice matters; just as with k-means, you need to decide on K before fitting.

Parameter estimation

When learning a single multivariate Gaussian distribution, we know that θ can be estimated by maximum likelihood, where the likelihood function is L(θ) = p(x|θ):

\theta=\arg\max_{\theta} L(\theta)

For a GMM, we assume the samples are independent, so the likelihood function is a product over the N samples. Since the probability of a single point is very small, the product becomes even smaller and can easily cause floating-point underflow, so we use the log-likelihood instead:

L(\theta)=\prod_{j=1}^{N}p(x_j\mid\theta) \\
\log L(\theta)=\sum_{j=1}^{N}\log\sum_{k=1}^{K}\alpha_k G(x_j\mid\theta_k)
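To make the underflow point concrete, here is a sketch of the log-likelihood computed with the log-sum-exp trick via scipy.special.logsumexp (the function name and the toy data below are my own, for illustration only):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, alphas, mus, Sigmas):
    """log L(theta) = sum_j log sum_k alpha_k G(x_j | mu_k, Sigma_k)."""
    K = len(alphas)
    # log(alpha_k) + log G(x_j | theta_k), shape (N, K)
    log_terms = np.column_stack([
        np.log(alphas[k]) + multivariate_normal(mus[k], Sigmas[k]).logpdf(X)
        for k in range(K)
    ])
    # log-sum-exp over components avoids multiplying many tiny probabilities
    return logsumexp(log_terms, axis=1).sum()

# Toy data and parameters (made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
print(gmm_log_likelihood(X, [0.5, 0.5],
                         [np.zeros(2), np.ones(2)],
                         [np.eye(2), np.eye(2)]))
```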

Solving with the EM algorithm

First, initialize a set of parameters randomly

\theta=(\mu_k,\Sigma_k,\alpha_k),\quad k=1,2,\dots,K
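One simple way to do this (a sketch; the function name init_params is my own, and picking K random data points as initial means with a shared sample covariance is just one common choice):

```python
import numpy as np

def init_params(X, K, seed=0):
    """Randomly initialize (mu_k, Sigma_k, alpha_k), k = 1..K, from data X of shape (N, d)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mus = X[rng.choice(N, size=K, replace=False)]        # K random data points as initial means
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d)    # sample covariance for every component
                       for _ in range(K)])
    alphas = np.full(K, 1.0 / K)                         # uniform initial weights
    return mus, Sigmas, alphas
```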

E-step:

The E in E-step stands for Expectation: given the current model parameters, we compute the expected assignment (responsibility) of each sample to the latent components, as shown below:

\gamma_{j,k}=\dfrac{\alpha_k G(x_j\mid\mu_k,\Sigma_k)}{\sum_{k=1}^{K}\alpha_k G(x_j\mid\mu_k,\Sigma_k)}

γ_{j,k} is the probability that sample x_j belongs to the k-th Gaussian component.
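A sketch of this E-step in NumPy/SciPy (the name e_step is my own; X is assumed to be an (N, d) array of samples):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, alphas, mus, Sigmas):
    """Responsibilities gamma[j, k] = alpha_k G(x_j | mu_k, Sigma_k) / sum over components."""
    K = len(alphas)
    # Numerators alpha_k * G(x_j | mu_k, Sigma_k), shape (N, K)
    weighted = np.column_stack([
        alphas[k] * multivariate_normal(mus[k], Sigmas[k]).pdf(X)
        for k in range(K)
    ])
    # Normalize each row so the responsibilities of one sample sum to 1
    return weighted / weighted.sum(axis=1, keepdims=True)
```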

M-step:

Now that we have γ_{j,k}, we can use maximum likelihood to estimate the parameters for the next iteration:

\mu_k=\dfrac{\sum_{j=1}^{N}\gamma_{j,k}\,x_j}{\sum_{j=1}^{N}\gamma_{j,k}},\quad k=1,2,\dots,K \\
\Sigma_k=\dfrac{\sum_{j=1}^{N}\gamma_{j,k}(x_j-\mu_k)(x_j-\mu_k)^T}{\sum_{j=1}^{N}\gamma_{j,k}},\quad k=1,2,\dots,K \\
\alpha_k=\dfrac{\sum_{j=1}^{N}\gamma_{j,k}}{N},\quad k=1,2,\dots,K
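And a matching sketch of these M-step updates (the function name m_step and the small diagonal jitter are my own additions, the latter to keep the covariances positive definite):

```python
import numpy as np

def m_step(X, gamma):
    """Update (mu_k, Sigma_k, alpha_k) from responsibilities gamma of shape (N, K)."""
    N, d = X.shape
    Nk = gamma.sum(axis=0)                      # effective number of points per component
    mus = (gamma.T @ X) / Nk[:, None]           # weighted means, shape (K, d)
    Sigmas = np.empty((len(Nk), d, d))
    for k in range(len(Nk)):
        diff = X - mus[k]
        Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        Sigmas[k] += 1e-6 * np.eye(d)           # small jitter for numerical stability
    alphas = Nk / N                             # updated mixture weights
    return mus, Sigmas, alphas
```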

Repeat the E-step and the M-step until convergence.

Note that the EM algorithm is guaranteed to converge, but not necessarily to the global maximum; it may converge to a local maximum. A common remedy is to run the iteration from several different initializations and keep the best result.
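In practice a library implementation already does this for you: for example, scikit-learn's GaussianMixture takes an n_init parameter that runs EM from several initializations and keeps the fit with the highest likelihood (the toy data below is generated only for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy 2-D data drawn around two centers (for illustration only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(4.0, 1.0, size=(200, 2))])

# n_init=10 runs EM from 10 different initializations and keeps the best result
gmm = GaussianMixture(n_components=2, n_init=10, random_state=0).fit(X)
print(gmm.weights_)   # estimated alpha_k
print(gmm.means_)     # estimated mu_k
```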

References

  1. Li Hang, Statistical Learning Methods (《统计学习方法》)

