# PCA and ICA for Introductory Machine Learning

2022-01-26 23:29:42


This article is the sixth in my notes series on Wu Enda's (Andrew Ng's) machine learning course. It mainly covers PCA (principal component analysis), a commonly used algorithm for data dimensionality reduction, and extends to another algorithm, ICA (independent component analysis).

# Principal component analysis (PCA)

Before learning principal component analysis, let's first understand what dimensionality reduction is.

## What is dimensionality reduction?

Following the definition in the "watermelon book" (Zhou Zhihua's *Machine Learning*), dimensionality reduction means transforming the original high-dimensional attribute space into a low-dimensional subspace through some mathematical transformation. In this subspace the sample density is greatly increased, and distance computations also become easier.

In practice, dimensionality reduction is generally achieved by projecting from a high-dimensional space to a lower-dimensional one.

For example, for 3-D data, reducing to 2 dimensions means projecting the data from the original 3-dimensional space onto a 2-dimensional plane, which achieves the dimensionality reduction.

## PCA (Principal Component Analysis)

Principal component analysis is the most common dimensionality reduction algorithm. PCA can extract the main components from redundant features, improving model training speed without much loss of model quality.

### Projection error

The projection error is defined as follows: project the data onto a vector passing through the origin (the direction vector); the projection error is the length of the perpendicular from each feature vector to that direction vector.

The goal of PCA is to find a direction vector such that the mean squared projection error is as small as possible.
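To write this down concretely (a standard formulation; the unit-norm constraint is the usual convention and is my addition), for reduction to one dimension PCA solves

$\min\limits_{u:\,\|u\|=1}\ \dfrac{1}{m}\sum\limits_{i=1}^{m}\left\|x^{(i)}-(u^Tx^{(i)})\,u\right\|^2$

where $(u^Tx^{(i)})\,u$ is the projection of $x^{(i)}$ onto the direction $u$.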

Clearly, this is the case of reducing to one dimension. The complete description of PCA is:

To reduce $n$-dimensional data to $k$ dimensions, the goal is to find a set of vectors $u_1,u_2,u_3,...,u_k$ that minimizes the total projection error of all data projected onto them. In fact, this set of vectors should be orthogonal in the original space. These $k$ new orthogonal features are also called principal components; they are $k$ new features reconstructed from the original $n$ features.

### Algorithm flow

1. Mean normalization: subtract the mean of each feature from the data (and, if features are on very different scales, also divide by the standard deviation).

2. Compute the covariance matrix:

$\Sigma = \dfrac{1}{m} \sum\limits_{i=1}^{m}(x^{(i)})(x^{(i)})^T = \dfrac{1}{m} X^TX$

3. Compute the eigenvectors of the covariance matrix $\Sigma$ via singular value decomposition (SVD):

$(U,S,V^T)=\text{SVD}(\Sigma)$

4. Take the first $k$ column vectors of $U$ to obtain an $n\times k$ matrix, denoted $U_{reduce}$, where $k$ is the dimension we want to reduce the $n$-dimensional data to.

5. Compute the new feature vectors $z^{(i)}$:

$z^{(i)}=U^T_{reduce}\,x^{(i)}$

Clearly, the final result is $k \times 1$-dimensional.

In summary, to obtain the principal components, first compute the covariance matrix of the data matrix, then obtain its eigenvectors via singular value decomposition, and finally select the matrix formed by the $k$ eigenvectors with the largest eigenvalues, i.e., the directions of largest variance.
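Putting the five steps together, here is a minimal NumPy sketch (my own illustration, not code from the course; function and variable names are mine):

```python
import numpy as np

def pca(X, k):
    """Reduce the rows of X (m samples, n features) to k dimensions via SVD."""
    # 1. Mean normalization: center each feature.
    X_centered = X - X.mean(axis=0)
    m = X_centered.shape[0]
    # 2. Covariance matrix (n x n): Sigma = (1/m) X^T X.
    Sigma = (X_centered.T @ X_centered) / m
    # 3. SVD of the covariance matrix; the columns of U are eigenvectors.
    U, S, Vt = np.linalg.svd(Sigma)
    # 4. Keep the first k eigenvectors.
    U_reduce = U[:, :k]
    # 5. Project the data: each z^{(i)} = U_reduce^T x^{(i)}.
    Z = X_centered @ U_reduce
    return Z, U_reduce

# Example: reduce 3-D points to 2-D.
X = np.random.randn(100, 3)
Z, U_reduce = pca(X, k=2)
print(Z.shape)  # (100, 2)
```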

To sum up, PCA is an unsupervised dimensionality reduction method that needs only an eigendecomposition to compress and denoise data, so it is widely used in real-world scenarios.

# Independent component analysis (ICA)

The PCA described above is a process of information extraction: it reduces the dimensionality of the original data. Independent component analysis (ICA), which we discuss next, is instead a process of information unmixing.

## Problem introduction

The premise of ICA is that the observed variables are linear combinations of several statistically independent components.

Let's start the discussion with the classic cocktail party problem. It goes like this: $n$ people are at a party in a room, and they may all talk at the same time. In different corners of the room there are $n$ sound receivers, and each receiver records an overlapping mixture of the $n$ people's voices. Since each receiver is at a different distance from each person, the mixture recorded by each receiver is different. After the party we have $m$ sound samples, where each sample is the set of data collected by the $n$ receivers at a specific time $i$. How can we separate out each of the $n$ speakers' own voices from these $m$ samples?

Let's look more closely at the problem description. Use $s$ to denote the source signals emitted by everyone at all moments. It is an $n\times m$ matrix: each row is one person's sequence of sound signals over the $m$ moments, and there are $n$ rows in total, one per person.

$s=[s_1,s_2,...,s_n]^T$

$x$ collects, at each moment, the linear combinations of the $n$ people's sounds; it is also an $n\times m$ matrix. There are $m$ moments, hence $m$ samples in total, each of which is $n$-dimensional. The superscript $i$ denotes a particular moment: the column $x^{(i)}$ represents the linear combinations of all $n$ people's sound signals received at moment $i$.

$x=\begin{bmatrix} |&|&\dots&|\\x^{(1)} & x^{(2)}&\dots&x^{(m)}\\|&|&\dots&|\end{bmatrix}, \qquad x^{(i)}=[x_1^{(i)},x_2^{(i)},...,x_n^{(i)}]^T$

So we have the following model:

$x=As, \qquad \begin{bmatrix} |&|&\dots&|\\x^{(1)} & x^{(2)}&\dots&x^{(m)}\\|&|&\dots&|\end{bmatrix}=A\begin{bmatrix} s_1^{(1)}&s_1^{(2)}&\dots&s_1^{(m)}\\s_2^{(1)}&s_2^{(2)}&\dots&s_2^{(m)}\\\vdots&\vdots&\ddots&\vdots \\s_n^{(1)}&s_n^{(2)}&\dots&s_n^{(m)}\end{bmatrix}$

Here $A$ is an unknown mixing matrix; clearly $A$ is $n\times n$-dimensional, and it must be full rank.

The situation now is that $A$ and $s$ are unknown while $x$ is known, and we need some way to recover $s$ and $A$ from $x$. This process is also called blind source separation. That sounds a little amazing, so let's work through it step by step.
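As a quick illustration of this setup (a toy example of my own, not from the course), here is how two independent source signals get mixed into observations by an unknown $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000                                  # number of moments
t = np.linspace(0, 8, m)

# Two independent (non-Gaussian) source signals, one per row: s is n x m.
s = np.vstack([np.sign(np.sin(3 * t)),    # square-ish wave
               np.sin(5 * t)])            # sine wave

A = rng.standard_normal((2, 2))           # unknown n x n mixing matrix
x = A @ s                                 # observed mixtures, also n x m
print(x.shape)                            # (2, 1000)
```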

## Algorithm

Without loss of generality, we can assume that both the mixed variables and the independent components have zero mean; if the raw data are not zero-mean, we can center the observed variables $x$ to make the model zero-mean.

First, let's make a substitution: let $W=A^{-1}$, so that $s^{(i)}=A^{-1}x^{(i)}=Wx^{(i)}$. Writing $W$ as $[w_1^T,w_2^T,...,w_n^T]^T$, we have for each source signal component:

$s_j^{(i)}=w^T_jx^{(i)}$

Next, assume that each person's voice signal $s_j$ is independent of the others and has probability density $p_s(s_j)$. Then the joint probability density of the source signals at a given moment is:

$p(s)=\prod_{j=1}^np_s(s_j)$

Given $p(s)$, we want the density of the sampled signal $x$. How do we get it?

First recall some probability theory: a probability density can be obtained by differentiating the cumulative distribution function, so let's find the CDF first:

$F_x(a)=P(x\le a)=P(As\le a)=P(s\le Wa)=F_s(Wa)$

Then differentiate:

$F'_x(a)=F'_s(Wa)=|W|p_s(Wa)=p_x(a)$

So we have:

$p(x)=p_s(Wx)|W|=|W|\prod_{j=1}^np_s(w_j^Tx)$
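A quick one-dimensional sanity check of this density formula (my own example): let $s\sim\text{Uniform}[0,1]$ and $x=2s$, so $A=2$ and $W=1/2$. Then

$p_x(x)=|W|\,p_s(Wx)=\tfrac{1}{2}\,p_s(x/2)$

which is the density of $\text{Uniform}[0,2]$, exactly as expected.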

### Solving via maximum likelihood estimation

The likelihood function (for a single sample $x$) is:

$L(W)=p_s(Wx)|W|=|W|\prod_{j=1}^np_s(w_j^Tx)$

Given a training set $\{x^{(1)},x^{(2)},...,x^{(m)}\}$, the log-likelihood is:

$\ln L(W)= \sum_{i=1}^m\left( \sum_{j=1}^n\ln p_s(w^T_jx^{(i)})+\ln|W|\right)$
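The notes stop at the log-likelihood, but maximizing it by gradient ascent yields the algorithm. A standard choice (used in Ng's CS229 lecture notes) is to take each source CDF to be the sigmoid $g(z)=1/(1+e^{-z})$, so $p_s=g'$, which gives the stochastic gradient ascent update $W := W + \alpha\left((1-2g(Wx^{(i)}))\,x^{(i)T} + (W^T)^{-1}\right)$. A minimal NumPy sketch under that assumption (learning rate, epoch count, and names are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ica(X, lr=0.01, epochs=50):
    """Maximum-likelihood ICA with sigmoid source CDFs.

    X: m x n array of observations, one mixed sample per row.
    Returns the unmixing matrix W; S = X @ W.T estimates the sources.
    """
    m, n = X.shape
    W = np.eye(n)
    for _ in range(epochs):
        for i in np.random.permutation(m):
            x = X[i].reshape(n, 1)
            # Stochastic gradient ascent step on ln L(W):
            # grad = (1 - 2 g(W x)) x^T + (W^T)^{-1}
            W += lr * ((1 - 2 * sigmoid(W @ x)) @ x.T + np.linalg.inv(W.T))
    return W
```

Applied to the toy mixtures above (`W = ica(x.T)`), this should recover the sources up to the scaling and permutation ambiguities discussed below.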

## Classical assumptions and uncertainties of ICA

### Classical assumptions

1. The components are statistically independent of each other.

This is one of the most basic and important principles of ICA, and interestingly, once this assumption is made, the model can be solved in a principled way. The intuition is as follows: if the random variables $(x_1,x_2,\dots,x_n)$ are statistically independent of one another, then the remaining variables give us no information about any particular variable $x_j$.

2. A very strong assumption of ICA: the independent components follow non-Gaussian distributions.

This is because if the source signals are Gaussian, i.e., the independent components are Gaussian, then their joint density is rotationally symmetric. Take the two-dimensional standard Gaussian as an example: any orthogonal transformation of the Gaussian vector $(x_1,x_2)$ has exactly the same distribution as $(x_1,x_2)$ itself. And because Gaussian random variables have all higher-order cumulants equal to 0, there may be infinitely many valid mixing matrices $A$ after ICA decomposition.
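A one-line check of this symmetry (my addition, using the standard property of linear transformations of Gaussians): if $x\sim\mathcal{N}(0,I)$ and $R$ is any orthogonal matrix ($RR^T=I$), then

$Rx\sim\mathcal{N}(0,\,RIR^T)=\mathcal{N}(0,\,I)$

so a rotated Gaussian mixture is statistically indistinguishable from the unrotated one, and the rotation cannot be recovered from the data.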

If the source signals are non-Gaussian, the ICA decomposition is unique (up to the ambiguities discussed below). That is why standard independent component analysis generally allows at most one component to follow a Gaussian distribution.

3. The mixing matrix $A$ is assumed to be square.

The reason for this is obvious: it makes $A$ invertible and easy to compute with.

### Uncertainty

An $x$ with the same statistical properties may come from two different systems:

$x=As=(A\alpha)(\alpha^{-1}s)$

In other words, the model is non-identifiable: the formula above shows that for any invertible matrix $\alpha$, the pair $(A\alpha,\ \alpha^{-1}s)$ produces exactly the same observations as $(A, s)$. For instance, $\alpha=2I$ doubles $A$ while halving the sources. So after a linear transformation the mixing matrix $A$ and the sources $s$ are not unique, and we cannot uniquely determine the original signals.

### Uncertain factors in ICA

• The variances of the independent components cannot be determined (any rescaling of a component $s_j$ can be absorbed into the corresponding column of $A$).
• The order of the independent components cannot be determined (any permutation of the components can be absorbed into $A$).

# Summary

Neither PCA nor ICA requires specific assumptions about the distribution of the source signals (for ICA, only non-Gaussianity is needed).

If the observed signals are Gaussian, then the source signals are also Gaussian, and in that case PCA and ICA are equivalent.

Most ICA algorithms require data preprocessing: first use PCA to obtain $y$, then standardize each component of $y$ (i.e., divide each component by its own standard deviation) to obtain $z$. After this preprocessing (often called whitening), $z$ satisfies the following properties:

• The components of $z$ are uncorrelated;
• Each component of $z$ has variance 1.
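
A minimal NumPy sketch of this preprocessing (my own illustration of the two properties; function and variable names are mine):

```python
import numpy as np

def whiten(X):
    """PCA-whiten X (m samples x n features): the output has
    uncorrelated components, each with variance 1."""
    Xc = X - X.mean(axis=0)               # zero-mean
    Sigma = (Xc.T @ Xc) / Xc.shape[0]     # covariance matrix
    U, S, _ = np.linalg.svd(Sigma)
    Y = Xc @ U                            # rotate onto principal axes: y
    Z = Y / np.sqrt(S)                    # divide by each std: z
    return Z

Z = whiten(np.random.randn(500, 3) @ np.random.randn(3, 3))
print(np.round(np.cov(Z.T), 2))  # approximately the identity matrix
```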