current position:Home>Machine learning -- spectral clustering
Machine learning -- spectral clustering
2022-01-26 23:29:36 【Nuggets user 007】
This is my participation 8 The fourth of the yuegengwen challenge 16 God , Check out the activity details :8 Yuegengwen challenge
Introduction to spectral clustering
Spectral clustering is an algorithm evolved from graph theory , Later, it has been widely used in clustering . Its main idea is to treat all data as points in space , These points can be connected by edges . The edge weight value between two points that are far away is low , The edge weight value between two points close to each other is higher , Cut through the graph composed of all data points , Let the edge weights between different subgraphs be as low as possible , And the edge weight sum in the subgraph is as high as possible , So as to achieve the purpose of clustering .
Spectral clustering principle
Spectral clustering algorithm is an algorithm that is easy to use but not so easy to understand in principle . For spectral clustering algorithm, we can summarize the following steps :
Input : Sample set D=(x1,x2,...,xn), How similar matrices are generated , Dimension after dimension reduction k1, Clustering method , Dimension after clustering k2
Output : Clustering C(c1,c2,...ck2). \
- The similarity matrix of the sample is constructed according to the generation method of the input similarity matrix S\
2) According to the similarity matrix S Building adjacency matrix W, Construct the degree matrix D
3) Calculate the Laplace matrix L
4) Construct the standardized Laplace matrix D−1/2LD−1/2
5) Calculation D−1/2LD−1/2 The smallest k1 The eigenvector corresponding to each eigenvalue f
6) The corresponding eigenvectors f The matrix is normalized by row , The final composition n×k1 The characteristic matrix of dimension F
7) Yes F Each line in the is treated as a k1 V's sample , common n Samples , Use the input clustering method for clustering , The clustering dimension is k2.
8) Get the cluster partition C(c1,c2,...ck2).
The specific principle of spectral clustering algorithm , We need some basis related to graph theory , It's not going to unfold here . For a detailed explanation, please refer to the University of Stanford CS224w The content of lesson 5 of the course , For the courseware of the course, see 【 Reference link 2】, It can also be read 【 Reference link 3】 Understand the content of the courseware .
Spectrum clustering algorithm summary
The main advantages of spectral clustering
- Spectral clustering only needs the similarity matrix between data , Therefore, it is very effective for clustering sparse data . This traditional clustering algorithm, such as K-Means It's hard to do
- Due to the use of dimensionality reduction , Therefore, the complexity of processing high-dimensional data clustering is better than the traditional clustering algorithm .
The main disadvantages of spectral clustering
- If the dimension of the final cluster is very high , Because the range of dimension reduction is not enough , The running speed of spectral clustering and the final clustering effect are not good .
- The clustering effect depends on the similarity matrix , The final clustering effect obtained by different similarity matrices may be very different .
Spectral clustering in Sklearn In the implementation of
stay sklearn The implementation of spectral clustering algorithm is provided in :
sklearn.cluster.SpectralClustering(n_clusters=8, *, eigen_solver=None, n_components=None, random_state=None, n_init=10, gamma=1.0, affinity='rbf', n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1, kernel_params=None, n_jobs=None, verbose=False)
Copy code
Parameters, :
'n_clusters' Is the dimension of shadow space , That is, the number of clusters after clustering is also in the principle k
'eigen_solver' Is the eigenvalue decomposition strategy used , The optional parameters are 'arpack', 'lobpcg', 'amg'.AMG Need to install pyamg. It can be faster on very large sparse problems , But it can also lead to instability . without , You can also use 'arpack'
'random_state' Is a pseudo-random number generator , be used for K-means Eigenvector decomposition in .
'Gamma' It means rbf, poly, sigmoid, laplacian and chi2 kernels Nuclear coefficient of
Here are only some of the parameters used , For complete parameters, please refer to the official document, i.e 【 Reference link 4】 It can also be read 【 Reference link 5】 Understand the meaning and principle of parameters .
copyright notice
author[Nuggets user 007],Please bring the original link to reprint, thank you.
https://en.cdmana.com/2022/01/202201262329339008.html
The sidebar is recommended
- Spring IOC container loading process
- [thinking] the difference between singleton mode and static method - object-oriented programming
- Hadoop environment setup (MySQL environment configuration)
- 10 minutes, using node JS creates a real-time early warning system for bad weather!
- Git tool
- Force deduction algorithm - 92 Reverse linked list II
- What is the sub problem of dynamic programming?
- C / C + +: static keyword summary
- Idea does not have the artifacts option when configuring Tomcat
- Anaconda can't open it
guess what you like
-
I don't know how to start this
-
Matlab simulation of transportation optimization algorithm based on PSO
-
MySQL slow log optimization
-
[Vue] as the window is stretched (larger, smaller, wider and higher), the text will not be displayed
-
Popular Linux distributions for embedded computing
-
Suzhou computer research
-
After installing SSL Certificate in Windows + tomcat, the domain name request is not successful. Please answer!!
-
Implementation time output and greetings of jQuery instance
-
The 72 year old uncle became popular. Wu Jing and Guo fan made his story into a film, which made countless dreamers blush
-
How to save computer research
Random recommended
- Springboot implements excel import and export, which is easy to use, and poi can be thrown away
- The final examination subjects of a class are mathematical programming, and the scores are sorted and output from high to low
- Two pronged approach, Tsinghua Professor Pro code JDK and hotspot source code notes, one-time learning to understand
- C + + recursive knapsack problem
- The use of GIT and GitHub and the latest git tutorial are easy to understand -- Video notes of crazy God speaking
- PostgreSQL statement query
- Ignition database test
- Context didn't understand why he got a high salary?, Nginxfair principle
- Bootstrap switch switch control user's guide, springcloud actual combat video
- A list that contains only strings. What other search methods can be used except sequential search
- [matlab path planning] multi ant colony algorithm grid map path planning [including GUI source code 650]
- [matlab path planning] improved genetic algorithm grid map path planning [including source code phase 525]
- Iinternet network path management system
- Appium settings app is not running after 5000ms
- Reactnative foundation - 07 (background image, status bar, statusbar)
- Reactnative foundation - 04 (custom rpx)
- If you want an embedded database (H2, hsql or Derby), please put it on the classpath
- When using stm32g070 Hal library, if you want to write to flash, you must perform an erase. If you don't let it, you can't write continuously.
- Linux checks where the software is installed and what files are installed
- SQL statement fuzzy query and time interval filtering
- 69. Sqrt (x) (c + + problem solving version with vs runnable source program)
- Fresh students are about to graduate. Do you choose Java development or big data?
- Java project: OA management system (java + SSM + bootstrap + MySQL + JSP)
- Titanic passenger survival prediction
- Vectorization of deep learning formula
- Configuration and use of private image warehouse of microservice architect docker
- Relearn JavaScript events
- For someone, delete return 1 and return 0
- How does Java dynamically obtain what type of data is passed? It is used to judge whether the data is the same, dynamic data type
- How does the database cow optimize SQL?
- [data structure] chain structure of binary tree (pre order traversal) (middle order traversal) (post order traversal) (sequence traversal)
- Webpack packaging optimization solution
- 5. Operation element
- Detailed explanation of red and black trees
- redhat7. 9 install database 19C
- Blue Bridge Cup notes: (the given elements are not repeated) complete arrangement (arrangement cannot be repeated, arrangement can be repeated)
- Detailed explanation of springboot default package scanning mechanism and @ componentscan specified scanning path
- How to solve the run-time exception of test times
- Detailed explanation of k8s management tool kubectl
- Android system view memory command