Machine learning notes: implementing the KNN algorithm with pandas and scikit-learn
2022-01-26 22:53:37 【xMathematics】
1. K-nearest neighbor (KNN) algorithm:
If most of the K samples most similar to a given sample (that is, its K nearest neighbors in the feature space) belong to a certain category, then the sample also belongs to that category.
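Here "most similar" means "closest in the feature space". A common distance, and the one scikit-learn's KNeighborsClassifier uses by default (Minkowski metric with p = 2), is the Euclidean distance between two samples a and b with n features:

```latex
d(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
```

For example, a = (2.0, 2.1) and b = (2.3, 2.5) give d = sqrt(0.3^2 + 0.4^2) = sqrt(0.25) = 0.5. Because every feature enters this sum directly, a feature measured on a large scale can dominate the distance, which is why features are usually standardized before applying KNN.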
2. KNN algorithm flow:
1. Calculate the distance between each point of known category and the current point
2. Sort the points by increasing distance
3. Select the K points with the smallest distance to the current point
4. Count how often each category occurs among these K points
5. Return the category that occurs most often among the K points as the predicted class of the current point
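The five steps can be sketched directly in Python. This is a minimal illustration, assuming NumPy is available, Euclidean distance, and small in-memory arrays; `knn_predict` is a hypothetical helper name, not part of scikit-learn, and this is not how scikit-learn's KNeighborsClassifier is implemented internally.

```python
from collections import Counter

import numpy as np


def knn_predict(x_known, y_known, x_query, k=3):
    """Predict the class of one query point with plain K-nearest neighbors.

    x_known: (n_samples, n_features) points whose classes are known
    y_known: (n_samples,) class labels of those points
    x_query: (n_features,) the current point
    """
    x_known = np.asarray(x_known, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Step 1: Euclidean distance from every known point to the current point
    distances = np.sqrt(((x_known - x_query) ** 2).sum(axis=1))
    # Step 2: indices of the known points sorted by increasing distance
    order = np.argsort(distances)
    # Step 3: keep the labels of the K closest points
    nearest_labels = [y_known[i] for i in order[:k]]
    # Step 4: count how often each class occurs among them
    counts = Counter(nearest_labels)
    # Step 5: return the most frequent class; on a tie, the class whose
    # nearest member comes first wins (Counter keeps insertion order)
    return counts.most_common(1)[0][0]


# Usage: two well-separated toy clusters
x_known = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]]
y_known = ["A", "A", "A", "B", "B"]
print(knn_predict(x_known, y_known, [1.1, 1.0], k=3))  # A
print(knn_predict(x_known, y_known, [4.8, 5.1], k=3))  # B
```

Scanning every known point costs one distance computation per stored sample for each query. That is fine for small data, and it is why libraries such as scikit-learn also offer tree-based neighbor searches (the `algorithm` parameter of KNeighborsClassifier).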
3. Machine learning process:
1. Get the data
2. Basic data processing
3. Feature engineering
4. Machine learning
5. Model evaluation
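To see the five stages end to end without the check-in CSV used below, here is a sketch on scikit-learn's bundled iris dataset (iris is an assumption made purely for illustration; the original works with its own data). Putting StandardScaler and KNeighborsClassifier into one Pipeline means the scaler is re-fitted on the training folds only, inside every cross-validation split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Get data (iris is only a small stand-in dataset here)
x, y = load_iris(return_X_y=True)

# 2. Basic data processing: hold out 25% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=2
)

# 3 + 4. Feature engineering and machine learning in one pipeline:
# the scaler sees training data only, and K is tuned by grid search
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])
search = GridSearchCV(
    pipeline, param_grid={"knn__n_neighbors": [3, 5, 7, 9]}, cv=5
)
search.fit(x_train, y_train)

# 5. Model evaluation on the held-out test set
print("Best n_neighbors:", search.best_params_["knn__n_neighbors"])
print("Mean CV accuracy of the best model:", search.best_score_)
print("Test accuracy:", search.score(x_test, y_test))
```

If the scaler were instead fitted on all the data before cross-validation, every validation fold would already have influenced the scaling statistics, so the CV score would be slightly optimistic.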
4. Code implementation:
# -*- coding: utf-8 -*-
'''
@Author  : Dongze Xu
@Time    : 2021/12/15 15:47
@Function:
'''
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
'''
# 1. Get the dataset
# 2. Basic data processing
#    2.1 Narrow the data range
#    2.2 Extract time features
#    2.3 Remove places with few check-ins
#    2.4 Determine feature values and target values
#    2.5 Split the dataset
# 3. Feature engineering -- feature preprocessing (standardization)
# 4. Machine learning -- KNN + cross-validation
# 5. Model evaluation
'''
# 1. Get the dataset
data = pd.read_csv("./data/FBlocation/train.csv")
# 2. Basic data processing
# 2.1 Narrow the data range
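# Keep only check-ins with x∈(2, 2.5) and y∈(2, 2.5)
# (.copy() so that adding columns below does not trigger SettingWithCopyWarning)
partial_data = data.query("x>2.0 & x<2.5 & y>2.0 & y<2.5").copy()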
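# 2.2 Extract time features
# Convert the Unix timestamps (in seconds) to datetimes
time = pd.to_datetime(partial_data["time"], unit="s")
time = pd.DatetimeIndex(time)
# Add the columns that are used as features in step 2.4
partial_data["day"] = time.day
partial_data["hour"] = time.hour
partial_data["weekday"] = time.weekday
# 2.3 Remove places with few check-ins
# (keeps places with more than 3 check-ins; the cut-off is a tunable choice)
place_count = partial_data["place_id"].value_counts()
partial_data = partial_data[partial_data["place_id"].isin(place_count[place_count > 3].index)]
# 2.4 Determine feature values and target values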
x = partial_data[["x", "y", "accuracy", "hour", "day", "weekday"]]
y = partial_data["place_id"]
# 2.5 Split the dataset
# random_state: random seed that makes the split reproducible; test_size: fraction of samples held out for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=2, test_size=0.25)
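# 3. Feature engineering -- feature preprocessing (standardization)
transfer = StandardScaler()
# Fit the scaler on the training set only...
x_train = transfer.fit_transform(x_train)
# ...and apply the same training mean/std to the test set (no re-fitting)
x_test = transfer.transform(x_test)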
# 4. Machine learning -- KNN + cross-validation
# 4.1 Instantiate an estimator
estimator = KNeighborsClassifier()
# 4.2 Cross-validation with grid search over the number of neighbors K
param_grid = {"n_neighbors": [3, 5, 7, 9]}
estimator = GridSearchCV(estimator=estimator, param_grid=param_grid, cv=10, n_jobs=4)
# 4.3 Model training
estimator.fit(x_train, y_train)
# 5. Model evaluation
# 5.1 Accuracy on the test set
score_ret = estimator.score(x_test, y_test)
print("Test-set accuracy:\n", score_ret)
# 5.2 Predicted results
y_pre = estimator.predict(x_test)
print("Predicted place_id values:\n", y_pre)
# 5.3 Grid-search results
print("The best model is:\n", estimator.best_estimator_)
print("The best mean cross-validation accuracy is:\n", estimator.best_score_)
print("All cross-validation results:\n", estimator.cv_results_)
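Steps 2.2 and 2.3 are the easiest parts of the script to get wrong, and they need the check-in CSV to run. The sketch below applies the same pandas calls to a tiny hand-made DataFrame with the columns the script uses (`x`, `y`, `accuracy`, `time`, `place_id`); the values, and the hypothetical `min_checkins` threshold, are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with the same columns the script reads from train.csv
data = pd.DataFrame({
    "x": [2.1, 2.2, 2.3, 2.4, 2.2, 2.3],
    "y": [2.1, 2.3, 2.2, 2.4, 2.1, 2.2],
    "accuracy": [10, 25, 40, 15, 60, 5],
    "time": [0, 18000, 133200, 342000, 90000, 3600],  # seconds since 1970-01-01
    "place_id": [101, 101, 101, 202, 202, 303],
})

# 2.2 Extract time features from the Unix timestamps
time = pd.DatetimeIndex(pd.to_datetime(data["time"], unit="s"))
data["day"] = time.day
data["hour"] = time.hour
data["weekday"] = time.weekday  # Monday=0 ... Sunday=6

# 2.3 Drop places with too few check-ins
# (min_checkins is an illustrative threshold, not taken from the original)
min_checkins = 2
place_count = data["place_id"].value_counts()
popular_places = place_count[place_count >= min_checkins].index
data = data[data["place_id"].isin(popular_places)]

print(data[["time", "day", "hour", "weekday", "place_id"]])
```

Here place 303 has a single check-in and is dropped. Filtering rare `place_id` values also helps with `cv=10` in the script: for classifiers, GridSearchCV uses stratified folds, and scikit-learn warns when a class has fewer members than the number of folds.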
Copyright notice
Author: xMathematics. Please include the original link when reprinting. Thank you.
https://en.cdmana.com/2022/01/202201262253346670.html