Titanic passenger survival prediction
2022-01-27 05:33:56 【Ada_ lake】
Decision tree algorithm:
DecisionTreeClassifier(criterion='entropy')
criterion: the criterion used to choose splits
- entropy: based on information entropy (information gain), i.e. the criterion of the ID3 algorithm; in practice its results differ little from C4.5
- gini: the Gini coefficient (Gini impurity), i.e. the criterion of the CART algorithm, and scikit-learn's default
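To make the two criteria concrete, the following minimal sketch computes entropy, H = -Σ p·log2(p), and Gini impurity, G = 1 - Σ p², from the class proportions in one node, then creates a DecisionTreeClassifier for each criterion. The helper functions and the toy node are hypothetical and only for illustration; scikit-learn computes these impurities internally.

```python
from collections import Counter
import math

from sklearn.tree import DecisionTreeClassifier


def entropy(labels):
    """Information entropy H = -sum(p * log2(p)) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_impurity(labels):
    """Gini impurity G = 1 - sum(p ** 2) over the class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


# Hypothetical node: 4 survivors (1) and 4 non-survivors (0), a 50/50 mix
node = [1, 1, 1, 1, 0, 0, 0, 0]
print(entropy(node))        # 1.0
print(gini_impurity(node))  # 0.5

# The criterion parameter selects which impurity measure drives the splits
clf_entropy = DecisionTreeClassifier(criterion='entropy')
clf_gini = DecisionTreeClassifier(criterion='gini')  # same as the default
```

Both measures are 0 for a pure node and largest for an even mix, which is why the two criteria usually pick similar splits.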
The key process of survival prediction
Preparation stage
- Data exploration: analyze data quality
  - info(): basic information about the table: number of rows and columns, the data type of each column, and data completeness (non-null counts)
  - describe(): statistics of the table: count, mean, standard deviation, minimum, maximum (plus quartiles)
  - describe(include=['O']): overview of the string (non-numeric) columns: count, number of unique values, most frequent value and its frequency
  - head(): the first few rows of data
  - tail(): the last few rows of data
- Data cleaning
  - Fill in missing values with the mean (numeric columns) or the most frequent value (categorical columns)
    df['XX'] = df['XX'].fillna(df['XX'].mean())
    df['XX'] = df['XX'].fillna(df['XX'].value_counts().idxmax())
- Feature selection: reduce the data's dimensionality to simplify the later classification step
  - Filter out meaningless columns
  - Filter out columns with many missing values
  - Put the remaining features into the feature vector
- Data conversion: convert character (string) columns into numeric columns for the later steps, using the DictVectorizer class
  - DictVectorizer converts symbols into 0/1 values (one-hot encoding); one-hot encoding treats categories equally, with no ordering or priority between them
  - Instantiate a converter:
    dvec = DictVectorizer(sparse=False)
    sparse=False means a dense array is returned instead of a sparse matrix (a sparse matrix stores only the non-zero values together with their positions)
  - Call fit_transform() on the list produced by to_dict(orient='records'), which converts the DataFrame into a list of dicts, one per row
Classification stage
- Decision tree model
  1. Import the decision tree model: from sklearn.tree import DecisionTreeClassifier
  2. Generate the decision tree: fit the model to the training features and labels with fit()
  3. Model evaluation & prediction
     - Prediction: the decision tree outputs its predicted results with predict()
     - Evaluation: if the true results are known, use clf.score(features, labels); if the true results are not known, use K-fold cross-validation via cross_val_score
  4. Decision tree visualization
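The classification stage can be sketched as follows. The data is hypothetical and synthetic (three columns standing in for encoded Pclass, Sex=female and Age features, an invented survival rule, and 10% label noise), since the real Titanic files are not part of this post; DecisionTreeClassifier, fit, predict, score and cross_val_score are the calls named in the steps above.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for the encoded Titanic features:
# columns = [Pclass, Sex=female, Age]; labels loosely favour women and 1st class
rng = np.random.default_rng(0)
n = 200
pclass = rng.integers(1, 4, size=n)
female = rng.integers(0, 2, size=n)
age = rng.uniform(1, 70, size=n)
X = np.column_stack([pclass, female, age])
y = ((female == 1) | (pclass == 1)).astype(int)
flip = rng.random(n) < 0.1  # add roughly 10% label noise
y[flip] = 1 - y[flip]

# The last 50 rows play the role of "new" passengers with unknown outcomes
X_train, y_train = X[:150], y[:150]
X_new = X[150:]

# Generate the decision tree by fitting it to the training data
clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
clf.fit(X_train, y_train)

# Prediction: the tree outputs a 0/1 survival label per passenger
pred = clf.predict(X_new)

# Evaluation with known labels: accuracy on the training data
# (optimistic, because the tree has already seen these rows)
train_acc = clf.score(X_train, y_train)

# Evaluation when new data has no labels: 10-fold cross-validation on training data
cv_acc = np.mean(cross_val_score(clf, X_train, y_train, cv=10))
print(f"training accuracy: {train_acc:.3f}, 10-fold CV accuracy: {cv_acc:.3f}")
```

A fully grown tree tends to score near 100% on its own training rows, so the cross-validated accuracy is the more honest estimate of how it would do on unseen passengers.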
Drawing stage: GraphViz
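- Install Graphviz first (the Graphviz software itself, which renders the graphs, plus the Python graphviz package)
- Import the graphviz package: import graphviz
- Import export_graphviz from sklearn: from sklearn.tree import export_graphviz
- First use export_graphviz to export the decision tree model's structure as DOT data
- Then pass that data to graphviz (graphviz.Source) to obtain a graph source
- Render and display the graph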
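The drawing steps above might look like this in code. The tiny training set, feature names and class names are hypothetical. export_graphviz(..., out_file=None) returns the DOT source as a string, and graphviz.Source wraps it; actually rendering to a file needs the Graphviz executables, so this sketch only builds the source object and falls back to printing the DOT text when the Python package is missing.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Hypothetical tiny training set: columns = [Pclass, Sex=female]
X = np.array([[1, 1], [1, 0], [3, 1], [3, 0], [2, 1], [2, 0]])
y = np.array([1, 1, 1, 0, 1, 0])
clf = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, y)

# Step 1: export the fitted tree as DOT text (out_file=None returns a string)
dot_data = export_graphviz(
    clf,
    out_file=None,
    feature_names=['Pclass', 'Sex=female'],
    class_names=['died', 'survived'],  # order follows clf.classes_ = [0, 1]
    filled=True,
)

# Step 2: hand the DOT text to the graphviz package, if it is installed
try:
    import graphviz
except ImportError:
    graph = None
    print("Python graphviz package not installed; raw DOT data follows")
    print(dot_data)
else:
    graph = graphviz.Source(dot_data)
    # Step 3: display it. graph.render('titanic_tree') or graph.view() writes a
    # file and needs the Graphviz 'dot' executable on PATH; in Jupyter, making
    # `graph` the last expression of a cell displays it inline.
```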
K-fold cross-validation
Most of the samples are used for training and a small part is held out to validate the classifier. K-fold cross-validation does this K times: each round uses 1/K of the data for validation and the rest for training, rotating so that every part is used for validation once, and the K results are averaged.
1. Divide the data set evenly into K equal parts.
2. Use 1 part as test data and the rest as training data.
3. Calculate the test accuracy.
4. Use a different part as the test set each time and repeat steps 2 and 3.
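The four steps above can be sketched by hand to show roughly what cross_val_score does for a plain, unshuffled K-fold split. The function name k_fold_accuracy and the synthetic data are hypothetical. The folds come from np.array_split, which, like scikit-learn's KFold without shuffling, gives the first n mod K parts one extra sample, so the manual average should match cross_val_score with cv=KFold(n_splits=10).

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def k_fold_accuracy(clf, X, y, k):
    """Hypothetical helper: manual K-fold cross-validation, mean accuracy."""
    # Step 1: divide the sample indices into K (near-)equal contiguous parts
    folds = np.array_split(np.arange(len(X)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Step 2: one part is the test set, the other K-1 parts are training data
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        clf.fit(X[train_idx], y[train_idx])
        # Step 3: accuracy on the held-out part
        scores.append(clf.score(X[test_idx], y[test_idx]))
    # Step 4: every part has served as the test set once; average the K scores
    return float(np.mean(scores))


# Hypothetical synthetic data (not the Titanic data): 3 features, binary label
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

clf = DecisionTreeClassifier(random_state=0)
manual = k_fold_accuracy(clf, X, y, k=10)

# cross_val_score with an unshuffled KFold uses the same contiguous folds
library = cross_val_score(clf, X, y, cv=KFold(n_splits=10)).mean()
print(manual, library)
```

Passing an integer such as cv=10 with a classifier makes cross_val_score use stratified folds (StratifiedKFold), which keep the class ratio in each part, so its scores can differ slightly from this plain split.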
Copyright notice
Author: Ada_ lake. Please include the original link when reprinting, thank you.
https://en.cdmana.com/2022/01/202201270533544951.html