Hadoop learning 5-4: Hadoop 3.x new feature: erasure coding (EC)
2022-01-27 01:06:51 【May you be treated warmly by the world】
1 Basic concepts
HDFS provides support for erasure coding (EC) in order to store data more efficiently. Compared with the default three-replica mechanism, an EC policy can save about 50% of storage space.
However, it cannot be ignored that encoding and decoding consume CPU resources. Codec performance is critical to how well erasure coding works in HDFS, and without hardware optimization it is hard to reach ideal performance. Intel's Intelligent Storage Acceleration Library (ISA-L) provides optimized erasure encoding and decoding, which greatly improves performance.
Erasure coding is a new feature in Hadoop 3.x. Earlier versions of HDFS relied on replication for fault tolerance: by default a file has 3 replicas and can tolerate any 2 replicas (DataNodes) being unavailable. This improves data availability, but it also brings 2x redundancy overhead; for example, 3TB of space can hold only 1TB of valid data. Erasure coding saves more space at the same level of availability. Taking the RS-6-3-1024k policy as an example: every 6 data units are encoded to generate 3 parity units, for 9 units in total. As long as any 6 of the 9 units exist, the original data can be recovered, so the policy tolerates any 3 units being unavailable.
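The overhead figures above can be checked with a little arithmetic. The following is a minimal sketch; the function names are made up for this illustration and are not part of any Hadoop API.

```python
def replication_overhead(replicas: int) -> float:
    """Extra storage as a multiple of the original data size."""
    return float(replicas - 1)


def ec_overhead(data_units: int, parity_units: int) -> float:
    """Extra storage for a policy with the given data and parity unit counts."""
    return parity_units / data_units


def raw_storage_tb(valid_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold valid_tb of useful data."""
    return valid_tb * (1 + overhead)


if __name__ == "__main__":
    rep = replication_overhead(3)   # 2.0, i.e. 2x extra
    ec = ec_overhead(6, 3)          # 0.5, i.e. 0.5x extra
    print(raw_storage_tb(1, rep))   # raw TB needed for 1 TB with 3 replicas
    print(raw_storage_tb(1, ec))    # raw TB needed for 1 TB with RS-6-3
    saving = 1 - raw_storage_tb(1, ec) / raw_storage_tb(1, rep)
    print(f"space saved by RS-6-3 versus 3 replicas: {saving:.0%}")
```

With 3 replicas, 1TB of data needs 3TB of raw space; with RS-6-3 it needs 1.5TB, which is where the figure of about 50% comes from.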
2 Erasure coding operations
2.1 Viewing the erasure coding policies
hdfs ec -listPolicies
The command lists several policies. The one explained here is RS-6-3-1024k; the others follow the same pattern.
RS-6-3-1024k: uses Reed-Solomon (RS) coding. Every 6 data units generate 3 parity units, for 9 units in total. In other words, as long as any 6 of these 9 units exist (whether they are data units or parity units), the original data can be recovered.
For example, when a 40MB file is uploaded, it is split into 1024KB pieces (1024KB is also the smallest data unit). The 6 in the policy name means the data is spread over 6 data parts, so for 40MB each part holds about 40MB / 6, that is at most 7MB, made up of multiple 1024KB pieces. Sizes are counted in 1024KB pieces because not every file divides evenly by 6.
With erasure coding the file occupies about 40MB of data plus about 20MB of parity (3 parity units for every 6 data units), roughly 60MB in total. Stored with the original replica mechanism instead (3 replicas in my setup), it would occupy 120MB. So although the parity units also take up storage, the erasure coding policy saves about 50% of the space in theory.
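To make the 40MB example concrete, the sketch below distributes 1024KB cells round-robin over the 6 data units. It assumes the file fits in a single block group and that each parity unit is as large as the first (largest) data unit; `striped_layout` is a hypothetical helper written for this illustration, not an HDFS API.

```python
MB = 1024 * 1024
CELL = 1024 * 1024  # the 1024k cell size from the policy name


def striped_layout(file_size, data_units, parity_units, cell=CELL):
    """Return (data_sizes, parity_sizes) in bytes for one block group.

    Assumption: cells are handed out round-robin, one stripe at a time,
    and the whole file fits in a single block group.
    """
    data_sizes = [0] * data_units
    remaining = file_size
    index = 0
    while remaining > 0:
        chunk = min(cell, remaining)
        data_sizes[index % data_units] += chunk
        remaining -= chunk
        index += 1
    # Assumption: in every stripe a parity cell is as large as the largest
    # data cell, so each parity unit matches the size of the first data unit.
    parity_sizes = [data_sizes[0]] * parity_units
    return data_sizes, parity_sizes


if __name__ == "__main__":
    data, parity = striped_layout(40 * MB, 6, 3)
    print([size // MB for size in data])    # MB held by each data unit
    print([size // MB for size in parity])  # MB held by each parity unit
    print((sum(data) + sum(parity)) // MB)  # total MB stored
```

Under these assumptions four data units hold 7MB, two hold 6MB, and each parity unit holds 7MB, so the file occupies 61MB in total, close to the rough figure of 60MB and about half of the 120MB needed with 3 replicas.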
State: indicates the status of the policy. In the listing above, RS-6-3-1024k is in the enabled state.
In theory, RS-6-3-1024k needs 9 DataNodes, RS-3-2-1024k needs 5 DataNodes, and so on.
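The DataNode counts follow directly from the policy name: data units plus parity units. A small sketch, assuming names follow the pattern codec-data-parity-cellsize (the helper names are hypothetical):

```python
def parse_policy(name):
    """Split a policy name such as 'RS-6-3-1024k' into its parts.

    Assumption: the last three dash-separated fields are the data unit
    count, the parity unit count, and the cell size in KB.
    """
    parts = name.split("-")
    codec = "-".join(parts[:-3])
    data_units = int(parts[-3])
    parity_units = int(parts[-2])
    cell_kb = int(parts[-1].lower().rstrip("k"))
    return codec, data_units, parity_units, cell_kb


def min_datanodes(name):
    """DataNodes needed to place every unit on a different machine."""
    _, data_units, parity_units, _ = parse_policy(name)
    return data_units + parity_units


if __name__ == "__main__":
    for policy in ("RS-6-3-1024k", "RS-3-2-1024k"):
        print(policy, min_datanodes(policy))
```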
2.2 Setting an erasure coding policy
An erasure coding policy is associated with a specific path. In other words, to use erasure coding we set a policy on a specific path; from then on, every file stored in that directory follows the policy.
By default only the RS-6-3-1024k policy is enabled. To use any other policy, you need to enable it first.
The following example sets RS-3-2-1024k on the /input directory. Once an erasure coding policy is set, files are no longer stored with the original replica policy.
1、 Enable the RS-3-2-1024k policy (a policy can only be used after it is enabled)
# Enable the policy
hdfs ec -enablePolicy -policy RS-3-2-1024k
# Disable the policy
hdfs ec -disablePolicy -policy RS-3-2-1024k
2、 Create a directory in HDFS and set the erasure coding policy on it
# Create the directory
hdfs dfs -mkdir /input
# Set the policy on the /input directory
hdfs ec -setPolicy -path /input -policy RS-3-2-1024k
# Get the erasure coding policy of the directory
hdfs ec -getPolicy -path /input
3、 Upload a file and check how the encoded file is stored
Upload any file to HDFS and check its replica count. (The replication factor configured in the current cluster is 3 and the cluster has 5 DataNodes; in theory RS-3-2-1024k needs 5 DataNodes.)
You can see that the replica count is 1, which differs from the configured value. Click the file to see how its data is stored: there is data on all 5 machines. That data is our 3 data units and 2 parity units, with one unit per machine rather than all 5 units on a single machine. Only one copy of each unit is saved.
Use the following command to view how the file is stored
hdfs fsck /input/aaa.txt -files -blocks -locations
2.3 Testing the erasure coding policy
To exercise the fault tolerance of the erasure coding policy, shut down one of the DataNodes, so that one of the 5 stored units is missing. Then try to fetch the file as usual, using the following command to copy it to the local machine
hadoop fs -get /input/aaa.txt ./ec
An error is normally reported at this point, but the stored data is still intact: opening the local ec file shows that the file was copied to the local machine completely.
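Why the file survives a missing unit can be illustrated with a much simpler code than the one HDFS uses. The sketch below uses a single XOR parity unit, not Reed-Solomon, so it can rebuild only one lost unit, whereas RS-3-2 tolerates two. All names are invented for this illustration.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_units):
    """Return the data units followed by one XOR parity unit."""
    parity = reduce(xor_bytes, data_units)
    return list(data_units) + [parity]


def recover(units):
    """Rebuild the single missing unit (marked None) from the survivors."""
    missing = [i for i, unit in enumerate(units) if unit is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing unit")
    survivors = [unit for unit in units if unit is not None]
    result = list(units)
    # XOR of all surviving units (data and parity) equals the lost unit.
    result[missing[0]] = reduce(xor_bytes, survivors)
    return result


if __name__ == "__main__":
    units = encode([b"abcd", b"efgh", b"ijkl"])  # 3 data units + 1 parity
    units[1] = None                              # simulate a lost DataNode
    restored = recover(units)
    print(b"".join(restored[:3]))                # original data, reassembled
```

Reed-Solomon generalizes this idea: with m parity units, any m missing units can be rebuilt from the remaining ones.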
copyright notice
author[May you be treated warmly by the world],Please bring the original link to reprint, thank you.
https://en.cdmana.com/2022/01/202201270106479624.html