How is data skew caused in Spark?
2022-02-04 16:31:40 [Alibaba Cloud Q&A]
Question: How is data skew caused in Spark?
Answer 1:
In Spark, the partitions within a single stage can be processed in parallel, whereas stages that depend on one another are processed serially. Suppose a Spark job is divided into two stages, Stage 0 and Stage 1, and Stage 1 depends on Stage 0. In that case Stage 1 will not start until Stage 0 has completely finished. Stage 0 may contain N tasks, which can run in parallel. If N-1 of these tasks finish within 10 seconds but the remaining task takes 1 minute, the stage takes at least 1 minute in total. In other words, the time a stage takes is determined mainly by its slowest task. All tasks in a stage perform the same computation, so if we set aside differences in computing power between nodes, the difference in running time between tasks comes mainly from the amount of data each task processes. Data skew is the case where that amount is very uneven. For example, when most records in a shuffle share a few keys, the tasks that receive those keys process far more data than the rest and hold up the whole stage.
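The reasoning above can be illustrated with a small simulation. This is a minimal Python sketch, not Spark code. It assumes a simplified hash partitioner, loosely modeled on Spark's default HashPartitioner (partition = hash(key) mod number of partitions). It also assumes that all tasks run in parallel on identical nodes and that a task's time is proportional to the number of records it processes. The function names `partition_sizes` and `stage_time`, the `records_per_second` parameter, and the sample data are hypothetical.

```python
from typing import Hashable, List, Sequence


def partition_sizes(keys: Sequence[Hashable], num_partitions: int) -> List[int]:
    """Count how many records land in each partition under hash partitioning.

    Simplified stand-in for a shuffle: every record with the same key goes to
    partition hash(key) % num_partitions, so one hot key ends up in one task.
    """
    sizes = [0] * num_partitions
    for key in keys:
        sizes[hash(key) % num_partitions] += 1
    return sizes


def stage_time(sizes: Sequence[int], records_per_second: float) -> float:
    """Estimate a stage's wall-clock time from its per-task record counts.

    Assumes (hypothetically) that every task runs in parallel on identical
    nodes and that task time is proportional to records processed, so the
    stage finishes only when its slowest (largest) task finishes.
    """
    task_times = [n / records_per_second for n in sizes]
    return max(task_times)


if __name__ == "__main__":
    # Hypothetical data: key 0 is "hot" and accounts for 6000 of 9000 records.
    skewed = [0] * 6000 + [1] * 1000 + [2] * 1000 + [3] * 1000
    # The same 9000 records spread evenly over the four keys.
    balanced = [i % 4 for i in range(9000)]

    for name, keys in (("skewed", skewed), ("balanced", balanced)):
        sizes = partition_sizes(keys, num_partitions=4)
        print(name, sizes, stage_time(sizes, records_per_second=100))
```

With integer keys, Python's hash is deterministic, so the skewed input yields partitions of [6000, 1000, 1000, 1000]. Three tasks take 10 seconds each and one takes 60 seconds, which matches the 10-second versus 1-minute example in the answer. The same 9000 records spread evenly would finish in 22.5 seconds.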
Author: Alibaba Cloud Q&A. Please include a link to the original when reprinting.