
What does RDD in Spark mean?

2022-02-04 16:32:02 · Alibaba Cloud Q&A

What does RDD mean in Spark?




Answer 1:

RDD is the soul of Spark; the name stands for Resilient Distributed Dataset. An RDD represents a read-only dataset that can be partitioned: an RDD contains multiple partitions, and each partition holds many records. An RDD has five characteristics:

1. dependencies: each RDD records its dependencies on its parent RDDs. Dependencies between RDDs are either narrow or wide; RDDs connected only by narrow dependencies can be computed within the same stage.
2. partitions: an RDD is divided into a number of partitions. The partition size determines the granularity of computation on the RDD, and each partition is computed by a single task.
3. preferredLocations: following the principle that "moving computation is cheaper than moving data", the Spark scheduler preferentially assigns a task to the node where its data block is stored.
4. compute: computation in Spark is performed per partition. The compute function simply composes iterators; the result of an individual computation is not stored.
5. partitioner: only (K, V)-type RDDs have a partitioner; for RDDs that are not of (K, V) type, the partitioner is None.


Copyright notice
Author: Alibaba Cloud Q&A. Please include the original link when reprinting, thank you.
https://en.cdmana.com/2022/02/202202041632011476.html
