public class ShuffleDependency<K,V,C> extends Dependency<scala.Product2<K,V>> implements org.apache.spark.internal.Logging
param: _rdd the parent RDD
param: partitioner partitioner used to partition the shuffle output
param: serializer Serializer
to use. If not set
explicitly then the default serializer, as specified by spark.serializer
config option, will be used.
param: keyOrdering key ordering for RDD's shuffles
param: aggregator map/reduce-side aggregator for RDD's shuffle
param: mapSideCombine whether to perform partial aggregation (also known as map-side combine)
param: shuffleWriterProcessor the processor to control the write behavior in ShuffleMapTask
Constructor and Description |
---|
ShuffleDependency(RDD<? extends scala.Product2<K,V>> _rdd,
Partitioner partitioner,
Serializer serializer,
scala.Option<scala.math.Ordering<K>> keyOrdering,
scala.Option<Aggregator<K,V,C>> aggregator,
boolean mapSideCombine,
org.apache.spark.shuffle.ShuffleWriteProcessor shuffleWriterProcessor,
scala.reflect.ClassTag<K> evidence$1,
scala.reflect.ClassTag<V> evidence$2,
scala.reflect.ClassTag<C> evidence$3) |
Modifier and Type | Method and Description |
---|---|
scala.Option<Aggregator<K,V,C>> |
aggregator() |
scala.collection.Seq<BlockManagerId> |
getMergerLocs() |
scala.Option<scala.math.Ordering<K>> |
keyOrdering() |
boolean |
mapSideCombine() |
void |
newShuffleMergeState() |
Partitioner |
partitioner() |
RDD<scala.Product2<K,V>> |
rdd() |
Serializer |
serializer() |
void |
setMergerLocs(scala.collection.Seq<BlockManagerId> mergerLocs) |
org.apache.spark.shuffle.ShuffleHandle |
shuffleHandle() |
int |
shuffleId() |
boolean |
shuffleMergeEnabled() |
boolean |
shuffleMergeFinalized()
Returns true if push-based shuffle is disabled for this stage or empty RDD,
or if the shuffle merge for this stage is finalized, i.e.
|
int |
shuffleMergeId()
shuffleMergeId is used to uniquely identify merging process of shuffle
by an indeterminate stage attempt.
|
org.apache.spark.shuffle.ShuffleWriteProcessor |
shuffleWriterProcessor() |
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public ShuffleDependency(RDD<? extends scala.Product2<K,V>> _rdd, Partitioner partitioner, Serializer serializer, scala.Option<scala.math.Ordering<K>> keyOrdering, scala.Option<Aggregator<K,V,C>> aggregator, boolean mapSideCombine, org.apache.spark.shuffle.ShuffleWriteProcessor shuffleWriterProcessor, scala.reflect.ClassTag<K> evidence$1, scala.reflect.ClassTag<V> evidence$2, scala.reflect.ClassTag<C> evidence$3)
public Partitioner partitioner()
public Serializer serializer()
public scala.Option<scala.math.Ordering<K>> keyOrdering()
public scala.Option<Aggregator<K,V,C>> aggregator()
public boolean mapSideCombine()
public org.apache.spark.shuffle.ShuffleWriteProcessor shuffleWriterProcessor()
public RDD<scala.Product2<K,V>> rdd()
rdd
in class Dependency<scala.Product2<K,V>>
public int shuffleId()
public org.apache.spark.shuffle.ShuffleHandle shuffleHandle()
public boolean shuffleMergeEnabled()
public int shuffleMergeId()
public void setMergerLocs(scala.collection.Seq<BlockManagerId> mergerLocs)
public scala.collection.Seq<BlockManagerId> getMergerLocs()
public boolean shuffleMergeFinalized()
public void newShuffleMergeState()