public class ShuffleStatus
extends Object
MapOutputTrackerMaster
to perform bookkeeping for a single
ShuffleMapStage.
This class maintains a mapping from mapIds to MapStatus
. It also maintains a cache of
serialized map statuses in order to speed up tasks' requests for map output statuses.
All public methods of this class are thread-safe.
Constructor and Description |
---|
ShuffleStatus(int numPartitions) |
Modifier and Type | Method and Description |
---|---|
void |
addMapOutput(int mapId,
org.apache.spark.scheduler.MapStatus status)
Register a map output.
|
scala.collection.Seq<Object> |
findMissingPartitions()
Returns the sequence of partition ids that are missing (i.e.
|
boolean |
hasCachedSerializedBroadcast() |
void |
invalidateSerializedMapOutputStatusCache()
Clears the cached serialized map output statuses.
|
org.apache.spark.scheduler.MapStatus[] |
mapStatuses()
MapStatus for each partition.
|
int |
numAvailableOutputs()
Number of partitions that have shuffle outputs.
|
void |
removeMapOutput(int mapId,
BlockManagerId bmAddress)
Remove the map output which was served by the specified block manager.
|
void |
removeOutputsByFilter(scala.Function1<BlockManagerId,Object> f)
Removes all shuffle outputs which satisfies the filter.
|
void |
removeOutputsOnExecutor(String execId)
Removes all map outputs associated with the specified executor.
|
void |
removeOutputsOnHost(String host)
Removes all shuffle outputs associated with this host.
|
byte[] |
serializedMapStatus(org.apache.spark.broadcast.BroadcastManager broadcastManager,
boolean isLocal,
int minBroadcastSize)
Serializes the mapStatuses array into an efficient compressed format.
|
<T> T |
withMapStatuses(scala.Function1<org.apache.spark.scheduler.MapStatus[],T> f)
Helper function which provides thread-safe access to the mapStatuses array.
|
public org.apache.spark.scheduler.MapStatus[] mapStatuses()
public void addMapOutput(int mapId, org.apache.spark.scheduler.MapStatus status)
mapId
- (undocumented)status
- (undocumented)public void removeMapOutput(int mapId, BlockManagerId bmAddress)
mapId
- (undocumented)bmAddress
- (undocumented)public void removeOutputsOnHost(String host)
host
- (undocumented)public void removeOutputsOnExecutor(String execId)
execId
- (undocumented)public void removeOutputsByFilter(scala.Function1<BlockManagerId,Object> f)
f
- (undocumented)public int numAvailableOutputs()
public scala.collection.Seq<Object> findMissingPartitions()
public byte[] serializedMapStatus(org.apache.spark.broadcast.BroadcastManager broadcastManager, boolean isLocal, int minBroadcastSize)
MapOutputTracker.serializeMapStatuses()
for more details on the serialization format.
This method is designed to be called multiple times and implements caching in order to speed up subsequent requests. If the cache is empty and multiple threads concurrently attempt to serialize the map statuses then serialization will only be performed in a single thread and all other threads will block until the cache is populated.
broadcastManager
- (undocumented)isLocal
- (undocumented)minBroadcastSize
- (undocumented)public boolean hasCachedSerializedBroadcast()
public <T> T withMapStatuses(scala.Function1<org.apache.spark.scheduler.MapStatus[],T> f)
f
- (undocumented)public void invalidateSerializedMapOutputStatusCache()