public class DAGScheduler extends Object implements Logging
In addition to coming up with a DAG of stages, this class also determines the preferred locations to run each task on, based on the current cache status, and passes these to the low-level TaskScheduler. Furthermore, it handles failures due to shuffle output files being lost, in which case old stages may need to be resubmitted. Failures *within* a stage that are not caused by shuffle file loss are handled by the TaskScheduler, which will retry each task a small number of times before cancelling the whole stage.
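As a hedged illustration of this flow: user code never constructs a DAGScheduler directly. Calling an action on an RDD goes through SparkContext, which forwards the job to this class; each shuffle dependency becomes a stage boundary in the DAG. A minimal sketch (app name, master, and input path are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DagSchedulerFlow {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("dag-demo").setMaster("local[2]"))

    // reduceByKey introduces a shuffle dependency, so the DAGScheduler
    // splits this job into two stages: a shuffle map stage for the map
    // side and a result stage that computes the final count.
    val counts = sc.textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)   // shuffle boundary: a new stage starts here

    // count() is an action: SparkContext.runJob hands the job to the
    // DAGScheduler, which submits each stage as a TaskSet to the
    // underlying TaskScheduler.
    println(counts.count())
    sc.stop()
  }
}
```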
| Constructor and Description |
|---|
| DAGScheduler(SparkContext sc) |
| DAGScheduler(SparkContext sc, TaskScheduler taskScheduler) |
| DAGScheduler(SparkContext sc, TaskScheduler taskScheduler, LiveListenerBus listenerBus, MapOutputTrackerMaster mapOutputTracker, BlockManagerMaster blockManagerMaster, SparkEnv env, Clock clock) |
| Modifier and Type | Method and Description |
|---|---|
| void | abortStage(Stage failedStage, String reason): Aborts all jobs depending on a particular Stage. |
| scala.collection.mutable.HashSet<ActiveJob> | activeJobs() |
| void | cancelAllJobs(): Cancel all jobs that are running or waiting in the queue. |
| void | cancelJob(int jobId): Cancel a job that is running or waiting in the queue. |
| void | cancelJobGroup(String groupId) |
| void | cancelStage(int stageId): Cancel all jobs associated with a running or scheduled stage. |
| void | cleanUpAfterSchedulerStop() |
| void | doCancelAllJobs() |
| akka.actor.ActorRef | eventProcessActor() |
| void | executorAdded(String execId, String host) |
| boolean | executorHeartbeatReceived(String execId, scala.Tuple4<Object,Object,Object,org.apache.spark.executor.TaskMetrics>[] taskMetrics, BlockManagerId blockManagerId): Update metrics for in-progress tasks and let the master know that the BlockManager is still alive. |
| void | executorLost(String execId) |
| scala.collection.mutable.HashSet<Stage> | failedStages() |
| scala.collection.Seq<TaskLocation> | getPreferredLocs(RDD<?> rdd, int partition): Synchronized method that might be called from other threads. |
| void | handleBeginEvent(Task<?> task, TaskInfo taskInfo) |
| void | handleExecutorAdded(String execId, String host) |
| void | handleExecutorLost(String execId, boolean fetchFailed, scala.Option<Object> maybeEpoch): Responds to an executor being lost. |
| void | handleGetTaskResult(TaskInfo taskInfo) |
| void | handleJobCancellation(int jobId, String reason) |
| void | handleJobGroupCancelled(String groupId) |
| void | handleJobSubmitted(int jobId, RDD<?> finalRDD, scala.Function2<TaskContext,scala.collection.Iterator<Object>,?> func, int[] partitions, boolean allowLocal, CallSite callSite, JobListener listener, java.util.Properties properties) |
| void | handleStageCancellation(int stageId) |
| void | handleTaskCompletion(CompletionEvent event): Responds to a task finishing. |
| void | handleTaskSetFailed(TaskSet taskSet, String reason) |
| scala.collection.mutable.HashMap<Object,ActiveJob> | jobIdToActiveJob() |
| scala.collection.mutable.HashMap<Object,scala.collection.mutable.HashSet<Object>> | jobIdToStageIds() |
| java.util.concurrent.atomic.AtomicInteger | nextJobId() |
| int | numTotalJobs() |
| static long | POLL_TIMEOUT() |
| static scala.concurrent.duration.FiniteDuration | RESUBMIT_TIMEOUT() |
| void | resubmitFailedStages(): Resubmit any failed stages. |
| <T,U,R> PartialResult<R> | runApproximateJob(RDD<T> rdd, scala.Function2<TaskContext,scala.collection.Iterator<T>,U> func, ApproximateEvaluator<U,R> evaluator, CallSite callSite, long timeout, java.util.Properties properties) |
| <T,U> void | runJob(RDD<T> rdd, scala.Function2<TaskContext,scala.collection.Iterator<T>,U> func, scala.collection.Seq<Object> partitions, CallSite callSite, boolean allowLocal, scala.Function2<Object,U,scala.runtime.BoxedUnit> resultHandler, java.util.Properties properties, scala.reflect.ClassTag<U> evidence$1) |
| scala.collection.mutable.HashSet<Stage> | runningStages() |
| SparkContext | sc() |
| scala.collection.mutable.HashMap<Object,Stage> | shuffleToMapStage() |
| scala.collection.mutable.HashMap<Object,Stage> | stageIdToStage() |
| void | stop() |
| <T,U> JobWaiter<U> | submitJob(RDD<T> rdd, scala.Function2<TaskContext,scala.collection.Iterator<T>,U> func, scala.collection.Seq<Object> partitions, CallSite callSite, boolean allowLocal, scala.Function2<Object,U,scala.runtime.BoxedUnit> resultHandler, java.util.Properties properties): Submit a job to the job scheduler and get a JobWaiter object back. |
| void | taskEnded(Task<?> task, TaskEndReason reason, Object result, scala.collection.mutable.Map<Object,Object> accumUpdates, TaskInfo taskInfo, org.apache.spark.executor.TaskMetrics taskMetrics) |
| void | taskGettingResult(TaskInfo taskInfo) |
| TaskScheduler | taskScheduler() |
| void | taskSetFailed(TaskSet taskSet, String reason) |
| void | taskStarted(Task<?> task, TaskInfo taskInfo) |
| scala.collection.mutable.HashSet<Stage> | waitingStages() |
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public DAGScheduler(SparkContext sc, TaskScheduler taskScheduler, LiveListenerBus listenerBus, MapOutputTrackerMaster mapOutputTracker, BlockManagerMaster blockManagerMaster, SparkEnv env, Clock clock)
public DAGScheduler(SparkContext sc, TaskScheduler taskScheduler)
public DAGScheduler(SparkContext sc)
public static scala.concurrent.duration.FiniteDuration RESUBMIT_TIMEOUT()
public static long POLL_TIMEOUT()
public SparkContext sc()
public TaskScheduler taskScheduler()
public java.util.concurrent.atomic.AtomicInteger nextJobId()
public int numTotalJobs()
public scala.collection.mutable.HashMap<Object,scala.collection.mutable.HashSet<Object>> jobIdToStageIds()
public scala.collection.mutable.HashMap<Object,Stage> stageIdToStage()
public scala.collection.mutable.HashMap<Object,Stage> shuffleToMapStage()
public scala.collection.mutable.HashMap<Object,ActiveJob> jobIdToActiveJob()
public scala.collection.mutable.HashSet<Stage> waitingStages()
public scala.collection.mutable.HashSet<Stage> runningStages()
public scala.collection.mutable.HashSet<Stage> failedStages()
public scala.collection.mutable.HashSet<ActiveJob> activeJobs()
public akka.actor.ActorRef eventProcessActor()
public void taskGettingResult(TaskInfo taskInfo)
public void taskEnded(Task<?> task, TaskEndReason reason, Object result, scala.collection.mutable.Map<Object,Object> accumUpdates, TaskInfo taskInfo, org.apache.spark.executor.TaskMetrics taskMetrics)
public boolean executorHeartbeatReceived(String execId, scala.Tuple4<Object,Object,Object,org.apache.spark.executor.TaskMetrics>[] taskMetrics, BlockManagerId blockManagerId)
public void executorLost(String execId)
public void executorAdded(String execId, String host)
public void taskSetFailed(TaskSet taskSet, String reason)
public <T,U> JobWaiter<U> submitJob(RDD<T> rdd, scala.Function2<TaskContext,scala.collection.Iterator<T>,U> func, scala.collection.Seq<Object> partitions, CallSite callSite, boolean allowLocal, scala.Function2<Object,U,scala.runtime.BoxedUnit> resultHandler, java.util.Properties properties)
public <T,U> void runJob(RDD<T> rdd, scala.Function2<TaskContext,scala.collection.Iterator<T>,U> func, scala.collection.Seq<Object> partitions, CallSite callSite, boolean allowLocal, scala.Function2<Object,U,scala.runtime.BoxedUnit> resultHandler, java.util.Properties properties, scala.reflect.ClassTag<U> evidence$1)
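runJob is the blocking entry point: it submits the job and waits for it to finish, whereas submitJob (above) returns immediately with a JobWaiter. A hedged user-side sketch using the public SparkContext.runJob overload, which ultimately delegates here (master and data are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("runjob-demo").setMaster("local[2]"))
val rdd = sc.parallelize(Seq("a", "bb", "ccc"), 3)

// One task is launched per partition; the function is applied to each
// partition's iterator and the per-partition results are collected.
val lengths: Array[Int] =
  sc.runJob(rdd, (it: Iterator[String]) => it.map(_.length).sum)
println(lengths.mkString(", "))   // e.g. "1, 2, 3" with 3 partitions

sc.stop()
```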
public <T,U,R> PartialResult<R> runApproximateJob(RDD<T> rdd, scala.Function2<TaskContext,scala.collection.Iterator<T>,U> func, ApproximateEvaluator<U,R> evaluator, CallSite callSite, long timeout, java.util.Properties properties)
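runApproximateJob backs the approximate actions on RDDs; for example, RDD.countApprox is built on it in this Spark lineage and returns a PartialResult. A hedged sketch (timeout and confidence values are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("approx-demo").setMaster("local[2]"))
val data = sc.parallelize(1L to 1000000L, 8)

// If all tasks finish within the timeout the result is exact; otherwise
// a bounded estimate from the finished partitions is returned.
val partial = data.countApprox(timeout = 200L, confidence = 0.95)
println(s"estimate so far: ${partial.initialValue}")
println(s"final value: ${partial.getFinalValue()}")  // blocks until complete

sc.stop()
```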
public void cancelJob(int jobId)
public void cancelJobGroup(String groupId)
public void cancelAllJobs()
public void doCancelAllJobs()
public void cancelStage(int stageId)
public void resubmitFailedStages()
public void handleJobGroupCancelled(String groupId)
public void handleTaskSetFailed(TaskSet taskSet, String reason)
public void cleanUpAfterSchedulerStop()
public void handleGetTaskResult(TaskInfo taskInfo)
public void handleJobSubmitted(int jobId, RDD<?> finalRDD, scala.Function2<TaskContext,scala.collection.Iterator<Object>,?> func, int[] partitions, boolean allowLocal, CallSite callSite, JobListener listener, java.util.Properties properties)
public void handleTaskCompletion(CompletionEvent event)
public void handleExecutorLost(String execId, boolean fetchFailed, scala.Option<Object> maybeEpoch)
Responds to an executor being lost. Shuffle blocks associated with the executor are also assumed lost if the executor serves its own blocks (i.e., external shuffle is not in use) or if a FetchFailed occurred; in either case all shuffle data related to this executor is presumed lost.
Optionally, the epoch during which the failure was observed can be passed, so that stray fetch failures do not re-trigger the detection of a node as lost.
public void handleExecutorAdded(String execId, String host)
public void handleStageCancellation(int stageId)
public void handleJobCancellation(int jobId, String reason)
public void abortStage(Stage failedStage, String reason)
public scala.collection.Seq<TaskLocation> getPreferredLocs(RDD<?> rdd, int partition)
Synchronized method that might be called from other threads.
Parameters:
rdd - the RDD whose partitions are to be looked at
partition - the partition to look up locality information for

public void stop()
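As a hedged sketch of how getPreferredLocs might be used from scheduler-side code: the helper below is hypothetical (DAGScheduler is internal to Spark and normally reached only through SparkContext), but it shows the per-partition lookup the method performs against the current cache status:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.scheduler.{DAGScheduler, TaskLocation}

// Hypothetical helper: print the preferred locations for every
// partition of an RDD. Access to a DAGScheduler instance is assumed.
def printPreferredLocations(scheduler: DAGScheduler, rdd: RDD[_]): Unit = {
  for (p <- 0 until rdd.partitions.length) {
    val locs: Seq[TaskLocation] = scheduler.getPreferredLocs(rdd, p)
    val shown = if (locs.isEmpty) "no preference" else locs.mkString(", ")
    println(s"partition $p -> $shown")
  }
}
```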