public class TaskSchedulerImpl extends Object implements TaskScheduler, Logging
Clients should first call initialize() and start(), then submit task sets through the submitTasks method.
THREADING: SchedulerBackends and task-submitting clients can call this class from multiple threads, so it needs locks in public API methods to maintain its state. In addition, some SchedulerBackends synchronize on themselves when they want to send events here, and then acquire a lock on us, so we need to make sure that we don't try to lock the backend while we are holding a lock on ourselves.
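A minimal lifecycle sketch in Scala (illustrative only: the LocalBackend wiring and its constructor arguments are assumptions here, and in a real application SparkContext constructs and owns the scheduler itself):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.TaskSchedulerImpl
import org.apache.spark.scheduler.local.LocalBackend

// Hypothetical wiring; normally SparkContext does this internally when it starts.
val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("demo"))
val scheduler = new TaskSchedulerImpl(sc)
val backend = new LocalBackend(scheduler, /* totalCores = */ 1) // assumed constructor

scheduler.initialize(backend) // must come before start()
scheduler.start()
// scheduler.submitTasks(taskSet) -- task sets normally arrive from the DAGScheduler
scheduler.stop()
```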
| Constructor and Description |
|---|
| `TaskSchedulerImpl(SparkContext sc)` |
| `TaskSchedulerImpl(SparkContext sc, int maxTaskFailures, boolean isLocal)` |
| Modifier and Type | Method and Description |
|---|---|
| `scala.collection.mutable.HashSet<String>` | `activeExecutorIds()` |
| `scala.collection.mutable.HashMap<String,TaskSetManager>` | `activeTaskSets()` |
| `String` | `applicationId()` Get an application ID associated with the job. |
| `SchedulerBackend` | `backend()` |
| `void` | `cancelTasks(int stageId, boolean interruptThread)` |
| `void` | `checkSpeculatableTasks()` |
| `SparkConf` | `conf()` |
| `int` | `CPUS_PER_TASK()` |
| `TaskSetManager` | `createTaskSetManager(TaskSet taskSet, int maxTaskFailures)` |
| `DAGScheduler` | `dagScheduler()` |
| `int` | `defaultParallelism()` |
| `void` | `error(String message)` |
| `void` | `executorAdded(String execId, String host)` |
| `boolean` | `executorHeartbeatReceived(String execId, scala.Tuple2<Object,org.apache.spark.executor.TaskMetrics>[] taskMetrics, BlockManagerId blockManagerId)` Update metrics for in-progress tasks and let the master know that the BlockManager is still alive. |
| `void` | `executorLost(String executorId, ExecutorLossReason reason)` |
| `scala.Option<scala.collection.immutable.Set<String>>` | `getExecutorsAliveOnHost(String host)` |
| `scala.Option<String>` | `getRackForHost(String value)` |
| `void` | `handleFailedTask(TaskSetManager taskSetManager, long tid, scala.Enumeration.Value taskState, TaskEndReason reason)` |
| `void` | `handleSuccessfulTask(TaskSetManager taskSetManager, long tid, DirectTaskResult<?> taskResult)` |
| `void` | `handleTaskGettingResult(TaskSetManager taskSetManager, long tid)` |
| `boolean` | `hasExecutorsAliveOnHost(String host)` |
| `boolean` | `hasHostAliveOnRack(String rack)` |
| `void` | `initialize(SchedulerBackend backend)` |
| `boolean` | `isExecutorAlive(String execId)` |
| `MapOutputTracker` | `mapOutputTracker()` |
| `int` | `maxTaskFailures()` |
| `long` | `newTaskId()` |
| `java.util.concurrent.atomic.AtomicLong` | `nextTaskId()` |
| `void` | `postStartHook()` |
| `static <K,T> scala.collection.immutable.List<T>` | `prioritizeContainers(scala.collection.mutable.HashMap<K,scala.collection.mutable.ArrayBuffer<T>> map)` Used to balance containers across hosts. |
| `scala.collection.Seq<scala.collection.Seq<TaskDescription>>` | `resourceOffers(scala.collection.Seq<WorkerOffer> offers)` Called by cluster manager to offer resources on slaves. |
| `Pool` | `rootPool()` |
| `SparkContext` | `sc()` |
| `SchedulableBuilder` | `schedulableBuilder()` |
| `scala.Enumeration.Value` | `schedulingMode()` |
| `void` | `setDAGScheduler(DAGScheduler dagScheduler)` |
| `long` | `SPECULATION_INTERVAL()` |
| `void` | `start()` |
| `long` | `STARVATION_TIMEOUT()` |
| `void` | `statusUpdate(long tid, scala.Enumeration.Value state, java.nio.ByteBuffer serializedData)` |
| `void` | `stop()` |
| `void` | `submitTasks(TaskSet taskSet)` |
| `scala.collection.mutable.HashMap<Object,String>` | `taskIdToExecutorId()` |
| `scala.collection.mutable.HashMap<Object,String>` | `taskIdToTaskSetId()` |
| `TaskResultGetter` | `taskResultGetter()` |
| `void` | `taskSetFinished(TaskSetManager manager)` Called to indicate that all task attempts (including speculated tasks) associated with the given TaskSetManager have completed, so state associated with the TaskSetManager should be cleaned up. |
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.scheduler.TaskScheduler: appId
Methods inherited from interface org.apache.spark.Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public TaskSchedulerImpl(SparkContext sc, int maxTaskFailures, boolean isLocal)
public TaskSchedulerImpl(SparkContext sc)
public static <K,T> scala.collection.immutable.List<T> prioritizeContainers(scala.collection.mutable.HashMap<K,scala.collection.mutable.ArrayBuffer<T>> map)
Accepts a map of hosts to resource offers for that host, and returns a prioritized list of resource offers representing the order in which the offers should be used. The resource offers are ordered such that we'll allocate one container on each host before allocating a second container on any host, and so on, in order to reduce the damage if a host fails.
For example, given <h1, [o1, o2, o3]>, <h2, [o4]>, <h3, [o5, o6]>, returns [o1, o5, o4, o2, o6, o3].
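The ordering is a plain round-robin over hosts. A standalone sketch of the idea (an illustrative reimplementation, not the class's actual code):

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

// Take the i-th offer from every host's queue before taking the (i+1)-th
// from any host, so containers spread evenly across hosts.
def roundRobin[K, T](map: HashMap[K, ArrayBuffer[T]]): List[T] = {
  val queues = map.values.toSeq
  val longest = if (queues.isEmpty) 0 else queues.map(_.size).max
  val out = new ArrayBuffer[T]
  for (i <- 0 until longest; q <- queues if i < q.size) out += q(i)
  out.toList
}

// roundRobin(HashMap("h1" -> ArrayBuffer(1, 2, 3), "h2" -> ArrayBuffer(4)))
// yields List(1, 4, 2, 3) (up to HashMap iteration order within each pass)
```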
public SparkContext sc()
public int maxTaskFailures()
public SparkConf conf()
public long SPECULATION_INTERVAL()
public long STARVATION_TIMEOUT()
public int CPUS_PER_TASK()
public scala.collection.mutable.HashMap<String,TaskSetManager> activeTaskSets()
public scala.collection.mutable.HashMap<Object,String> taskIdToTaskSetId()
public scala.collection.mutable.HashMap<Object,String> taskIdToExecutorId()
public java.util.concurrent.atomic.AtomicLong nextTaskId()
public scala.collection.mutable.HashSet<String> activeExecutorIds()
public DAGScheduler dagScheduler()
public SchedulerBackend backend()
public MapOutputTracker mapOutputTracker()
public SchedulableBuilder schedulableBuilder()
public Pool rootPool()
Specified by: rootPool in interface TaskScheduler
public scala.Enumeration.Value schedulingMode()
Specified by: schedulingMode in interface TaskScheduler
public TaskResultGetter taskResultGetter()
public void setDAGScheduler(DAGScheduler dagScheduler)
Specified by: setDAGScheduler in interface TaskScheduler
public void initialize(SchedulerBackend backend)
public long newTaskId()
public void start()
Specified by: start in interface TaskScheduler
public void postStartHook()
Specified by: postStartHook in interface TaskScheduler
public void submitTasks(TaskSet taskSet)
Specified by: submitTasks in interface TaskScheduler
public TaskSetManager createTaskSetManager(TaskSet taskSet, int maxTaskFailures)
public void cancelTasks(int stageId, boolean interruptThread)
Specified by: cancelTasks in interface TaskScheduler
public void taskSetFinished(TaskSetManager manager)
public scala.collection.Seq<scala.collection.Seq<TaskDescription>> resourceOffers(scala.collection.Seq<WorkerOffer> offers)
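A hedged sketch of how a SchedulerBackend might drive this method, reusing the `scheduler` from the earlier lifecycle sketch (the offers and the launchTask hook are illustrative; real backends build WorkerOffers from their registered executors):

```scala
import org.apache.spark.scheduler.{TaskDescription, WorkerOffer}

// Illustrative only: offer free cores on two executors, then launch
// whatever TaskDescriptions the scheduler hands back.
val offers = Seq(
  WorkerOffer("exec-1", "host-a", 4),
  WorkerOffer("exec-2", "host-b", 4)
)
val tasks: Seq[Seq[TaskDescription]] = scheduler.resourceOffers(offers)
for (taskSeq <- tasks; task <- taskSeq) {
  // launchTask(task) -- hypothetical backend hook that ships the task to its executor
}
```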
public void statusUpdate(long tid, scala.Enumeration.Value state, java.nio.ByteBuffer serializedData)
public boolean executorHeartbeatReceived(String execId, scala.Tuple2<Object,org.apache.spark.executor.TaskMetrics>[] taskMetrics, BlockManagerId blockManagerId)
Specified by: executorHeartbeatReceived in interface TaskScheduler
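In the Scala source the taskMetrics parameter is an Array[(Long, TaskMetrics)]; the Object in the Java signature above is just type erasure of the task ID. A sketch of the driver-side heartbeat path, with the handler itself as an assumption:

```scala
import org.apache.spark.executor.TaskMetrics
import org.apache.spark.storage.BlockManagerId

// Hypothetical heartbeat handler: forward per-task metrics to the scheduler.
// A false return means the scheduler no longer knows this executor, so the
// caller would typically ask its BlockManager to re-register.
def onHeartbeat(execId: String,
                taskMetrics: Array[(Long, TaskMetrics)],
                blockManagerId: BlockManagerId): Boolean =
  scheduler.executorHeartbeatReceived(execId, taskMetrics, blockManagerId)
```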
public void handleTaskGettingResult(TaskSetManager taskSetManager, long tid)
public void handleSuccessfulTask(TaskSetManager taskSetManager, long tid, DirectTaskResult<?> taskResult)
public void handleFailedTask(TaskSetManager taskSetManager, long tid, scala.Enumeration.Value taskState, TaskEndReason reason)
public void error(String message)
public void stop()
Specified by: stop in interface TaskScheduler
public int defaultParallelism()
Specified by: defaultParallelism in interface TaskScheduler
public void checkSpeculatableTasks()
public void executorLost(String executorId, ExecutorLossReason reason)
public void executorAdded(String execId, String host)
public scala.Option<scala.collection.immutable.Set<String>> getExecutorsAliveOnHost(String host)
public boolean hasExecutorsAliveOnHost(String host)
public boolean hasHostAliveOnRack(String rack)
public boolean isExecutorAlive(String execId)
public scala.Option<String> getRackForHost(String value)
public String applicationId()
Description copied from interface: TaskScheduler
Get an application ID associated with the job.
Specified by: applicationId in interface TaskScheduler