org.apache.spark
Class Partitioner

java.lang.Object
    org.apache.spark.Partitioner
public abstract class Partitioner
An object that defines how the elements in a key-value pair RDD are partitioned by key. Maps each key to a partition ID, from 0 to numPartitions - 1.
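For illustration, here is a minimal sketch of a custom Partitioner in Scala (the class name ModuloPartitioner and its hashing scheme are assumptions for the example, not part of this API):

```scala
import org.apache.spark.Partitioner

// Hypothetical example: a Partitioner that assigns keys to partitions
// by a non-negative modulo of their hash code.
class ModuloPartitioner(partitions: Int) extends Partitioner {
  require(partitions > 0, "number of partitions must be positive")

  // Total number of partitions; getPartition must return an ID in
  // [0, numPartitions - 1].
  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = key match {
    case null => 0
    // (x % n + n) % n keeps the result non-negative even when
    // hashCode is negative (math.abs would overflow at Int.MinValue).
    case k    => ((k.hashCode % partitions) + partitions) % partitions
  }

  // Defining equality lets Spark detect identically partitioned RDDs
  // and skip unnecessary shuffles.
  override def equals(other: Any): Boolean = other match {
    case p: ModuloPartitioner => p.numPartitions == numPartitions
    case _                    => false
  }

  override def hashCode: Int = numPartitions
}
```

A pair RDD could then be repartitioned with, for example, pairRdd.partitionBy(new ModuloPartitioner(8)), where pairRdd is an assumed RDD of key-value tuples.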
Constructor Summary

| Constructor and Description |
|---|
| Partitioner() |
Method Summary

| Modifier and Type | Method and Description |
|---|---|
| static Partitioner | defaultPartitioner(RDD<?> rdd, scala.collection.Seq<RDD<?>> others) Choose a partitioner to use for a cogroup-like operation between a number of RDDs. |
| abstract int | getPartition(Object key) |
| abstract int | numPartitions() |
Methods inherited from class java.lang.Object

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

Partitioner

public Partitioner()
Method Detail

defaultPartitioner

public static Partitioner defaultPartitioner(RDD<?> rdd, scala.collection.Seq<RDD<?>> others)

Choose a partitioner to use for a cogroup-like operation between a number of RDDs.

If any of the RDDs already has a partitioner, choose that one. Otherwise, a default HashPartitioner is used. For its number of partitions: if spark.default.parallelism is set, the value of SparkContext.defaultParallelism is used; otherwise, the maximum number of partitions among the upstream RDDs is used.

Unless spark.default.parallelism is set, the number of partitions will be the same as the number of partitions in the largest upstream RDD, as this is the choice least likely to cause out-of-memory errors.

The method takes two parameters (rdd and others) to enforce that callers pass at least one RDD.

Parameters:
rdd - (undocumented)
others - (undocumented)
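As a hedged usage sketch (the SparkContext setup and variable names here are assumptions; in the Scala API, others is a varargs parameter, which the Java signature above renders as scala.collection.Seq):

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkContext}

val sc = new SparkContext("local[*]", "partitioner-demo")

// Two pair RDDs: 'a' has no partitioner; 'b' was explicitly hash-partitioned.
val a = sc.parallelize(Seq(("x", 1), ("y", 2)), numSlices = 4)
val b = sc.parallelize(Seq(("x", 9)), numSlices = 2)
  .partitionBy(new HashPartitioner(8))

// b already has a partitioner, so it is chosen. If neither RDD had one,
// a HashPartitioner would be created, sized from spark.default.parallelism
// when set, or from the largest upstream partition count (here, 4).
val chosen: Partitioner = Partitioner.defaultPartitioner(a, b)
```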
numPartitions

public abstract int numPartitions()

getPartition

public abstract int getPartition(Object key)