CoGroupedRDD (Spark 1.4.1 JavaDoc)

org.apache.spark.rdd
Class CoGroupedRDD<K>

Object
  org.apache.spark.rdd.RDD<scala.Tuple2<K,scala.collection.Iterable<Object>[]>>
      org.apache.spark.rdd.CoGroupedRDD<K>

All Implemented Interfaces: java.io.Serializable, Logging

public class CoGroupedRDD<K>
extends RDD<scala.Tuple2<K,scala.collection.Iterable<Object>[]>>

:: DeveloperApi :: An RDD that cogroups its parents. For each key k in the parent RDDs, the resulting RDD contains a tuple with the list of values for that key.

Note: This is an internal API. We recommend users use RDD.cogroup(...) instead of instantiating this directly.
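
The grouping described above can be modelled with plain Java collections. The sketch below is not Spark code and does not use any Spark API: the class `CoGroupSketch` and its methods `pair` and `cogroup` are hypothetical names introduced only for illustration. It assumes each parent is an in-memory list of (key, value) pairs and that a parent with no values for a key contributes an empty list, which mirrors the `Iterable<Object>[]` element type shown in the class signature.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of cogroup semantics; illustrative only, not Spark code. */
final class CoGroupSketch {

    /** Hypothetical helper: builds one immutable (key, value) pair. */
    static <K> Map.Entry<K, Object> pair(K key, Object value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    /**
     * Cogroups several parents, each a list of (key, value) pairs.
     * For every key seen in any parent, the result holds one list of values
     * per parent, in parent order. A parent that lacks the key contributes
     * an empty list.
     */
    static <K> Map<K, List<List<Object>>> cogroup(List<List<Map.Entry<K, Object>>> parents) {
        Map<K, List<List<Object>>> result = new LinkedHashMap<>();
        for (int i = 0; i < parents.size(); i++) {
            for (Map.Entry<K, Object> p : parents.get(i)) {
                List<List<Object>> groups = result.get(p.getKey());
                if (groups == null) {
                    // First time this key is seen: allocate one slot per parent.
                    groups = new ArrayList<>();
                    for (int j = 0; j < parents.size(); j++) {
                        groups.add(new ArrayList<>());
                    }
                    result.put(p.getKey(), groups);
                }
                // Append the value to the slot that belongs to parent i.
                groups.get(i).add(p.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> left =
                Arrays.asList(pair("a", 1), pair("b", 2), pair("a", 3));
        List<Map.Entry<String, Object>> right =
                Arrays.asList(pair("a", "x"), pair("c", "y"));

        // prints {a=[[1, 3], [x]], b=[[2], []], c=[[], [y]]}
        System.out.println(cogroup(Arrays.asList(left, right)));
    }
}
```

In Spark itself the parents are distributed and the grouping happens across a shuffle, so this single-threaded model only conveys the shape of the result, not how it is computed.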

Parameters:
rdds - parent RDDs
part - partitioner used to partition the shuffle output
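
To make the role of the partitioner concrete, here is a minimal sketch of hash-based key partitioning, one common strategy. It is plain Java, not the Spark `Partitioner` class; `HashPartitionSketch` and `partitionFor` are hypothetical names, and sending a null key to partition 0 is an assumption of this sketch. The point it illustrates is that equal keys from every parent map to the same partition index, which is what lets their values be grouped together.

```java
/** Plain-Java model of hash partitioning; illustrative only, not Spark code. */
final class HashPartitionSketch {

    /**
     * Maps a key to a partition index in [0, numPartitions).
     * Math.floorMod keeps the result non-negative even when
     * hashCode() is negative.
     */
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0; // assumption of this sketch: null keys go to partition 0
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // The same key always yields the same partition index.
        System.out.println(partitionFor("a", 4) == partitionFor("a", 4)); // prints true
        // A negative hash code still yields a valid index.
        System.out.println(partitionFor(-7, 4)); // prints 1
    }
}
```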

See Also: Serialized Form

Constructor Summary
`CoGroupedRDD(scala.collection.Seq<RDD<? extends scala.Product2<K,?>>> rdds, Partitioner part)`

Method Summary
`void`	`clearDependencies()` Clears the dependencies of this RDD.
`scala.collection.Iterator<scala.Tuple2<K,scala.collection.Iterable<Object>[]>>`	`compute(Partition s, TaskContext context)` :: DeveloperApi :: Implemented by subclasses to compute a given partition.
`scala.collection.Seq<Dependency<?>>`	`getDependencies()` Implemented by subclasses to return how this RDD depends on parent RDDs.
`Partition[]`	`getPartitions()` Implemented by subclasses to return the set of partitions in this RDD.
`scala.Some<Partitioner>`	`partitioner()` Optionally overridden by subclasses to specify how they are partitioned.
`scala.collection.Seq<RDD<? extends scala.Product2<K,?>>>`	`rdds()`
`CoGroupedRDD<K>`	`setSerializer(Serializer serializer)` Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer).

Methods inherited from class org.apache.spark.rdd.RDD
aggregate, cache, cartesian, checkpoint, checkpointData, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, max, min, name, numericRDDToDoubleRDDFunctions, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, scope, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeReduce, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId

Methods inherited from class Object
`equals, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Methods inherited from interface org.apache.spark.Logging
`initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning`

Constructor Detail

CoGroupedRDD

public CoGroupedRDD(scala.collection.Seq<RDD<? extends scala.Product2<K,?>>> rdds,
                    Partitioner part)

Method Detail

rdds

public scala.collection.Seq<RDD<? extends scala.Product2<K,?>>> rdds()

setSerializer

public CoGroupedRDD<K> setSerializer(Serializer serializer)

Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer).