public class NewHadoopRDD<K,V> extends RDD<scala.Tuple2<K,V>> implements SparkHadoopMapReduceUtil, Logging
:: DeveloperApi ::
An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the new MapReduce API (org.apache.hadoop.mapreduce).
Note: Instantiating this class directly is not recommended; use org.apache.spark.SparkContext.newAPIHadoopRDD() instead.
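As a rough sketch of the recommended route: the new-API input formats take their input path from the Hadoop Configuration, and the master, configuration key, and path below are illustrative assumptions, not part of this API's contract.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical local setup; in a real job the master and input location
// usually come from spark-submit and the deployment environment.
val sc = new SparkContext(new SparkConf().setAppName("new-api-example").setMaster("local[*]"))

// The new-API FileInputFormat reads its input path from the Configuration.
val conf = new Configuration()
conf.set("mapreduce.input.fileinputformat.inputdir", "hdfs:///tmp/input") // illustrative path

// Build the RDD through SparkContext rather than instantiating NewHadoopRDD directly.
val records = sc.newAPIHadoopRDD(
  conf,
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text])

// Hadoop reuses Writable instances, so copy values out (toString) before holding on to them.
records.map(_._2.toString).take(5).foreach(println)
```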
Nested Class Summary

Modifier and Type | Class and Description |
---|---|
static class | NewHadoopRDD.NewHadoopMapPartitionsWithSplitRDD<U,T>: Analogous to MapPartitionsRDD, but passes in an InputSplit to the given function rather than the index of the partition. |
static class | NewHadoopRDD.NewHadoopMapPartitionsWithSplitRDD$ |
Constructor Summary

Constructor and Description |
---|
NewHadoopRDD(SparkContext sc, Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, org.apache.hadoop.conf.Configuration conf) |
Method Summary

Modifier and Type | Method and Description |
---|---|
InterruptibleIterator<scala.Tuple2<K,V>> | compute(Partition theSplit, TaskContext context) :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
org.apache.hadoop.conf.Configuration | getConf() |
Partition[] | getPartitions(): Implemented by subclasses to return the set of partitions in this RDD. |
scala.collection.Seq<String> | getPreferredLocations(Partition hsplit): Optionally overridden by subclasses to specify placement preferences. |
<U> RDD<U> | mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$1): Maps over a partition, providing the InputSplit that was used as the base of the partition. |
Methods inherited from class RDD
aggregate, cache, cartesian, checkpoint, checkpointData, coalesce, collect, collect, collectPartitions, computeOrReadCheckpoint, conf, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doCheckpoint, elementClassTag, filter, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getCreationSite, getNarrowAncestors, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, markCheckpointed, max, min, name, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, reduce, repartition, retag, retag, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId

Methods inherited from interface SparkHadoopMapReduceUtil
firstAvailableClass, newJobContext, newTaskAttemptContext, newTaskAttemptID

Methods inherited from interface Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
Constructor Detail

public NewHadoopRDD(SparkContext sc, Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, org.apache.hadoop.conf.Configuration conf)
Method Detail

public Partition[] getPartitions()
Specified by: getPartitions in class RDD
public InterruptibleIterator<scala.Tuple2<K,V>> compute(Partition theSplit, TaskContext context)
Specified by: compute in class RDD
public <U> RDD<U> mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$1)
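A minimal sketch of calling this method, e.g. to tag every record with the file it was read from. It assumes the RDD returned by SparkContext.newAPIHadoopRDD() is backed by a NewHadoopRDD (so the cast succeeds) and reuses the hypothetical sc and conf from the earlier sketch.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.InputSplit
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, TextInputFormat}
import org.apache.spark.rdd.{NewHadoopRDD, RDD}

// `sc` and `conf` as in the earlier sketch; the cast below is an assumption
// about the concrete RDD type that newAPIHadoopRDD returns.
val nhRdd = sc.newAPIHadoopRDD(conf, classOf[TextInputFormat],
    classOf[LongWritable], classOf[Text])
  .asInstanceOf[NewHadoopRDD[LongWritable, Text]]

// Tag each record with the file it came from, read off the partition's InputSplit
// (a FileSplit for file-based input formats such as TextInputFormat).
val linesWithFile: RDD[(String, String)] = nhRdd.mapPartitionsWithInputSplit(
  (split: InputSplit, iter: Iterator[(LongWritable, Text)]) => {
    val file = split.asInstanceOf[FileSplit].getPath.toString
    iter.map { case (_, text) => (file, text.toString) }
  },
  preservesPartitioning = false)
```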
public scala.collection.Seq<String> getPreferredLocations(Partition hsplit)
Overrides: getPreferredLocations in class RDD
public org.apache.hadoop.conf.Configuration getConf()
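As a small illustrative use, continuing the hypothetical nhRdd from the sketch above, getConf() exposes the Hadoop Configuration the RDD was constructed with:

```scala
// Hypothetical: read back a setting from the Configuration carried by the RDD.
val inputDir = nhRdd.getConf.get("mapreduce.input.fileinputformat.inputdir")
println(s"NewHadoopRDD reads from: $inputDir")
```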