public class NewHadoopRDD<K,V> extends RDD<scala.Tuple2<K,V>> implements Logging
An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the new MapReduce API (org.apache.hadoop.mapreduce).
Note: Instantiating this class directly is not recommended; use
org.apache.spark.SparkContext.newAPIHadoopRDD() instead.
param: sc The SparkContext to associate the RDD with.
param: inputFormatClass Storage format of the data to be read.
param: keyClass Class of the key associated with the inputFormatClass.
param: valueClass Class of the value associated with the inputFormatClass.
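A minimal sketch of the recommended construction path, assuming a local SparkContext; the application name and the HDFS input path are placeholders, not values from this page:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Local context and input path are illustrative only.
val sc = new SparkContext(
  new SparkConf().setAppName("new-hadoop-rdd-example").setMaster("local[*]"))

// With newAPIHadoopRDD the input path is supplied through the Hadoop Configuration.
val hadoopConf = new Configuration()
hadoopConf.set("mapreduce.input.fileinputformat.inputdir", "hdfs:///data/input")

// Returns an RDD[(LongWritable, Text)] backed by a NewHadoopRDD.
val rdd = sc.newAPIHadoopRDD(
  hadoopConf,
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text])

rdd.map { case (_, line) => line.toString }.take(5).foreach(println)
```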
| Constructor and Description |
|---|
| NewHadoopRDD(SparkContext sc, java.lang.Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass, java.lang.Class<K> keyClass, java.lang.Class<V> valueClass, org.apache.hadoop.conf.Configuration _conf) |
| Modifier and Type | Method and Description |
|---|---|
| InterruptibleIterator<scala.Tuple2<K,V>> | compute(Partition theSplit, TaskContext context) :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
| static java.lang.Object | CONFIGURATION_INSTANTIATION_LOCK() Configuration's constructor is not threadsafe (see SPARK-1097 and HADOOP-10456). |
| org.apache.hadoop.conf.Configuration | getConf() |
| Partition[] | getPartitions() Implemented by subclasses to return the set of partitions in this RDD. |
| scala.collection.Seq<java.lang.String> | getPreferredLocations(Partition hsplit) Optionally overridden by subclasses to specify placement preferences. |
| protected org.apache.hadoop.mapreduce.JobID | jobId() |
| <U> RDD<U> | mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$1) Maps over a partition, providing the InputSplit that was used as the base of the partition. |
| NewHadoopRDD<K,V> | persist(StorageLevel storageLevel) Set this RDD's storage level to persist its values across operations after the first time it is computed. |
Methods inherited from class org.apache.spark.rdd.RDD:
aggregate, cache, cartesian, checkpoint, checkpointData, clearDependencies, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, filterWith, first, firstParent, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getDependencies, getNumPartitions, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, localCheckpoint, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, max, min, name, numericRDDToDoubleRDDFunctions, parent, partitioner, partitions, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, scope, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeReduce, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.spark.Logging:
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public NewHadoopRDD(SparkContext sc, java.lang.Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass, java.lang.Class<K> keyClass, java.lang.Class<V> valueClass, org.apache.hadoop.conf.Configuration _conf)
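For completeness, a hedged sketch of direct instantiation mapping onto the constructor parameters above (again, sc.newAPIHadoopRDD is the recommended route); it reuses the sc from the earlier sketch, and the input directory is a placeholder:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.rdd.NewHadoopRDD

// Placeholder configuration pointing at an assumed input directory.
val directConf = new Configuration()
directConf.set("mapreduce.input.fileinputformat.inputdir", "hdfs:///data/input")

// Parameters line up with the documented constructor:
// sc, inputFormatClass, keyClass, valueClass, _conf.
val directRdd = new NewHadoopRDD[LongWritable, Text](
  sc,
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  directConf)
```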
public static java.lang.Object CONFIGURATION_INSTANTIATION_LOCK()
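The lock exists because new Configuration() touches shared static state; below is an illustrative sketch of the same guard pattern applied in user code. The lock object here is hypothetical, not the one NewHadoopRDD uses internally:

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical application-level lock mirroring the role of
// CONFIGURATION_INSTANTIATION_LOCK: serialize Configuration construction so
// concurrent threads do not race on its non-threadsafe constructor
// (SPARK-1097, HADOOP-10456).
object ConfigurationLock

def newConfiguration(): Configuration = ConfigurationLock.synchronized {
  new Configuration()
}
```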
protected org.apache.hadoop.mapreduce.JobID jobId()
public org.apache.hadoop.conf.Configuration getConf()
public Partition[] getPartitions()
Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD.
Specified by: getPartitions in class RDD<scala.Tuple2<K,V>>
public InterruptibleIterator<scala.Tuple2<K,V>> compute(Partition theSplit, TaskContext context)
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
Specified by: compute in class RDD<scala.Tuple2<K,V>>
public <U> RDD<U> mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$1)
Maps over a partition, providing the InputSplit that was used as the base of the partition.
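A hedged usage sketch, assuming the rdd created with sc.newAPIHadoopRDD in the earlier example. The first cast is needed because newAPIHadoopRDD is typed as RDD[(K, V)] even though a NewHadoopRDD backs it, and the FileSplit cast holds only for file-based InputFormats such as TextInputFormat:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.InputSplit
import org.apache.hadoop.mapreduce.lib.input.FileSplit
import org.apache.spark.rdd.NewHadoopRDD

// Recover the NewHadoopRDD type to reach mapPartitionsWithInputSplit.
val hadoopRdd = rdd.asInstanceOf[NewHadoopRDD[LongWritable, Text]]

// Tag each record with the path of the file its split came from.
val withFile = hadoopRdd.mapPartitionsWithInputSplit(
  (split: InputSplit, iter: Iterator[(LongWritable, Text)]) => {
    val path = split.asInstanceOf[FileSplit].getPath.toString
    iter.map { case (_, line) => (path, line.toString) }
  },
  preservesPartitioning = true)
```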
public scala.collection.Seq<java.lang.String> getPreferredLocations(Partition hsplit)
Description copied from class: RDD
Optionally overridden by subclasses to specify placement preferences.
Overrides: getPreferredLocations in class RDD<scala.Tuple2<K,V>>
Parameters: hsplit - (undocumented)

public NewHadoopRDD<K,V> persist(StorageLevel storageLevel)
Set this RDD's storage level to persist its values across operations after the first time it is computed.
Overrides: persist in class RDD<scala.Tuple2<K,V>>
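A short usage note, assuming the hadoopRdd from the mapPartitionsWithInputSplit sketch; MEMORY_ONLY is just one common storage level:

```scala
import org.apache.spark.storage.StorageLevel

// Keep computed records in memory after the first action materializes them.
// Caveat (general to Hadoop-backed RDDs): the RecordReader may reuse the same
// Writable instances, so copy values before caching if object identity matters.
hadoopRdd.persist(StorageLevel.MEMORY_ONLY)
hadoopRdd.count() // first action computes and caches the partitions
```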