public class NewHadoopRDD<K,V> extends RDD<scala.Tuple2<K,V>> implements Logging
An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the new MapReduce API (org.apache.hadoop.mapreduce).
Note: Instantiating this class directly is not recommended; use
org.apache.spark.SparkContext.newAPIHadoopRDD() instead.
param: sc The SparkContext to associate the RDD with.
param: inputFormatClass Storage format of the data to be read.
param: keyClass Class of the key associated with the inputFormatClass.
param: valueClass Class of the value associated with the inputFormatClass.
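A minimal sketch of the recommended construction path, assuming a local SparkContext; the application name and the HDFS input path are placeholders, not values from this page:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Local context and input path are illustrative only.
val sc = new SparkContext(
  new SparkConf().setAppName("new-hadoop-rdd-example").setMaster("local[*]"))

// With newAPIHadoopRDD the input path is supplied through the Hadoop Configuration.
val hadoopConf = new Configuration()
hadoopConf.set("mapreduce.input.fileinputformat.inputdir", "hdfs:///data/input")

// Returns an RDD[(LongWritable, Text)] backed by a NewHadoopRDD.
val rdd = sc.newAPIHadoopRDD(
  hadoopConf,
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text])

rdd.map { case (_, line) => line.toString }.take(5).foreach(println)
```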
| Constructor and Description |
|---|
| NewHadoopRDD(SparkContext sc, java.lang.Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass, java.lang.Class<K> keyClass, java.lang.Class<V> valueClass, org.apache.hadoop.conf.Configuration _conf) |
| Modifier and Type | Method and Description |
|---|---|
| InterruptibleIterator<scala.Tuple2<K,V>> | compute(Partition theSplit, TaskContext context) :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
| static java.lang.Object | CONFIGURATION_INSTANTIATION_LOCK() Configuration's constructor is not threadsafe (see SPARK-1097 and HADOOP-10456). |
| org.apache.hadoop.conf.Configuration | getConf() |
| Partition[] | getPartitions() Implemented by subclasses to return the set of partitions in this RDD. |
| scala.collection.Seq<java.lang.String> | getPreferredLocations(Partition hsplit) Optionally overridden by subclasses to specify placement preferences. |
| protected org.apache.hadoop.mapreduce.JobID | jobId() |
| <U> RDD<U> | mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$1) Maps over a partition, providing the InputSplit that was used as the base of the partition. |
| NewHadoopRDD<K,V> | persist(StorageLevel storageLevel) Set this RDD's storage level to persist its values across operations after the first time it is computed. |
Methods inherited from class org.apache.spark.rdd.RDD:
aggregate, cache, cartesian, checkpoint, checkpointData, clearDependencies, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, filterWith, first, firstParent, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getDependencies, getNumPartitions, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, localCheckpoint, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, max, min, name, numericRDDToDoubleRDDFunctions, parent, partitioner, partitions, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, scope, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeReduce, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.spark.Logging:
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public NewHadoopRDD(SparkContext sc, java.lang.Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass, java.lang.Class<K> keyClass, java.lang.Class<V> valueClass, org.apache.hadoop.conf.Configuration _conf)
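For completeness, a hedged sketch of direct instantiation mapping onto the constructor parameters above (again, sc.newAPIHadoopRDD is the recommended route); it reuses the sc from the earlier sketch, and the input directory is a placeholder:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.rdd.NewHadoopRDD

// Placeholder configuration pointing at an assumed input directory.
val directConf = new Configuration()
directConf.set("mapreduce.input.fileinputformat.inputdir", "hdfs:///data/input")

// Parameters line up with the documented constructor:
// sc, inputFormatClass, keyClass, valueClass, _conf.
val directRdd = new NewHadoopRDD[LongWritable, Text](
  sc,
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  directConf)
```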
public static java.lang.Object CONFIGURATION_INSTANTIATION_LOCK()
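The lock exists because new Configuration() touches shared static state; below is an illustrative sketch of the same guard pattern applied in user code. The lock object here is hypothetical, not the one NewHadoopRDD uses internally:

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical application-level lock mirroring the role of
// CONFIGURATION_INSTANTIATION_LOCK: serialize Configuration construction so
// concurrent threads do not race on its non-threadsafe constructor
// (SPARK-1097, HADOOP-10456).
object ConfigurationLock

def newConfiguration(): Configuration = ConfigurationLock.synchronized {
  new Configuration()
}
```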
protected org.apache.hadoop.mapreduce.JobID jobId()
public org.apache.hadoop.conf.Configuration getConf()
public Partition[] getPartitions()
Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD.
Specified by: getPartitions in class RDD<scala.Tuple2<K,V>>
public InterruptibleIterator<scala.Tuple2<K,V>> compute(Partition theSplit, TaskContext context)
:: DeveloperApi ::
Implemented by subclasses to compute a given partition.
Specified by: compute in class RDD<scala.Tuple2<K,V>>
public <U> RDD<U> mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$1)
Maps over a partition, providing the InputSplit that was used as the base of the partition.
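A hedged usage sketch, assuming the rdd created with sc.newAPIHadoopRDD in the earlier example. The first cast is needed because newAPIHadoopRDD is typed as RDD[(K, V)] even though a NewHadoopRDD backs it, and the FileSplit cast holds only for file-based InputFormats such as TextInputFormat:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.InputSplit
import org.apache.hadoop.mapreduce.lib.input.FileSplit
import org.apache.spark.rdd.NewHadoopRDD

// Recover the NewHadoopRDD type to reach mapPartitionsWithInputSplit.
val hadoopRdd = rdd.asInstanceOf[NewHadoopRDD[LongWritable, Text]]

// Tag each record with the path of the file its split came from.
val withFile = hadoopRdd.mapPartitionsWithInputSplit(
  (split: InputSplit, iter: Iterator[(LongWritable, Text)]) => {
    val path = split.asInstanceOf[FileSplit].getPath.toString
    iter.map { case (_, line) => (path, line.toString) }
  },
  preservesPartitioning = true)
```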
public scala.collection.Seq<java.lang.String> getPreferredLocations(Partition hsplit)
Description copied from class: RDD
Optionally overridden by subclasses to specify placement preferences.
Overrides: getPreferredLocations in class RDD<scala.Tuple2<K,V>>
Parameters: hsplit - (undocumented)

public NewHadoopRDD<K,V> persist(StorageLevel storageLevel)
Set this RDD's storage level to persist its values across operations after the first time it is computed.
Overrides: persist in class RDD<scala.Tuple2<K,V>>
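A short usage note, assuming the hadoopRdd from the mapPartitionsWithInputSplit sketch; MEMORY_ONLY is just one common storage level:

```scala
import org.apache.spark.storage.StorageLevel

// Keep computed records in memory after the first action materializes them.
// Caveat (general to Hadoop-backed RDDs): the RecordReader may reuse the same
// Writable instances, so copy values before caching if object identity matters.
hadoopRdd.persist(StorageLevel.MEMORY_ONLY)
hadoopRdd.count() // first action computes and caches the partitions
```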