public class SchemaRDD extends RDD<org.apache.spark.sql.catalyst.expressions.Row>
An RDD of Row objects that has an associated schema. In addition to standard RDD functions,
SchemaRDDs can be used in relational queries, as shown in the examples below.
Importing a SQLContext brings an implicit into scope that automatically converts a standard RDD whose elements are Scala case classes into a SchemaRDD. This conversion can also be done explicitly using the createSchemaRDD function on a SQLContext.
A SchemaRDD can also be created by loading data in from external sources. Examples include loading data from Parquet files using the parquetFile method on SQLContext, and loading JSON datasets using the jsonFile and jsonRDD methods on SQLContext.
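For example, a minimal sketch of loading data from external sources (the file paths below are placeholders):
val sc: SparkContext // An existing SparkContext.
val sqlContext = new SQLContext(sc)
// Load a SchemaRDD from a Parquet file; the schema is read from the file itself.
val parquetData = sqlContext.parquetFile("path/to/data.parquet")
// Load SchemaRDDs from JSON, either from a file or from an existing RDD[String].
val jsonData = sqlContext.jsonFile("path/to/data.json")
val jsonFromStrings = sqlContext.jsonRDD(sc.parallelize("""{"name":"Bob","age":30}""" :: Nil))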
== SQL Queries ==
A SchemaRDD can be registered as a table in the SQLContext
that was used to create it. Once
an RDD has been registered as a table, it can be used in the FROM clause of SQL statements.
// One method for defining the schema of an RDD is to make a case class with the desired column
// names and types.
case class Record(key: Int, value: String)
val sc: SparkContext // An existing SparkContext.
val sqlContext = new SQLContext(sc)
// Importing the SQL context gives access to all the SQL functions and implicit conversions.
import sqlContext._
val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i")))
// Any RDD containing case classes can be registered as a table. The schema of the table is
// automatically inferred using Scala reflection.
rdd.registerAsTable("records")
val results: SchemaRDD = sql("SELECT * FROM records")
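Because the result of sql is itself a SchemaRDD (and therefore an RDD of Row objects), ordinary RDD operations can be applied to it. A small sketch, reading the columns of each Row by ordinal:
// Row values can be read positionally; column 0 is key and column 1 is value here.
results.map(row => "Key: " + row(0) + ", Value: " + row(1)).collect().foreach(println)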
== Language Integrated Queries ==
case class Record(key: Int, value: String)
val sc: SparkContext // An existing SparkContext.
val sqlContext = new SQLContext(sc)
// Importing the SQL context gives access to all the SQL functions and implicit conversions.
import sqlContext._
val rdd = sc.parallelize((1 to 100).map(i => Record(i, "val_" + i)))
// Example of language integrated queries.
rdd.where('key === 1).orderBy('value.asc).select('key).collect()
| Constructor and Description |
|---|
| SchemaRDD(SQLContext sqlContext, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan baseLogicalPlan) |
| Modifier and Type | Method and Description |
|---|---|
| SchemaRDD | aggregate(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> aggregateExprs) Performs an aggregation over all Rows in this RDD. |
| SchemaRDD | as(scala.Symbol alias) Applies a qualifier to the attributes of this relation. |
| org.apache.spark.sql.catalyst.plans.logical.LogicalPlan | baseLogicalPlan() |
| SchemaRDD | baseSchemaRDD() |
| SchemaRDD | coalesce(int numPartitions, boolean shuffle, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord) Return a new RDD that is reduced into numPartitions partitions. |
| org.apache.spark.sql.catalyst.expressions.Row[] | collect() Return an array that contains all of the elements in this RDD. |
| scala.collection.Iterator<org.apache.spark.sql.catalyst.expressions.Row> | compute(Partition split, TaskContext context) :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
| long | count() :: Experimental :: Return the number of elements in the RDD. |
| SchemaRDD | distinct() Return a new RDD containing the distinct elements in this RDD. |
| SchemaRDD | distinct(int numPartitions, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord) Return a new RDD containing the distinct elements in this RDD. |
| SchemaRDD | filter(scala.Function1<org.apache.spark.sql.catalyst.expressions.Row,Object> f) Return a new RDD containing only the elements that satisfy a predicate. |
| SchemaRDD | generate(org.apache.spark.sql.catalyst.expressions.Generator generator, boolean join, boolean outer, scala.Option<String> alias) :: Experimental :: Applies the given Generator, or table generating function, to this relation. |
| Partition[] | getPartitions() Implemented by subclasses to return the set of partitions in this RDD. |
| SchemaRDD | groupBy(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> aggregateExprs) Performs a grouping followed by an aggregation. |
| void | insertInto(String tableName) :: Experimental :: Appends the rows from this RDD to the specified table. |
| void | insertInto(String tableName, boolean overwrite) :: Experimental :: Adds the rows from this RDD to the specified table, optionally overwriting the existing data. |
| SchemaRDD | intersection(RDD<org.apache.spark.sql.catalyst.expressions.Row> other) Return the intersection of this RDD and another one. |
| SchemaRDD | intersection(RDD<org.apache.spark.sql.catalyst.expressions.Row> other, int numPartitions) Return the intersection of this RDD and another one. |
| SchemaRDD | intersection(RDD<org.apache.spark.sql.catalyst.expressions.Row> other, Partitioner partitioner, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord) Return the intersection of this RDD and another one. |
| SchemaRDD | join(SchemaRDD otherPlan, org.apache.spark.sql.catalyst.plans.JoinType joinType, scala.Option<org.apache.spark.sql.catalyst.expressions.Expression> on) Performs a relational join on two SchemaRDDs. |
| SchemaRDD | limit(org.apache.spark.sql.catalyst.expressions.Expression limitExpr) |
| SchemaRDD | limit(int limitNum) Limits the results by the given integer. |
| org.apache.spark.sql.catalyst.plans.logical.LogicalPlan | logicalPlan() |
| SchemaRDD | orderBy(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.SortOrder> sortExprs) Sorts the results by the given expressions. |
| void | printSchema() Prints out the schema in the tree format. |
| org.apache.spark.sql.SQLContext.QueryExecution | queryExecution() :: DeveloperApi :: A lazily computed query execution workflow. |
| void | registerAsTable(String tableName) Registers this RDD as a temporary table using the given name. |
| SchemaRDD | repartition(int numPartitions, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord) Return a new RDD that has exactly numPartitions partitions. |
| SchemaRDD | sample(boolean withReplacement, double fraction, long seed) :: Experimental :: Returns a sampled version of the underlying dataset. |
| void | saveAsParquetFile(String path) Saves the contents of this SchemaRDD as a Parquet file, preserving the schema. |
| void | saveAsTable(String tableName) :: Experimental :: Creates a table from the contents of this SchemaRDD. |
| String | schemaString() Returns the output schema in the tree format. |
| SchemaRDD | select(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> exprs) Changes the output of this relation to the given expressions, similar to the SELECT clause in SQL. |
| SQLContext | sqlContext() |
| SchemaRDD | subtract(RDD<org.apache.spark.sql.catalyst.expressions.Row> other) Return an RDD with the elements from this that are not in other. |
| SchemaRDD | subtract(RDD<org.apache.spark.sql.catalyst.expressions.Row> other, int numPartitions) Return an RDD with the elements from this that are not in other. |
| SchemaRDD | subtract(RDD<org.apache.spark.sql.catalyst.expressions.Row> other, Partitioner p, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord) Return an RDD with the elements from this that are not in other. |
| org.apache.spark.sql.catalyst.expressions.Row[] | take(int num) Take the first num elements of the RDD. |
| JavaSchemaRDD | toJavaSchemaRDD() Returns this RDD as a JavaSchemaRDD. |
| SchemaRDD | toSchemaRDD() Returns this RDD as a SchemaRDD. |
| String | toString() |
| SchemaRDD | unionAll(SchemaRDD otherPlan) Combines the tuples of two RDDs with the same schema, keeping duplicates. |
| SchemaRDD | where(org.apache.spark.sql.catalyst.expressions.Expression condition) Filters the output, only returning those rows where condition evaluates to true. |
| SchemaRDD | where(scala.Function1<org.apache.spark.sql.catalyst.expressions.DynamicRow,Object> dynamicUdf) :: Experimental :: Filters tuples using a function over a Dynamic version of a given Row. |
| <T1> SchemaRDD | where(scala.Symbol arg1, scala.Function1<T1,Object> udf) Filters tuples using a function over the value of the specified column. |
Methods inherited from class org.apache.spark.rdd.RDD: aggregate, cache, cartesian, checkpoint, checkpointData, collect, context, countApprox, countApproxDistinct, countByValue, countByValueApprox, creationSiteInfo, dependencies, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, isCheckpointed, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, max, min, name, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, reduce, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sparkContext, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.Logging: initialized, initializeIfNecessary, initializeLogging, initLock, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logTrace, logTrace, logWarning, logWarning
public SchemaRDD(SQLContext sqlContext, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan baseLogicalPlan)
public SQLContext sqlContext()
public org.apache.spark.sql.catalyst.plans.logical.LogicalPlan baseLogicalPlan()
public SchemaRDD baseSchemaRDD()
public scala.collection.Iterator<org.apache.spark.sql.catalyst.expressions.Row> compute(Partition split, TaskContext context)
:: DeveloperApi :: Implemented by subclasses to compute a given partition.
public Partition[] getPartitions()
Implemented by subclasses to return the set of partitions in this RDD.
public SchemaRDD select(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> exprs)
Changes the output of this relation to the given expressions, similar to the SELECT clause in SQL.
schemaRDD.select('a, 'b + 'c, 'd as 'aliasedName)
exprs - a set of logical expressions that will be evaluated for each input row.
public SchemaRDD where(org.apache.spark.sql.catalyst.expressions.Expression condition)
Filters the output, only returning those rows where condition evaluates to true.
schemaRDD.where('a === 'b)
schemaRDD.where('a === 1)
schemaRDD.where('a + 'b > 10)
public SchemaRDD join(SchemaRDD otherPlan, org.apache.spark.sql.catalyst.plans.JoinType joinType, scala.Option<org.apache.spark.sql.catalyst.expressions.Expression> on)
Performs a relational join on two SchemaRDDs.
otherPlan - the SchemaRDD that should be joined with this one.
joinType - One of Inner, LeftOuter, RightOuter, or FullOuter. Defaults to Inner.
on - An optional condition for the join operation. This is equivalent to the ON clause in standard SQL. In the case of Inner joins, specifying a condition is equivalent to adding where clauses after the join.
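For example, a sketch that joins two hypothetical SchemaRDDs (people and orders, sharing a personId column) and uses as to qualify the attribute names, assuming import sqlContext._ is in scope as in the earlier examples:
val p = people.as('p)
val o = orders.as('o)
// Inner join (the default joinType) with an explicit join condition.
val joined = p.join(o, on = Some("p.personId".attr === "o.personId".attr))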
public SchemaRDD orderBy(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.SortOrder> sortExprs)
schemaRDD.orderBy('a)
schemaRDD.orderBy('a, 'b)
schemaRDD.orderBy('a.asc, 'b.desc)
public SchemaRDD limit(org.apache.spark.sql.catalyst.expressions.Expression limitExpr)
public SchemaRDD limit(int limitNum)
schemaRDD.limit(10)
public SchemaRDD groupBy(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> aggregateExprs)
schemaRDD.groupBy('year)(Sum('sales) as 'totalSales)
public SchemaRDD aggregate(scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> aggregateExprs)
schemaRDD.aggregate(Sum('sales) as 'totalSales)
public SchemaRDD as(scala.Symbol alias)
val x = schemaRDD.where('a === 1).as('x)
val y = schemaRDD.where('a === 2).as('y)
x.join(y).where("x.a".attr === "y.a".attr)
public SchemaRDD unionAll(SchemaRDD otherPlan)
public <T1> SchemaRDD where(scala.Symbol arg1, scala.Function1<T1,Object> udf)
schemaRDD.where('a)((a: Int) => ...)
public SchemaRDD where(scala.Function1<org.apache.spark.sql.catalyst.expressions.DynamicRow,Object> dynamicUdf)
Filters tuples using a function over a Dynamic version of a given Row. DynamicRows use Scala's Dynamic trait to emulate an ORM of a dynamically typed language. Since the type of the column is not known at compile time, all attributes are converted to strings before being passed to the function.
schemaRDD.where(r => r.firstName == "Bob" && r.lastName == "Smith")
public SchemaRDD sample(boolean withReplacement, double fraction, long seed)
public long count()
public SchemaRDD generate(org.apache.spark.sql.catalyst.expressions.Generator generator, boolean join, boolean outer, scala.Option<String> alias)
Applies the given Generator, or table generating function, to this relation.
generator - A table generating function. The API for such functions is likely to change in future releases.
join - when set to true, each output row of the generator is joined with the input row that produced it.
outer - when set to true, at least one row will be produced for each input row, similar to an OUTER JOIN in SQL. When no output rows are produced by the generator for a given row, a single row will be output, with NULL values for each of the generated columns.
alias - an optional alias that can be used as a qualifier for the attributes that are produced by this generate operation.
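A hedged sketch of invoking generate; myGenerator stands in for any catalyst Generator (constructing a concrete generator is version specific and not shown here):
val myGenerator: org.apache.spark.sql.catalyst.expressions.Generator = ??? // hypothetical generator
// Join each generated row back to the input row that produced it, and keep input rows
// that generate no output (NULLs are emitted for the generated columns).
val expanded = schemaRDD.generate(myGenerator, join = true, outer = true, alias = Some("gen"))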
public SchemaRDD toSchemaRDD()
public JavaSchemaRDD toJavaSchemaRDD()
public org.apache.spark.sql.catalyst.expressions.Row[] collect()
Return an array that contains all of the elements in this RDD.
public org.apache.spark.sql.catalyst.expressions.Row[] take(int num)
Take the first num elements of the RDD.
public SchemaRDD coalesce(int numPartitions, boolean shuffle, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord)
Return a new RDD that is reduced into numPartitions partitions.
This results in a narrow dependency, e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of the 100 new partitions will claim 10 of the current partitions.
However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1). To avoid this, you can pass shuffle = true. This will add a shuffle step, but means the current upstream partitions will be executed in parallel (per whatever the current partitioning is).
Note: With shuffle = true, you can actually coalesce to a larger number of partitions. This is useful if you have a small number of partitions, say 100, potentially with a few partitions being abnormally large. Calling coalesce(1000, shuffle = true) will result in 1000 partitions with the data distributed using a hash partitioner.
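A short sketch of both modes (partition counts are illustrative; as in RDD.coalesce, shuffle defaults to false):
// Shrink to 100 partitions with a narrow dependency (no shuffle).
val fewer = schemaRDD.coalesce(100)
// Drastic coalesce: request a shuffle so upstream work still runs in parallel.
val single = schemaRDD.coalesce(1, shuffle = true)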
public SchemaRDD distinct()
Return a new RDD containing the distinct elements in this RDD.
public SchemaRDD distinct(int numPartitions, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord)
Return a new RDD containing the distinct elements in this RDD.
public SchemaRDD filter(scala.Function1<org.apache.spark.sql.catalyst.expressions.Row,Object> f)
Return a new RDD containing only the elements that satisfy a predicate.
public SchemaRDD intersection(RDD<org.apache.spark.sql.catalyst.expressions.Row> other)
Return the intersection of this RDD and another one. Note that this method performs a shuffle internally.
Overrides intersection in class RDD<org.apache.spark.sql.catalyst.expressions.Row>.
public SchemaRDD intersection(RDD<org.apache.spark.sql.catalyst.expressions.Row> other, Partitioner partitioner, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord)
Return the intersection of this RDD and another one. Note that this method performs a shuffle internally.
Overrides intersection in class RDD<org.apache.spark.sql.catalyst.expressions.Row>.
partitioner - Partitioner to use for the resulting RDD.
public SchemaRDD intersection(RDD<org.apache.spark.sql.catalyst.expressions.Row> other, int numPartitions)
Return the intersection of this RDD and another one. Note that this method performs a shuffle internally.
Overrides intersection in class RDD<org.apache.spark.sql.catalyst.expressions.Row>.
numPartitions - How many partitions to use in the resulting RDD.
public SchemaRDD repartition(int numPartitions, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord)
Return a new RDD that has exactly numPartitions partitions.
Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data.
If you are decreasing the number of partitions in this RDD, consider using coalesce, which can avoid performing a shuffle.
Overrides repartition in class RDD<org.apache.spark.sql.catalyst.expressions.Row>.
public SchemaRDD subtract(RDD<org.apache.spark.sql.catalyst.expressions.Row> other)
Return an RDD with the elements from this that are not in other.
Uses this RDD's partitioner/partition size, because even if other is huge, the resulting RDD will be no larger than this one.
public SchemaRDD subtract(RDD<org.apache.spark.sql.catalyst.expressions.Row> other, int numPartitions)
Return an RDD with the elements from this that are not in other.
public SchemaRDD subtract(RDD<org.apache.spark.sql.catalyst.expressions.Row> other, Partitioner p, scala.math.Ordering<org.apache.spark.sql.catalyst.expressions.Row> ord)
Return an RDD with the elements from this that are not in other.
public org.apache.spark.sql.SQLContext.QueryExecution queryExecution()
The query execution is considered a Developer API as phases may be added or removed in future releases. This execution is only exposed to provide an interface for inspecting the various phases for debugging purposes. Applications should not depend on particular phases existing or producing any specific output, even for exactly the same query.
Additionally, the RDD exposed by this execution is not designed for consumption by end users. In particular, it does not contain any schema information, and it reuses Row objects internally. This object reuse improves performance, but can make programming against the RDD more difficult. Instead end users should perform RDD operations on a SchemaRDD directly.
public org.apache.spark.sql.catalyst.plans.logical.LogicalPlan logicalPlan()
public String toString()
Overrides toString in class Object.
public void saveAsParquetFile(String path)
Saves the contents of this SchemaRDD as a Parquet file, preserving the schema. Files that are written out using this method can be read back in as a SchemaRDD using the parquetFile function.
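A minimal round-trip sketch, reusing the results SchemaRDD from the earlier example (the output path is a placeholder):
// Write the SchemaRDD out; the schema is stored alongside the Parquet data.
results.saveAsParquetFile("path/to/records.parquet")
// Read it back later as a SchemaRDD; no schema needs to be supplied.
val restored = sqlContext.parquetFile("path/to/records.parquet")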
public void registerAsTable(String tableName)
Registers this RDD as a temporary table, using the given name, in the SQLContext that was used to create this SchemaRDD.
public void insertInto(String tableName, boolean overwrite)
public void insertInto(String tableName)
public void saveAsTable(String tableName)
Note that this currently only works with SchemaRDDs that are created from a HiveContext, as there is no notion of a persisted catalog in a standard SQL context. Instead, you can write an RDD out to a Parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.
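A rough sketch of that workaround (paths and table names are placeholders, and moreResults is a hypothetical SchemaRDD with the same schema):
// Write out as Parquet, then register the file-backed SchemaRDD as a table.
results.saveAsParquetFile("path/to/records_table.parquet")
sqlContext.parquetFile("path/to/records_table.parquet").registerAsTable("records_table")
// Another SchemaRDD with a matching schema can now be appended to it.
moreResults.insertInto("records_table")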
public String schemaString()
public void printSchema()