public class Column
extends Object
implements org.apache.spark.internal.Logging
A column that will be computed based on the data in a DataFrame.
A new column can be constructed based on the input columns present in a DataFrame:
df("columnName") // On a specific `df` DataFrame.
col("columnName") // A generic column not yet associated with a DataFrame.
col("columnName.field") // Extracting a struct field
col("`a.column.with.dots`") // Escape `.` in column names.
$"columnName" // Scala short hand for a named column.
Column objects can be composed to form complex expressions:
$"a" + 1
$"a" === $"b"
Note: The internal Catalyst expression can be accessed via expr, but this method is for debugging purposes only and can change in any future Spark releases.
| Constructor and Description |
|---|
| Column(org.apache.spark.sql.catalyst.expressions.Expression expr) |
| Column(String name) |
| Modifier and Type | Method and Description |
|---|---|
| Column | alias(String alias) Gives the column an alias. |
| Column | and(Column other) Boolean AND. |
| Column | apply(Object extraction) Extracts a value or values from a complex type. |
| <U> TypedColumn<Object,U> | as(Encoder<U> evidence$1) Provides a type hint about the expected return value of this column. |
| Column | as(scala.collection.Seq<String> aliases) (Scala-specific) Assigns the given aliases to the results of a table generating function. |
| Column | as(String alias) Gives the column an alias. |
| Column | as(String[] aliases) Assigns the given aliases to the results of a table generating function. |
| Column | as(String alias, Metadata metadata) Gives the column an alias with metadata. |
| Column | as(scala.Symbol alias) Gives the column an alias. |
| Column | asc_nulls_first() Returns a sort expression based on ascending order of the column, and null values appear before non-null values. |
| Column | asc_nulls_last() Returns a sort expression based on ascending order of the column, and null values appear after non-null values. |
| Column | asc() Returns a sort expression based on ascending order of the column. |
| Column | between(Object lowerBound, Object upperBound) True if the current column is between the lower bound and upper bound, inclusive. |
| Column | bitwiseAND(Object other) Compute bitwise AND of this expression with another expression. |
| Column | bitwiseOR(Object other) Compute bitwise OR of this expression with another expression. |
| Column | bitwiseXOR(Object other) Compute bitwise XOR of this expression with another expression. |
| Column | cast(DataType to) Casts the column to a different data type. |
| Column | cast(String to) Casts the column to a different data type, using the canonical string representation of the type. |
| Column | contains(Object other) Contains the other element. |
| Column | desc_nulls_first() Returns a sort expression based on the descending order of the column, and null values appear before non-null values. |
| Column | desc_nulls_last() Returns a sort expression based on the descending order of the column, and null values appear after non-null values. |
| Column | desc() Returns a sort expression based on the descending order of the column. |
| Column | divide(Object other) Division of this expression by another expression. |
| Column | dropFields(scala.collection.Seq<String> fieldNames) An expression that drops fields in StructType by name. |
| Column | endsWith(Column other) String ends with. |
| Column | endsWith(String literal) String ends with another string literal. |
| Column | eqNullSafe(Object other) Equality test that is safe for null values. |
| boolean | equals(Object that) |
| Column | equalTo(Object other) Equality test. |
| void | explain(boolean extended) Prints the expression to the console for debugging purposes. |
| org.apache.spark.sql.catalyst.expressions.Expression | expr() |
| Column | geq(Object other) Greater than or equal to an expression. |
| Column | getField(String fieldName) An expression that gets a field by name in a StructType. |
| Column | getItem(Object key) An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType. |
| Column | gt(Object other) Greater than. |
| int | hashCode() |
| Column | isin(Object... list) A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments. |
| Column | isin(scala.collection.Seq<Object> list) A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments. |
| Column | isInCollection(scala.collection.Iterable<?> values) A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection. |
| Column | isInCollection(Iterable<?> values) A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection. |
| Column | isNaN() True if the current expression is NaN. |
| Column | isNotNull() True if the current expression is NOT null. |
| Column | isNull() True if the current expression is null. |
| Column | leq(Object other) Less than or equal to. |
| Column | like(String literal) SQL like expression. |
| Column | lt(Object other) Less than. |
| Column | minus(Object other) Subtraction. |
| Column | mod(Object other) Modulo (a.k.a. remainder) expression. |
| Column | multiply(Object other) Multiplication of this expression and another expression. |
| Column | name(String alias) Gives the column a name (alias). |
| Column | notEqual(Object other) Inequality test. |
| Column | or(Column other) Boolean OR. |
| Column | otherwise(Object value) Evaluates a list of conditions and returns one of multiple possible result expressions. |
| Column | over() Defines an empty analytic clause. |
| Column | over(WindowSpec window) Defines a windowing column. |
| Column | plus(Object other) Sum of this expression and another expression. |
| Column | rlike(String literal) SQL RLIKE expression (LIKE with Regex). |
| Column | startsWith(Column other) String starts with. |
| Column | startsWith(String literal) String starts with another string literal. |
| Column | substr(Column startPos, Column len) An expression that returns a substring. |
| Column | substr(int startPos, int len) An expression that returns a substring. |
| String | toString() |
| static scala.Option<org.apache.spark.sql.catalyst.expressions.Expression> | unapply(Column col) |
| Column | when(Column condition, Object value) Evaluates a list of conditions and returns one of multiple possible result expressions. |
| Column | withField(String fieldName, Column col) An expression that adds/replaces a field in StructType by name. |
Methods inherited from interface org.apache.spark.internal.Logging:
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public Column(org.apache.spark.sql.catalyst.expressions.Expression expr)
public Column(String name)
public static scala.Option<org.apache.spark.sql.catalyst.expressions.Expression> unapply(Column col)
public Column isin(Object... list)
A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
Note: Since the types of the elements in the list are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) in the case of "Int vs String", the "Int" will be up-casted to "String" and the comparison will be "String vs String"; 2) in the case of "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will be "Double vs Double".
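For illustration, a minimal Scala sketch of the up-casting behavior (the people DataFrame and its age column are placeholders, following the examples below):
// Mixing Int and Double values: the Ints are up-casted to Double before comparison.
people.filter( people("age").isin(18, 21.0, 65) )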
Parameters: list - (undocumented)

public org.apache.spark.sql.catalyst.expressions.Expression expr()
public String toString()
Overrides: toString in class Object
public boolean equals(Object that)
Overrides: equals in class Object
public int hashCode()
Overrides: hashCode in class Object
public <U> TypedColumn<Object,U> as(Encoder<U> evidence$1)
Provides a type hint about the expected return value of this column. This information can be used by operations such as select on a Dataset to automatically convert the results into the correct JVM types.
Parameters: evidence$1 - (undocumented)
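A brief Scala sketch (assumes a SparkSession named spark, so that import spark.implicits._ supplies the needed Encoder):
import spark.implicits._
// Selecting a TypedColumn yields a Dataset[Int] rather than a DataFrame.
val ages = people.select(people("age").as[Int])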
public Column apply(Object extraction)
Extracts a value or values from a complex type.
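For example (a minimal sketch; the arr and myMap column names are hypothetical):
df.select( df("arr")(0) )        // element at position 0 of an array column
df.select( df("myMap")("key") )  // value for "key" in a map column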
Parameters: extraction - (undocumented)

public Column equalTo(Object other)
Equality test.
// Scala:
df.filter( df("colA") === df("colB") )
// Java
import static org.apache.spark.sql.functions.*;
df.filter( col("colA").equalTo(col("colB")) );
Parameters: other - (undocumented)

public Column notEqual(Object other)
Inequality test.
// Scala:
df.select( df("colA") !== df("colB") )
df.select( !(df("colA") === df("colB")) )
// Java:
import static org.apache.spark.sql.functions.*;
df.filter( col("colA").notEqual(col("colB")) );
Parameters: other - (undocumented)

public Column gt(Object other)
Greater than.
// Scala: The following selects people older than 21.
people.select( people("age") > lit(21) )
// Java:
import static org.apache.spark.sql.functions.*;
people.select( people.col("age").gt(21) );
Parameters: other - (undocumented)

public Column lt(Object other)
Less than.
// Scala: The following selects people younger than 21.
people.select( people("age") < 21 )
// Java:
people.select( people.col("age").lt(21) );
Parameters: other - (undocumented)

public Column leq(Object other)
Less than or equal to.
// Scala: The following selects people age 21 or younger.
people.select( people("age") <= 21 )
// Java:
people.select( people.col("age").leq(21) );
Parameters: other - (undocumented)

public Column geq(Object other)
Greater than or equal to an expression.
// Scala: The following selects people age 21 or older.
people.select( people("age") >= 21 )
// Java:
people.select( people.col("age").geq(21) )
Parameters: other - (undocumented)

public Column eqNullSafe(Object other)
Equality test that is safe for null values.
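A short sketch of how this differs from equalTo (column names are placeholders):
// Scala: with ===, null compared to null yields null; with <=> it yields true.
df.filter( df("a") <=> df("b") )
// Java:
df.filter( df.col("a").eqNullSafe(df.col("b")) );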
Parameters: other - (undocumented)

public Column when(Column condition, Object value)
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer.
// Scala:
people.select(when(people("gender") === "male", 0)
.when(people("gender") === "female", 1)
.otherwise(2))
// Java:
people.select(when(col("gender").equalTo("male"), 0)
.when(col("gender").equalTo("female"), 1)
.otherwise(2))
Parameters: condition - (undocumented)
value - (undocumented)

public Column otherwise(Object value)
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer.
// Scala:
people.select(when(people("gender") === "male", 0)
.when(people("gender") === "female", 1)
.otherwise(2))
// Java:
people.select(when(col("gender").equalTo("male"), 0)
.when(col("gender").equalTo("female"), 1)
.otherwise(2))
Parameters: value - (undocumented)

public Column between(Object lowerBound, Object upperBound)
True if the current column is between the lower bound and upper bound, inclusive.
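For instance, a minimal sketch reusing the people DataFrame from the examples above:
// Selects people aged 18 through 65, endpoints included.
people.filter( people("age").between(18, 65) )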
Parameters: lowerBound - (undocumented)
upperBound - (undocumented)

public Column isNaN()
True if the current expression is NaN.
public Column isNull()
True if the current expression is null.
public Column isNotNull()
True if the current expression is NOT null.
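A small Scala sketch combining these predicates (the score column is a placeholder):
// Keep rows where score is present and is a real number.
df.filter( $"score".isNotNull && !$"score".isNaN )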
public Column or(Column other)
Boolean OR.
// Scala: The following selects people that are in school or employed.
people.filter( people("inSchool") || people("isEmployed") )
// Java:
people.filter( people.col("inSchool").or(people.col("isEmployed")) );
Parameters: other - (undocumented)

public Column and(Column other)
Boolean AND.
// Scala: The following selects people that are in school and employed at the same time.
people.select( people("inSchool") && people("isEmployed") )
// Java:
people.select( people.col("inSchool").and(people.col("isEmployed")) );
Parameters: other - (undocumented)

public Column plus(Object other)
Sum of this expression and another expression.
// Scala: The following selects the sum of a person's height and weight.
people.select( people("height") + people("weight") )
// Java:
people.select( people.col("height").plus(people.col("weight")) );
Parameters: other - (undocumented)

public Column minus(Object other)
Subtraction.
// Scala: The following selects the difference between people's height and their weight.
people.select( people("height") - people("weight") )
// Java:
people.select( people.col("height").minus(people.col("weight")) );
Parameters: other - (undocumented)

public Column multiply(Object other)
Multiplication of this expression and another expression.
// Scala: The following multiplies a person's height by their weight.
people.select( people("height") * people("weight") )
// Java:
people.select( people.col("height").multiply(people.col("weight")) );
Parameters: other - (undocumented)

public Column divide(Object other)
Division of this expression by another expression.
// Scala: The following divides a person's height by their weight.
people.select( people("height") / people("weight") )
// Java:
people.select( people.col("height").divide(people.col("weight")) );
Parameters: other - (undocumented)

public Column mod(Object other)
Modulo (a.k.a. remainder) expression.
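For example (a minimal sketch; the id column is a placeholder):
// Scala: selects rows with an even id.
df.filter( $"id" % 2 === 0 )
// Java:
df.filter( df.col("id").mod(2).equalTo(0) );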
Parameters: other - (undocumented)

public Column isin(scala.collection.Seq<Object> list)
A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
Note: Since the types of the elements in the list are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) in the case of "Int vs String", the "Int" will be up-casted to "String" and the comparison will be "String vs String"; 2) in the case of "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will be "Double vs Double".
Parameters: list - (undocumented)

public Column isInCollection(scala.collection.Iterable<?> values)
A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection.
Note: Since the types of the elements in the collection are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) in the case of "Int vs String", the "Int" will be up-casted to "String" and the comparison will be "String vs String"; 2) in the case of "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will be "Double vs Double".
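A minimal Scala sketch (the validIds collection and id column are placeholders):
val validIds = Seq(1, 2, 3)
df.filter( $"id".isInCollection(validIds) )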
Parameters: values - (undocumented)

public Column isInCollection(Iterable<?> values)
A boolean expression that is evaluated to true if the value of this expression is contained by the provided collection.
Note: Since the types of the elements in the collection are inferred only at run time, the elements will be "up-casted" to the most common type for comparison. For example: 1) in the case of "Int vs String", the "Int" will be up-casted to "String" and the comparison will be "String vs String"; 2) in the case of "Float vs Double", the "Float" will be up-casted to "Double" and the comparison will be "Double vs Double".
Parameters: values - (undocumented)

public Column like(String literal)
SQL like expression.
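For instance (a hedged sketch; % is the SQL LIKE wildcard and the name column is a placeholder):
// Matches names ending in "son", e.g. "Johnson".
df.filter( $"name".like("%son") )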
Parameters: literal - (undocumented)

public Column rlike(String literal)
SQL RLIKE expression (LIKE with Regex).
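For instance (a hedged sketch; the pattern is a Java regular expression and the name column is a placeholder):
// Matches names that start with an uppercase letter.
df.filter( $"name".rlike("^[A-Z].*") )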
Parameters: literal - (undocumented)

public Column getItem(Object key)
An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType.
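For example (a minimal sketch; the letters array column and params map column are hypothetical):
df.select( $"letters".getItem(0) )      // element at position 0 of an array column
df.select( $"params".getItem("mode") )  // value for key "mode" in a map column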
Parameters: key - (undocumented)

public Column withField(String fieldName, Column col)
An expression that adds/replaces a field in StructType by name.
val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
df.select($"struct_col".withField("c", lit(3)))
// result: {"a":1,"b":2,"c":3}
val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
df.select($"struct_col".withField("b", lit(3)))
// result: {"a":1,"b":3}
val df = sql("SELECT CAST(NULL AS struct<a:int,b:int>) struct_col")
df.select($"struct_col".withField("c", lit(3)))
// result: null of type struct<a:int,b:int,c:int>
val df = sql("SELECT named_struct('a', 1, 'b', 2, 'b', 3) struct_col")
df.select($"struct_col".withField("b", lit(100)))
// result: {"a":1,"b":100,"b":100}
val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
df.select($"struct_col".withField("a.c", lit(3)))
// result: {"a":{"a":1,"b":2,"c":3}}
val df = sql("SELECT named_struct('a', named_struct('b', 1), 'a', named_struct('c', 2)) struct_col")
df.select($"struct_col".withField("a.c", lit(3)))
// result: org.apache.spark.sql.AnalysisException: Ambiguous reference to fields
This method supports adding/replacing nested fields directly, e.g.
val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
df.select($"struct_col".withField("a.c", lit(3)).withField("a.d", lit(4)))
// result: {"a":{"a":1,"b":2,"c":3,"d":4}}
However, if you are going to add/replace multiple nested fields, it is more efficient to extract the nested struct before adding/replacing multiple fields, e.g.
val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
df.select($"struct_col".withField("a", $"struct_col.a".withField("c", lit(3)).withField("d", lit(4))))
// result: {"a":{"a":1,"b":2,"c":3,"d":4}}
Parameters: fieldName - (undocumented)
col - (undocumented)

public Column dropFields(scala.collection.Seq<String> fieldNames)
An expression that drops fields in StructType by name.
This is a no-op if schema doesn't contain field name(s).
val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
df.select($"struct_col".dropFields("b"))
// result: {"a":1}
val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
df.select($"struct_col".dropFields("c"))
// result: {"a":1,"b":2}
val df = sql("SELECT named_struct('a', 1, 'b', 2, 'c', 3) struct_col")
df.select($"struct_col".dropFields("b", "c"))
// result: {"a":1}
val df = sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
df.select($"struct_col".dropFields("a", "b"))
// result: org.apache.spark.sql.AnalysisException: cannot resolve 'update_fields(update_fields(`struct_col`))' due to data type mismatch: cannot drop all fields in struct
val df = sql("SELECT CAST(NULL AS struct<a:int,b:int>) struct_col")
df.select($"struct_col".dropFields("b"))
// result: null of type struct<a:int>
val df = sql("SELECT named_struct('a', 1, 'b', 2, 'b', 3) struct_col")
df.select($"struct_col".dropFields("b"))
// result: {"a":1}
val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
df.select($"struct_col".dropFields("a.b"))
// result: {"a":{"a":1}}
val df = sql("SELECT named_struct('a', named_struct('b', 1), 'a', named_struct('c', 2)) struct_col")
df.select($"struct_col".dropFields("a.c"))
// result: org.apache.spark.sql.AnalysisException: Ambiguous reference to fields
This method supports dropping multiple nested fields directly, e.g.
val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
df.select($"struct_col".dropFields("a.b", "a.c"))
// result: {"a":{"a":1}}
However, if you are going to drop multiple nested fields, it is more efficient to extract the nested struct before dropping multiple fields from it, e.g.
val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
df.select($"struct_col".withField("a", $"struct_col.a".dropFields("b", "c")))
// result: {"a":{"a":1}}
Parameters: fieldNames - (undocumented)

public Column getField(String fieldName)
An expression that gets a field by name in a StructType.
Parameters: fieldName - (undocumented)

public Column substr(Column startPos, Column len)
An expression that returns a substring.
Parameters: startPos - expression for the starting position.
len - expression for the length of the substring.
public Column substr(int startPos, int len)
An expression that returns a substring.
Parameters: startPos - starting position.
len - length of the substring.
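A small sketch (as in SQL, the starting position is 1-based; the name column is a placeholder):
// Takes the first three characters of the name column.
df.select( $"name".substr(1, 3) )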
public Column contains(Object other)
Contains the other element.
Parameters: other - (undocumented)

public Column startsWith(Column other)
String starts with.
Parameters: other - (undocumented)

public Column startsWith(String literal)
String starts with another string literal.
Parameters: literal - (undocumented)

public Column endsWith(Column other)
String ends with.
Parameters: other - (undocumented)

public Column endsWith(String literal)
String ends with another string literal.
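For illustration (a hedged sketch; the email and domain columns are placeholders):
df.filter( $"email".startsWith("admin") )  // prefix given as a string literal
df.filter( $"email".endsWith($"domain") )  // suffix taken from another column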
Parameters: literal - (undocumented)

public Column alias(String alias)
Gives the column an alias. Same as as.
// Renames colA to colB in select output.
df.select($"colA".alias("colB"))
Parameters: alias - (undocumented)

public Column as(String alias)
Gives the column an alias.
// Renames colA to colB in select output.
df.select($"colA".as("colB"))
If the current column has metadata associated with it, this metadata will be propagated to the new column. If this is not desired, use the API as(alias: String, metadata: Metadata) with explicit metadata.
Parameters: alias - (undocumented)

public Column as(scala.collection.Seq<String> aliases)
(Scala-specific) Assigns the given aliases to the results of a table generating function.
// Aliases the two columns produced by explode($"myMap") as "key" and "value".
df.select(explode($"myMap").as("key" :: "value" :: Nil))
Parameters: aliases - (undocumented)

public Column as(String[] aliases)
Assigns the given aliases to the results of a table generating function.
// Aliases the two columns produced by explode($"myMap") as "key" and "value".
df.select(explode($"myMap").as("key" :: "value" :: Nil))
Parameters: aliases - (undocumented)

public Column as(scala.Symbol alias)
Gives the column an alias.
// Renames colA to colB in select output.
df.select($"colA".as("colB"))
If the current column has metadata associated with it, this metadata will be propagated to the new column. If this is not desired, use the API as(alias: String, metadata: Metadata) with explicit metadata.
Parameters: alias - (undocumented)

public Column as(String alias, Metadata metadata)
Gives the column an alias with metadata.
val metadata: Metadata = ...
df.select($"colA".as("colB", metadata))
Parameters: alias - (undocumented)
metadata - (undocumented)

public Column name(String alias)
Gives the column a name (alias).
// Renames colA to colB in select output.
df.select($"colA".name("colB"))
If the current column has metadata associated with it, this metadata will be propagated to the new column. If this is not desired, use the API as(alias: String, metadata: Metadata) with explicit metadata.
Parameters: alias - (undocumented)

public Column cast(DataType to)
Casts the column to a different data type.
// Casts colA to IntegerType.
import org.apache.spark.sql.types.IntegerType
df.select(df("colA").cast(IntegerType))
// equivalent to
df.select(df("colA").cast("int"))
Parameters: to - (undocumented)

public Column cast(String to)
Casts the column to a different data type, using the canonical string representation of the type. The supported types are: string, boolean, byte, short, int, long, float, double, decimal, date, timestamp.
// Casts colA to integer.
df.select(df("colA").cast("int"))
Parameters: to - (undocumented)

public Column desc()
Returns a sort expression based on the descending order of the column.
// Scala: sort a DataFrame by age column in descending order.
df.sort(df("age").desc)
// Java
df.sort(df.col("age").desc());
public Column desc_nulls_first()
Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
// Scala: sort a DataFrame by age column in descending order and null values appearing first.
df.sort(df("age").desc_nulls_first)
// Java
df.sort(df.col("age").desc_nulls_first());
public Column desc_nulls_last()
Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
// Scala: sort a DataFrame by age column in descending order and null values appearing last.
df.sort(df("age").desc_nulls_last)
// Java
df.sort(df.col("age").desc_nulls_last());
public Column asc()
Returns a sort expression based on ascending order of the column.
// Scala: sort a DataFrame by age column in ascending order.
df.sort(df("age").asc)
// Java
df.sort(df.col("age").asc());
public Column asc_nulls_first()
Returns a sort expression based on ascending order of the column, and null values appear before non-null values.
// Scala: sort a DataFrame by age column in ascending order and null values appearing first.
df.sort(df("age").asc_nulls_first)
// Java
df.sort(df.col("age").asc_nulls_first());
public Column asc_nulls_last()
Returns a sort expression based on ascending order of the column, and null values appear after non-null values.
// Scala: sort a DataFrame by age column in ascending order and null values appearing last.
df.sort(df("age").asc_nulls_last)
// Java
df.sort(df.col("age").asc_nulls_last());
public void explain(boolean extended)
Prints the expression to the console for debugging purposes.
Parameters: extended - (undocumented)
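A minimal Scala sketch:
// Prints the expression tree of the column; pass true for the extended form.
($"a" + 1).explain(true)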
public Column bitwiseOR(Object other)
Compute bitwise OR of this expression with another expression.
df.select($"colA".bitwiseOR($"colB"))
Parameters: other - (undocumented)

public Column bitwiseAND(Object other)
Compute bitwise AND of this expression with another expression.
df.select($"colA".bitwiseAND($"colB"))
Parameters: other - (undocumented)

public Column bitwiseXOR(Object other)
Compute bitwise XOR of this expression with another expression.
df.select($"colA".bitwiseXOR($"colB"))
Parameters: other - (undocumented)

public Column over(WindowSpec window)
Defines a windowing column.
val w = Window.partitionBy("name").orderBy("id")
df.select(
sum("price").over(w.rangeBetween(Window.unboundedPreceding, 2)),
avg("price").over(w.rowsBetween(Window.currentRow, 4))
)
Parameters: window - (undocumented)

public Column over()
Defines an empty analytic clause. In this case the analytic function is applied and presented for all rows in the result set.
df.select(
sum("price").over(),
avg("price").over()
)