any(expr) |
Returns true if at least one value of `expr` is true. |
approx_count_distinct(expr[, relativeSD]) |
Returns the estimated number of distinct values, computed with HyperLogLog++.
`relativeSD` defines the maximum relative standard deviation allowed. |
approx_percentile(col, percentage [, accuracy]) |
Returns the approximate percentile of the numeric
column `col`: the smallest value in the ordered `col` values (sorted from least to
greatest) such that no more than `percentage` of `col` values are less than or equal to
that value. The value of `percentage` must be between 0.0 and 1.0. The `accuracy`
parameter (default: 10000) is a positive numeric literal that controls approximation accuracy
at the cost of memory. A higher value of `accuracy` yields better accuracy; `1.0/accuracy` is
the relative error of the approximation.
When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
In this case, returns the approximate percentile array of column `col` at the given
percentage array. |
avg(expr) |
Returns the mean calculated from values of a group. |
bit_or(expr) |
Returns the bitwise OR of all non-null input values, or null if none. |
bit_xor(expr) |
Returns the bitwise XOR of all non-null input values, or null if none. |
bool_and(expr) |
Returns true if all values of `expr` are true. |
bool_or(expr) |
Returns true if at least one value of `expr` is true. |
collect_list(expr) |
Collects and returns a list of elements, duplicates included. |
collect_set(expr) |
Collects and returns a set of unique elements. |
corr(expr1, expr2) |
Returns the Pearson correlation coefficient between a set of number pairs. |
count(*) |
Returns the total number of retrieved rows, including rows containing null. |
count(expr[, expr...]) |
Returns the number of rows for which the supplied expression(s) are all non-null. |
count(DISTINCT expr[, expr...]) |
Returns the number of rows for which the supplied expression(s) are unique and non-null. |
count_if(expr) |
Returns the number of `TRUE` values for the expression. |
count_min_sketch(col, eps, confidence, seed) |
Returns a count-min sketch of a column with the given `eps`,
confidence and seed. The result is an array of bytes, which can be deserialized to a
`CountMinSketch` before usage. Count-min sketch is a probabilistic data structure used for
frequency estimation using sub-linear space. |
covar_pop(expr1, expr2) |
Returns the population covariance of a set of number pairs. |
covar_samp(expr1, expr2) |
Returns the sample covariance of a set of number pairs. |
every(expr) |
Returns true if all values of `expr` are true. |
first(expr[, isIgnoreNull]) |
Returns the first value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values. |
first_value(expr[, isIgnoreNull]) |
Returns the first value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values. |
kurtosis(expr) |
Returns the kurtosis value calculated from values of a group. |
last(expr[, isIgnoreNull]) |
Returns the last value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values. |
last_value(expr[, isIgnoreNull]) |
Returns the last value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values. |
max(expr) |
Returns the maximum value of `expr`. |
max_by(x, y) |
Returns the value of `x` associated with the maximum value of `y`. |
mean(expr) |
Returns the mean calculated from values of a group. |
min(expr) |
Returns the minimum value of `expr`. |
min_by(x, y) |
Returns the value of `x` associated with the minimum value of `y`. |
percentile(col, percentage [, frequency]) |
Returns the exact percentile value of numeric column
`col` at the given percentage. The value of `percentage` must be between 0.0 and 1.0. The
value of `frequency` must be a positive integer. |
percentile(col, array(percentage1 [, percentage2]...) [, frequency]) |
Returns the exact
percentile value array of numeric column `col` at the given percentage(s). Each value
of the percentage array must be between 0.0 and 1.0. The value of `frequency` must be a
positive integer. |
percentile_approx(col, percentage [, accuracy]) |
Returns the approximate percentile of the numeric
column `col`: the smallest value in the ordered `col` values (sorted from least to
greatest) such that no more than `percentage` of `col` values are less than or equal to
that value. The value of `percentage` must be between 0.0 and 1.0. The `accuracy`
parameter (default: 10000) is a positive numeric literal that controls approximation accuracy
at the cost of memory. A higher value of `accuracy` yields better accuracy; `1.0/accuracy` is
the relative error of the approximation.
When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
In this case, returns the approximate percentile array of column `col` at the given
percentage array. |
skewness(expr) |
Returns the skewness value calculated from values of a group. |
some(expr) |
Returns true if at least one value of `expr` is true. |
std(expr) |
Returns the sample standard deviation calculated from values of a group. |
stddev(expr) |
Returns the sample standard deviation calculated from values of a group. |
stddev_pop(expr) |
Returns the population standard deviation calculated from values of a group. |
stddev_samp(expr) |
Returns the sample standard deviation calculated from values of a group. |
sum(expr) |
Returns the sum calculated from values of a group. |
var_pop(expr) |
Returns the population variance calculated from values of a group. |
var_samp(expr) |
Returns the sample variance calculated from values of a group. |
variance(expr) |
Returns the sample variance calculated from values of a group. |
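
Several of the descriptions above differ only in how they treat nulls (`count(*)` vs. `count(expr)`) or in the divisor they use (population vs. sample statistics). The following is a minimal plain-Python sketch of those distinctions, not the engine's implementation: the `values` list is hypothetical data, `None` stands in for SQL null, and the standard-library `statistics` module is used only to mirror the definitions.

```python
import statistics

# Hypothetical group of values; None plays the role of SQL NULL.
values = [2, 4, 4, None, 5, 7, None]
non_null = [v for v in values if v is not None]

# count(*): every retrieved row, including rows containing null.
count_star = len(values)

# count(expr): only rows where the expression is non-null.
count_expr = len(non_null)

# count(DISTINCT expr): unique non-null values.
count_distinct = len(set(non_null))

# count_if(expr > 4): rows where the predicate is TRUE.
# A comparison against null is not TRUE, so nulls are skipped.
count_if_gt4 = sum(1 for v in non_null if v > 4)

# var_pop / stddev_pop divide by n; var_samp / stddev_samp divide by n - 1.
var_pop = statistics.pvariance(non_null)
var_samp = statistics.variance(non_null)
stddev_pop = statistics.pstdev(non_null)
stddev_samp = statistics.stdev(non_null)

# bool_and / every vs. bool_or / any / some over a boolean column.
flags = [True, False, True]
bool_and = all(flags)
bool_or = any(flags)

print(count_star, count_expr, count_distinct, count_if_gt4)
print(var_pop, var_samp)
print(bool_and, bool_or)
```

In this sketch the sample variance is larger than the population variance because the same sum of squared deviations is divided by `n - 1` instead of `n`; the gap shrinks as the group grows.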