any(expr) |
Returns true if at least one value of `expr` is true. |
approx_count_distinct(expr[, relativeSD]) |
Returns the estimated cardinality by HyperLogLog++.
`relativeSD` defines the maximum relative standard deviation allowed. |
approx_percentile(col, percentage [, accuracy]) |
Returns the approximate `percentile` of the numeric or
ansi interval column `col` which is the smallest value in the ordered `col` values (sorted
from least to greatest) such that no more than `percentage` of `col` values is less than
the value or equal to that value. The value of percentage must be between 0.0 and 1.0.
The `accuracy` parameter (default: 10000) is a positive numeric literal which controls
approximation accuracy at the cost of memory. Higher value of `accuracy` yields better
accuracy, `1.0/accuracy` is the relative error of the approximation.
When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
In this case, returns the approximate percentile array of column `col` at the given
percentage array. |
array_agg(expr) |
Collects and returns a list of non-unique elements. |
avg(expr) |
Returns the mean calculated from values of a group. |
bit_and(expr) |
Returns the bitwise AND of all non-null input values, or null if none. |
bit_or(expr) |
Returns the bitwise OR of all non-null input values, or null if none. |
bit_xor(expr) |
Returns the bitwise XOR of all non-null input values, or null if none. |
bool_and(expr) |
Returns true if all values of `expr` are true. |
bool_or(expr) |
Returns true if at least one value of `expr` is true. |
collect_list(expr) |
Collects and returns a list of non-unique elements. |
collect_set(expr) |
Collects and returns a set of unique elements. |
corr(expr1, expr2) |
Returns Pearson coefficient of correlation between a set of number pairs. |
count(*) |
Returns the total number of retrieved rows, including rows containing null. |
count(expr[, expr...]) |
Returns the number of rows for which the supplied expression(s) are all non-null. |
count(DISTINCT expr[, expr...]) |
Returns the number of rows for which the supplied expression(s) are unique and non-null. |
count_if(expr) |
Returns the number of `TRUE` values for the expression. |
count_min_sketch(col, eps, confidence, seed) |
Returns a count-min sketch of a column with the given esp,
confidence and seed. The result is an array of bytes, which can be deserialized to a
`CountMinSketch` before usage. Count-min sketch is a probabilistic data structure used for
cardinality estimation using sub-linear space. |
covar_pop(expr1, expr2) |
Returns the population covariance of a set of number pairs. |
covar_samp(expr1, expr2) |
Returns the sample covariance of a set of number pairs. |
every(expr) |
Returns true if all values of `expr` are true. |
first(expr[, isIgnoreNull]) |
Returns the first value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values. |
first_value(expr[, isIgnoreNull]) |
Returns the first value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values. |
grouping(col) |
indicates whether a specified column in a GROUP BY is aggregated or
not, returns 1 for aggregated or 0 for not aggregated in the result set.", |
grouping_id([col1[, col2 ..]]) |
returns the level of grouping, equals to
`(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)` |
histogram_numeric(expr, nb) |
Computes a histogram on numeric 'expr' using nb bins.
The return value is an array of (x,y) pairs representing the centers of the
histogram's bins. As the value of 'nb' is increased, the histogram approximation
gets finer-grained, but may yield artifacts around outliers. In practice, 20-40
histogram bins appear to work well, with more bins being required for skewed or
smaller datasets. Note that this function creates a histogram with non-uniform
bin widths. It offers no guarantees in terms of the mean-squared-error of the
histogram, but in practice is comparable to the histograms produced by the R/S-Plus
statistical computing packages. Note: the output type of the 'x' field in the return value is
propagated from the input value consumed in the aggregate function. |
kurtosis(expr) |
Returns the kurtosis value calculated from values of a group. |
last(expr[, isIgnoreNull]) |
Returns the last value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values |
last_value(expr[, isIgnoreNull]) |
Returns the last value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values |
max(expr) |
Returns the maximum value of `expr`. |
max_by(x, y) |
Returns the value of `x` associated with the maximum value of `y`. |
mean(expr) |
Returns the mean calculated from values of a group. |
min(expr) |
Returns the minimum value of `expr`. |
min_by(x, y) |
Returns the value of `x` associated with the minimum value of `y`. |
percentile(col, percentage [, frequency]) |
Returns the exact percentile value of numeric
column `col` at the given percentage. The value of percentage must be
between 0.0 and 1.0. The value of frequency should be positive integral |
percentile(col, array(percentage1 [, percentage2]...) [, frequency]) |
Returns the exact
percentile value array of numeric column `col` at the given percentage(s). Each value
of the percentage array must be between 0.0 and 1.0. The value of frequency should be
positive integral |
percentile_approx(col, percentage [, accuracy]) |
Returns the approximate `percentile` of the numeric or
ansi interval column `col` which is the smallest value in the ordered `col` values (sorted
from least to greatest) such that no more than `percentage` of `col` values is less than
the value or equal to that value. The value of percentage must be between 0.0 and 1.0.
The `accuracy` parameter (default: 10000) is a positive numeric literal which controls
approximation accuracy at the cost of memory. Higher value of `accuracy` yields better
accuracy, `1.0/accuracy` is the relative error of the approximation.
When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
In this case, returns the approximate percentile array of column `col` at the given
percentage array. |
regr_avgx(y, x) |
Returns the average of the independent variable for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_avgy(y, x) |
Returns the average of the dependent variable for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_count(y, x) |
Returns the number of non-null number pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_r2(y, x) |
Returns the coefficient of determination for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
skewness(expr) |
Returns the skewness value calculated from values of a group. |
some(expr) |
Returns true if at least one value of `expr` is true. |
std(expr) |
Returns the sample standard deviation calculated from values of a group. |
stddev(expr) |
Returns the sample standard deviation calculated from values of a group. |
stddev_pop(expr) |
Returns the population standard deviation calculated from values of a group. |
stddev_samp(expr) |
Returns the sample standard deviation calculated from values of a group. |
sum(expr) |
Returns the sum calculated from values of a group. |
try_avg(expr) |
Returns the mean calculated from values of a group and the result is null on overflow. |
try_sum(expr) |
Returns the sum calculated from values of a group and the result is null on overflow. |
var_pop(expr) |
Returns the population variance calculated from values of a group. |
var_samp(expr) |
Returns the sample variance calculated from values of a group. |
variance(expr) |
Returns the sample variance calculated from values of a group. |