any(expr) |
Returns true if at least one value of `expr` is true. |
any_value(expr[, isIgnoreNull]) |
Returns some value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values. |
approx_count_distinct(expr[, relativeSD]) |
Returns the estimated cardinality by HyperLogLog++.
`relativeSD` defines the maximum relative standard deviation allowed. |
approx_percentile(col, percentage [, accuracy]) |
Returns the approximate `percentile` of the numeric or
ansi interval column `col` which is the smallest value in the ordered `col` values (sorted
from least to greatest) such that no more than `percentage` of `col` values is less than
the value or equal to that value. The value of percentage must be between 0.0 and 1.0.
The `accuracy` parameter (default: 10000) is a positive numeric literal which controls
approximation accuracy at the cost of memory. Higher value of `accuracy` yields better
accuracy, `1.0/accuracy` is the relative error of the approximation.
When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
In this case, returns the approximate percentile array of column `col` at the given
percentage array. |
array_agg(expr) |
Collects and returns a list of non-unique elements. |
avg(expr) |
Returns the mean calculated from values of a group. |
bit_and(expr) |
Returns the bitwise AND of all non-null input values, or null if none. |
bit_or(expr) |
Returns the bitwise OR of all non-null input values, or null if none. |
bit_xor(expr) |
Returns the bitwise XOR of all non-null input values, or null if none. |
bitmap_construct_agg(child) |
Returns a bitmap with the positions of the bits set from all the values from
the child expression. The child expression will most likely be bitmap_bit_position(). |
bitmap_or_agg(child) |
Returns a bitmap that is the bitwise OR of all of the bitmaps from the child
expression. The input should be bitmaps created from bitmap_construct_agg(). |
bool_and(expr) |
Returns true if all values of `expr` are true. |
bool_or(expr) |
Returns true if at least one value of `expr` is true. |
collect_list(expr) |
Collects and returns a list of non-unique elements. |
collect_set(expr) |
Collects and returns a set of unique elements. |
corr(expr1, expr2) |
Returns Pearson coefficient of correlation between a set of number pairs. |
count(*) |
Returns the total number of retrieved rows, including rows containing null. |
count(expr[, expr...]) |
Returns the number of rows for which the supplied expression(s) are all non-null. |
count(DISTINCT expr[, expr...]) |
Returns the number of rows for which the supplied expression(s) are unique and non-null. |
count_if(expr) |
Returns the number of `TRUE` values for the expression. |
count_min_sketch(col, eps, confidence, seed) |
Returns a count-min sketch of a column with the given esp,
confidence and seed. The result is an array of bytes, which can be deserialized to a
`CountMinSketch` before usage. Count-min sketch is a probabilistic data structure used for
cardinality estimation using sub-linear space. |
covar_pop(expr1, expr2) |
Returns the population covariance of a set of number pairs. |
covar_samp(expr1, expr2) |
Returns the sample covariance of a set of number pairs. |
every(expr) |
Returns true if all values of `expr` are true. |
first(expr[, isIgnoreNull]) |
Returns the first value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values. |
first_value(expr[, isIgnoreNull]) |
Returns the first value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values. |
grouping(col) |
indicates whether a specified column in a GROUP BY is aggregated or
not, returns 1 for aggregated or 0 for not aggregated in the result set.", |
grouping_id([col1[, col2 ..]]) |
returns the level of grouping, equals to
`(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)` |
histogram_numeric(expr, nb) |
Computes a histogram on numeric 'expr' using nb bins.
The return value is an array of (x,y) pairs representing the centers of the
histogram's bins. As the value of 'nb' is increased, the histogram approximation
gets finer-grained, but may yield artifacts around outliers. In practice, 20-40
histogram bins appear to work well, with more bins being required for skewed or
smaller datasets. Note that this function creates a histogram with non-uniform
bin widths. It offers no guarantees in terms of the mean-squared-error of the
histogram, but in practice is comparable to the histograms produced by the R/S-Plus
statistical computing packages. Note: the output type of the 'x' field in the return value is
propagated from the input value consumed in the aggregate function. |
hll_sketch_agg(expr, lgConfigK) |
Returns the HllSketch's updatable binary representation.
`lgConfigK` (optional) the log-base-2 of K, with K is the number of buckets or
slots for the HllSketch. |
hll_union_agg(expr, allowDifferentLgConfigK) |
Returns the estimated number of unique values.
`allowDifferentLgConfigK` (optional) Allow sketches with different lgConfigK values
to be unioned (defaults to false). |
kurtosis(expr) |
Returns the kurtosis value calculated from values of a group. |
last(expr[, isIgnoreNull]) |
Returns the last value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values |
last_value(expr[, isIgnoreNull]) |
Returns the last value of `expr` for a group of rows.
If `isIgnoreNull` is true, returns only non-null values |
max(expr) |
Returns the maximum value of `expr`. |
max_by(x, y) |
Returns the value of `x` associated with the maximum value of `y`. |
mean(expr) |
Returns the mean calculated from values of a group. |
median(col) |
Returns the median of numeric or ANSI interval column `col`. |
min(expr) |
Returns the minimum value of `expr`. |
min_by(x, y) |
Returns the value of `x` associated with the minimum value of `y`. |
mode(col[, deterministic]) |
Returns the most frequent value for the values within `col`. NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL.
When multiple values have the same greatest frequency then either any of values is returned if `deterministic` is false or is not defined, or the lowest value is returned if `deterministic` is true. |
mode() WITHIN GROUP (ORDER BY col) |
Returns the most frequent value for the values within `col` (specified in ORDER BY clause). NULL values are ignored.
If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency only one value will be returned.
The value will be chosen based on sort direction. Return the smallest value if sort direction is asc or the largest value if sort direction is desc from multiple values with the same frequency. |
percentile(col, percentage [, frequency]) |
Returns the exact percentile value of numeric
or ANSI interval column `col` at the given percentage. The value of percentage must be
between 0.0 and 1.0. The value of frequency should be positive integral |
percentile(col, array(percentage1 [, percentage2]...) [, frequency]) |
Returns the exact
percentile value array of numeric column `col` at the given percentage(s). Each value
of the percentage array must be between 0.0 and 1.0. The value of frequency should be
positive integral |
percentile_approx(col, percentage [, accuracy]) |
Returns the approximate `percentile` of the numeric or
ansi interval column `col` which is the smallest value in the ordered `col` values (sorted
from least to greatest) such that no more than `percentage` of `col` values is less than
the value or equal to that value. The value of percentage must be between 0.0 and 1.0.
The `accuracy` parameter (default: 10000) is a positive numeric literal which controls
approximation accuracy at the cost of memory. Higher value of `accuracy` yields better
accuracy, `1.0/accuracy` is the relative error of the approximation.
When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
In this case, returns the approximate percentile array of column `col` at the given
percentage array. |
percentile_cont(percentage) WITHIN GROUP (ORDER BY col) |
Return a percentile value based on a continuous distribution of numeric or ANSI interval column `col` at the given `percentage` (specified in ORDER BY clause). |
percentile_disc(percentage) WITHIN GROUP (ORDER BY col) |
Return a percentile value based on a discrete distribution of numeric or ANSI interval column `col` at the given `percentage` (specified in ORDER BY clause). |
regr_avgx(y, x) |
Returns the average of the independent variable for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_avgy(y, x) |
Returns the average of the dependent variable for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_count(y, x) |
Returns the number of non-null number pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_intercept(y, x) |
Returns the intercept of the univariate linear regression line for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_r2(y, x) |
Returns the coefficient of determination for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_slope(y, x) |
Returns the slope of the linear regression line for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_sxx(y, x) |
Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_sxy(y, x) |
Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
regr_syy(y, x) |
Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where `y` is the dependent variable and `x` is the independent variable. |
skewness(expr) |
Returns the skewness value calculated from values of a group. |
some(expr) |
Returns true if at least one value of `expr` is true. |
std(expr) |
Returns the sample standard deviation calculated from values of a group. |
stddev(expr) |
Returns the sample standard deviation calculated from values of a group. |
stddev_pop(expr) |
Returns the population standard deviation calculated from values of a group. |
stddev_samp(expr) |
Returns the sample standard deviation calculated from values of a group. |
sum(expr) |
Returns the sum calculated from values of a group. |
try_avg(expr) |
Returns the mean calculated from values of a group and the result is null on overflow. |
try_sum(expr) |
Returns the sum calculated from values of a group and the result is null on overflow. |
var_pop(expr) |
Returns the population variance calculated from values of a group. |
var_samp(expr) |
Returns the sample variance calculated from values of a group. |
variance(expr) |
Returns the sample variance calculated from values of a group. |