pyspark.sql.GroupedData

A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy().

New in version 1.3.
Methods
agg(*exprs)
Computes aggregates and returns the result as a DataFrame.
apply(udf)
An alias of pyspark.sql.GroupedData.applyInPandas(); however, it takes a pyspark.sql.functions.pandas_udf() whereas pyspark.sql.GroupedData.applyInPandas() takes a plain Python function.
applyInPandas(func, schema)
Maps each group of the current DataFrame using a pandas UDF and returns the result as a DataFrame.
avg(*cols)
Computes average values for each numeric column for each group.
cogroup(other)
Cogroups this group with another group so that we can run cogrouped operations.
count()
Counts the number of records for each group.
max(*cols)
Computes the max value for each numeric column for each group.
mean(*cols)
Computes average values for each numeric column for each group.
min(*cols)
Computes the min value for each numeric column for each group.
pivot(pivot_col[, values])
Pivots a column of the current DataFrame and performs the specified aggregation.
sum(*cols)
Computes the sum for each numeric column for each group.