pyspark.pandas.Series.spark.transform

spark.transform(func)

Applies a function that takes and returns a Spark column. This lets you apply Spark functions and column APIs directly to the Spark column used internally by a Series or Index. The output Spark column must have the same length as the input.

Note

The input and output must have the same length; therefore, aggregate Spark functions such as count do not work.

Parameters
func : function

Function to use for transforming the data by using Spark columns.

Returns
Series or Index
Raises
ValueError

If the output from the function is not a Spark column.

Examples

>>> import pyspark.pandas as ps
>>> from pyspark.sql.functions import log
>>> df = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, columns=["a", "b"])
>>> df
   a  b
0  1  4
1  2  5
2  3  6
>>> df.a.spark.transform(lambda c: log(c))
0    0.000000
1    0.693147
2    1.098612
Name: a, dtype: float64
>>> df.index.spark.transform(lambda c: c + 10)
Index([10, 11, 12], dtype='int64')
>>> df.a.spark.transform(lambda c: c + df.b.spark.column)
0    5
1    7
2    9
Name: a, dtype: int64