pyspark.pandas.Series.spark.transform

spark.transform(func)

Applies a function that takes a Spark column and returns a Spark column. This lets you apply Spark functions and Column APIs directly to the Spark column that backs a Series or Index. The output Spark column must have the same length as the input.

Note

The input and output must have the same length; therefore, aggregate Spark functions such as count do not work.

Parameters

func : function
    Function to use for transforming the data by using Spark columns.

Returns

Series or Index

Raises

ValueError
    If the output from the function is not a Spark column.

Examples

>>> import pyspark.pandas as ps
>>> from pyspark.sql.functions import log
>>> df = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, columns=["a", "b"])
>>> df
   a  b
0  1  4
1  2  5
2  3  6

>>> df.a.spark.transform(lambda c: log(c))
0    0.000000
1    0.693147
2    1.098612
Name: a, dtype: float64

>>> df.index.spark.transform(lambda c: c + 10)
Index([10, 11, 12], dtype='int64')

>>> df.a.spark.transform(lambda c: c + df.b.spark.column)
0    5
1    7
2    9
Name: a, dtype: int64