pyspark.pandas.Series.spark.transform
- spark.transform(func)
Applies a function that takes and returns a Spark column. This lets you apply Spark functions and column APIs natively to the Spark column that backs a Series or Index. The output Spark column must have the same length as the input.
Note
The input and output must have the same length; therefore, aggregate Spark functions such as count do not work.
- Parameters
- func: function
Function to use for transforming the data by using Spark columns.
- Returns
- Series or Index
- Raises
- ValueError: If the output from the function is not a Spark column.
Examples
>>> import pyspark.pandas as ps
>>> from pyspark.sql.functions import log
>>> df = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, columns=["a", "b"])
>>> df
   a  b
0  1  4
1  2  5
2  3  6
>>> df.a.spark.transform(lambda c: log(c))
0    0.000000
1    0.693147
2    1.098612
Name: a, dtype: float64
>>> df.index.spark.transform(lambda c: c + 10)
Index([10, 11, 12], dtype='int64')
>>> df.a.spark.transform(lambda c: c + df.b.spark.column)
0    5
1    7
2    9
Name: a, dtype: int64