pyspark.RDD.mapValues

RDD.mapValues(f: Callable[[V], U]) → pyspark.rdd.RDD[Tuple[K, U]]

Pass each value in the key-value pair RDD through a map function without changing the keys; this also retains the original RDD’s partitioning.

Examples

>>> x = sc.parallelize([("a", ["apple", "banana", "lemon"]), ("b", ["grapes"])])
>>> def f(x): return len(x)
>>> x.mapValues(f).collect()
[('a', 3), ('b', 1)]
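
The following is a minimal additional sketch (not part of the original page), assuming a live SparkContext sc; it illustrates the partitioning note above by checking that mapValues keeps the partitioner assigned by partitionBy, whereas an equivalent map does not preserve it.

>>> rdd = sc.parallelize([("a", 1), ("b", 2)]).partitionBy(2)
>>> rdd.mapValues(lambda v: v * 10).partitioner == rdd.partitioner
True
>>> rdd.map(lambda kv: (kv[0], kv[1] * 10)).partitioner is None
True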