pyspark.pandas.DataFrame.idxmin

DataFrame.idxmin(axis: Union[int, str] = 0) → Series

Return index of first occurrence of minimum over requested axis. NA/null values are excluded.

Note

This API collects all rows holding a minimum value using to_pandas(), on the assumption that the number of such rows is usually small.
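
The strategy described in the note can be illustrated with a minimal pure-Python sketch. This is not the actual pyspark.pandas implementation: the function name `idxmin_sketch`, the use of `None` for missing values, and the handling of an all-missing column are assumptions made for illustration only. It shows why only a small subset of rows needs to be collected: the minimum is computed first, and only the rows equal to it are kept before picking the first index label.

```python
def idxmin_sketch(index, columns):
    """Return {column name: index label of the first minimum}, skipping None.

    Hypothetical helper for illustration; not part of pyspark.pandas.
    """
    result = {}
    for name, values in columns.items():
        # Pass 1: compute the column minimum, ignoring missing values.
        present = [v for v in values if v is not None]
        if not present:
            # Assumption of this sketch: an all-missing column has no label.
            result[name] = None
            continue
        lowest = min(present)
        # Pass 2: keep only the rows equal to the minimum. In the real API
        # this small subset is what would be collected with to_pandas().
        candidates = [label for label, v in zip(index, values) if v == lowest]
        # The first matching label is the first occurrence of the minimum.
        result[name] = candidates[0]
    return result


index = [0, 1, 2, 3]
columns = {
    "a": [1, 2, 3, 2],
    "b": [4.0, 2.0, 3.0, 1.0],
    "c": [300, 200, 400, 200],
}
result = idxmin_sketch(index, columns)
print(result)  # {'a': 0, 'b': 3, 'c': 1}
```

Column `c` has its minimum of 200 at labels 1 and 3, and the sketch returns 1, the first occurrence, which matches the output of the examples on this page.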

Parameters

axis: 0 or ‘index’
    Can only be set to 0 at the moment.

Returns

Series

See also

Series.idxmin

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                     'b': [4.0, 2.0, 3.0, 1.0],
...                     'c': [300, 200, 400, 200]})
>>> psdf
   a    b    c
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200

>>> psdf.idxmin()
a    0
b    3
c    1
dtype: int64

For MultiIndex columns:

>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                     'b': [4.0, 2.0, 3.0, 1.0],
...                     'c': [300, 200, 400, 200]})
>>> psdf.columns = pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')])
>>> psdf
   a    b    c
   x    y    z
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200

>>> psdf.idxmin()
a  x    0
b  y    3
c  z    1
dtype: int64
