Understanding pandas.Series.argsort: Sorting Series by Values
What is pandas.Series.argsort?
In pandas, a Series is a one-dimensional labeled array capable of holding various data types. The argsort method of a Series does not sort the data itself. It returns the integer positions that would put the values in sorted order, which you can then use to reorder the Series.
What does argsort do?
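Instead of directly returning the sorted Series, argsort returns a new Series of zero-based integer positions. The first entry is the position of the smallest value in the original Series, the second entry is the position of the next smallest, and so on. In simpler terms, it tells you which element to pick first, second, third, and so on to read the Series in sorted order. It does not tell you where each element would land after sorting (that is what rank does).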
How to use argsort
import pandas as pd
# Create a pandas Series
data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)
# Apply argsort
sorted_indices = fruits.argsort()
print(sorted_indices)
This will output:
apple 3
banana 1
cherry 0
date 2
dtype: int64
Understanding the output
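- Each value in sorted_indices is a zero-based integer position in fruits. Read from top to bottom, the values list the positions of the elements of fruits in ascending order.
  - For example, the first value is 3, meaning the smallest element of fruits (date, with value 1) sits at position 3. The last value is 2, meaning the largest element (cherry, with value 15) sits at position 2.
- The new Series sorted_indices has the same index as the original Series fruits, but the labels are only carried over. The 3 shown next to apple says nothing about apple itself.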
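A quick way to check this reading of the output is to compare argsort with rank, which answers the opposite question. The following is a minimal sketch on the same fruits data; the variable names order, labels_in_order and ranks are only illustrative.

```python
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1})

# argsort: entry i is the original position of the i-th smallest value.
order = fruits.argsort().to_numpy()

# Labels in ascending order of value, recovered from those positions.
labels_in_order = list(fruits.index[order])

# rank: for each element, the place it would take in the sorted Series (1-based).
ranks = fruits.rank(method='first').astype(int)

print(order)
print(labels_in_order)
print(ranks.to_dict())
```

Here order lists positions to pick from, while ranks lists destinations, which is why the two results differ for the same data.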
Important points
- argsort has no option for descending order. The kind parameter only selects the sorting algorithm ('quicksort', 'mergesort', 'heapsort' or 'stable'). For numeric data, negate the values first:
  descending_indices = (-fruits).argsort()
- You can use .iloc with sorted_indices to access the corresponding sorted values:
  sorted_fruits = fruits.iloc[sorted_indices]
  print(sorted_fruits)
  This will print the Series sorted in ascending order.
- argsort does not sort in place. It returns a new Series and leaves the original Series unchanged.
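The sketch below illustrates what the kind parameter actually controls and confirms that the original data is left alone. The scores Series is a made-up example with tied values, chosen because ties are where the choice of algorithm becomes visible.

```python
import pandas as pd

# Hypothetical data with ties: two 2s and two 1s.
scores = pd.Series([2, 1, 2, 1], index=['a', 'b', 'c', 'd'])
before = scores.copy()

# kind selects the sorting algorithm; 'stable' keeps tied values
# in their original relative order.
stable_order = scores.argsort(kind='stable')

print(stable_order.to_numpy())   # positions of the 1s first, then the 2s
print(scores.equals(before))     # True: argsort did not modify scores
```

With the default algorithm the relative order of tied elements is not guaranteed, so kind='stable' is the safer choice whenever ties matter.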
Sorting in Descending Order
import pandas as pd
data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)
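# Sort in descending order (largest to smallest) by negating the values;
# the kind parameter only selects the algorithm, not the direction
descending_indices = (-fruits).argsort()
print(descending_indices)
This code will output:
apple 2
banana 0
cherry 1
date 3
dtype: int64
Read from top to bottom, the positions 2, 0, 1, 3 pick out cherry, apple, banana and date, which is the descending order of the values.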
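One way to get a descending order that also works for non-numeric data is to reverse the ascending result. This is a small sketch rather than the only approach; descending_positions and descending_fruits are illustrative names.

```python
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1})

# Reverse the ascending positions to obtain a descending order.
descending_positions = fruits.argsort().to_numpy()[::-1]

# Use the positions to pull out the values, largest first.
descending_fruits = fruits.iloc[descending_positions]
print(descending_fruits)
```

Note that reversing also reverses the relative order of tied values, which may or may not matter for a given dataset.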
Sorting with Missing Values
import pandas as pd
import numpy as np
data = {'apple': 10, 'banana': np.nan, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)
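# argsort has no na_position parameter, so drop missing values (NaN) first
clean = fruits.dropna()
sorted_indices = clean.argsort()
print(sorted_indices)
This code will output:
apple 2
cherry 0
date 1
dtype: int64
The positions refer to clean (apple, cherry, date), not to the original fruits. Calling argsort directly on a Series that contains NaN is best avoided: pandas has historically marked missing entries with -1, and that behavior has since been deprecated.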
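When positions into the original Series are needed even though some values are missing, one option is to sort only the non-missing entries and map the result back. The sketch below does this with NumPy; valid_positions, order_within_valid and original_positions are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': np.nan, 'cherry': 15, 'date': 1})

# Positions (in the original Series) of the entries that are not NaN.
valid_positions = np.flatnonzero(fruits.notna().to_numpy())

# Sort only the valid values, then translate back to original positions.
order_within_valid = np.argsort(fruits.to_numpy()[valid_positions], kind='stable')
original_positions = valid_positions[order_within_valid]

print(original_positions)
print(fruits.iloc[original_positions])
```

The missing entry (banana) is simply left out of the result; it could be appended at the end if a complete ordering is required.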
Applying argsort with .iloc for Sorted Values
import pandas as pd
data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)
# Get indices for ascending order
sorted_indices = fruits.argsort()
# Access sorted values using indices
sorted_fruits = fruits.iloc[sorted_indices]
print(sorted_fruits)
This code will print the Series sorted in ascending order:
date 1
banana 5
apple 10
cherry 15
dtype: int64
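Positions from argsort are most useful when the same ordering has to be applied to other data. The sketch below reorders a second, hypothetical Series (stock) by the order of the first; it assumes both Series list the same items in the same order.

```python
import pandas as pd

prices = pd.Series({'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1})
# Hypothetical companion data aligned position by position with prices.
stock = pd.Series({'apple': 40, 'banana': 0, 'cherry': 12, 'date': 7})

# Positions that sort prices in ascending order.
order = prices.argsort().to_numpy()

sorted_prices = prices.iloc[order]
stock_by_price = stock.iloc[order]   # stock reordered from cheapest to dearest

# For the Series itself, this matches what sort_values returns.
print(sorted_prices.equals(prices.sort_values()))
print(stock_by_price)
```

If only the sorted Series is needed, sort_values is shorter; argsort pays off once the order itself is reused.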
Sorting by a Custom Order
import pandas as pd
data = {'apple': 'red', 'banana': 'yellow', 'cherry': 'red', 'date': 'brown'}
fruits = pd.Series(data)
# Define custom order for colors
color_order = {'red': 0, 'yellow': 1, 'brown': 2}
# argsort has no key parameter, so map each color to its rank first,
# then argsort the ranks (a stable sort keeps ties in their original order)
sorted_indices = fruits.map(color_order).argsort(kind='stable')
print(sorted_indices)
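Another way to express a custom order is an ordered categorical dtype, whose integer codes follow the order in which the categories are listed. This is a sketch of that alternative; color_dtype, colors and order are illustrative names.

```python
import pandas as pd

fruits = pd.Series({'apple': 'red', 'banana': 'yellow',
                    'cherry': 'red', 'date': 'brown'})

# The category order defines the sort order: red < yellow < brown.
color_dtype = pd.CategoricalDtype(['red', 'yellow', 'brown'], ordered=True)
colors = fruits.astype(color_dtype)

# .cat.codes holds each color's place in that order as an integer.
order = colors.cat.codes.argsort(kind='stable').to_numpy()

print(fruits.iloc[order])
```

A value that is not among the listed categories becomes missing when cast, so the category list should cover every color that occurs.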
pandas.Series.sort_values
- It offers more control over the sorting behavior, including:
  - Ascending or descending order (ascending parameter)
  - A transformation applied to the values before comparing them (key parameter). The by parameter for sorting by one or more columns exists only on DataFrame.sort_values, not on a Series.
  - Handling missing values (na_position parameter)
- It directly returns a new Series with the values sorted according to your specifications.
- This is the most common and recommended alternative to argsort.
Example:
import pandas as pd
data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)
sorted_fruits = fruits.sort_values()
print(sorted_fruits)
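The parameters of sort_values can be tried out on small inputs. The sketch below uses ascending, na_position and key; the names Series is a made-up example, and key assumes a reasonably recent pandas (it was added in version 1.1).

```python
import numpy as np
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': np.nan, 'cherry': 15, 'date': 1})

# Largest first; missing values go last by default.
largest_first = fruits.sort_values(ascending=False)

# Smallest first, but with missing values placed at the top.
nan_first = fruits.sort_values(na_position='first')

# key transforms the values before comparison (case-insensitive sort here).
names = pd.Series(['banana', 'Apple', 'cherry'])
case_insensitive = names.sort_values(key=lambda s: s.str.lower())

print(largest_first)
print(nan_first)
print(case_insensitive)
```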
List Comprehension with sorted
- It involves creating a list of (value, label) tuples, sorting the list, and then extracting the labels in sorted order.
- This approach is less efficient for large Series compared to sort_values.
import pandas as pd
data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)
sorted_values = sorted(zip(fruits.values, fruits.index))
sorted_indices = [x[1] for x in sorted_values]
print(sorted_indices)
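The zip-based snippet yields index labels. If integer positions like those from argsort are wanted instead, a plain-Python variant is to sort the positions by the value found at each one. This is a sketch for small inputs; positions and labels are illustrative names.

```python
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1})
values = fruits.tolist()

# Sort the positions 0..n-1 by the value stored at each position.
positions = sorted(range(len(values)), key=values.__getitem__)

# Labels can still be recovered from the positions when needed.
labels = [fruits.index[i] for i in positions]

print(positions)
print(labels)
```

Python's sorted is stable, so tied values keep their original relative order here.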
Numba (for advanced users)
- You can use numba to write a custom sorting function for Series, but it requires more effort and expertise.
- Numba is a just-in-time (JIT) compiler that can potentially speed up Python functions.
- Numba is only recommended for advanced users who need to optimize sorting for very large Series.
Summary
- List comprehension with sorted can be used for small Series but should be avoided for large datasets due to performance concerns.
- For most cases, pandas.Series.sort_values is the recommended approach, as it provides a new sorted Series and offers more control over sorting behavior.
- If you simply need the positions that would sort the Series, argsort is sufficient, although its output is generally less intuitive to read.