Understanding pandas.Series.argsort: Sorting Series by Values


What is pandas.Series.argsort?

In pandas, a Series is a one-dimensional labeled array capable of holding various data types. The argsort method of a Series returns the integer positions that would sort the Series by its values. It does not return the sorted values themselves.

What does argsort do?

Instead of returning the sorted Series, argsort returns a new Series of integer positions. The i-th value of the result is the position, in the original Series, of the element that belongs at position i once the values are sorted in ascending order. In simpler terms, it tells you where to look in the original Series to read the values in sorted order. It does not tell you the rank of each element.
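The difference between "positions that sort the data" and "rank of each element" is easy to mix up, so here is a minimal sketch on a three-element Series. The Series s and its labels are made up for illustration, and the rank computation is only there for contrast.

```python
import pandas as pd

# Hypothetical three-element Series used only for illustration
s = pd.Series([30, 10, 20], index=['a', 'b', 'c'])

# Positions that would sort the values: the smallest value (10) sits at
# position 1, then 20 at position 2, then 30 at position 0.
positions = s.argsort()
print(positions.tolist())   # [1, 2, 0]

# Using those positions on the underlying array yields the sorted values.
sorted_values = s.to_numpy()[positions.to_numpy()]
print(sorted_values.tolist())   # [10, 20, 30]

# For contrast: the zero-based rank of each element is a different thing.
ranks = s.rank().astype(int) - 1
print(ranks.tolist())   # [2, 0, 1]
```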

How to use argsort

  1. Import pandas

    import pandas as pd
    
  2. Create a pandas Series

    data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
    fruits = pd.Series(data)
    
  3. Apply argsort

    sorted_indices = fruits.argsort()
    print(sorted_indices)
    

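    This will output:

    apple     3
    banana    1
    cherry    0
    date      2
    dtype: int64
    

Understanding the output

  • Each value in sorted_indices is a position in the original fruits Series. Read from top to bottom, the values (3, 1, 0, 2) give the order in which to pick elements from fruits to obtain ascending order.
    • For example, the first value is 3, so the smallest element is the one at position 3 of fruits (date, with value 1). The last value is 2, so the largest element is the one at position 2 (cherry, with value 15).
  • The new Series sorted_indices has the same index as the original Series fruits. The labels are carried over unchanged and do not describe the positions stored next to them.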
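Because the labels attached to the result can be misleading, it often helps to convert the positions to a plain NumPy array and use them to look up labels and values. The following is a small sketch of that idea using the same fruits data.

```python
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1})

# Drop the carried-over labels and keep only the integer positions
positions = fruits.argsort().to_numpy()

# Look up the labels and the values in sorted order
labels_in_order = fruits.index[positions]
values_in_order = fruits.to_numpy()[positions]

for label, value in zip(labels_in_order, values_in_order):
    print(label, value)
```

Read this way, the output lists date, banana, apple, cherry together with their values in ascending order.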

Important points

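  • argsort has no option for descending order. Its kind parameter selects the sorting algorithm ('quicksort', 'mergesort', 'heapsort' or 'stable'), not the direction. For numeric data, one way to get descending order is to negate the values first:

    descending_indices = (-fruits).argsort()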
    
  • You can use .iloc with sorted_indices to access the corresponding sorted values:

    sorted_fruits = fruits.iloc[sorted_indices]
    print(sorted_fruits)
    

    This will print the Series sorted in ascending order.

  • argsort does not sort in place: it returns a new Series of positions and leaves the original Series unchanged.
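As a small illustration of what the kind parameter is actually for, the sketch below uses a made-up scores Series with tied values. With kind='stable', tied elements keep their original relative order, and the input Series is left untouched.

```python
import pandas as pd

# Hypothetical data with ties, used only for illustration
scores = pd.Series([2, 1, 2, 1], index=['w', 'x', 'y', 'z'])
before = scores.copy()

# 'stable' guarantees that equal values keep their original relative order:
# the two 1s (positions 1 and 3) come first, then the two 2s (positions 0 and 2).
stable_positions = scores.argsort(kind='stable')
print(stable_positions.tolist())   # [1, 3, 0, 2]

# The original Series is not modified by argsort
print(scores.equals(before))       # True
```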



Sorting in Descending Order

import pandas as pd

data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)

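# Sort in descending order (largest to smallest).
# argsort has no "descending" option, so negate the numeric values first.
descending_indices = (-fruits).argsort()
print(descending_indices)

This code will output:

apple     2
banana    0
cherry    1
date      3
dtype: int64

Read in order, the positions 2, 0, 1, 3 select cherry, apple, banana and date.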
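Negation only works for values that can be negated. An alternative sketch, which should also suit data such as strings, is to compute the ascending positions and reverse them. Note that reversing also reverses the relative order of tied values.

```python
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1})

# Ascending positions as a plain NumPy array, then reversed
ascending_positions = fruits.argsort().to_numpy()
descending_positions = ascending_positions[::-1]
print(descending_positions.tolist())   # [2, 0, 1, 3]

# Apply the positions to get the values from largest to smallest
descending_fruits = fruits.iloc[descending_positions]
print(descending_fruits)
```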

Sorting with Missing Values

import pandas as pd
import numpy as np

data = {'apple': 10, 'banana': np.nan, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)

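# argsort has no na_position parameter, and its handling of NaN depends on
# the pandas version (older releases mark NaN with -1), so drop NaN first.
valid = fruits.dropna()
sorted_indices = valid.argsort()
print(sorted_indices)

This code will output:

apple     2
cherry    0
date      1
dtype: int64

The positions refer to the filtered Series valid, not to the original fruits.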
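If the goal is to control where missing values end up, sort_values does accept na_position. The sketch below shows both settings on the same data and reads the resulting label order from the index.

```python
import numpy as np
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': np.nan, 'cherry': 15, 'date': 1})

# NaN placed after the sorted values (the default)
nan_last = fruits.sort_values(na_position='last')
print(nan_last.index.tolist())    # ['date', 'apple', 'cherry', 'banana']

# NaN placed before the sorted values
nan_first = fruits.sort_values(na_position='first')
print(nan_first.index.tolist())   # ['banana', 'date', 'apple', 'cherry']
```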

Applying argsort with .iloc for Sorted Values

import pandas as pd

data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)

# Get indices for ascending order
sorted_indices = fruits.argsort()

# Access sorted values using indices
sorted_fruits = fruits.iloc[sorted_indices]
print(sorted_fruits)

This code will print the Series sorted in ascending order:

date       1
banana     5
apple     10
cherry    15
dtype: int64
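The same pattern extends to reordering the rows of a DataFrame by one of its columns. This is a minimal sketch; the DataFrame df and its column names are hypothetical and chosen to mirror the fruits data.

```python
import pandas as pd

# Hypothetical DataFrame mirroring the fruits example
df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'quantity': [10, 5, 15, 1],
})

# Positions that sort the 'quantity' column, as a plain NumPy array
order = df['quantity'].argsort().to_numpy()

# Reorder all rows by position; the original row labels are preserved
df_sorted = df.iloc[order]
print(df_sorted)
```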
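
Sorting by a Custom Order

import pandas as pd

data = {'apple': 'red', 'banana': 'yellow', 'cherry': 'red', 'date': 'brown'}
fruits = pd.Series(data)

# Define custom order for colors
color_order = {'red': 0, 'yellow': 1, 'brown': 2}

# argsort has no key parameter, so map each color to its rank first
# and then argsort the resulting numbers. kind='stable' keeps tied
# values (the two 'red' entries) in their original relative order.
sorted_indices = fruits.map(color_order).argsort(kind='stable')
print(sorted_indices)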


Alternatives to argsort

pandas.Series.sort_values

  • It offers more control over the sorting behavior, including:
    • Ascending or descending order (ascending parameter)
    • Sorting by a transformation of the values (key parameter)
    • Handling missing values (na_position parameter)
  • It directly returns a new Series with the values sorted according to your specifications.
  • It is usually the simplest alternative to argsort when you want the sorted values rather than the positions.

Example:

import pandas as pd

data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)

sorted_fruits = fruits.sort_values()
print(sorted_fruits)
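The options listed above can be sketched as follows. The colors Series and color_order mapping reuse the custom-order data from earlier in this article; the variable names are otherwise illustrative.

```python
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1})

# Descending order via the ascending parameter
descending = fruits.sort_values(ascending=False)
print(descending.index.tolist())   # ['cherry', 'apple', 'banana', 'date']

# Custom order via the key parameter: key receives the whole Series
# and must return a Series of the same length to sort by.
colors = pd.Series({'apple': 'red', 'banana': 'yellow', 'cherry': 'red', 'date': 'brown'})
color_order = {'red': 0, 'yellow': 1, 'brown': 2}
by_custom_order = colors.sort_values(key=lambda s: s.map(color_order), kind='stable')
print(by_custom_order.index.tolist())   # ['apple', 'cherry', 'banana', 'date']
```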

List Comprehension with sorted

  • It involves building a list of (value, label) tuples, sorting the list, and then extracting the labels in sorted order.
  • Because it runs in pure Python, this approach is slower than sort_values or argsort on large Series.

import pandas as pd

data = {'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1}
fruits = pd.Series(data)

# Pair each value with its label and sort the pairs by value
sorted_values = sorted(zip(fruits.values, fruits.index))
# Keep only the labels; these are index labels, not integer positions
sorted_indices = [x[1] for x in sorted_values]
print(sorted_indices)
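The snippet above yields labels. If integer positions are wanted instead, as argsort returns, a pure-Python sketch is to sort the range of positions by the value found at each one. This is only an illustration of the idea, not how pandas implements argsort.

```python
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1})

# Plain Python list of the values: [10, 5, 15, 1]
values = fruits.tolist()

# Sort the positions 0..n-1 by the value stored at each position
positions = sorted(range(len(values)), key=values.__getitem__)
print(positions)   # [3, 1, 0, 2]

# The same positions translate to labels when needed
labels = [fruits.index[i] for i in positions]
print(labels)      # ['date', 'banana', 'apple', 'cherry']
```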

Numba (for advanced users)

  • Numba is a just-in-time (JIT) compiler that can potentially speed up numerical Python functions.
  • You can use it to write a custom sorting function that operates on the NumPy array underlying a Series, but this requires more effort and expertise.
  • It is only worth considering for advanced users who need to optimize sorting for very large Series.

Summary

  • For most cases, pandas.Series.sort_values is the recommended approach as it returns a new sorted Series and offers more control over sorting behavior.
  • If you only need the positions that would sort the Series, for example to reorder other data in the same way, argsort is sufficient, although its output is less intuitive to read.
  • List comprehension with sorted can be used for small Series but should be avoided for large datasets due to performance concerns.
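As a quick sanity check of how these approaches relate, the sketch below confirms that applying argsort positions with .iloc gives the same Series as sort_values for the fruits data. Both calls use kind='stable' so that any ties would be handled the same way.

```python
import pandas as pd

fruits = pd.Series({'apple': 10, 'banana': 5, 'cherry': 15, 'date': 1})

# Route 1: positions from argsort, applied with .iloc
order = fruits.argsort(kind='stable').to_numpy()
via_argsort = fruits.iloc[order]

# Route 2: sort_values directly
via_sort_values = fruits.sort_values(kind='stable')

# Both routes produce the same labels and values in the same order
same_result = via_argsort.equals(via_sort_values)
print(same_result)   # True
```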