Exploring Alternatives to pandas.DataFrame.round for Tailored Rounding


Purpose

  • Rounds the numerical values in a DataFrame to a specified number of decimal places.
  • Offers flexibility to round different columns to different precision levels.

How it Works

  1. Arguments

    • decimals: This argument determines the rounding behavior.
      • If decimals is an integer (e.g., 2), all numeric columns are rounded to that many decimal places.
      • If decimals is a dictionary-like object (e.g., {'col1': 1, 'col2': 3}), each column named by a key is rounded to the corresponding number of places. Columns that are not keys in the dictionary are left unchanged, and keys that are not column names are ignored.
      • If decimals is a pandas Series, columns are rounded according to the integer values in the Series. The Series index must be unique and hold the column names; columns missing from the index are left unchanged.
  2. Rounding

    • pandas.DataFrame.round relies on NumPy rounding, which rounds half to even (often called "banker's rounding"). This means:
      • Values exactly halfway between two candidates are rounded to the even one (e.g., 1.5 rounds to 2.0, 2.5 rounds to 2.0).
      • A tie only occurs when the stored value is exactly halfway. Decimal fractions such as 0.05 or 1.05 are stored as binary approximations, so rounding them to one decimal place can give a result that differs from what the decimal notation suggests.
  3. Output

    • Returns a new DataFrame with the rounded values. The original DataFrame remains unchanged.
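
The tie-handling and copy semantics described above can be checked with a small sketch. The frame name ties and the column name x are made up for illustration; the values are chosen because each one is exactly representable in binary floating point.

```python
import pandas as pd

# Every value here is an exact tie between two whole numbers.
ties = pd.DataFrame({'x': [0.5, 1.5, 2.5, 3.5]})

rounded = ties.round(0)

# Ties go to the even neighbor, not always upward.
print(rounded['x'].tolist())  # [0.0, 2.0, 2.0, 4.0]

# The original frame is untouched; round() returned a new object.
print(ties['x'].tolist())     # [0.5, 1.5, 2.5, 3.5]
```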

Example

import pandas as pd

data = {'col1': [1.2345, 5.6789, 9.0123], 'col2': [2.5, 3.5, 4.5]}
df = pd.DataFrame(data)

# Round all columns to 2 decimal places
df_rounded_all = df.round(2)
print(df_rounded_all)

# Round specific columns to different precisions
df_rounded_specific = df.round({'col1': 1, 'col2': 0})
print(df_rounded_specific)

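The first call prints both columns rounded to two decimal places (col2 is unchanged, since its values already have a single decimal). The second prints col1 rounded to one decimal place and col2 rounded to whole numbers: 2.0, 4.0, 4.0, which shows round-half-to-even at work.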

Key Points

  • numpy.round has no rounding-mode argument and also rounds half to even. For other behavior, consider numpy.floor, numpy.ceil, or numpy.trunc on scaled values, or Python's decimal module.
  • pandas.DataFrame.round returns a new DataFrame; the original is not modified.
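
As one possible way to get "round half up" instead of round-half-to-even, the sketch below scales the values, adds 0.5, and floors. The helper name round_half_up_frame is hypothetical (it is not part of pandas or NumPy), and the sketch assumes every column is numeric. It still works on binary floats, so decimal fractions that are not exactly representable can land on either side of a tie.

```python
import numpy as np
import pandas as pd

def round_half_up_frame(df, decimals=0):
    """Hypothetical helper: round ties toward positive infinity."""
    factor = 10 ** decimals
    # floor(x * factor + 0.5) pushes exact ties upward before scaling back.
    return np.floor(df * factor + 0.5) / factor

ties = pd.DataFrame({'x': [0.5, 1.5, 2.5, -2.5]})
print(round_half_up_frame(ties)['x'].tolist())  # [1.0, 2.0, 3.0, -2.0]
print(ties.round(0)['x'].tolist())              # [0.0, 2.0, 2.0, -2.0]
```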


Rounding Specific Columns with Different Precisions

import pandas as pd

data = {'price': [12.3456, 56.7890, 90.1234],
        'quantity': [10, 25, 15],
        'discount': [0.05, 0.10, 0.15]}
df = pd.DataFrame(data)

# Round 'price' to 2 decimals, 'quantity' to no decimals, and 'discount' to 1 decimal
rounded_df = df.round({'price': 2, 'quantity': 0, 'discount': 1})
print(rounded_df)
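
The dictionary does not need to name every column. The sketch below reuses the same data and lists only price; the key not_a_column is made up to show that keys without a matching column are ignored.

```python
import pandas as pd

df = pd.DataFrame({'price': [12.3456, 56.7890, 90.1234],
                   'quantity': [10, 25, 15],
                   'discount': [0.05, 0.10, 0.15]})

# Only 'price' is rounded; 'quantity' and 'discount' pass through unchanged.
# 'not_a_column' is a made-up key and is silently ignored.
partial = df.round({'price': 1, 'not_a_column': 3})
print(partial)
```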

Rounding While Handling Missing Values

import pandas as pd
import numpy as np

data = {'value': [1.234, np.nan, 5.678]}
df = pd.DataFrame(data)

# Round 'value' to 2 decimals, then replace NaN with the string 'NA'
# (mixing strings with floats turns the column into object dtype)
rounded_df = df.round(2).fillna('NA')
print(rounded_df)
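
If the column should stay numeric, one option is to round first and only apply the 'NA' label when displaying. The sketch below, with the made-up names rounded and labeled, shows that round leaves NaN in place and that numeric operations keep working until the string label is filled in.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [1.234, np.nan, 5.678]})

rounded = df.round(2)            # NaN is preserved; the column stays float
labeled = rounded.fillna('NA')   # display-only copy with a string label

print(rounded['value'].dtype)    # float64
print(rounded['value'].mean())   # mean skips the NaN entry
print(labeled)
```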

Rounding with a Series While Leaving Text Columns Untouched

import pandas as pd

data = {'col1': [1.2345, 5.6789, 9.0123],
        'col2': [2.5, 3.5, 4.5],
        'col3': ['text', 'another_text', 'data']}
df = pd.DataFrame(data)

# Create a Series that maps numeric column names to decimal places.
# 'col3' (text) is omitted: the values must be integers, and a None entry would turn them into floats.
rounding_series = pd.Series([2, 0], index=['col1', 'col2'])

# Round based on the Series ('col3' is left unchanged because it is not in the index)
rounded_df = df.round(rounding_series)
print(rounded_df)
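
When a frame has many columns, the rounding Series can be built from the numeric columns instead of being typed by hand. This is a minimal sketch assuming a default of 2 decimal places with one override; the names numeric_cols and decimals are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1.2345, 5.6789, 9.0123],
                   'col2': [2.5, 3.5, 4.5],
                   'col3': ['text', 'another_text', 'data']})

# Pick up numeric columns only, so text columns never enter the Series.
numeric_cols = df.select_dtypes(include='number').columns
decimals = pd.Series(2, index=numeric_cols)  # same integer precision everywhere
decimals['col2'] = 0                         # override a single column

rounded = df.round(decimals)
print(rounded)
```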


List Comprehension with round function

  • This approach iterates through the DataFrame and rounds each value individually using the built-in round function.
  • The built-in round has no rounding-mode argument and also rounds ties to even, but the per-value function can be replaced with any rounding logic you need.

Example

import pandas as pd

data = {'col1': [1.2345, 5.6789, 9.0123], 'col2': [2.5, 3.5, 4.5]}
df = pd.DataFrame(data)

def round_to_two(value):
  # Per-value rounding hook; the built-in round sends ties to the even digit
  return round(value, 2)

rounded_df = pd.DataFrame([[round_to_two(val) for val in row] for row in df.values], columns=df.columns)
print(rounded_df)
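
To actually choose a rounding mode per value, one option is Python's decimal module. The sketch below swaps in a hypothetical helper, round_half_up_decimal, that converts each float through its string form before quantizing with ROUND_HALF_UP. Going through str is an assumption about intent: it rounds the decimal digits you see printed rather than the exact binary value.

```python
from decimal import Decimal, ROUND_HALF_UP

import pandas as pd

def round_half_up_decimal(value, places=2):
    """Hypothetical helper: round half up based on the printed decimal digits."""
    quantum = Decimal(1).scaleb(-places)  # Decimal('0.01') when places=2
    as_decimal = Decimal(str(float(value)))
    return float(as_decimal.quantize(quantum, rounding=ROUND_HALF_UP))

df = pd.DataFrame({'col1': [1.005, 2.675], 'col2': [2.5, 3.5]})

rounded = pd.DataFrame(
    [[round_half_up_decimal(val) for val in row] for row in df.values],
    columns=df.columns,
)
print(rounded)
```

Other modes from the decimal module, such as ROUND_HALF_EVEN or ROUND_DOWN, can be passed to quantize in the same way.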

numpy.around function

  • This NumPy function offers the same rounding functionality as pandas.DataFrame.round, including round-half-to-even.
  • It has no rounding-mode option. To round up, down, or toward zero, scale the values and use numpy.ceil, numpy.floor, or numpy.trunc.

Example

import pandas as pd
import numpy as np

data = {'col1': [1.2345, 5.6789, 9.0123], 'col2': [2.5, 3.5, 4.5]}
df = pd.DataFrame(data)

rounded_df = pd.DataFrame(np.around(df.values, decimals=2), columns=df.columns)
print(rounded_df)

# Rounding down (toward negative infinity) to 2 decimals: scale, floor, scale back
rounded_down_df = pd.DataFrame(np.floor(df.values * 100) / 100, columns=df.columns)
print(rounded_down_df)
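
Rounding down, rounding up, and rounding toward zero only differ for negative numbers, which the sketch below makes visible. The helper directed_round and its how parameter are hypothetical names for illustration, and the sketch assumes an all-numeric frame.

```python
import numpy as np
import pandas as pd

def directed_round(df, decimals, how):
    """Hypothetical helper: scale, apply floor/ceil/trunc, scale back."""
    ops = {'floor': np.floor, 'ceil': np.ceil, 'trunc': np.trunc}
    factor = 10 ** decimals
    return ops[how](df * factor) / factor

df = pd.DataFrame({'x': [1.239, -1.239]})

floored = directed_round(df, 2, 'floor')    # toward negative infinity
ceiled = directed_round(df, 2, 'ceil')      # toward positive infinity
truncated = directed_round(df, 2, 'trunc')  # toward zero

print(pd.concat({'floor': floored['x'], 'ceil': ceiled['x'], 'trunc': truncated['x']}, axis=1))
```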

pd.Series.apply with custom rounding function

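  • This approach applies a custom rounding function to every value of each column: DataFrame.apply hands over one column (a Series) at a time, and Series.apply then calls the function on each value.
  • Allows for more complex rounding logic based on specific conditions.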

Example

import pandas as pd

def custom_round(value):
  if value < 5:
    return round(value, 1)
  else:
    return round(value, 0)

data = {'col1': [1.234, 5.678, 9.012], 'col2': [2.5, 3.5, 4.5]}
df = pd.DataFrame(data)

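# DataFrame.apply passes whole columns, so call Series.apply inside to reach each value
rounded_df = df.apply(lambda col: col.apply(custom_round))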
print(rounded_df)
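
For a simple threshold rule like the one in custom_round, the same result can often be obtained without a per-value Python function. The sketch below is one possible vectorized form using DataFrame.where; it assumes all columns are numeric.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1.234, 5.678, 9.012], 'col2': [2.5, 3.5, 4.5]})

# Keep the 1-decimal result where the original value is below 5,
# otherwise fall back to the 0-decimal result.
rounded = df.round(1).where(df < 5, df.round(0))
print(rounded)
```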

Which approach to choose depends on:

  • Performance considerations (list comprehension can be slower for large DataFrames).
  • The complexity of your rounding logic.
  • The level of control you need over rounding behavior (custom rounding modes).
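
To weigh the performance point on your own data, a rough timing comparison can be sketched with timeit. The frame size, the column names, and the repeat count below are arbitrary assumptions, and the printed timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
big = pd.DataFrame(rng.random((5_000, 4)) * 100, columns=list('abcd'))

def with_round():
    # Vectorized rounding over the whole frame
    return big.round(2)

def with_comprehension():
    # Per-value rounding in Python, rebuilt into a DataFrame
    rows = [[round(val, 2) for val in row] for row in big.values]
    return pd.DataFrame(rows, columns=big.columns)

t_vectorized = timeit.timeit(with_round, number=3)
t_loop = timeit.timeit(with_comprehension, number=3)
print(f"DataFrame.round:    {t_vectorized:.4f} s")
print(f"list comprehension: {t_loop:.4f} s")
```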