Working with Time Series Data in pandas: PeriodIndex vs Alternatives


Creating a PeriodIndex

  • From existing data
    You can pass a list or NumPy array containing period-like data (e.g., dates, strings representing periods) along with a frequency specification (e.g., 'D' for daily, 'M' for monthly) to the pandas.PeriodIndex constructor.
import pandas as pd

dates = ['2023-01-01', '2023-02-01', '2023-03-01']
periods = pd.PeriodIndex(dates, freq='M')
print(periods)
  • From DatetimeIndex
    If you have a DatetimeIndex, you can convert it to a PeriodIndex using the to_period method with the desired frequency.
datetime_index = pd.to_datetime(dates)
period_index = datetime_index.to_period(freq='M')
print(period_index)
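
The conversion also works when the timestamps do not fall on period boundaries, and it can be reversed with to_timestamp. The following is a minimal sketch; the mid-month sample dates and the names month_starts and month_ends are illustrative assumptions, not part of the example above.

```python
import pandas as pd

# Timestamps that fall on arbitrary days within each month (illustrative data)
datetime_index = pd.to_datetime(['2023-01-15', '2023-02-20', '2023-03-05'])

# Each timestamp collapses to the month that contains it
period_index = datetime_index.to_period(freq='M')
print(period_index)

# Going back requires choosing which instant stands for the whole period
month_starts = period_index.to_timestamp(how='start')
month_ends = period_index.to_timestamp(how='end')
print(month_starts)
print(month_ends)
```

Note that the round trip is lossy: the original day of the month is not recovered, only the start or end of the period.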

Using PeriodIndex in pandas Data Structures

  • Assigning to Series/DataFrame
    Once you have a PeriodIndex, you can use it as the index for a pandas Series or DataFrame.
data = [100, 150, 200]
series = pd.Series(data, index=periods)
print(series)
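
Once a Series is indexed by periods, values can be looked up by period label. The sketch below rebuilds the same series so that it runs on its own; the variable names feb_value, feb_by_period, first_two, and year_2023 are chosen here for illustration.

```python
import pandas as pd

periods = pd.PeriodIndex(['2023-01', '2023-02', '2023-03'], freq='M')
series = pd.Series([100, 150, 200], index=periods)

# Look up one month with a string label or an explicit Period object
feb_value = series.loc['2023-02']
feb_by_period = series.loc[pd.Period('2023-02', freq='M')]

# Label slices on a PeriodIndex include both endpoints
first_two = series.loc['2023-01':'2023-02']

# A coarser label (a year) selects every period it contains
year_2023 = series.loc['2023']

print(feb_value, feb_by_period)
print(first_two)
print(year_2023)
```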


Example 1: Creating PeriodIndex from Dates with Different Frequencies

import pandas as pd

# Daily Periods
daily_dates = ['2024-06-15', '2024-06-16', '2024-06-17']
daily_periods = pd.PeriodIndex(daily_dates, freq='D')  # Daily frequency

# Monthly Periods (starting from June 2024)
monthly_periods = pd.period_range('2024-06', periods=3, freq='M')  # 3 months starting from June

print("Daily Periods:")
print(daily_periods)

print("\nMonthly Periods:")
print(monthly_periods)

This code creates two PeriodIndex objects:

  • daily_periods: Represents the three daily periods June 15, 16, and 17, 2024.
  • monthly_periods: Represents the monthly periods June, July, and August 2024 (since periods=3).
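
Each element of a PeriodIndex is a span of time rather than a single instant. As a small sketch of what that means in practice, the code below inspects the boundaries of the first monthly period and shifts the whole index by one month; the names first and shifted are illustrative.

```python
import pandas as pd

monthly_periods = pd.period_range('2024-06', periods=3, freq='M')

# A Period knows where it begins and ends
first = monthly_periods[0]
print(first.start_time)  # first instant of June 2024
print(first.end_time)    # last instant of June 2024

# Integer arithmetic moves by whole periods (here, months)
shifted = monthly_periods + 1
print(shifted)
```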


Example 2: Using a PeriodIndex as a DataFrame Index

import pandas as pd

# Sample data as a dictionary
data = {'Sales': [120, 150, 180], 'Customers': [10, 12, 15]}

# Create a DataFrame with a PeriodIndex as the index
dates = ['2023-01', '2023-02', '2023-03']
period_index = pd.PeriodIndex(dates, freq='M')
df = pd.DataFrame(data, index=period_index)

# Access data by period
jan_sales = df.loc['2023-01', 'Sales']  # Access sales for January 2023

# Print the DataFrame
print(df)
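
Because every row is labelled with a period, rolling the data up to a coarser frequency is mostly a matter of relabelling. The sketch below is one possible approach, not the only one: it converts the monthly labels to quarters with asfreq and groups on them. The name quarterly is an assumption made for this illustration.

```python
import pandas as pd

data = {'Sales': [120, 150, 180], 'Customers': [10, 12, 15]}
period_index = pd.PeriodIndex(['2023-01', '2023-02', '2023-03'], freq='M')
df = pd.DataFrame(data, index=period_index)

# Map each monthly period to the quarter that contains it
quarter_labels = df.index.asfreq('Q')

# Sum all months that share a quarter label
quarterly = df.groupby(quarter_labels).sum()
print(quarterly)
```

All three months fall in the first quarter of 2023, so the result has a single row holding the column totals.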


pandas.DatetimeIndex

  • Description
    This is the most common alternative. It represents specific points in time with high precision (down to nanoseconds).
  • Advantages
    • More widely used and familiar to most pandas users.
    • Offers greater flexibility for representing individual, possibly irregularly spaced, timestamps.
    • Integrates seamlessly with other pandas time-based functionality.
  • Disadvantages
    • May be less convenient for representing regularly spaced time intervals (e.g., daily, monthly), because a timestamp marks an instant rather than a span.
    • Aggregating data at a specific frequency requires an explicit step, such as resampling or converting to periods.
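
As a rough sketch of working with a DatetimeIndex, the code below stores a few irregular timestamps and rolls them up to months in two ways. The sample readings and variable names are invented for illustration; 'MS' labels each bin by the first day of its month.

```python
import pandas as pd

# Irregular observation times (illustrative data)
stamps = pd.to_datetime(['2023-01-05 09:30', '2023-01-20 14:00', '2023-02-11 08:15'])
readings = pd.Series([10, 20, 30], index=stamps)

# Option 1: resample into monthly bins labelled by month start
monthly_totals = readings.resample('MS').sum()
print(monthly_totals)

# Option 2: group by the containing period, giving a PeriodIndex result
totals_by_period = readings.groupby(readings.index.to_period('M')).sum()
print(totals_by_period)
```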

Custom Index with Dates

  • Description
    You can use a regular Python list of date objects (e.g., datetime.date from the datetime module) as the index. pandas then stores the labels in a generic object-dtype Index rather than a time-aware one.
  • Advantages
    • Offers complete control over the index data type.
    • May be suitable for simpler time series with limited functionality needs.
  • Disadvantages
    • Requires manual handling of time-based operations (e.g., resampling, differencing).
    • Less integrated with pandas time series functionality.
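
To illustrate the trade-off, the sketch below builds a Series indexed by plain datetime.date objects, shows that the resulting index has object dtype, and then converts it with pd.to_datetime when time series features are needed. The sample values and the name converted are assumptions for this example.

```python
import datetime as dt

import pandas as pd

# A plain list of date objects used directly as the index
days = [dt.date(2023, 1, 1), dt.date(2023, 1, 2), dt.date(2023, 1, 3)]
series = pd.Series([1, 2, 3], index=days)
print(series.index.dtype)  # generic object dtype, not a datetime type

# Exact lookups still work with a matching date object
print(series.loc[dt.date(2023, 1, 2)])

# Converting the labels gives a DatetimeIndex and its time-based features
converted = series.copy()
converted.index = pd.to_datetime(converted.index)
print(converted.index)
```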

Third-party libraries

  • Libraries
    Libraries like dateutil or pytz offer advanced time manipulation functionality.
  • Advantages
    • May provide specific features beyond what pandas exposes directly (e.g., flexible date-string parsing in dateutil, time zone definitions in pytz).
  • Disadvantages
    • Adds an additional dependency to your codebase.
    • Might require learning a new API for time manipulation.

Choosing an Approach

  • If your data is naturally organized in regular time intervals, pandas.PeriodIndex remains the best option.
  • If you need high precision time representation and flexibility, pandas.DatetimeIndex is a good choice.
  • For simpler scenarios or specific time zone handling, consider custom solutions or third-party libraries.