Visualizing time series data with Pandas Series.dt.start_time: creating clear, easy-to-read graphs

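pandas.Series.dt.start_time is an attribute of the pandas library that returns the start time of each period stored in a Series object with a period dtype. It is useful in data analysis and visualization whenever you need the exact point in time at which each period begins.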

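Usage

pandas.Series.dt.start_time is only available when the Series holds Period data (a period dtype). A Series of pandas.Timestamp values (datetime64 dtype) does not have this attribute, so convert it to periods first with dt.to_period().

import pandas as pd

# Create a Series containing timestamps
data = pd.Series([
    pd.Timestamp('2023-01-01 09:00:00'),
    pd.Timestamp('2023-01-02 10:00:00'),
    pd.Timestamp('2023-01-03 11:00:00')
])

# Convert the timestamps to daily periods
periods = data.dt.to_period('D')

Next, use the dt.start_time attribute to get the start time of each period.

start_times = periods.dt.start_time

print(start_times)

Running the code above produces the following output.

0   2023-01-01
1   2023-01-02
2   2023-01-03
dtype: datetime64[ns]

The start time of each daily period (midnight of that day) is returned as a Timestamp.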
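The start time depends on the frequency of the periods. The following is a minimal sketch with made-up sample timestamps, showing that a plain datetime64 Series lacks the attribute and that monthly periods start on the first day of each month. The variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample timestamps, chosen only for illustration
stamps = pd.Series(pd.to_datetime([
    "2023-01-15 09:00:00",
    "2023-02-20 10:00:00",
    "2023-03-25 11:00:00",
]))

# A plain datetime64 Series has no start_time on its .dt accessor
has_attr = hasattr(stamps.dt, "start_time")

# Converting to monthly periods makes start_time and end_time available
months = stamps.dt.to_period("M")
month_starts = months.dt.start_time
month_ends = months.dt.end_time

print(has_attr)
print(month_starts)
print(month_ends)
```

The companion attribute dt.end_time returns the last instant of each period, so the pair brackets every period.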

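Applications

pandas.Series.dt.start_time can be used in a variety of situations, such as the following.

Visualizing time data
Calculating differences between times
Grouping time data by date
Extracting only the data from a specific time onward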

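Period values carry no timezone information, so the timestamps returned by pandas.Series.dt.start_time are timezone-naive. To work with timezones, apply methods such as tz_localize and tz_convert to the result.
pandas.Series.dt.start_time is reportedly available in pandas 0.23.0 and later; check the documentation for the version you use.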
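As a rough sketch of the timezone handling, the start times can be localized and converted after they are extracted. The sample periods and the assumption that they represent UTC days are made up for this illustration.

```python
import pandas as pd

# Hypothetical daily periods, used only for illustration
periods = pd.Series(pd.period_range("2023-01-01", periods=3, freq="D"))

# start_time yields timezone-naive timestamps
naive = periods.dt.start_time

# Assumption: the periods represent UTC days; attach UTC, then view them in Tokyo time
utc = naive.dt.tz_localize("UTC")
tokyo = utc.dt.tz_convert("Asia/Tokyo")

print(tokyo)
```

tz_localize attaches a timezone without shifting the clock time, while tz_convert shifts an already timezone-aware value into another zone.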

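Extracting only the data from a specific time onward

import pandas as pd

# Create a Series containing timestamps
data = pd.Series([
    pd.Timestamp('2023-01-01 09:00:00'),
    pd.Timestamp('2023-01-02 10:00:00'),
    pd.Timestamp('2023-01-03 11:00:00')
])

# Start time of the day each timestamp falls in
day_starts = data.dt.to_period('D').dt.start_time

# Keep only the data from January 2, 2023 onward
filtered_data = data[day_starts >= pd.Timestamp('2023-01-02')]

print(filtered_data)

1   2023-01-02 10:00:00
2   2023-01-03 11:00:00
dtype: datetime64[ns]

Only the timestamps from January 2, 2023 onward are extracted.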

Grouping time data by date

The following code uses pandas.Series.dt.start_time to group time data by date and compute the mean time of each group.

import pandas as pd

# Create a Series containing timestamps
data = pd.Series([
    pd.Timestamp('2023-01-01 09:00:00'),
    pd.Timestamp('2023-01-02 10:00:00'),
    pd.Timestamp('2023-01-02 11:00:00'),
    pd.Timestamp('2023-01-03 12:00:00')
])

# Group by the start of each day and compute the mean time
grouped_data = data.groupby(data.dt.to_period('D').dt.start_time).mean()

print(grouped_data)

2023-01-01   2023-01-01 09:00:00
2023-01-02   2023-01-02 10:30:00
2023-01-03   2023-01-03 12:00:00
dtype: datetime64[ns]

The time data is grouped by date, and the mean time of each group is calculated.
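The same idea extends to coarser periods. Below is a minimal sketch, using a hypothetical DataFrame with made-up column names and values, that labels each row with the start of its month and averages a numeric column per month.

```python
import pandas as pd

# Hypothetical measurements with a timestamp column
df = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-25"]),
    "value": [10, 20, 30, 50],
})

# Label each row with the start of the month it belongs to
month_start = df["time"].dt.to_period("M").dt.start_time

# Average the values per month
monthly_mean = df.groupby(month_start)["value"].mean()

print(monthly_mean)
```

Because the group keys are ordinary timestamps, the result can be passed directly to plotting functions that expect a datetime axis.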

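Calculating differences between times

The following code uses pandas.Series.dt.start_time to calculate the differences between the start times of consecutive periods.

import pandas as pd

# Create a Series containing timestamps
data = pd.Series([
    pd.Timestamp('2023-01-01 09:00:00'),
    pd.Timestamp('2023-01-02 10:00:00'),
    pd.Timestamp('2023-01-03 11:00:00'),
    pd.Timestamp('2023-01-04 12:00:00')
])

# Calculate the differences between the daily start times
time_diffs = data.dt.to_period('D').dt.start_time.diff()

print(time_diffs)

0      NaT
1   1 days
2   1 days
3   1 days
dtype: timedelta64[ns]

The differences are returned as timedelta values. The first element is NaT because it has no preceding value to compare against.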
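Another difference that start_time makes easy to compute is the time elapsed since the beginning of the period. The sketch below uses made-up timestamps and subtracts each day's start time from the original timestamp.

```python
import pandas as pd

# Hypothetical sample timestamps, chosen only for illustration
stamps = pd.Series(pd.to_datetime([
    "2023-01-01 09:00:00",
    "2023-01-02 10:30:00",
]))

# Start of the day each timestamp falls in
day_start = stamps.dt.to_period("D").dt.start_time

# Time elapsed since the start of the day
elapsed = stamps - day_start

print(elapsed)
```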

Visualizing time data

The following code uses pandas.Series.dt.start_time to visualize time data as a line chart.

import pandas as pd
import matplotlib.pyplot as plt

# Time...
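A minimal sketch of such a line chart follows, assuming matplotlib is installed. The data values, axis labels, and the choice of the non-interactive Agg backend are assumptions made for this illustration, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values indexed by daily periods
data = pd.Series(
    [10, 20, 30],
    index=pd.period_range("2023-01-01", periods=3, freq="D"),
)

# Use the start time of each period as the x coordinate
x = data.index.start_time

fig, ax = plt.subplots()
ax.plot(x, data.to_numpy(), marker="o")
ax.set_xlabel("Period start")
ax.set_ylabel("Value")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

To see the result, save it with fig.savefig() or, with an interactive backend, call plt.show().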

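Using the start_time attribute of the index

The index of a Series is accessed through its index attribute. When the index is a PeriodIndex, its start_time attribute returns the start time of each period, just as dt.start_time does for the values. A DatetimeIndex has no start_time attribute, so convert it with to_period() first.

import pandas as pd

# Create a Series indexed by daily periods
data = pd.Series([
    10, 20, 30
], index=pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03']).to_period('D'))

# Get the start times from the index
start_times = data.index.start_time

print(start_times)

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'], dtype='datetime64[ns]', freq='D')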

Using the apply() function

pandas.Series.apply() applies a function to each element of a Series (pandas.DataFrame.apply() does the same for each row or column of a DataFrame). You can write a function that returns the start time of a single Period and apply it to every element.

import pandas as pd

# Create a Series of daily periods
data = pd.Series(pd.period_range('2023-01-01', periods=3, freq='D'))

# Function that returns the start time of a single Period
def get_start_time(period):
    return period.start_time

# Use apply to get the start time of each element
start_times = data.apply(get_start_time)

print(start_times)

0   2023-01-01
1   2023-01-02
2   2023-01-03
dtype: datetime64[ns]

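Beyond the approaches above, there are other ways to obtain or present start times depending on the situation. For example, the following are also possible.

Writing a custom function
Using the strftime() method to format the start times as strings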
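The two ideas can be combined. The sketch below defines a hypothetical helper, start_time_labels, which is not part of pandas, that formats the start time of each period as text with dt.strftime(). The sample periods and the default format string are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical daily periods, used only for illustration
periods = pd.Series(pd.period_range("2023-01-01", periods=3, freq="D"))


def start_time_labels(period_series, fmt="%Y-%m-%d %H:%M"):
    """Hypothetical helper: format the start time of each period as text."""
    return period_series.dt.start_time.dt.strftime(fmt)


labels = start_time_labels(periods)

print(labels.tolist())
```

Note that strftime() returns strings, so the result is suited to labels and reports rather than further date arithmetic.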