Mastering ConvertCustomConfig in PyTorch Quantization: A Detailed Guide with Sample Code

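torch.ao.quantization.fx.custom_config.ConvertCustomConfig is the configuration class for customizing the convert step (convert_fx()) of FX Graph Mode Quantization in PyTorch. With it you can:

Specify attributes of the model that must be preserved on the converted model
Specify the mapping from observed custom module classes to quantized custom module classes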

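Main attributes and what they do

preserved_attributes: a list of attribute names on the top-level model. Attributes in this list persist on the converted GraphModule even if forward() never uses them.
observed_to_quantized_mapping: a dictionary keyed by quantization type (QuantType); each value maps an observed custom module class to the quantized custom module class that replaces it during conversion. It maps classes, not attribute names such as fc1.weight.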
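
As a minimal sketch of how these two attributes get populated, the snippet below fills a ConvertCustomConfig through its setter methods and reads the attributes back. ObservedFC and QuantizedFC are hypothetical placeholder classes introduced only for illustration; a real pair would implement from_float() and from_observed().

```python
import torch.nn as nn
from torch.ao.quantization.fx.custom_config import ConvertCustomConfig
from torch.ao.quantization.quant_type import QuantType


class ObservedFC(nn.Module):  # hypothetical observed custom module (placeholder)
    pass


class QuantizedFC(nn.Module):  # hypothetical quantized custom module (placeholder)
    pass


config = (
    ConvertCustomConfig()
    # quant_type is left at its default, QuantType.STATIC
    .set_observed_to_quantized_mapping(ObservedFC, QuantizedFC)
    .set_preserved_attributes(["version"])
)

# observed_to_quantized_mapping is keyed by QuantType first, then by observed class
mapping = config.observed_to_quantized_mapping
print(mapping[QuantType.STATIC][ObservedFC].__name__)
print(config.preserved_attributes)
```

Both setters return the config itself, which is why the calls can be chained.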

Usage

Build a ConvertCustomConfig with its setter methods (the constructor takes no arguments) and pass it to convert_fx() as the convert_custom_config argument.

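import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.fx.custom_config import ConvertCustomConfig, PrepareCustomConfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# Define the model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(20, 10)
        self.version = "1.0"  # plain attribute that forward() never uses

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

# Prepare the model: insert observers, then calibrate with sample data
model = MyModel().eval()
example_inputs = (torch.randn(1, 10),)
# "version" must also be preserved through prepare_fx, or it is gone before convert_fx runs
prepare_custom_config = PrepareCustomConfig().set_preserved_attributes(["version"])
prepared_model = prepare_fx(model, get_default_qconfig_mapping(), example_inputs, prepare_custom_config=prepare_custom_config)
prepared_model(*example_inputs)

# Define the custom config
convert_custom_config = ConvertCustomConfig().set_preserved_attributes(["version"])

# Convert the model
quantized_model = convert_fx(prepared_model, convert_custom_config=convert_custom_config)

In this example, convert_fx() quantizes fc1 and fc2 according to the QConfigMapping given to prepare_fx(), and the version attribute is kept on the converted model. Parameters such as fc1.weight and fc1.bias need no entry in ConvertCustomConfig because the conversion handles them itself. set_observed_to_quantized_mapping() is not called here because the model contains no custom modules.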

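ConvertCustomConfig replaces the older dictionary form (convert_custom_config_dict), and its from_dict() method exists mainly to keep code written for that form working. It does not overlap with QConfigMapping (the successor of qconfig_dict): the QConfigMapping passed to prepare_fx() decides which modules are quantized and how, while set_observed_to_quantized_mapping() and set_preserved_attributes() only affect the convert step. Most new models need only a QConfigMapping; reach for ConvertCustomConfig when the model has custom modules or attributes that must survive conversion.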
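
For code that still carries a dictionary-style config, a sketch of the migration could look like the following. The key names are the ones the class itself reads and writes in from_dict() and to_dict(); OldObserved and OldQuantized are hypothetical placeholder classes used only for illustration.

```python
import torch.nn as nn
from torch.ao.quantization.fx.custom_config import ConvertCustomConfig
from torch.ao.quantization.quant_type import QuantType


class OldObserved(nn.Module):  # hypothetical observed custom module (placeholder)
    pass


class OldQuantized(nn.Module):  # hypothetical quantized custom module (placeholder)
    pass


# Legacy dictionary form: quantization mode name -> {observed class: quantized class}
legacy_dict = {
    "observed_to_quantized_custom_module_class": {
        "static": {OldObserved: OldQuantized},
    },
    "preserved_attributes": ["version"],
}

# Build the class-based config from the dictionary, then convert it back
config = ConvertCustomConfig.from_dict(legacy_dict)
round_trip = config.to_dict()

print(config.observed_to_quantized_mapping[QuantType.STATIC])
print(round_trip["preserved_attributes"])
```

The resulting config object can be passed to convert_fx() in place of the dictionary.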

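Example 1: Specifying the observed-to-quantized mapping

import torch
import torch.nn as nn
import torch.ao.nn.quantized as nnq
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.fx.custom_config import ConvertCustomConfig, PrepareCustomConfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# Custom module in float form; the observed and quantized forms reuse its layout
class FloatBlock(nn.Module):
    def __init__(self, fc1):
        super().__init__()
        self.fc1 = fc1
    def forward(self, x):
        return self.fc1(x)

class ObservedBlock(FloatBlock):
    @classmethod
    def from_float(cls, float_module):  # called by prepare_fx
        observed = cls(float_module.fc1)
        observed.qconfig = float_module.qconfig
        return observed

class QuantizedBlock(FloatBlock):
    @classmethod
    def from_observed(cls, observed):  # called by convert_fx
        observed.fc1.activation_post_process = observed.activation_post_process
        return cls(nnq.Linear.from_float(observed.fc1))

# Define the model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = FloatBlock(nn.Linear(10, 20))
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(20, 10)
    def forward(self, x):
        return self.fc2(self.relu(self.block(x)))

# Prepare the model
model = MyModel().eval()
example_inputs = (torch.randn(1, 10),)
prepare_custom_config = PrepareCustomConfig().set_float_to_observed_mapping(FloatBlock, ObservedBlock)
prepared_model = prepare_fx(model, get_default_qconfig_mapping(), example_inputs, prepare_custom_config=prepare_custom_config)
prepared_model(*example_inputs)  # calibrate

# Define the custom config and convert the model
convert_custom_config = ConvertCustomConfig().set_observed_to_quantized_mapping(ObservedBlock, QuantizedBlock)
quantized_model = convert_fx(prepared_model, convert_custom_config=convert_custom_config)

In this example, convert_fx() replaces the ObservedBlock instance with a QuantizedBlock built by its from_observed() class method. The mapping only takes effect for modules that prepare_fx() already swapped to the observed class, which is why the float-to-observed mapping is registered in PrepareCustomConfig as well.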

Example 2: Specifying attributes to preserve

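import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.fx.custom_config import ConvertCustomConfig, PrepareCustomConfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# Define the model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(20, 10)
        self.input_size = 10  # metadata that forward() never reads

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

# Prepare the model
model = MyModel().eval()
example_inputs = (torch.randn(1, 10),)
prepare_custom_config = PrepareCustomConfig().set_preserved_attributes(["input_size"])
prepared_model = prepare_fx(model, get_default_qconfig_mapping(), example_inputs, prepare_custom_config=prepare_custom_config)
prepared_model(*example_inputs)  # calibrate

# Define the custom config
convert_custom_config = ConvertCustomConfig().set_preserved_attributes(["input_size"])

# Convert the model
quantized_model = convert_fx(prepared_model, convert_custom_config=convert_custom_config)
print(quantized_model.input_size)

In this example, the input_size attribute is still available on the converted model. Preserving an attribute has nothing to do with whether a parameter is quantized; it only keeps a top-level attribute from being dropped when the GraphModule is rebuilt.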

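Keep in mind that using ConvertCustomConfig can make the conversion behavior harder to reason about, so inspect the converted model rather than assuming the result.
The examples above only show basic usage. In practice, adjust the settings to match your model and quantization configuration.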
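
One hedged way to check what the conversion actually did is to list the module types of the converted model and test for the preserved attribute. The sketch below assumes a CPU build of PyTorch with a quantized backend available; TinyModel and the attribute name tag are made up for illustration.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.fx.custom_config import ConvertCustomConfig, PrepareCustomConfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx


class TinyModel(nn.Module):  # hypothetical model used only for this check
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 5)
        self.tag = "v1"  # not used by forward()

    def forward(self, x):
        return self.fc(x)


example_inputs = (torch.randn(4, 10),)
prepared = prepare_fx(
    TinyModel().eval(),
    get_default_qconfig_mapping(),
    example_inputs,
    prepare_custom_config=PrepareCustomConfig().set_preserved_attributes(["tag"]),
)
prepared(*example_inputs)  # calibration pass
converted = convert_fx(
    prepared,
    convert_custom_config=ConvertCustomConfig().set_preserved_attributes(["tag"]),
)

# Report where each submodule's class lives and whether the attribute survived
module_types = {name: type(m).__module__ for name, m in converted.named_children()}
print(module_types)
print("tag preserved:", hasattr(converted, "tag"))
```

Printing converted.graph as well shows where quantize and dequantize operations were inserted.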

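Per-module settings with QConfigMapping (formerly qconfig_dict)

QConfigMapping, the successor of qconfig_dict, lets you set the quantization configuration module by module. It cannot express observed_to_quantized_mapping or preserved_attributes; those remain the job of ConvertCustomConfig. What it can do is assign a different QConfig to a named module, or None to leave that module in floating point.

import torch
import torch.nn as nn
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# Define the model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(20, 10)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

# Prepare the model
model = MyModel().eval()
example_inputs = (torch.randn(1, 10),)

# Define per-module settings with QConfigMapping
qconfig_mapping = (
    QConfigMapping()
    .set_global(get_default_qconfig())  # default for every module
    .set_module_name("fc2", None)       # None leaves fc2 in floating point
)

# Quantize the model
prepared_model = prepare_fx(model, qconfig_mapping, example_inputs)
prepared_model(*example_inputs)  # calibrate
quantized_model = convert_fx(prepared_model)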

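Starting from get_default_qconfig_mapping()

Another approach is to obtain the default settings with get_default_qconfig_mapping(), modify them, and pass the result to prepare_fx(). (torch.ao.quantization has no function named get_quantized_config.)

import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# Define the model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(20, 10)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

# Prepare the model
model = MyModel().eval()
example_inputs = (torch.randn(1, 10),)

# Get the default quantization settings
qconfig_mapping = get_default_qconfig_mapping()

# Modify the settings for the fc2 layer only
qconfig_mapping.set_module_name("fc2", None)

# Quantize the model
prepared_model = prepare_fx(model, qconfig_mapping, example_inputs)
prepared_model(*example_inputs)  # calibrate
quantized_model = convert_fx(prepared_model)

This is almost the same as building the QConfigMapping by hand; the difference is that it starts from the defaults and modifies only what needs to change.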
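
As a small sketch of what such a modification looks like on the mapping object itself, the default QConfigMapping returned by get_default_qconfig_mapping() can be overridden and inspected without touching a model. The module name "fc2" is only an illustrative assumption.

```python
from torch.ao.quantization import get_default_qconfig_mapping

# Start from the defaults PyTorch provides
qconfig_mapping = get_default_qconfig_mapping()

# Override one module by name; None means "do not quantize this module"
qconfig_mapping.set_module_name("fc2", None)  # "fc2" is an illustrative name

# The per-module override is stored separately from the global default
print(qconfig_mapping.global_qconfig is not None)
print(dict(qconfig_mapping.module_name_qconfigs))
```

Per-module entries take priority over the global setting when prepare_fx() resolves the QConfig for each module.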

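Advanced customization with custom module classes

When you need more advanced customization, you can implement your own quantization logic as custom module classes: an observed class with a from_float() class method and a quantized class with a from_observed() class method, registered through PrepareCustomConfig.set_float_to_observed_mapping() and ConvertCustomConfig.set_observed_to_quantized_mapping(). This is useful for complex models or special quantization requirements. Note that torch.ao.quantization does not provide a class named CustomQuantizer.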
