Packaging PyTorch Models with Explicit Dependency Control: Using torch.package.PackageExporter.extern()


Purpose

  • In PyTorch package creation, extern() declares a module as an external dependency of your exported model.
  • It instructs the exporter not to bundle that module in the package; when the package is loaded, the module is imported from the loading environment instead.

How it Works

  1. Instantiation
    You create a PackageExporter object to manage the packaging process.
  2. Marking External Dependencies
    Call extern() on the PackageExporter object before saving anything, passing a module name or glob pattern (or a list of them) for the module(s) you want to treat as external. You can use:
    • Single module name (e.g., extern("numpy")), which matches exactly that module and none of its submodules
    • Wildcard notation (e.g., extern("PIL.**") for PIL and all submodules under it)
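
The difference between a plain module name and the wildcard forms can be illustrated with a small stand-alone sketch. This is a simplified re-implementation written for illustration only, not torch.package's own matcher; the function names `matches` and `_match` are hypothetical.

```python
import re


def matches(pattern: str, module_name: str) -> bool:
    """Return True if a dotted glob pattern matches a dotted module name."""
    return _match(pattern.split("."), module_name.split("."))


def _match(pat, name):
    if not pat:
        # Pattern exhausted: it matches only if the name is exhausted too.
        return not name
    head, rest = pat[0], pat[1:]
    if head == "**":
        # "**" absorbs zero or more whole dotted components.
        return any(_match(rest, name[i:]) for i in range(len(name) + 1))
    if not name:
        return False
    # Within one component, "*" stands for any run of characters.
    component = ".*".join(re.escape(piece) for piece in head.split("*"))
    return re.fullmatch(component, name[0]) is not None and _match(rest, name[1:])


for pattern in ("numpy", "PIL.*", "PIL.**"):
    for module in ("numpy", "numpy.linalg", "PIL", "PIL.Image"):
        print(f"{pattern!r} vs {module!r}: {matches(pattern, module)}")
```

Under these rules a plain name covers only the top-level module, which is why a `**` pattern is usually the safer choice when the model touches submodules.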

Benefits of Using extern()

  • Flexibility
    Allows users to have their preferred versions of the external modules installed.
  • Dependency Management
    Makes it clear that the package relies on these external modules to function correctly.
  • Reduced Package Size
    Leaving large libraries out of the archive keeps your package smaller and quicker to distribute.

Example

import io
import torch.package

with torch.package.PackageExporter(io.BytesIO()) as exp:
    # ... (your model code) ...

    # Mark "numpy" as an external dependency (do this before saving)
    exp.extern("numpy")

    # Save your model as the resource "model.pkl" inside the package "model"
    exp.save_pickle("model", "model.pkl", your_model)
  • The external modules won't be included in the package file.
  • Users need to have compatible versions of these external modules installed to run the packaged model.
  • Consider using version management tools like conda or pipenv to keep external dependencies consistent.
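
To make these notes concrete, here is a minimal round-trip sketch of the export and load sides, assuming torch is installed. The `state` dictionary is a placeholder standing in for a real model, and the package and resource names are illustrative only.

```python
import io

import torch
from torch.package import PackageExporter, PackageImporter

# Placeholder payload standing in for a real model (an assumption of this sketch).
state = {"weights": torch.ones(2, 2)}

buffer = io.BytesIO()
with PackageExporter(buffer) as exp:
    # Declare NumPy and its submodules external before saving anything.
    # This payload does not need NumPy, so the pattern simply goes unmatched.
    exp.extern("numpy.**")
    exp.save_pickle("model", "model.pkl", state)

# Load side: extern'd modules come from this environment, not from the archive.
buffer.seek(0)
importer = PackageImporter(buffer)
restored = importer.load_pickle("model", "model.pkl")
print(torch.equal(restored["weights"], state["weights"]))
```

If the packaged code did import an extern'd module, loading would succeed only where that module is installed, which is the requirement described in the notes above.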


import torch.package
import io

# Example 1: Marking a single external module
with torch.package.PackageExporter(io.BytesIO()) as exp:
    # ... (your model code using NumPy) ...

    # Mark "numpy" as an external dependency
    exp.extern("numpy")

    # Save your model (e.g., as "model.pkl")
    exp.save_pickle("model", "model.pkl", your_model)

# Example 2: Marking multiple external modules
with torch.package.PackageExporter(io.BytesIO()) as exp:
    # ... (your model code using NumPy and Pillow) ...

    # Mark "numpy" and "PIL" as external dependencies; a plain name matches
    # only that exact module (use "PIL.**" to also cover e.g. PIL.Image)
    exp.extern("numpy")
    exp.extern("PIL")

    # Save your model (e.g., as "model_with_pillow.pkl")
    exp.save_pickle("model", "model_with_pillow.pkl", your_model)

# Example 3: Marking all submodules within a package
with torch.package.PackageExporter(io.BytesIO()) as exp:
    # ... (your model code using various submodules from torchvision) ...

    # Mark "torchvision" and all of its submodules as external
    exp.extern("torchvision.**")  # Wildcard notation

    # Save your model (e.g., as "model_with_torchvision.pkl")
    exp.save_pickle("model", "model_with_torchvision.pkl", your_model)
  • These examples use io.BytesIO() for demonstration purposes. In practice you would usually pass a file path to PackageExporter so the package is written to disk.
  • Replace your_model with your actual model object.
  • The extern() calls mark different external dependencies:
    • Example 1: Single module (numpy)
    • Example 2: Multiple modules (numpy, PIL)
    • Example 3: All submodules within a package (torchvision.**)
  • Each example creates a PackageExporter instance to manage the packaging process.
  • When deploying the packaged model, provide clear instructions (or use tools like conda) to ensure users have the required external dependencies installed.
  • Your model code can import the external modules as usual; when the package is loaded, those imports are resolved from the user's environment rather than from the package.
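
One way to support the deployment advice above is to check for the required external modules before loading a package. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED_EXTERNS` list are hypothetical and would need to mirror the extern() calls made at export time.

```python
import importlib.util


def missing_modules(required):
    """Return the entries of `required` whose top-level module cannot be found."""
    missing = []
    for name in required:
        # Check only the top-level package; find_spec returns None if it is absent.
        top_level = name.split(".")[0]
        if importlib.util.find_spec(top_level) is None:
            missing.append(name)
    return missing


# Hypothetical list matching the modules marked external during export.
REQUIRED_EXTERNS = ["numpy", "PIL"]

missing = missing_modules(REQUIRED_EXTERNS)
if missing:
    print("Install these modules before loading the package:", ", ".join(missing))
else:
    print("All external dependencies are available.")
```

This only confirms that the modules can be found, not that their versions are compatible.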


Dependency Management Tools

  • PyTorch Deployment Tools
    Consider tools like TorchServe (a model server from the PyTorch project) or MLflow (a general-purpose ML lifecycle tool with PyTorch support) for packaging and deploying models. They may offer their own mechanisms for declaring dependencies.
  • conda or pipenv
    These package managers create virtual environments, allowing you to manage specific versions of dependencies needed for your project. This ensures consistent behavior when running the model, even if users have different packages installed globally.
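
To feed such tools, it can help to record the exact versions present in the exporting environment. Below is a minimal sketch using importlib.metadata from the standard library; the helper name `pin_versions` is hypothetical. Note that it takes distribution names, which can differ from import names (the PIL module is distributed as Pillow).

```python
from importlib import metadata


def pin_versions(distributions):
    """Return (pinned requirement lines, names that are not installed)."""
    pinned, not_installed = [], []
    for dist in distributions:
        try:
            pinned.append(f"{dist}=={metadata.version(dist)}")
        except metadata.PackageNotFoundError:
            not_installed.append(dist)
    return pinned, not_installed


# Distribution names for the modules marked external (illustrative list).
pinned, not_installed = pin_versions(["numpy", "Pillow"])
print("\n".join(pinned))
if not_installed:
    print("Not installed here:", ", ".join(not_installed))
```

The pinned lines can be written to a requirements file and shipped alongside the package.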

Explicit Module Imports

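  • Explicitly importing a module in your model code does not remove the need to declare it: the exporter requires every dependency it detects to be handled, for example with extern() or intern(). Using intern() instead of extern() bundles the module's source into the package, which frees users from installing it but increases package size and works only for Python source modules, not C extensions. A bundled copy is fixed at export time, so check that it stays compatible with the rest of the loading environment.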
  • If possible, refactor your code to avoid relying on external modules entirely. This could involve implementing the functionality yourself or finding alternative libraries that don't require additional dependencies. This approach offers the most control but might require more effort.

Approach                 | Pros                                           | Cons
conda or pipenv          | Consistent dependencies, easier deployment     | Requires setting up virtual environments
PyTorch Deployment Tools | Streamlined packaging and deployment           | Might require additional tools to learn and manage
Explicit Imports         | Dependency ships inside the package            | Increased package size, potential versioning issues
Code Refactoring         | Most control, eliminates external dependencies | Requires code changes, might not be feasible for all functionalities