Beyond Variance: Alternative Measures of Spread for Half-Cauchy Distributions in PyTorch


Half-Cauchy Distribution and Variance

  • The variance of a probability distribution measures how spread out its values are from the mean. A higher variance indicates a wider spread, while a lower variance signifies values concentrated closer to the mean.
  • The HalfCauchy distribution in PyTorch models a half-Cauchy random variable: the absolute value of a Cauchy variable centered at zero. Its support is therefore the non-negative reals, and it has a single long, heavy tail on the positive side.

HalfCauchy.variance Property

  • HalfCauchy.variance is a property, not a function, and it does not return a usable finite number: it reports inf. The half-Cauchy distribution has no finite variance because its density decays only like 1/x² for large x, so the integral that defines the second moment diverges. The mean is infinite for the same reason.
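
To see this concretely, the sketch below reads the variance property and compares it with the sample variance of draws of increasing size. The seed and sample sizes are arbitrary choices for illustration; the sample variances are finite but typically do not settle toward any value as the sample grows.

```python
import torch
from torch.distributions import HalfCauchy

torch.manual_seed(0)  # arbitrary seed, only for repeatability

dist = HalfCauchy(1.0)

# The property reports an infinite variance rather than a finite number.
variance = dist.variance
print("HalfCauchy.variance:", variance.item())

# Sample variances are finite but unstable: a few extreme draws dominate them.
for n in (100, 10_000, 100_000):
    samples = dist.sample((n,))
    print(f"n={n:>7,}  sample variance={samples.var().item():.2f}")
```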

Workaround: Alternative Measures of Spread

  • Since the variance is infinite, quantile-based measures of spread, which are always finite, are a better way to describe the distribution's behavior:
    • Median Absolute Deviation (MAD)
      This metric calculates the median of the absolute deviations from the median. It's less sensitive to outliers compared to standard deviation.
    • Interquartile Range (IQR)
      This represents the range between the first quartile (Q1) and the third quartile (Q3) of the distribution. It captures the middle 50% of the data.
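
Because the half-Cauchy has a closed-form inverse CDF, Q(p) = scale · tan(πp/2), its quartiles do not have to be estimated from samples. The sketch below evaluates them through the distribution's icdf method; the scale value of 2.0 is an arbitrary choice for illustration.

```python
import torch
from torch.distributions import HalfCauchy

scale = 2.0  # arbitrary illustrative value
dist = HalfCauchy(scale)

# Evaluate the inverse CDF at the quartile probabilities.
q1, median, q3 = dist.icdf(torch.tensor([0.25, 0.5, 0.75]))
iqr = q3 - q1

# Up to floating-point error, the median equals scale (tan(pi/4) = 1)
# and the IQR equals 2 * scale (tan(3*pi/8) - tan(pi/8) = 2).
print("Median:", median.item())
print("IQR:", iqr.item())
```

Both quantities grow linearly with scale, so either can stand in for the variance as a summary of spread.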

Code Example (Using Median Absolute Deviation)

import torch
from torch.distributions import HalfCauchy

# Create a HalfCauchy distribution
scale = 1.0  # Adjust scale as needed
dist = HalfCauchy(scale)

# Generate samples
samples = dist.sample((1000,))

# Calculate median absolute deviation (MAD) using PyTorch functions
median = torch.median(samples)
abs_devs = torch.abs(samples - median)
mad = torch.median(abs_devs)

print("Median Absolute Deviation (MAD):", mad.item())

This snippet estimates the MAD from 1,000 samples drawn from a half-Cauchy distribution. Note that torch.median returns the lower of the two middle values when the number of elements is even, so the result can differ slightly from estimators that average them.

  • Use robust measures of spread such as MAD or IQR to describe the half-Cauchy distribution.
  • HalfCauchy.variance is not a useful measure of spread, because the variance of the distribution is infinite.
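
As a sanity check on a sample estimate, the population MAD of a half-Cauchy can be worked out from its CDF: it equals (√3 − 1) · scale, roughly 0.732 · scale. This closed form is derived here rather than taken from the PyTorch documentation, and the sample size and seed below are arbitrary.

```python
import math

import torch
from torch.distributions import HalfCauchy

torch.manual_seed(0)  # arbitrary seed, only for repeatability

scale = 1.0
dist = HalfCauchy(scale)
samples = dist.sample((200_000,))

# Sample MAD: median of absolute deviations from the sample median.
median = torch.median(samples)
sample_mad = torch.median(torch.abs(samples - median))

# Closed-form population MAD (derived for this sketch, see lead-in).
exact_mad = (math.sqrt(3.0) - 1.0) * scale

print("Sample MAD:", sample_mad.item())
print("Exact MAD: ", exact_mad)
```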


Interquartile Range (IQR)

import torch
from torch.distributions import HalfCauchy

# Create a HalfCauchy distribution
scale = 1.0  # Adjust scale as needed
dist = HalfCauchy(scale)

# Generate samples
samples = dist.sample((1000,))

# Calculate quartiles (Q1 and Q3); torch.quantile expects q as a float or a tensor, not a list
q1, q3 = torch.quantile(samples, torch.tensor([0.25, 0.75]))

# Calculate Interquartile Range (IQR)
iqr = q3 - q1

print("Interquartile Range (IQR):", iqr.item())

This code calculates the first (Q1) and third (Q3) quartiles using torch.quantile and then computes the IQR by subtracting Q1 from Q3.

Custom Function for Median Absolute Deviation (MAD)

import torch
from torch.distributions import HalfCauchy

def mad(samples):
  """
  Calculates the Median Absolute Deviation (MAD) of a tensor.

  Args:
      samples: A torch tensor containing the data samples.

  Returns:
      A torch tensor representing the MAD value.
  """
  median = torch.median(samples)
  abs_devs = torch.abs(samples - median)
  return torch.median(abs_devs)

# Create a HalfCauchy distribution and generate samples (same as previous examples)
scale = 1.0
dist = HalfCauchy(scale)
samples = dist.sample((1000,))

# Calculate MAD using the custom function
mad_value = mad(samples)

print("Median Absolute Deviation (MAD):", mad_value.item())

This code defines a reusable mad function that calculates the MAD for any tensor. It then uses this function to compute the MAD of the generated samples.
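
When called without a dim argument, torch.median reduces over all elements, so a MAD function built on it returns a single value even for a batch of samples. A variant that works along one dimension might look like the sketch below; the name mad_along_dim and its dim parameter are hypothetical, introduced here only for illustration.

```python
import torch
from torch.distributions import HalfCauchy


def mad_along_dim(samples, dim=0):
    """MAD computed independently along `dim` (hypothetical helper)."""
    # With a dim argument, torch.median returns a (values, indices) pair.
    median = torch.median(samples, dim=dim, keepdim=True).values
    abs_devs = torch.abs(samples - median)
    return torch.median(abs_devs, dim=dim).values


# Three half-Cauchy distributions with different scales, sampled as one batch.
scales = torch.tensor([0.5, 1.0, 2.0])
samples = HalfCauchy(scales).sample((5000,))  # shape: (5000, 3)

mad_per_scale = mad_along_dim(samples, dim=0)  # shape: (3,)
print(mad_per_scale)
```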



Median Absolute Deviation (MAD)

  • This metric focuses on the median, making it less susceptible to outliers compared to standard deviation. It calculates the median of the absolute deviations from the median.

Interquartile Range (IQR)

  • This measure captures the middle 50% of the data. It represents the range between the first quartile (Q1) and the third quartile (Q3) of the distribution.

Percentiles

  • You can calculate specific percentiles (e.g., 10th percentile, 90th percentile) to gauge the spread of values across different portions of the distribution.

The following snippet computes MAD and IQR together from the same set of samples:

import torch
from torch.distributions import HalfCauchy

# Create a HalfCauchy distribution
scale = 1.0  # Adjust scale as needed
dist = HalfCauchy(scale)

# Generate samples
samples = dist.sample((1000,))

# Calculate Median Absolute Deviation (MAD)
median = torch.median(samples)
abs_devs = torch.abs(samples - median)
mad = torch.median(abs_devs)

# Calculate Interquartile Range (IQR)
q1, q3 = torch.quantile(samples, torch.tensor([0.25, 0.75]))  # q must be a float or a tensor, not a list
iqr = q3 - q1

print("Median Absolute Deviation (MAD):", mad.item())
print("Interquartile Range (IQR):", iqr.item())
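
The percentiles mentioned above can be estimated with torch.quantile as well. This sketch uses the 10th, 50th, and 90th percentiles, which are arbitrary choices; note that the probabilities are passed as a tensor.

```python
import torch
from torch.distributions import HalfCauchy

torch.manual_seed(0)  # arbitrary seed, only for repeatability

dist = HalfCauchy(1.0)
samples = dist.sample((1000,))

# q must be a float or a tensor of probabilities in [0, 1].
probs = torch.tensor([0.10, 0.50, 0.90])
p10, p50, p90 = torch.quantile(samples, probs)

print("10th percentile:", p10.item())
print("50th percentile:", p50.item())
print("90th percentile:", p90.item())
print("10-90 range:", (p90 - p10).item())
```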

Choosing the Right Measure

  • Percentiles provide insight into specific regions of the distribution's tail behavior.
  • IQR focuses on the middle 50% of the data, filtering out extreme outliers.
  • MAD is a good choice for overall spread, considering both positive and negative deviations from the median.
  • The best choice depends on your specific needs and what aspect of the spread you want to emphasize.
  • If you absolutely need a finite variance, you might explore other heavy-tailed distributions in PyTorch's torch.distributions module, such as the Student's t-distribution (StudentT), whose variance is finite only when its degrees of freedom exceed 2.
  • These measures summarize the bulk of the distribution and say little about the extreme tail, so no single number captures the entire picture. Unlike the variance, however, they are finite and stable from sample to sample.
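
On the StudentT suggestion above, a quick check shows how the variance depends on the degrees of freedom. The df values are arbitrary, chosen so that one falls at or below 2 and two fall above it; this is a minimal sketch with the default loc and scale.

```python
import torch
from torch.distributions import StudentT

# Degrees of freedom chosen for illustration.
df = torch.tensor([1.5, 3.0, 10.0])
dist = StudentT(df)  # loc=0 and scale=1 by default

# Infinite for df = 1.5; df / (df - 2) for df > 2 at unit scale.
variance = dist.variance
print(variance)
```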