Demystifying `base` in NumPy C-API: When Arrays Share Secrets


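PyObject: This is the base struct type for all objects in Python. C code handles objects through PyObject * pointers, and such a pointer can refer to any Python object.

base: This member of the PyArrayObject structure is a PyObject * that either is NULL or points to another Python object. In current NumPy it should be read through the PyArray_BASE(arr) accessor rather than by touching the struct field directly. But what kind of object does it point to?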

There are two main scenarios where base is used:

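  • Wrapper for other data: NumPy can wrap memory that it did not allocate, such as a memory-mapped file, a buffer exported by another Python object, or memory handed over by a C or Fortran library. In these cases, base points to the object that owns that memory (if one was attached), which keeps the memory alive for as long as the array exists.

  • Views of other arrays: When you create a view of another NumPy array, the base member of the view points to the array that owns the data. For a view of a view, NumPy normally records the ultimate owner rather than the intermediate view. The view's data pointer refers to the same buffer as the original, so writes through the view change the original data, and the reference held in base keeps the original array alive while the view exists.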
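Both scenarios can be observed from Python, where the C-level base member is exposed as the read-only ndarray.base attribute. The following is a minimal sketch rather than a definitive recipe: the variable names are made up for illustration, and for the buffer-backed array it only asserts that some base object exists, since the exact object NumPy stores there can depend on the buffer type and NumPy version.

```python
import numpy as np

# An array allocated by NumPy owns its buffer, so it has no base
original = np.arange(10, dtype=np.float64)
assert original.base is None

# Scenario "view": slicing creates a view whose base is the original array
view = original[2:6]
assert view.base is original

# A view of a view still reports the array that owns the data
nested = view[1:]
assert nested.base is original

# Scenario "wrapper": the memory belongs to a bytes object, not to NumPy
raw = bytes(16)
wrapped = np.frombuffer(raw, dtype=np.uint8)
assert wrapped.base is not None      # some object keeps the memory alive
assert not wrapped.flags.owndata     # NumPy will not free this buffer itself
```

In C, the equivalent checks would go through PyArray_BASE on each array object.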

Understanding base is important for:

  • Handling wrapped data: If you suspect you're dealing with a wrapped array, inspecting the object that base points to can tell you what actually owns the memory, so you can handle it appropriately.

  • Knowing when data is shared: If you're working with views, you need to be aware that modifying the view also modifies the original array, since both use the same data buffer, which is owned by the array that base points to.
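The sharing described above can be sketched from Python, again using ndarray.base as the visible counterpart of the C member. This is an illustrative example with made-up variable names; np.shares_memory is used here as a convenience check and is not part of the C-level base mechanism itself.

```python
import numpy as np

original = np.zeros(5)
view = original[1:4]              # shares the buffer; view.base is original

view[:] = 7.0                     # write through the view
print(original)                   # the middle three elements have changed

# Inspecting base tells you what kind of object owns the memory
print(type(view.base).__name__)

# A copy gets its own buffer, so writes to it do not reach the original
copy = view.copy()
copy[:] = -1.0

assert copy.base is None
assert np.shares_memory(original, view)
assert not np.shares_memory(original, copy)
```

If code must not disturb the caller's data, checking for a non-NULL base (or a cleared OWNDATA flag) before writing in place is one way to decide whether to copy first.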



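#include <Python.h>
#include <numpy/arrayobject.h>
#include <stdio.h>

int main(void) {
  // Start the interpreter and load the NumPy C-API before any PyArray_* call
  Py_Initialize();
  import_array1(1);

  // Create a 1-D array of doubles: 0.0, 1.0, ..., 9.0
  PyArrayObject *array1 =
      (PyArrayObject *)PyArray_Arange(0.0, 10.0, 1.0, NPY_DOUBLE);
  if (array1 == NULL) {
    PyErr_Print();
    return 1;
  }

  // A NULL base means the array does not borrow its data from another object
  if (PyArray_BASE(array1) == NULL) {
    printf("array1 owns its own data\n");
  } else {
    printf("array1 might be a view or wrapped data\n");
  }

  // Create a view of the first array (NULL, NULL keeps the dtype and array type)
  PyArrayObject *view = (PyArrayObject *)PyArray_View(array1, NULL, NULL);
  if (view == NULL) {
    PyErr_Print();
    Py_DECREF(array1);
    return 1;
  }

  // The base of the view should be the original array
  if (PyArray_BASE(view) == (PyObject *)array1) {
    printf("view shares data with array1\n");
  }

  // The view holds its own reference to array1 through base,
  // so both of our references can be released here
  Py_DECREF(view);
  Py_DECREF(array1);
  Py_Finalize();
  return 0;
}

  1. We include the header files for Python and the NumPy C-API, start the interpreter with Py_Initialize, and load the NumPy C-API with import_array1. Without that import step, any PyArray_* call crashes.
  2. We create a simple 1D NumPy array (array1) of doubles using PyArray_Arange(start, stop, step, typenum).
  3. We check whether PyArray_BASE(array1) is NULL. If it is, array1 is not a view of another object; here it owns its own data. PyArray_BASE is used instead of array1->base because direct access to the struct fields is deprecated.
  4. We create a view (view) of array1 with PyArray_View. Its second argument is a data type descriptor and its third is an array type; passing NULL for both keeps the original dtype and type.
  5. We check the base of the view with PyArray_BASE(view). It should point back to the original array (array1), since the two share the underlying data buffer. The cast to PyObject * is needed because PyArray_BASE returns a PyObject *.
  6. Finally, we decrement the reference counts of the Python objects using Py_DECREF to avoid memory leaks, and shut down the interpreter.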


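Inspecting base is not the only way to work with data ownership. Some alternatives and complements:

  1. Using flags: NumPy arrays carry flags that describe data ownership and memory layout. You can read them with PyArray_FLAGS(array), or test a single flag with PyArray_CHKFLAGS(array, NPY_ARRAY_OWNDATA). NPY_ARRAY_OWNDATA tells you whether the array owns its data buffer. This usually agrees with base being NULL, but the two are not equivalent: an array created from an external pointer has OWNDATA cleared even when no base has been set.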

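  2. Managing Memory Yourself: If you create NumPy arrays from your own pre-allocated memory, for example with PyArray_SimpleNewFromData, the resulting array does not own the data and its base is NULL, so NumPy will not free the memory. You then have to keep the memory valid for the whole lifetime of the array and release it yourself. Alternatively, you can attach an owner object (such as a PyCapsule with a destructor) using PyArray_SetBaseObject, which steals a reference to that object, so that the memory is released when the array is destroyed.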

  3. Higher-level NumPy Functions: Much of what you would do by working with base in C can be achieved with higher-level NumPy functions in Python. For instance, views can be created with arr.view(...) or by slicing, and the base object can be read through the arr.base attribute, instead of handling base in C.
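As a rough illustration of the first and third options, the sketch below reads the ownership flag and the base from Python. The array names are hypothetical; arr.flags.owndata is the Python-side view of NPY_ARRAY_OWNDATA, and arr.base corresponds to PyArray_BASE.

```python
import numpy as np

owner = np.arange(6, dtype=np.float64)
reshaped = owner.reshape(2, 3)       # reshaping a contiguous array returns a view
as_ints = owner.view(np.int64)       # same bytes reinterpreted as 64-bit integers

# Report the ownership flag next to the base for each array
for name, arr in [("owner", owner), ("reshaped", reshaped), ("as_ints", as_ints)]:
    print(name, "owndata =", arr.flags.owndata, "has base =", arr.base is not None)

assert owner.flags.owndata and owner.base is None
assert not reshaped.flags.owndata and reshaped.base is owner
assert not as_ints.flags.owndata and as_ints.base is owner
```

The flag answers only a yes/no question, while base also tells you which object to look at next.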

Each option comes with trade-offs:

  • Higher-level Functions: More Pythonic and easier to use, but may not be suitable for all low-level operations.
  • Managing Memory Yourself: Offers full control over memory, but requires careful management to avoid leaks and dangling pointers.
  • Flags: A quick yes/no check that is easy to use, but they provide less detailed information about the underlying data source than base does.