A Tiny Engine, a Big Lesson: Building Neural Nets with karpathy/micrograd


karpathy/micrograd

2025-10-20

micrograd is a tiny, self-contained scalar-valued automatic differentiation (autograd) engine written purely in Python, with a small neural network library built on top of it.

Think of it as the core mechanism that powers large frameworks like PyTorch or TensorFlow, stripped down to its essential logic. It was created by Andrej Karpathy, a prominent figure in AI, specifically for educational purposes.

For a software engineer, micrograd isn't a tool you'd use for large-scale production models, but it offers immense educational value:

| Benefit | Explanation |
|---|---|
| Deepen Understanding of Autograd | You get to see exactly how backpropagation works. Instead of just calling .backward(), you see the Value objects, the computation graph, and the chain rule being applied step by step. This knowledge is invaluable when debugging complex models or optimizing custom loss functions in production frameworks. |
| Clean, Minimal Codebase | The whole library lives in two small Python files: the autograd engine (engine.py) is about 100 lines, and the neural network library built on top of it (nn.py) is about 50. It's the right size to read, understand, and even modify in a single afternoon. |
| "Build Your Own" Intuition | It bridges the gap between the theoretical calculus behind backpropagation (the chain rule) and practical deep learning. If you've ever wondered how a framework computes gradients, micrograd answers that question in code you can read end to end. That understanding makes you a more effective deep learning engineer. |
| Foundation for Custom Tools | The principles used in micrograd transfer to other areas where automatic differentiation is useful, such as complex numerical simulations or optimization problems outside traditional neural networks. These principles are operator overloading and storing each operation's local backward function so that the chain rule can be applied later. |

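You only need Python 3. The core library (engine.py and nn.py) uses nothing outside Python's standard library, so there are no heavy dependencies. The demo notebooks in the repository do use a few extra packages.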

For learning, the simplest way to "install" micrograd is to clone the repository and work directly with the source files. The README also lists pip install micrograd if you only want to import it:

```bash
git clone https://github.com/karpathy/micrograd.git
cd micrograd
```

To use it in your own project, make sure Python can find the micrograd package, which is the inner micrograd folder that holds engine.py and nn.py. The simplest way is to run your code from the repository root. Then import the classes you need.
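If your script lives outside the clone, one option is to put the repository root on sys.path before importing. The helper below is a hypothetical sketch, not part of micrograd. It assumes the clone sits at the path you pass in and that the package folder inside it is named micrograd, as in the repository.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str) -> bool:
    """Prepend a cloned repo root to sys.path (hypothetical helper).

    Returns True if a top-level `micrograd` package can now be found,
    either from this directory or from an existing installation.
    """
    root = str(Path(repo_dir).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    # Clear cached directory listings so the new path entry is picked up.
    importlib.invalidate_caches()
    return importlib.util.find_spec("micrograd") is not None


if __name__ == "__main__":
    # Assumes `git clone ... micrograd` was run in the current directory.
    if add_repo_to_path("micrograd"):
        from micrograd.engine import Value
        print(Value(1.0))
    else:
        print("micrograd not found; check the clone location")
```

Call the helper before any from micrograd.engine import ... line. If you run from the repository root or installed the package, you don't need it.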

The heart of micrograd is the Value class defined in engine.py. This class wraps a scalar number (data) and its gradient (grad), and it keeps track of two crucial things:

_prev
A set of the input Value objects that were used to compute the current value. This builds the computation graph.

_backward
A function that performs the local backward pass for the operation that created this value.
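Here is a stripped-down sketch of a Value-like class that makes _prev and _backward concrete. It follows the same design idea as engine.py: operator overloading records the inputs and stores a closure for the local backward step. It is written from scratch for illustration, supports only + and * between two instances, and uses the hypothetical name MiniValue.

```python
class MiniValue:
    """A scalar that remembers how it was computed (sketch, not micrograd's code)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)      # the inputs that produced this value
        self._backward = lambda: None    # local chain-rule step, set by the op

    def __add__(self, other):            # assumes other is also a MiniValue
        out = MiniValue(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):            # assumes other is also a MiniValue
        out = MiniValue(self.data * other.data, (self, other))

        def _backward():
            # d(out)/d(self) = other.data and d(out)/d(other) = self.data
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out


a, b = MiniValue(2.0), MiniValue(-3.0)
d = a * b
print(d.data)             # the forward result
print(d._prev == {a, b})  # d remembers its inputs
d.grad = 1.0              # pretend d is the final output
d._backward()             # run only d's local step
print(a.grad, b.grad)
```

The closures add to grad with += rather than assigning, as micrograd's engine does. That way, a value used in several places collects one gradient contribution from each use.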

Here's a simple example that defines a calculation and computes its gradients:

```python
# Requires the micrograd package to be importable (e.g. run from the repo root)
from micrograd.engine import Value

# --- 1. Define the variables (Value objects) ---
a = Value(2.0)
b = Value(-3.0)
c = Value(10.0)

# --- 2. Perform a calculation ---
# This is the forward pass: each operation records its inputs, building the graph
d = a * b        # data: -6.0
e = d + c        # data:  4.0
f = Value(-2.0)
L = e * f        # data: -8.0

print(f"L's value: {L.data}")

# --- 3. Compute the gradients (backpropagation) ---
# This is the "magic" step!
L.backward()

# --- 4. Inspect the gradients ---
# L.grad is dL/dL, which backward() sets to 1
print(f"Gradient of L w.r.t a (dL/da): {a.grad}")
print(f"Gradient of L w.r.t b (dL/db): {b.grad}")
print(f"Gradient of L w.r.t c (dL/dc): {c.grad}")
```

Running this prints:

```text
L's value: -8.0
Gradient of L w.r.t a (dL/da): 6.0
Gradient of L w.r.t b (dL/db): -4.0
Gradient of L w.r.t c (dL/dc): -2.0
```
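You can sanity-check these gradients without micrograd by nudging each input and watching how L changes. The sketch below applies a central finite difference to the same expression, L = (a·b + c)·f, using plain Python floats. The helper name numerical_grad and the step size h = 1e-6 are assumptions chosen for illustration.

```python
def loss(a, b, c, f):
    # Same expression as the example: L = (a*b + c) * f
    return (a * b + c) * f


def numerical_grad(fn, args, i, h=1e-6):
    """Central-difference estimate of d fn / d args[i]."""
    plus = list(args)
    minus = list(args)
    plus[i] += h
    minus[i] -= h
    return (fn(*plus) - fn(*minus)) / (2 * h)


args = (2.0, -3.0, 10.0, -2.0)  # a, b, c, f
for name, i in [("a", 0), ("b", 1), ("c", 2)]:
    print(f"dL/d{name} ~ {numerical_grad(loss, args, i):.4f}")
```

It should print values very close to 6, −4, and −2. L is linear in each input taken separately, so the central difference is exact here up to floating-point rounding. For curved functions it is only an approximation, and each estimate costs extra function evaluations. Those are two reasons frameworks use reverse-mode autograd instead.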

Calling L.backward() works like this. It starts at L and sets its gradient dL/dL = 1.

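It then sorts the nodes of the computation graph topologically and visits them in reverse order, from L back to a, b, and c. This ordering guarantees that each node's gradient is complete before it is passed on to that node's inputs.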
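micrograd's backward() obtains that order with a depth-first topological sort. The sketch below applies the same idea to the example's graph, written out by hand: each node lists its inputs together with the local derivative d(node)/d(input). The names vals, edges, topo_order, and backprop are hypothetical, and this is an illustration rather than micrograd's own code.

```python
from collections import defaultdict

# Forward values from the example: d = a*b, e = d + c, L = e*f
vals = {"a": 2.0, "b": -3.0, "c": 10.0, "f": -2.0}
vals["d"] = vals["a"] * vals["b"]
vals["e"] = vals["d"] + vals["c"]
vals["L"] = vals["e"] * vals["f"]

# For each node: its inputs paired with the local derivative d(node)/d(input).
edges = {
    "d": [("a", vals["b"]), ("b", vals["a"])],
    "e": [("d", 1.0), ("c", 1.0)],
    "L": [("e", vals["f"]), ("f", vals["e"])],
}


def topo_order(root):
    """Depth-first topological sort: inputs appear before the nodes they feed."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for child, _ in edges.get(node, []):
            visit(child)
        order.append(node)

    visit(root)
    return order


def backprop(root):
    grads = defaultdict(float)
    grads[root] = 1.0  # dL/dL
    # Reverse topological order: a node's gradient is final before it is pushed down.
    for node in reversed(topo_order(root)):
        for child, local in edges.get(node, []):
            grads[child] += local * grads[node]  # chain rule, accumulated
    return dict(grads)


g = backprop("L")
print(g["a"], g["b"], g["c"], g["f"])
```

Running it prints 6.0 -4.0 -2.0 4.0. These are the same gradients micrograd reports for a, b, and c, plus dL/df = e = 4.0.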

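At each node (e.g., e), backward() calls that node's _backward function. That function multiplies the incoming gradient (here, from L) by the local derivative of the node's operation and adds the result to the gradients of the node's inputs (d and c). This is the chain rule.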

For example, since L = e⋅f, the local derivative dL/de equals the value of f, which is −2.0. The gradient for e is therefore grad(e) = grad(L)⋅dL/de = 1.0⋅(−2.0) = −2.0. This process continues all the way back to a, b, and c.
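Carrying that step through every node gives the full set of gradients, using the forward values d = a⋅b = −6 and e = d + c = 4:

$$
\begin{aligned}
\frac{\partial L}{\partial e} &= f = -2, & \frac{\partial L}{\partial f} &= e = 4,\\
\frac{\partial L}{\partial d} &= \frac{\partial L}{\partial e}\,\frac{\partial e}{\partial d} = (-2)(1) = -2, & \frac{\partial L}{\partial c} &= \frac{\partial L}{\partial e}\,\frac{\partial e}{\partial c} = (-2)(1) = -2,\\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial d}\,\frac{\partial d}{\partial a} = (-2)\,b = (-2)(-3) = 6, & \frac{\partial L}{\partial b} &= \frac{\partial L}{\partial d}\,\frac{\partial d}{\partial b} = (-2)\,a = (-2)(2) = -4.
\end{aligned}
$$

These match the values micrograd printed for a, b, and c.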

