Understanding Python Descriptors

python
deep learning
LLM
While I was vibe coding with Claude, it introduced me to unfamiliar behavior: using the __get__(obj) method to convert a function to a bound method of the given object. This was necessary to monkey patch the self-attention module’s forward pass to log input data types, since register_forward_hook by default only passes positional arguments to the hook (and LLaMA’s self-attention module receives none, only keyword arguments). This led me to do a deep dive into descriptors with the helpful Descriptor Guide in the Python docs, which I walk through in this blog post.
Author

Vishal Bakshi

Published

April 1, 2025

Background

When monkey-patching the Llama self-attention forward pass (to log its inputs’ data type) I was vibe coding with Claude and it generated the following line to pass the necessary arguments to the original forward pass of the module:

orig_forward.__get__(self_attn, type(self_attn))(**kwargs)

In a prior iteration, I was using the following line suggested by Claude, with the intention of passing self_attn as self:

orig_forward(self_attn, *args, **kwargs)

This was essentially doing the following:

orig_forward(self_attn, hidden_states=hidden_states, attention_mask=attention_mask, ...)

Which caused the following error:

TypeError: LlamaFlashAttention2.forward() got multiple values for argument 'hidden_states'

self_attn was being passed as the argument to the hidden_states parameter, and then hidden_states=hidden_states was again assigning an argument to the hidden_states parameter. So how do we pass self_attn as self? This is where the __get__ method comes in, which is part of Python’s descriptor protocol. Descriptors are:

Any object which defines the methods __get__(), __set__(), or __delete__(). When a class attribute is a descriptor, its special binding behavior is triggered upon attribute lookup. Normally, using a.b to get, set or delete an attribute looks up the object named b in the class dictionary for a, but if b is a descriptor, the respective descriptor method gets called. Understanding descriptors is a key to a deep understanding of Python because they are the basis for many features including functions, methods, properties, class methods, static methods, and reference to super classes.

After reading that a few times I still didn’t understand it! Though I think the key is:

When a class attribute is a descriptor, its special binding behavior is triggered upon attribute lookup.

Claude explained it this way:

__get__ is a special method that converts a function into a bound method. It’s like saying “make this function a method of this object.”

Translating that to my use case: __get__ makes orig_forward a method of self_attn, no longer requiring us to pass self_attn as it now is self.

That certainly makes sense (i.e. I understand those words) but I don’t really understand why or how. That led me to the Python documentation’s Descriptor Guide which I’ll walk through here.

(There was also this interesting discussion about changing the name to __bind__ when calling it on a function as it binds the function as a method of the given object, which we’ll see later on).
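To make the Background problem concrete, here is a minimal sketch. ToyAttention is a hypothetical stand-in for the LLaMA self-attention module (it is not the real class), and its forward signature is an assumption chosen only to mirror the keyword-only call. The sketch reproduces the "multiple values" error and then shows that calling __get__ on the plain function gives back a method with self already filled in.

```python
class ToyAttention:
    """Hypothetical stand-in for a module whose forward is called with keywords only."""

    def forward(self, hidden_states=None, attention_mask=None):
        return f"forward ran on {type(self).__name__} with hidden_states={hidden_states!r}"


attn = ToyAttention()
kwargs = {"hidden_states": "h", "attention_mask": None}

# attn.forward is already bound, so `self` is taken. Passing attn positionally
# puts it in the next parameter, hidden_states, which kwargs then fills again.
try:
    attn.forward(attn, **kwargs)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# The plain function stored on the class has no `self` yet; __get__ binds one.
plain_forward = ToyAttention.__dict__["forward"]
bound_forward = plain_forward.__get__(attn, type(attn))
result = bound_forward(**kwargs)
print(result)
```

Once the function is bound, only the keyword arguments need to be passed, which is the shape of the call Claude generated.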

Primer

Simple example: A descriptor that returns a constant

class Ten:
    def __get__(self, obj, objtype=None):
        return 10
t = Ten()
t
<__main__.Ten at 0x78b2fd072c50>
type(t)
__main__.Ten
t.__get__(4)
10

As far as I can tell, the only reason Ten is a descriptor is that it defines __get__(), one of the three methods named in the definition (“defines the methods __get__(), __set__(), or __delete__()”).

To use the descriptor, it must be stored as a class variable in another class:

class A:
    x = 5                       # Regular class attribute
    y = Ten()                   # Descriptor instance
a = A()                     # Make an instance of class A
a
<__main__.A at 0x78b2fd0707d0>
a.x                         # Normal attribute lookup
5
a.y                         # Descriptor lookup
10

Note that the value 10 is not stored in either the class dictionary or the instance dictionary. Instead, the value 10 is computed on demand.

A.__dict__
mappingproxy({'__module__': '__main__',
              'x': 5,
              'y': <__main__.Ten at 0x78b2fd0722d0>,
              '__dict__': <attribute '__dict__' of 'A' objects>,
              '__weakref__': <attribute '__weakref__' of 'A' objects>,
              '__doc__': None})

Modifying Ten a bit to visualize this:

class Ten2:
    def __get__(self, obj, objtype=None):
        print(f"__get__ called with obj={obj}, objtype={objtype}")
        return 10

class A2:
    x = 5
    y = Ten2()  # Descriptor instance
a2 = A2()
a2.y
__get__ called with obj=<__main__.A2 object at 0x78b2fd089710>, objtype=<class '__main__.A2'>
10

Cool!

Dynamic Lookups

import os

class DirectorySize:

    def __get__(self, obj, objtype=None):
        return len(os.listdir(obj.dirname))

class Directory:

    size = DirectorySize()              # Descriptor instance

    def __init__(self, dirname):
        self.dirname = dirname          # Regular instance attribute
s = Directory('songs')
g = Directory('games')
s.size
4
g.size
2

Removing a file then calling the descriptor’s __get__ dynamically calculates the new value:

os.remove('games/game1.txt')            # Delete a game
g.size
1

Managed attributes

The descriptor is assigned to a public attribute in the class dictionary while the actual data is stored as a private attribute in the instance dictionary.

Note that I wasn’t able to see the logging output in this notebook so I’m using print statements instead.

class LoggedAgeAccess:

    def __get__(self, obj, objtype=None):
        value = obj._age
        print(f'Accessing age giving {value}')
        return value

    def __set__(self, obj, value):
        print(f'Updating age to {value}')
        obj._age = value

class Person:

    age = LoggedAgeAccess()             # Descriptor instance

    def __init__(self, name, age):
        self.name = name                # Regular instance attribute
        self.age = age                  # Calls __set__()

    def birthday(self):
        self.age += 1                   # Calls both __get__() and __set__()
mary = Person('Mary M', 30)         # The initial age update is logged
dave = Person('David D', 40)
Updating age to 30
Updating age to 40
vars(mary), vars(dave)
({'name': 'Mary M', '_age': 30}, {'name': 'David D', '_age': 40})
mary.age
Accessing age giving 30
30
mary.birthday()
Accessing age giving 30
Updating age to 31
mary.age
Accessing age giving 31
31
dave.name
'David D'
dave.age
Accessing age giving 40
40

Customized names

When a class uses descriptors, it can inform each descriptor about which variable name was used.

class LoggedAccess:

    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        value = getattr(obj, self.private_name)
        print(f'Accessing {self.public_name} giving {value}')
        return value

    def __set__(self, obj, value):
        print(f'Updating {self.public_name} to {value}')
        setattr(obj, self.private_name, value)

class Person:

    name = LoggedAccess()                # First descriptor instance
    age = LoggedAccess()                 # Second descriptor instance

    def __init__(self, name, age):
        self.name = name                 # Calls the first descriptor
        self.age = age                   # Calls the second descriptor

    def birthday(self):
        self.age += 1
vars(Person)['name']
<__main__.LoggedAccess at 0x78b2edeb8950>
vars(vars(Person)['name'])
{'public_name': 'name', 'private_name': '_name'}
vars(vars(Person)['age'])
{'public_name': 'age', 'private_name': '_age'}
pete = Person('Peter P', 10)
Updating name to Peter P
Updating age to 10
kate = Person('Catherine C', 20)
Updating name to Catherine C
Updating age to 20
vars(pete)
{'_name': 'Peter P', '_age': 10}
vars(kate)
{'_name': 'Catherine C', '_age': 20}

I think the main takeaway here is that we didn’t specify the name of the field so we could use the same descriptor for both name and age.

Closing thoughts

Looking at how __set_name__ behaves (the example in the docs):

class C:
    def __set_name__(self, owner, name):
        print(f"__set_name__ called with owner={owner.__name__}, name='{name}'")
        self.name = name

class A:
    x = C()  # This will trigger __set_name__
    y = C()  # This will trigger it again with a different name
    bananas = C()
__set_name__ called with owner=A, name='x'
__set_name__ called with owner=A, name='y'
__set_name__ called with owner=A, name='bananas'
a = A()
a.x, a.y, a.x.name, a.y.name, a.bananas.name
(<__main__.C at 0x78b331674190>,
 <__main__.C at 0x78b2df52ccd0>,
 'x',
 'y',
 'bananas')

The part of particular interest to me is:

Descriptors are used throughout the language. It is how functions turn into bound methods.

Complete practical example

Validator class

A validator is a descriptor for managed attribute access. Prior to storing any data, it verifies that the new value meets various type and range restrictions. If those restrictions aren’t met, it raises an exception to prevent data corruption at its source.

from abc import ABC, abstractmethod

class Validator(ABC):

    def __set_name__(self, owner, name):
        print("__set_name__ is called")
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        print("__get__ is called")
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        print("__set__ is called")
        self.validate(value)
        setattr(obj, self.private_name, value)

    @abstractmethod
    def validate(self, value):
        print("validate is called")
        pass

Custom validators

Here are three practical data validation utilities:

  1. OneOf verifies that a value is one of a restricted set of options.

  2. Number verifies that a value is either an int or float. Optionally, it verifies that a value is between a given minimum or maximum.

  3. String verifies that a value is a str. Optionally, it validates a given minimum or maximum length. It can validate a user-defined predicate as well.

class OneOf(Validator):

    def __init__(self, *options):
        self.options = set(options)

    def validate(self, value):
        if value not in self.options:
            raise ValueError(
                f'Expected {value!r} to be one of {self.options!r}'
            )

class Number(Validator):

    def __init__(self, minvalue=None, maxvalue=None):
        self.minvalue = minvalue
        self.maxvalue = maxvalue

    def validate(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f'Expected {value!r} to be an int or float')
        if self.minvalue is not None and value < self.minvalue:
            raise ValueError(
                f'Expected {value!r} to be at least {self.minvalue!r}'
            )
        if self.maxvalue is not None and value > self.maxvalue:
            raise ValueError(
                f'Expected {value!r} to be no more than {self.maxvalue!r}'
            )

class String(Validator):

    def __init__(self, minsize=None, maxsize=None, predicate=None):
        self.minsize = minsize
        self.maxsize = maxsize
        self.predicate = predicate

    def validate(self, value):
        if not isinstance(value, str):
            raise TypeError(f'Expected {value!r} to be an str')
        if self.minsize is not None and len(value) < self.minsize:
            raise ValueError(
                f'Expected {value!r} to be no smaller than {self.minsize!r}'
            )
        if self.maxsize is not None and len(value) > self.maxsize:
            raise ValueError(
                f'Expected {value!r} to be no bigger than {self.maxsize!r}'
            )
        if self.predicate is not None and not self.predicate(value):
            raise ValueError(
                f'Expected {self.predicate} to be true for {value!r}'
            )

Practical application

class Component:

    name = String(minsize=3, maxsize=10, predicate=str.isupper)
    kind = OneOf('wood', 'metal', 'plastic')
    quantity = Number(minvalue=0)

    def __init__(self, name, kind, quantity):
        self.name = name
        self.kind = kind
        self.quantity = quantity
__set_name__ is called
__set_name__ is called
__set_name__ is called

The descriptors prevent invalid instances from being created:

Component('Widget', 'metal', 5)      # Blocked: 'Widget' is not all uppercase
__set__ is called
ValueError: Expected <method 'isupper' of 'str' objects> to be true for 'Widget'
Component('WIDGET', 'metle', 5)      # Blocked: 'metle' is misspelled
__set__ is called
__set__ is called
ValueError: Expected 'metle' to be one of {'metal', 'plastic', 'wood'}
Component('WIDGET', 'metal', -5)     # Blocked: -5 is negative
__set__ is called
__set__ is called
__set__ is called
ValueError: Expected -5 to be at least 0
Component('WIDGET', 'metal', 'V')    # Blocked: 'V' isn't a number
__set__ is called
__set__ is called
__set__ is called
TypeError: Expected 'V' to be an int or float
c = Component('WIDGET', 'metal', 5)  # Allowed:  The inputs are valid
__set__ is called
__set__ is called
__set__ is called
c.name
__get__ is called
'WIDGET'

Technical tutorial

After reading the introduction of this guide I assumed I would skip the technical tutorial, expecting it to be too technical. After skimming it, though, I’ve decided to go through it, as it might clear some things up for me and the following line was attractive:

Learning about descriptors not only provides access to a larger toolset, it creates a deeper understanding of how Python works.

Definition and introduction

Reiterating the important definition that a descriptor is anything that has one of the methods in the descriptor protocol:

In general, a descriptor is an attribute value that has one of the methods in the descriptor protocol. Those methods are __get__(), __set__(), and __delete__(). If any of those methods are defined for an attribute, it is said to be a descriptor.

And the default behavior that descriptors let us override:

The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary.

Descriptor protocol

I don’t have any comments for this section other than reiterating the following points:

descr.__get__(self, obj, type=None)

descr.__set__(self, obj, value)

descr.__delete__(self, obj)

That is all there is to it. Define any of these methods and an object is considered a descriptor and can override default behavior upon being looked up as an attribute.

If an object defines __set__() or __delete__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are often used for methods but other uses are possible).
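The data versus non-data distinction can be checked mechanically. The sketch below uses a hypothetical helper, classify (not part of the standard library), that applies the definitions above by looking for the protocol methods on an object's type.

```python
def classify(attr):
    """Label an object using the data / non-data descriptor definitions."""
    tp = type(attr)  # the protocol methods are looked up on the type
    if hasattr(tp, "__set__") or hasattr(tp, "__delete__"):
        return "data descriptor"
    if hasattr(tp, "__get__"):
        return "non-data descriptor"
    return "not a descriptor"


def plain_function():
    pass


print(classify(property()))         # data descriptor
print(classify(plain_function))     # non-data descriptor
print(classify(staticmethod(len)))  # non-data descriptor
print(classify(5))                  # not a descriptor
```

Functions showing up as non-data descriptors is the detail that matters for the rest of this post: it is why an instance attribute can shadow a method, and why a function has a __get__ to call in the first place.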

Overview of descriptor invocation

A descriptor can be called directly with desc.__get__(obj) or desc.__get__(None, cls).

But it is more common for a descriptor to be invoked automatically from attribute access.

We saw this earlier, but putting that example here again:

class Ten2:
    def __get__(self, obj, objtype=None):
        print(f"__get__ called with obj={obj}, objtype={objtype}")
        return 10

class A2:
    x = 5
    y = Ten2()  # Descriptor instance

a2 = A2()
a2.y
__get__ called with obj=<__main__.A2 object at 0x78b2ded96890>, objtype=<class '__main__.A2'>
10
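For comparison, here is a small sketch of the direct calls mentioned at the start of this section. The Ten and A classes are re-declared so the block runs on its own, and __get__ returns its arguments alongside the value purely so they can be inspected.

```python
class Ten:
    def __get__(self, obj, objtype=None):
        # Return the arguments too, so each call path can be compared
        return (10, obj, objtype)


class A:
    y = Ten()


a = A()
descr = vars(A)["y"]                    # fetch the raw descriptor without triggering __get__

via_attribute = a.y                     # automatic: Python calls descr.__get__(a, A)
direct_instance = descr.__get__(a)      # manual call; objtype falls back to None
direct_class = descr.__get__(None, A)   # manual call, as if accessed from the class
```

Going through vars(A) is what makes the direct calls possible, since a plain A.y would already have invoked the descriptor.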

Invocation from an instance

Instance lookup scans through a chain of namespaces giving data descriptors the highest priority, followed by instance variables, then non-data descriptors, then class variables, and lastly __getattr__() if it is provided.

I’ve added some print statements in their example code to show which option is triggered:

def find_name_in_mro(cls, name, default):
    "Emulate _PyType_Lookup() in Objects/typeobject.c"
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return default

def object_getattribute(obj, name):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    null = object()
    objtype = type(obj)
    cls_var = find_name_in_mro(objtype, name, null)
    descr_get = getattr(type(cls_var), '__get__', null)
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
            or hasattr(type(cls_var), '__delete__')):
            print("returning data descriptor set/delete")
            return descr_get(cls_var, obj, objtype)     # data descriptor
    if hasattr(obj, '__dict__') and name in vars(obj):
        print("returning instance variable")
        return vars(obj)[name]                          # instance variable
    if descr_get is not null:
        print("returning descr_get")
        return descr_get(cls_var, obj, objtype)         # non-data descriptor
    if cls_var is not null:
        print("returning class variable")
        return cls_var                                  # class variable
    raise AttributeError(name)
object_getattribute(a2, 'y')
returning descr_get
__get__ called with obj=<__main__.A2 object at 0x78b2ded96890>, objtype=<class '__main__.A2'>
10
object_getattribute(a2, 'x')
returning class variable
5
def getattr_hook(obj, name):
    "Emulate slot_tp_getattr_hook() in Objects/typeobject.c"
    try:
        print("__getattribute__")
        return obj.__getattribute__(name)
    except AttributeError:
        if not hasattr(type(obj), '__getattr__'):
            raise
    print("__getattr__")
    return type(obj).__getattr__(obj, name)
getattr_hook(a2, 'y')
__getattribute__
__get__ called with obj=<__main__.A2 object at 0x78b2ded96890>, objtype=<class '__main__.A2'>
10
getattr_hook(a2, 'x')
__getattribute__
5
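Neither call above reaches the __getattr__ branch, because A2 does not define that method. A minimal sketch of the fallback, using a hypothetical WithFallback class:

```python
class WithFallback:
    y = 5  # ordinary class variable, found by __getattribute__

    def __getattr__(self, name):
        # Only reached when the normal lookup raises AttributeError
        return f"fallback for {name!r}"


w = WithFallback()
print(w.y)        # 5
print(w.missing)  # fallback for 'missing'
```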

Invocation from a class

The logic for a dotted lookup such as A.x is in type.__getattribute__().

A2.__getattribute__??
Signature:   A2.__getattribute__(*args, **kwargs)
Type:        wrapper_descriptor
String form: <slot wrapper '__getattribute__' of 'object' objects>
Docstring:   Return getattr(self, name).
A2.__getattribute__(A2, 'y')
<__main__.Ten2 at 0x78b2dee79310>
A2.__getattribute__(A2, 'x')
5
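One caveat about the cell above: A2 does not define __getattribute__ itself, so A2.__getattribute__ resolves to object.__getattribute__ (the "slot wrapper ... of 'object' objects" in the output). Called with A2 as its first argument, it treats the class as an ordinary instance and returns the raw Ten2 object from the class dictionary without invoking __get__. The sketch below, with hypothetical Ten3 and A3 classes, contrasts that with type.__getattribute__, the method the quoted sentence refers to.

```python
class Ten3:
    def __get__(self, obj, objtype=None):
        return ("from __get__", obj, objtype)


class A3:
    y = Ten3()


raw = object.__getattribute__(A3, "y")     # returns the descriptor object itself
via_type = type.__getattribute__(A3, "y")  # invokes Ten3.__get__(descr, None, A3)
via_dot = A3.y                             # same path as type.__getattribute__
```

Here raw is the Ten3 instance, while via_type and via_dot both come from __get__ with None for the instance and A3 for the class.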

Invocation from super

A dotted lookup such as super(A, obj).m searches obj.__class__.__mro__ for the base class B immediately following A and then returns B.__dict__['m'].__get__(obj, A). If not a descriptor, m is returned unchanged.

class Base:
    z = Ten2()  # Descriptor in the base class

class A2(Base):
    x = 5
    y = Ten2()  # Descriptor instance in A2

    def show_super_lookup(self):
        # This will trigger the descriptor lookup through super()
        return super().z
a = A2()
a.y
__get__ called with obj=<__main__.A2 object at 0x78b2dededa90>, objtype=<class '__main__.A2'>
10
super(A2, a).z
__get__ called with obj=<__main__.A2 object at 0x78b2dededa90>, objtype=<class '__main__.A2'>
10
Base.__dict__['z'].__get__(a, A2)
__get__ called with obj=<__main__.A2 object at 0x78b2dededa90>, objtype=<class '__main__.A2'>
10
a.__class__.__mro__
(__main__.A2, __main__.Base, object)

Summary of invocation logic

Showing examples of some of the bullet points in the summary:

  • Descriptors are invoked by the __getattribute__() method.
a.__getattribute__('y')
__get__ called with obj=<__main__.A2 object at 0x78b2dededa90>, objtype=<class '__main__.A2'>
10
  • Overriding __getattribute__() prevents automatic descriptor calls because all the descriptor logic is in that method.
class MyDescriptor:
    def __get__(self, obj, objtype=None):
        print(f"Descriptor __get__ called!")
        return 42

class Normal:
    x = MyDescriptor()

n = Normal()
n.x
Descriptor __get__ called!
42
class OverrideGetattribute:
    x = MyDescriptor()
    y = 5

    def __getattribute__(self, name):
        print(f"Custom __getattribute__ called for {name}")
        if name == 'x':
            return "Bypassed descriptor"
        return object.__getattribute__(self, name)

o = OverrideGetattribute()
o.x
Custom __getattribute__ called for x
'Bypassed descriptor'
o.y
Custom __getattribute__ called for y
5
  • object.__getattribute__() and type.__getattribute__() make different calls to __get__(). The first includes the instance and may include the class. The second puts in None for the instance and always includes the class.
class DetailedDescriptor:
    def __get__(self, obj, objtype=None):
        print(f"__get__ called with obj={obj}, objtype={objtype}")
        return 42

class Normal:
    x = DetailedDescriptor()

n = Normal()
n.x
__get__ called with obj=<__main__.Normal object at 0x78b2dedf0750>, objtype=<class '__main__.Normal'>
42
Normal.x
__get__ called with obj=None, objtype=<class '__main__.Normal'>
42
  • Data descriptors always override instance dictionaries.
class DataDescriptor:
    def __init__(self, initial_value=None):
        self.value = initial_value

    def __get__(self, obj, objtype=None):
        print("DataDescriptor.__get__ called")
        return self.value

    def __set__(self, obj, value):
        print(f"DataDescriptor.__set__ called with value: {value}")
        self.value = value

class Example:
    x = DataDescriptor(42)  # Data descriptor defined in class

    def __init__(self):
        # Try to override with instance attribute
        self.__dict__['x'] = "Instance value"


example = Example()
example.__dict__
{'x': 'Instance value'}
example.x
DataDescriptor.__get__ called
42
example.x = 100
example.__dict__['x']
DataDescriptor.__set__ called with value: 100
'Instance value'
example.x
DataDescriptor.__get__ called
100
  • Non-data descriptors may be overridden by instance dictionaries.
class NonDataDescriptor:
    def __init__(self, initial_value=None):
        self.value = initial_value

    def __get__(self, obj, objtype=None):
        print("NonDataDescriptor.__get__ called")
        return self.value

class Example:
    x = NonDataDescriptor(42)  # Non-data descriptor defined in class

    def __init__(self):
        # Try to override with instance attribute
        self.__dict__['x'] = "Instance value"


example = Example()
example.__dict__
{'x': 'Instance value'}
example.x
'Instance value'

Automatic name notification

Sometimes it is desirable for a descriptor to know what class variable name it was assigned to. When a new class is created, the type metaclass scans the dictionary of the new class. If any of the entries are descriptors and if they define __set_name__(), that method is called with two arguments. The owner is the class where the descriptor is used, and the name is the class variable the descriptor was assigned to.

class NameTracker:
   def __set_name__(self, owner, name): self.name = name
class_dict = {
        'x': NameTracker(),
        'y': NameTracker(),
        'z': 5
    }
Demo = type('Demo', (), class_dict)
Demo.x.name
'x'
Demo.y.name
'y'
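Because the notification happens while the class is being created, an object attached to the class afterwards is not notified automatically. A short sketch of that edge case (late is a hypothetical attribute name):

```python
class NameTracker:
    def __set_name__(self, owner, name):
        self.name = name


class Demo:
    x = NameTracker()            # __set_name__ runs during class creation


late = NameTracker()
Demo.late = late                 # added afterwards: no automatic notification
notified_automatically = hasattr(late, "name")

late.__set_name__(Demo, "late")  # so it has to be called by hand
```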

I’m skipping the ORM example since I don’t have access to the example database.

Pure Python Equivalents

Finally! The section I’m most interested in.

Properties, bound methods, static methods, class methods, and __slots__ are all based on the descriptor protocol.

I’m going to focus on the functions and methods section.

Functions and methods

Functions stored in class dictionaries get turned into methods when invoked. Methods only differ from regular functions in that the object instance is prepended to the other arguments. By convention, the instance is called self but could be called this or any other variable name.

Methods can be created manually with types.MethodType which is roughly equivalent to:

class MethodType:
    "Emulate PyMethod_Type in Objects/classobject.c"

    def __init__(self, func, obj):
        self.__func__ = func
        self.__self__ = obj

    def __call__(self, *args, **kwargs):
        func = self.__func__
        obj = self.__self__
        return func(obj, *args, **kwargs)

    def __getattribute__(self, name):
        "Emulate method_getset() in Objects/classobject.c"
        if name == '__doc__':
            return self.__func__.__doc__
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        "Emulate method_getattro() in Objects/classobject.c"
        return getattr(self.__func__, name)

    def __get__(self, obj, objtype=None):
        "Emulate method_descr_get() in Objects/classobject.c"
        return self

The key dunder method of interest is __call__:

def __call__(self, *args, **kwargs):
    func = self.__func__
    obj = self.__self__
    return func(obj, *args, **kwargs)

In the example of the self-attention module, no positional arguments (*args) are passed. So when I passed self_attn positionally to a forward whose self was already filled in, it landed in the first remaining parameter, hidden_states, which the keyword arguments then tried to fill a second time.

The interesting behavior occurs during dotted access from an instance. The dotted lookup calls __get__() which returns a bound method object:

class D:
    def f(self):
         return self
d = D()
print(d.f)
<bound method D.f of <__main__.D object at 0x78b2dec54790>>

Internally, the bound method stores the underlying function and the bound instance:

print(d.f.__func__)
<function D.f at 0x78b2dedd3ba0>
print(d.f.__self__)
<__main__.D object at 0x78b2dec54790>

If you have ever wondered where self comes from in regular methods or where cls comes from in class methods, this is it!
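To tie the pieces together, the sketch below builds the same bound method three ways: by dotted access, by calling __get__ on the plain function from the class dictionary, and by constructing it with types.MethodType. The class D is re-declared so the block runs on its own.

```python
import types


class D:
    def f(self):
        return self


d = D()
func = vars(D)["f"]                          # the plain function stored in the class dictionary

via_dot = d.f                                # normal dotted access
via_get = func.__get__(d, D)                 # what the dotted access does under the hood
via_method_type = types.MethodType(func, d)  # building the bound method by hand
```

All three results wrap the same function and the same instance, so calling any of them returns d.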

Kinds of methods

Here’s the crux of what I was looking for:

To recap, functions have a __get__() method so that they can be converted to a method when accessed as attributes. The non-data descriptor transforms an obj.f(*args) call into f(obj, *args). Calling cls.f(*args) becomes f(*args).

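If I call __get__(d) on d.f I get a bound method back, which passes in the object as self, the first argument of the underlying function. (Strictly speaking, d.f is already bound, so this __get__ just returns the existing bound method, per the return self in the MethodType emulation above. The binding itself happened when the dotted lookup called __get__ on the underlying function.)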

print(d.f.__get__(d))
<bound method D.f of <__main__.D object at 0x78b2dec54790>>

Now when I call d.f.__get__(d)() I don’t need to explicitly pass in the object:

d.f.__get__(d)()
<__main__.D at 0x78b2dec54790>
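The same call is what makes monkey patching a single instance work. In the sketch below, describe is a hypothetical free function defined outside any class; calling __get__ on it binds it to one object. The last two lines illustrate that calling __get__ on an already bound method does not rebind it.

```python
class D:
    def f(self):
        return self


def describe(self):
    """A free function, defined outside any class (hypothetical example)."""
    return f"describe called on a {type(self).__name__}"


d = D()
other = D()

d.describe = describe.__get__(d, D)  # bind the function and store it on this one instance
message = d.describe()               # no need to pass d explicitly

already_bound = d.f
same_target = already_bound.__get__(other)  # still bound to d, not to other
```

Only d gains the describe attribute here; other instances of D are unaffected because the bound method lives in d's instance dictionary.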

Final Thoughts

Thanks to vibe coding, Claude introduced me to Python behavior I was unfamiliar with, and thanks to the excellent Python documentation, I understood it at a much deeper level than I was planning to.

I think something that still confuses me, and where I feel empathy for this poster, is how __get__ has special behavior for functions where it binds it to the given object.

In the Primer, initial examples of __get__ all, well, get a value:

def __get__(self, obj, objtype=None):
    print(f"__get__ called with obj={obj}, objtype={objtype}")
    return 10


def __get__(self, obj, objtype=None):
    return len(os.listdir(obj.dirname))


def __get__(self, obj, objtype=None):
    value = obj._age
    print(f'Accessing age giving {value}')
    return value

How that behavior is related to binding a function to an object is beyond my current understanding.

This poster’s response does make sense:

If descriptors were only callables that bind as methods when accessed as an attribute, then perhaps __bind__() would be a reasonable name for the method. But the descriptor protocol (i.e. __get__, __set__, and __delete__) is a means of implementing a computed attribute in general, which is not necessarily about binding a callable to the instance or type. For example, the __get__() method of a property named x might return the instance attribute _x.

So perhaps the idea of a computed attribute is generalizable, whether you’re using __get__ on a callable descriptor or otherwise. For a function, the “computation” of the attribute is binding it to the object.
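One way to see the connection is to write a descriptor whose computed value is a bound method. The sketch below loosely follows the Descriptor Guide's pure Python emulation of functions; PyFunction and Greeter are hypothetical classes for illustration, not CPython's real function type.

```python
import types


class PyFunction:
    """Hypothetical wrapper that mimics how functions implement __get__."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                     # accessed on the class: nothing to bind
        return types.MethodType(self, obj)  # the "computed attribute" is a bound method


class Greeter:
    def __init__(self, name):
        self.name = name

    greet = PyFunction(lambda self, greeting: f"{greeting}, {self.name}!")


g = Greeter("Ada")
bound = g.greet          # triggers PyFunction.__get__(descr, g, Greeter)
result = bound("Hello")  # g is supplied as self automatically
```

Seen this way, returning 10, returning a directory size, and returning a bound method are all the same mechanism: __get__ decides what the attribute lookup evaluates to.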
