class Ten:
    def __get__(self, obj, objtype=None):
        return 10
Understanding Python Descriptors
While vibe coding with Claude, I learned about the __get__(obj) method, which converts a function to a bound method of the given object. This was necessary to monkey-patch the self-attention module's forward pass to log input data types, as register_forward_hook by default only passes positional arguments to the hook (which LLaMA's self-attention module doesn't have, it only has keyword arguments). This led me to do a deep dive into understanding descriptors with the helpful Descriptor Guide in the Python docs, which I walk through in this blog post.
Background
When monkey-patching the Llama self-attention forward pass (to log its inputs’ data type) I was vibe coding with Claude and it generated the following line to pass the necessary arguments to the original forward pass of the module:
orig_forward.__get__(self_attn, type(self_attn))(**kwargs)
In a prior iteration, I was using the following line suggested by Claude, with the intention of passing self_attn as self:
orig_forward(self_attn, *args, **kwargs)
This was essentially doing the following:
orig_forward(self_attn, hidden_states=hidden_states, attention_mask=attention_mask, ...)
Which caused the following error:
TypeError: LlamaFlashAttention2.forward() got multiple values for argument 'hidden_states'
self_attn was being passed as the argument to the hidden_states parameter, and then hidden_states=hidden_states was again assigning an argument to the hidden_states parameter. So how do we pass self_attn as self? This is where the __get__ method comes in, which is part of the Python descriptor protocol. Descriptors are:
Any object which defines the methods __get__(), __set__(), or __delete__(). When a class attribute is a descriptor, its special binding behavior is triggered upon attribute lookup. Normally, using a.b to get, set or delete an attribute looks up the object named b in the class dictionary for a, but if b is a descriptor, the respective descriptor method gets called. Understanding descriptors is a key to a deep understanding of Python because they are the basis for many features including functions, methods, properties, class methods, static methods, and reference to super classes.
After reading that a few times I still didn’t understand it! Though I think the key is:
When a class attribute is a descriptor, its special binding behavior is triggered upon attribute lookup.
Claude explained it this way:
__get__ is a special method that converts a function into a bound method. It’s like saying “make this function a method of this object.”
Translating that to my use case: __get__ makes orig_forward a method of self_attn, no longer requiring us to pass self_attn as it now is self.
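To make the failure and the fix concrete, here is a minimal sketch with a made-up stand-in for the attention module. ToyAttention and its forward signature are hypothetical (this is not the real LLaMA code), and it assumes the original orig_forward was already a bound method, which is what the error message suggests.

```python
class ToyAttention:
    """Hypothetical stand-in for a module whose forward takes only keyword arguments."""

    def forward(self, hidden_states=None, attention_mask=None):
        return f"forward(self={type(self).__name__}, hidden_states={hidden_states})"


self_attn = ToyAttention()
kwargs = {"hidden_states": "h", "attention_mask": None}

# Case 1: a bound method already supplies self, so an extra positional
# argument lands in the next parameter, hidden_states, and collides with
# the hidden_states keyword argument.
bound_forward = self_attn.forward
try:
    bound_forward(self_attn, **kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Case 2: binding the plain function with __get__ supplies self exactly once,
# so only the keyword arguments need to be passed.
plain_forward = ToyAttention.forward  # plain function, not bound to anything
rebound = plain_forward.__get__(self_attn, type(self_attn))
result = rebound(**kwargs)
print(result)
```

The first call fails for the same reason as the original error, and the second succeeds because the bound method fills in self on its own.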
That certainly makes sense (i.e. I understand those words) but I don’t really understand why or how. That led me to the Python documentation’s Descriptor Guide which I’ll walk through here.
(There was also this interesting discussion about changing the name to __bind__ when calling it on a function, as it binds the function as a method of the given object, which we’ll see later on.)
Primer
Simple example: A descriptor that returns a constant
t = Ten()
t
<__main__.Ten at 0x78b2fd072c50>
type(t)
__main__.Ten
t.__get__(4)
10
I think the only reason Ten is a descriptor is because it “defines the methods __get__(), __set__(), or __delete__()” (here, just __get__()).
To use the descriptor, it must be stored as a class variable in another class:
class A:
    x = 5        # Regular class attribute
    y = Ten()    # Descriptor instance
a = A()          # Make an instance of class A
a
<__main__.A at 0x78b2fd0707d0>
a.x              # Normal attribute lookup
5
a.y              # Descriptor lookup
10
Note that the value 10 is not stored in either the class dictionary or the instance dictionary. Instead, the value 10 is computed on demand.
A.__dict__
mappingproxy({'__module__': '__main__',
'x': 5,
'y': <__main__.Ten at 0x78b2fd0722d0>,
'__dict__': <attribute '__dict__' of 'A' objects>,
'__weakref__': <attribute '__weakref__' of 'A' objects>,
'__doc__': None})
Modifying Ten a bit to visualize this:
class Ten2:
    def __get__(self, obj, objtype=None):
        print(f"__get__ called with obj={obj}, objtype={objtype}")
        return 10
class A2:
    x = 5
    y = Ten2()   # Descriptor instance
a2 = A2()
a2.y
__get__ called with obj=<__main__.A2 object at 0x78b2fd089710>, objtype=<class '__main__.A2'>
10
Cool!
Dynamic Lookups
import os
class DirectorySize:
    def __get__(self, obj, objtype=None):
        return len(os.listdir(obj.dirname))
class Directory:
    size = DirectorySize()      # Descriptor instance
    def __init__(self, dirname):
        self.dirname = dirname  # Regular instance attribute
s = Directory('songs')
g = Directory('games')
s.size
4
g.size
2
Removing a file and then accessing the attribute again calls the descriptor’s __get__, which dynamically calculates the new value:
os.remove('games/game1.txt')    # Delete a game
g.size
1
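The example above depends on songs and games directories that already exist on disk. As a self-contained variation (a sketch, with the temporary directory and file names made up for illustration), the same descriptor can be exercised against a throwaway directory:

```python
import os
import tempfile


class DirectorySize:
    def __get__(self, obj, objtype=None):
        # Recomputed on every attribute access; nothing is cached.
        return len(os.listdir(obj.dirname))


class Directory:
    size = DirectorySize()  # Descriptor instance

    def __init__(self, dirname):
        self.dirname = dirname  # Regular instance attribute


with tempfile.TemporaryDirectory() as tmp:
    # Create two placeholder files (hypothetical names).
    for filename in ("game1.txt", "game2.txt"):
        with open(os.path.join(tmp, filename), "w") as fh:
            fh.write("placeholder")

    g = Directory(tmp)
    size_before = g.size
    os.remove(os.path.join(tmp, "game1.txt"))
    size_after = g.size

print(size_before, size_after)
```

Because __get__ runs on each lookup, the second access reflects the deleted file without any explicit refresh.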
Managed attributes
The descriptor is assigned to a public attribute in the class dictionary while the actual data is stored as a private attribute in the instance dictionary.
Note that I wasn’t able to see the logging output in this notebook so I’m using print statements instead.
class LoggedAgeAccess:
    def __get__(self, obj, objtype=None):
        value = obj._age
        print(f'Accessing age giving {value}')
        return value
    def __set__(self, obj, value):
        print(f'Updating age to {value}')
        obj._age = value
class Person:
    age = LoggedAgeAccess()     # Descriptor instance
    def __init__(self, name, age):
        self.name = name        # Regular instance attribute
        self.age = age          # Calls __set__()
    def birthday(self):
        self.age += 1           # Calls both __get__() and __set__()
mary = Person('Mary M', 30)     # The initial age update is logged
dave = Person('David D', 40)
Updating age to 30
Updating age to 40
vars(mary), vars(dave)
({'name': 'Mary M', '_age': 30}, {'name': 'David D', '_age': 40})
mary.age
Accessing age giving 30
30
mary.birthday()
Accessing age giving 30
Updating age to 31
mary.age
Accessing age giving 31
31
dave.name
'David D'
dave.age
Accessing age giving 40
40
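For readers who want to keep the logging calls from the docs version, one option is to attach an explicit handler to a named logger, which should work regardless of how the notebook has configured the root logger. This is only a sketch: the logger name descriptor_demo and the StringIO stream are choices made here for illustration (swap the stream for sys.stdout to see the lines in a notebook cell), and the reason the output was invisible in the original notebook is not known.

```python
import io
import logging

log_stream = io.StringIO()  # hypothetical sink; use sys.stdout in a notebook

logger = logging.getLogger("descriptor_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False  # keep the root logger's configuration out of the picture


class LoggedAgeAccess:
    def __get__(self, obj, objtype=None):
        value = obj._age
        logger.info('Accessing %r giving %r', 'age', value)
        return value

    def __set__(self, obj, value):
        logger.info('Updating %r to %r', 'age', value)
        obj._age = value


class Person:
    age = LoggedAgeAccess()  # Descriptor instance

    def __init__(self, name, age):
        self.name = name  # Regular instance attribute
        self.age = age    # Calls __set__()

    def birthday(self):
        self.age += 1     # Calls both __get__() and __set__()


mary = Person('Mary M', 30)
mary.birthday()
log_lines = log_stream.getvalue().splitlines()
print("\n".join(log_lines))
```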
Customized names
When a class uses descriptors, it can inform each descriptor about which variable name was used.
class LoggedAccess:
    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = '_' + name
    def __get__(self, obj, objtype=None):
        value = getattr(obj, self.private_name)
        print(f'Accessing {self.public_name} giving {value}')
        return value
    def __set__(self, obj, value):
        print(f'Updating {self.public_name} to {value}')
        setattr(obj, self.private_name, value)
class Person:
    name = LoggedAccess()       # First descriptor instance
    age = LoggedAccess()        # Second descriptor instance
    def __init__(self, name, age):
        self.name = name        # Calls the first descriptor
        self.age = age          # Calls the second descriptor
    def birthday(self):
        self.age += 1
vars(Person)['name']
<__main__.LoggedAccess at 0x78b2edeb8950>
vars(vars(Person)['name'])
{'public_name': 'name', 'private_name': '_name'}
vars(vars(Person)['age'])
{'public_name': 'age', 'private_name': '_age'}
pete = Person('Peter P', 10)
Updating name to Peter P
Updating age to 10
kate = Person('Catherine C', 20)
Updating name to Catherine C
Updating age to 20
vars(pete)
{'_name': 'Peter P', '_age': 10}
vars(kate)
{'_name': 'Catherine C', '_age': 20}
I think the main takeaway here is that we didn’t hard-code the name of the field, so we could use the same descriptor class for both name and age.
Closing thoughts
Looking at how __set_name__ behaves (the example in the docs):
class C:
    def __set_name__(self, owner, name):
        print(f"__set_name__ called with owner={owner.__name__}, name='{name}'")
        self.name = name
class A:
    x = C()        # This will trigger __set_name__
    y = C()        # This will trigger it again with a different name
    bananas = C()
__set_name__ called with owner=A, name='x'
__set_name__ called with owner=A, name='y'
__set_name__ called with owner=A, name='bananas'
a = A()
a.x, a.y, a.x.name, a.y.name, a.bananas.name
(<__main__.C at 0x78b331674190>,
<__main__.C at 0x78b2df52ccd0>,
'x',
'y',
'bananas')
The part of particular interest to me is:
Descriptors are used throughout the language. It is how functions turn into bound methods.
Complete practical example
Validator class
A validator is a descriptor for managed attribute access. Prior to storing any data, it verifies that the new value meets various type and range restrictions. If those restrictions aren’t met, it raises an exception to prevent data corruption at its source.
from abc import ABC, abstractmethod
class Validator(ABC):
    def __set_name__(self, owner, name):
        print("__set_name__ is called")
        self.private_name = '_' + name
    def __get__(self, obj, objtype=None):
        print("__get__ is called")
        return getattr(obj, self.private_name)
    def __set__(self, obj, value):
        print("__set__ is called")
        self.validate(value)
        setattr(obj, self.private_name, value)
    @abstractmethod
    def validate(self, value):
        print("validate is called")
        pass
Custom validators
Here are three practical data validation utilities:
- OneOf verifies that a value is one of a restricted set of options.
- Number verifies that a value is either an int or float. Optionally, it verifies that a value is between a given minimum or maximum.
- String verifies that a value is a str. Optionally, it validates a given minimum or maximum length. It can validate a user-defined predicate as well.
class OneOf(Validator):
    def __init__(self, *options):
        self.options = set(options)
    def validate(self, value):
        if value not in self.options:
            raise ValueError(
                f'Expected {value!r} to be one of {self.options!r}'
            )
class Number(Validator):
    def __init__(self, minvalue=None, maxvalue=None):
        self.minvalue = minvalue
        self.maxvalue = maxvalue
    def validate(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f'Expected {value!r} to be an int or float')
        if self.minvalue is not None and value < self.minvalue:
            raise ValueError(
                f'Expected {value!r} to be at least {self.minvalue!r}'
            )
        if self.maxvalue is not None and value > self.maxvalue:
            raise ValueError(
                f'Expected {value!r} to be no more than {self.maxvalue!r}'
            )
class String(Validator):
    def __init__(self, minsize=None, maxsize=None, predicate=None):
        self.minsize = minsize
        self.maxsize = maxsize
        self.predicate = predicate
    def validate(self, value):
        if not isinstance(value, str):
            raise TypeError(f'Expected {value!r} to be an str')
        if self.minsize is not None and len(value) < self.minsize:
            raise ValueError(
                f'Expected {value!r} to be no smaller than {self.minsize!r}'
            )
        if self.maxsize is not None and len(value) > self.maxsize:
            raise ValueError(
                f'Expected {value!r} to be no bigger than {self.maxsize!r}'
            )
        if self.predicate is not None and not self.predicate(value):
            raise ValueError(
                f'Expected {self.predicate} to be true for {value!r}'
            )
Practical application
class Component:
    name = String(minsize=3, maxsize=10, predicate=str.isupper)
    kind = OneOf('wood', 'metal', 'plastic')
    quantity = Number(minvalue=0)
    def __init__(self, name, kind, quantity):
        self.name = name
        self.kind = kind
        self.quantity = quantity
__set_name__ is called
__set_name__ is called
__set_name__ is called
The descriptors prevent invalid instances from being created:
Component('Widget', 'metal', 5)      # Blocked: 'Widget' is not all uppercase
__set__ is called
ValueError: Expected <method 'isupper' of 'str' objects> to be true for 'Widget'
Component('WIDGET', 'metle', 5)      # Blocked: 'metle' is misspelled
__set__ is called
__set__ is called
ValueError: Expected 'metle' to be one of {'metal', 'plastic', 'wood'}
Component('WIDGET', 'metal', -5)     # Blocked: -5 is negative
__set__ is called
__set__ is called
__set__ is called
ValueError: Expected -5 to be at least 0
Component('WIDGET', 'metal', 'V')    # Blocked: 'V' isn't a number
__set__ is called
__set__ is called
__set__ is called
TypeError: Expected 'V' to be an int or float
c = Component('WIDGET', 'metal', 5)  # Allowed: The inputs are valid
__set__ is called
__set__ is called
__set__ is called
c.name
__get__ is called
'WIDGET'
Technical tutorial
After reading the introduction of this guide I assumed I would skip the technical tutorial, expecting it to be too technical, but after skimming it I’ve decided to go through it, as it might clear some things up for me, and the following line was attractive:
Learning about descriptors not only provides access to a larger toolset, it creates a deeper understanding of how Python works.
Definition and introduction
Reiterating the important definition that a descriptor is anything that has one of the methods in the descriptor protocol:
In general, a descriptor is an attribute value that has one of the methods in the descriptor protocol. Those methods are __get__(), __set__(), and __delete__(). If any of those methods are defined for an attribute, it is said to be a descriptor.
And the default behavior that descriptors are designed to customize:
The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary.
Descriptor protocol
I don’t have any comments for this section other than reiterating the following points:
descr.__get__(self, obj, type=None)
descr.__set__(self, obj, value)
descr.__delete__(self, obj)
That is all there is to it. Define any of these methods and an object is considered a descriptor and can override default behavior upon being looked up as an attribute.
If an object defines __set__() or __delete__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are often used for methods but other uses are possible).
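The data versus non-data distinction can be checked programmatically. The sketch below uses the standard library's inspect.isdatadescriptor together with a plain hasattr check; the ReadOnly and Managed classes and the kind helper are hypothetical names introduced only for this illustration.

```python
import inspect


class ReadOnly:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return 10


class Managed:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, objtype=None):
        return obj.__dict__.get('_v')

    def __set__(self, obj, value):
        obj.__dict__['_v'] = value


def plain_function(self):
    return self


def kind(candidate):
    """Classify an object according to the descriptor protocol."""
    if inspect.isdatadescriptor(candidate):
        return "data descriptor"
    if hasattr(type(candidate), '__get__'):
        return "non-data descriptor"
    return "not a descriptor"


results = {
    "ReadOnly()": kind(ReadOnly()),
    "Managed()": kind(Managed()),
    "property()": kind(property()),
    "plain_function": kind(plain_function),
    "5": kind(5),
}
for label, value in results.items():
    print(f"{label}: {value}")
```

Note that an ordinary function shows up as a non-data descriptor, which is the hook that later turns functions into bound methods.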
Overview of descriptor invocation
A descriptor can be called directly with desc.__get__(obj) or desc.__get__(None, cls).
But it is more common for a descriptor to be invoked automatically from attribute access.
We saw this earlier, but putting that example here again:
class Ten2:
    def __get__(self, obj, objtype=None):
        print(f"__get__ called with obj={obj}, objtype={objtype}")
        return 10
class A2:
    x = 5
    y = Ten2()   # Descriptor instance
a2 = A2()
a2.y
__get__ called with obj=<__main__.A2 object at 0x78b2ded96890>, objtype=<class '__main__.A2'>
10
Invocation from an instance
Instance lookup scans through a chain of namespaces giving data descriptors the highest priority, followed by instance variables, then non-data descriptors, then class variables, and lastly __getattr__() if it is provided.
I’ve added some print statements in their example code to show which option is triggered:
def find_name_in_mro(cls, name, default):
    "Emulate _PyType_Lookup() in Objects/typeobject.c"
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return default
def object_getattribute(obj, name):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    null = object()
    objtype = type(obj)
    cls_var = find_name_in_mro(objtype, name, null)
    descr_get = getattr(type(cls_var), '__get__', null)
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
            or hasattr(type(cls_var), '__delete__')):
            print("returning data descriptor set/delete")
            return descr_get(cls_var, obj, objtype)     # data descriptor
    if hasattr(obj, '__dict__') and name in vars(obj):
        print("returning instance variable")
        return vars(obj)[name]                          # instance variable
    if descr_get is not null:
        print("returning descr_get")
        return descr_get(cls_var, obj, objtype)         # non-data descriptor
    if cls_var is not null:
        print("returning class variable")
        return cls_var                                  # class variable
    raise AttributeError(name)
object_getattribute(a2, 'y')
returning descr_get
__get__ called with obj=<__main__.A2 object at 0x78b2ded96890>, objtype=<class '__main__.A2'>
10
object_getattribute(a2, 'x')
returning class variable
5
def getattr_hook(obj, name):
    "Emulate slot_tp_getattr_hook() in Objects/typeobject.c"
    try:
        print("__getattribute__")
        return obj.__getattribute__(name)
    except AttributeError:
        if not hasattr(type(obj), '__getattr__'):
            raise
    print("__getattr__")
    return type(obj).__getattr__(obj, name)
getattr_hook(a2, 'y')
__getattribute__
__get__ called with obj=<__main__.A2 object at 0x78b2ded96890>, objtype=<class '__main__.A2'>
10
getattr_hook(a2, 'x')
__getattribute__
5
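The full priority order described in this section (data descriptor, instance variable, non-data descriptor, class variable, __getattr__) can be observed with real attribute access. The class and attribute names below are made up for this sketch.

```python
class DataDesc:
    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return "non-data descriptor"


class Demo:
    a = DataDesc()
    b = NonDataDesc()
    c = "class variable"

    def __getattr__(self, name):
        # Only reached when every other lookup step has failed.
        return f"__getattr__ fallback for {name}"


# An instance whose __dict__ shadows all three class attributes.
shadowed = Demo()
shadowed.__dict__.update(a="instance a", b="instance b", c="instance c")

# An instance with an empty __dict__.
fresh = Demo()

lookups = {
    "shadowed.a": shadowed.a,      # data descriptor beats the instance dict
    "shadowed.b": shadowed.b,      # instance dict beats a non-data descriptor
    "shadowed.c": shadowed.c,      # instance dict beats a class variable
    "fresh.b": fresh.b,            # non-data descriptor is used
    "fresh.c": fresh.c,            # class variable is used
    "fresh.missing": fresh.missing,  # falls through to __getattr__
}
for label, value in lookups.items():
    print(f"{label}: {value}")
```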
Invocation from a class
The logic for a dotted lookup such as A.x is in type.__getattribute__().
A2.__getattribute__??
Signature: A2.__getattribute__(*args, **kwargs)
Type: wrapper_descriptor
String form: <slot wrapper '__getattribute__' of 'object' objects>
Docstring: Return getattr(self, name).
A2.__getattribute__(A2, 'y')
<__main__.Ten2 at 0x78b2dee79310>
A2.__getattribute__(A2, 'x')
5
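One detail worth spelling out: the calls above return the raw descriptor object rather than 10. As the "String form" output shows, A2.__getattribute__ is the slot wrapper inherited from object, so A2.__getattribute__(A2, 'y') treats the class as if it were an ordinary instance and finds the descriptor in its __dict__ without invoking it. The sketch below, with hypothetical Ten and Demo classes, contrasts that with type.__getattribute__, which is what a dotted class lookup actually goes through.

```python
class Ten:
    def __get__(self, obj, objtype=None):
        print(f"__get__ called with obj={obj}, objtype={objtype}")
        return 10


class Demo:
    x = 5
    y = Ten()  # Descriptor instance


# Dotted access on the class goes through type.__getattribute__,
# which calls the descriptor with obj=None.
via_dot = Demo.y
via_type = type.__getattribute__(Demo, 'y')

# object.__getattribute__ treats Demo as a plain instance (of type) and
# returns the entry from Demo.__dict__ without calling __get__.
via_object = object.__getattribute__(Demo, 'y')

print(via_dot, via_type, via_object)
```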
Invocation from super
A dotted lookup such as super(A, obj).m searches obj.__class__.__mro__ for the base class B immediately following A and then returns B.__dict__['m'].__get__(obj, A). If not a descriptor, m is returned unchanged.
class Base:
    z = Ten2()   # Descriptor in the base class
class A2(Base):
    x = 5
    y = Ten2()   # Descriptor instance in A2
    def show_super_lookup(self):
        # This will trigger the descriptor lookup through super()
        return super().z
a = A2()
a.y
__get__ called with obj=<__main__.A2 object at 0x78b2dededa90>, objtype=<class '__main__.A2'>
10
super(A2, a).z
__get__ called with obj=<__main__.A2 object at 0x78b2dededa90>, objtype=<class '__main__.A2'>
10
Base.__dict__['z'].__get__(a, A2)
__get__ called with obj=<__main__.A2 object at 0x78b2dededa90>, objtype=<class '__main__.A2'>
10
a.__class__.__mro__
(__main__.A2, __main__.Base, object)
Summary of invocation logic
Showing examples of some of the bullet points in the summary:
- Descriptors are invoked by the __getattribute__() method.
a.__getattribute__('y')
__get__ called with obj=<__main__.A2 object at 0x78b2dededa90>, objtype=<class '__main__.A2'>
10
- Overriding __getattribute__() prevents automatic descriptor calls because all the descriptor logic is in that method.
class MyDescriptor:
    def __get__(self, obj, objtype=None):
        print("Descriptor __get__ called!")
        return 42
class Normal:
    x = MyDescriptor()
n = Normal()
n.x
Descriptor __get__ called!
42
class OverrideGetattribute:
    x = MyDescriptor()
    y = 5
    def __getattribute__(self, name):
        print(f"Custom __getattribute__ called for {name}")
        if name == 'x':
            return "Bypassed descriptor"
        return object.__getattribute__(self, name)
o = OverrideGetattribute()
o.x
Custom __getattribute__ called for x
'Bypassed descriptor'
o.y
Custom __getattribute__ called for y
5
- object.__getattribute__() and type.__getattribute__() make different calls to __get__(). The first includes the instance and may include the class. The second puts in None for the instance and always includes the class.
class DetailedDescriptor:
    def __get__(self, obj, objtype=None):
        print(f"__get__ called with obj={obj}, objtype={objtype}")
        return 42
class Normal:
    x = DetailedDescriptor()
n = Normal()
n.x
__get__ called with obj=<__main__.Normal object at 0x78b2dedf0750>, objtype=<class '__main__.Normal'>
42
Normal.x
__get__ called with obj=None, objtype=<class '__main__.Normal'>
42
- Data descriptors always override instance dictionaries.
class DataDescriptor:
    def __init__(self, initial_value=None):
        self.value = initial_value
    def __get__(self, obj, objtype=None):
        print("DataDescriptor.__get__ called")
        return self.value
    def __set__(self, obj, value):
        print(f"DataDescriptor.__set__ called with value: {value}")
        self.value = value
class Example:
    x = DataDescriptor(42)  # Data descriptor defined in class
    def __init__(self):
        # Try to override with instance attribute
        self.__dict__['x'] = "Instance value"
example = Example()
example.__dict__
{'x': 'Instance value'}
example.x
DataDescriptor.__get__ called
42
example.x = 100
example.__dict__['x']
DataDescriptor.__set__ called with value: 100
'Instance value'
example.x
DataDescriptor.__get__ called
100
- Non-data descriptors may be overridden by instance dictionaries.
class NonDataDescriptor:
    def __init__(self, initial_value=None):
        self.value = initial_value
    def __get__(self, obj, objtype=None):
        print("NonDataDescriptor.__get__ called")
        return self.value
class Example:
    x = NonDataDescriptor(42)  # Non-data descriptor defined in class
    def __init__(self):
        # Try to override with instance attribute
        self.__dict__['x'] = "Instance value"
example = Example()
example.__dict__
{'x': 'Instance value'}
example.x
'Instance value'
Automatic name notification
Sometimes it is desirable for a descriptor to know what class variable name it was assigned to. When a new class is created, the type metaclass scans the dictionary of the new class. If any of the entries are descriptors and if they define __set_name__(), that method is called with two arguments. The owner is the class where the descriptor is used, and the name is the class variable the descriptor was assigned to.
class NameTracker:
    def __set_name__(self, owner, name):
        self.name = name
class_dict = {
    'x': NameTracker(),
    'y': NameTracker(),
    'z': 5
}
Demo = type('Demo', (), class_dict)
Demo.x.name
'x'
Demo.y.name
'y'
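Since the notification happens inside type.__new__(), it only fires at class creation time. The Descriptor Guide notes that descriptors added to a class afterwards need __set_name__() called manually; a small sketch of that, reusing the NameTracker idea with a hypothetical late-added attribute y:

```python
class NameTracker:
    def __set_name__(self, owner, name):
        self.name = name


class Demo:
    x = NameTracker()  # Notified automatically during class creation


# Attach a second tracker after the class already exists.
late = NameTracker()
Demo.y = late
notified_automatically = hasattr(late, 'name')  # No automatic notification here

# Notify it by hand.
late.__set_name__(Demo, 'y')

print(Demo.x.name, notified_automatically, Demo.y.name)
```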
I’m skipping the ORM example since I don’t have access to the example database.
Pure Python Equivalents
Finally! The section I’m most interested in.
Properties, bound methods, static methods, class methods, and __slots__ are all based on the descriptor protocol.
I’m going to focus on the functions and methods section.
Functions and methods
Functions stored in class dictionaries get turned into methods when invoked. Methods only differ from regular functions in that the object instance is prepended to the other arguments. By convention, the instance is called self but could be called this or any other variable name.
Methods can be created manually with types.MethodType which is roughly equivalent to:
class MethodType:
    "Emulate PyMethod_Type in Objects/classobject.c"
    def __init__(self, func, obj):
        self.__func__ = func
        self.__self__ = obj
    def __call__(self, *args, **kwargs):
        func = self.__func__
        obj = self.__self__
        return func(obj, *args, **kwargs)
    def __getattribute__(self, name):
        "Emulate method_getset() in Objects/classobject.c"
        if name == '__doc__':
            return self.__func__.__doc__
        return object.__getattribute__(self, name)
    def __getattr__(self, name):
        "Emulate method_getattro() in Objects/classobject.c"
        return getattr(self.__func__, name)
    def __get__(self, obj, objtype=None):
        "Emulate method_descr_get() in Objects/classobject.c"
        return self
The key dunder method of interest is __call__:
    def __call__(self, *args, **kwargs):
        func = self.__func__
        obj = self.__self__
        return func(obj, *args, **kwargs)
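In the example of the self-attention module, there are no positional arguments (*args is empty). The error above suggests orig_forward was already a bound method, so func(obj, *args, **kwargs) had already filled in self, and the self_attn I passed explicitly was treated as the next argument in line, hidden_states, which then collided with the hidden_states keyword argument.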
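The relationship between types.MethodType and a function's __get__ can be checked directly. In this sketch the Toy class and the module-level forward function are hypothetical; the point is only that both routes produce an equivalent bound method that prepends the instance when called.

```python
import types


class Toy:
    pass


def forward(self, hidden_states=None):
    # Returns its arguments so the binding is easy to inspect.
    return self, hidden_states


toy = Toy()

# Two ways of binding the same function to the same instance.
via_method_type = types.MethodType(forward, toy)
via_get = forward.__get__(toy, Toy)

# Bound methods compare equal when they wrap the same function and instance.
same_binding = via_method_type == via_get

# Calling the bound method prepends toy as self, as in __call__ above.
call_result = via_get(hidden_states="h")

print(same_binding, call_result[1])
```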
The interesting behavior occurs during dotted access from an instance. The dotted lookup calls __get__() which returns a bound method object:
class D:
    def f(self):
        return self
d = D()
print(d.f)
<bound method D.f of <__main__.D object at 0x78b2dec54790>>
Internally, the bound method stores the underlying function and the bound instance:
print(d.f.__func__)
<function D.f at 0x78b2dedd3ba0>
print(d.f.__self__)
<__main__.D object at 0x78b2dec54790>
If you have ever wondered where self comes from in regular methods or where cls comes from in class methods, this is it!
Kinds of methods
Here’s the crux of what I was looking for:
To recap, functions have a __get__() method so that they can be converted to a method when accessed as attributes. The non-data descriptor transforms an obj.f(*args) call into f(obj, *args). Calling cls.f(*args) becomes f(*args).
If I call __get__(d) on d.f I get a bound method, which passes in the object as self, the first argument of the underlying function.
print(d.f.__get__(d))
<bound method D.f of <__main__.D object at 0x78b2dec54790>>
Now when I call d.f.__get__(d)() I don’t need to explicitly pass in the object:
d.f.__get__(d)()
<__main__.D at 0x78b2dec54790>
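A subtlety here: d.f is already a bound method, and the MethodType emulation shown earlier has a __get__ that simply returns self. So the binding itself happens when __get__ is called on the plain function, not on the bound method. A minimal sketch of the difference, using a second hypothetical instance called other:

```python
class D:
    def f(self):
        return self


d = D()
other = D()

# The plain function lives in the class dictionary.
plain = D.__dict__['f']

# Calling __get__ on the plain function is what creates the binding.
bound_to_d = plain.__get__(d, D)

# Calling __get__ on an already bound method leaves it unchanged,
# even when a different instance is passed in.
already_bound = d.f
rebound = already_bound.__get__(other, D)

print(type(plain).__name__, bound_to_d() is d, rebound is already_bound)
```

In other words, d.f.__get__(d) works, but the dotted access d.f has already done the binding by the time __get__ is called explicitly.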
Final Thoughts
Thanks to vibe coding, Claude introduced me to Python behavior I was unfamiliar with, and thanks to the excellent Python documentation, I understood it at a much deeper level than I was planning to.
I think something that still confuses me, and where I feel empathy for this poster, is how __get__ has special behavior for functions, where it binds the function to the given object.
In the Primer, the initial examples of __get__ all, well, get a value:
def __get__(self, obj, objtype=None):
    print(f"__get__ called with obj={obj}, objtype={objtype}")
    return 10
def __get__(self, obj, objtype=None):
    return len(os.listdir(obj.dirname))
def __get__(self, obj, objtype=None):
    value = obj._age
    print(f'Accessing age giving {value}')
    return value
How that behavior is related to binding a function to an object is beyond my current understanding.
This poster’s response does make sense:
If descriptors were only callables that bind as methods when accessed as an attribute, then perhaps __bind__() would be a reasonable name for the method. But the descriptor protocol (i.e. __get__, __set__, and __delete__) is a means of implementing a computed attribute in general, which is not necessarily about binding a callable to the instance or type. For example, the __get__() method of a property named x might return the instance attribute _x.
So perhaps the idea of a computed attribute generalizes, whether you’re using __get__ on a callable descriptor or otherwise. For a function, the “computation” of the attribute is binding it to the object.
I hope you enjoyed this blog post! I’m trying to grow my YouTube channel so please give that a look/subscribe.