I am trying to understand what Python's descriptors are and what they are useful for. I understand how they work, but here are my doubts. Consider the following co
I tried (with minor changes as suggested) the code from Andrew Cooke's answer. (I am running python 2.7).
The code:
#!/usr/bin/env python
class Celsius:
    def __get__(self, instance, owner): return 9 * (instance.fahrenheit + 32) / 5.0
    def __set__(self, instance, value): instance.fahrenheit = 32 + 5 * value / 9.0
class Temperature:
    def __init__(self, initial_f): self.fahrenheit = initial_f
    celsius = Celsius()
if __name__ == "__main__":
    t = Temperature(212)
    print(t.celsius)
    t.celsius = 0
    print(t.fahrenheit)
The result:
C:\Users\gkuhn\Desktop>python test2.py
<__main__.Celsius instance at 0x02E95A80>
212
With Python prior to 3, make sure you subclass from object, which will make the descriptor work correctly, as the __get__ magic does not work for old-style classes.
Why do I need the descriptor class?
It gives you extra control over how attributes work. If you're used to getters and setters in Java, for example, then it's Python's way of doing that. One advantage is that it looks to users just like an attribute (there's no change in syntax). So you can start with an ordinary attribute and then, when you need to do something fancy, switch to a descriptor.
An attribute is just a mutable value. A descriptor lets you execute arbitrary code when reading or setting (or deleting) a value. So you could imagine using it to map an attribute to a field in a database, for example – a kind of ORM.
Another use might be refusing to accept a new value by throwing an exception in __set__, effectively making the "attribute" read only.
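A minimal sketch of that read-only idea, assuming Python 3. The names ReadOnly and Circle are made up for illustration and are not from the question:

```python
class ReadOnly:
    """Data descriptor that allows reads but rejects every assignment (hypothetical name)."""

    def __init__(self, value):
        self._value = value

    def __get__(self, instance, owner):
        return self._value

    def __set__(self, instance, value):
        # Refuse every write so the attribute behaves as read-only.
        raise AttributeError("this attribute is read-only")


class Circle:
    pi = ReadOnly(3.14159)


c = Circle()
print(c.pi)  # 3.14159
try:
    c.pi = 3
except AttributeError as exc:
    message = str(exc)
    print("rejected:", message)
```

Because the descriptor defines __set__, the assignment never reaches the instance __dict__, so the stored value stays intact.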
What is instance and owner here? (in __get__). What is the purpose of these parameters?
This is pretty subtle (and the reason I am writing a new answer here - I found this question while wondering the same thing and didn't find the existing answer that great).
A descriptor is defined on a class, but is typically called from an instance. When it's called from an instance both instance and owner are set (and you can work out owner from instance so it seems kinda pointless). But when called from a class, only owner is set, which is why it's there.
This is only needed for __get__ because it's the only one that can be called on a class. If you set the class value you set the descriptor itself. Similarly for deletion. Which is why the owner isn't needed there.
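To see the two call styles side by side, here is a small sketch (Python 3). Show and Holder are hypothetical names used only for this illustration:

```python
class Show:
    """Non-data descriptor that simply reports what __get__ receives."""

    def __get__(self, instance, owner):
        return (instance, owner)


class Holder:
    attr = Show()


h = Holder()
via_instance = h.attr    # __get__(h, Holder)
via_class = Holder.attr  # __get__(None, Holder)

print(via_instance[0] is h, via_instance[1] is Holder)  # True True
print(via_class[0] is None, via_class[1] is Holder)     # True True
```

Assigning Holder.attr = something would not call __set__ at all; it would just replace the descriptor in the class.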
How would I call/use this example?
Well, here's a cool trick using similar classes:
class Celsius:
    def __get__(self, instance, owner):
        return 5 * (instance.fahrenheit - 32) / 9
    def __set__(self, instance, value):
        instance.fahrenheit = 32 + 9 * value / 5
class Temperature:
    celsius = Celsius()
    def __init__(self, initial_f):
        self.fahrenheit = initial_f
t = Temperature(212)
print(t.celsius)
t.celsius = 0
print(t.fahrenheit)
(I'm using Python 3; for Python 2 you need to make sure those divisions are / 5.0 and / 9.0). That gives:
100.0
32.0
Now there are other, arguably better ways to achieve the same effect in python (e.g. if celsius were a property, which is the same basic mechanism but places all the source inside the Temperature class), but that shows what can be done...
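For comparison, the property variant mentioned here could look roughly like this (a sketch assuming Python 3, using the same conversion formulas as the descriptor version):

```python
class Temperature:
    def __init__(self, initial_f):
        self.fahrenheit = initial_f

    @property
    def celsius(self):
        # Derived on every read from the stored Fahrenheit value.
        return 5 * (self.fahrenheit - 32) / 9

    @celsius.setter
    def celsius(self, value):
        # Writing Celsius updates the underlying Fahrenheit value.
        self.fahrenheit = 32 + 9 * value / 5


t = Temperature(212)
print(t.celsius)     # 100.0
t.celsius = 0
print(t.fahrenheit)  # 32.0
```

The trade-off is that the conversion logic now lives in Temperature and cannot be reused by another class without copying it.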
Before going into the details of descriptors it may be important to know how attribute lookup in Python works. This assumes that the class has no metaclass and that it uses the default implementation of __getattribute__ (both can be used to "customize" the behavior).
The best illustration of attribute lookup (in Python 3.x or for new-style classes in Python 2.x) in this case is from Understanding Python metaclasses (ionel's codelog). The image uses : as a substitute for "non-customizable attribute lookup". It represents the lookup of an attribute foobar on an instance of Class.
Two conditions are important here:
- The class of instance has an entry for the attribute name and that entry has __get__ and __set__.
- The instance has no entry for the attribute name but the class has one and it has __get__.
That's where descriptors come into it:
- Data descriptors, which have both __get__ and __set__.
- Non-data descriptors, which only have __get__.
In both cases the returned value goes through __get__ called with the instance as first argument and the class as second argument.
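The difference between the two kinds can be observed by writing directly into the instance __dict__. This is only a sketch (Python 3); DataDesc and NonDataDesc are hypothetical names, while Class mirrors the name used above:

```python
class DataDesc:
    """Has __get__ and __set__, so it wins over the instance __dict__."""

    def __get__(self, instance, owner):
        return "data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Has only __get__, so the instance __dict__ wins over it."""

    def __get__(self, instance, owner):
        return "non-data descriptor"


class Class:
    a = DataDesc()
    b = NonDataDesc()


obj = Class()
# Write straight into the instance dictionary, bypassing __set__.
obj.__dict__["a"] = "instance value"
obj.__dict__["b"] = "instance value"

print(obj.a)  # data descriptor
print(obj.b)  # instance value
```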
The lookup is even more complicated for class attribute lookup (see for example Class attribute lookup (in the above mentioned blog)).
Let's move to your specific questions:
Why do I need the descriptor class?
In most cases you don't need to write descriptor classes! However, you're probably a very regular end user of them. Functions, for example, are descriptors; that's how functions can be used as methods with self implicitly passed as first argument.
def test_function(self):
    return self
class TestClass(object):
    def test_method(self):
        ...
If you look up test_method on an instance you'll get back a "bound method":
>>> instance = TestClass()
>>> instance.test_method
<bound method TestClass.test_method of <__main__.TestClass object at ...>>
Similarly you could also bind a function by invoking its __get__ method manually (not really recommended, just for illustrative purposes):
>>> test_function.__get__(instance, TestClass)
<bound method test_function of <__main__.TestClass object at ...>>
You can even call this "self-bound method":
>>> test_function.__get__(instance, TestClass)()
<__main__.TestClass at ...>
Note that I did not provide any arguments and the function did return the instance I had bound!
Functions are Non-data descriptors!
Some built-in examples of a data-descriptor would be property. Neglecting getter, setter, and deleter the property descriptor is (from Descriptor HowTo Guide "Properties"):
class Property(object):
    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)
    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)
    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)
Since it's a data descriptor it's invoked whenever you look up the "name" of the property and it simply delegates to the functions decorated with @property, @name.setter, and @name.deleter (if present).
There are several other descriptors in the standard library, for example staticmethod, classmethod.
The point of descriptors is easy (although you rarely need them): abstract common code for attribute access. property is an abstraction for instance variable access, function provides an abstraction for methods, staticmethod provides an abstraction for methods that don't need instance access, and classmethod provides an abstraction for methods that need class access rather than instance access (this is a bit simplified).
Another example would be a class property.
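A class property is not a builtin, so the following is only one possible sketch of the idea (Python 3). ClassProperty and Config are hypothetical names:

```python
class ClassProperty:
    """Non-data descriptor that calls the wrapped function with the class."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        # Pass the class, never the instance, to the getter.
        return self.fget(owner)


class Config:
    _registry = ["a", "b"]

    @ClassProperty
    def size(cls):
        return len(cls._registry)


print(Config.size)    # 2
print(Config().size)  # 2
```

Note that this only covers reading: assigning Config.size = ... replaces the descriptor instead of calling a setter.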
One fun example (using __set_name__ from Python 3.6) could also be a property that only allows a specific type:
class TypedProperty(object):
    __slots__ = ('_name', '_type')
    def __init__(self, typ):
        self._type = typ
    def __get__(self, instance, klass=None):
        if instance is None:
            return self
        return instance.__dict__[self._name]
    def __set__(self, instance, value):
        if not isinstance(value, self._type):
            raise TypeError(f"Expected class {self._type}, got {type(value)}")
        instance.__dict__[self._name] = value
    def __delete__(self, instance):
        del instance.__dict__[self._name]
    def __set_name__(self, klass, name):
        self._name = name
Then you can use the descriptor in a class:
class Test(object):
    int_prop = TypedProperty(int)
And playing a bit with it:
>>> t = Test()
>>> t.int_prop = 10
>>> t.int_prop
10
>>> t.int_prop = 20.0
TypeError: Expected class <class 'int'>, got <class 'float'>
Or a "lazy property":
class LazyProperty(object):
    __slots__ = ('_fget', '_name')
    def __init__(self, fget):
        self._fget = fget
    def __get__(self, instance, klass=None):
        if instance is None:
            return self
        try:
            return instance.__dict__[self._name]
        except KeyError:
            value = self._fget(instance)
            instance.__dict__[self._name] = value
            return value
    def __set_name__(self, klass, name):
        self._name = name
class Test(object):
    @LazyProperty
    def lazy(self):
        print('calculating')
        return 10
>>> t = Test()
>>> t.lazy
calculating
10
>>> t.lazy
10
These are cases where moving the logic into a common descriptor might make sense; however, one could also solve them by other means (though perhaps with some repeated code).
What is instance and owner here? (in __get__). What is the purpose of these parameters?
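It depends on how you look up the attribute. If you look up the attribute on an instance then:
- the second argument is the instance on which you look up the attribute
- the third argument is the class of the instance
In case you look up the attribute on the class (assuming the descriptor is defined on the class):
- the second argument is None
- the third argument is the class where you look up the attribute
So basically the third argument is necessary if you want to customize the behavior when you do class-level look-up (because the instance is None).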
How would I call/use this example?
Your example is basically a property that only allows values that can be converted to float and that is shared between all instances of the class (and on the class, although one can only use "read" access on the class, otherwise you would replace the descriptor instance):
>>> t1 = Temperature()
>>> t2 = Temperature()
>>> t1.celsius = 20 # setting it on one instance
>>> t2.celsius # looking it up on another instance
20.0
>>> Temperature.celsius # looking it up on the class
20.0
That's why descriptors generally use the second argument (instance) to store the value to avoid sharing it. In some cases sharing a value between instances might be desired (although I cannot think of a scenario at this moment), but it makes practically no sense for a celsius property on a temperature class, except maybe as a purely academic exercise.
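A per-instance variant could look like the sketch below. It assumes Python 3.6+ for __set_name__, and the "_celsius" storage name and the 0.0 default are choices made for this illustration, not part of the question's code:

```python
class Celsius:
    def __set_name__(self, owner, name):
        # Store each value on the instance under a private name, e.g. "_celsius".
        self._name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self._name, 0.0)

    def __set__(self, instance, value):
        setattr(instance, self._name, float(value))


class Temperature:
    celsius = Celsius()


t1 = Temperature()
t2 = Temperature()
t1.celsius = 20
print(t1.celsius)  # 20.0
print(t2.celsius)  # 0.0
```

Here setting the value on t1 no longer affects t2, because the descriptor keeps no state of its own.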
Why do I need the descriptor class?
Inspired by Fluent Python by Luciano Ramalho
Imagine you have a class like this
class LineItem:
    price = 10.9
    weight = 2.1
    def __init__(self, name, price, weight):
        self.name = name
        self.price = price
        self.weight = weight
item = LineItem("apple", 2.9, 2.1)
item.price = -0.9  # its price is negative: you owe the customer a refund even though you delivered the apple :(
item.weight = -0.8  # negative weight, it doesn't make sense
We should validate weight and price to avoid assigning them a negative number. We can write less code if we use a descriptor as a proxy, like this:
class Quantity(object):
    __index = 0  # class-wide counter used to build a unique storage name
    def __init__(self):
        self.__index = self.__class__.__index
        self._storage_name = "quantity#{}".format(self.__index)
        self.__class__.__index += 1
    def __set__(self, instance, value):
        if value > 0:
            setattr(instance, self._storage_name, value)
        else:
            raise ValueError('value should be > 0')
    def __get__(self, instance, owner):
        return getattr(instance, self._storage_name)
then define class LineItem like this:
class LineItem(object):
    weight = Quantity()
    price = Quantity()
    def __init__(self, name, weight, price):
        self.name = name
        self.weight = weight
        self.price = price
and we can extend the Quantity class to do more common validating
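One way such an extension might look is sketched below, assuming Python 3.6+ (for __set_name__). The Validated base class and the NonBlank validator are names introduced here for illustration, and this Quantity is a simplified rewrite of the one above rather than the same code:

```python
class Validated:
    """Base descriptor: subclasses supply validate(); values live in the instance __dict__."""

    def __set_name__(self, owner, name):
        self._storage_name = name

    def __set__(self, instance, value):
        instance.__dict__[self._storage_name] = self.validate(value)

    def validate(self, value):
        raise NotImplementedError


class Quantity(Validated):
    def validate(self, value):
        if value <= 0:
            raise ValueError("value must be > 0")
        return value


class NonBlank(Validated):
    def validate(self, value):
        value = value.strip()
        if not value:
            raise ValueError("value cannot be blank")
        return value


class LineItem:
    name = NonBlank()
    weight = Quantity()
    price = Quantity()

    def __init__(self, name, weight, price):
        self.name = name
        self.weight = weight
        self.price = price


item = LineItem("apple", 2.1, 2.9)
try:
    item.price = -0.9
except ValueError as exc:
    print("rejected:", exc)
print(item.price)  # 2.9
```

No __get__ is needed in this sketch: the value is stored under the attribute's own name, so a read falls through to the instance __dict__.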
The descriptor is how Python's property type is implemented. A descriptor simply implements __get__, __set__, etc. and is then added to another class in its definition (as you did above with the Temperature class). For example:
temp = Temperature()
temp.celsius  # calls celsius.__get__
Accessing the property you assigned the descriptor to (celsius in the above example) calls the appropriate descriptor method. instance in __get__ is the instance of the class (so above, __get__ would receive temp), while owner is the class with the descriptor (so it would be Temperature).
You need to use a descriptor class to encapsulate the logic that powers it. That way, if the descriptor is used to cache some expensive operation (for example), it could store the value on itself and not its class.
An article about descriptors can be found here.
EDIT: As jchl pointed out in the comments, if you simply try Temperature.celsius, instance will be None.
I am trying to understand what Python's descriptors are and what they can be useful for.
Descriptors are class attributes (like properties or methods) with any of the following special methods:
- __get__ (non-data descriptor method, for example on a method/function)
- __set__ (data descriptor method, for example on a property instance)
- __delete__ (data descriptor method)
These descriptor objects can be used as attributes on other object class definitions. (That is, they live in the __dict__ of the class object.)
Descriptor objects can be used to programmatically manage the results of a dotted lookup (e.g. foo.descriptor) in a normal expression, an assignment, and even a deletion.
Functions/methods, bound methods, property, classmethod, and staticmethod all use these special methods to control how they are accessed via the dotted lookup.
A data descriptor, like property, can allow for lazy evaluation of attributes based on a simpler state of the object, allowing instances to use less memory than if you precomputed each possible attribute.
Another data descriptor, a member_descriptor, created by __slots__, allows memory savings by letting the class store data in a mutable tuple-like data structure instead of the more flexible but space-consuming __dict__.
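A quick way to see this, as a sketch on CPython 3 (the type name member_descriptor is a CPython detail, and Point is a made-up class):

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
slot = Point.__dict__["x"]

print(type(slot).__name__)            # member_descriptor
print(hasattr(type(slot), "__set__")) # True, so it is a data descriptor
print(hasattr(p, "__dict__"))         # False, no per-instance dict
```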
Non-data descriptors, usually instance and class methods, get their implicit first arguments (usually named self and cls, respectively) from their non-data descriptor method, __get__. Static methods use the same mechanism to receive no implicit first argument.
Most users of Python need to learn only the simple usage, and have no need to learn or understand the implementation of descriptors further.
A descriptor is an object with any of the following methods (__get__, __set__, or __delete__), intended to be used via dotted-lookup as if it were a typical attribute of an instance. For an owner-object, obj_instance, with a descriptor object:
- obj_instance.descriptor invokes descriptor.__get__(self, obj_instance, owner_class) returning a value. This is how all methods and the get on a property work.
- obj_instance.descriptor = value invokes descriptor.__set__(self, obj_instance, value) returning None. This is how the setter on a property works.
- del obj_instance.descriptor invokes descriptor.__delete__(self, obj_instance) returning None. This is how the deleter on a property works.
obj_instance is the instance whose class contains the descriptor object's instance. self is the instance of the descriptor (probably just one for the class of the obj_instance).
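The three invocations above can be traced with a small recording descriptor. This is a sketch assuming Python 3.6+; Traced and Owner are hypothetical names:

```python
class Traced:
    """Data descriptor that records which special method each operation triggers."""

    def __init__(self):
        self.calls = []

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        self.calls.append("get")
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        self.calls.append("set")
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        self.calls.append("delete")
        instance.__dict__.pop(self.name, None)


class Owner:
    descriptor = Traced()


obj_instance = Owner()
obj_instance.descriptor = 42     # calls __set__
value = obj_instance.descriptor  # calls __get__
del obj_instance.descriptor      # calls __delete__

print(value)                   # 42
print(Owner.descriptor.calls)  # ['set', 'get', 'delete']
```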
To define this with code, an object is a descriptor if the set of its attributes intersects with any of the required attributes:
def has_descriptor_attrs(obj):
    return set(['__get__', '__set__', '__delete__']).intersection(dir(obj))
def is_descriptor(obj):
    """obj can be instance of descriptor or the descriptor class"""
    return bool(has_descriptor_attrs(obj))
A Data Descriptor has a __set__ and/or __delete__.
A Non-Data-Descriptor has neither __set__ nor __delete__.
def has_data_descriptor_attrs(obj):
    return set(['__set__', '__delete__']) & set(dir(obj))
def is_data_descriptor(obj):
    return bool(has_data_descriptor_attrs(obj))
classmethod
staticmethod
property
We can see that classmethod and staticmethod are Non-Data-Descriptors:
>>> is_descriptor(classmethod), is_data_descriptor(classmethod)
(True, False)
>>> is_descriptor(staticmethod), is_data_descriptor(staticmethod)
(True, False)
Both only have the __get__ method:
>>> has_descriptor_attrs(classmethod), has_descriptor_attrs(staticmethod)
(set(['__get__']), set(['__get__']))
Note that all functions are also Non-Data-Descriptors:
>>> def foo(): pass
...
>>> is_descriptor(foo), is_data_descriptor(foo)
(True, False)
property
However, property is a Data-Descriptor:
>>> is_data_descriptor(property)
True
>>> has_descriptor_attrs(property)
set(['__set__', '__get__', '__delete__'])
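The standard library offers a similar check in inspect.isdatadescriptor. Unlike the helpers above it expects the descriptor object itself rather than the descriptor class, so this sketch (Python 3, with a made-up Example class) pulls the objects out of the class __dict__ first:

```python
import inspect


class Example:
    @property
    def prop(self):
        return 1

    def method(self):
        return 2


# Fetch the raw objects from the class dict so that __get__ is not triggered.
prop_obj = Example.__dict__["prop"]
func_obj = Example.__dict__["method"]

print(inspect.isdatadescriptor(prop_obj))  # True
print(inspect.isdatadescriptor(func_obj))  # False
print(hasattr(func_obj, "__get__"))        # True, a non-data descriptor
```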
These are important distinctions, as they affect the lookup order for a dotted lookup.
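For a lookup such as obj_instance.attribute:
- First, Python checks whether the attribute is a Data-Descriptor on the class of the instance.
- If not, it looks for the attribute in obj_instance's __dict__, then
- it finally falls back to a Non-Data-Descriptor.
The consequence of this lookup order is that Non-Data-Descriptors like functions/methods can be overridden by instances.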
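That consequence is easy to check with an ordinary method. A short sketch in Python 3, where Greeter is a made-up class:

```python
class Greeter:
    def greet(self):
        return "hello from the class"


g = Greeter()
first = g.greet()

# Functions are non-data descriptors, so an entry in the instance __dict__ wins.
g.greet = lambda: "hello from the instance"
second = g.greet()

# Removing the instance entry makes the class-level method visible again.
del g.greet
third = g.greet()

print(first)   # hello from the class
print(second)  # hello from the instance
print(third)   # hello from the class
```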
We have learned that descriptors are objects with any of __get__, __set__, or __delete__. These descriptor objects can be used as attributes on other object class definitions. Now we will look at how they are used, using your code as an example.
Here's your code, followed by your questions and answers to each:
class Celsius(object):
    def __init__(self, value=0.0):
        self.value = float(value)
    def __get__(self, instance, owner):
        return self.value
    def __set__(self, instance, value):
        self.value = float(value)
class Temperature(object):
    celsius = Celsius()
- Why do I need the descriptor class?
Your descriptor ensures you always have a float for this class attribute of Temperature, and that you can't use del to delete the attribute:
>>> t1 = Temperature()
>>> del t1.celsius
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: __delete__
Otherwise, your descriptors ignore the owner-class and instances of the owner, instead, storing state in the descriptor. You could just as easily share state across all instances with a simple class attribute (so long as you always set it as a float to the class and never delete it, or are comfortable with users of your code doing so):
class Temperature(object):
    celsius = 0.0
This gets you exactly the same behavior as your example (see response to question 3 below), but uses a Python builtin (property), and would be considered more idiomatic:
class Temperature(object):
    _celsius = 0.0
    @property
    def celsius(self):
        return type(self)._celsius
    @celsius.setter
    def celsius(self, value):
        type(self)._celsius = float(value)
- What is instance and owner here? (in __get__). What is the purpose of these parameters?
instance is the instance of the owner that is calling the descriptor. The owner is the class in which the descriptor object is used to manage access to the data point. See the descriptions of the special methods that define descriptors next to the first paragraph of this answer for more descriptive variable names.
- How would I call/use this example?
Here's a demonstration:
>>> t1 = Temperature()
>>> t1.celsius
0.0
>>> t1.celsius = 1
>>>
>>> t1.celsius
1.0
>>> t2 = Temperature()
>>> t2.celsius
1.0
You can't delete the attribute:
>>> del t2.celsius
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: __delete__
And you can't assign a variable that can't be converted to a float:
>>> t1.celsius = '0x02'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 7, in __set__
ValueError: invalid literal for float(): 0x02
Otherwise, what you have here is a global state for all instances, that is managed by assigning to any instance.
The expected way that most experienced Python programmers would accomplish this outcome would be to use the property
decorator, which makes use of the same descriptors under the hood, but brings the behavior into the implementation of the owner class (again, as defined above):
class Temperature(object):
    _celsius = 0.0
    @property
    def celsius(self):
        return type(self)._celsius
    @celsius.setter
    def celsius(self, value):
        type(self)._celsius = float(value)
Which has the exact same expected behavior of the original piece of code:
>>> t1 = Temperature()
>>> t2 = Temperature()
>>> t1.celsius
0.0
>>> t1.celsius = 1.0
>>> t2.celsius
1.0
>>> del t1.celsius
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: can't delete attribute
>>> t1.celsius = '0x02'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 8, in celsius
ValueError: invalid literal for float(): 0x02
We've covered the attributes that define descriptors, the difference between data- and non-data-descriptors, builtin objects that use them, and specific questions about use.
So again, how would you use the question's example? I hope you wouldn't. I hope you would start with my first suggestion (a simple class attribute) and move on to the second suggestion (the property decorator) if you feel it is necessary.