Question
Say I have a dataclass in Python 3. I want to be able to hash and order these objects.
I only want them ordered/hashed on id.
I see in the docs that I can just implement __hash__ and all that, but I'd like to get dataclasses to do the work for me because they are intended to handle this.
from dataclasses import dataclass, field

@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)

a = sorted(list(set([ Category(id='x'), Category(id='y')])))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Category'
Answer 1:
From the docs:
Here are the rules governing implicit creation of a __hash__() method: [...] If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
Since you set eq=True and left frozen at the default (False), your dataclass is unhashable.
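To see this rule in action, here is a minimal sketch that reuses the Category class from the question. The names hash_attr, hash_error and ordered are only illustrative and are not part of the original post.

```python
from dataclasses import dataclass, field


@dataclass(eq=True, order=True)  # frozen is left at its default, False
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)


# With eq=True and frozen=False the decorator sets __hash__ to None.
hash_attr = Category.__hash__

# Calling hash() on an instance therefore raises TypeError.
try:
    hash(Category(id="x"))
    hash_error = None
except TypeError as exc:
    hash_error = str(exc)

# Ordering is unaffected: order=True still generates the comparison methods.
ordered = sorted([Category(id="y"), Category(id="x")])

print(hash_attr)
print(hash_error)
print([c.id for c in ordered])
```

So in the question it is the set() call that fails, not sorted().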
You have 3 options:
- Set frozen=True (in addition to eq=True), which will make your class immutable and hashable.
- Set unsafe_hash=True, which will create a __hash__ method but leave your class mutable, thus risking problems if an instance of your class is modified while stored in a dict or set:
    cat = Category('foo', 'bar')
    categories = {cat}
    cat.id = 'baz'
    print(cat in categories)  # False
- Manually implement a __hash__ method.
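As an illustration of the third option, the following sketch adds a hand-written __hash__ based only on id. The class mirrors the one in the question. The sample values and the names unique_sorted and ids are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=True, order=True)
class Category:
    id: str
    name: str = field(default="", compare=False)  # excluded from __eq__ and ordering

    # dataclass() does not replace an explicitly defined __hash__.
    def __hash__(self) -> int:
        return hash(self.id)


# Equality and hashing both depend only on id, so the two "x" entries collapse.
unique_sorted = sorted({Category("y", "second"), Category("x", "first"), Category("x", "other")})
ids = [c.id for c in unique_sorted]
print(ids)
```

The class stays mutable, so the same caveat as for unsafe_hash=True applies: changing id while an instance sits in a set or dict breaks lookups.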
Answer 2:
TL;DR
Use frozen=True in conjunction with eq=True (which will make the instances immutable).
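Applied to the class from the question, that could look like the sketch below. The sample data and the names categories, ids and mutated are illustrative only.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(eq=True, order=True, frozen=True)
class Category:
    id: str
    name: str = field(default="set this in post_init", compare=False)


# The generated __hash__ and __eq__ use only id, because name has compare=False.
categories = sorted({Category(id="y"), Category(id="x"), Category(id="x", name="duplicate")})
ids = [c.id for c in categories]

# Frozen instances reject attribute assignment.
try:
    categories[0].id = "z"
    mutated = True
except FrozenInstanceError:
    mutated = False

print(ids, mutated)
```

One consequence to keep in mind: if name really is meant to be set in __post_init__, a frozen class blocks plain assignment there as well, and object.__setattr__(self, "name", value) would be needed instead.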
Long Answer
From the docs:
__hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer's intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the dataclass() decorator.
By default, dataclass() will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.
If __hash__() is not explicitly defined, or if it is set to None, then dataclass() may add an implicit __hash__() method. Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.
Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.
If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
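The rules quoted above can be checked with a small experiment. This is only a sketch: the helper is_hashable and the classes Point and Conflicting are hypothetical names introduced for the demonstration.

```python
from dataclasses import dataclass


def is_hashable(eq: bool, frozen: bool) -> bool:
    """Build a throwaway dataclass with the given flags and try to hash an instance."""

    @dataclass(eq=eq, frozen=frozen)
    class Point:
        x: int = 0

    try:
        hash(Point())
        return True
    except TypeError:
        return False


# Map each (eq, frozen) combination to whether instances can be hashed.
results = {
    (eq, frozen): is_hashable(eq, frozen)
    for eq in (True, False)
    for frozen in (True, False)
}

# An explicit __hash__ combined with unsafe_hash=True is rejected at class creation.
try:

    @dataclass(unsafe_hash=True)
    class Conflicting:
        x: int

        def __hash__(self) -> int:
            return 0

    conflict_raised = False
except TypeError:
    conflict_raised = True

for flags, hashable in results.items():
    print(flags, hashable)
print(conflict_raised)
```

Only eq=True with frozen=False yields an unhashable class. With eq=False the instances remain hashable through the hash inherited from object.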
Answer 3:
I'd like to add a special note on the use of unsafe_hash.
You can exclude fields from the hash by setting compare=False or hash=False (hash defaults to the value of compare).
This can be useful if you store nodes in a graph and want to mark them as visited without breaking their hashing (e.g. if they're in a set of unvisited nodes).
from dataclasses import dataclass, field

@dataclass(unsafe_hash=True)
class node:
    x: int
    visit_count: int = field(default=10, compare=False)  # hash inherits compare setting. So valid.
    # visit_count: int = field(default=False, hash=False)  # also valid. Arguably easier to read, but can break some compare code.
    # visit_count: int = False  # if mutated, hashing breaks. (3* printed)

s = set()
n = node(1)
s.add(n)
if n in s: print("1* n in s")
n.visit_count = 11
if n in s:
    print("2* n still in s")
else:
    print("3* n is lost to the void because hashing broke.")
This took me hours to figure out... Useful further reading is the Python documentation on dataclasses; see specifically the documentation for field() and for the dataclass() arguments. https://docs.python.org/3/library/dataclasses.html
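As a rough way to confirm how hash follows compare, the per-field settings can be inspected with dataclasses.fields(). The class Node and the variable names below are illustrative and are not taken from the answer.

```python
from dataclasses import dataclass, field, fields


@dataclass(unsafe_hash=True)
class Node:
    x: int
    visit_count: int = field(default=0, compare=False)


# Each Field records its compare and hash settings; hash=None means "follow compare".
settings = {f.name: (f.compare, f.hash) for f in fields(Node)}

n = Node(1)
before = hash(n)
n.visit_count += 1  # excluded from the hash, so the hash is unchanged
after = hash(n)

# Side effect of compare=False: nodes that differ only in visit_count compare equal.
same = Node(1, visit_count=5) == Node(1, visit_count=0)

print(settings)
print(before == after, same)
```

The last line is a design trade-off to be aware of: two such nodes also collapse into a single entry in a set.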
Source: https://stackoverflow.com/questions/52390576/how-can-i-make-a-python-dataclass-hashable