Question
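I have the following class:
from dataclasses import dataclass
from typing import Optional

from dataclasses_json import dataclass_json

@dataclass_json
@dataclass
class Source:
    type: Optional[str] = None
    label: Optional[str] = None
    path: Optional[str] = None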
and two subclasses:
@dataclass_json
@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'
and
@dataclass_json
@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None
Given the dictionaries:
parquet = {'type': 'Parquet', 'label': 'events', 'path': '/.../test.parquet', 'parquet_path': '../../result.parquet'}
csv = {'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter': ','}
Now I would like to do something like
Source().from_dict(csv)
and have the result be an instance of Csv or Parquet. I understand that if you instantiate Source, the from_dict method just loads the parameters into a Source, but is there any way to do this through some kind of inheritance, without a "constructor" that runs an if/elif/else over all possible types?
Pureconfig, a Scala library, instantiates different case classes depending on the 'type' attribute, which holds the name of the desired subclass. Is this possible in Python?
Answer 1:
You can build a helper that picks and instantiates the appropriate subclass.
def from_data(data: dict, tp: type):
    """Create the subtype of ``tp`` for the given ``data``"""
    subtype = [
        stp for stp in tp.__subclasses__()  # look through all subclasses...
        if stp.__name__ == data['type']  # ...and select by type name
    ][0]
    return subtype(**data)  # instantiate the subtype
This can be called with your data and the base class from which to select:
>>> from_data(
... {'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter':','},
... Source,
... )
Csv(type='Csv', label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')
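Two details of the helper above are worth knowing: the `[...][0]` lookup raises a bare IndexError when no subclass matches, and `__subclasses__()` only returns direct subclasses. A minimal sketch of a more defensive variant follows. It assumes plain standard-library dataclasses (dropping @dataclass_json so it runs without the third-party package), and `iter_subclasses` is a hypothetical helper name, not part of any library.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Source:
    type: Optional[str] = None
    label: Optional[str] = None
    path: Optional[str] = None


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None


def iter_subclasses(tp: type) -> Iterator[type]:
    """Yield every subclass of ``tp``, including indirect ones (hypothetical helper)."""
    for stp in tp.__subclasses__():
        yield stp
        yield from iter_subclasses(stp)


def from_data(data: dict, tp: type):
    """Create the subtype of ``tp`` named by ``data['type']``."""
    # next() with a default avoids the IndexError of indexing an empty list.
    subtype = next(
        (stp for stp in iter_subclasses(tp) if stp.__name__ == data['type']),
        None,
    )
    if subtype is None:
        raise ValueError(f"no subclass of {tp.__name__} named {data['type']!r}")
    return subtype(**data)
```

For example, `from_data({'type': 'Csv', 'delimiter': ','}, Source)` returns `Csv(type='Csv', label=None, path=None, csv_path=None, delimiter=',')`, while an unknown name produces a ValueError that says which type was missing.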
If you need to run this often, it is worth building a dict to optimise the subtype lookup. A simple way is to add a classmethod to your base class and store the lookup there:
@dataclass_json
@dataclass
class Source:
    type: Optional[str] = None
    label: Optional[str] = None
    path: Optional[str] = None

    @classmethod
    def from_data(cls, data: dict):
        # Build the name -> subclass table once, on the first call.
        if not hasattr(cls, '_lookup'):
            cls._lookup = {stp.__name__: stp for stp in cls.__subclasses__()}
        return cls._lookup[data["type"]](**data)
This can be called directly on the base class:
>>> Source.from_data({'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter':','})
Csv(type='Csv', label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')
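Because the `hasattr` cache is filled on the first call, a subclass defined after that call never makes it into `_lookup`. One alternative, swapping in Python's `__init_subclass__` hook (not something the answer above uses), registers each subclass at the moment its class statement runs. A minimal sketch with standard-library dataclasses only; `_registry` is a hypothetical attribute name.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class Source:
    type: Optional[str] = None
    label: Optional[str] = None
    path: Optional[str] = None

    # ClassVar keeps the registry out of the dataclass fields and __init__.
    _registry: ClassVar[dict] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass, when its class statement executes.
        Source._registry[cls.__name__] = cls

    @classmethod
    def from_data(cls, data: dict) -> 'Source':
        try:
            subtype = Source._registry[data['type']]
        except KeyError:
            raise ValueError(f"unknown source type {data['type']!r}") from None
        return subtype(**data)


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None
```

With this design a subclass only has to be defined (imported) before `Source.from_data` is called; there is no first-call cache to go stale.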
Answer 2:
This is a variation on my answer to this question.
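@dataclass_json
@dataclass
class Source:
    type: Optional[str] = None
    label: Optional[str] = None
    path: Optional[str] = None

    def __new__(cls, type=None, **kwargs):
        # Pick the direct subclass whose name matches the 'type' argument...
        for subclass in cls.__subclasses__():
            if subclass.__name__ == type:
                break
        else:
            # ...or fall back to the class that was actually called.
            subclass = cls
        # Allocate the chosen class without the constructor arguments;
        # Python then calls that class's __init__ with them.
        instance = super(Source, subclass).__new__(subclass)
        return instance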
assert type(Source(**csv)) == Csv
assert type(Source(**parquet)) == Parquet
assert Csv(**csv) == Source(**csv)
assert Parquet(**parquet) == Source(**parquet)
You asked and I am happy to oblige. However, I'm questioning whether this is really what you need. I think it might be overkill for your situation. I originally figured this trick out so I could instantiate directly from data when...
- my data was heterogeneous and I didn't know ahead of time which subclass was appropriate for each datum,
- I didn't have control over the data, and
- figuring out which subclass to use required some processing of the data, processing which I felt belonged inside the class (for logical reasons as well as to avoid polluting the scope in which the instantiating took place).
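The third condition, where choosing the subclass needs some processing of the data, might look like the sketch below. It assumes, purely for illustration, that the subclass can be inferred from the extension of `path` (a hypothetical rule, not from the question); `_EXTENSIONS` is a hypothetical name, and only standard-library dataclasses are used.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    type: Optional[str] = None
    label: Optional[str] = None
    path: Optional[str] = None

    def __new__(cls, *args, **kwargs):
        target = cls
        # Only dispatch when the base class itself is instantiated, so
        # Csv(...) and Parquet(...) behave normally. This sketch only
        # inspects a keyword 'path' argument.
        if cls is Source:
            ext = os.path.splitext(kwargs.get('path') or '')[1].lower()
            target = _EXTENSIONS.get(ext, Source)
        # object.__new__ must be called without the constructor arguments;
        # the chosen class's __init__ receives them afterwards.
        return object.__new__(target)


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None


# Hypothetical mapping from file extension to subclass.
_EXTENSIONS = {'.csv': Csv, '.parquet': Parquet}
```

Here `Source(path='/data/test.csv', delimiter=',')` yields a Csv instance, and the dispatch rule lives inside the class rather than at the call site.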
If those conditions apply to your situation, then I think this is a worthwhile approach. If not, the added complexity of mucking with __new__ (a moderately advanced maneuver) might not outweigh the complexity it saves in the code used to instantiate. There are probably simpler alternatives.
For example, it appears as though you already know which subclass you need; it's one of the fields in the data. If you put it there, presumably whatever logic you wrote to do so could be used to instantiate the appropriate subclass right then and there, bypassing the need for my solution. Alternatively, instead of storing the name of the subclass as a string, store the subclass itself. Then you could do this: data['type'](**data)
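A minimal sketch of storing the subclass itself under 'type', assuming standard-library dataclasses only. The `type` field is annotated as `Any` here because, in this sketch, it holds a class object rather than a string.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Source:
    # In this sketch 'type' holds the subclass itself, not its name.
    type: Any = None
    label: Optional[str] = None
    path: Optional[str] = None


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


# Whatever code builds the dict stores the class object, not the string 'Csv'.
csv = {'type': Csv, 'label': 'events', 'path': '/.../test.csv',
       'csv_path': '../../result.csv', 'delimiter': ','}

# No lookup table or if/elif chain: the stored class is called directly.
source = csv['type'](**csv)
```

The trade-off is that a dict holding a class object can no longer be serialized to JSON as-is.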
It also occurs to me that maybe you don't need inheritance at all. Do Csv and Parquet store the same type of data, differing only in which file format they read it from? Then maybe you just need one class with from_csv and from_parquet methods. Alternatively, if one of the parameters is a filename, it would be easy to figure out which type of file parsing you need based on the filename extension. Normally I'd put this in __init__, but since you're using dataclass, I guess this would happen in __post_init__.
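A minimal sketch of the single-class, extension-based alternative, assuming standard-library dataclasses. The `fmt` field and the `_FORMATS` mapping are hypothetical names, and the class only records the detected format; it does not read any file.

```python
import os
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from file extension to format name.
_FORMATS = {'.csv': 'csv', '.parquet': 'parquet'}


@dataclass
class Source:
    label: Optional[str] = None
    path: Optional[str] = None
    delimiter: str = ';'  # only meaningful for CSV files
    # Derived from the path in __post_init__, so it is not an __init__ argument.
    fmt: str = field(init=False, default='')

    def __post_init__(self) -> None:
        ext = os.path.splitext(self.path or '')[1].lower()
        if ext not in _FORMATS:
            raise ValueError(f'cannot infer the file format of {self.path!r}')
        self.fmt = _FORMATS[ext]
```

Because __post_init__ runs at the end of the generated __init__, every instance knows its format as soon as it is built; a path with an unrecognised extension (or no path at all) raises ValueError instead.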
Source: https://stackoverflow.com/questions/61339788/dict-attribute-type-to-select-subclass-of-dataclass