Validating detailed types in Python dataclasses

日久生厌 2020-11-27 03:26

Python 3.7 was released a while ago, and I wanted to test some of the fancy new dataclass+typing features. Getting hints to work right is easy enough, with both

3 Answers
  • 2020-11-27 03:50

    Instead of checking for type equality, you should use isinstance. But you cannot use a parametrized generic type (typing.List[int]) to do so, you must use the "generic" version (typing.List). So you will be able to check for the container type but not the contained types. Parametrized generic types define an __origin__ attribute that you can use for that.
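
As a quick illustration of the point above (a minimal sketch; the variable names are made up for the example, and `__origin__` behaves this way on Python 3.7 and later): `isinstance` rejects a parametrized generic, while its `__origin__` is a plain class that `isinstance` accepts.

```python
import typing

values = [1, 2, 3]

# A parametrized generic cannot be used with isinstance: it raises TypeError.
try:
    isinstance(values, typing.List[int])
    parametrized_check_failed = False
except TypeError:
    parametrized_check_failed = True

# On Python 3.7+, __origin__ is the runtime class (here, the built-in list).
origin = typing.List[int].__origin__
container_ok = isinstance(values, origin)

# Only the container is checked: a list of strings passes as well.
strings_also_pass = isinstance(["a", "b"], origin)
```

Note that the last check succeeds even though the hint says `int`: only the container type is verified.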

    Unlike in Python 3.6, most type hints in Python 3.7 have a useful __origin__ attribute. Compare:

    # Python 3.6
    >>> import typing
    >>> typing.List.__origin__
    >>> typing.List[int].__origin__
    typing.List
    

    and

    # Python 3.7
    >>> import typing
    >>> typing.List.__origin__
    <class 'list'>
    >>> typing.List[int].__origin__
    <class 'list'>
    

    Python 3.8 introduces even better support with the typing.get_origin() introspection function:

    # Python 3.8
    >>> import typing
    >>> typing.get_origin(typing.List)
    <class 'list'>
    >>> typing.get_origin(typing.List[int])
    <class 'list'>
    

    Notable exceptions are typing.Any, typing.Union and typing.ClassVar: anything that is a typing._SpecialForm does not define __origin__. Fortunately:

    >>> isinstance(typing.Union, typing._SpecialForm)
    True
    >>> isinstance(typing.Union[int, str], typing._SpecialForm)
    False
    >>> typing.get_origin(typing.Union[int, str])
    typing.Union
    

    But parametrized types define an __args__ attribute that stores their parameters as a tuple; Python 3.8 introduces the typing.get_args() function to retrieve them:

    # Python 3.7
    >>> typing.Union[int, str].__args__
    (<class 'int'>, <class 'str'>)
    
    # Python 3.8
    >>> typing.get_args(typing.Union[int, str])
    (<class 'int'>, <class 'str'>)
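
The two introspection functions can be combined in a small helper. This is only a sketch: `describe_hint` is a hypothetical name, not something provided by `typing`, and it requires Python 3.8 or later.

```python
import typing


def describe_hint(hint):
    """Return (origin, args) for a type hint; origin falls back to the hint itself."""
    origin = typing.get_origin(hint) or hint  # requires Python 3.8+
    return origin, typing.get_args(hint)


list_origin, list_args = describe_hint(typing.List[int])
plain_origin, plain_args = describe_hint(int)
union_origin, union_args = describe_hint(typing.Union[int, str])
```

For a plain class such as `int`, `get_origin` returns `None`, which is why the helper falls back to the hint itself and reports an empty argument tuple.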
    

    So we can improve type checking a bit:

    for field_name, field_def in self.__dataclass_fields__.items():
        if isinstance(field_def.type, typing._SpecialForm):
            # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
            continue
        try:
            actual_type = field_def.type.__origin__
        except AttributeError:
            # In case of non-typing types (such as <class 'int'>, for instance)
            actual_type = field_def.type
        # In Python 3.8 one would replace the try/except with
        # actual_type = typing.get_origin(field_def.type) or field_def.type
        if isinstance(actual_type, typing._SpecialForm):
            # case of typing.Union[…] or typing.ClassVar[…]
            actual_type = field_def.type.__args__
    
        actual_value = getattr(self, field_name)
        if not isinstance(actual_value, actual_type):
            print(f"\t{field_name}: '{type(actual_value)}' instead of '{field_def.type}'")
            ret = False
    

    This is not perfect as it won't account for typing.ClassVar[typing.Union[int, str]] or typing.Optional[typing.List[int]] for instance, but it should get things started.
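
For reference, here is one way the loop above could be packaged as a standalone function. It is a sketch under stated assumptions, not the exact code from the answer: `find_type_errors` and `Item` are hypothetical names, it uses `typing.get_origin`/`typing.get_args` (Python 3.8+) instead of `typing._SpecialForm` checks, it returns the offending field names rather than printing, and it simply skips hints that `isinstance` cannot handle.

```python
import dataclasses
import typing


def find_type_errors(instance):
    """Return the names of fields whose value does not match the hinted type.

    Hypothetical helper; assumes hints are plain classes, parametrized
    containers, or a Union of plain classes.
    """
    mismatched = []
    for field in dataclasses.fields(instance):
        hint = field.type
        if hint is typing.Any:
            continue  # anything is acceptable
        origin = typing.get_origin(hint) or hint  # requires Python 3.8+
        # For Union[...] check against the tuple of alternatives instead.
        expected = typing.get_args(hint) if origin is typing.Union else origin
        try:
            matches = isinstance(getattr(instance, field.name), expected)
        except TypeError:
            # Hint that isinstance cannot handle (e.g. Optional[List[int]]): skip it.
            continue
        if not matches:
            mismatched.append(field.name)
    return mismatched


@dataclasses.dataclass
class Item:
    name: str
    tags: typing.List[str]
    ident: typing.Union[int, str]


no_errors = find_type_errors(Item("a", ["x"], 3))
two_errors = find_type_errors(Item("a", ("x",), 3.5))
# Contained types are not inspected, so a list of ints passes for List[str].
unchecked_items = find_type_errors(Item("a", [1, 2], "id"))
```

Here `two_errors` lists `tags` and `ident`, while `unchecked_items` is empty, which shows the container-only limitation in practice.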


    Next is the way to apply this check.

    Instead of using __post_init__, I would go the decorator route: this could be used on anything with type hints, not only dataclasses:

    import inspect
    import typing
    from contextlib import suppress
    from functools import wraps
    
    
    def enforce_types(callable):
        spec = inspect.getfullargspec(callable)
    
        def check_types(*args, **kwargs):
            parameters = dict(zip(spec.args, args))
            parameters.update(kwargs)
            for name, value in parameters.items():
                with suppress(KeyError):  # Assume un-annotated parameters can be any type
                    type_hint = spec.annotations[name]
                    if isinstance(type_hint, typing._SpecialForm):
                        # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
                        continue
                    try:
                        actual_type = type_hint.__origin__
                    except AttributeError:
                        # In case of non-typing types (such as <class 'int'>, for instance)
                        actual_type = type_hint
                    # In Python 3.8 one would replace the try/except with
                    # actual_type = typing.get_origin(type_hint) or type_hint
                    if isinstance(actual_type, typing._SpecialForm):
                        # case of typing.Union[…] or typing.ClassVar[…]
                        actual_type = type_hint.__args__
    
                    if not isinstance(value, actual_type):
                        raise TypeError('Unexpected type for \'{}\' (expected {} but found {})'.format(name, type_hint, type(value)))
    
        def decorate(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                check_types(*args, **kwargs)
                return func(*args, **kwargs)
            return wrapper
    
        if inspect.isclass(callable):
            callable.__init__ = decorate(callable.__init__)
            return callable
    
        return decorate(callable)
    

    Usage being:

    import dataclasses
    import typing

    # enforce_types is the decorator defined above
    @enforce_types
    @dataclasses.dataclass
    class Point:
        x: float
        y: float
    
    @enforce_types
    def foo(bar: typing.Union[int, str]):
        pass
    

    Apart from validating only some type hints, as explained in the previous section, this approach still has some drawbacks:

    • type hints using strings (class Foo: def __init__(self: 'Foo'): pass) are not taken into account by inspect.getfullargspec: you may want to use typing.get_type_hints and inspect.signature instead;
    • a default value which is not the appropriate type is not validated:

      @enforce_types
      def foo(bar: int = None):
          pass
      
      foo()
      

      does not raise any TypeError. You may want to use inspect.Signature.bind in conjunction with inspect.BoundArguments.apply_defaults if you want to account for that (which then forces you to define def foo(bar: typing.Optional[int] = None));

    • variable number of arguments can't be validated as you would have to define something like def foo(*args: typing.Sequence, **kwargs: typing.Mapping) and, as said at the beginning, we can only validate containers and not contained objects.
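
The second drawback can be made concrete with a short sketch of `inspect.Signature.bind` and `inspect.BoundArguments.apply_defaults`. The function `foo` mirrors the example from the list above; the other variable names are only for illustration.

```python
import inspect


def foo(bar: int = None):
    pass


signature = inspect.signature(foo)

# Without apply_defaults, only explicitly passed arguments are bound.
bound = signature.bind()
before = dict(bound.arguments)

# apply_defaults fills in missing parameters with their default values,
# so the default (None here) becomes visible to a type check.
bound.apply_defaults()
after = dict(bound.arguments)

# The default does not match the annotation, which a strict check can now detect.
default_matches_hint = isinstance(after["bar"], int)
```

Before `apply_defaults` the bound arguments are empty, so a checker that only looks at passed arguments never sees the `None` default.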

    Update

    After this answer got some popularity and a library heavily inspired by it got released, the need to lift the shortcomings mentioned above is becoming a reality. So I played a bit more with the typing module and will propose a few findings and a new approach here.

    For starters, typing does a great job of detecting when an argument is optional:

    >>> def foo(a: int, b: str, c: typing.List[str] = None):
    ...   pass
    ... 
    >>> typing.get_type_hints(foo)
    {'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.Union[typing.List[str], NoneType]}
    

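    This is pretty neat and definitely an improvement over inspect.getfullargspec, so better use that instead, as it also properly handles strings as type hints. (Note that this implicit Optional is only added on Python versions before 3.11; later versions return the annotation unchanged.) But typing.get_type_hints leaves the hint untouched for other kinds of default values: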

    >>> def foo(a: int, b: str, c: typing.List[str] = 3):
    ...   pass
    ... 
    >>> typing.get_type_hints(foo)
    {'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.List[str]}
    

    So you may still need extra strict checking, even though such cases feel very fishy.
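
To illustrate the difference in how string annotations are handled, here is a minimal sketch. The function `repeat` is a made-up example; the same behaviour applies to forward references such as a class naming itself in one of its own methods.

```python
import inspect
import typing


def repeat(text: "str", count: "int" = 2):
    return text * count


# inspect.getfullargspec returns the annotations exactly as written: strings.
raw = inspect.getfullargspec(repeat).annotations

# typing.get_type_hints evaluates the strings and returns the actual classes.
resolved = typing.get_type_hints(repeat)
```

With the raw strings, `isinstance(value, raw["count"])` would fail, whereas the resolved hints can be passed to `isinstance` directly.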

    Next is the case of type hints used as arguments for a typing._SpecialForm, such as typing.Optional[typing.List[str]] or typing.Final[typing.Union[typing.Sequence, typing.Mapping]]. Since the __args__ of these typing._SpecialForms is always a tuple, it is possible to recursively find the __origin__ of the hints contained in that tuple. Combined with the above checks, we will then need to filter out any typing._SpecialForm that is left.

    Proposed improvements:

    import inspect
    import typing
    from functools import wraps
    
    
    def _find_type_origin(type_hint):
        if isinstance(type_hint, typing._SpecialForm):
            # case of typing.Any, typing.ClassVar, typing.Final, typing.Literal,
            # typing.NoReturn, typing.Optional, or typing.Union without parameters
            yield typing.Any
            return
    
        actual_type = typing.get_origin(type_hint) or type_hint  # requires Python 3.8
        if isinstance(actual_type, typing._SpecialForm):
            # case of typing.Union[…] or typing.ClassVar[…] or …
            for origins in map(_find_type_origin, typing.get_args(type_hint)):
                yield from origins
        else:
            yield actual_type
    
    
    def _check_types(parameters, hints):
        for name, value in parameters.items():
            type_hint = hints.get(name, typing.Any)
            actual_types = tuple(
                    origin
                    for origin in _find_type_origin(type_hint)
                    if origin is not typing.Any
            )
            if actual_types and not isinstance(value, actual_types):
                raise TypeError(
                        f"Expected type '{type_hint}' for argument '{name}'"
                        f" but received type '{type(value)}' instead"
                )
    
    
    def enforce_types(callable):
        def decorate(func):
            hints = typing.get_type_hints(func)
            signature = inspect.signature(func)
    
            @wraps(func)
            def wrapper(*args, **kwargs):
                parameters = dict(zip(signature.parameters, args))
                parameters.update(kwargs)
                _check_types(parameters, hints)
    
                return func(*args, **kwargs)
            return wrapper
    
        if inspect.isclass(callable):
            callable.__init__ = decorate(callable.__init__)
            return callable
    
        return decorate(callable)
    
    
    def enforce_strict_types(callable):
        def decorate(func):
            hints = typing.get_type_hints(func)
            signature = inspect.signature(func)
    
            @wraps(func)
            def wrapper(*args, **kwargs):
                bound = signature.bind(*args, **kwargs)
                bound.apply_defaults()
                parameters = dict(zip(signature.parameters, bound.args))
                parameters.update(bound.kwargs)
                _check_types(parameters, hints)
    
                return func(*args, **kwargs)
            return wrapper
    
        if inspect.isclass(callable):
            callable.__init__ = decorate(callable.__init__)
            return callable
    
        return decorate(callable)
    

    Thanks to @Aran-Fey, who helped me improve this answer.

  • 2020-11-27 03:54

    For typing aliases, you must check the annotation separately. I did it like this: https://github.com/EvgeniyBurdin/validated_dc

  • 2020-11-27 04:06

    Just found this question.

    pydantic can do full type validation for dataclasses out of the box. (admission: I built pydantic)

    Just use pydantic's version of the decorator; the resulting dataclass is completely vanilla.

    from datetime import datetime
    from pydantic.dataclasses import dataclass
    
    @dataclass
    class User:
        id: int
        name: str = 'John Doe'
        signup_ts: datetime = None
    
    print(User(id=42, signup_ts='2032-06-21T12:00'))
    """
    User(id=42, name='John Doe', signup_ts=datetime.datetime(2032, 6, 21, 12, 0))
    """
    
    User(id='not int', signup_ts='2032-06-21T12:00')
    

    The last line will give:

        ...
    pydantic.error_wrappers.ValidationError: 1 validation error
    id
      value is not a valid integer (type=type_error.integer)
    