Improper use of __new__ to generate classes?

我寻月下人不归 2020-11-29 04:05

I'm creating some classes for dealing with filenames in various types of file shares (nfs, afp, s3, local disk) etc. I get as user input a string that identifies the data

3 Answers
  • 2020-11-29 04:38

    Edit [BLUF]: there is no problem with the answer provided by @martineau. This post is merely a follow-up, for completeness, on a potential error you can hit when a class definition uses additional keywords that are not managed by the metaclass.

    I'd like to supply some additional information on the use of __init_subclass__ in conjunction with using __new__ as a factory. The answer that @martineau has posted is very useful, and I have implemented an altered version of it in my own programs because I prefer using the class creation sequence over adding a factory method to the namespace, much like how pathlib.Path is implemented.
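
    For context on the pathlib comparison: instantiating pathlib.Path returns an instance of a platform-specific subclass rather than of Path itself. A minimal check, assuming CPython on a POSIX or Windows system:

```python
import pathlib

# Calling the base class yields a concrete, platform-specific subclass.
p = pathlib.Path('.')
print(type(p).__name__)  # PosixPath on POSIX systems, WindowsPath on Windows
```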

    To follow up on a comment trail with @martineau, I have taken the following snippet from his answer:

    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
        _registry = {}  # Registered subclasses.
    
        @classmethod
        def __init_subclass__(cls, /, **kwargs):
            path_prefix = kwargs.pop('path_prefix', None)
            super().__init_subclass__(**kwargs)
            cls._registry[path_prefix] = cls  # Add class to registry.
    
        @classmethod
        def _get_prefix(cls, s):
            """ Extract any file system prefix at beginning of string s and
                return a lowercase version of it or None when there isn't one.
            """
            match = cls._PATH_PREFIX_PATTERN.match(s)
            return match.group(1).lower() if match else None
    
        def __new__(cls, path):
            """ Create instance of appropriate subclass. """
            path_prefix = cls._get_prefix(path)
            subclass = FileSystem._registry.get(path_prefix)
            if subclass:
                # Using "object" base class method avoids recursion here.
                return object.__new__(subclass)
            else:  # No subclass with matching prefix found (and no default).
                raise FileSystem.Unknown(
                    f'path "{path}" has no known file system prefix')
    
        def count_files(self):
            raise NotImplementedError
    
    
    class Nfs(FileSystem, path_prefix='nfs'):
        def __init__ (self, path):
            pass
    
        def count_files(self):
            pass
    
    
    class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
        def __init__(self, path):
            if not os.access(path, os.R_OK):
                raise FileSystem.NoAccess('Cannot read directory')
            self.path = path
    
        def count_files(self):
            return sum(os.path.isfile(os.path.join(self.path, filename))
                         for filename in os.listdir(self.path))
    
    
    if __name__ == '__main__':
    
        data1 = FileSystem('nfs://192.168.1.18')
        data2 = FileSystem('c:/')  # Change as necessary for testing.
    
        print(type(data1).__name__)  # -> Nfs
        print(type(data2).__name__)  # -> LocalDrive
    
        print(data2.count_files())  # -> <some number>
    
        try:
            data3 = FileSystem('foobar://42')  # Unregistered path prefix.
        except FileSystem.Unknown as exc:
            print(str(exc), '- raised as expected')
        else:
            raise RuntimeError(
                  "Unregistered path prefix should have raised Exception!")
    

    This answer works as written, but I wish to address a few potential pitfalls that others may run into, whether through inexperience or because of codebase standards their team requires.

    Firstly, for the decorator on __init_subclass__, per PEP 487:

    One could require the explicit use of @classmethod on the __init_subclass__ decorator. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.

    Not a problem, since it's already implied, and the Zen of Python tells us "explicit is better than implicit". Nevertheless, if you want to abide by the PEP, the decorator can be omitted (and the rationale is explained in the quote above).
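
    A minimal sketch of that implicit behaviour, using hypothetical Base and Child classes that are not part of the original answer: the hook is written without @classmethod and still receives the new subclass as cls.

```python
class Base:
    seen = []

    # No @classmethod decorator: type.__new__ wraps this method implicitly.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Base.seen.append(cls.__name__)


class Child(Base):
    pass


# The function stored in the class namespace was converted to a classmethod.
is_classmethod = isinstance(Base.__dict__['__init_subclass__'], classmethod)
print(is_classmethod)  # True
print(Base.seen)       # ['Child']
```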

    In my own implementation of a similar solution, subclasses are not defined with an additional keyword argument, as @martineau does here:

    class Nfs(FileSystem, path_prefix='nfs'): ...
    class LocalDrive(FileSystem, path_prefix=None): ...
    

    When browsing through the PEP:

    As a second change, the new type.__init__ just ignores keyword arguments. Currently, it insists that no keyword arguments are given. This leads to a (wanted) error if one gives keyword arguments to a class declaration if the metaclass does not process them. Metaclass authors that do want to accept keyword arguments must filter them out by overriding __init__.

    Why is this (potentially) problematic? There are several questions (notably this) describing the problem surrounding additional keyword arguments in a class definition, the use of metaclasses (and consequently the metaclass= keyword), and an overridden __init_subclass__. However, that doesn't explain why it works in the solution given here. The answer: kwargs.pop().

    If we look at the following:

    # code in CPython 3.7
    
    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
        _registry = {}  # Registered subclasses.
    
        def __init_subclass__(cls, **kwargs):
            path_prefix = kwargs.pop('path_prefix', None)
            super().__init_subclass__(**kwargs)
            cls._registry[path_prefix] = cls  # Add class to registry.
    
        ...
    
    class Nfs(FileSystem, path_prefix='nfs'): ...
    

    This will still run without issue, but if we remove the kwargs.pop():

        def __init_subclass__(cls, **kwargs):
            # path_prefix is still in kwargs, so the default implementation raises TypeError.
            super().__init_subclass__(**kwargs)
            cls._registry[path_prefix] = cls  # Never reached (path_prefix is also undefined now).
    

    The error thrown is already known and described in the PEP:

    In the new code, it is not __init__ that complains about keyword arguments, but __init_subclass__, whose default implementation takes no arguments. In a classical inheritance scheme using the method resolution order, each __init_subclass__ may take out it's keyword arguments until none are left, which is checked by the default implementation of __init_subclass__.

    What is happening is that the path_prefix= keyword is "popped" off of kwargs, not just accessed, so **kwargs is empty by the time it is passed up the MRO and thus complies with the default implementation (which accepts no keyword arguments).
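
    The difference can be reproduced in isolation. The sketch below uses hypothetical Consuming and Forwarding base classes (not from the original answer): one pops the keyword before calling up the MRO, the other forwards it unchanged.

```python
class Consuming:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        # pop() removes the keyword, so nothing is left to forward.
        prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        Consuming.registry[prefix] = cls


class Forwarding:
    def __init_subclass__(cls, **kwargs):
        # The keyword is forwarded untouched to object.__init_subclass__.
        super().__init_subclass__(**kwargs)


class Ok(Consuming, path_prefix='nfs'):
    pass


try:
    class Broken(Forwarding, path_prefix='nfs'):
        pass
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
else:
    raised = False

print(Consuming.registry)
```

    Only the Forwarding hierarchy fails, because object.__init_subclass__ is the last link in the chain and accepts no keyword arguments.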

    To avoid this entirely, I propose not relying on kwargs and instead using what is already present in the call to __init_subclass__, namely the cls reference:

    # code in CPython 3.7
    
    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
        _registry = {}  # Registered subclasses.
    
        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            cls._registry[cls._path_prefix] = cls  # Add class to registry.
    
        ...
    
    class Nfs(FileSystem):
        _path_prefix = 'nfs'
    
        ...
    

    Storing what was previously a keyword as a class attribute also makes it available in later methods, should one need to refer back to the particular prefix used by the subclass (via self._path_prefix). To my knowledge, you cannot refer back to keywords supplied in the class definition (without some complexity), and this seemed trivial and useful.
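
    As a rough illustration of reusing the attribute in a later method, here is a trimmed-down sketch. It leaves out the __new__ factory for brevity, and the host() helper is a hypothetical addition, not part of the original code.

```python
class FileSystem:
    _registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Read the prefix from the class body instead of from kwargs.
        cls._registry[cls._path_prefix] = cls


class Nfs(FileSystem):
    _path_prefix = 'nfs'

    def __init__(self, path):
        self.path = path

    def host(self):
        """Hypothetical helper: return the path without its 'nfs://' prefix."""
        marker = self._path_prefix + '://'
        if self.path.lower().startswith(marker):
            return self.path[len(marker):]
        return self.path


share = Nfs('nfs://192.168.1.18')
print(share.host())          # 192.168.1.18
print(FileSystem._registry)
```

    One caveat with this variant: a subclass that does not define (or inherit) _path_prefix raises AttributeError at class creation time, so an intermediate abstract subclass would need its own handling.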

    So to @martineau: I apologize if my comments seemed confusing. There is only so much space to type them, and as shown here, the full explanation needed more detail.

  • 2020-11-29 04:48

    In my opinion, using __new__ in such a way is really confusing for other people who might read your code. It also requires somewhat hackish code to distinguish between guessing the file system from user input and creating Nfs and LocalDrive directly through their own classes.

    Why not make a separate function with this behaviour? It can even be a static method of FileSystem class:

    class FileSystem(object):
        # other code ...
    
        @staticmethod
        def from_path(path):
            if path.upper().startswith('NFS://'): 
                return Nfs(path)
            else: 
                return LocalDrive(path)
    

    And you call it like this:

    data1 = FileSystem.from_path('nfs://192.168.1.18')
    data2 = FileSystem.from_path('/var/log')
    
  • 2020-11-29 04:52

    I don't think using __new__() to do what you want is improper. In other words, I disagree with the accepted answer to this question which claims factory functions are always the "best way to do it".

    If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method. Given the choices available, making the __new__() method one — since it's static by default — is a perfectly sensible approach.

    That said, below is what I think is an improved version of your code. I've added a couple of class methods to assist in automatically finding all the subclasses. These support the most important improvement, which is that adding subclasses no longer requires modifying the __new__() method. This makes it easily extensible, since it effectively supports what you could call virtual constructors.

    A similar implementation could also be used to move the creation of instances out of the __new__() method into a separate (static) factory method — so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function regardless of what name it's given.

    # Works in Python 2 and 3.
    
    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    
        @classmethod
        def _get_all_subclasses(cls):
            """ Recursive generator of all of the class's subclasses. """
            for subclass in cls.__subclasses__():
                yield subclass
                for descendant in subclass._get_all_subclasses():
                    yield descendant
    
        @classmethod
        def _get_prefix(cls, s):
            """ Extract any file system prefix at beginning of string s and
                return a lowercase version of it or None when there isn't one.
            """
            match = cls._PATH_PREFIX_PATTERN.match(s)
            return match.group(1).lower() if match else None
    
        def __new__(cls, path):
            """ Create instance of appropriate subclass using path prefix. """
            path_prefix = cls._get_prefix(path)
    
            for subclass in cls._get_all_subclasses():
                if subclass.prefix == path_prefix:
                    # Using "object" base class method avoids recursion here.
                    return object.__new__(subclass)
            else:  # No subclass with matching prefix found (& no default defined)
                raise FileSystem.Unknown(
                    'path "{}" has no known file system prefix'.format(path))
    
        def count_files(self):
            raise NotImplementedError
    
    
    class Nfs(FileSystem):
        prefix = 'nfs'
    
        def __init__ (self, path):
            pass
    
        def count_files(self):
            pass
    
    
    class LocalDrive(FileSystem):
        prefix = None  # Default when no file system prefix is found.
    
        def __init__(self, path):
            if not os.access(path, os.R_OK):
                raise FileSystem.NoAccess('Cannot read directory')
            self.path = path
    
        def count_files(self):
            return sum(os.path.isfile(os.path.join(self.path, filename))
                         for filename in os.listdir(self.path))
    
    
    if __name__ == '__main__':
    
        data1 = FileSystem('nfs://192.168.1.18')
        data2 = FileSystem('c:/')  # Change as necessary for testing.
    
        print(type(data1).__name__)  # -> Nfs
        print(type(data2).__name__)  # -> LocalDrive
    
        print(data2.count_files())  # -> <some number>
    

    Python 3.6+ Update

    The code above works in both Python 2 and 3.x. However, Python 3.6 added a new class method to object named __init_subclass__(), which makes finding subclasses simpler: it can be used to automatically build a "registry" of them, instead of potentially having to check every subclass recursively as the _get_all_subclasses() method does above.

    # Requires Python 3.6+ for __init_subclass__; the "/" in its signature below needs 3.8+.
    
    import os
    import re
    
    class FileSystem(object):
        class NoAccess(Exception): pass
        class Unknown(Exception): pass
    
        # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
        _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
        _registry = {}  # Registered subclasses.
    
        @classmethod
        def __init_subclass__(cls, /, path_prefix, **kwargs):
            super().__init_subclass__(**kwargs)
            cls._registry[path_prefix] = cls  # Add class to registry.
    
        @classmethod
        def _get_prefix(cls, s):
            """ Extract any file system prefix at beginning of string s and
                return a lowercase version of it or None when there isn't one.
            """
            match = cls._PATH_PREFIX_PATTERN.match(s)
            return match.group(1).lower() if match else None
    
        def __new__(cls, path):
            """ Create instance of appropriate subclass. """
            path_prefix = cls._get_prefix(path)
            subclass = FileSystem._registry.get(path_prefix)
            if subclass:
                # Using "object" base class method avoids recursion here.
                return object.__new__(subclass)
            else:  # No subclass with matching prefix found (and no default).
                raise FileSystem.Unknown(
                    f'path "{path}" has no known file system prefix')
    
        def count_files(self):
            raise NotImplementedError
    
    
    class Nfs(FileSystem, path_prefix='nfs'):
        def __init__ (self, path):
            pass
    
        def count_files(self):
            pass
    
    
    class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
        def __init__(self, path):
            if not os.access(path, os.R_OK):
                raise FileSystem.NoAccess('Cannot read directory')
            self.path = path
    
        def count_files(self):
            return sum(os.path.isfile(os.path.join(self.path, filename))
                         for filename in os.listdir(self.path))
    
    
    if __name__ == '__main__':
    
        data1 = FileSystem('nfs://192.168.1.18')
        data2 = FileSystem('c:/')  # Change as necessary for testing.
    
        print(type(data1).__name__)  # -> Nfs
        print(type(data2).__name__)  # -> LocalDrive
    
        print(data2.count_files())  # -> <some number>
    
        try:
            data3 = FileSystem('foobar://42')  # Unregistered path prefix.
        except FileSystem.Unknown as exc:
            print(str(exc), '- raised as expected')
        else:
            raise RuntimeError(
                  "Unregistered path prefix should have raised Exception!")
    
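
    As noted earlier in this answer, the same registry could back a separate factory method instead of __new__(). The following is only a sketch under that assumption: from_path is a hypothetical name, and the subclasses are left empty to keep the example short.

```python
import re


class FileSystem:
    class Unknown(Exception):
        pass

    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Maps prefix -> registered subclass.

    def __init_subclass__(cls, path_prefix, **kwargs):
        super().__init_subclass__(**kwargs)
        FileSystem._registry[path_prefix] = cls

    def __init__(self, path):
        self.path = path

    @classmethod
    def from_path(cls, path):
        """Hypothetical factory: pick the subclass registered for the prefix."""
        match = cls._PATH_PREFIX_PATTERN.match(path)
        prefix = match.group(1).lower() if match else None
        subclass = cls._registry.get(prefix)
        if subclass is None:
            raise FileSystem.Unknown(
                f'path "{path}" has no known file system prefix')
        return subclass(path)  # Ordinary construction, no custom __new__.


class Nfs(FileSystem, path_prefix='nfs'):
    pass


class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
    pass


print(type(FileSystem.from_path('nfs://192.168.1.18')).__name__)  # Nfs
print(type(FileSystem.from_path('/var/log')).__name__)            # LocalDrive
```

    With this layout, calling Nfs(path) or LocalDrive(path) directly behaves like any normal constructor, and only from_path performs the prefix lookup.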