I\'m creating some classes for dealing with filenames in various types of file shares (nfs, afp, s3, local disk) etc. I get as user input a string that identifies the data
Edit [BLUF]: there is no problem with the answer provided by @martineau, this post is merely to follow up for completion to discuss a potential error encountered when using additional keywords in a class definition that are not managed by the metaclass.
I'd like to supply some additional information on the use of __init_subclass__
in conjuncture with using __new__
as a factory. The answer that @martineau has posted is very useful and I have implemented an altered version of it in my own programs as I prefer using the class creation sequence over adding a factory method to the namespace; very similar to how pathlib.Path
is implemented.
To follow up on a comment trail with @martinaeu I have taken the following snippet from his answer:
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
_registry = {} # Registered subclasses.
@classmethod
def __init_subclass__(cls, /, **kwargs):
path_prefix = kwargs.pop('path_prefix', None)
super().__init_subclass__(**kwargs)
cls._registry[path_prefix] = cls # Add class to registry.
@classmethod
def _get_prefix(cls, s):
""" Extract any file system prefix at beginning of string s and
return a lowercase version of it or None when there isn't one.
"""
match = cls._PATH_PREFIX_PATTERN.match(s)
return match.group(1).lower() if match else None
def __new__(cls, path):
""" Create instance of appropriate subclass. """
path_prefix = cls._get_prefix(path)
subclass = FileSystem._registry.get(path_prefix)
if subclass:
# Using "object" base class method avoids recursion here.
return object.__new__(subclass)
else: # No subclass with matching prefix found (and no default).
raise FileSystem.Unknown(
f'path "{path}" has no known file system prefix')
def count_files(self):
raise NotImplementedError
class Nfs(FileSystem, path_prefix='nfs'):
def __init__ (self, path):
pass
def count_files(self):
pass
class LocalDrive(FileSystem, path_prefix=None): # Default file system.
def __init__(self, path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path
def count_files(self):
return sum(os.path.isfile(os.path.join(self.path, filename))
for filename in os.listdir(self.path))
if __name__ == '__main__':
data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/') # Change as necessary for testing.
print(type(data1).__name__) # -> Nfs
print(type(data2).__name__) # -> LocalDrive
print(data2.count_files()) # -> <some number>
try:
data3 = FileSystem('foobar://42') # Unregistered path prefix.
except FileSystem.Unknown as exc:
print(str(exc), '- raised as expected')
else:
raise RuntimeError(
"Unregistered path prefix should have raised Exception!")
This answer, as written works, but I wish to address a few items (potential pitfalls) others may experience through inexperience or perhaps codebase standards their team requires.
Firstly, for the decorator on __init_subclass__
, per the PEP:
One could require the explicit use of
@classmethod
on the__init_subclass__
decorator. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.
Not a problem since its already implied, and the Zen tells us "explicit over implicit"; never the less, when abiding by PEPs, there you go (and rational is further explained).
In my own implementation of a similar solution, subclasses are not defined with an additional keyword argument, such as @martineau does here:
class Nfs(FileSystem, path_prefix='nfs'): ...
class LocalDrive(FileSystem, path_prefix=None): ...
When browsing through the PEP:
As a second change, the new
type.__init__
just ignores keyword arguments. Currently, it insists that no keyword arguments are given. This leads to a (wanted) error if one gives keyword arguments to a class declaration if the metaclass does not process them. Metaclass authors that do want to accept keyword arguments must filter them out by overriding__init__
.
Why is this (potentially) problematic? Well there are several questions (notably this) describing the problem surrounding additional keyword arguments in a class definition, use of metaclasses (subsequently the metaclass=
keyword) and the overridden __init_subclass__
. However, that doesn't explain why it works in the currently given solution. The answer: kwargs.pop()
.
If we look at the following:
# code in CPython 3.7
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
_registry = {} # Registered subclasses.
def __init_subclass__(cls, **kwargs):
path_prefix = kwargs.pop('path_prefix', None)
super().__init_subclass__(**kwargs)
cls._registry[path_prefix] = cls # Add class to registry.
...
class Nfs(FileSystem, path_prefix='nfs'): ...
This will still run without issue, but if we remove the kwargs.pop()
:
def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs) # throws TypeError
cls._registry[path_prefix] = cls # Add class to registry.
The error thrown is already known and described in the PEP:
In the new code, it is not
__init__
that complains about keyword arguments, but__init_subclass__
, whose default implementation takes no arguments. In a classical inheritance scheme using the method resolution order, each__init_subclass__
may take out it's keyword arguments until none are left, which is checked by the default implementation of__init_subclass__
.
What is happening is the path_prefix=
keyword is being "popped" off of kwargs
, not just accessed, so then **kwargs
is now empty and passed up the MRO and thus compliant with the default implementation (receiving no keyword arguments).
To avoid this entirely, I propose not relying on kwargs
but instead use that which is already present in the call to __init_subclass__
, namely the cls
reference:
# code in CPython 3.7
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
_registry = {} # Registered subclasses.
def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs)
cls._registry[cls._path_prefix] = cls # Add class to registry.
...
class Nfs(FileSystem):
_path_prefix = 'nfs'
...
Adding the prior keyword as a class attribute also extends the use in later methods if one needs to refer back to the particular prefix used by the subclass (via self._path_prefix
). To my knowledge, you cannot refer back to supplied keywords in the definition (without some complexity) and this seemed trivial and useful.
So to @martineau I apologize for my comments seeming confusing, only so much space to type them and as shown it was more detailed.
In my opinion, using __new__
in such a way is really confusing for other people who might read your code. Also it requires somewhat hackish code to distinguish guessing file system from user input and creating Nfs
and LocalDrive
with their corresponding classes.
Why not make a separate function with this behaviour? It can even be a static method of FileSystem
class:
class FileSystem(object):
# other code ...
@staticmethod
def from_path(path):
if path.upper().startswith('NFS://'):
return Nfs(path)
else:
return LocalDrive(path)
And you call it like this:
data1 = FileSystem.from_path('nfs://192.168.1.18')
data2 = FileSystem.from_path('/var/log')
I don't think using __new__()
to do what you want is improper. In other words, I disagree with the accepted answer to this question which claims factory functions are always the "best way to do it".
If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method. Given the choices available, making the __new__()
method one — since it's static by default — is a perfectly sensible approach.
That said, below is what I think is an improved version of your code. I've added a couple of class methods to assist in automatically finding all the subclasses. These support the most important way in which it's better — which is now adding subclasses doesn't require modifying the __new__()
method. This means it's now easily extensible since it effectively supports what you could call virtual constructors.
A similar implementation could also be used to move the creation of instances out of the __new__()
method into a separate (static) factory method — so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function regardless of what name it's given.
# Works in Python 2 and 3.
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
@classmethod
def _get_all_subclasses(cls):
""" Recursive generator of all class' subclasses. """
for subclass in cls.__subclasses__():
yield subclass
for subclass in subclass._get_all_subclasses():
yield subclass
@classmethod
def _get_prefix(cls, s):
""" Extract any file system prefix at beginning of string s and
return a lowercase version of it or None when there isn't one.
"""
match = cls._PATH_PREFIX_PATTERN.match(s)
return match.group(1).lower() if match else None
def __new__(cls, path):
""" Create instance of appropriate subclass using path prefix. """
path_prefix = cls._get_prefix(path)
for subclass in cls._get_all_subclasses():
if subclass.prefix == path_prefix:
# Using "object" base class method avoids recursion here.
return object.__new__(subclass)
else: # No subclass with matching prefix found (& no default defined)
raise FileSystem.Unknown(
'path "{}" has no known file system prefix'.format(path))
def count_files(self):
raise NotImplementedError
class Nfs(FileSystem):
prefix = 'nfs'
def __init__ (self, path):
pass
def count_files(self):
pass
class LocalDrive(FileSystem):
prefix = None # Default when no file system prefix is found.
def __init__(self, path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path
def count_files(self):
return sum(os.path.isfile(os.path.join(self.path, filename))
for filename in os.listdir(self.path))
if __name__ == '__main__':
data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/') # Change as necessary for testing.
print(type(data1).__name__) # -> Nfs
print(type(data2).__name__) # -> LocalDrive
print(data2.count_files()) # -> <some number>
The code above works in both Python 2 and 3.x. However in Python 3.6 a new class method was added to object
named __init_subclass__() which makes the finding of subclasses simpler by using it to automatically create a "registry" of them instead of potentially having to check every subclass recursively as the _get_all_subclasses()
method is doing in the above.
# Requires Python 3.6+
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
_registry = {} # Registered subclasses.
@classmethod
def __init_subclass__(cls, /, path_prefix, **kwargs):
super().__init_subclass__(**kwargs)
cls._registry[path_prefix] = cls # Add class to registry.
@classmethod
def _get_prefix(cls, s):
""" Extract any file system prefix at beginning of string s and
return a lowercase version of it or None when there isn't one.
"""
match = cls._PATH_PREFIX_PATTERN.match(s)
return match.group(1).lower() if match else None
def __new__(cls, path):
""" Create instance of appropriate subclass. """
path_prefix = cls._get_prefix(path)
subclass = FileSystem._registry.get(path_prefix)
if subclass:
# Using "object" base class method avoids recursion here.
return object.__new__(subclass)
else: # No subclass with matching prefix found (and no default).
raise FileSystem.Unknown(
f'path "{path}" has no known file system prefix')
def count_files(self):
raise NotImplementedError
class Nfs(FileSystem, path_prefix='nfs'):
def __init__ (self, path):
pass
def count_files(self):
pass
class LocalDrive(FileSystem, path_prefix=None): # Default file system.
def __init__(self, path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path
def count_files(self):
return sum(os.path.isfile(os.path.join(self.path, filename))
for filename in os.listdir(self.path))
if __name__ == '__main__':
data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/') # Change as necessary for testing.
print(type(data1).__name__) # -> Nfs
print(type(data2).__name__) # -> LocalDrive
print(data2.count_files()) # -> <some number>
try:
data3 = FileSystem('foobar://42') # Unregistered path prefix.
except FileSystem.Unknown as exc:
print(str(exc), '- raised as expected')
else:
raise RuntimeError(
"Unregistered path prefix should have raised Exception!")