I want to be able to match a pattern in glob format against a list of strings, rather than against actual files in the filesystem. Is there any way to do this, or to convert a glob pattern into a regular expression?
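For reference, the standard-library fnmatch module already works on plain strings: fnmatch.filter() filters a list by a shell-style pattern, and fnmatch.translate() returns the equivalent regular expression. A minimal sketch (the names list is made up for illustration); note that fnmatch is not path-aware, so its `*` also matches `/`:

```python
import fnmatch
import re

names = ["foo/bar.txt", "spam/bar.txt", "foo/eggs.txt", "notes.md"]

# fnmatch.filter applies the glob to plain strings; nothing is read from disk.
txt_files = fnmatch.filter(names, "*.txt")
print(txt_files)

# fnmatch.translate turns the glob into a regex string you can compile and reuse.
regex = re.compile(fnmatch.translate("foo/*.txt"))
print([n for n in names if regex.match(n)])

# Caveat: "*" is not separator-aware, so it happily crosses "/".
print(fnmatch.fnmatch("foo/bar.txt", "*bar.txt"))
```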
An extension of @Veedrac's PurePath.match answer that can be applied to a list of strings:
```python
# Python 3.4+
from pathlib import Path

path_list = ["foo/bar.txt", "spam/bar.txt", "foo/eggs.txt"]

# convert each string to a pathlib.PosixPath / WindowsPath, then apply PurePath.match
print([p for p in path_list if Path(p).match("ba*")])  # "*ba*" also works
# output: ['foo/bar.txt', 'spam/bar.txt']

print([p for p in path_list if Path(p).match("*o/ba*")])
# output: ['foo/bar.txt']
```
It is fine to use pathlib.Path() instead of pathlib.PurePath() here: match() is defined on PurePath and inherited by Path, and it never touches the underlying filesystem, so the paths don't need to exist.
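A short sketch checking that PurePath and Path give the same result on these made-up strings, plus the right-anchored behavior that explains why "ba*" matches: a relative pattern is compared with the trailing path components only.

```python
from pathlib import Path, PurePath

path_list = ["foo/bar.txt", "spam/bar.txt", "foo/eggs.txt"]

# match() is inherited from PurePath, so both classes agree; none of these files exist.
pure = [p for p in path_list if PurePath(p).match("ba*")]
concrete = [p for p in path_list if Path(p).match("ba*")]
print(pure == concrete)

# One pattern part is compared with the last path component only (the file name),
# so "foo*" matches nothing here, while "foo/*" checks the last two components.
print([p for p in path_list if PurePath(p).match("foo*")])
print([p for p in path_list if PurePath(p).match("foo/*")])
```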
On Python 3.4+ you can just use PurePath.match:
```python
pathlib.PurePath(path_string).match(pattern)
```
On Python 3.3 or earlier (including 2.x), get pathlib from PyPI.
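One detail worth knowing when using PurePath.match on strings: relative patterns only need to match the trailing components, while absolute patterns must match the whole path. A small sketch (the path /a/b/c.py is just an example):

```python
from pathlib import PurePosixPath

p = PurePosixPath("/a/b/c.py")

# Relative patterns are matched from the right.
print(p.match("*.py"))       # True
print(p.match("b/*.py"))     # True
print(p.match("a/*.py"))     # False: "a" is compared against "b"

# Absolute patterns must cover every component of the path.
print(p.match("/a/*.py"))    # False: the pattern has fewer components
print(p.match("/a/b/*.py"))  # True
```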
Note that to get platform-independent results (which may or may not matter, depending on why you're running this), you'd want to use PurePosixPath or PureWindowsPath explicitly.
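To illustrate why the flavor matters, a minimal sketch on made-up strings: the Windows flavor matches case-insensitively and treats backslash as a separator, while the POSIX flavor does neither.

```python
from pathlib import PurePosixPath, PureWindowsPath

name = "FOO/BAR.TXT"

# Case sensitivity differs between the two flavors.
print(PureWindowsPath(name).match("*.txt"))  # True
print(PurePosixPath(name).match("*.txt"))    # False

# Backslash is a separator only for the Windows flavor; under POSIX
# r"foo\bar.txt" is a single component, so "foo/*.txt" cannot match.
print(PureWindowsPath(r"foo\bar.txt").match("foo/*.txt"))  # True
print(PurePosixPath(r"foo\bar.txt").match("foo/*.txt"))    # False
```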
I wanted to add support for recursive glob patterns, i.e. things/**/*.py, and to have relative path matching, so that example*.py doesn't match folder/example_stuff.py.
Here is my approach:
```python
from fnmatch import translate as fnmatch_translate
from os import path
import re


def recursive_glob_filter(files, glob):
    # Convert to regex and add start of line match
    pattern_re = '^' + fnmatch_translate(glob)

    # fnmatch does not escape path separators so escape them
    if path.sep in pattern_re and r'\{}'.format(path.sep) not in pattern_re:
        pattern_re = pattern_re.replace('/', r'\/')

    # Replace `*` with one that ignores path separators
    sep_respecting_wildcard = r'[^\{}]*'.format(path.sep)
    pattern_re = pattern_re.replace('.*', sep_respecting_wildcard)

    # And now for `**` we have `[^\/]*[^\/]*`, so replace that with `.*`
    # to match all patterns in-between.
    # Caveat: this relies on the exact output of fnmatch.translate(). Since
    # Python 3.9 it collapses consecutive `*` into one, so `**` is no longer
    # distinguished from `*` there.
    pattern_re = pattern_re.replace(2 * sep_respecting_wildcard, '.*')

    compiled_re = re.compile(pattern_re)
    return filter(compiled_re.search, files)  # lazy iterator in Python 3
```
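Because the approach above depends on the internal shape of fnmatch.translate() output, a version-independent alternative is to translate the glob by hand. The sketch below is an assumption-laden illustration, not a library API: glob_to_regex and glob_filter are hypothetical names, only `**/`, `**`, `*` and `?` are supported (no `[...]` classes), and "/" is assumed to be the separator.

```python
import re


def glob_to_regex(glob):
    """Hypothetical helper: translate a small glob subset into a compiled regex.

    "**/" -> zero or more whole directories
    "**"  -> anything, including "/"
    "*"   -> anything except "/"
    "?"   -> exactly one character except "/"
    Everything else is matched literally.
    """
    parts = []
    i, n = 0, len(glob)
    while i < n:
        if glob.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more directory levels
            i += 3
        elif glob.startswith("**", i):
            parts.append(".*")
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif glob[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def glob_filter(files, glob):
    """Hypothetical helper: keep the strings that match the whole glob."""
    regex = glob_to_regex(glob)
    # fullmatch anchors both ends, so "example*.py" cannot match "folder/example_stuff.py".
    return [f for f in files if regex.fullmatch(f)]


files = [
    "things/a.py",
    "things/sub/b.py",
    "things/sub/deeper/c.py",
    "things/readme.md",
    "example_one.py",
    "folder/example_stuff.py",
]
print(glob_filter(files, "things/**/*.py"))
# ['things/a.py', 'things/sub/b.py', 'things/sub/deeper/c.py']
print(glob_filter(files, "example*.py"))
# ['example_one.py']
```

One design choice differs from the regex built above: here `**/` may match zero directories, so things/**/*.py also matches things/a.py, which follows the recursive glob convention; drop the trailing `?` in `(?:.*/)?` if you want to require at least one directory level.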