Python glob but against a list of strings rather than the filesystem

被撕碎了的回忆 2021-02-06 21:24

I want to be able to match a pattern in glob format to a list of strings, rather than to actual files in the filesystem. Is there any way to do this, or convert a glob

9 answers
  • 2021-02-06 21:43

    An extension of @Veedrac's PurePath.match answer that can be applied to a list of strings:

    # Python 3.4+
    from pathlib import Path
    
    path_list = ["foo/bar.txt", "spam/bar.txt", "foo/eggs.txt"]
    # convert string to pathlib.PosixPath / .WindowsPath, then apply PurePath.match to list
    print([p for p in path_list if Path(p).match("ba*")])  # "*ba*" also works
    # output: ['foo/bar.txt', 'spam/bar.txt']
    
    print([p for p in path_list if Path(p).match("*o/ba*")])
    # output: ['foo/bar.txt']
    

    Path() works here just as well as PurePath(): both pick the flavour for the platform the code runs on, and match() is a purely lexical operation that never touches the filesystem.
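
    To make the matching rules behind the outputs above explicit, here is a minimal sketch. The helper name `matches` is made up for illustration, and PurePosixPath is used so the results do not depend on the platform. A relative pattern is compared against the right-hand end of the path, while an absolute pattern has to match the whole path.

```python
from pathlib import PurePosixPath

def matches(path_string, pattern):
    """Hypothetical helper: test one string against a glob via PurePath.match."""
    return PurePosixPath(path_string).match(pattern)

print(matches("foo/bar.txt", "ba*"))      # True: only the last component is compared
print(matches("foo/bar.txt", "foo/ba*"))  # True: the last two components are compared
print(matches("foo/bar.txt", "fo*"))      # False: "fo*" is tested against "bar.txt"
print(matches("/foo/bar.txt", "/ba*"))    # False: an absolute pattern must match the whole path
print(matches("/bar.txt", "/ba*"))        # True
```

    This is why "ba*" matches both bar.txt entries regardless of their directory.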

  • 2021-02-06 21:45

    On Python 3.4+ you can just use PurePath.match.

    pathlib.PurePath(path_string).match(pattern)
    

    On Python 3.3 or earlier (including 2.x), get pathlib from PyPI.

    Note that PurePath picks its flavour from the platform the code runs on. If you need the same results everywhere (whether you do depends on why you're running this), name PurePosixPath or PureWindowsPath explicitly.
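
    A small sketch of how the two flavours can disagree on the same input. The helper `glob_filter` and the sample strings are assumptions made for illustration, not part of pathlib.

```python
from pathlib import PurePosixPath, PureWindowsPath

def glob_filter(paths, pattern, flavour=PurePosixPath):
    """Hypothetical helper: keep the strings that match `pattern` under `flavour`."""
    return [p for p in paths if flavour(p).match(pattern)]

paths = ["foo/bar.txt", "foo/BAR.TXT", "foo\\eggs.txt"]

# POSIX flavour: case-sensitive, and a backslash is an ordinary character.
posix_hits = glob_filter(paths, "foo/*.txt", PurePosixPath)
# Windows flavour: case-insensitive, and both "/" and "\" separate components.
windows_hits = glob_filter(paths, "foo/*.txt", PureWindowsPath)

print(posix_hits)    # ['foo/bar.txt']
print(windows_hits)  # ['foo/bar.txt', 'foo/BAR.TXT', 'foo\\eggs.txt']
```

    Passing the flavour explicitly keeps the result the same no matter where the script runs.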

  • 2021-02-06 21:48

    I wanted to add support for recursive glob patterns such as things/**/*.py, and to anchor matching to the relative path so that example*.py doesn't match folder/example_stuff.py.

    Here is my approach:

    
    import re
    from fnmatch import translate as fnmatch_translate
    from os import path
    
    def recursive_glob_filter(files, glob):
        # Caveat: the replacements below assume fnmatch.translate() emits a
        # literal `.*` for every `*`. Newer Python versions emit a different
        # regex when a pattern holds several `*`, so check on your version.
        # Convert to regex and add start of line match
        pattern_re = '^' + fnmatch_translate(glob)
    
        # fnmatch does not escape path separators so escape them
        if path.sep in pattern_re and r'\{}'.format(path.sep) not in pattern_re:
            pattern_re = pattern_re.replace('/', r'\/')
    
        # Replace `*` with one that ignores path separators
        sep_respecting_wildcard = r'[^\{}]*'.format(path.sep)
        pattern_re = pattern_re.replace('.*', sep_respecting_wildcard)
    
        # And now for `**` we have `[^\/]*[^\/]*`, so replace that with `.*`
        # to match all patterns in-between
        pattern_re = pattern_re.replace(2 * sep_respecting_wildcard, '.*')
        compiled_re = re.compile(pattern_re)
        # On Python 3, filter() returns a lazy iterator; wrap it in list() if needed
        return filter(compiled_re.search, files)
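
    Because the approach above depends on the exact text that fnmatch.translate() produces, here is an alternative sketch that builds the regex by hand. It is not the code above rewritten, and it makes its own assumptions: paths use "/" as the separator, `**` always stands as a whole path component, `**/` matches zero or more directories (so things/**/*.py also matches things/a.py), and character classes like [abc] are not supported. The names `glob_to_regex` and `filter_by_glob` are made up for illustration.

```python
import re

def glob_to_regex(pattern):
    """Translate a '/'-separated glob into a compiled regex (illustrative sketch)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")            # anything, separators included
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")         # anything within one component
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")          # one character, never a separator
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out), re.DOTALL)

def filter_by_glob(files, pattern):
    """Return the strings in `files` that match `pattern` in full."""
    regex = glob_to_regex(pattern)
    return [f for f in files if regex.fullmatch(f)]

files = [
    "things/a.py",
    "things/sub/b.py",
    "things/sub/deep/c.py",
    "other/d.py",
    "example_1.py",
    "folder/example_stuff.py",
]
print(filter_by_glob(files, "things/**/*.py"))
# ['things/a.py', 'things/sub/b.py', 'things/sub/deep/c.py']
print(filter_by_glob(files, "example*.py"))
# ['example_1.py']
```

    Using fullmatch anchors the pattern at both ends, which is what keeps example*.py from matching folder/example_stuff.py.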
    