Python glob but against a list of strings rather than the filesystem

被撕碎了的回忆 2021-02-06 21:24

I want to be able to match a pattern in glob format to a list of strings, rather than to actual files in the filesystem. Is there any way to do this, or convert a glob

9 answers
  •  广开言路
    2021-02-06 21:30

    Here is a glob-to-regex translator that handles backslash-escaped punctuation. It does not stop at path separators, so '*' and '?' also match '/'. I'm posting it here because it matches the title of the question.
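For context, here is a small sketch of the gap this fills. It assumes the point of comparison is the standard library's `fnmatch` module, which the answer does not name. `fnmatch` has no backslash escaping; its documented way to match a metacharacter literally is to wrap it in brackets:

```python
import fnmatch

# fnmatchcase compares two strings; it never touches the filesystem
# and does no platform-dependent case folding.
star_in_brackets = fnmatch.fnmatchcase("a*b", "a[*]b")   # literal '*' via brackets
other_char = fnmatch.fnmatchcase("axb", "a[*]b")         # 'x' is not '*'
backslash_form = fnmatch.fnmatchcase("a*b", r"a\*b")     # '\' is an ordinary character here

print(star_in_brackets, other_char, backslash_form)
```

With `glob_to_re`, the pattern `a\*b` is instead meant to match the literal string `a*b`.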

    To use on a list:

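    import re

    # glob_pattern is the glob string; names is the list of strings to filter.
    rex = glob_to_re(glob_pattern)
    rex = r'(?s:%s)\Z' % rex  # (?s:...) lets '.' match newlines; \Z anchors at the end of the string.
    rex = re.compile(rex)
    matches = [name for name in names if rex.match(name)]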
    

    Here's the code:

    import re as _re
    
    class GlobSyntaxError(SyntaxError):
        pass
    
    def glob_to_re(pattern):
        r"""
        Given pattern, a unicode string, return the equivalent regular expression.
        Any special character * ? [ ! - ] \ can be escaped by preceding it with 
        backslash ('\') in the pattern.  Forward-slashes ('/') and escaped 
        backslashes ('\\') are treated as ordinary characters, not boundaries.
    
        Here is the language glob_to_re understands.
        Earlier alternatives within rules have precedence.  
            pattern = item*
            item    = '*'  |  '?'  |  '[!' set ']'  |  '[' set ']'  |  literal
            set     = element element*
            element = literal '-' literal  |  literal
            literal = '\' char  |  char other than \  [  ] and sometimes -
        glob_to_re does not understand "{a,b...}".
        """
        # (Note: the docstring above is r""" ... """ to preserve backslashes.)
        def expect_char(i, context):
            if i >= len(pattern):
                s = "Unfinished %s: %r, position %d." % (context, pattern, i)
                raise GlobSyntaxError(s)
        
        def literal_to_re(i, context="pattern", bad="[]"):
            if pattern[i] == '\\':
                i += 1
                expect_char(i, "backslashed literal")
            else:
                if pattern[i] in bad:
                    s = "Unexpected %r in %s: %r, position %d." \
                        % (pattern[i], context, pattern, i)
                    raise GlobSyntaxError(s)
            return _re.escape(pattern[i]), i + 1
    
        def set_to_re(i):
            assert pattern[i] == '['
            set_re = "["
            i += 1
            try:
                if pattern[i] == '!':
                    set_re += '^'
                    i += 1
                while True:
                    lit_re, i = literal_to_re(i, "character set", bad="[-]")
                    set_re += lit_re
                    if pattern[i] == '-':
                        set_re += '-'
                        i += 1
                        expect_char(i, "character set range")
                        lit_re, i = literal_to_re(i, "character set range", bad="[-]")
                        set_re += lit_re
                    if pattern[i] == ']':
                        return set_re + ']', i + 1
                    
            except IndexError:
                expect_char(i, "character set")  # Trigger "unfinished" error.
    
        i = 0
        re_pat = ""
        while i < len(pattern):
            if pattern[i] == '*':
                re_pat += ".*"
                i += 1
            elif pattern[i] == '?':
                re_pat += "."
                i += 1
            elif pattern[i] == '[':
                set_re, i = set_to_re(i)
                re_pat += set_re
            else:
                lit_re, i = literal_to_re(i)
                re_pat += lit_re
        return re_pat
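The "use on a list" steps can also be wrapped in a small helper. This is only a sketch: `filter_names` and its `translate` parameter are hypothetical names, not part of the answer, and the default translator is the standard library's `fnmatch.translate` so that the block runs on its own. Pass `glob_to_re` as `translate` to get the backslash-escape behaviour described above.

```python
import fnmatch
import re

def filter_names(names, glob_pattern, translate=fnmatch.translate):
    """Return the entries of names that match glob_pattern in full.

    translate is any function mapping a glob string to a regex string.
    The default, fnmatch.translate, is an assumption made for this sketch.
    """
    # DOTALL lets '.' match newlines; fullmatch requires the whole string
    # to match, so the translator's output needs no explicit end anchor.
    rex = re.compile(translate(glob_pattern), re.DOTALL)
    return [name for name in names if rex.fullmatch(name)]

names = ["a.txt", "b.py", "dir/c.txt"]
print(filter_names(names, "*.txt"))
```

Because `'/'` is treated as an ordinary character, `dir/c.txt` is matched by `*.txt` here, unlike with filesystem globbing.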
    
