Python glob but against a list of strings rather than the filesystem

被撕碎了的回忆 2021-02-06 21:24

I want to be able to match a pattern in glob format to a list of strings, rather than to actual files in the filesystem. Is there any way to do this, or convert a glob

9 answers
  •  闹比i (OP)
    2021-02-06 21:40

    Good artists copy; great artists steal.

    I stole ;)

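    fnmatch.translate converts the glob wildcards ? and * to the regex . and .* respectively, so both also match the path separator. I tweaked a copy of it to emit [^/] and [^/]* instead, so a wildcard never crosses a /.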
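
    To see why the tweak matters, here is a small check that uses only the standard library. The pattern and path come from the test data further down this page; the variable names are just illustrative.

```python
import fnmatch
import re

pattern = 'a/b/*/f.txt'
too_deep = 'a/b/c/d/f.txt'   # two directories where the glob has a single *

# fnmatch.translate turns * into '.*', which also matches '/'.
regex = fnmatch.translate(pattern)
crosses_slash = re.match(regex, too_deep) is not None

# fnmatchcase applies the same translation without OS-specific case folding.
same_result = fnmatch.fnmatchcase(too_deep, pattern)

print(crosses_slash, same_result)
```

    Both values come out true, which is the behaviour a filesystem-style glob should not have.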

    import re
    
    def glob2re(pat):
        """Translate a shell PATTERN to a regular expression.
    
        There is no way to quote meta-characters.
        """
    
        i, n = 0, len(pat)
        res = ''
        while i < n:
            c = pat[i]
            i = i+1
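            if c == '*':
                # fnmatch uses '.*'; '[^/]*' keeps the match inside one path segment
                res = res + '[^/]*'
            elif c == '?':
                # fnmatch uses '.'; '[^/]' matches any single character except '/'
                res = res + '[^/]'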
            elif c == '[':
                j = i
                if j < n and pat[j] == '!':
                    j = j+1
                if j < n and pat[j] == ']':
                    j = j+1
                while j < n and pat[j] != ']':
                    j = j+1
                if j >= n:
                    res = res + '\\['
                else:
                    stuff = pat[i:j].replace('\\','\\\\')
                    i = j+1
                    if stuff[0] == '!':
                        stuff = '^' + stuff[1:]
                    elif stuff[0] == '^':
                        stuff = '\\' + stuff
                    res = '%s[%s]' % (res, stuff)
            else:
                res = res + re.escape(c)
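        # Global inline flags must lead the pattern (an error elsewhere since
        # Python 3.11); the raw string avoids the invalid '\Z' escape warning.
        return '(?ms)' + res + r'\Z'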
    

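    This one works à la fnmatch.filter. Prefer re.match: the pattern is anchored only at the end (\Z), so re.search also passes the tests below but would accept any string that merely ends with a match.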

    def glob_filter(names, pat):
        # Compile once instead of translating the pattern for every name.
        regex = re.compile(glob2re(pat))
        return (name for name in names if regex.match(name))
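
    re.match and re.search agree on the test strings on this page, but they can differ on other input because the pattern is anchored only at the end. A minimal check, using a hand-written regex of the shape glob2re aims to produce for 'a/b/*/f.txt' (the literal regex and the sample strings are assumptions for illustration, not output copied from the function):

```python
import re

# Hand-written stand-in for the translation of 'a/b/*/f.txt':
# [^/]* for the wildcard, \Z to anchor the end, no anchor at the start.
regex = re.compile(r'a/b/[^/]*/f\.txt\Z')

exact = 'a/b/c/f.txt'
prefixed = 'x/a/b/c/f.txt'   # extra leading component, should not match the glob

match_results = [bool(regex.match(s)) for s in (exact, prefixed)]
search_results = [bool(regex.search(s)) for s in (exact, prefixed)]

print(match_results)   # [True, False]
print(search_results)  # [True, True]
```

    re.match starts at the beginning of the string, which supplies the missing start anchor; re.search is free to skip the leading 'x/'.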
    

    The glob patterns and strings that appear elsewhere on this page all pass this test:

    pat_dict = {
        'a/b/*/f.txt': ['a/b/c/f.txt', 'a/b/q/f.txt', 'a/b/c/d/f.txt', 'a/b/c/d/e/f.txt'],
        '/foo/bar/*': ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar'],
        '/*/bar/b*': ['/foo/bar/baz', '/foo/bar/bar'],
        '/*/[be]*/b*': ['/foo/bar/baz', '/foo/bar/bar'],
        '/foo*/bar': ['/foolicious/spamfantastic/bar', '/foolicious/bar'],
    }
    for pat, names in pat_dict.items():
        print('pattern :\t{}\nstrings :\t{}'.format(pat, names))
        print('matched :\t{}\n'.format(list(glob_filter(names, pat))))
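
    The loop above only prints, so the expected results can also be written down as assertions. To run on its own, the sketch below uses a simplified, hypothetical helper simple_glob2re that treats only * and ? as special (no [...] classes), which is why the '/*/[be]*/b*' pattern is left out. It is not the answer's glob2re, just the same idea in miniature.

```python
import re

def simple_glob2re(pat):
    """Hypothetical, reduced translator: only * and ? are special."""
    parts = []
    for c in pat:
        if c == '*':
            parts.append('[^/]*')   # any run of characters within one segment
        elif c == '?':
            parts.append('[^/]')    # exactly one character, never '/'
        else:
            parts.append(re.escape(c))
    return re.compile(''.join(parts) + r'\Z')

def simple_glob_filter(names, pat):
    regex = simple_glob2re(pat)
    return [name for name in names if regex.match(name)]

# (pattern, candidate strings, strings expected to match)
cases = [
    ('a/b/*/f.txt',
     ['a/b/c/f.txt', 'a/b/q/f.txt', 'a/b/c/d/f.txt', 'a/b/c/d/e/f.txt'],
     ['a/b/c/f.txt', 'a/b/q/f.txt']),
    ('/foo/bar/*',
     ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar'],
     ['/foo/bar/baz', '/foo/bar/bar']),
    ('/*/bar/b*',
     ['/foo/bar/baz', '/foo/bar/bar'],
     ['/foo/bar/baz', '/foo/bar/bar']),
    ('/foo*/bar',
     ['/foolicious/spamfantastic/bar', '/foolicious/bar'],
     ['/foolicious/bar']),
]

for pat, names, expected in cases:
    assert simple_glob_filter(names, pat) == expected, pat
```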
    
