Python, how to implement something like .gitignore behavior

Submitted by 不羁的心 on 2019-11-29 06:56:31

You're on the right track: If you want to use fnmatch-style patterns, you should use fnmatch.filter with them.

But there are three problems that make this not quite trivial.

First, you want to apply multiple filters. How do you do that? Call filter multiple times:

for ignore in ignore_files:
    filenames = fnmatch.filter(filenames, ignore)
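To see why this isn't the whole story, here is a small sketch with made-up names and patterns (both hypothetical, for illustration only). Chaining fnmatch.filter keeps only the names that match every pattern, which is the opposite of what an ignore list needs.

```python
import fnmatch

# Hypothetical names and ignore patterns, for illustration only.
names = ["a.jpg", "b.txt", "c.jpg.bak", "d.py"]
ignore_files = ["*.jpg*", "c.*"]

kept = names
for ignore in ignore_files:
    # Each pass keeps only the names that DO match the current pattern.
    kept = fnmatch.filter(kept, ignore)

print(kept)
```

Only "c.jpg.bak" survives, because it is the one name matched by both patterns.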

Second, you actually want to do the reverse of filter: return the subset of names that don't match. As the documentation explains:

It is the same as [n for n in names if fnmatch(n, pattern)], but implemented more efficiently.

So, to do the opposite, you just throw in a not:

for ignore in ignore_files:
    filenames = [n for n in filenames if not fnmatch.fnmatch(n, ignore)]
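Wrapped up as a function, the corrected loop might look like this. `exclude` is a hypothetical name and the sample data is made up; note that it calls `fnmatch.fnmatch`, since the module itself is imported as `fnmatch`.

```python
import fnmatch

def exclude(names, patterns):
    """Return the names that match none of the patterns (hypothetical helper)."""
    for pattern in patterns:
        # Keep only the names that do NOT match this pattern.
        names = [n for n in names if not fnmatch.fnmatch(n, pattern)]
    return names

result = exclude(["a.jpg", "b.txt", "c.jpg.bak", "d.py"], ["*.jpg*", "c.*"])
print(result)
```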

Finally, you're attempting to filter on partial pathnames, not just filenames, but you're not doing the join until after the filtering. So switch the order:

filenames = [os.path.join(root, filename) for filename in filenames]
for ignore in ignore_files:
    filenames = [n for n in filenames if not fnmatch.fnmatch(n, ignore)]
matches.extend(filenames)
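Assuming the surrounding code is an os.walk loop (which is where root comes from), the join-then-filter order could be packaged roughly as below. `find_files` is a hypothetical wrapper, the demo runs in a throwaway temporary directory, and the "*/build/*" pattern is only an example of matching a directory component in the joined path.

```python
import fnmatch
import os
import tempfile

def find_files(top, ignore_patterns):
    """Walk `top` and return file paths that match none of `ignore_patterns`."""
    matches = []
    for root, _dirnames, filenames in os.walk(top):
        # Join first, so the patterns see partial pathnames, not bare names.
        paths = [os.path.join(root, name) for name in filenames]
        for ignore in ignore_patterns:
            paths = [p for p in paths if not fnmatch.fnmatch(p, ignore)]
        matches.extend(paths)
    return matches

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "build"))
    for rel in ["a.py", "a.pyc", os.path.join("build", "b.py")]:
        open(os.path.join(top, rel), "w").close()
    found = sorted(os.path.relpath(p, top)
                   for p in find_files(top, ["*.pyc", "*/build/*"]))

print(found)
```

Because `*` in fnmatch also matches path separators, "*/build/*" catches anything under a build directory at any depth.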

There are a few ways you could improve this.

You may want to use a generator expression instead of a list comprehension (parentheses instead of square brackets), so that with huge lists of filenames you get a lazy pipeline instead of wasting time and space building one huge list after another.
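One caution, shown as a sketch rather than the author's code: if you simply swap the brackets for parentheses inside the `for ignore in ignore_files` loop, each generator looks up `ignore` only when it is finally consumed, by which time the loop has finished and `ignore` holds the last pattern. Binding the pattern when each stage is created (here through a hypothetical helper, `drop_matching`) keeps the pipeline both lazy and correct. The names and patterns are made up.

```python
import fnmatch

def drop_matching(names, pattern):
    """Lazily yield the names that don't match `pattern` (hypothetical helper)."""
    # `pattern` is a parameter, so each generator keeps its own value.
    return (n for n in names if not fnmatch.fnmatch(n, pattern))

names = ["a.jpg", "b.txt", "c.log", "d.py"]
ignore_files = ["*.jpg", "*.log"]

# Buggy: both generators read `ignore` late, so both use "*.log".
buggy = iter(names)
for ignore in ignore_files:
    buggy = (n for n in buggy if not fnmatch.fnmatch(n, ignore))

# Lazy and correct: each stage is bound to its own pattern.
lazy = iter(names)
for ignore in ignore_files:
    lazy = drop_matching(lazy, ignore)

buggy_result = list(buggy)  # "*.jpg" is never applied
lazy_result = list(lazy)
print(buggy_result, lazy_result)
```

The inverted form with `any()` avoids the problem too, because it uses a single generator whose inner loop reads each pattern directly.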

Also, it may or may not be easier to understand if you invert the order of the loops, like this:

filenames = (n for n in filenames
             if not any(fnmatch.fnmatch(n, ignore) for ignore in ignore_files))

Finally, if you're worried about performance, you can use fnmatch.translate to turn each pattern into an equivalent regexp, merge them into one big regexp, compile it, and use that instead of a loop around fnmatch. This can get tricky if your patterns are allowed to be more complicated than just *.jpg, and I wouldn't recommend it unless you have actually identified a performance bottleneck here. But if you do need it, I've seen at least one question on Stack Overflow where someone put a lot of effort into hammering out all the edge cases, so search for that instead of trying to write it yourself.
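A rough sketch of the merged-regexp idea, assuming Python 3 (where fnmatch.translate returns a self-contained pattern ending in `\Z`). It deliberately ignores the edge cases mentioned above, and it is case-sensitive like fnmatch.fnmatchcase, because it skips the os.path.normcase step that fnmatch.fnmatch applies. `build_ignore_regex` is a hypothetical name.

```python
import fnmatch
import re

def build_ignore_regex(patterns):
    """Compile all glob patterns into one alternation (hypothetical helper)."""
    if not patterns:
        # An empty alternation would match everything; never match instead.
        return re.compile(r"(?!)")
    # translate() anchors each pattern at the end; re.match anchors the start.
    return re.compile("|".join(f"(?:{fnmatch.translate(p)})" for p in patterns))

ignore_re = build_ignore_regex(["*.jpg", "*.log"])
names = ["a.jpg", "b.txt", "c.log", "d.py"]
kept = [n for n in names if not ignore_re.match(n)]
print(kept)
```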

matches.extend([fn for fn in filenames if fn not in ignore_files])

That should do the trick for plain filenames. For ignore patterns, something like:

import fnmatch

def reject(filename, patterns):
    """Return True if filename matches any of the ignore patterns."""
    if not patterns:
        return False
    # Test the first pattern; recurse on the rest only if it didn't match.
    return fnmatch.fnmatch(filename, patterns[0]) or reject(filename, patterns[1:])

matches.extend([os.path.join(root, fn) for fn in filenames if not reject(fn, ignore_files)])

While building the list from the filenames returned by os.walk, this checks that none of the filters match. The filters are tried in order until one matches or none are left, so it should be quite quick.
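The same short-circuiting check can also be written without recursion. This is an alternative sketch, not the answer's own code: `any()` stops at the first pattern that matches, just like the chained `or`, but it doesn't copy the pattern list on every call (as the `[1:]` slice does) and isn't limited by Python's recursion depth when the ignore list is long. `is_ignored`, `root` and the sample data are hypothetical.

```python
import fnmatch
import os

def is_ignored(filename, patterns):
    """True if filename matches any pattern; stops at the first match."""
    return any(fnmatch.fnmatch(filename, p) for p in patterns)

root = "photos"  # hypothetical directory name, for illustration
filenames = ["a.jpg", "b.txt", "thumbs.db"]
ignore_files = ["*.txt", "thumbs.db"]

matches = [os.path.join(root, fn) for fn in filenames
           if not is_ignored(fn, ignore_files)]
print(matches)
```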

You could also try something like:

filenames = set(filenames)  # convert to a set
for pattern in ignore_files:
    filenames = filenames - set(fnmatch.filter(filenames, pattern))  # remove the matches
matches.extend([os.path.join(root, fn) for fn in filenames])  # add to matches
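Here is a self-contained version of the set-based approach with made-up data. Keep in mind that a set doesn't preserve the order in which os.walk returned the names, so this sketch sorts the result; `root` and the file names are hypothetical.

```python
import fnmatch
import os

root = "src"  # hypothetical directory, for illustration
filenames = ["main.py", "main.pyc", "notes.txt", "util.py"]
ignore_files = ["*.pyc", "*.txt"]

remaining = set(filenames)
for pattern in ignore_files:
    # fnmatch.filter accepts any iterable, including a set.
    remaining -= set(fnmatch.filter(remaining, pattern))

matches = [os.path.join(root, fn) for fn in sorted(remaining)]
print(matches)
```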