How to use glob() to find files recursively?

天涯浪人 2020-11-21 22:54

This is what I have:

glob(os.path.join('src','*.c'))

but I want to search the subfolders of src. Something like this would work:

28 Answers
  • Here is a solution that will match the pattern against the full path and not just the base filename.

    It uses fnmatch.translate to convert a glob-style pattern into a regular expression, which is then matched against the full path of each file found while walking the directory.

    re.IGNORECASE is optional, but desirable on Windows since the file system itself is not case-sensitive. (I didn't bother compiling the regex because docs indicate it should be cached internally.)

    import fnmatch
    import os
    import re
    
    def findfiles(dir, pattern):
        patternregex = fnmatch.translate(pattern)
        for root, dirs, files in os.walk(dir):
            for basename in files:
                filename = os.path.join(root, basename)
                if re.search(patternregex, filename, re.IGNORECASE):
                    yield filename
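
To see what full-path matching buys you, here is a small sketch that builds a throwaway tree with tempfile and runs the same kind of generator with two patterns. The file names are made up for illustration, and compiling the regex up front is my own choice rather than something the answer requires.

```python
import fnmatch
import os
import re
import tempfile


def findfiles(top, pattern):
    """Yield paths under `top` whose full path matches the glob-style pattern."""
    regex = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    for root, _dirs, files in os.walk(top):
        for basename in files:
            path = os.path.join(root, basename)
            if regex.search(path):
                yield path


# Hypothetical layout, created only for this demo.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "lib", "util"))
    for rel in ("main.c",
                os.path.join("lib", "io.c"),
                os.path.join("lib", "util", "str.c"),
                os.path.join("lib", "util", "notes.txt")):
        open(os.path.join(top, rel), "w").close()

    # A bare "*.c" matches every .c file, because "*" also spans separators.
    all_c = sorted(os.path.relpath(p, top) for p in findfiles(top, "*.c"))

    # A pattern with directory parts only matches files inside "util".
    util_pattern = os.path.join("*", "util", "*.c")
    util_c = sorted(os.path.relpath(p, top) for p in findfiles(top, util_pattern))

print(all_c)
print(util_c)
```

Note that in fnmatch a `*` is not stopped by path separators, so the second pattern matches a `util` directory at any depth.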
    
  • 2020-11-21 23:28

    This one accepts either an fnmatch-style pattern string or a compiled regular expression:

    import fnmatch, os
    
    def filepaths(directory, pattern):
        for root, dirs, files in os.walk(directory):
            for basename in files:
                try:
                    matched = pattern.match(basename)
                except AttributeError:
                    matched = fnmatch.fnmatch(basename, pattern)
                if matched:
                    yield os.path.join(root, basename)
    
    # usage
    if __name__ == '__main__':
        from pprint import pprint as pp
        import re
        path = r'/Users/hipertracker/app/myapp'
        pp([x for x in filepaths(path, re.compile(r'.*\.py$'))])
        pp([x for x in filepaths(path, '*.py')])
    
  • 2020-11-21 23:29

    In case this interests anyone, I've profiled the top three proposed methods. The globbed folder holds about 500K files in total, of which 2K match the desired pattern.

    Here's the (very basic) code:

    import glob
    import json
    import fnmatch
    import os
    from pathlib import Path
    from time import time
    
    
    def find_files_iglob():
        return glob.iglob("./data/**/data.json", recursive=True)
    
    
    def find_files_oswalk():
        for root, dirnames, filenames in os.walk('data'):
            for filename in fnmatch.filter(filenames, 'data.json'):
                yield os.path.join(root, filename)
    
    def find_files_rglob():
        return Path('data').rglob('data.json')
    
    t0 = time()
    for f in find_files_oswalk(): pass    
    t1 = time()
    for f in find_files_rglob(): pass
    t2 = time()
    for f in find_files_iglob(): pass 
    t3 = time()
    print(t1-t0, t2-t1, t3-t2)
    

    And the results I got were:
    os.walk: ~3.6 s
    rglob: ~14.5 s
    iglob: ~16.9 s

    Platform: Ubuntu 16.04, x86_64 (Core i7).
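
If you want to repeat this kind of measurement, a slightly more careful harness might look like the sketch below. It uses time.perf_counter, exhausts each iterator several times, and keeps the best run, so that whichever method runs first is less likely to be penalized by a cold file-system cache. The helper names (best_time, walk_matches) and the tiny temporary tree are my own assumptions for illustration. The tree is far too small to reproduce the numbers above.

```python
import fnmatch
import os
import tempfile
from pathlib import Path
from time import perf_counter


def best_time(make_iter, repeats=3):
    """Exhaust make_iter() `repeats` times; return (best seconds, item count)."""
    best, count = float("inf"), 0
    for _ in range(repeats):
        start = perf_counter()
        count = sum(1 for _item in make_iter())
        best = min(best, perf_counter() - start)
    return best, count


def walk_matches(top, name):
    """os.walk plus fnmatch.filter, as in the benchmark above."""
    for root, _dirs, files in os.walk(top):
        for filename in fnmatch.filter(files, name):
            yield os.path.join(root, filename)


# Hypothetical miniature data set: five folders, one data.json in each.
with tempfile.TemporaryDirectory() as top:
    for i in range(5):
        sub = Path(top, f"run{i}")
        sub.mkdir()
        (sub / "data.json").touch()
        (sub / "other.txt").touch()

    walk_s, walk_n = best_time(lambda: walk_matches(top, "data.json"))
    rglob_s, rglob_n = best_time(lambda: Path(top).rglob("data.json"))

print(f"os.walk: {walk_s:.6f}s ({walk_n} files)")
print(f"rglob:   {rglob_s:.6f}s ({rglob_n} files)")
```

Checking that every method returns the same number of files is a cheap sanity test before comparing the timings.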

  • 2020-11-21 23:30

    Here's a solution with nested list comprehensions, os.walk and simple suffix matching instead of glob:

    import os
    cfiles = [os.path.join(root, filename)
              for root, dirnames, filenames in os.walk('src')
              for filename in filenames if filename.endswith('.c')]
    

    It can be compressed to a one-liner:

    import os;cfiles=[os.path.join(r,f) for r,d,fs in os.walk('src') for f in fs if f.endswith('.c')]
    

    or generalized as a function:

    import os
    
    def recursive_glob(rootdir='.', suffix=''):
        return [os.path.join(looproot, filename)
                for looproot, _, filenames in os.walk(rootdir)
                for filename in filenames if filename.endswith(suffix)]
    
    cfiles = recursive_glob('src', '.c')
    

    If you do need full glob style patterns, you can follow Alex's and Bruno's example and use fnmatch:

    import fnmatch
    import os
    
    def recursive_glob(rootdir='.', pattern='*'):
        return [os.path.join(looproot, filename)
                for looproot, _, filenames in os.walk(rootdir)
                for filename in filenames
                if fnmatch.fnmatch(filename, pattern)]
    
    cfiles = recursive_glob('src', '*.c')
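
If all you need is a handful of extensions, the suffix version stretches a little further, because str.endswith also accepts a tuple. A minimal sketch, where the name recursive_suffix_glob and the demo files are hypothetical:

```python
import os
import tempfile


def recursive_suffix_glob(rootdir=".", suffixes=(".c", ".h")):
    """List files under rootdir whose names end with any of the suffixes."""
    if isinstance(suffixes, str):
        suffixes = (suffixes,)      # allow a single suffix string
    suffixes = tuple(suffixes)      # str.endswith takes a tuple, not a list
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames
            if filename.endswith(suffixes)]


# Throwaway tree with made-up file names.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "inc"))
    for rel in ("main.c", os.path.join("inc", "api.h"), "README.txt"):
        open(os.path.join(top, rel), "w").close()

    sources = sorted(os.path.relpath(p, top) for p in recursive_suffix_glob(top))
    only_c = sorted(os.path.relpath(p, top) for p in recursive_suffix_glob(top, ".c"))

print(sources)
print(only_c)
```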
    
  • 2020-11-21 23:31

    I've modified the glob module to support ** for recursive globbing, e.g.:

    >>> import glob2
    >>> all_c_files = glob2.glob('src/**/*.c')
    

    https://github.com/miracle2k/python-glob2/

    Useful when you want to provide your users with the ability to use the ** syntax, and thus os.walk() alone is not good enough.
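
For what it's worth, since Python 3.5 the standard glob module understands ** as well, as long as you pass recursive=True, and pathlib offers Path.rglob for the same job. Here is a small sketch comparing the two on a temporary tree. The directory layout is invented for the demo.

```python
import glob
import os
import tempfile
from pathlib import Path

# Hypothetical source tree, created only for this demo.
with tempfile.TemporaryDirectory() as top:
    src = Path(top, "src")
    (src / "core" / "net").mkdir(parents=True)
    for rel in ("main.c", "core/io.c", "core/net/tcp.c", "core/io.h"):
        (src / rel).touch()

    # "**" matches zero or more directories, so src/main.c is included too.
    pattern = os.path.join(top, "src", "**", "*.c")
    via_glob = sorted(os.path.relpath(p, src)
                      for p in glob.glob(pattern, recursive=True))

    # Path.rglob("*.c") behaves like glob("**/*.c") rooted at src.
    via_rglob = sorted(str(p.relative_to(src)) for p in src.rglob("*.c"))

print(via_glob)
print(via_rglob)
```

Without recursive=True, glob treats ** like a plain *, which is an easy mistake to miss.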

  • 2020-11-21 23:31

    Simplified version of Johan Dahlin's answer, without fnmatch.

    import os
    
    matches = []
    for root, dirnames, filenames in os.walk('src'):
        matches += [os.path.join(root, f) for f in filenames if f.endswith('.c')]
    