How to use glob() to find files recursively?

霸气de小男生 提交于 2019-11-25 21:56:55

问题


This is what I have:

glob(os.path.join(\'src\',\'*.c\'))

but I want to search the subfolders of src. Something like this would work:

glob(os.path.join(\'src\',\'*.c\'))
glob(os.path.join(\'src\',\'*\',\'*.c\'))
glob(os.path.join(\'src\',\'*\',\'*\',\'*.c\'))
glob(os.path.join(\'src\',\'*\',\'*\',\'*\',\'*.c\'))

But this is obviously limited and clunky.


回答1:


Python 3.5+

Since you're on a new python, you should use pathlib.Path.rglob from the the pathlib module.

from pathlib import Path

for filename in Path('src').rglob('*.c'):
    print(filename)

If you don't want to use pathlib, just use glob.glob, but don't forget to pass in the recursive keyword parameter.

For cases where matching files beginning with a dot (.); like files in the current directory or hidden files on Unix based system, use the os.walk solution below.

Older Python versions

For older Python versions, use os.walk to recursively walk a directory and fnmatch.filter to match against a simple expression:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))



回答2:


Similar to other solutions, but using fnmatch.fnmatch instead of glob, since os.walk already listed the filenames:

import os, fnmatch


def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename


for filename in find_files('src', '*.c'):
    print 'Found C source:', filename

Also, using a generator alows you to process each file as it is found, instead of finding all the files and then processing them.




回答3:


I've modified the glob module to support ** for recursive globbing, e.g:

>>> import glob2
>>> all_header_files = glob2.glob('src/**/*.c')

https://github.com/miracle2k/python-glob2/

Useful when you want to provide your users with the ability to use the ** syntax, and thus os.walk() alone is not good enough.




回答4:


Starting with Python 3.4, one can use the glob() method of one of the Path classes in the new pathlib module, which supports ** wildcards. For example:

from pathlib import Path

for file_path in Path('src').glob('**/*.c'):
    print(file_path) # do whatever you need with these files

Update: Starting with Python 3.5, the same syntax is also supported by glob.glob().




回答5:


import os
import fnmatch


def recursive_glob(treeroot, pattern):
    results = []
    for base, dirs, files in os.walk(treeroot):
        goodfiles = fnmatch.filter(files, pattern)
        results.extend(os.path.join(base, f) for f in goodfiles)
    return results

fnmatch gives you exactly the same patterns as glob, so this is really an excellent replacement for glob.glob with very close semantics. An iterative version (e.g. a generator), IOW a replacement for glob.iglob, is a trivial adaptation (just yield the intermediate results as you go, instead of extending a single results list to return at the end).




回答6:


You'll want to use os.walk to collect filenames that match your criteria. For example:

import os
cfiles = []
for root, dirs, files in os.walk('src'):
  for file in files:
    if file.endswith('.c'):
      cfiles.append(os.path.join(root, file))



回答7:


Here's a solution with nested list comprehensions, os.walk and simple suffix matching instead of glob:

import os
cfiles = [os.path.join(root, filename)
          for root, dirnames, filenames in os.walk('src')
          for filename in filenames if filename.endswith('.c')]

It can be compressed to a one-liner:

import os;cfiles=[os.path.join(r,f) for r,d,fs in os.walk('src') for f in fs if f.endswith('.c')]

or generalized as a function:

import os

def recursive_glob(rootdir='.', suffix=''):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames if filename.endswith(suffix)]

cfiles = recursive_glob('src', '.c')

If you do need full glob style patterns, you can follow Alex's and Bruno's example and use fnmatch:

import fnmatch
import os

def recursive_glob(rootdir='.', pattern='*'):
    return [os.path.join(looproot, filename)
            for looproot, _, filenames in os.walk(rootdir)
            for filename in filenames
            if fnmatch.fnmatch(filename, pattern)]

cfiles = recursive_glob('src', '*.c')



回答8:


Recently I had to recover my pictures with the extension .jpg. I ran photorec and recovered 4579 directories 2.2 million files within, having tremendous variety of extensions.With the script below I was able to select 50133 files havin .jpg extension within minutes:

#!/usr/binenv python2.7

import glob
import shutil
import os

src_dir = "/home/mustafa/Masaüstü/yedek"
dst_dir = "/home/mustafa/Genel/media"
for mediafile in glob.iglob(os.path.join(src_dir, "*", "*.jpg")): #"*" is for subdirectory
    shutil.copy(mediafile, dst_dir)



回答9:


Johan and Bruno provide excellent solutions on the minimal requirement as stated. I have just released Formic which implements Ant FileSet and Globs which can handle this and more complicated scenarios. An implementation of your requirement is:

import formic
fileset = formic.FileSet(include="/src/**/*.c")
for file_name in fileset.qualified_files():
    print file_name



回答10:


based on other answers this is my current working implementation, which retrieves nested xml files in a root directory:

files = []
for root, dirnames, filenames in os.walk(myDir):
    files.extend(glob.glob(root + "/*.xml"))

I'm really having fun with python :)




回答11:


For python >= 3.5 you can use **, recursive=True :

import glob
for x in glob.glob('path/**/*.c', recursive=True):
    print(x)

Demo


If recursive is true, the pattern ** will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep, only directories and subdirectories match.




回答12:


Consider pathlib.rglob().

This is like calling Path.glob() with "**/" added in front of the given relative pattern:

import pathlib


for p in pathlib.Path("src").rglob("*.c"):
    print(p)

See also @taleinat's related post here and a similar post elsewhere.




回答13:


Another way to do it using just the glob module. Just seed the rglob method with a starting base directory and a pattern to match and it will return a list of matching file names.

import glob
import os

def _getDirs(base):
    return [x for x in glob.iglob(os.path.join( base, '*')) if os.path.isdir(x) ]

def rglob(base, pattern):
    list = []
    list.extend(glob.glob(os.path.join(base,pattern)))
    dirs = _getDirs(base)
    if len(dirs):
        for d in dirs:
            list.extend(rglob(os.path.join(base,d), pattern))
    return list



回答14:


Just made this.. it will print files and directory in hierarchical way

But I didn't used fnmatch or walk

#!/usr/bin/python

import os,glob,sys

def dirlist(path, c = 1):

        for i in glob.glob(os.path.join(path, "*")):
                if os.path.isfile(i):
                        filepath, filename = os.path.split(i)
                        print '----' *c + filename

                elif os.path.isdir(i):
                        dirname = os.path.basename(i)
                        print '----' *c + dirname
                        c+=1
                        dirlist(i,c)
                        c-=1


path = os.path.normpath(sys.argv[1])
print(os.path.basename(path))
dirlist(path)



回答15:


That one uses fnmatch or regular expression:

import fnmatch, os

def filepaths(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            try:
                matched = pattern.match(basename)
            except AttributeError:
                matched = fnmatch.fnmatch(basename, pattern)
            if matched:
                yield os.path.join(root, basename)

# usage
if __name__ == '__main__':
    from pprint import pprint as pp
    import re
    path = r'/Users/hipertracker/app/myapp'
    pp([x for x in filepaths(path, re.compile(r'.*\.py$'))])
    pp([x for x in filepaths(path, '*.py')])



回答16:


In addition to the suggested answers, you can do this with some lazy generation and list comprehension magic:

import os, glob, itertools

results = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.c'))
                                               for root, dirs, files in os.walk('src'))

for f in results: print(f)

Besides fitting in one line and avoiding unnecessary lists in memory, this also has the nice side effect, that you can use it in a way similar to the ** operator, e.g., you could use os.path.join(root, 'some/path/*.c') in order to get all .c files in all sub directories of src that have this structure.




回答17:


Simplified version of Johan Dahlin's answer, without fnmatch.

import os

matches = []
for root, dirnames, filenames in os.walk('src'):
  matches += [os.path.join(root, f) for f in filenames if f[-2:] == '.c']



回答18:


Or with a list comprehension:

 >>> base = r"c:\User\xtofl"
 >>> binfiles = [ os.path.join(base,f) 
            for base, _, files in os.walk(root) 
            for f in files if f.endswith(".jpg") ] 



回答19:


Here is my solution using list comprehension to search for multiple file extensions recursively in a directory and all subdirectories:

import os, glob

def _globrec(path, *exts):
""" Glob recursively a directory and all subdirectories for multiple file extensions 
    Note: Glob is case-insensitive, i. e. for '\*.jpg' you will get files ending
    with .jpg and .JPG

    Parameters
    ----------
    path : str
        A directory name
    exts : tuple
        File extensions to glob for

    Returns
    -------
    files : list
        list of files matching extensions in exts in path and subfolders

    """
    dirs = [a[0] for a in os.walk(path)]
    f_filter = [d+e for d in dirs for e in exts]    
    return [f for files in [glob.iglob(files) for files in f_filter] for f in files]

my_pictures = _globrec(r'C:\Temp', '\*.jpg','\*.bmp','\*.png','\*.gif')
for f in my_pictures:
    print f



回答20:


For python 3.5 and later

import glob

#file_names_array = glob.glob('path/*.c', recursive=True)
#above works for files directly at path/ as guided by NeStack

#updated version
file_names_array = glob.glob('path/**/*.c', recursive=True)

further you might need

for full_path_in_src in  file_names_array:
    print (full_path_in_src ) # be like 'abc/xyz.c'
    #Full system path of this would be like => 'path till src/abc/xyz.c'



回答21:


import sys, os, glob

dir_list = ["c:\\books\\heap"]

while len(dir_list) > 0:
    cur_dir = dir_list[0]
    del dir_list[0]
    list_of_files = glob.glob(cur_dir+'\\*')
    for book in list_of_files:
        if os.path.isfile(book):
            print(book)
        else:
            dir_list.append(book)



回答22:


I modified the top answer in this posting.. and recently created this script which will loop through all files in a given directory (searchdir) and the sub-directories under it... and prints filename, rootdir, modified/creation date, and size.

Hope this helps someone... and they can walk the directory and get fileinfo.

import time
import fnmatch
import os

def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)

    print "%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation, filesize)

searchdir = r'D:\Your\Directory\Root'
matches = []

for root, dirnames, filenames in os.walk(searchdir):
    ##  for filename in fnmatch.filter(filenames, '*.c'):
    for filename in filenames:
        ##      matches.append(os.path.join(root, filename))
        ##print matches
        fileinfo(os.path.join(root, filename))



回答23:


Here is a solution that will match the pattern against the full path and not just the base filename.

It uses fnmatch.translate to convert a glob-style pattern into a regular expression, which is then matched against the full path of each file found while walking the directory.

re.IGNORECASE is optional, but desirable on Windows since the file system itself is not case-sensitive. (I didn't bother compiling the regex because docs indicate it should be cached internally.)

import fnmatch
import os
import re

def findfiles(dir, pattern):
    patternregex = fnmatch.translate(pattern)
    for root, dirs, files in os.walk(dir):
        for basename in files:
            filename = os.path.join(root, basename)
            if re.search(patternregex, filename, re.IGNORECASE):
                yield filename



回答24:


I needed a solution for python 2.x that works fast on large directories.
I endet up with this:

import subprocess
foundfiles= subprocess.check_output("ls src/*.c src/**/*.c", shell=True)
for foundfile in foundfiles.splitlines():
    print foundfile

Note that you might need some exception handling in case ls doesn't find any matching file.



来源:https://stackoverflow.com/questions/2186525/how-to-use-glob-to-find-files-recursively

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!