How to use glob() to find files recursively?

天涯浪人 2020-11-21 22:54

This is what I have:

glob(os.path.join('src', '*.c'))

but I want to search the subfolders of src. Something like this would work:

28 Answers
  • 2020-11-21 23:15

    You'll want to use os.walk to collect filenames that match your criteria. For example:

    import os

    cfiles = []
    for root, dirs, files in os.walk('src'):
        for name in files:
            if name.endswith('.c'):
                cfiles.append(os.path.join(root, name))
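
The same os.walk loop can be generalized to any shell-style pattern with the standard fnmatch module. This is a minimal sketch; the helper name find_files and its parameters are illustrative assumptions, not part of the answer above.

```python
import fnmatch
import os


def find_files(top, pattern):
    """Yield paths under `top` whose file name matches the shell-style `pattern`."""
    for root, _dirs, files in os.walk(top):
        # fnmatch.filter keeps only the names that match the pattern
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(root, name)
```

For the question's case, `cfiles = list(find_files('src', '*.c'))` should collect the same paths as the loop above.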
    
  • 2020-11-21 23:17

    Recently I had to recover my pictures with the .jpg extension. I ran photorec and recovered 4579 directories containing 2.2 million files with a wide variety of extensions. With the script below I was able to select the 50133 files with a .jpg extension within minutes:

    #!/usr/bin/env python2.7
    
    import glob
    import shutil
    import os
    
    src_dir = "/home/mustafa/Masaüstü/yedek"
    dst_dir = "/home/mustafa/Genel/media"
    # "*" matches exactly one level of subdirectories below src_dir
    for mediafile in glob.iglob(os.path.join(src_dir, "*", "*.jpg")):
        shutil.copy(mediafile, dst_dir)
    
  • 2020-11-21 23:17

    Based on the other answers, this is my current working implementation, which retrieves nested XML files under a root directory:

    import glob
    import os

    files = []
    for root, dirnames, filenames in os.walk(myDir):  # myDir: the root directory to search
        files.extend(glob.glob(os.path.join(root, "*.xml")))
    

    I'm really having fun with Python :)

  • 2020-11-21 23:18

    Starting with Python 3.4, one can use the glob() method of one of the Path classes in the new pathlib module, which supports ** wildcards. For example:

    from pathlib import Path
    
    for file_path in Path('src').glob('**/*.c'):
        print(file_path) # do whatever you need with these files
    

    Update: Starting with Python 3.5, glob.glob() supports the same ** syntax when called with recursive=True.
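
As a rough sketch of the glob.glob() form next to the pathlib one: the two helpers below (find_c_files and find_c_files_pathlib are hypothetical names chosen for illustration) should return the same files. With recursive=True, "**" matches zero or more directory levels, so files directly inside the top directory are included as well.

```python
import glob
import os
from pathlib import Path


def find_c_files(top):
    """Return every .c file under `top`, at any depth, as a sorted list of strings."""
    # recursive=True is required; without it "**" behaves like a plain "*"
    pattern = os.path.join(top, "**", "*.c")
    return sorted(glob.glob(pattern, recursive=True))


def find_c_files_pathlib(top):
    """Same search using Path.rglob, shorthand for Path.glob('**/*.c')."""
    return sorted(str(p) for p in Path(top).rglob("*.c"))
```

For large trees, glob.iglob(pattern, recursive=True) yields the matches lazily instead of building the whole list first.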

  • 2020-11-21 23:18

    If the files are on a remote file system or inside an archive, you can use an implementation of the fsspec AbstractFileSystem class. For example, to list all the files in a ZIP file:

    from fsspec.implementations.zip import ZipFileSystem
    fs = ZipFileSystem("/tmp/test.zip")
    fs.glob("/**")  # equivalent: fs.find("/")
    

    Or, to list all the files in a publicly available S3 bucket:

    from s3fs import S3FileSystem
    fs_s3 = S3FileSystem(anon=True)
    fs_s3.glob("noaa-goes16/ABI-L1b-RadF/2020/045/**")  # or use fs_s3.find
    

    You can also use it for a local filesystem, which may be useful if your implementation should be filesystem-agnostic:

    from fsspec.implementations.local import LocalFileSystem
    fs = LocalFileSystem()
    fs.glob("/tmp/test/**")
    

    Other implementations include Google Cloud, GitHub, SFTP/SSH, Dropbox, and Azure. For details, see the fsspec API documentation.

  • 2020-11-21 23:22

    Johan and Bruno provide excellent solutions for the minimal requirement as stated. I have just released Formic, which implements Ant FileSet and Globs and can handle this and more complicated scenarios. An implementation of your requirement is:

    import formic
    fileset = formic.FileSet(include="/src/**/*.c")
    for file_name in fileset.qualified_files():
        print(file_name)
    