问题
I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:
configfiles = glob.glob(\'C:/Users/sam/Desktop/file1/*.txt\')
But this cannot access the subfolders as well. Does anyone know how I can use the same command to access subfolders as well?
回答1:
In Python 3.5 and newer use the new recursive **/
functionality:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)
When recursive
is set, **
followed by a path separator matches 0 or more subdirectories.
In earlier Python versions, glob.glob()
cannot list files in subdirectories recursively.
In that case I'd use os.walk() combined with fnmatch.filter() instead:
import os
import fnmatch
path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
for dirpath, dirnames, files in os.walk(path)
for f in fnmatch.filter(files, '*.txt')]
This'll walk your directories recursively and return all absolute pathnames to matching .txt
files. In this specific case the fnmatch.filter()
may be overkill, you could also use a .endswith()
test:
import os
path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
for dirpath, dirnames, files in os.walk(path)
for f in files if f.endswith('.txt')]
回答2:
To find files in immediate subdirectories:
configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')
For a recursive version that traverse all subdirectories, you could use **
and pass recursive=True
since Python 3.5:
configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)
Both function calls return lists. You could use glob.iglob()
to return paths one by one. Or use pathlib:
from pathlib import Path
path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir
Both methods return iterators (you can get paths one by one).
回答3:
The glob2 package supports wild cards and is reasonably fast
code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)
On my laptop it takes approximately 2 seconds to match >60,000 file paths.
回答4:
You can use Formic with Python 2.6
import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")
Disclosure - I am the author of this package.
回答5:
There's a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):
glob.glob('*.txt') :
matches all files ending in '.txt' in current directoryglob.glob('*/*.txt') :
same as 1glob.glob('**/*.txt') :
matches all files ending in '.txt' in the immediate subdirectories only, but not in the current directoryglob.glob('*.txt',recursive=True) :
same as 1glob.glob('*/*.txt',recursive=True) :
same as 3glob.glob('**/*.txt',recursive=True):
matches all files ending in '.txt' in the current directory and in all subdirectories
So it's best to always specify recursive=True.
回答6:
Here is a adapted version that enables glob.glob
like functionality without using glob2
.
def find_files(directory, pattern='*'):
if not os.path.exists(directory):
raise ValueError("Directory not found {}".format(directory))
matches = []
for root, dirnames, filenames in os.walk(directory):
for filename in filenames:
full_path = os.path.join(root, filename)
if fnmatch.filter([full_path], pattern):
matches.append(os.path.join(root, filename))
return matches
So if you have the following dir structure
tests/files
├── a0
│ ├── a0.txt
│ ├── a0.yaml
│ └── b0
│ ├── b0.yaml
│ └── b00.yaml
└── a1
You can do something like this
files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']
Pretty much fnmatch
pattern match on the whole filename itself, rather than the filename only.
回答7:
configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt")
Doesn't works for all cases, instead use glob2
configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt")
回答8:
If you can install glob2 package...
import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext") # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")
All filenames and folders:
all_ff = glob2.glob("C:\\top_directory\\**\\**")
回答9:
If you're running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the **
pattern, which means “this directory and all subdirectories, recursively”. It returns a generator yielding Path objects for all matching files.
from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")
回答10:
As pointed out by Martijn, glob can only do this through the **
operator introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following will return a lazy evaluation iterator that behaves similarly
import os, glob, itertools
configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))
Note that you can only iterate once over configfiles
in this approach though. If you require a real list of configfiles that can be used in multiple operations you would have to create this explicitly by using list(configfiles)
.
来源:https://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module