Question
So I have a folder of a few thousand pdf files in /path, and I have a list of hundreds of names called names.csv (only one column, it could just as easily be .txt).
I'm trying to select (and ideally, move) the pdfs, where any name from names.csv is found in any filename.
From my research so far, it seems like listdir and regex is one approach to at least get a list of the files I want:
import os, sys
import re

for files in os.listdir('path'):
    with open('names.csv') as names:
        for name in names:
            match = re.search(name, files)

        print match
But currently this is just returning 'None' 'None' etc, all the way down.
I'm probably doing a bunch of things wrong here. And I'm not even near the part where I need to move the files. But I'm just hoping to get over this first hump.
Any advice is much appreciated!
Answer 1:
You say that your names.csv is one column. That must mean that each name is followed by a newline char, which will also be included when matching. You could try this:
match = re.search(name.rstrip(), files)
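A minimal sketch of the difference, assuming Python 3. Here `io.StringIO` stands in for names.csv, and the names `Smith`/`Jones` and the file name `Smith_2016.pdf` are made up for illustration:

```python
import io
import re

# Hypothetical stand-in for names.csv (one name per line).
names_file = io.StringIO("Smith\nJones\n")
filename = "Smith_2016.pdf"  # hypothetical file name

results = []
for line in names_file:
    # The raw line still carries its trailing "\n", so it can't match.
    raw_hit = re.search(line, filename) is not None
    # rstrip() removes the newline (and any other trailing whitespace).
    stripped_hit = re.search(line.rstrip(), filename) is not None
    results.append((line, raw_hit, stripped_hit))
    print(repr(line), raw_hit, stripped_hit)
```

It prints `'Smith\n' False True` and `'Jones\n' False False`: only the stripped name finds the file.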
Hope it helps.
Answer 2:
The problem is that your name variable ends with a newline character \n (every line read from the file does, except possibly the last one). File names don't contain that newline, so the regex never finds a match.
There are also a few other small issues with your code:
- You're opening the names.csv file in each iteration of the loop. It would be more efficient to open the file once, then loop through all files in the directory.
- Regex isn't necessary here, and in fact can cause problems. If, for example, a line in your csv file looked like "(this isn't a valid regex", then your code would throw an exception. This could be fixed by escaping the name first with re.escape, but regex still isn't necessary.
- Your print match is in the wrong place. Since match is overwritten in each iteration of the inner loop, and you're printing its value after that loop, you only get to see its last value.
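A short sketch of the regex pitfall from the list above, assuming Python 3. The name is the example from the answer; the file name containing it is hypothetical:

```python
import re

name = "(this isn't a valid regex"                 # example line from names.csv
filename = "(this isn't a valid regex) notes.pdf"  # hypothetical file name

# 1. Raw name as a pattern: "(" opens a group that is never closed,
#    so re.search raises re.error before it can look for a match.
try:
    re.search(name, filename)
    raw_error = None
except re.error as exc:
    raw_error = exc
print("raw pattern error:", raw_error)

# 2. re.escape makes every character literal, so the search succeeds.
escaped_hit = re.search(re.escape(name), filename) is not None
print("escaped pattern matches:", escaped_hit)

# 3. A plain substring test needs no escaping at all.
substring_hit = name in filename
print("substring test matches:", substring_hit)
```

Both the escaped pattern and the `in` test find the name; the `in` test is simpler and avoids the pitfall entirely.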
The fixed code could look like this:
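import os

# open the file, make a list of all names, close the file
with open('names.csv') as names_file:
    # use .strip() to remove trailing whitespace and line breaks,
    # and skip blank lines (an empty name would match every file)
    names = [line.strip() for line in names_file if line.strip()]

for filename in os.listdir('path'):
    for name in names:
        # no need for re.search, just use the "in" operator
        if name in filename:
            # move the file, keeping its name in the destination folder
            # (os.rename only works within one filesystem; shutil.move does not have that limit)
            os.rename(os.path.join('path', filename),
                      os.path.join('/path/to/somewhere/else', filename))
            break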
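A possible variation on the fixed code, sketched under a few assumptions: Python 3, names.csv encoded as UTF-8, and only files ending in .pdf considered. The function names `load_names`, `find_matches` and `move_matching_pdfs` are hypothetical helpers, not part of the answer. It swaps `os.rename` for `shutil.move` so the destination may be on another filesystem, and separates the matching logic so it can be checked without touching the disk:

```python
import shutil
from pathlib import Path


def load_names(names_path):
    """Read one name per line, dropping blank lines and surrounding whitespace."""
    # Assumption: names.csv is UTF-8 encoded.
    with open(names_path, encoding="utf-8") as names_file:
        # A blank name would match every file ("" in s is always True), so skip it.
        return [line.strip() for line in names_file if line.strip()]


def find_matches(filenames, names):
    """Return the file names that contain at least one of the names."""
    # any() stops at the first matching name, like the break in the answer.
    return [fn for fn in filenames if any(name in fn for name in names)]


def move_matching_pdfs(src_dir, dest_dir, names):
    """Move every PDF in src_dir whose file name contains a name into dest_dir."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    pdf_names = [p.name for p in src.iterdir()
                 if p.is_file() and p.suffix.lower() == ".pdf"]
    moved = find_matches(pdf_names, names)
    for fn in moved:
        # shutil.move also works across filesystems, unlike os.rename.
        shutil.move(str(src / fn), str(dest / fn))
    return moved
```

With the paths from the question this would be called as `move_matching_pdfs('/path', '/path/to/somewhere/else', load_names('names.csv'))`. Matching stays case-sensitive, as in the answer.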
Source: https://stackoverflow.com/questions/37297527/select-files-in-directory-and-move-them-based-on-text-list-of-filenames