Question
I'm learning Python and would like to search for a keyword in multiple files recursively.
I have an example function which should find files with the *.doc extension in a directory.
Then, the function should open each file with that extension and read it.
If the keyword is found while reading the file, the function should print the file's path.
Otherwise, if the keyword is not found, Python should continue to the next file.
To do that, I have defined a function which takes two arguments:
import os

def find_word(extension, word):
    # define the path for os.walk
    for dname, dirs, files in os.walk('/rootFolder'):
        # search for each file name in files
        for fname in files:
            # define the path of each file
            fpath = os.path.join(dname, fname)
            # open each file and read it
            with open(fpath) as f:
                data = f.read()
            # if data contains the word
            if word in data:
                # print the file path of that file
                print(fpath)
            else:
                continue
Could you give me a hand to fix this code?
Thanks,
Answer 1:
import os

def find_word(extension, word):
    for root, dirs, files in os.walk('/DOC'):
        # filter files for the given extension
        files = [fi for fi in files if fi.endswith(".{ext}".format(ext=extension))]
        for filename in files:
            path = os.path.join(root, filename)
            # open each file and read it
            with open(path) as f:
                # split() creates a list of words and set() reduces it
                # to the unique words
                words = set(f.read().split())
            if word in words:
                print(path)
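Since the question title asks for the file paths to be returned, a variant that collects the matches instead of printing them might look like the sketch below. The name find_files_with_word and the root parameter are illustrative and not part of the answer above. It also assumes a substring test rather than whole-word matching, and that the files are plain text.

```python
import os


def find_files_with_word(root, extension, word):
    """Return sorted paths under root that end in .extension and contain word."""
    suffix = "." + extension.lstrip(".")
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            if not filename.endswith(suffix):
                continue
            path = os.path.join(dirpath, filename)
            # errors="ignore" drops bytes that are not valid UTF-8 (assumption:
            # the files are plain text; this does not make binary .doc readable).
            with open(path, encoding="utf-8", errors="ignore") as f:
                if word in f.read():
                    matches.append(path)
    return sorted(matches)
```

Returning a list keeps the search separate from the output, so the caller can print the paths, count them, or pass them on.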
Answer 2:
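.doc files are binary Word documents, not plain text, i.e. they won't open meaningfully with a simple text editor or Python's built-in open(). In this case, you can use other Python modules such as python-docx (note that python-docx reads the newer .docx format, not legacy .doc files).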
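For the newer .docx format that python-docx targets, the file is a ZIP archive whose main text lives in word/document.xml. As a rough illustration that uses only the standard library (this is not python-docx's API, and the name docx_to_text is hypothetical), the text could be pulled out as sketched below. It is simplified and ignores headers, footnotes, and similar parts; it does not work on legacy .doc files.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph (simplified)."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Text runs are stored in <w:t> elements inside each paragraph.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```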
Update
For .doc files (the format used before Word 2007) you can also use external tools such as catdoc or antiword. Try the following:
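import subprocess

def doc_to_text(filename):
    # Run catdoc (-w disables word wrapping) and return its output as text.
    # Passing the arguments as a list avoids shell quoting problems.
    result = subprocess.run(
        ['catdoc', '-w', filename],
        stdout=subprocess.PIPE,
        check=True,
    )
    # Assumes catdoc writes UTF-8, which depends on the locale.
    return result.stdout.decode('utf-8', errors='replace')

print(doc_to_text('fixtures/doc.doc'))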
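To tie this back to the question, the directory walk from Answer 1 could take the text extraction step as a parameter, so that a converter such as doc_to_text replaces the plain open() call. This is only a sketch: find_word_with_extractor, root, and extract_text are illustrative names, and it assumes a substring match is sufficient.

```python
import os


def find_word_with_extractor(root, extension, word, extract_text):
    """Yield paths under root with the given extension whose extracted text contains word."""
    suffix = "." + extension.lstrip(".")
    for dirpath, _dirs, files in os.walk(root):
        for filename in sorted(files):
            if filename.endswith(suffix):
                path = os.path.join(dirpath, filename)
                text = extract_text(path)
                # A converter that reads a pipe may return bytes; decode before searching.
                if isinstance(text, bytes):
                    text = text.decode("utf-8", errors="ignore")
                if word in text:
                    yield path
```

Iterating over find_word_with_extractor('/DOC', 'doc', 'keyword', doc_to_text) would then produce each matching path, provided catdoc is installed.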
Answer 3:
If you are trying to read a .doc file in your code, then this won't work. You will have to change the part where you are reading the file.
Here are some links for reading a .doc file in Python:
extracting text from MS word files in python
Reading/Writing MS Word files in Python
Source: https://stackoverflow.com/questions/36572887/python-finds-a-string-in-multiple-files-recursively-and-returns-the-file-path