Using the following bit of code:
for root, dirs, files in os.walk(corpus_name):
    for file in files:
        if file.endswith(".v4_gold_conll"):
You'll have to join the `root` with the filename.
for root, dirs, files in os.walk(corpus_name):
    for file in files:
        if file.endswith(".v4_gold_conll"):
            with open(os.path.join(root, file)) as f:
                tokens = [
                    line.split()[3]
                    for line in f
                    if line.strip() and not line.startswith("#")
                ]
            print(tokens)
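To see why the join is needed, here is a minimal sketch using a throwaway directory (the layout and the file name `doc.v4_gold_conll` are made up for illustration). It shows that `os.walk` yields bare file names in `files` and the containing directory separately as `root`:

```python
import os
import tempfile

# Build a hypothetical corpus layout: <tmp>/sub/doc.v4_gold_conll
with tempfile.TemporaryDirectory() as corpus_name:
    subdir = os.path.join(corpus_name, "sub")
    os.makedirs(subdir)
    with open(os.path.join(subdir, "doc.v4_gold_conll"), "w") as f:
        f.write("placeholder\n")

    found = []
    for root, dirs, files in os.walk(corpus_name):
        for filename in files:
            # filename carries no directory part; root is where it lives
            found.append((filename, os.path.join(root, filename)))

bare_name, full_path = found[0]
print(bare_name)   # the name alone, without any directory
print(full_path)   # the path that open() can actually find
```

Opening `bare_name` directly would only work if the script happened to run from inside that subdirectory, which is why the joined path is the one to pass to `open`.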
`file` is just the file name without the directory, which is `root` in your code. Try this:
f = open(os.path.join(root, file))
Also, it is better to open the file with a `with` statement, and to avoid `file` as a variable name, since it shadows the built-in type in Python 2. Judging from your comment, you probably also want to extend the list of tokens across files (use `+=` instead of `=`):
tokens = []
for root, dirs, files in os.walk(corpus_name):
    for filename in files:
        if filename.endswith(".v4_gold_conll"):
            with open(os.path.join(root, filename)) as f:
                tokens += [line.split()[3] for line in f if line.strip() and not line.startswith("#")]
print(tokens)
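If the same logic is needed in more than one place, it could be wrapped in a function. The following is only a sketch: the name `collect_tokens` and its `suffix` and `column` parameters are hypothetical, the sample data is invented, and skipping lines that have too few columns is an added assumption rather than something the original code does.

```python
import os
import tempfile


def collect_tokens(corpus_dir, suffix=".v4_gold_conll", column=3):
    """Return the given column of every non-comment line in matching files."""
    tokens = []
    for root, dirs, files in os.walk(corpus_dir):
        dirs.sort()  # visit subdirectories in a stable order
        for filename in sorted(files):
            if not filename.endswith(suffix):
                continue
            with open(os.path.join(root, filename)) as f:
                for line in f:
                    fields = line.split()
                    # Assumption: skip blank lines, comments, and short lines
                    if len(fields) > column and not line.startswith("#"):
                        tokens.append(fields[column])
    return tokens


# Usage with made-up data in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a"))
    with open(os.path.join(tmp, "a", "x.v4_gold_conll"), "w") as f:
        f.write("#begin document\nd 0 0 Hello UH\n\nd 0 1 world NN\n")
    with open(os.path.join(tmp, "y.v4_gold_conll"), "w") as f:
        f.write("d 0 0 Bye UH\n")
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("a b c ignored\n")
    result = collect_tokens(tmp)

print(result)
```

Sorting `dirs` in place and iterating over `sorted(files)` makes the token order reproducible, since `os.walk` otherwise returns entries in whatever order the file system provides.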