Question
I have a sentence, 'And now for something completely different'. I want to tokenize it, POS-tag it, and store the result in an Excel file for further processing.
<pre>import nltk

sent = "And now for something completely different"
words = nltk.word_tokenize(sent)
tags = nltk.pos_tag(words)  # pos_tag needs the token list
print(tags)</pre>
The result is a list of (word, tag) tuples:
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]
I want to store this list in an Excel file, with the words in one column and the tags in the other.
I tried the following code to do this:
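<pre>fd = open("output.txt", 'w')
i = 0
for words in tags:           # each item is a (word, tag) tuple
    for word in words:
        i += 1
        fd.write(word)
        if i == 1:           # after the word, separate it from its tag
            fd.write('\t')
    fd.write('\n')           # one (word, tag) pair per line
    i = 0
fd.close()                   # flush the data to disk</pre>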
The code above writes the words and tags to the output file correctly. If I then use shutil to copy the text file to an Excel file, that step also runs without error. The problem comes when I try to read the converted file; I get the following error:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'And\tCC\n'
How can I write the tagged list to an Excel file so that this error does not occur?
Answer 1:
Excel files (xlsx) are not simple flat text files, so copying a text file to a name ending in .xlsx does not convert it: shutil copies the bytes unchanged, and the result is still plain text that an Excel reader rejects. You could instead save the data as CSV and open that in Excel. I think pandas is really useful for parsing and writing data files (and of course for processing data):
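<pre>import pandas as pd
# tags is the list of (word, tag) tuples returned by nltk.pos_tag
df = pd.DataFrame(tags)
# One row per word: column A = word, column B = tag (needs an xlsx engine such as openpyxl)
df.to_excel('output.xlsx', header=False, index=False)</pre>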
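To see why the shutil approach fails, here is a minimal sketch (not part of the original answer) that reproduces the question's steps using only the standard library and then checks the copied file with `zipfile.is_zipfile`. A real .xlsx workbook is a ZIP archive, so a file that is not a ZIP container cannot be a valid .xlsx. The `tags` list is hard-coded from the question's output so the sketch runs without NLTK, and the files go into a temporary directory.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Tagger output copied from the question, so NLTK is not required here.
tags = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'),
        ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

workdir = Path(tempfile.mkdtemp())

# Write tab-separated text, as the question's loop does.
txt_path = workdir / "output.txt"
with open(txt_path, "w") as fd:
    for word, tag in tags:
        fd.write(f"{word}\t{tag}\n")

# "Convert" by copying: shutil copies the bytes verbatim, nothing more.
fake_xlsx = workdir / "output.xlsx"
shutil.copy(txt_path, fake_xlsx)

print(zipfile.is_zipfile(fake_xlsx))   # False: not a ZIP, so not a real .xlsx
print(fake_xlsx.read_bytes()[:8])      # b'And\tCC\nn'
```

The first bytes of the copied file are just the start of the tab-separated text, the same content the error message quotes, rather than any workbook header.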
Answer 2:
Instead of converting to Excel format: you are already writing your file as tab-separated values, and Excel knows how to read that. I suggest you save your file with a '.tsv' extension and open it in Excel.
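A minimal sketch of the same idea using Python's built-in `csv` module with a tab delimiter, as an alternative to the hand-written loop in the question. The `tags` list is hard-coded from the question's output, and the `word`/`tag` header row is a hypothetical addition for illustration; drop it if you don't want a header.

```python
import csv

# Tagger output copied from the question, so NLTK is not required here.
tags = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'),
        ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

# newline="" lets the csv module control line endings itself.
with open("output.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["word", "tag"])  # hypothetical header row (optional)
    writer.writerows(tags)            # one (word, tag) tuple per row
```

Compared with writing `'\t'` and `'\n'` by hand, `csv.writer` also quotes any field that happens to contain a tab, quote, or newline, so the columns stay aligned when Excel opens the file.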
Source: https://stackoverflow.com/questions/33690843/write-a-list-into-excel