Question
I want to join a list of tuples within a dataframe.
I have tried several methods of doing this within the dataframe, using join and a lambda:
import pandas as pd
from nltk import word_tokenize, pos_tag, pos_tag_sents
data = {'Categories': ['animal','plant','object'],
        'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']}
def posTag(data):
    data = pd.DataFrame(data)
    comments = data['Comment'].tolist()
    taggedComments = pos_tag_sents(map(word_tokenize,comments))
    data['taggedComment'] = taggedComments
    print data['taggedComment']
    data['taggedComment'].apply(lambda x: (' '.join(x)))
    return data
taggedData = posTag(data)
print data
Some other tuple-joining methods I have tried are:
(' '.join(['_'.join(x) for x in data['taggedComment']]))
[''.join(x) for x in data['taggedComment']]
['_'.join(str(x)) for x in data['taggedComment']]
No matter what I do, I arrive at the same error:
TypeError: sequence item 0: expected string, tuple found
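Why the error happens: str.join accepts only an iterable of strings, but each row of taggedComment is a list of (word, tag) tuples, so the first item it meets is a tuple. (Python 3 words the message as "expected str instance, tuple found".) The sketch below uses a small hand-written tagged list as a stand-in for NLTK output, an assumption that lets it run without NLTK, and shows what the attempted approaches actually do:

```python
# Hand-written stand-in for one row of NLTK pos_tag output (assumed sample data).
tagged = [('The', 'DT'), ('rock', 'NN'), ('was', 'VBD')]

# ''.join(x) or '_'.join(x) on the whole list fails:
# str.join only accepts strings, and every item here is a tuple.
try:
    ''.join(tagged)
except TypeError as exc:
    print('TypeError:', exc)

# '_'.join(str(x)) does not raise, but it joins the individual *characters*
# of the list's printed representation, which is not the wanted result.
garbled = '_'.join(str(tagged))
print(garbled[:20])

# Joining each tuple first turns every item into a string,
# so the outer join then succeeds.
fixed = ' '.join('_'.join(pair) for pair in tagged)
print(fixed)  # The_DT rock_NN was_VBD
```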
My desired result is, for each list such as
[('A', 'B'), ('B', 'C'), ('C', 'B')]
in the dataframe, to be output as
'A_B B_C C_B'
Any suggestions as to what is going wrong?
Answer 1:
You can use a nested (double) list comprehension and assign the output back to the column.
So instead of:
data['taggedComment'].apply(lambda x: (' '.join(x)))
use the following in your posTag(data) function:
data['taggedComment'] = [' '.join(['_'.join(y) for y in x]) for x in data['taggedComment']]
taggedData = posTag(data)
print (taggedData)
   Categories                          Comment  Type  \
0      animal         The NYC tree is very big  tree
1       plant  NY The cat from the UK is small   dog
2      object        The rock was found in LA.  rock

                                       taggedComment
0       The_DT NYC_NNP tree_NN is_VBZ very_RB big_JJ
1  NY_NNP The_DT cat_NN from_IN the_DT UK_NNP is_...
2  The_DT rock_NN was_VBD found_VBN in_IN LA_NNP ._.
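A minimal sketch of the nested comprehension applied to the example list from the question, outside any DataFrame: the inner '_'.join turns each tuple into one string, and the outer ' '.join joins those strings with spaces. The map('_'.join, ...) line is an equivalent spelling added here for comparison; it is not part of the answer's code:

```python
# The example list of tuples from the question.
pairs = [('A', 'B'), ('B', 'C'), ('C', 'B')]

# Inner join: ('A', 'B') -> 'A_B'; outer join: ['A_B', 'B_C', 'C_B'] -> 'A_B B_C C_B'.
joined = ' '.join(['_'.join(y) for y in pairs])

# Equivalent: '_'.join is itself a callable that takes an iterable of strings.
joined_map = ' '.join(map('_'.join, pairs))

print(joined)      # A_B B_C C_B
print(joined_map)  # A_B B_C C_B

# Applied per row, as in the answer: one output string per list (row).
column = [pairs, [('The', 'DT'), ('rock', 'NN')]]
rows = [' '.join(['_'.join(y) for y in x]) for x in column]
print(rows)  # ['A_B B_C C_B', 'The_DT rock_NN']
```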
All together:
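import pandas as pd
from nltk import word_tokenize, pos_tag_sents

def posTag(data):
    data = pd.DataFrame(data)
    comments = data['Comment'].tolist()
    # Tag once; each item is a list of (word, tag) tuples for one comment
    taggedComments = pos_tag_sents(map(word_tokenize, comments))
    print (taggedComments)
    # '_'.join each tuple, then ' '.join the pieces: one string per row
    data['taggedComment'] = [' '.join(['_'.join(y) for y in x]) for x in taggedComments]
    return data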
taggedData = posTag(data)
print (taggedData)
   Categories                          Comment  Type  \
0      animal         The NYC tree is very big  tree
1       plant  NY The cat from the UK is small   dog
2      object        The rock was found in LA.  rock

                                       taggedComment
0       The_DT NYC_NNP tree_NN is_VBZ very_RB big_JJ
1  NY_NNP The_DT cat_NN from_IN the_DT UK_NNP is_...
2  The_DT rock_NN was_VBD found_VBN in_IN LA_NNP ._.
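The apply attempt from the question can also be made to work, with two changes: join each tuple first, and assign the result back, because Series.apply returns a new Series instead of modifying the column in place. A sketch, assuming hand-tagged rows in place of real NLTK output (the sample comments and tags are made up for illustration):

```python
import pandas as pd

# Assumed stand-in for pos_tag_sents output: one list of (word, tag) tuples per row.
df = pd.DataFrame({'Comment': ['The rock', 'A big dog']})
df['taggedComment'] = [
    [('The', 'DT'), ('rock', 'NN')],
    [('A', 'DT'), ('big', 'JJ'), ('dog', 'NN')],
]

# apply() returns a new Series; without this assignment the result is discarded.
df['taggedComment'] = df['taggedComment'].apply(
    lambda x: ' '.join('_'.join(y) for y in x)
)

print(df['taggedComment'].tolist())  # ['The_DT rock_NN', 'A_DT big_JJ dog_NN']
```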
Source: https://stackoverflow.com/questions/46366833/joining-a-list-of-tuples-within-a-pandas-dataframe