Joining a list of tuples within a pandas dataframe

Submitted by ↘锁芯ラ on 2021-02-05 07:12:05

Question


I want to join a list of tuples within a dataframe. I have tried several ways of doing this inside the dataframe, using join and a lambda:

import pandas as pd
from nltk import word_tokenize, pos_tag, pos_tag_sents

data = {'Categories': ['animal','plant','object'],
        'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']}
def posTag(data):
    data = pd.DataFrame(data)
    comments = data['Comment'].tolist()
    taggedComments = pos_tag_sents(map(word_tokenize,comments))
    data['taggedComment'] = taggedComments
    print data['taggedComment']
    data['taggedComment'].apply(lambda x: (' '.join(x)))
    return data
taggedData = posTag(data)
print data

Some other methods of tuple joining that I have tried are:

(' '.join(['_'.join(x) for x in data['taggedComment']]))
 [''.join(x) for x in data['taggedComment']]
 ['_'.join(str(x)) for x in data['taggedComment']]

No matter what I do, I arrive at the same error:

TypeError: sequence item 0: expected string, tuple found

My desired result is for each list such as

[('A', 'B'),  ('B', 'C'),  ('C', 'B')]

in the dataframe to be output as

'A_B B_C C_B'

Any suggestions as to what is going wrong?


Answer 1:


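You can use a nested list comprehension and assign the output back to the column. Each element of a row is a (word, tag) tuple, so the tuples have to be joined with '_' first, and only then can the resulting strings be joined with ' '.

So instead of:

data['taggedComment'].apply(lambda x: (' '.join(x)))

use the following in your posTag(data) function: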

data['taggedComment'] = [' '.join(['_'.join(y) for y in x]) for x in data['taggedComment']] 


taggedData = posTag(data)
print (taggedData)
  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree   
1      plant  NY The cat from the UK is small   dog   
2     object        The rock was found in LA.  rock   

                                       taggedComment  
0       The_DT NYC_NNP tree_NN is_VBZ very_RB big_JJ  
1  NY_NNP The_DT cat_NN from_IN the_DT UK_NNP is_...  
2  The_DT rock_NN was_VBD found_VBN in_IN LA_NNP ._. 
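
To see why the original call fails and why the nested comprehension works, here is a minimal sketch that does not need NLTK. The list `tagged` is hand-written sample data standing in for one row of the pos_tag_sents output, and the variable names are only illustrative.

```python
# Hand-written stand-in for one row of tagged tokens (assumed sample data).
tagged = [('A', 'B'), ('B', 'C'), ('C', 'B')]

# str.join only accepts strings, so joining the tuples directly raises TypeError.
try:
    ' '.join(tagged)
    error = None
except TypeError as exc:
    error = str(exc)

# Join each (word, tag) pair with '_' first, then join the pairs with ' '.
pairs = ['_'.join(pair) for pair in tagged]
joined = ' '.join(pairs)

print(error)
print(joined)  # A_B B_C C_B
```

The inner join turns every tuple into a string, which is exactly what the outer join requires.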

All together:

def posTag(data):
    # Build the dataframe and pull the comments out as a plain list
    data = pd.DataFrame(data)
    comments = data['Comment'].tolist()
    # Tag every comment: one list of (word, tag) tuples per row
    taggedComments = pos_tag_sents(map(word_tokenize, comments))
    # Join each tuple with '_', then join the pairs of a row with ' '
    data['taggedComment'] = [' '.join(['_'.join(y) for y in x]) for x in taggedComments]
    return data

taggedData = posTag(data)
print (taggedData)

  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree   
1      plant  NY The cat from the UK is small   dog   
2     object        The rock was found in LA.  rock   

                                       taggedComment  
0       The_DT NYC_NNP tree_NN is_VBZ very_RB big_JJ  
1  NY_NNP The_DT cat_NN from_IN the_DT UK_NNP is_...  
2  The_DT rock_NN was_VBD found_VBN in_IN LA_NNP ._.
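
If you prefer to keep using apply, the same idea can be sketched as below. Note that apply returns a new Series, so its result has to be assigned back to the column; the original code discarded it. The helper name `join_tagged` and the sample rows are assumptions made for illustration, not part of the original code.

```python
import pandas as pd

def join_tagged(tokens):
    """Turn [('The', 'DT'), ('rock', 'NN')] into 'The_DT rock_NN'."""
    return ' '.join('_'.join(pair) for pair in tokens)

# Hand-written tagged tokens standing in for pos_tag_sents output (assumed data).
df = pd.DataFrame({
    'taggedComment': [
        [('The', 'DT'), ('rock', 'NN')],
        [('A', 'B'), ('B', 'C'), ('C', 'B')],
    ]
})

# apply does not modify the column in place; assign the result back.
df['taggedComment'] = df['taggedComment'].apply(join_tagged)

print(df['taggedComment'].tolist())  # ['The_DT rock_NN', 'A_B B_C C_B']
```

Both versions produce the same strings; the list comprehension avoids a function call per row, while the named helper may be easier to reuse and test.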


Source: https://stackoverflow.com/questions/46366833/joining-a-list-of-tuples-within-a-pandas-dataframe
