Pandas drop_duplicates method not working

梦谈多话 2020-12-29 22:34

I am trying to use the drop_duplicates method on my DataFrame, but I get the following error:

TypeError: unhashable type: 'list'

3 Answers
  • 2020-12-29 23:21

    @Allen's answer is great, but it has a small problem. It originally read:

    df.iloc[df.astype(str).drop_duplicates().index]
    

    It should be loc, not iloc, because drop_duplicates().index holds index labels, not positions. Look at the example:

    a = pd.DataFrame([['a',18],['b',11],['a',18]],index=[4,6,8])
    Out[52]: 
       0   1
    4  a  18
    6  b  11
    8  a  18
    
    a.iloc[a.astype(str).drop_duplicates().index]
    Out[53]:
    ...
    IndexError: positional indexers are out-of-bounds
    
    a.loc[a.astype(str).drop_duplicates().index]
    Out[54]: 
       0   1
    4  a  18
    6  b  11
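
A boolean mask is another way to sidestep the loc/iloc question altogether. This is a minimal sketch, not something taken from either answer; `mask` and `result` are names chosen here for illustration.

```python
import pandas as pd

# Same frame as in the example above, with a non-default index.
a = pd.DataFrame([['a', 18], ['b', 11], ['a', 18]], index=[4, 6, 8])

# duplicated() returns a boolean Series aligned on the index, so no
# label-versus-position lookup is involved.
mask = ~a.astype(str).duplicated()
result = a[mask]
print(result)
```

The mask also behaves sensibly when the index contains repeated labels, a case where selecting by label with loc can return the same row more than once.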
    
  • 2020-12-29 23:26

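    Overview: duplicated(keep=False) shows which rows are duplicated, but the list cells have to be made hashable (for example, turned into strings) first.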

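    Method 1: convert each list cell to a string by hand

    df2 = df.copy()
    # Cells (0, 1) and (1, 1) hold lists; join each into a space-separated string.
    mylist = df2.iloc[0, 1]
    df2.iloc[0, 1] = ' '.join(map(str, mylist))
    
    mylist = df2.iloc[1, 1]
    df2.iloc[1, 1] = ' '.join(map(str, mylist))
    
    # keep=False marks every member of a duplicate group, not just the repeats.
    duplicates = df2.duplicated(keep=False)
    print(df2[duplicates])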
    

    Method 2: cast the whole frame to str

    print(df.astype(str).duplicated(keep=False))
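
Method 1 hard-codes the positions of the list cells. A more general variant, sketched below, converts every list in a column to a tuple so the column becomes hashable. The frame is the sample data from the setup elsewhere on this page, written in list form; `to_hashable` is a hypothetical helper introduced only for this example, and it does not handle nested lists.

```python
import pandas as pd

df = pd.DataFrame({
    'Keyword': ['apply', 'apply', 'apply', 'terms', 'terms'],
    'X': [[1, 2], [1, 2], 'xy', 'xx', 'yy'],
    'Y': ['yy', 'yy', 'yx', 'ix', 'xi'],
})

def to_hashable(value):
    """Hypothetical helper: turn a list into a tuple, leave anything else alone."""
    return tuple(value) if isinstance(value, list) else value

# Work on a copy so the original lists stay untouched.
df2 = df.copy()
df2['X'] = df2['X'].map(to_hashable)

# keep=False flags every row that has at least one twin.
duplicates = df2.duplicated(keep=False)
print(df[duplicates])
```

Because the mask is built on the copy and applied to the original, the printed rows still contain real lists.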
    
  • 2020-12-29 23:35

    drop_duplicates won't work with lists in your DataFrame, as the error message implies. However, you can drop duplicates on the DataFrame cast to str and then extract the rows from the original df using the index of the result.

    Setup

    df = pd.DataFrame({'Keyword': {0: 'apply', 1: 'apply', 2: 'apply', 3: 'terms', 4: 'terms'},
     'X': {0: [1, 2], 1: [1, 2], 2: 'xy', 3: 'xx', 4: 'yy'},
     'Y': {0: 'yy', 1: 'yy', 2: 'yx', 3: 'ix', 4: 'xi'}})
    
    # Dropping directly causes the same error
    df.drop_duplicates()
    Traceback (most recent call last):
    ...
    TypeError: unhashable type: 'list'
    

    Solution

    # Convert the df to str type, drop duplicates, and then select the rows from the original df.
    
    df.loc[df.astype(str).drop_duplicates().index]
    Out[205]: 
      Keyword       X   Y
    0   apply  [1, 2]  yy
    2   apply      xy  yx
    3   terms      xx  ix
    4   terms      yy  xi
    
    # The list elements are still lists in the final result.
    df.loc[df.astype(str).drop_duplicates().index].loc[0,'X']
    Out[207]: [1, 2]
    

    Edit: replaced iloc with loc. In this particular case both work because the index labels match the row positions, but that does not hold in general.
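
One caveat with comparing rows as text: values of different types that print the same way are treated as equal. The sketch below uses a made-up two-row frame (not the asker's data) to show the effect, and compares it with a tuple-based key, which is only one possible alternative.

```python
import pandas as pd

# Hypothetical two-row frame: a real list and a string that looks like one.
df = pd.DataFrame({'X': [[1, 2], '[1, 2]']})

# After astype(str) both cells are the text '[1, 2]', so row 1 is flagged.
by_text = df.astype(str).duplicated()
print(by_text.tolist())

# Keying on tuples keeps the list and the look-alike string distinct.
by_tuple = df['X'].map(lambda v: tuple(v) if isinstance(v, list) else v).duplicated()
print(by_tuple.tolist())
```

If the columns never mix lists with look-alike strings, the astype(str) approach is unaffected by this.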
