Pandas: DataFrame filtering using groupby and a function

名媛妹妹 2021-01-06 09:44

Using Python 3.3 and Pandas 0.10

I have a DataFrame that is built from concatenating multiple CSV files. First, I filter out all values in the Name column that conta

2 answers
  • 2021-01-06 09:52

    Instead of the group length (len), I think you want the number of unique values of Name in each group. Use nunique(), and check out this neat recipe for filtering groups.

    df[df.groupby('ID').Name.transform(lambda x: x.nunique() == 1).astype('bool')]
    

    If you upgrade to pandas 0.12, you can use the new filter method on groups, which makes this more succinct and straightforward.

    df.groupby('ID').filter(lambda x: x.Name.nunique() == 1)
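
A minimal sketch comparing the two one-liners above. The sample data is hypothetical, since the original DataFrame is not shown; on it, both approaches keep the same rows.

```python
import pandas as pd

# Hypothetical sample data (an assumption; not the asker's real frame).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "E", "F", "A"],
    "ID": [1, 2, 3, 4, 4, 1],
})

# transform broadcasts one True/False per group back onto every row,
# giving a boolean mask aligned with df.
mask = df.groupby("ID")["Name"].transform(lambda x: x.nunique() == 1).astype(bool)
via_transform = df[mask]

# filter (pandas 0.12 and later) keeps or drops whole groups.
via_filter = df.groupby("ID").filter(lambda g: g["Name"].nunique() == 1)

print(via_transform)
print(via_filter)
```

ID 1 appears twice here but only ever with the Name A, so it is kept; ID 4 carries two different names and is dropped.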
    

    A general remark: Sometimes, of course, you do want to know the length of the group, but I find that size is a safer choice than len, which has been troublesome for me in some cases.
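
The remark above does not say what went wrong with len, so the following is only a sketch of what size reports, on hypothetical data. It counts every row in a group, including rows with a missing Name, whereas count skips missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Name (an assumption for illustration).
df = pd.DataFrame({
    "Name": ["A", np.nan, "C", "E", "F"],
    "ID": [1, 1, 3, 4, 4],
})

g = df.groupby("ID")

sizes = g.size()            # rows per group, missing values included
counts = g["Name"].count()  # non-null Name values per group

print(sizes)
print(counts)
```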

  • 2021-01-06 10:02

    You could first drop the duplicates:

    In [11]: df = df.drop_duplicates()
    
    In [12]: df
    Out[12]:
      Name ID
    0    A  1
    1    B  2
    2    C  3
    4    E  4
    5    F  4
    

    Then group by ID and keep only the groups with one element:

    In [13]: g = df.groupby('ID')
    
    In [14]: size = (g.size() == 1)
    
    In [15]: size
    Out[15]:
    ID
    1      True
    2      True
    3      True
    4     False
    dtype: bool
    
    In [16]: size[size].index
    Out[16]: Int64Index([1, 2, 3], dtype=int64)
    
    In [17]: df['ID'].isin(size[size].index)
    Out[17]:
    0     True
    1     True
    2     True
    4    False
    5    False
    Name: ID, dtype: bool
    

    And boolean index by this:

    In [18]: df[df['ID'].isin(size[size].index)]
    Out[18]:
      Name ID
    0    A  1
    1    B  2
    2    C  3
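
The steps above can be collected into one helper. This is a sketch, not part of the original answer: the function name keep_single_row_ids and the sample frame are hypothetical, and the data assumes the row missing from the output above (index 3) was an exact duplicate.

```python
import pandas as pd

def keep_single_row_ids(df, key="ID"):
    """Drop exact duplicate rows, then keep only keys that occur once."""
    deduped = df.drop_duplicates()
    sizes = deduped.groupby(key).size()
    single = sizes[sizes == 1].index  # keys whose group has one row
    return deduped[deduped[key].isin(single)]

# Hypothetical input; row 3 duplicates row 2 (an assumption).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "C", "E", "F"],
    "ID": [1, 2, 3, 3, 4, 4],
})

result = keep_single_row_ids(df)
print(result)
```

Unlike the nunique approach, this compares whole rows, so it only matches when Name and ID are the only columns that matter.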
    