Pandas: .groupby().size() and percentages

无人及你 2021-01-02 09:34

I have a DataFrame that originates from a df.groupby().size() operation, and looks like this:

Localization                           RNA level           


        
1 Answer
  • 2021-01-02 10:03

    Here is a complete example based on the pandas groupby and sum functions. The basic idea is to group the data by 'Localization' and apply a function to each group, dividing every Size by the total Size of its group.

    import pandas as pd
    from io import StringIO
    #For Python 2, replace previous line with: from StringIO import StringIO
    
    data = \
    """Localization,RNA level,Size
    cytoplasm                            ,1 Non-expressed, 7
    cytoplasm                            ,2 Very low     ,13
    cytoplasm                            ,3 Low          , 8
    cytoplasm                            ,4 Medium       , 6
    cytoplasm                            ,5 Moderate     , 8
    cytoplasm                            ,6 High         , 2
    cytoplasm                            ,7 Very high    , 6
    cytoplasm & nucleus                  ,1 Non-expressed, 5
    cytoplasm & nucleus                  ,2 Very low     , 8
    cytoplasm & nucleus                  ,3 Low          , 2
    cytoplasm & nucleus                  ,4 Medium       ,10
    cytoplasm & nucleus                  ,5 Moderate     ,16
    cytoplasm & nucleus                  ,6 High         , 6
    cytoplasm & nucleus                  ,7 Very high    , 5
    cytoplasm & nucleus & plasma membrane,1 Non-expressed, 6
    cytoplasm & nucleus & plasma membrane,2 Very low     , 3
    cytoplasm & nucleus & plasma membrane,3 Low          , 3
    cytoplasm & nucleus & plasma membrane,4 Medium       , 7
    cytoplasm & nucleus & plasma membrane,5 Moderate     , 8
    cytoplasm & nucleus & plasma membrane,6 High         , 4
    cytoplasm & nucleus & plasma membrane,7 Very high    , 1"""
    
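    # Create the dataframe
    df = pd.read_csv(StringIO(data))
    # str.strip() and astype() return new objects, so assign the results back;
    # otherwise the padded strings stay in the dataframe
    df['Localization'] = df['Localization'].str.strip()
    df['RNA level'] = df['RNA level'].str.strip()
    df['Size'] = df['Size'].astype(int)
    # Share of each row within its 'Localization' group (a fraction between 0 and 1)
    df['Percent'] = df.groupby('Localization')['Size'].transform(lambda x: x / x.sum())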
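
The question starts from the result of a groupby().size() call rather than from CSV text, so it may help to see the same idea applied to that output directly. The following is only a minimal sketch: the `raw` dataframe and its values are hypothetical and not taken from the original post, and it assumes the counts are grouped by 'Localization' and 'RNA level' as in the question.

```python
import pandas as pd

# Hypothetical raw observations (not from the original post): one row per item.
raw = pd.DataFrame({
    "Localization": ["cytoplasm", "cytoplasm", "cytoplasm", "nucleus", "nucleus"],
    "RNA level": ["1 Non-expressed", "1 Non-expressed", "2 Very low",
                  "2 Very low", "3 Low"],
})

# groupby().size() returns a Series indexed by (Localization, RNA level).
sizes = raw.groupby(["Localization", "RNA level"]).size()

# Total count per Localization, broadcast back onto the same index.
totals = sizes.groupby(level="Localization").transform("sum")

# Fraction of each (Localization, RNA level) count within its Localization.
percent = sizes / totals

# Put counts and fractions side by side as ordinary columns.
result = pd.DataFrame({"Size": sizes, "Percent": percent}).reset_index()
print(result)
```

Using transform("sum") on the index level keeps the result aligned with the original Series, so the division needs no merge or manual reindexing. Multiply `percent` by 100 if percentages rather than fractions are wanted.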
    