Count occurences for each year in pandas dataframe based on subgroup

自闭症患者 2021-01-14 18:40

Imagine a pandas DataFrame given by

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': [p

2 Answers
  • 2021-01-14 19:20

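    Create a helper DataFrame by grouping on id and the year of date, counting the rows in each group with size, moving the years into columns with unstack, and joining the result back to the original df: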

    df1 = df.join(df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0), on='id')
    print (df1)
        location       date  2015  2016  2017  2018
    id                                             
    1          1 2015-01-01     2     1     0     0
    1          2 2016-01-01     2     1     0     0
    1          3 2015-01-01     2     1     0     0
    2          1 2017-01-01     0     0     1     1
    2          2 2018-01-01     0     0     1     1
    

    Detail:

    print (df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0))
    
    date  2015  2016  2017  2018
    id                          
    1        2     1     0     0
    2        0     0     1     1
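    A self-contained sketch of the same pipeline, split into its three steps so the intermediate long-form Series is visible. It assumes a sample frame reconstructed from the printed output in this answer (the question's own construction is truncated), with id set as the index, as that output suggests.

```python
import pandas as pd

# Assumed sample data: reconstructed from the printed output, because the
# question's DataFrame construction is truncated. 'id' is used as the index.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# Step 1: long-form counts, one entry per (id, year) pair that occurs.
counts = df.groupby(['id', df['date'].dt.year]).size()

# Step 2: move the years into columns; missing (id, year) pairs become 0.
wide = counts.unstack(fill_value=0)

# Step 3: join on the id level so every row gets its id's yearly counts.
df1 = df.join(wide, on='id')
print(df1)
```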
    

    Another solution with crosstab:

    df1 = df.join(pd.crosstab(df.index, df['date'].dt.year), on='id')
    
    print (pd.crosstab(df.index, df['date'].dt.year))
    date   2015  2016  2017  2018
    row_0                        
    1         2     1     0     0
    2         0     0     1     1
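    The row axis prints as row_0 because df.index is passed as a plain Index, from which crosstab does not take a name; the rownames argument labels it explicitly. A sketch checking that the crosstab table matches the groupby helper, on the same assumed sample frame (reconstructed from the printed output, with id as the index):

```python
import pandas as pd

# Assumed sample data (reconstructed from the printed output; the question's
# construction is truncated), with 'id' as the index.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

years = df['date'].dt.year

# Default call, as in the answer: the row axis gets the generated name
# 'row_0' because df.index is an Index, not a named Series.
default_name = pd.crosstab(df.index, years).index.name

# rownames=['id'] names the row axis explicitly.
table = pd.crosstab(df.index, years, rownames=['id'])

# Same counts as the groupby/size/unstack helper table.
helper = df.groupby(['id', years]).size().unstack(fill_value=0)
same = (table.to_numpy() == helper.to_numpy()).all()

df1 = df.join(table, on='id')
print(default_name, same)
```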
    
  • 2021-01-14 19:33

    get_dummies

    # pandas >= 2.0: sum(level=0) was removed and dummies default to bool
    df.join(pd.get_dummies(df.date.dt.year, dtype=int).groupby(level=0).sum())
    
             date  location  2015  2016  2017  2018
    id                                             
    1  2015-01-01         1     2     1     0     0
    1  2016-01-01         2     2     1     0     0
    1  2015-01-01         3     2     1     0     0
    2  2017-01-01         1     0     0     1     1
    2  2018-01-01         2     0     0     1     1
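    On pandas 2.x the original one-liner needs two adjustments: the level argument of sum was removed, so groupby(level=0).sum() takes its place, and get_dummies returns bool columns by default, so dtype=int is passed to keep the sums integer. A sketch under those assumptions, on the same sample frame reconstructed from the printed output (id as the index):

```python
import pandas as pd

# Assumed sample data (reconstructed from the printed output; the question's
# construction is truncated), with 'id' as the index.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# One indicator column per year; dtype=int because pandas >= 2.0 returns
# bool dummies by default.
dummies = pd.get_dummies(df['date'].dt.year, dtype=int)

# Replacement for sum(level=0): add up the indicator rows of each id.
per_id = dummies.groupby(level=0).sum()

# Join on the index, broadcasting each id's counts back to its rows.
df1 = df.join(per_id)
print(df1)
```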
    

    factorize

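    import numpy as np
    import pandas as pd

    i, r = pd.factorize(df.index)          # integer code for each row's id
    j, c = pd.factorize(df.date.dt.year)   # integer code for each row's year
    b = np.zeros((len(r), len(c)), dtype=np.int64)
    np.add.at(b, (i, j), 1)                # unbuffered: repeated (id, year) pairs all count

    df.join(pd.DataFrame(b, r, c).rename_axis('id'))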
    
             date  location  2015  2016  2017  2018
    id                                             
    1  2015-01-01         1     2     1     0     0
    1  2016-01-01         2     2     1     0     0
    1  2015-01-01         3     2     1     0     0
    2  2017-01-01         1     0     0     1     1
    2  2018-01-01         2     0     0     1     1
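    The factorize version relies on np.add.at rather than b[i, j] += 1: buffered fancy indexing applies each repeated (i, j) pair only once, so id 1's two 2015 rows would be counted as 1. A sketch contrasting the two, on the same sample frame reconstructed from the printed output (id as the index):

```python
import numpy as np
import pandas as pd

# Assumed sample data (reconstructed from the printed output; the question's
# construction is truncated), with 'id' as the index.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

i, ids = pd.factorize(df.index)              # row -> id code: [0, 0, 0, 1, 1]
j, years = pd.factorize(df['date'].dt.year)  # row -> year code, first-appearance order
shape = (len(ids), len(years))

# Buffered fancy indexing: the repeated pair (0, 0) is incremented only once.
buffered = np.zeros(shape, dtype=np.int64)
buffered[i, j] += 1

# np.add.at is unbuffered, so every occurrence of a pair is counted.
counts = np.zeros(shape, dtype=np.int64)
np.add.at(counts, (i, j), 1)

table = pd.DataFrame(counts, index=ids, columns=years).rename_axis('id')
df1 = df.join(table)
print(buffered.tolist(), counts.tolist())
```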
    