Pandas explode column into rows

别跟我提以往 2021-01-20 11:35

I have a DataFrame where each row has two columns: date and mentions. The end result would be a DataFrame of mentions per date, which should be easy via GroupBy if I can br

2 answers
  • 2021-01-20 11:54

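    Using MultiLabelBinarizer from sklearn:

    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer
    mlb = MultiLabelBinarizer()
    # If the labels are separated by a comma plus a space, split on ', ' instead.
    # sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
    pd.DataFrame(mlb.fit_transform(df['mentions'].str.split(',')),columns=mlb.classes_, index=df.date).groupby(level=0).sum()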
    Out[1745]: 
                alpha  beta  delta  gamma
    date                                 
    2018-01-01      2     1      0      1
    2018-01-02      0     1      0      0
    2018-01-03      0     0      1      0
    2018-01-05      1     0      0      0
    2018-01-07      1     0      0      0
    2018-01-10      0     0      1      1
    2018-01-11      0     0      0      1
    

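    Borrowing Zero's resample('D') to fill in the days with no mentions:

    # resample needs an aggregation such as sum(), and df.date must be datetime64
    pd.DataFrame(mlb.fit_transform(df['mentions'].str.split(',')),columns=mlb.classes_, index=df.date).groupby(level=0).sum().resample('D').sum()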
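
For anyone who wants to run the MultiLabelBinarizer approach end to end, here is a minimal sketch. The input frame is not shown in the post, so the sample data below is hypothetical, and it assumes the labels are separated by a comma plus a space.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample data (assumption: labels are separated by ', ').
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03"]),
    "mentions": ["alpha, beta, gamma", "alpha", "beta", "delta"],
})

mlb = MultiLabelBinarizer()
# One 0/1 indicator column per distinct label, row-aligned with df.
indicators = pd.DataFrame(
    mlb.fit_transform(df["mentions"].str.split(", ")),
    columns=mlb.classes_,
    index=df["date"],
)

# Collapse rows that share a date; alpha is counted twice on 2018-01-01.
counts = indicators.groupby(level=0).sum()
print(counts)
```

Splitting on the exact separator matters here: splitting "alpha, beta" on ',' alone would leave ' beta' with a leading space, and it would become a separate column from 'beta'.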
    
  • 2021-01-20 12:14

    If your end result is dummy columns, use pd.Series.str.get_dummies:

    # sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
    df.set_index('date').mentions.str.get_dummies(', ').groupby(level=0).sum()
    
                alpha  beta  delta  gamma
    date                                 
    2018-01-01      2     1      0      1
    2018-01-02      0     1      0      0
    2018-01-03      0     0      1      0
    2018-01-05      1     0      0      0
    2018-01-07      1     0      0      0
    2018-01-10      0     0      1      1
    2018-01-11      0     0      0      1
    

    As mentioned by @Zero, resample('D') also fills in the days with no mentions:

    df.set_index('date').mentions.str.get_dummies(', ').resample('D').sum()
    
                alpha  beta  delta  gamma
    date                                 
    2018-01-01      2     1      0      1
    2018-01-02      0     1      0      0
    2018-01-03      0     0      1      0
    2018-01-04      0     0      0      0
    2018-01-05      1     0      0      0
    2018-01-06      0     0      0      0
    2018-01-07      1     0      0      0
    2018-01-08      0     0      0      0
    2018-01-09      0     0      0      0
    2018-01-10      0     0      1      1
    2018-01-11      0     0      0      1
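
A self-contained sketch of the get_dummies route, in case it helps to see both variants side by side. The sample frame is hypothetical (the post does not show the input), and the explicit pd.to_datetime call is there because resample only works on a datetime-like index.

```python
import pandas as pd

# Hypothetical sample data (assumption: labels are separated by ', ').
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-01", "2018-01-03"],
    "mentions": ["alpha, beta, gamma", "alpha", "delta"],
})
df["date"] = pd.to_datetime(df["date"])  # resample requires a DatetimeIndex

# One 0/1 column per label, indexed by date.
dummies = df.set_index("date")["mentions"].str.get_dummies(", ")

# Variant 1: only dates that occur in the data.
per_day = dummies.groupby(level=0).sum()

# Variant 2: every calendar day in the range; 2018-01-02 appears as all zeros.
daily = dummies.resample("D").sum()
print(daily)
```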
    
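
Neither answer above literally explodes the column into rows, which is what the title asks for. On pandas 0.25 or later, DataFrame.explode is one possible way to do that; the following is a sketch of that alternative, not something taken from the answers, and it again uses hypothetical sample data with ', ' as the separator.

```python
import pandas as pd

# Hypothetical sample data (assumption: labels are separated by ', ').
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-03"]),
    "mentions": ["alpha, beta, gamma", "alpha", "delta"],
})

# Turn each string into a list, then give every list element its own row.
long = df.assign(mentions=df["mentions"].str.split(", ")).explode("mentions")

# Count (date, mention) pairs and pivot the mentions into columns.
counts = long.groupby(["date", "mentions"]).size().unstack(fill_value=0)
print(counts)
```

The long frame is the "one mention per row" shape the question describes, so an ordinary GroupBy works on it directly.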