Split (explode) pandas dataframe string entry to separate rows

一向 2020-11-21 05:03

I have a pandas dataframe in which one column of text strings contains comma-separated values. I want to split each CSV field and create a new row per entry (as

22 answers
  •  孤街浪徒
    2020-11-21 05:18

    Pandas >= 0.25

    Series and DataFrame both define an .explode() method that turns each element of a list-like into its own row. See the pandas docs section on "Exploding a list-like column".
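    Here is a minimal sketch of the two call forms, using made-up data. Series.explode() takes no column argument. DataFrame.explode(column) explodes one column and repeats the other columns' values, along with the original index label, for each new row. Scalars such as 'x' pass through unchanged.

```python
import pandas as pd

# Series: each list element gets its own row; the index label is repeated.
s = pd.Series([[1, 2], [3], 'x'])
exploded = s.explode()
print(exploded)

# DataFrame: explode column 'k'; column 'v' is repeated alongside it.
df = pd.DataFrame({'k': [[1, 2], [3]], 'v': ['a', 'b']})
df_exploded = df.explode('k')
print(df_exploded)
```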

    Your column holds comma-separated strings rather than lists. Split each string on the comma to get a list of elements, then call explode on that column.

    import pandas as pd
    
    df = pd.DataFrame({'var1': ['a,b,c', 'd,e,f'], 'var2': [1, 2]})
    df
        var1  var2
    0  a,b,c     1
    1  d,e,f     2
    
    df.assign(var1=df['var1'].str.split(',')).explode('var1')
    
      var1  var2
    0    a     1
    0    b     1
    0    c     1
    1    d     2
    1    e     2
    1    f     2
    
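    If you do this often, you could wrap the two steps in a small function. explode_csv_column is a hypothetical helper name, not part of pandas. The sketch assumes pandas >= 1.1, where explode gained the ignore_index parameter. That parameter renumbers the result 0..n-1 instead of repeating the original index labels.

```python
import pandas as pd


def explode_csv_column(df: pd.DataFrame, column: str, sep: str = ',') -> pd.DataFrame:
    """Hypothetical helper: split `column` on `sep` and give each piece its own row."""
    return (
        df.assign(**{column: df[column].str.split(sep)})  # strings -> lists
          .explode(column, ignore_index=True)             # lists -> rows, fresh 0..n-1 index
    )


df = pd.DataFrame({'var1': ['a,b,c', 'd,e,f'], 'var2': [1, 2]})
out = explode_csv_column(df, 'var1')
print(out)
```

    Drop ignore_index=True if you need the repeated labels to join the result back to the original frame.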

    Note that in pandas 0.25 through 1.2, explode works on a single column only. Since pandas 1.3, DataFrame.explode also accepts a list of columns.
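    Since pandas 1.3, DataFrame.explode can explode several columns in parallel when you pass it a list of column names. Below is a sketch with made-up data, assuming pandas >= 1.3. Within each row, the lists must have the same length in every exploded column; otherwise pandas raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'var1': ['a,b', 'c'], 'var2': ['1,2', '3'], 'var3': [10, 20]})

out = (
    df.assign(var1=df['var1'].str.split(','), var2=df['var2'].str.split(','))
      .explode(['var1', 'var2'])  # pandas >= 1.3; per-row list lengths must match
)
print(out)
```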


    explode also handles NaNs and empty strings sensibly, without any special-casing on your part.

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'var1': ['d,e,f', '', np.nan], 'var2': [1, 2, 3]})
    df
        var1  var2
    0  d,e,f     1
    1            2
    2    NaN     3
    
    df['var1'].str.split(',')
    
    0    [d, e, f]
    1           []
    2          NaN
    
    df.assign(var1=df['var1'].str.split(',')).explode('var1')
    
      var1  var2
    0    d     1
    0    e     1
    0    f     1
    1          2  # '' splits to [''], so the row keeps an empty string
    2  NaN     3  # NaN left untouched
    
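    Strictly speaking, row 1 above is not an empty list. str.split on an empty string returns [''], a list holding one empty string, which pandas displays as []. Exploding that yields an empty string, whereas a genuinely empty list explodes to NaN. The sketch below uses made-up data to show both cases.

```python
import numpy as np
import pandas as pd

# str.split on '' gives [''] (one empty string), not []; NaN stays NaN.
split = pd.Series(['d,e', '', np.nan]).str.split(',')
print(split.tolist())

# A genuinely empty list is different: explode turns it into NaN.
exploded = pd.Series([['d', 'e'], [], ['']]).explode()
print(exploded)
```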

    This is a significant advantage over solutions based on ravel + repeat, which silently drop rows holding empty lists and fail on NaNs.
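    For comparison, here is a minimal sketch of the repeat-based pattern, using made-up data that is already split into lists. It flattens with a list comprehension, standing in for np.ravel or np.concatenate, and uses np.repeat to repeat the index and var2 by each row's list length. A zero-length list makes its row vanish, and a NaN cell would make len() raise a TypeError. explode keeps that row as NaN instead.

```python
import numpy as np
import pandas as pd

# Column already split into lists; row 1 holds a genuinely empty list.
df = pd.DataFrame({'var1': [['d', 'e', 'f'], [], ['g']], 'var2': [1, 2, 3]})

# Repeat-based pattern: flatten the lists, then repeat the other column
# and the index by each row's list length.
lengths = df['var1'].map(len).to_numpy()  # a NaN cell here would raise TypeError
flat = [item for items in df['var1'] for item in items]
repeated = pd.DataFrame(
    {'var1': flat, 'var2': np.repeat(df['var2'].to_numpy(), lengths)},
    index=np.repeat(df.index.to_numpy(), lengths),
)
print(repeated)  # row 1 (length 0) has disappeared

# explode keeps row 1, filling it with NaN.
exploded = df.explode('var1')
print(exploded)
```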
