Split (explode) pandas dataframe string entry to separate rows

一向 2020-11-21 05:03

I have a pandas dataframe in which one column of text strings contains comma-separated values. I want to split each CSV field and create a new row per entry (as

22 answers
  •  时光说笑
    2020-11-21 05:35

    I had a similar problem. My solution was to convert the dataframe to a list of dictionaries first, then do the split. Here is the function:

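    import re
    import pandas as pd
    
    def separate_row(df, column_name):
        """Return a new DataFrame with one row per comma-separated value in column_name."""
        rows = []
        # Work on plain dicts: one dict per original row.
        for row_dict in df.to_dict('records'):
            for word in re.split(',', row_dict[column_name]):
                # Copy the row so the other columns are repeated unchanged.
                row = row_dict.copy()
                row[column_name] = word
                rows.append(row)
        return pd.DataFrame(rows)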
    

    Example:

    >>> from pandas import DataFrame
    >>> a = DataFrame([{'var1': 'a,b,c', 'var2': 1},
    ...                {'var1': 'd,e,f', 'var2': 2}])
    >>> a
        var1  var2
    0  a,b,c     1
    1  d,e,f     2
    >>> separate_row(a, "var1")
      var1  var2
    0    a     1
    1    b     1
    2    c     1
    3    d     2
    4    e     2
    5    f     2
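
For comparison, pandas 0.25 and later include a built-in `DataFrame.explode` that can produce the same result without a Python-level loop. The following is a minimal sketch, assuming the same example frame as above; the helper name `separate_row_explode` and its `sep` parameter are hypothetical and not part of pandas.

```python
import pandas as pd

def separate_row_explode(df, column_name, sep=","):
    # Hypothetical helper: turn each delimited string into a list of parts.
    # Assumes sep is a single character, so it is matched literally.
    split_col = df[column_name].str.split(sep)
    # explode gives every list element its own row, repeating the other columns.
    out = df.assign(**{column_name: split_col}).explode(column_name)
    # explode repeats the original index label for each new row, so renumber.
    return out.reset_index(drop=True)

a = pd.DataFrame([{'var1': 'a,b,c', 'var2': 1},
                  {'var1': 'd,e,f', 'var2': 2}])
result = separate_row_explode(a, "var1")
print(result)
```

The original frame is left unmodified, since `assign` returns a copy.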
    

    With a small change, the function can also handle cells that already hold lists instead of comma-separated strings.
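
One possible way to make that change is sketched below. It assumes every cell in the target column is either a string or a list/tuple (missing values such as NaN are not handled); the name `separate_row_any` and the `sep` parameter are illustrative and not from the answer above.

```python
import pandas as pd

def separate_row_any(df, column_name, sep=","):
    rows = []
    for row_dict in df.to_dict("records"):
        value = row_dict[column_name]
        # Strings are split on the separator; lists and tuples are used as-is.
        items = value.split(sep) if isinstance(value, str) else list(value)
        for item in items:
            # Copy the row so the other columns are repeated unchanged.
            row = row_dict.copy()
            row[column_name] = item
            rows.append(row)
    return pd.DataFrame(rows)

# Example frame mixing a list cell and a comma-separated string cell.
b = pd.DataFrame([{'var1': ['a', 'b'], 'var2': 1},
                  {'var1': 'c,d', 'var2': 2}])
result = separate_row_any(b, "var1")
print(result)
```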
