How to create a new row for each comma-separated value in a column in pandas

野性不改 2021-01-15 04:25

I have a dataframe like this:

text                   category 
sfsd sgvv              abc,xyz
zydf sefs sdfsd        yyy
dfsd dsrgd dggr        xyz
eter vxg          
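
To try the answers on this data, the sample can be rebuilt roughly like this. It is only a sketch: the empty category in the last row is assumed to be a missing value (NaN), which the table above does not actually state.

```python
import numpy as np
import pandas as pd

# Sample frame from the question. The blank category in the last row
# is assumed to be NaN (an assumption, not stated in the post).
df = pd.DataFrame({
    "text": ["sfsd sgvv", "zydf sefs sdfsd", "dfsd dsrgd dggr", "eter vxg"],
    "category": ["abc,xyz", "yyy", "xyz", np.nan],
})
print(df)
```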


        
4 Answers
  • 2021-01-15 04:47

    Try using set_index + stack + str.split + unstack + reset_index for much older versions:

    print(df.set_index('text')
          .stack()
          .str.split(',', expand=True)
          .stack()
          .unstack(-2)
          .reset_index(-1, drop=True)
          .reset_index())
    
  • 2021-01-15 04:57

    Use DataFrame.explode (pandas 0.25+) with Series.str.split:

    df1 = (df.assign(category = df['category'].str.split(','))
             .explode('category')
             .reset_index(drop=True))
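
If some values are separated by ", " rather than ",", the split leaves stray spaces behind. A minimal sketch of one way to handle that, along with a missing category, is below. The sample frame is hypothetical and the `str.strip` step is an addition, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a ", " separator and a missing category.
df = pd.DataFrame({
    "text": ["sfsd sgvv", "zydf sefs sdfsd", "eter vxg"],
    "category": ["abc, xyz", "yyy", np.nan],
})

# Split each string into a list, then give every list element its own row.
df1 = (df.assign(category=df["category"].str.split(","))
         .explode("category")
         .reset_index(drop=True))

# Remove the spaces left around each value after splitting on ",".
df1["category"] = df1["category"].str.strip()
print(df1)
```

The row with the missing category is kept as a single row with NaN; add `.dropna(subset=['category'])` if you would rather drop it.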
    

    For older pandas versions, first use DataFrame.set_index on the column(s) that do not contain the separator, then split with Series.str.split and reshape with DataFrame.stack. Finally call DataFrame.reset_index twice: first to remove the second level of the MultiIndex, then to convert the index back to a column:

    df1 = (df.set_index('text')['category']
             .str.split(',', expand=True)
             .stack()
             .reset_index(level=1, drop=True)
             .reset_index(name='category'))
    print(df1)
                  text category
    0        sfsd sgvv      abc
    1        sfsd sgvv      xyz
    2  zydf sefs sdfsd      yyy
    3  dfsd dsrgd dggr      xyz
    4     eter vxg wfe      abc
    5       dfvf ertet      abc
    6       dfvf ertet      xyz
    
  • 2021-01-15 04:57

    The code below gives the output you need, assuming df is the name of your dataframe.

    import pandas as pd

    # Collect the output rows column by column.
    new_df_skel = {'text': [], 'category': []}

    for index, item in df.iterrows():
        unref_cat = item['category']
        if isinstance(unref_cat, str) and "," in unref_cat:
            # One output row per comma-separated value.
            for part in unref_cat.split(','):
                new_df_skel['category'].append(part)
                new_df_skel['text'].append(item['text'])
        else:
            # Single or missing category: keep the row unchanged.
            new_df_skel['category'].append(unref_cat)
            new_df_skel['text'].append(item['text'])

    new_dataset = pd.DataFrame(new_df_skel)
    

    Have a great day!

  • 2021-01-15 05:01

    Linking to this question, try the following code for your dataframe:

    We can first split the column, expand it, stack it and then join it back to the original df like below:

    (df.drop('category', axis=1)
       .join(df['category'].str.split(',', expand=True)
                           .stack()
                           .reset_index(level=1, drop=True)
                           .rename('category')))
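
The join works because the stacked Series keeps the original row labels, so rows with several categories get repeated. Here is a small sketch with a hypothetical two-row frame that makes this visible. The explicit `dropna()` is an addition so the result does not depend on whether `stack()` drops missing values itself.

```python
import pandas as pd

# Hypothetical two-row sample based on the question's data.
df = pd.DataFrame({
    "text": ["sfsd sgvv", "zydf sefs sdfsd"],
    "category": ["abc,xyz", "yyy"],
})

# One value per row; the index repeats the original row label.
cats = (df["category"].str.split(",", expand=True)
          .stack()
          .dropna()  # added: drop padding left by expand=True, if any remains
          .reset_index(level=1, drop=True)
          .rename("category"))
print(cats.index.tolist())  # [0, 0, 1]

# join aligns on the index, so row 0 of df appears once per category.
result = df.drop("category", axis=1).join(cats)
print(result)
```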
    