Split (explode) pandas dataframe string entry to separate rows

一向 2020-11-21 05:03

I have a pandas dataframe in which one column of text strings contains comma-separated values. I want to split each CSV field and create a new row per entry (as

22 answers
  •  长发绾君心
    2020-11-21 05:31

    My version of the solution to add to this collection! :-)

    # Original problem: `a` is the input, `b` is the expected output
    from pandas import DataFrame
    a = DataFrame([{'var1': 'a,b,c', 'var2': 1},
                   {'var1': 'd,e,f', 'var2': 2}])
    b = DataFrame([{'var1': 'a', 'var2': 1},
                   {'var1': 'b', 'var2': 1},
                   {'var1': 'c', 'var2': 1},
                   {'var1': 'd', 'var2': 2},
                   {'var1': 'e', 'var2': 2},
                   {'var1': 'f', 'var2': 2}])
    ### My solution
    import pandas as pd
    import functools
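    def expand_on_cols(df, fuse_cols, delim=","):
        """Split each column in fuse_cols on delim, giving one row per value."""
        def expand_on_col(df, fuse_col):
            col_order = df.columns  # remember the original column order
            df_expanded = pd.DataFrame(
                # Keep every other column in the index so explode() repeats it
                df.set_index([x for x in df.columns if x != fuse_col])[fuse_col]
                .apply(lambda x: x.split(delim))
                .explode()
            ).reset_index()
            return df_expanded[col_order]
        # Expand the columns one at a time, feeding each result into the next
        all_expanded = functools.reduce(expand_on_col, fuse_cols, df)
        return all_expanded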
    
    assert b.equals(expand_on_cols(a, ["var1"], delim=","))
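
For comparison, here is a shorter sketch of the same split-then-explode idea for a single column. It assumes pandas 0.25 or later (where `DataFrame.explode` exists), and `split_rows` is a hypothetical helper name used only for illustration, not part of the answer above.

```python
import pandas as pd

# Same sample frame as in the answer above.
a = pd.DataFrame([{'var1': 'a,b,c', 'var2': 1},
                  {'var1': 'd,e,f', 'var2': 2}])

def split_rows(df, col, delim=","):
    """Hypothetical helper: one output row per delimited value in `col`."""
    return (
        df.assign(**{col: df[col].str.split(delim)})  # strings -> lists
          .explode(col)                               # one row per list item
          .reset_index(drop=True)                     # fresh 0..n-1 index
    )

result = split_rows(a, "var1")
print(result)
```

Because `explode` is called on the frame itself, the other columns are repeated automatically, so there is no need to move them into the index first. To expand several columns, the helper could be applied once per column, much like the `functools.reduce` call above.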
    
