Convert cells in dataframe with multiple values to multiple rows

无人及你 2021-01-14 15:17

My data is like this:

Name    test1     test2      Count
Emp1    X,Y        A           1
Emp2    X          A,B,C       2
Emp3    Z          C           3
        
3 Answers
  •  抹茶落季
    2021-01-14 15:25

    I don't think the answer highlighted by @wen adapts to this question in a straightforward way, so I'll propose a solution.

    You can write a function that takes a DataFrame, the column to expand, and the separator used in that column, then chain calls as many times as needed.

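    import pandas as pd

    def expand(df, col, sep=','):
        # Split each cell of `col` into a list of values.
        r = df[col].str.split(sep)
        # Repeat every row once per value in its `col` cell (to_numpy() yields a plain NumPy array), then flatten the lists.
        d = {c: df[c].to_numpy().repeat(r.str.len()) for c in df.columns}
        d[col] = [i for sub in r for i in sub]
        return pd.DataFrame(d)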
    
    expand(expand(df, 'test1'), 'test2')
    
        Name    test1   test2   Count
    0   Emp1    X       A       1
    1   Emp1    Y       A       1
    2   Emp2    X       A       2
    3   Emp2    X       B       2
    4   Emp2    X       C       2
    5   Emp3    Z       C       3
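
    The output above assumes the question's table is already loaded as df. A minimal sketch of that setup follows; the question only shows the printed table, so storing the multi-valued cells as plain comma-separated strings and Count as integers is an assumption.

```python
import pandas as pd

# Assumed reconstruction of the question's table: multi-valued cells are
# ordinary comma-separated strings, and Count is an integer column.
df = pd.DataFrame({
    "Name": ["Emp1", "Emp2", "Emp3"],
    "test1": ["X,Y", "X", "Z"],
    "test2": ["A", "A,B,C", "C"],
    "Count": [1, 2, 3],
})

print(df)
```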
    

    Suppose you also have a column whose values use a different separator:

    df['test3'] = ['X1|X2|X3', 'X4', 'X5']
    

    such that

    >>> print(df)
    
        Name    test1   test2   Count   test3
    0   Emp1    X,Y     A       1       X1|X2|X3
    1   Emp2    X       A,B,C   2       X4
    2   Emp3    Z       C       3       X5
    

    Then,

    >>> expand(df,'test3', '|')
    
        Name    test1   test2   Count   test3
    0   Emp1    X,Y     A       1       X1
    1   Emp1    X,Y     A       1       X2
    2   Emp1    X,Y     A       1       X3
    3   Emp2    X       A,B,C   2       X4
    4   Emp3    Z       C       3       X5
    

    If you expect the number of columns to grow substantially, you can define a function expand_all to avoid nesting calls like expand(expand(expand(expand(...)))). For example:

    def expand_all(df, cols, seps):
        ret = df
        for c, s in zip(cols, seps):
            ret = expand(ret, c, s)
        return ret
    
    >>> expand_all(df, ['test1', 'test2', 'test3'], [',', ',', '|'])
    
        Name    test1   test2   Count   test3
    0   Emp1    X       A       1       X1
    1   Emp1    X       A       1       X2
    2   Emp1    X       A       1       X3
    3   Emp1    Y       A       1       X1
    4   Emp1    Y       A       1       X2
    5   Emp1    Y       A       1       X3
    6   Emp2    X       A       2       X4
    7   Emp2    X       B       2       X4
    8   Emp2    X       C       2       X4
    9   Emp3    Z       C       3       X5
    

    Or however suitable ;)
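
    As a possible alternative to the hand-written helper, pandas 0.25 and later ship DataFrame.explode, which turns list-valued cells into rows. The sketch below is not the approach used in this answer. The function name expand_with_explode is made up for illustration, and the DataFrame is an assumed reconstruction of the question's table with single-character separators.

```python
import pandas as pd

# Assumed reconstruction of the question's table.
df = pd.DataFrame({
    "Name": ["Emp1", "Emp2", "Emp3"],
    "test1": ["X,Y", "X", "Z"],
    "test2": ["A", "A,B,C", "C"],
    "Count": [1, 2, 3],
})

def expand_with_explode(df, cols, seps):
    """Hypothetical helper: split each column in `cols` and explode it into rows."""
    out = df.copy()
    for col, sep in zip(cols, seps):
        # Turn each cell into a list of values, then give each value its own row.
        out[col] = out[col].str.split(sep)
        # explode() repeats the original index labels, so renumber the rows.
        out = out.explode(col).reset_index(drop=True)
    return out

result = expand_with_explode(df, ["test1", "test2"], [",", ","])
print(result)
```

    On this data the result should contain the same six rows as expand(expand(df, 'test1'), 'test2').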


    Detail:

    >>> expand(df, 'test1')
    
        Name    test1   test2   Count
    0   Emp1    X       A       1
    1   Emp1    Y       A       1
    2   Emp2    X       A,B,C   2
    3   Emp3    Z       C       3
    
    >>> expand(df, 'test2')
    
        Name    test1   test2   Count
    0   Emp1    X,Y     A       1
    1   Emp2    X       A       2
    2   Emp2    X       B       2
    3   Emp2    X       C       2
    4   Emp3    Z       C       3
    
    >>> expand(expand(df, 'test2'), 'test1') 
    
        Name    test1   test2   Count
    0   Emp1    X       A       1
    1   Emp1    Y       A       1
    2   Emp2    X       A       2
    3   Emp2    X       B       2
    4   Emp2    X       C       2
    5   Emp3    Z       C       3
    
    
    >>> expand(expand(df, 'test2'), 'test1').eq(expand(expand(df, 'test1'), 'test2')).all()
    
    Name     True
    test1    True
    test2    True
    Count    True
    dtype: bool
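
    One caveat about the check above: it compares rows position by position, and it passes here because no row of the question's data has several values in both test1 and test2. When a row does, the two expansion orders should still yield the same rows, but in a different sequence. The sketch below illustrates this with made-up data (the demo frame is hypothetical) and a helper written in the same spirit as expand.

```python
import pandas as pd

def expand(df, col, sep=","):
    # Same idea as the helper in this answer: one output row per value in `col`.
    r = df[col].str.split(sep)
    d = {c: df[c].to_numpy().repeat(r.str.len()) for c in df.columns}
    d[col] = [i for sub in r for i in sub]
    return pd.DataFrame(d)

# Hypothetical data: the first row has several values in BOTH columns.
demo = pd.DataFrame({
    "Name": ["Emp1", "Emp2"],
    "test1": ["X,Y", "Z"],
    "test2": ["A,B", "C"],
    "Count": [1, 2],
})

first = expand(expand(demo, "test1"), "test2")
second = expand(expand(demo, "test2"), "test1")

# Plain tuples make it easy to compare with and without regard to row order.
rows_first = list(first.itertuples(index=False, name=None))
rows_second = list(second.itertuples(index=False, name=None))

print(rows_first == rows_second)                  # positional comparison
print(sorted(rows_first) == sorted(rows_second))  # order-insensitive comparison
```

    If row order matters downstream, sorting the expanded frame by its columns gives a stable result whichever column is expanded first.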
    
