python split data frame columns into multiple rows

无人及你 2021-02-10 15:40

I have a dataframe like this:

--------------------------------------------------------------------
Product        ProductType     SKU                Size
-------         


        
1 answer
  •  一向 (original poster)
    2021-02-10 16:02

    This approach is fragile, so use it with caution:

    Convert the Product column into lists whose lengths match the lists in another column (say, SKU). This will not work if the lists in SKU and Size have different lengths.

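    # Wrap each value in a one-element list, then repeat it len(SKU) times.
    # (map(list) would split a multi-character string into its characters.)
    df["Product"] = df["Product"].map(lambda x: [x]) * df["SKU"].map(len)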
    
    Out[184]: 
                        SKU           Size       Product
    0  [111, 222, 333, 444]  [XS, S, M, L]  [a, a, a, a]
    1            [555, 666]         [M, L]        [b, b]
    

    Sum each column (summing lists concatenates them), convert the result with to_dict(), and pass it to the DataFrame constructor:

    pd.DataFrame(df.sum().to_dict())
    Out[185]: 
      Product  SKU Size
    0       a  111   XS
    1       a  222    S
    2       a  333    M
    3       a  444    L
    4       b  555    M
    5       b  666    L
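
The two steps above can be sketched end to end as follows. This is a minimal sketch, not the answer's exact code: the sample frame is reconstructed from the output shown above, the helper name `explode_lists` is hypothetical, and it flattens each column with `itertools.chain` rather than `df.sum()`, so that a length check can run first and fail with a clear error when SKU and Size disagree.

```python
from itertools import chain

import pandas as pd


def explode_lists(df, list_cols, repeat_cols):
    """Hypothetical helper: build one output row per list element."""
    # Lengths of the lists in the first list column drive the repetition.
    lengths = df[list_cols[0]].map(len)

    # Guard against the failure mode the answer warns about.
    for col in list_cols[1:]:
        if not df[col].map(len).equals(lengths):
            raise ValueError(
                f"list lengths in {col!r} differ from {list_cols[0]!r}"
            )

    out = {}
    for col in repeat_cols:
        # Repeat each scalar once per list element; [x] keeps strings intact.
        out[col] = list(
            chain.from_iterable([x] * n for x, n in zip(df[col], lengths))
        )
    for col in list_cols:
        # Concatenate the per-row lists into one flat column.
        out[col] = list(chain.from_iterable(df[col]))
    return pd.DataFrame(out)


df = pd.DataFrame({
    "Product": ["a", "b"],
    "SKU": [[111, 222, 333, 444], [555, 666]],
    "Size": [["XS", "S", "M", "L"], ["M", "L"]],
})
result = explode_lists(df, list_cols=["SKU", "Size"], repeat_cols=["Product"])
print(result)
```

Because the helper builds a new dataframe from plain Python lists, the input `df` is left unchanged.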
    

    Edit:

    For several columns, you can define the columns to be repeated:

    cols_to_be_repeated = ["Product", "ProductType"]
    

    Save the rows whose SKU is None in another dataframe:

    na_df = df[pd.isnull(df["SKU"])].copy()
    

    Drop the rows containing None from the original dataframe:

    df.dropna(inplace = True)
    

    Iterate over those columns:

    for col in cols_to_be_repeated:
        df[col] = df[col].map(lambda x: [x]) * df["SKU"].map(len)
    

    And use the same approach:

    pd.concat([pd.DataFrame(df.sum().to_dict()), na_df])
    
            Product ProductType    SKU  Size
    0       T-shirt         Top  111.0    XS
    1       T-shirt         Top  222.0     S
    2       T-shirt         Top  333.0     M
    3       T-shirt         Top  444.0     L
    4  Pant(Flared)     Bottoms  555.0     M
    5  Pant(Flared)     Bottoms  666.0     L
    2       Sweater         Top    NaN  None
    

    It might be better to work on a copy of the original dataframe.
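
As an alternative that leaves the original dataframe untouched, recent pandas versions offer `DataFrame.explode`, which returns a new frame. The sketch below assumes pandas 1.3 or later (where `explode` accepts a list of columns) and rebuilds the sample data from the output above; the variable names are illustrative. Rows whose SKU and Size are missing pass through as a single row, so the separate `na_df` step is not needed here.

```python
import pandas as pd

# Sample data reconstructed from the output shown in the answer.
df = pd.DataFrame({
    "Product": ["T-shirt", "Pant(Flared)", "Sweater"],
    "ProductType": ["Top", "Bottoms", "Top"],
    "SKU": [[111, 222, 333, 444], [555, 666], None],
    "Size": [["XS", "S", "M", "L"], ["M", "L"], None],
})

# explode returns a new frame; df itself is not modified.
# The lists in SKU and Size must have matching lengths in every row.
exploded = df.explode(["SKU", "Size"], ignore_index=True)
print(exploded)
```

The same length restriction applies as in the list-multiplication approach: if SKU and Size disagree in any row, `explode` raises a `ValueError` instead of silently misaligning the values.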
