Split rows in a pandas DataFrame

無奈伤痛 2021-01-05 16:17

I'm stuck on how to split a pandas DataFrame by rows.

I have a DataFrame with a column in which several values, separated by \r\n, sit together in a single cell.

4 answers
  • 2021-01-05 16:39

    As noted in the comments, str.split() followed by explode does the job. If you are on a pandas version older than 0.25, where explode is not available, you can split with expand=True and use melt afterward:

    (pd.concat( (df.Shape.str.split('\r\n', expand=True), 
                df[['Color','Price']]),
              axis=1)
       .melt(id_vars=['Color', 'Price'], value_name='Shape')
       .dropna()
    )
    

    Output:

       Color  Price variable      Shape
    0  Green     10        0  Rectangle
    1   Blue     15        0  Rectangle
    2  Green     10        1   Triangle
    3   Blue     15        1   Triangle
    4  Green     10        2   Octangle
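
    For anyone who wants to run the melt approach end to end, here is a self-contained sketch. The sample frame is an assumption modeled on the output above, and the final drop and reset_index steps are additions for tidiness, not part of the snippet above.

```python
import pandas as pd

# Assumed sample data: cells in Shape hold several values joined by '\r\n'.
df = pd.DataFrame({
    'Color': ['Green', 'Blue'],
    'Shape': ['Rectangle\r\nTriangle\r\nOctangle', 'Rectangle\r\nTriangle'],
    'Price': [10, 15],
})

# One column per split piece (0, 1, 2, ...); shorter rows are padded with missing values.
pieces = df['Shape'].str.split('\r\n', expand=True)

long_df = (
    pd.concat([pieces, df[['Color', 'Price']]], axis=1)
      .melt(id_vars=['Color', 'Price'], value_name='Shape')
      .dropna(subset=['Shape'])      # drop the padding rows
      .drop(columns='variable')      # the piece position is no longer needed
      .reset_index(drop=True)
)
print(long_df)
```

    Note that melt stacks the pieces column by column, so rows come out grouped by piece position rather than by original row.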
    
  • 2021-01-05 16:57

    You can do:

    df["Shape"]=df["Shape"].str.split("\r\n")
    print(df.explode("Shape").reset_index(drop=True))
    

    Output:

       Color      Shape  Price
    0  Green  Rectangle     10
    1  Green   Triangle     10
    2  Green   Octangle     10
    3   Blue  Rectangle     15
    4   Blue   Triangle     15
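
    If you would rather not overwrite the original column, a variant along these lines may help. It is only a sketch: the sample frame is assumed, and ignore_index=True needs pandas 1.1 or later.

```python
import pandas as pd

# Assumed sample data matching the output above.
df = pd.DataFrame({
    'Color': ['Green', 'Blue'],
    'Shape': ['Rectangle\r\nTriangle\r\nOctangle', 'Rectangle\r\nTriangle'],
    'Price': [10, 15],
})

# assign() returns a new frame, so df keeps its unsplit Shape column.
# ignore_index=True renumbers the exploded rows 0..n-1 in the same call.
result = (
    df.assign(Shape=df['Shape'].str.split('\r\n'))
      .explode('Shape', ignore_index=True)
)
print(result)
```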
    
  • 2021-01-05 17:02

    First, split Shape on whitespace, which gives you a list of shapes per row (str.split() with no arguments splits on any whitespace, including \r\n). Then use df.explode to unpack each list into one row per shape:

    df["Shape"] = df.Shape.str.split()
    df.explode("Shape")
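
    One caveat with splitting on whitespace: it also breaks values that themselves contain spaces. The small comparison below uses a made-up shape name to show the difference from splitting on '\r\n' explicitly.

```python
import pandas as pd

# Hypothetical data: one shape name contains a space.
s = pd.Series(['Right Triangle\r\nCircle'])

# No argument: split on any run of whitespace (spaces, \r, \n, tabs).
by_whitespace = s.str.split().explode().tolist()

# Explicit delimiter: split only where '\r\n' occurs.
by_crlf = s.str.split('\r\n').explode().tolist()

print(by_whitespace)
print(by_crlf)
```

    If your values never contain spaces, both forms give the same result.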
    
  • 2021-01-05 17:03

    This might not be the most efficient way to do it, but I can confirm that it works with the sample df:

    import pandas as pd

    data = [['Green', 'Rectangle\r\nTriangle\r\nOctangle', 10], ['Blue', 'Rectangle\r\nTriangle', 15]]
    df = pd.DataFrame(data, columns=['Color', 'Shape', 'Price'])

    # Collect one dict per (row, shape) pair and build the frame once at the end.
    # DataFrame.append was removed in pandas 2.0, so it is not used here.
    rows = []
    for index, row in df.iterrows():
        for shape in row['Shape'].split('\r\n'):
            rows.append({'Color': row['Color'], 'Shape': shape, 'Price': row['Price']})

    new_df = pd.DataFrame(rows, columns=['Color', 'Shape', 'Price'])
    print(new_df)


    Output:

       Color      Shape  Price
    0  Green  Rectangle     10
    1  Green   Triangle     10
    2  Green   Octangle     10
    3   Blue  Rectangle     15
    4   Blue   Triangle     15
    