I'm stuck on how to split a pandas DataFrame cell into multiple rows.
I have a DataFrame with a column where several values sit in one cell, separated by \r\n.
As commented, str.split()
followed by explode
is helpful. If you are on a pandas version older than 0.25, which does not have explode, you can split with expand=True and use melt
afterward:
(pd.concat((df.Shape.str.split('\r\n', expand=True),
            df[['Color', 'Price']]),
           axis=1)
   .melt(id_vars=['Color', 'Price'], value_name='Shape')
   .dropna()
)
Output:
   Color  Price variable      Shape
0  Green     10        0  Rectangle
1   Blue     15        0  Rectangle
2  Green     10        1   Triangle
3   Blue     15        1   Triangle
4  Green     10        2   Octangle
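For anyone who wants to run this end to end, here is a minimal sketch of the same split-then-melt idea. The sample frame is an assumption (it mirrors the one used elsewhere in this thread), and dropping the helper column named variable plus resetting the index are my additions, not part of the answer above.

```python
import pandas as pd

# Sample frame (assumed to match the one in the question).
df = pd.DataFrame(
    [["Green", "Rectangle\r\nTriangle\r\nOctangle", 10],
     ["Blue", "Rectangle\r\nTriangle", 15]],
    columns=["Color", "Shape", "Price"],
)

# One column per shape; cells with fewer shapes are padded with missing values.
wide = df["Shape"].str.split("\r\n", expand=True)

long_df = (
    pd.concat([wide, df[["Color", "Price"]]], axis=1)
    .melt(id_vars=["Color", "Price"], value_name="Shape")
    .dropna(subset=["Shape"])   # drop the padding rows
    .drop(columns="variable")   # helper column created by melt
    .reset_index(drop=True)
)
print(long_df)
```

Note that melt groups the rows by shape position rather than by original row, so the row order differs from what explode gives.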
You can do:
df["Shape"]=df["Shape"].str.split("\r\n")
print(df.explode("Shape").reset_index(drop=True))
Output:
   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15
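If you would rather not overwrite the Shape column of the original frame, the same two steps can be written as one chain. This is only a sketch: it assumes pandas 1.1 or newer for the ignore_index argument of explode, and the sample frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample frame (assumed to match the one in the question).
df = pd.DataFrame(
    [["Green", "Rectangle\r\nTriangle\r\nOctangle", 10],
     ["Blue", "Rectangle\r\nTriangle", 15]],
    columns=["Color", "Shape", "Price"],
)

result = (
    df.assign(Shape=df["Shape"].str.split("\r\n"))  # lists in a new frame, df stays as is
      .explode("Shape", ignore_index=True)          # one row per list element, index 0..n-1
)
print(result)
```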
First, split the Shape column on whitespace (str.split() with no arguments does this, and \r\n counts as whitespace), which gives you a list of shapes in each cell. Then use df.explode
to unpack each list and create a new row for every shape:
df["Shape"] = df.Shape.str.split()
df.explode("Shape")
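One caveat with a bare str.split(): it splits on any whitespace, so it would also break apart a value that contains a space. A small sketch comparing the two; the value "Right Triangle" is made up for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical cell with a multi-word shape name.
s = pd.Series(["Rectangle\r\nRight Triangle"])

by_whitespace = s.str.split().tolist()
by_linebreak = s.str.split("\r\n").tolist()

print(by_whitespace)  # [['Rectangle', 'Right', 'Triangle']]
print(by_linebreak)   # [['Rectangle', 'Right Triangle']]
```

If the cells can contain spaces, splitting on "\r\n" explicitly is the safer choice.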
This might not be the most efficient way to do it but I can confirm that it works with the sample df:
import pandas as pd

data = [['Green', 'Rectangle\r\nTriangle\r\nOctangle', 10], ['Blue', 'Rectangle\r\nTriangle', 15]]
df = pd.DataFrame(data, columns=['Color', 'Shape', 'Price'])
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = []
for index, row in df.iterrows():
    # one output row per shape in the cell
    for shape in row['Shape'].split('\r\n'):
        rows.append({'Color': row['Color'], 'Shape': shape, 'Price': row['Price']})
new_df = pd.DataFrame(rows, columns=['Color', 'Shape', 'Price'])
print(new_df)
Output:
   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15