I have a pandas DataFrame in which one column of text strings contains comma-separated values. I want to split each CSV field and create a new row per entry (as
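Here is a fairly straightforward method that uses the `split` method from the pandas `str` accessor and then uses NumPy to flatten the resulting rows into a single array. The matching values are retrieved by repeating each value of the non-split column the correct number of times with `np.repeat`. This assumes every string splits into the same number of values: with `expand=True`, shorter rows are padded with `None`, and that padding would end up as extra rows in the result.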
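```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': ['a,b,c', 'd,e,f'], 'var2': [1, 2]})  # example input matching the output below
var1 = df.var1.str.split(',', expand=True).values.ravel()  # one column per value, flattened row by row
var2 = np.repeat(df.var2.values, len(var1) // len(df))  # // because np.repeat needs an integer count
pd.DataFrame({'var1': var1, 'var2': var2})
```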
```
  var1  var2
0    a     1
1    b     1
2    c     1
3    d     2
4    e     2
5    f     2
```
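If the rows can split into different numbers of values, the same idea still works when `np.repeat` is given one count per row instead of a single scalar. The sketch below is an illustration, not part of the original answer: the example frame is hypothetical, and it assumes `var1` has no missing values (a `NaN` would make `str.len()` return a float).

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: rows split into different numbers of values.
df = pd.DataFrame({'var1': ['a,b,c', 'd,e', 'f'], 'var2': [1, 2, 3]})

# Split each string into a Python list (no expand=True, so no None padding).
parts = df['var1'].str.split(',')

# Number of entries produced by each row.
counts = parts.str.len().to_numpy()

# Flatten the lists in row order and repeat each var2 value once per entry.
var1 = np.concatenate(parts.to_numpy())
var2 = np.repeat(df['var2'].to_numpy(), counts)

result = pd.DataFrame({'var1': var1, 'var2': var2})
print(result)
```

On pandas 0.25 or later, `df.assign(var1=df['var1'].str.split(',')).explode('var1')` gives the same rows, except that it keeps the original (now duplicated) index labels.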