I have a pandas.DataFrame as follows:
df1 =
a b
0 1 2
1 3 4
I'd like to repeat each row three times, so that it becomes:
I do not know if it is more efficient than your loop, but it is easy enough to construct as:
Code:
pd.concat([df] * 3).sort_index()
Test Code:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))
print(pd.concat([df] * 3).sort_index())
Results:
a b
0 1 2
0 1 2
0 1 2
1 3 4
1 3 4
1 3 4
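If a fresh 0..n-1 index is wanted instead of the repeated labels, the concatenated result can simply be renumbered afterwards with reset_index. A minimal sketch, reusing the same two-row df:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))

# Stack three copies, bring the copies of each row together by sorting on the
# (repeated) index, then drop the old labels in favour of a fresh RangeIndex.
out = pd.concat([df] * 3).sort_index().reset_index(drop=True)
print(out)
```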
You can use numpy.repeat with the scalar parameter 3 and then pass the columns parameter to the DataFrame constructor:
import numpy as np
df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(df)
a b
0 1 2
1 1 2
2 1 2
3 3 4
4 3 4
5 3 4
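One caveat with going through df.values: for a frame with mixed column types, .values is a single NumPy array (object dtype in this case), so the per-column dtypes are lost. Repeating the index and selecting with .loc keeps them. A sketch with a hypothetical mixed-type frame, assuming the original index is unique:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one string column.
df = pd.DataFrame({'a': [1, 3], 'b': ['x', 'y']})

# Via .values: both columns are squeezed into one object array.
via_values = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

# Via label selection: each column keeps its own dtype.
# Assumes df.index is unique; reset_index gives the 0..5 numbering.
via_loc = df.loc[df.index.repeat(3)].reset_index(drop=True)

print(via_values.dtypes)
print(via_loc.dtypes)
```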
If you really want a duplicated index (which can complicate some pandas functions, e.g. reindex fails on it):
r = np.repeat(np.arange(len(df.index)), 3)
df = pd.DataFrame(df.values[r], df.index[r], df.columns)
print(df)
a b
0 1 2
0 1 2
0 1 2
1 3 4
1 3 4
1 3 4
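To illustrate the reindex caveat: pandas refuses to reindex an axis that contains duplicate labels and raises a ValueError. A small sketch (the exact error message differs between pandas versions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))
r = np.repeat(np.arange(len(df.index)), 3)
dup = pd.DataFrame(df.values[r], df.index[r], df.columns)

# The index is now [0, 0, 0, 1, 1, 1], i.e. not unique.
print(dup.index.is_unique)

try:
    dup.reindex([0, 1, 2])
    failed = False
except ValueError as exc:
    # Raised because the axis being reindexed has duplicate labels.
    failed = True
    print(exc)
```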
You can use np.repeat:
df = pd.DataFrame(np.repeat(df.values, [3, 3], axis=0), columns=df.columns)
You get:
a b
0 1 2
1 1 2
2 1 2
3 3 4
4 3 4
5 3 4
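The list [3, 3] gives one repeat count per row, so it only matches this two-row frame; the scalar 3 works for any number of rows. The list form is useful when rows should be repeated a different number of times. A sketch with hypothetical counts:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))

# Hypothetical per-row counts: first row once, second row twice.
# np.repeat requires one count per row along axis 0.
counts = [1, 2]
out = pd.DataFrame(np.repeat(df.values, counts, axis=0), columns=df.columns)
print(out)
```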
Time testing:
%timeit pd.DataFrame(np.repeat(df.values,[3,3], axis = 0))
1000 loops, best of 3: 235 µs per loop
%timeit pd.concat([df] * 3).sort_index()
best of 3: 1.26 ms per loop
In this test NumPy is roughly five times faster (235 µs vs 1.26 ms), which is no surprise.
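The %timeit lines are IPython magics. In a plain Python script, the standard-library timeit module gives a comparable measurement. Absolute numbers depend on the machine and on the pandas/NumPy versions, so only the relative ordering is worth comparing. A sketch (the function names are just for illustration):

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))

def with_numpy():
    return pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

def with_concat():
    return pd.concat([df] * 3).sort_index()

# Best of 3 runs of 100 calls each, reported per call in microseconds.
for name, fn in [('numpy.repeat', with_numpy), ('concat + sort_index', with_concat)]:
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f'{name}: {best * 1e6:.1f} µs per call')
```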
EDIT: I am not sure whether you want repeated indices, but in case you do:
pd.DataFrame(np.repeat(df.values,3, axis = 0), index = np.repeat(df.index, 3), columns = df.columns)
Build a one-dimensional indexer to slice both the values array and the index. You must take care of the index as well to get your desired results.
Use np.repeat on an np.arange to get the indexer:
r = np.arange(len(df)).repeat(3)
pd.DataFrame(df.values[r], df.index[r], df.columns)
a b
0 1 2
0 1 2
0 1 2
1 3 4
1 3 4
1 3 4
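The same indexer idea can be wrapped in a small helper; repeat_rows is a hypothetical name, not a pandas function. Because the indexer is positional, it also works when the index labels are not 0..n-1, such as strings:

```python
import numpy as np
import pandas as pd

def repeat_rows(df, n):
    """Hypothetical helper: repeat every row of df n times, keeping row order.

    Builds a positional indexer and slices both the values and the index
    with it, so the original labels are repeated alongside the data.
    """
    r = np.arange(len(df)).repeat(n)
    return pd.DataFrame(df.values[r], df.index[r], df.columns)

# A non-numeric index works too, since the indexer is positional.
df = pd.DataFrame([[1, 2], [3, 4]], index=['x', 'y'], columns=list('ab'))
out = repeat_rows(df, 3)
print(out)
```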
Not the fastest (not the slowest either), but the shortest solution so far:
# Build an index array and extract the rows to build the desired new df. This handles index and data all at once.
# Note: .iloc is positional, so passing df.index relies on df having the default 0..n-1 index.
df.iloc[np.repeat(df.index, 3)]
a b
0 1 2
0 1 2
0 1 2
1 3 4
1 3 4
1 3 4
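Because .iloc is positional, passing df.index only works while the labels happen to equal the positions 0..n-1. Building the indexer from np.arange(len(df)) keeps the one-liner style and works for any index. A sketch with a hypothetical index of 10 and 20:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=[10, 20], columns=list('ab'))

# df.iloc[np.repeat(df.index, 3)] asks for positions 10 and 20,
# which are out of bounds for a two-row frame.
try:
    df.iloc[np.repeat(df.index, 3)]
    label_based_failed = False
except IndexError:
    label_based_failed = True

# Positional indexer: valid whatever the index labels are.
out = df.iloc[np.repeat(np.arange(len(df)), 3)]
print(out)
```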