How to duplicate Python dataframe one by one?

I have a pandas.DataFrame as follows:

I'd like to repeat each row three times, so that it becomes:

5 answers

耶瑟儿～

2021-01-05 13:11

I do not know whether it is more efficient than your loop, but it is easy enough to construct:

Code:

pd.concat([df] * 3).sort_index()

Test Code:

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('ab'))
print(pd.concat([df] * 3).sort_index())

Results:

   a  b
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4

囚心锁ツ

2021-01-05 13:12

You can use numpy.repeat with a scalar repeat count of 3 and pass the original column names to the DataFrame constructor through the columns parameter:

df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print (df)
   a  b
0  1  2
1  1  2
2  1  2
3  3  4
4  3  4
5  3  4

If you really want a duplicated index (which can complicate some pandas functions, such as reindex, which fails on duplicate labels):

r = np.repeat(np.arange(len(df.index)), 3)
df = pd.DataFrame(df.values[r], df.index[r], df.columns)
print (df)
   a  b
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4
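
To illustrate the remark about reindex, here is a small sketch. It assumes the same two-row frame used in the other answers, and the names dup and fixed are only for illustration. It shows reindex raising on the duplicated index, and reset_index(drop=True) as one way to get unique labels back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))

# Repeat every row three times, keeping the original (now duplicated) labels.
r = np.repeat(np.arange(len(df.index)), 3)
dup = pd.DataFrame(df.values[r], df.index[r], df.columns)

# reindex cannot align against an axis with duplicate labels.
try:
    dup.reindex([0, 1, 2])
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print("reindex raised:", exc)

# Dropping the duplicated labels restores a unique 0..n-1 index.
fixed = dup.reset_index(drop=True)
print(fixed.index.is_unique)
```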

有刺的猬

2021-01-05 13:22

You can use np.repeat

df = pd.DataFrame(np.repeat(df.values,[3,3], axis = 0), columns = df.columns)

You get

Time testing:

%timeit pd.DataFrame(np.repeat(df.values,[3,3], axis = 0))
1000 loops, best of 3: 235 µs per loop

%timeit pd.concat([df] * 3).sort_index()
best of 3: 1.26 ms per loop

NumPy is faster in most cases, so no surprises there.
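
If you want to rerun this comparison outside IPython, something like the following sketch with the standard timeit module should work. The helper names with_numpy and with_concat are made up for this example. The absolute numbers depend on your machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"))


def with_numpy(frame, n=3):
    # Repeat the underlying array row-wise, then rebuild the frame.
    return pd.DataFrame(np.repeat(frame.values, n, axis=0), columns=frame.columns)


def with_concat(frame, n=3):
    # Stack n copies and sort by index so equal labels sit together.
    return pd.concat([frame] * n).sort_index()


runs = 200
t_numpy = timeit.timeit(lambda: with_numpy(df), number=runs)
t_concat = timeit.timeit(lambda: with_concat(df), number=runs)

print(f"np.repeat: {t_numpy / runs * 1e6:.0f} µs per call")
print(f"pd.concat: {t_concat / runs * 1e6:.0f} µs per call")

# Both approaches produce the same rows; only the index labels differ.
same_values = bool((with_numpy(df).values == with_concat(df).values).all())
```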

EDIT: I am not sure whether you are looking for repeated indices, but in case you are:

pd.DataFrame(np.repeat(df.values,3, axis = 0), index = np.repeat(df.index, 3), columns = df.columns)

甜味超标

2021-01-05 13:25
Build a one-dimensional indexer to slice both the values array and the index. You must take care of the index as well to get your desired results.
- use np.repeat on an np.arange to get the indexer
- construct a new dataframe using this indexer on both values and the index
```
r = np.arange(len(df)).repeat(3)
pd.DataFrame(df.values[r], df.index[r], df.columns)

   a  b
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4
```
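
One caveat with going through df.values, shown as a rough sketch below: when the columns have mixed dtypes, the values array is an object array, so the rebuilt frame loses the original column dtypes. The frame mixed is a made-up example, and passing the same indexer to iloc is one alternative that keeps the dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one string column.
mixed = pd.DataFrame({"a": [1, 3], "b": ["x", "y"]})

# Same positional indexer as above.
r = np.arange(len(mixed)).repeat(3)

# Rebuilding from .values goes through a single object array.
via_values = pd.DataFrame(mixed.values[r], mixed.index[r], mixed.columns)

# Indexing the frame directly keeps each column's dtype.
via_iloc = mixed.iloc[r]

print(via_values.dtypes)
print(via_iloc.dtypes)
```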
后悔当初

2021-01-05 13:29

Not the fastest (not the slowest either) but the shortest solution so far.

# Build an index array and use it to extract the rows of the new df; this handles index and data all at once.
# Note: iloc is positional, so this relies on df having the default 0..n-1 integer index.
df.iloc[np.repeat(df.index,3)]

Out[270]:
   a  b
0  1  2
0  1  2
0  1  2
1  3  4
1  3  4
1  3  4
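
The one-liner above passes index labels to iloc, which works here because the labels happen to equal the row positions. For a frame with other labels, two variants that should behave the same way are sketched below. The frame labeled and its string index are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not 0..n-1.
labeled = pd.DataFrame([[1, 2], [3, 4]], columns=list("ab"), index=["x", "y"])

# Label-based: repeat the index itself and select with loc.
by_label = labeled.loc[labeled.index.repeat(3)]

# Position-based: repeat row positions and select with iloc.
by_position = labeled.iloc[np.arange(len(labeled)).repeat(3)]

print(by_label)
```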
