Let's say I want to create and fill an empty DataFrame with values from a loop.
import pandas as pd
import numpy as np
years = [2013, 2014, 2015]
dn=pd.Data
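import pandas as pd

years = [2013, 2014, 2015]
dn = []
for year in years:
    # Build one single-column DataFrame per year, indexed by 'Incidents'
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1],
                        }).set_index('Incidents')
    dn.append(df1)
# Concatenate all per-year frames side by side, once, after the loop
dn = pd.concat(dn, axis=1)
print(dn)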
yields
           2013  2014  2015
Incidents
C             1     1     1
B             1     1     1
A             1     1     1
Note that calling pd.concat once outside the loop is more time-efficient than calling pd.concat with each iteration of the loop. Each time you call pd.concat, new space is allocated for a new DataFrame, and all the data from each component DataFrame is copied into the new DataFrame. If you call pd.concat from within the for-loop, then you end up doing on the order of n**2 copies, where n is the number of years.

If you accumulate the partial DataFrames in a list and call pd.concat once outside the loop, then Pandas only needs to perform n copies to make dn.
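To make the contrast concrete, here is a minimal sketch of both patterns side by side. The helper make_frame and the names slow and fast are made up for illustration; the data is the same as in the example above. It only checks that the two approaches produce the same DataFrame; it does not measure the speed difference, which only becomes noticeable with many more pieces.

```python
import pandas as pd

years = [2013, 2014, 2015]

def make_frame(year):
    """Hypothetical helper: build the one-column frame for a given year."""
    return pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                         year: [1, 1, 1]}).set_index('Incidents')

# Pattern to avoid: concat inside the loop copies everything
# accumulated so far on every iteration.
slow = make_frame(years[0])
for year in years[1:]:
    slow = pd.concat([slow, make_frame(year)], axis=1)

# Preferred pattern: collect the pieces in a list, concatenate once.
fast = pd.concat([make_frame(year) for year in years], axis=1)

# Both approaches should yield the same table
print(slow.equals(fast))
```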
As far as I know, you should avoid adding rows to a DataFrame one at a time, because it is slow.
What I usually do is:
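l1 = []
l2 = []
for i in range(n):
    v1 = ...  # placeholder: compute value v1
    v2 = ...  # placeholder: compute value v2
    l1.append(v1)
    l2.append(v2)
# Build the DataFrame only once, after the loop has finished
d = pd.DataFrame()
d['l1'] = l1
d['l2'] = l2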
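A runnable version of this pattern might look like the sketch below. The value of n and the two computations (i squared and i cubed) are arbitrary stand-ins for whatever you actually compute in the loop, and the dict-of-lists container named rows is just one convenient way to hand all columns to pd.DataFrame in a single call.

```python
import pandas as pd

n = 5  # hypothetical number of iterations

# Accumulate plain Python lists; appending to a list is cheap
rows = {'l1': [], 'l2': []}
for i in range(n):
    v1 = i ** 2  # stand-in for "compute value v1"
    v2 = i ** 3  # stand-in for "compute value v2"
    rows['l1'].append(v1)
    rows['l2'].append(v2)

# Create the DataFrame once, from the finished lists
d = pd.DataFrame(rows)
print(d)
```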