I spent a while looking through SO, and it seems I have a unique problem.
I have a dictionary that looks like the following:
dict={
123: [2,4],
2
It may be of interest to some that Roger Fan's pandas.DataFrame(dict) method is actually pretty slow if the dictionary has a lot of keys (which end up as the index). A faster way is to preprocess the data into separate lists and then create a DataFrame from those lists.
(Perhaps this was explained in levi's answer, but it is gone now.)
For example, consider this dictionary, dict1, where each value is a list. Specifically, dict1[i] = [i*10, i*100] (for ease of checking the final DataFrame).
import time
import numpy as np
import pandas as pd
keys = range(1000)
values = zip(np.arange(1000)*10, np.arange(1000)*100)
dict1 = dict(zip(keys, values))
It takes roughly 30 times as long with the pandas method (about 38 times in the run shown here). E.g.
t = time.time()
test1 = pd.DataFrame(dict1).transpose()
print(time.time() - t)
0.118762016296
versus:
t = time.time()
keys = []
list1 = []
list2 = []
for k in dict1:
    keys.append(k)
    list1.append(dict1[k][0])
    list2.append(dict1[k][1])
test2 = pd.DataFrame({'element1': list1, 'element2': list2}, index=keys)
print(time.time() - t)
0.00310587882996
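If you want to reproduce the comparison yourself, here is a minimal sketch of the two approaches above wrapped in functions and timed with timeit instead of a single time.time() measurement. The function names build_by_transpose and build_from_lists are my own, not from pandas, and I build dict1 with a plain dict comprehension as a stand-in for the setup above. Absolute timings will vary with your machine and pandas version, so treat the ratio as indicative only.

```python
import timeit

import pandas as pd

# Same shape of data as dict1 above: key i maps to [i*10, i*100].
dict1 = {i: [i * 10, i * 100] for i in range(1000)}


def build_by_transpose(d):
    # One column per key, then flip so that keys become the row index.
    return pd.DataFrame(d).transpose()


def build_from_lists(d):
    # Preprocess into one list per output column, plus the index.
    keys = list(d)
    list1 = [d[k][0] for k in keys]
    list2 = [d[k][1] for k in keys]
    return pd.DataFrame({'element1': list1, 'element2': list2}, index=keys)


test1 = build_by_transpose(dict1)
test2 = build_from_lists(dict1)

# Column labels differ (0/1 versus element1/element2), so compare raw values.
same_values = (test1.values == test2.values).all()
same_index = list(test1.index) == list(test2.index)
print(same_values, same_index)

# Average seconds per call over n runs; numbers depend on the machine
# and the pandas version.
n = 20
t_transpose = timeit.timeit(lambda: build_by_transpose(dict1), number=n) / n
t_lists = timeit.timeit(lambda: build_from_lists(dict1), number=n) / n
print(t_transpose, t_lists)
```

Averaging over several runs smooths out one-off noise, which matters here because the faster version finishes in a few milliseconds.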