Say I have a dictionary with 10 key-value pairs. Each entry holds a numpy array. However, the length of the array is not the same for all of them.
How can I create a
A way of tidying up your syntax, but still do essentially the same thing as these other answers, is below:
>>> mydict = {'one': [1,2,3], 2: [4,5,6,7], 3: 8}
>>> dict_df = pd.DataFrame({ key:pd.Series(value) for key, value in mydict.items() })
>>> dict_df
one 2 3
0 1.0 4 8.0
1 2.0 5 NaN
2 3.0 6 NaN
3 NaN 7 NaN
A similar syntax exists for lists, too:
>>> mylist = [ [1,2,3], [4,5], 6 ]
>>> list_df = pd.DataFrame([ pd.Series(value) for value in mylist ])
>>> list_df
0 1 2
0 1.0 2.0 3.0
1 4.0 5.0 NaN
2 6.0 NaN NaN
Another syntax for lists is:
>>> mylist = [ [1,2,3], [4,5], 6 ]
>>> list_df = pd.DataFrame({ i:pd.Series(value) for i, value in enumerate(mylist) })
>>> list_df
0 1 2
0 1 4.0 6.0
1 2 5.0 NaN
2 3 NaN NaN
You may additionally have to transpose the result and/or change the column data types (float, integer, etc).
You can also use pd.concat
along axis=1
with a list of pd.Series
objects:
import pandas as pd, numpy as np
d = {'A': np.array([1,2]), 'B': np.array([1,2,3,4])}
res = pd.concat([pd.Series(v, name=k) for k, v in d.items()], axis=1)
print(res)
A B
0 1.0 1
1 2.0 2
2 NaN 3
3 NaN 4
pd.DataFrame([my_dict]) will do!
While this does not directly answer the OP's question. I found this to be an excellent solution for my case when I had unequal arrays and I'd like to share:
from pandas documentation
In [31]: d = {'one' : Series([1., 2., 3.], index=['a', 'b', 'c']),
....: 'two' : Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
....:
In [32]: df = DataFrame(d)
In [33]: df
Out[33]:
one two
a 1 1
b 2 2
c 3 3
d NaN 4
If you don't want it to show NaN
and you have two particular lengths, adding a 'space' in each remaining cell would also work.
import pandas
long = [6, 4, 7, 3]
short = [5, 6]
for n in range(len(long) - len(short)):
short.append(' ')
df = pd.DataFrame({'A':long, 'B':short}]
# Make sure Excel file exists in the working directory
datatoexcel = pd.ExcelWriter('example1.xlsx',engine = 'xlsxwriter')
df.to_excel(datatoexcel,sheet_name = 'Sheet1')
datatoexcel.save()
A B
0 6 5
1 4 6
2 7
3 3
If you have more than 2 lengths of entries, it is advisable to make a function which uses a similar method.
list
of DataFrames
with pandas.DataFrame, from a dict
of uneven arrays
, and then concat the arrays together in a list-comprehension.
DataFrame
of arrays
, that are not equal in length.arrays
, use df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})
import pandas as pd
import numpy as np
# create the uneven arrays
mu, sigma = 200, 25
np.random.seed(365)
x1 = mu + sigma * np.random.randn(10, 1)
x2 = mu + sigma * np.random.randn(15, 1)
x3 = mu + sigma * np.random.randn(20, 1)
data = {'x1': x1, 'x2': x2, 'x3': x3}
# create the dataframe
df = pd.concat([pd.DataFrame(v, columns=[k]) for k, v in data.items()], axis=1)
zip_longest
fills missing values with the fillvalue
.DataFrame
constructor won't unpack it.from itertools import zip_longest
# zip all the values together
zl = list(zip_longest(*data.values()))
# create dataframe
df = pd.DataFrame(zl, columns=data.keys())
df.plot(marker='o', figsize=[10, 5])
x1 x2 x3
0 232.06900 235.92577 173.19476
1 176.94349 209.26802 186.09590
2 194.18474 168.36006 194.36712
3 196.55705 238.79899 218.33316
4 249.25695 167.91326 191.62559
5 215.25377 214.85430 230.95119
6 232.68784 240.30358 196.72593
7 212.43409 201.15896 187.96484
8 188.97014 187.59007 164.78436
9 196.82937 252.67682 196.47132
10 NaN 223.32571 208.43823
11 NaN 209.50658 209.83761
12 NaN 215.27461 249.06087
13 NaN 210.52486 158.65781
14 NaN 193.53504 199.10456
15 NaN NaN 186.19700
16 NaN NaN 223.02479
17 NaN NaN 185.68525
18 NaN NaN 213.41414
19 NaN NaN 271.75376