Question
I have a dict of lists, say:
data = {'a': [80, 130], 'b': [64], 'c': [58,80]}
How do I flatten it and convert it into a DataFrame like the one below?
Answer 1:
Use a nested list comprehension with if-else if you don't want to number one-element lists:
import pandas as pd

df = pd.DataFrame([('{}{}'.format(k, i), v1)   # key + 1-based position
                   if len(v) > 1
                   else (k, v1)                # one-element list: bare key
                   for k, v in data.items()
                   for i, v1 in enumerate(v, 1)], columns=['Index', 'Data'])
print(df)
   Index  Data
0     a1    80
1     a2   130
2      b    64
3     c1    58
4     c2    80
EDIT: a version that also handles NaN in place of a list:
import numpy as np
import pandas as pd

data = {'a': [80, 130], 'b': np.nan, 'c': [58,80], 'd':[34]}
out = []
for k, v in data.items():
    if isinstance(v, float):
        # scalar such as NaN: keep the key as-is
        out.append([k, v])
    else:
        for i, x in enumerate(v, 1):
            if len(v) == 1:
                # one-element list: no numeric suffix
                out.append([k, x])
            else:
                out.append(['{}{}'.format(k, i), x])
print(out)
[['a1', 80], ['a2', 130], ['b', nan], ['c1', 58], ['c2', 80], ['d', 34]]
df = pd.DataFrame(out, columns=['Index','Data'])
print (df)
   Index   Data
0     a1   80.0
1     a2  130.0
2      b    NaN
3     c1   58.0
4     c2   80.0
5      d   34.0
Answer 2:
One option to flatten the dictionary is a dict comprehension:
flattened_data = {
k + str(i): x
for k, v in data.items()
for i, x in enumerate(v)
}
resulting in
{'a0': 80, 'a1': 130, 'b0': 64, 'c0': 58, 'c1': 80}
If you insist on 1-based indexing, you can use enumerate(v, 1) instead of enumerate(v). If you want to omit the index in cases where the list has only a single entry, you should use a for loop instead of the dictionary comprehension.
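A minimal sketch of that for loop, assuming 1-based numbering and the sample data from the question; the final pd.DataFrame call and its column names Index and Data are additions for the DataFrame the question asks for:

```python
import pandas as pd

# sample input from the question
data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}

flattened_data = {}
for k, v in data.items():
    if len(v) == 1:
        # a single entry keeps the bare key
        flattened_data[k] = v[0]
    else:
        # several entries get a 1-based suffix: a1, a2, ...
        for i, x in enumerate(v, 1):
            flattened_data[k + str(i)] = x

# turn the flat dict into a two-column DataFrame
df = pd.DataFrame(list(flattened_data.items()), columns=['Index', 'Data'])
print(df)
```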
Answer 3:
Using the pd.DataFrame constructor and GroupBy + cumcount:
import pandas as pd

data = {'a': [80, 130], 'b': [64], 'c': [58,80]}
df = pd.DataFrame([[k, w] for k, v in data.items() for w in v],
                  columns=['Index', '0'])
# append each row's 1-based position within its key group
df['Index'] = df['Index'] + (df.groupby('Index').cumcount() + 1).astype(str)
print(df)
   Index    0
0     a1   80
1     a2  130
2     b1   64
3     c1   58
4     c2   80
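This labels the single-element list b1. If the bare key b is wanted instead, as in answer 1, one sketch is to append the counter only to keys that occur more than once, counting occurrences with value_counts; the column name Data is an assumption:

```python
import pandas as pd

data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}
df = pd.DataFrame([[k, w] for k, v in data.items() for w in v],
                  columns=['Index', 'Data'])

# 1-based position of each row within its key group, as a string
pos = (df.groupby('Index').cumcount() + 1).astype(str)
# how many rows share each row's key
size = df['Index'].map(df['Index'].value_counts())

# keep the bare key where it occurs once, otherwise append the position
df['Index'] = df['Index'].where(size == 1, df['Index'] + pos)
print(df)
```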
Answer 4:
Another way is to use from_dict with the orient parameter set to 'index', then stack, and finally flatten the MultiIndex into single labels using map and format:
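import pandas as pd

df = pd.DataFrame.from_dict(data, orient='index')
# 1-based column labels, then stack into a (key, position) MultiIndex;
# dropna() removes the NaN padding of shorter lists, which newer pandas
# versions of stack() may no longer drop on their own
df_out = df.rename(columns=lambda x: x+1).stack().dropna()
df_out.index = df_out.index.map('{0[0]}{0[1]}'.format)
print(df_out)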
Output:
a1     80.0
a2    130.0
b1     64.0
c1     58.0
c2     80.0
dtype: float64
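The output above is a Series rather than a two-column DataFrame. A sketch of that last step, assuming the column names Index and Data (they are not part of the answer); the earlier steps are repeated so the block runs on its own:

```python
import pandas as pd

data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}

df = pd.DataFrame.from_dict(data, orient='index')
# stack, dropping the NaN padding of shorter lists explicitly
df_out = df.rename(columns=lambda x: x + 1).stack().dropna()
df_out.index = df_out.index.map('{0[0]}{0[1]}'.format)

# name the index "Index" and move it into a column next to "Data"
result = df_out.rename_axis('Index').reset_index(name='Data')
print(result)
```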
Answer 5:
Using itertools and the private pandas method ParserBase._maybe_dedup_names:
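import itertools
import pandas as pd
# [s[0]] keeps multi-character keys whole inside product()
x = (itertools.product([s[0]], s[1]) for s in data.items())
z = [item for pairs in x for item in pairs]
df = pd.DataFrame(z).set_index(0)
df.index = pd.io.parsers.ParserBase({'names':df.index})._maybe_dedup_names(df.index)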
       1
a     80
a.1  130
b     64
c     58
c.1   80
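ParserBase._maybe_dedup_names is private pandas API (the leading underscore means it carries no stability guarantee), so that line may break on other pandas versions. A sketch that produces the same a, a.1, c.1 labels using only public calls (groupby(level=0).cumcount()), with each key wrapped in a list so product() pairs the whole key with each value:

```python
import itertools
import pandas as pd

data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}

# one (key, value) row per list element
pairs = [pair for k, v in data.items() for pair in itertools.product([k], v)]
df = pd.DataFrame(pairs).set_index(0)

# 0 for the first occurrence of a key, 1 for the second, ...
n = df.groupby(level=0).cumcount()
# first occurrence keeps the bare key, repeats get ".1", ".2", ...
df.index = [k if i == 0 else f'{k}.{i}' for k, i in zip(df.index, n)]
print(df)
```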
Answer 6:
I was having fun with variations on Sven Marnach's answer.
defaultdict and count
from collections import defaultdict
from itertools import count
# one independent counter (1, 2, 3, ...) per key
c = defaultdict(lambda: count(1))
{f"{k}{['', next(c[k])][len(V) > 1]}": v for k, V in data.items() for v in V}
{'a1': 80, 'a2': 130, 'b': 64, 'c1': 58, 'c2': 80}
enumerate
{f"{k}{['', i][len(V) > 1]}": v for k, V in data.items() for i, v in enumerate(V, 1)}
{'a1': 80, 'a2': 130, 'b': 64, 'c1': 58, 'c2': 80}
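The ['', x][len(V) > 1] trick indexes a two-item list with a boolean (False acts as 0, True as 1). A sketch of the enumerate variant written with an ordinary conditional expression instead, followed by a conversion to a DataFrame (the column names Index and Data are assumptions):

```python
import pandas as pd

data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}

flat = {
    # bare key for one-element lists, key + 1-based position otherwise
    (f"{k}{i}" if len(V) > 1 else k): v
    for k, V in data.items()
    for i, v in enumerate(V, 1)
}

df = pd.Series(flat).rename_axis('Index').reset_index(name='Data')
print(flat)
print(df)
```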
Answer 7:
IMO you should first get the list of dict keys and the list of dict values, like so: ['a', 'b', 'c'] and [[80, 130], [64], [58, 80]].
Then walk the two lists in parallel with a loop to get ['a1', 'a2', 'b', 'c1', 'c2'] and [80, 130, 64, 58, 80] (this should take only a few lines of code).
Then load them into a DataFrame.
If you need more precise code you can ask :)
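A minimal sketch of those three steps, assuming 1-based suffixes and bare keys for one-element lists; the variable names and the column names Index and Data are illustrative:

```python
import pandas as pd

data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}

# step 1: the dict keys and their lists
keys = list(data.keys())        # ['a', 'b', 'c']
values = list(data.values())    # [[80, 130], [64], [58, 80]]

# step 2: walk both lists in parallel, building labels and flat values
labels, flat_values = [], []
for k, v in zip(keys, values):
    for i, x in enumerate(v, 1):
        labels.append(f'{k}{i}' if len(v) > 1 else k)
        flat_values.append(x)

# step 3: load the two parallel lists into a DataFrame
df = pd.DataFrame({'Index': labels, 'Data': flat_values})
print(df)
```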
Source: https://stackoverflow.com/questions/51654012/flatten-of-dict-of-lists-into-a-dataframe