Creating dataframe from a dictionary where entries have different lengths

前端 未结 9 1587
生来不讨喜
生来不讨喜 2020-11-22 14:04

Say I have a dictionary with 10 key-value pairs. Each entry holds a numpy array. However, the length of the array is not the same for all of them.

How can I create a

相关标签:
9条回答
  • 2020-11-22 14:27

    A way of tidying up your syntax, but still do essentially the same thing as these other answers, is below:

    >>> mydict = {'one': [1,2,3], 2: [4,5,6,7], 3: 8}
    
    >>> dict_df = pd.DataFrame({ key:pd.Series(value) for key, value in mydict.items() })
    
    >>> dict_df
    
       one  2    3
    0  1.0  4  8.0
    1  2.0  5  NaN
    2  3.0  6  NaN
    3  NaN  7  NaN
    

    A similar syntax exists for lists, too:

    >>> mylist = [ [1,2,3], [4,5], 6 ]
    
    >>> list_df = pd.DataFrame([ pd.Series(value) for value in mylist ])
    
    >>> list_df
    
         0    1    2
    0  1.0  2.0  3.0
    1  4.0  5.0  NaN
    2  6.0  NaN  NaN
    

    Another syntax for lists is:

    >>> mylist = [ [1,2,3], [4,5], 6 ]
    
    >>> list_df = pd.DataFrame({ i:pd.Series(value) for i, value in enumerate(mylist) })
    
    >>> list_df
    
       0    1    2
    0  1  4.0  6.0
    1  2  5.0  NaN
    2  3  NaN  NaN
    

    You may additionally have to transpose the result and/or change the column data types (float, integer, etc).

    0 讨论(0)
  • 2020-11-22 14:27

    You can also use pd.concat along axis=1 with a list of pd.Series objects:

    import pandas as pd, numpy as np
    
    d = {'A': np.array([1,2]), 'B': np.array([1,2,3,4])}
    
    res = pd.concat([pd.Series(v, name=k) for k, v in d.items()], axis=1)
    
    print(res)
    
         A  B
    0  1.0  1
    1  2.0  2
    2  NaN  3
    3  NaN  4
    
    0 讨论(0)
  • 2020-11-22 14:37

    pd.DataFrame([my_dict]) will do!

    0 讨论(0)
  • 2020-11-22 14:44

    While this does not directly answer the OP's question. I found this to be an excellent solution for my case when I had unequal arrays and I'd like to share:

    from pandas documentation

    In [31]: d = {'one' : Series([1., 2., 3.], index=['a', 'b', 'c']),
       ....:      'two' : Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
       ....: 
    
    In [32]: df = DataFrame(d)
    
    In [33]: df
    Out[33]: 
       one  two
    a    1    1
    b    2    2
    c    3    3
    d  NaN    4
    
    0 讨论(0)
  • 2020-11-22 14:44

    If you don't want it to show NaN and you have two particular lengths, adding a 'space' in each remaining cell would also work.

    import pandas
    
    long = [6, 4, 7, 3]
    short = [5, 6]
    
    for n in range(len(long) - len(short)):
        short.append(' ')
    
    df = pd.DataFrame({'A':long, 'B':short}]
    # Make sure Excel file exists in the working directory
    datatoexcel = pd.ExcelWriter('example1.xlsx',engine = 'xlsxwriter')
    df.to_excel(datatoexcel,sheet_name = 'Sheet1')
    datatoexcel.save()
    
       A  B
    0  6  5
    1  4  6
    2  7   
    3  3   
    

    If you have more than 2 lengths of entries, it is advisable to make a function which uses a similar method.

    0 讨论(0)
  • 2020-11-22 14:49

    Use pandas.DataFrame and pandas.concat

    • The following code will create a list of DataFrames with pandas.DataFrame, from a dict of uneven arrays, and then concat the arrays together in a list-comprehension.
      • This is a way to create a DataFrame of arrays, that are not equal in length.
      • For equal length arrays, use df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})
    import pandas as pd
    import numpy as np
    
    
    # create the uneven arrays
    mu, sigma = 200, 25
    np.random.seed(365)
    x1 = mu + sigma * np.random.randn(10, 1)
    x2 = mu + sigma * np.random.randn(15, 1)
    x3 = mu + sigma * np.random.randn(20, 1)
    
    data = {'x1': x1, 'x2': x2, 'x3': x3}
    
    # create the dataframe
    df = pd.concat([pd.DataFrame(v, columns=[k]) for k, v in data.items()], axis=1)
    

    Use pandas.DataFrame and itertools.zip_longest

    • For iterables of uneven length, zip_longest fills missing values with the fillvalue.
    • The zip generator needs to be unpacked, because the DataFrame constructor won't unpack it.
    from itertools import zip_longest
    
    # zip all the values together
    zl = list(zip_longest(*data.values()))
    
    # create dataframe
    df = pd.DataFrame(zl, columns=data.keys())
    

    plot

    df.plot(marker='o', figsize=[10, 5])
    

    dataframe

               x1         x2         x3
    0   232.06900  235.92577  173.19476
    1   176.94349  209.26802  186.09590
    2   194.18474  168.36006  194.36712
    3   196.55705  238.79899  218.33316
    4   249.25695  167.91326  191.62559
    5   215.25377  214.85430  230.95119
    6   232.68784  240.30358  196.72593
    7   212.43409  201.15896  187.96484
    8   188.97014  187.59007  164.78436
    9   196.82937  252.67682  196.47132
    10        NaN  223.32571  208.43823
    11        NaN  209.50658  209.83761
    12        NaN  215.27461  249.06087
    13        NaN  210.52486  158.65781
    14        NaN  193.53504  199.10456
    15        NaN        NaN  186.19700
    16        NaN        NaN  223.02479
    17        NaN        NaN  185.68525
    18        NaN        NaN  213.41414
    19        NaN        NaN  271.75376
    
    0 讨论(0)
提交回复
热议问题