List of LISTS of tuples to Pandas dataframe?

花落未央 2021-01-13 09:39

I have a list of lists of tuples, where every tuple is of equal length, and I need to convert the tuples to a Pandas dataframe in such a way that the columns of the datafram

4 answers
  • 2021-01-13 10:01
    import pandas as pd

    tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
    # sum(tupList, []) concatenates the sublists into one flat list of tuples
    print(pd.DataFrame(sum(tupList, [])))
    

    Output

               0             1     2
    0  commentID   commentText  date
    1     123456  blahblahblah  2019
    2      45678   hello world  2018
    3          0          text  2017
    
  • 2021-01-13 10:02

    Just flatten your list into a single list of tuples (your initial list contains sublists of tuples):

    In [1251]: tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
    
    In [1252]: pd.DataFrame([t for lst in tupList for t in lst])
    Out[1252]: 
               0             1     2
    0  commentID   commentText  date
    1     123456  blahblahblah  2019
    2      45678   hello world  2018
    3          0          text  2017
    
  • 2021-01-13 10:02

    A shorter way to write this:

    from itertools import chain
    import pandas as pd
    
    tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
    
    new_list = [x for x in chain.from_iterable(tupList)]
    df = pd.DataFrame.from_records(new_list)
    

    Edit

    You can also write the list comprehension directly inside the from_records call.

  • 2021-01-13 10:04

    You can do it like this :D

    import pandas as pd

    tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
    
    # Trying list comprehension from previous stack question:
    df = pd.DataFrame([[y for y in x] for x in tupList])
    df_1 = df[0].apply(pd.Series).assign(index= range(0, df.shape[0]*2, 2)).set_index("index")
    df_2 = df[1].apply(pd.Series).assign(index= range(1, df.shape[0]*2, 2)).set_index("index")
    
    pd.concat([df_1, df_2], axis=0).sort_index()
    
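The edit note in the third answer mentions putting the list comprehension directly inside from_records without showing it. A minimal sketch of that might look like the following. The second half rests on an assumption that the thread does not confirm, since the question text is cut off: that the first tuple is meant to be a header row rather than data. The names `df_named`, `header` and `body` are illustrative only.

```python
import pandas as pd

tupList = [
    [('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
    [('45678', 'hello world', '2018'), ('0', 'text', '2017')],
]

# Flatten inside the call itself, as the third answer's edit suggests.
df = pd.DataFrame.from_records([t for lst in tupList for t in lst])

# Assumption: the first tuple holds column names, not data.
# Split it off and pass it as the column labels.
header, *body = [t for lst in tupList for t in lst]
df_named = pd.DataFrame.from_records(body, columns=list(header))

print(df_named)
```

If the first tuple is ordinary data after all, `df` from the first call is the frame to use; it matches the four-row output shown in the answers above.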