Setting columns for an empty pandas dataframe

心在旅途 2021-01-03 06:16

This is something that I'm confused about...

import pandas as pd

# this works fine
df1 = pd.DataFrame(columns=['A','B'])

# but let's say I have this
d         


        
2 Answers
  • 2021-01-03 06:25

    This looks like a bug in pandas. All of these work:

    pd.DataFrame(columns=['A', 'B'])
    pd.DataFrame({}, columns=['A', 'B'])
    pd.DataFrame(None, columns=['A', 'B'])
    

    but not this:

    pd.DataFrame([], columns=['A', 'B'])
    

    Until it's fixed, I suggest something like this:

    if len(data) == 0: data = None
    df2 = pd.DataFrame(data, columns=['A','B'])
    

    or:

    df2 = pd.DataFrame(data if len(data) > 0 else None, columns=['A', 'B'])
    
  • 2021-01-03 06:28

    Update: as of pandas version 0.16.1, passing data = [] works:

    In [85]: df = pd.DataFrame([], columns=['a', 'b', 'c'])
    
    In [86]: df
    Out[86]: 
    Empty DataFrame
    Columns: [a, b, c]
    Index: []
    

    so the best solution is to update your version of Pandas.


    If data is an empty list of lists, then

    data = [[]]
    

    In that case len(data) equals 1, so len(data) > 0 is not the right condition for detecting an empty list of lists.
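To make the difference concrete, here is a small plain-Python check. The helper name `has_no_cells` is hypothetical (it is not part of pandas), and it assumes every row in data is a sized sequence such as a list or tuple:

```python
def has_no_cells(data):
    """Return True when data holds no values: [] or a list of empty rows.

    Hypothetical helper for illustration; assumes each row supports len().
    """
    return all(len(row) == 0 for row in data)


# len() alone cannot tell the two "empty" shapes apart
print(len([]))    # 0
print(len([[]]))  # 1

# the helper treats both as empty, but not a row that holds a value
print(has_no_cells([]), has_no_cells([[]]), has_no_cells([[0]]))
```

This only covers the "no values at all" cases; rows with the wrong number of columns still need separate handling.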

    There are a number of values for data which could make

    pd.DataFrame(data, columns=['A','B'])
    

    raise an exception. An AssertionError or ValueError is raised if data is [] (no data, on versions before 0.16.1), [[]] (no columns), [[0]] (one column), or [[0,1,2]] (too many columns). Rather than checking for each of these cases, I think it is safer and easier to use try..except here:

    columns = ['A', 'B']
    try:
        df2 = pd.DataFrame(data, columns=columns)
    except (AssertionError, ValueError):
        df2 = pd.DataFrame(columns=columns)
    

    It would be nice if there were a DRY-er way to write this, but given that it's the caller's responsibility to check for this, I don't see a better way.
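One way to avoid repeating the try..except at every call site is to wrap it in a small function. This is only a sketch: `frame_or_empty` is a hypothetical name, not a pandas API, and it assumes that silently falling back to an empty frame is acceptable for your data.

```python
import pandas as pd


def frame_or_empty(data, columns):
    """Build a DataFrame, or an empty one with the same columns on failure.

    Hypothetical helper: catches the two exception types named above.
    """
    try:
        return pd.DataFrame(data, columns=columns)
    except (AssertionError, ValueError):
        # data did not fit the requested columns (e.g. [[]] or [[0]])
        return pd.DataFrame(columns=columns)


df_ok = frame_or_empty([[1, 2]], ['A', 'B'])   # one row, two columns
df_bad = frame_or_empty([[]], ['A', 'B'])      # falls back to an empty frame
print(df_ok.shape, list(df_bad.columns), len(df_bad))
```

Note that the fallback also hides genuinely malformed input such as [[0,1,2]], so logging the exception before returning may be worthwhile.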
