This is something that I'm confused about...
import pandas as pd
# this works fine
df1 = pd.DataFrame(columns=['A','B'])
# but let's say I have this
d
This looks like a bug in pandas. All of these work:
pd.DataFrame(columns=['A', 'B'])
pd.DataFrame({}, columns=['A', 'B'])
pd.DataFrame(None, columns=['A', 'B'])
but not this:
pd.DataFrame([], columns=['A', 'B'])
Until it's fixed, I suggest something like this:
if len(data) == 0: data = None
df2 = pd.DataFrame(data, columns=['A','B'])
or:
df2 = pd.DataFrame(data if len(data) > 0 else None, columns=['A', 'B'])
Update: as of pandas version 0.16.1, passing data = [] works:
In [85]: df = pd.DataFrame([], columns=['a', 'b', 'c'])
In [86]: df
Out[86]:
Empty DataFrame
Columns: [a, b, c]
Index: []
so the best solution is to update your version of Pandas.
If data is an empty list of lists, then data = [[]]. But then len(data) would equal 1, so len(data) > 0 is not the right condition to check to see if data is an empty list of lists.
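To make the len(data) pitfall concrete, here is a small sketch in plain Python. The helper name has_no_cells is hypothetical (it is not part of pandas), and it assumes data is a list of lists:

```python
def has_no_cells(data):
    """Return True if data has no rows, or only empty rows.

    Hypothetical helper for illustration; assumes data is a list of lists.
    """
    return all(len(row) == 0 for row in data)


empty_rows = [[]]

# The outer list holds one (empty) row, so the length check passes.
print(len(empty_rows) > 0)       # True
# Looking inside each row shows there are no values at all.
print(has_no_cells(empty_rows))  # True
print(has_no_cells([]))          # True
print(has_no_cells([[1, 2]]))    # False
```

Note that this only detects rows with no values; it says nothing about rows that have the wrong number of values.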
There are a number of values for data which could make pd.DataFrame(data, columns=['A','B']) raise an exception. An AssertionError or ValueError is raised if data equals [] (no data), [[]] (no columns), [[0]] (one column) or [[0,1,2]] (too many columns). So instead of trying to check for all of these, I think it is safer and easier to use try..except here:
columns = ['A', 'B']
try:
    df2 = pd.DataFrame(data, columns=columns)
except (AssertionError, ValueError):
    df2 = pd.DataFrame(columns=columns)
It would be nice if there were a DRY-er way to write this, but given that it's the caller's responsibility to check for this, I don't see a better way.
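If the same fallback is needed in several places, one option is to wrap the try..except in a small function. This is only a sketch: frame_or_empty is a hypothetical name, not a pandas function, and it assumes that an empty frame is an acceptable result whenever data does not fit the columns.

```python
import pandas as pd


def frame_or_empty(data, columns):
    """Build a DataFrame from data, falling back to an empty frame.

    Hypothetical helper: the name and the fallback policy are assumptions.
    """
    try:
        return pd.DataFrame(data, columns=columns)
    except (AssertionError, ValueError):
        # data did not fit the requested columns (or was rejected by an
        # older pandas), so return an empty frame with the same columns.
        return pd.DataFrame(columns=columns)


columns = ['A', 'B']
good = frame_or_empty([[1, 2], [3, 4]], columns)  # two rows, two columns
bad = frame_or_empty([[0, 1, 2]], columns)        # too many columns
print(good.shape)
print(list(bad.columns), len(bad))
```

The trade-off is that the helper silently swallows a mismatch such as [[0,1,2]], so it may hide a real problem in data. If that matters, logging inside the except branch would be a reasonable addition.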