Constructing pandas DataFrame from values in variables gives “ValueError: If using all scalar values, you must pass an index”

霸气de小男生 提交于 2019-12-17 03:45:46

问题


This may be a simple question, but I can not figure out how to do this. Lets say that I have two variables as follows.

a = 2
b = 3

I want to construct a DataFrame from this:

df2 = pd.DataFrame({'A':a,'B':b})

This generates an error:

ValueError: If using all scalar values, you must pass an index

I tried this also:

df2 = (pd.DataFrame({'a':a,'b':b})).reset_index()

This gives the same error message.


回答1:


The error message says that if you're passing scalar values, you have to pass an index. So you can either not use scalar values for the columns -- e.g. use a list:

>>> df = pd.DataFrame({'A': [a], 'B': [b]})
>>> df
   A  B
0  2  3

or use scalar values and pass an index:

>>> df = pd.DataFrame({'A': a, 'B': b}, index=[0])
>>> df
   A  B
0  2  3



回答2:


You can also use pd.DataFrame.from_records which is more convenient when you already have the dictionary in hand:

df = pd.DataFrame.from_records([{ 'A':a,'B':b }])

You can also set index, if you want, by:

df = pd.DataFrame.from_records([{ 'A':a,'B':b }], index='A')



回答3:


You need to create a pandas series first. The second step is to convert the pandas series to pandas dataframe.

import pandas as pd
data = {'a': 1, 'b': 2}
pd.Series(data).to_frame()

You can even provide a column name.

pd.Series(data).to_frame('ColumnName')



回答4:


You may try wrapping your dictionary in to list

my_dict = {'A':1,'B':2}

pd.DataFrame([my_dict])

   A  B
0  1  2



回答5:


You need to provide iterables as the values for the Pandas DataFrame columns:

df2 = pd.DataFrame({'A':[a],'B':[b]})



回答6:


Maybe Series would provide all the functions you need:

pd.Series({'A':a,'B':b})

DataFrame can be thought of as a collection of Series hence you can :

  • Concatenate multiple Series into one data frame (as described here )

  • Add a Series variable into existing data frame ( example here )




回答7:


You could try:

df2 = pd.DataFrame.from_dict({'a':a,'b':b}, orient = 'index')

From the documentation on the 'orient' argument: If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.




回答8:


I had the same problem with numpy arrays and the solution is to flatten them:

data = {
    'b': array1.flatten(),
    'a': array2.flatten(),
}

df = pd.DataFrame(data)



回答9:


If you intend to convert a dictionary of scalars, you have to include an index:

import pandas as pd

alphabets = {'A': 'a', 'B': 'b'}
index = [0]
alphabets_df = pd.DataFrame(alphabets, index=index)
print(alphabets_df)

Although index is not required for a dictionary of lists, the same idea can be expanded to a dictionary of lists:

planets = {'planet': ['earth', 'mars', 'jupiter'], 'length_of_day': ['1', '1.03', '0.414']}
index = [0, 1, 2]
planets_df = pd.DataFrame(planets, index=index)
print(planets_df)

Of course, for the dictionary of lists, you can build the dataframe without an index:

planets_df = pd.DataFrame(planets)
print(planets_df)



回答10:


This is because a DataFrame has two intuitive dimensions - the columns and the rows.

You are only specifying the columns using the dictionary keys.

If you only want to specify one dimensional data, use a Series!




回答11:


Pandas magic at work. All logic is out.

The error message "ValueError: If using all scalar values, you must pass an index" Says you must pass an index.

This does not necessarily mean passing an index makes pandas do what you want it to do

When you pass an index, pandas will treat your dictionary keys as column names and the values as what the column should contain for each of the values in the index.

a = 2
b = 3
df2 = pd.DataFrame({'A':a,'B':b}, index=[1])

    A   B
1   2   3

Passing a larger index:

df2 = pd.DataFrame({'A':a,'B':b}, index=[1, 2, 3, 4])

    A   B
1   2   3
2   2   3
3   2   3
4   2   3

An index is usually automatically generated by a dataframe when none is given. However, pandas does not know how many rows of 2 and 3 you want. You can however be more explicit about it

df2 = pd.DataFrame({'A':[a]*4,'B':[b]*4})
df2

    A   B
0   2   3
1   2   3
2   2   3
3   2   3

The default index is 0 based though.

I would recommend always passing a dictionary of lists to the dataframe constructor when creating dataframes. It's easier to read for other developers. Pandas has a lot of caveats, don't make other developers have to experts in all of them in order to read your code.




回答12:


This is a comment to the reply of @fAx: the input does not have to be a list of records - it can be a single dictionary as well:

pd.DataFrame.from_records({'a':1,'b':2}, index=[0])
   a  b
0  1  2

Which seems to be equivalent to:

pd.DataFrame({'a':1,'b':2}, index=[0])
   a  b
0  1  2



回答13:


If you have a dictionary you can turn it into a pandas data frame with the following line of code:

pd.DataFrame({"key": d.keys(), "value": d.values()})



回答14:


Convert Dictionary to Data Frame

col_dict_df = pd.Series(col_dict).to_frame('new_col').reset_index()

Give new name to Column

col_dict_df.columns = ['col1', 'col2']



回答15:


Just pass the dict on a list:

a = 2
b = 3
df2 = pd.DataFrame([{'A':a,'B':b}])


来源:https://stackoverflow.com/questions/17839973/constructing-pandas-dataframe-from-values-in-variables-gives-valueerror-if-usi

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!