When using the idxmax() function in pandas, I keep receiving this error.
Traceback (most recent call last):
File "/Users/username/College/yea
The cell values in that column have a non-numeric type (typically object). argmin(), idxmin(), argmax() and other similar functions need the dtype to be numeric.
The easiest solution is to use pd.to_numeric() to convert your series (or columns) to numeric types. An example with a data frame df with a column 'a' would be:
df['a'] = pd.to_numeric(df['a'])
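To see the effect end to end, here is a minimal sketch. The data frame is made up for illustration (it is not the OP's data), and the column is forced to dtype object to mimic the situation in the question:

```python
import pandas as pd

# Hypothetical data: numbers stored as strings in an object-dtype column.
df = pd.DataFrame({'a': pd.Series(['0.1', '0.7', '0.3'], dtype=object)})
print(df['a'].dtype)  # object

# Convert the column to a numeric dtype.
df['a'] = pd.to_numeric(df['a'])
print(df['a'].dtype)  # float64

# idxmax() now returns the index label of the largest value.
print(df['a'].idxmax())  # 1
```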
A more complete answer on type casting in pandas can be found here.
Hope that helps :)
In short, try this
best_c = results_table.loc[results_table['Mean recall score'].astype(float).idxmax()]['C_parameter']
instead of
best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C_parameter']
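One caveat with astype(float), shown as a small sketch on made-up series (the names scores and dirty are hypothetical): it works when every value parses as a number, but raises ValueError as soon as one value does not.

```python
import pandas as pd

# Hypothetical scores stored as strings in an object-dtype series.
scores = pd.Series(['0.85', '0.93', '0.90'], dtype=object)
best_idx = scores.astype(float).idxmax()
print(best_idx)  # 1

# A value that cannot be parsed makes astype(float) raise ValueError.
dirty = pd.Series(['0.85', 'n/a'], dtype=object)
try:
    dirty.astype(float)
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```

If the column may contain such values, pd.to_numeric(..., errors='coerce') is the more forgiving choice, since it turns them into NaN instead of raising.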
#best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C_parameter']
1) The type of "Mean recall score" is object, so you can't use idxmax() on it.
2) You should change "Mean recall score" from object to float.
3) You can use apply(pd.to_numeric, errors='coerce', axis=0) to do that.
best_c = results_table
best_c.dtypes.eq(object)  # True for each column whose dtype is object
new = best_c.columns[best_c.dtypes.eq(object)]  # names of the object-dtype columns
best_c[new] = best_c[new].apply(pd.to_numeric, errors='coerce', axis=0)  # convert them to numeric; unparseable values become NaN
best_c
best_c = results_table.loc[results_table['Mean recall score'].idxmax()]['C_parameter']  # C_parameter of the row with the highest mean recall score
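Here is a self-contained sketch of what errors='coerce' does. The table below is a made-up stand-in for results_table (the values and the 'n/a' entry are assumptions, not the OP's data), and it converts only the score column rather than every object column:

```python
import pandas as pd

# Hypothetical stand-in for results_table; the scores are stored as objects.
results_table = pd.DataFrame({
    'C_parameter': [0.01, 0.1, 1.0],
    'Mean recall score': pd.Series(['0.85', '0.93', 'n/a'], dtype=object),
})

# Values that cannot be parsed as numbers ('n/a' here) become NaN.
results_table['Mean recall score'] = pd.to_numeric(
    results_table['Mean recall score'], errors='coerce'
)

# idxmax() skips NaN by default, so the row with score 0.93 is selected.
best_c = results_table.loc[results_table['Mean recall score'].idxmax(), 'C_parameter']
print(best_c)  # 0.1
```

Note that coercion silently discards bad values, so it is worth checking how many NaN entries appear afterwards.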
If NaN values are present (and the stack trace hints at this), then a data frame you think is all numeric may well hold mixed types, in particular a string among the numbers. Here are 3 code examples: the first 2 work, the last doesn't and is likely your case.
This is all numeric data; it will work with idxmax:
import numpy as np
import pandas as pd
the_dict = {}
the_dict['a'] = [0.1, 0.2, 0.5]
the_dict['b'] = [0.3, 0.4, 0.6]
the_dict['c'] = [0.25, 0.3, 0.9]
the_dict['d'] = [0.2, 0.1, 0.4]
the_df = pd.DataFrame(the_dict)
This includes a numeric NaN; it will still work with idxmax:
the_dict = {}
the_dict['a'] = [0.1, 0.2, 0.5]
the_dict['b'] = [0.3, 0.4, 0.6]
the_dict['c'] = [0.25, 0.3, 0.9]
the_dict['d'] = [0.2, 0.1, np.nan]  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
the_df = pd.DataFrame(the_dict)
This may or may not be the exact problem reported by the OP, but if we have mixed types in any fashion, such as the string 'NaN' among floats, we will get the error the OP reported:
the_dict = {}
the_dict['a'] = [0.1, 0.2, 0.5]
the_dict['b'] = [0.3, 0.4, 0.6]
the_dict['c'] = [0.25, 0.3, 0.9]
the_dict['d'] = [0.2, 0.1, 'NaN']
the_df = pd.DataFrame(the_dict)
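To track down and repair such a column, something along these lines may help. It is a sketch on a cut-down version of the third example, not a definitive recipe; checking element types with map(type) is just one way to spot the stray string.

```python
import pandas as pd

the_df = pd.DataFrame({
    'a': [0.1, 0.2, 0.5],
    'd': [0.2, 0.1, 'NaN'],  # the string 'NaN', not a float NaN
})

# A column holding both floats and a string is stored with dtype object.
print(the_df.dtypes)

# Collect the Python types actually present in the suspect column.
types_in_d = set(the_df['d'].map(type))
print(types_in_d)

# errors='coerce' turns anything that is not a number into a real NaN.
the_df['d'] = pd.to_numeric(the_df['d'], errors='coerce')

# The column is now float64, and idxmax() skips the NaN.
print(the_df['d'].idxmax())  # 0
```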