I am reading this Excel file United Nations Energy Indicators using the code snippet here:
def convert_energy(energy):
    if isinstance(energy, float):
Let's remove the converters argument for a moment:
import pandas as pd

c = ['Energy Supply', 'Energy Supply per Capita', '% Renewable']
df = pd.read_excel("Energy Indicators.xls",
                   skiprows=17,
                   skipfooter=38,  # older pandas releases spelled this skip_footer
                   usecols=[2, 3, 4, 5],
                   na_values=['...'],
                   names=c,
                   index_col=[0])
df.index.name = 'Country'
df.head()
                Energy Supply  Energy Supply per Capita  % Renewable
Country
Afghanistan             321.0                      10.0    78.669280
Albania                 102.0                      35.0   100.000000
Algeria                1959.0                      51.0     0.551010
American Samoa            NaN                       NaN     0.641026
Andorra                   9.0                     121.0    88.695650
df.dtypes
Energy Supply               float64
Energy Supply per Capita    float64
% Renewable                 float64
dtype: object
Your data loads just fine without a converter. There's a trick to understanding why this happens.
By default, pandas reads in each column and tries to "interpret" your data. By specifying your own converter, you override pandas' conversion, so this interpretation does not happen.
pandas passes integer and string values to convert_energy, so the isinstance(energy, float) check never evaluates to True. Instead, the else branch runs and these values are returned as is, so your resulting column is a mixture of strings and integers. If you put a print(type(energy)) inside your function, this becomes obvious.
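The effect can be reproduced without the Excel file. The sketch below is only a guess at the converter: the question shows just its first two lines, so both branch bodies are assumptions (multiply floats by one million, return everything else untouched), and the sample values are illustrative stand-ins for what the cells might contain.

```python
# Hypothetical reconstruction of the converter: the question shows only its
# first two lines, so the bodies of both branches are assumptions.
def convert_energy(energy):
    if isinstance(energy, float):
        return energy * 1000000
    else:
        return energy  # anything that is not a float falls through unchanged


# Assumed sample cell values: an int, a float, and the '...' placeholder.
samples = [321, 321.0, '...']
results = [convert_energy(value) for value in samples]

# Show the incoming type next to what the converter hands back.
for value, result in zip(samples, results):
    print(type(value).__name__, '->', repr(result))
```

Only the float is scaled. The integer and the string come back unchanged, which is how a column ends up holding mixed types.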
Since the column holds a mixture of types, its resulting dtype is object. However, if you do not use a converter, pandas will attempt to interpret your data and will successfully parse it as numeric.
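A small sketch of the dtype difference, using made-up values rather than the real spreadsheet. Here pd.to_numeric stands in for the parsing pandas does on its own when no converter is given; it is not necessarily the code path read_excel uses internally.

```python
import pandas as pd

# A column that mixes integers with the '...' placeholder, roughly what a
# pass-through converter would leave behind (values are illustrative).
mixed = pd.Series([321, '...', 102])
print(mixed.dtype)

# Coercing to numeric turns the placeholder into NaN and yields floats.
numeric = pd.to_numeric(mixed, errors='coerce')
print(numeric.dtype)
```

The first Series has dtype object because it holds both integers and a string. The coerced Series is float64, with NaN where the placeholder was.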
So, just doing:
df['Energy Supply'] *= 1000000
would be more than enough.
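As a quick check that the in-place multiplication behaves sensibly around missing values, here is a minimal sketch on a hand-built frame. The three rows are copied from the df.head() output above; the frame itself is constructed by hand for illustration, not read from the Excel file.

```python
import math

import pandas as pd

# Three rows taken from the head() output, including one missing value.
df = pd.DataFrame(
    {'Energy Supply': [321.0, float('nan'), 9.0]},
    index=pd.Index(['Afghanistan', 'American Samoa', 'Andorra'], name='Country'),
)

# Scale the whole column at once; NaN entries simply stay NaN.
df['Energy Supply'] *= 1000000
print(df)
```

Because the column is already float64, the multiplication is vectorised and no per-cell type check is needed.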