The following is only the beginning for an Coursera assignment on Data Science. I hope this is not to trivial for. But I am lost on this and could not find an answer. I am a
I modified your code and was able to get the data into the dataframe. Instead of skipcolumns
(which did not work), I used the argument usecols
as follows
energy=pd.read_excel('Energy_Indicators.xls',
sheetname='Energy',
header=16,
skiprows=[16],
skipfooter=38,
usecols=[2,3,4,5]
)
Unnamed: 2 Petajoules Gigajoules %
0 Afghanistan 321 10 78.669280
1 Albania 102 35 100.000000
2 Algeria 1959 51 0.551010
3 American Samoa ... ... 0.641026
4 Andorra 9 121 88.695650
In order to make the countries as the index, you can do the following
# Rename the column Unnamed: 2 to Country
energy = energy.rename(columns={'Unnamed: 2':'Country'})
# Change the index to country column
energy.index = energy['Country']
# Drop the extra country column
energy = energy.drop('Country', axis=1)
I installed xlrd
package, with pip install xlrd
and then loaded the file successfully as follows:
In [17]: df = pd.read_excel(r"http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls",
...: sheetname='Energy',
...: header=16,
...: skiprows=[17],
...: skipfooter=38,
...: skipcolumns=2)
In [18]: df.shape
Out[18]: (227, 3)
In [19]: df.head()
Out[19]:
Energy Supply Energy Supply per capita \
NaN Afghanistan Afghanistan 321 10
Albania Albania 102 35
Algeria Algeria 1959 51
American Samoa American Samoa ... ...
Andorra Andorra 9 121
Renewable Electricity Production
NaN Afghanistan Afghanistan 78.669280
Albania Albania 100.000000
Algeria Algeria 0.551010
American Samoa American Samoa 0.641026
Andorra Andorra 88.695650
In [20]: pd.__version__
Out[20]: u'0.20.3'
In [21]: df.columns
Out[21]:
Index([u'Energy Supply', u'Energy Supply per capita',
u'Renewable Electricity Production'],
dtype='object')
Notice that I am using the last version of pandas 0.20.3
make sure you have the latest version on your system.
I think you need add parameters:
index_col
for convert column to indexusecols
- parse columns by positions15
energy=pd.read_excel('Energy Indicators.xls',
sheetname='Energy',
skiprows=[17],
skipfooter=38,
header=15,
index_col=[0],
usecols=[2,3,4,5]
)
print (energy.head())
Energy Supply Energy Supply per capita \
Afghanistan 321 10
Albania 102 35
Algeria 1959 51
American Samoa ... ...
Andorra 9 121
Renewable Electricity Production
Afghanistan 78.669280
Albania 100.000000
Algeria 0.551010
American Samoa 0.641026
Andorra 88.695650