Importing Excel into Panda Dataframe

后端 未结 3 1391
滥情空心
滥情空心 2020-12-21 23:10

The following is only the beginning for an Coursera assignment on Data Science. I hope this is not to trivial for. But I am lost on this and could not find an answer. I am a

相关标签:
3条回答
  • 2020-12-21 23:46

    I modified your code and was able to get the data into the dataframe. Instead of skipcolumns (which did not work), I used the argument usecols as follows

    energy=pd.read_excel('Energy_Indicators.xls',
                         sheetname='Energy',
                         header=16,
                         skiprows=[16],
                         skipfooter=38,
                         usecols=[2,3,4,5]
                        )
    
        Unnamed: 2      Petajoules  Gigajoules  %
    0   Afghanistan     321         10          78.669280
    1   Albania         102         35          100.000000
    2   Algeria         1959        51          0.551010
    3   American Samoa  ...         ...         0.641026
    4   Andorra         9           121         88.695650
    

    In order to make the countries as the index, you can do the following

    # Rename the column Unnamed: 2 to Country
    energy = energy.rename(columns={'Unnamed: 2':'Country'})
    
    # Change the index to country column    
    energy.index = energy['Country']
    
    # Drop the extra country column
    energy = energy.drop('Country', axis=1)
    
    0 讨论(0)
  • 2020-12-21 23:51

    I installed xlrd package, with pip install xlrd and then loaded the file successfully as follows:

    In [17]: df = pd.read_excel(r"http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls",
        ...:                      sheetname='Energy',
        ...:                      header=16,
        ...:                      skiprows=[17],
        ...:                      skipfooter=38,
        ...:                      skipcolumns=2)  
    
    In [18]: df.shape 
    Out[18]: (227, 3)
    
    In [19]: df.head() 
    Out[19]: 
                                      Energy Supply Energy Supply per capita  \
    NaN Afghanistan    Afghanistan              321                       10   
        Albania        Albania                  102                       35   
        Algeria        Algeria                 1959                       51   
        American Samoa American Samoa           ...                      ...   
        Andorra        Andorra                    9                      121   
    
                                       Renewable Electricity Production  
    NaN Afghanistan    Afghanistan                            78.669280  
        Albania        Albania                               100.000000  
        Algeria        Algeria                                 0.551010  
        American Samoa American Samoa                          0.641026  
        Andorra        Andorra                                88.695650  
    
    In [20]: pd.__version__ 
    Out[20]: u'0.20.3'
    
    In [21]: df.columns 
    Out[21]: 
    Index([u'Energy Supply', u'Energy Supply per capita',
           u'Renewable Electricity Production'],
          dtype='object')
    

    Notice that I am using the last version of pandas 0.20.3 make sure you have the latest version on your system.

    0 讨论(0)
  • 2020-12-22 00:00

    I think you need add parameters:

    • index_col for convert column to index
    • usecols - parse columns by positions
    • change header position to 15

    energy=pd.read_excel('Energy Indicators.xls',
                         sheetname='Energy',
                         skiprows=[17],
                         skipfooter=38,
                         header=15,
                         index_col=[0],
                         usecols=[2,3,4,5]
                        )
    print (energy.head())
    
                   Energy Supply Energy Supply per capita  \
    Afghanistan              321                       10   
    Albania                  102                       35   
    Algeria                 1959                       51   
    American Samoa           ...                      ...   
    Andorra                    9                      121   
    
                    Renewable Electricity Production  
    Afghanistan                            78.669280  
    Albania                               100.000000  
    Algeria                                 0.551010  
    American Samoa                          0.641026  
    Andorra                                88.695650  
    
    0 讨论(0)
提交回复
热议问题