Pandas: how to designate starting row to extract data

日久生厌 2021-01-21 02:34

I am using the pandas library with Python.

I have an Excel file with some heading information at the top of the sheet, which I do not need for data extraction.

2 Answers
  • 2021-01-21 02:37

    You could manually check for the header line and then use read_csv's keyword argument skiprows.

    with open('data.csv') as fp:
        skip = next(filter(
            lambda x: x[1].startswith('ID'),
            enumerate(fp)
        ))[0]
    

    Then skip the rows:

    import pandas
    df = pandas.read_csv('data.csv', skiprows=skip)
    

    This way you can support pre-header sections of arbitrary length.
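
    As a self-contained illustration of the same idea, here is a minimal sketch that runs on an in-memory sample instead of a file on disk. The helper name find_header_row, the sample text, and the column names are assumptions made up for this example; only the 'ID' prefix comes from the snippet above. Unlike the next(filter(...)) version, it raises a ValueError with a readable message when no header line is found.

```python
import io

import pandas as pd


def find_header_row(lines, prefix="ID"):
    """Return the 0-based index of the first line that starts with `prefix`.

    `find_header_row` is a hypothetical helper written for this example.
    """
    for index, line in enumerate(lines):
        if line.startswith(prefix):
            return index
    raise ValueError(f"no line starts with {prefix!r}")


# Made-up sample: two lines of heading information, then the real header.
sample = (
    "Exported by some tool\n"
    "Date: 2021-01-21\n"
    "ID,name,value\n"
    "1,a,10\n"
    "2,b,20\n"
)

# The index of the header line equals the number of lines to skip.
skip = find_header_row(io.StringIO(sample))

# Parse from a fresh buffer so reading starts at the first line again.
df = pd.read_csv(io.StringIO(sample), skiprows=skip)
print(df)
```

    With a real file, the same helper can be given the open file object, and the path is then passed to read_csv together with skiprows=skip.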


    For Python 2:

    import itertools as it
    
    with open('data.csv') as fp:
        skip = next(it.ifilter(
            lambda x: x[1].startswith('ID'),
            enumerate(fp)
        ))[0]
    
  • 2021-01-21 02:55

    You can use pd.read_csv and specify skiprows=4:

    import pandas as pd
    df = pd.read_csv('test.csv', skiprows=4)
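
    To see what a fixed skiprows=4 does, here is a small sketch on made-up in-memory data, assuming exactly four lines of heading information sit above the real header (the sample text and column names are invented for illustration). Since the question mentions an Excel sheet, note that pandas.read_excel also accepts a skiprows argument.

```python
import io

import pandas as pd

# Made-up sample: four heading lines, then the header row, then the data.
raw = (
    "Monthly report\n"
    "Generated 2021-01-21\n"
    "Source: test system\n"
    "Internal use only\n"
    "ID,name,value\n"
    "1,a,10\n"
    "2,b,20\n"
)

# skiprows=4 drops the first four lines; the fifth line becomes the header.
df = pd.read_csv(io.StringIO(raw), skiprows=4)
print(df.columns.tolist())
```

    This only works when the number of heading lines is known in advance and never changes.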
    