Question
Using openpyxl
I am trying to read some files starting from the fifth line. The first four lines of each file are a header, and the main content that follows has a different format from the header. I tried this approach:
import openpyxl

file_name = "xxx.xlsx"
wb = openpyxl.load_workbook(filename=file_name, use_iterators=True)
first_sheet = wb.get_sheet_names()[0]
ws = wb.get_sheet_by_name(first_sheet)
# start and stop are not defined in this snippet
for index, row in enumerate(ws.iter_rows()):
    if start < index < stop:
        for c in row:
            print(c.value)
It always fails with this error:
IndexError: list index out of range
If I delete the first four lines, the data can be read into Python easily. But I have hundreds of such files, and each one has a four-line header. It would take far too much time to delete the headers from all of them.
How can I correctly skip the first several lines when reading with openpyxl?
Answer 1:
You can pass a range into ws.iter_rows('A4:Z256'), but you're probably better off using ws.get_squared_range(1, 5,)
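As far as I know, newer openpyxl releases express the same bounded range with keyword arguments on iter_rows, or with slice-style access on the worksheet, rather than a range string or get_squared_range. The sketch below is only an illustration under that assumption: it builds a small in-memory workbook with hypothetical sample values (four header rows, then two data rows) and reads rows 5-6, columns A-B, both ways.

```python
import openpyxl

# Hypothetical sample sheet: four header rows followed by two data rows.
wb = openpyxl.Workbook()
ws = wb.active
for i in range(1, 5):
    ws.append(["header %d" % i])
ws.append([10, 20, 30])
ws.append([40, 50, 60])

# Rows 5-6, columns A-B, using keyword bounds (all 1-based, inclusive).
block = [
    [cell.value for cell in row]
    for row in ws.iter_rows(min_row=5, max_row=6, min_col=1, max_col=2)
]

# The same rectangle addressed with an Excel-style range string.
same = [[cell.value for cell in row] for row in ws["A5:B6"]]

print(block)
print(same)
```

Leaving out max_row and max_col should make the iteration run to the end of the used area, which is usually what you want when the files differ in length.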
Answer 2:
You can skip the first N rows by passing the optional min_row argument. Note that this uses a 1-based index, so min_row=2 starts on the second row and min_row=5 skips the first four rows. You would be using something like this:
for index, row in enumerate(ws.iter_rows(min_row=5)):
See the openpyxl iter_rows documentation for the full list of arguments.
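Here is a minimal, self-contained sketch of the min_row approach applied to a file on disk. The helper name read_body, the HEADER_ROWS constant, and the sample data are made up for illustration; it also assumes a reasonably recent openpyxl, where read_only=True takes the place of the older use_iterators flag and iter_rows accepts values_only.

```python
import os
import tempfile

import openpyxl

HEADER_ROWS = 4  # assumption: every file starts with four header rows


def read_body(path, header_rows=HEADER_ROWS):
    """Return the data rows of the first sheet, skipping the header rows."""
    wb = openpyxl.load_workbook(path, read_only=True)
    try:
        ws = wb.worksheets[0]
        # min_row is 1-based, so header_rows + 1 is the first data row.
        return list(ws.iter_rows(min_row=header_rows + 1, values_only=True))
    finally:
        wb.close()


# Build a small sample file so the sketch runs on its own.
sample = openpyxl.Workbook()
sheet = sample.active
for i in range(1, 5):
    sheet.append(["header line %d" % i])
sheet.append(["a", 1])
sheet.append(["b", 2])

sample_path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
sample.save(sample_path)

rows = read_body(sample_path)
print(rows)
```

For hundreds of files, the same helper could be called in a loop over the file paths, so none of the headers need to be deleted by hand.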
Source: https://stackoverflow.com/questions/28929310/how-can-i-skip-first-several-lines-of-the-excel-sheet