问题
I have downloaded few sales dataset from a SAP application. SAP has automatically converted the data to .XLS file. Whenever I open it using Pandas
library I am getting the following error:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\xff\xfe\r\x00\n\x00\r\x00'
When I opened the .XLS file using MSEXCEL it is shows a popup saying that the file is corrupt or unsupported extension do you want to continue
when I clicked 'Yes' its showing the correct data. When I saved the file again as .xls using msexcel I am able to use it using Pandas
.
So, I tried renaming the file using os.rename()
but it dint work. I tried opening the file and removing \xff\xfe\r\x00\n\x00\r\x00
, but then also it dint work.
The solution is to open MSEXCEL and save the file again as .xls manually, is there any way to automate this. Kindly help.
回答1:
Finally I converted the corrupt .xls
to a correct .xls
file. The following is the code:
# Changing the data types of all strings in the module at once
from __future__ import unicode_literals
# Used to save the file as excel workbook
# Need to install this library
from xlwt import Workbook
# Used to open to corrupt excel file
import io
filename = r'SALEJAN17.xls'
# Opening the file using 'utf-16' encoding
file1 = io.open(filename, "r", encoding="utf-16")
data = file1.readlines()
# Creating a workbook object
xldoc = Workbook()
# Adding a sheet to the workbook object
sheet = xldoc.add_sheet("Sheet1", cell_overwrite_ok=True)
# Iterating and saving the data to sheet
for i, row in enumerate(data):
# Two things are done here
# Removeing the '\n' which comes while reading the file using io.open
# Getting the values after splitting using '\t'
for j, val in enumerate(row.replace('\n', '').split('\t')):
sheet.write(i, j, val)
# Saving the file as an excel file
xldoc.save('myexcel.xls')
import pandas as pd
df = pd.ExcelFile('myexcel.xls').parse('Sheet1')
No errors.
回答2:
The other way to solve this problem is using win32com.client library:
import win32com.client
import os
o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
filename = os.getcwd() + '/' + 'SALEJAN17.xls'
output = os.getcwd() + '/' + 'myexcel.xlsx'
wb = o.Workbooks.Open(filename)
wb.ActiveSheet.SaveAs(output,51)
In my example you save to .xlsx format but you can save as .xls as well.
来源:https://stackoverflow.com/questions/43985768/python-converting-corrupt-xls-file