How to convert OpenDocument spreadsheets to a pandas DataFrame?

日久生厌 2020-12-23 19:40

The Python library pandas can read Excel spreadsheets and convert them to a pandas.DataFrame with the pandas.read_excel(file) command. Under the hood,

11 Answers
  • 2020-12-23 19:58

    Here is a quick-and-dirty hack that uses the ezodf module:

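    import pandas as pd
    import ezodf
    
    def read_ods(filename, sheet_no=0, header=0):
        # Open the document and select the sheet by its index
        tab = ezodf.opendoc(filename=filename).sheets[sheet_no]
        # For each column, the cell in row `header` becomes the column name
        # and the cells below it become that column's values
        return pd.DataFrame({col[header].value: [x.value for x in col[header+1:]]
                             for col in tab.columns()})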
    

    Test:

    In [92]: df = read_ods(filename='fn.ods')
    
    In [93]: df
    Out[93]:
         a    b    c
    0  1.0  2.0  3.0
    1  4.0  5.0  6.0
    2  7.0  8.0  9.0
    

    NOTES:

    • other useful read_excel parameters such as skiprows, index_col and parse_cols are NOT implemented in this function; feel free to update this answer if you implement them
    • ezodf depends on lxml, so make sure it is installed
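    The notes leave skiprows and index_col unimplemented. Below is a minimal sketch of one way to add them, assuming ezodf's Table.rows() yields each row as a list of cells: read the raw cell values row by row, then let pandas build the frame. read_ods_rows and frame_from_rows are hypothetical helpers, not part of ezodf or pandas.

```python
import pandas as pd


def read_ods_rows(filename, sheet_no=0):
    # Hypothetical helper: assumes ezodf's Table.rows() yields lists of cells.
    # ezodf is imported here so frame_from_rows stays usable without it.
    import ezodf
    tab = ezodf.opendoc(filename).sheets[sheet_no]
    return [[cell.value for cell in row] for row in tab.rows()]


def frame_from_rows(rows, header=0, skiprows=0, index_col=None):
    # Hypothetical helper: drop `skiprows` leading rows, use row `header`
    # of what remains as column names and everything below it as data.
    rows = rows[skiprows:]
    df = pd.DataFrame(rows[header + 1:], columns=rows[header])
    if index_col is not None:
        # Promote the column at position `index_col` to the index
        df = df.set_index(df.columns[index_col])
    return df
```

    Usage would look like frame_from_rows(read_ods_rows('fn.ods'), skiprows=1, index_col=0). Keeping the ezodf access separate from the DataFrame construction means the parameter handling can be checked with plain Python lists.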
  • 2020-12-23 19:59

    Pandas supports reading Excel files (both xls and xlsx) with the read_excel function. You can use OpenOffice to save the spreadsheet as xlsx. The conversion can apparently also be automated on the command line with the convert-to parameter.

    Reading the data from xlsx avoids some of the issues (date formats, number formats, unicode) that you may run into when you convert to CSV first.
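    A sketch of automating that conversion from Python, assuming LibreOffice is installed and its soffice binary (which provides the --convert-to, --headless and --outdir options) is on PATH; the OpenOffice equivalent may differ. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

import pandas as pd


def libreoffice_convert_cmd(ods_path, outdir, fmt="xlsx", soffice="soffice"):
    # Assumes LibreOffice's `soffice` is on PATH; --headless avoids opening a window
    return [soffice, "--headless", "--convert-to", fmt, "--outdir", str(outdir), str(ods_path)]


def converted_path(ods_path, outdir, fmt="xlsx"):
    # The converted file keeps the input's base name with the new extension
    return Path(outdir) / f"{Path(ods_path).stem}.{fmt}"


def read_ods_via_xlsx(ods_path, outdir):
    # Convert first, then read the .xlsx copy (pandas needs openpyxl for .xlsx)
    subprocess.run(libreoffice_convert_cmd(ods_path, outdir), check=True)
    return pd.read_excel(converted_path(ods_path, outdir))
```

    Building the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.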

  • 2020-12-23 20:05

    If you only have a few .ods files to read, I would just open them in OpenOffice and save them as Excel files. If you have a lot of files, you could use the unoconv command on Linux to convert the .ods files to .xls programmatically (with bash).

    Then it's easy to read them in with pd.read_excel('filename.xls').
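    The same batch conversion sketched in Python instead of bash, assuming unoconv is installed and writes each output file next to its input (its default when no output path is given). unoconv_commands and convert_and_read_folder are hypothetical helpers.

```python
import subprocess
from pathlib import Path

import pandas as pd


def unoconv_commands(paths, fmt="xls"):
    # One `unoconv -f <fmt> <file>` call per .ods file; other files are skipped
    return [["unoconv", "-f", fmt, str(p)] for p in paths if Path(p).suffix.lower() == ".ods"]


def convert_and_read_folder(folder, fmt="xls"):
    # Assumes unoconv writes <name>.<fmt> in the same directory as each input
    frames = {}
    for cmd in unoconv_commands(sorted(Path(folder).glob("*.ods")), fmt):
        subprocess.run(cmd, check=True)
        src = Path(cmd[-1])
        # Reading legacy .xls requires the xlrd package
        frames[src.stem] = pd.read_excel(src.with_suffix("." + fmt))
    return frames
```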

  • 2020-12-23 20:05

    Some responses have pointed out that odfpy or other external packages are needed to get this functionality, but note that recent versions of pandas (the current one is 1.1, August 2020) support the ODS format in functions like pd.ExcelWriter() and pd.read_excel(). You only need to specify the proper engine, "odf", to work with OpenDocument file formats (.odf, .ods, .odt); that engine is itself backed by odfpy, so odfpy still has to be installed.

  • 2020-12-23 20:08

    This is available natively since pandas 0.25. As long as you have odfpy installed (conda install odfpy or pip install odfpy), you can do:

    pd.read_excel("the_document.ods", engine="odf")
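    If the same code has to read both OpenDocument and Excel files, a small dispatcher can choose the engine from the file extension. This is a sketch: excel_engine_for and read_spreadsheet are hypothetical helpers, and the extension set mirrors the .odf/.ods/.odt formats listed for the "odf" engine. Newer pandas releases may already infer "odf" from these extensions when engine is left as None; check the read_excel docs for your version.

```python
from pathlib import Path

import pandas as pd

# OpenDocument extensions handled by the "odf" engine
ODF_SUFFIXES = {".ods", ".odf", ".odt"}


def excel_engine_for(path):
    # "odf" for OpenDocument files; None lets pandas pick its default engine otherwise
    return "odf" if Path(path).suffix.lower() in ODF_SUFFIXES else None


def read_spreadsheet(path, **kwargs):
    # Extra keyword arguments (sheet_name, header, ...) pass straight to read_excel
    return pd.read_excel(path, engine=excel_engine_for(path), **kwargs)
```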
    