Is it possible to get an Excel document's row count without loading the entire document into memory?

前端 未结 5 1192
暖寄归人
暖寄归人 2020-12-04 19:09

I\'m working on an application that processes huge Excel 2007 files, and I\'m using OpenPyXL to do it. OpenPyXL has two different methods of reading an Excel file - one \"no

相关标签:
5条回答
  • 2020-12-04 19:29

    This might be extremely convoluted and I might be missing the obvious, but without OpenPyXL filling in the column_dimensions in Iterable Worksheets (see my comment above), the only way I can see of finding the column size without loading everything is to parse the xml directly:

    from xml.etree.ElementTree import iterparse
    from openpyxl import load_workbook
    wb=load_workbook("/path/to/workbook.xlsx", use_iterators=True)
    ws=wb.worksheets[0]
    xml = ws._xml_source
    xml.seek(0)
    
    for _,x in iterparse(xml):
    
        name= x.tag.split("}")[-1]
        if name=="col":
            print "Column %(max)s: Width: %(width)s"%x.attrib # width = x.attrib["width"]
    
        if name=="cols":
            print "break before reading the rest of the file"
            break
    
    0 讨论(0)
  • 2020-12-04 19:30

    https://pythonhosted.org/pyexcel/iapi/pyexcel.sheets.Sheet.html see : row_range() Utility function to get row range

    if you use pyexcel, can call row_range get max rows.

    python 3.4 test pass.

    0 讨论(0)
  • 2020-12-04 19:34

    The solution suggested in this answer has been deprecated, and might no longer work.


    Taking a look at the source code of OpenPyXL (IterableWorksheet) I've figured out how to get the column and row count from an iterator worksheet:

    wb = load_workbook(path, use_iterators=True)
    sheet = wb.worksheets[0]
    
    row_count = sheet.get_highest_row() - 1
    column_count = letter_to_index(sheet.get_highest_column()) + 1
    

    IterableWorksheet.get_highest_column returns a string with the column letter that you can see in Excel, e.g. "A", "B", "C" etc. Therefore I've also written a function to translate the column letter to a zero based index:

    def letter_to_index(letter):
        """Converts a column letter, e.g. "A", "B", "AA", "BC" etc. to a zero based
        column index.
    
        A becomes 0, B becomes 1, Z becomes 25, AA becomes 26 etc.
    
        Args:
            letter (str): The column index letter.
        Returns:
            The column index as an integer.
        """
        letter = letter.upper()
        result = 0
    
        for index, char in enumerate(reversed(letter)):
            # Get the ASCII number of the letter and subtract 64 so that A
            # corresponds to 1.
            num = ord(char) - 64
    
            # Multiply the number with 26 to the power of `index` to get the correct
            # value of the letter based on it's index in the string.
            final_num = (26 ** index) * num
    
            result += final_num
    
        # Subtract 1 from the result to make it zero-based before returning.
        return result - 1
    

    I still haven't figured out how to get the column sizes though, so I've decided to use a fixed-width font and automatically scaled columns in my application.

    0 讨论(0)
  • 2020-12-04 19:43

    Python 3

    import openpyxl as xl
    
    wb = xl.load_workbook("Sample.xlsx", enumerate)
    
    #the 2 lines under do the same. 
    sheet = wb.get_sheet_by_name('sheet') 
    sheet = wb.worksheets[0]
    
    row_count = sheet.max_row
    column_count = sheet.max_column
    
    #this works fore me.
    
    0 讨论(0)
  • 2020-12-04 19:53

    Adding on to what Hubro said, apparently get_highest_row() has been deprecated. Using the max_row and max_column properties returns the row and column count. For example:

        wb = load_workbook(path, use_iterators=True)
        sheet = wb.worksheets[0]
    
        row_count = sheet.max_row
        column_count = sheet.max_column
    
    0 讨论(0)
提交回复
热议问题