Writing array to Excel in Python with win32com

前端 未结 3 821
鱼传尺愫
鱼传尺愫 2021-01-01 01:02

I\'m making a Python script which parses data files. The parsed data is then sent to an Excel file. The data can be rather huge. I\'m looking at 10 to 20 columns, but the nu

相关标签:
3条回答
  • 2021-01-01 01:17

    Using the COM for reading files from a excel file is a extreme waste of time. It's like killing flyes with a tank. Take into account that the win32com does complicated calls with the windows API, that talk to excel, retrieves the data and sends it back to python. Why do that when the information is already there as a file?

    There are libraries that parse directly the excel file, and as you can imagine they can be x100 times faster, since there is no over-complex calls to the win API.

    I've worked a lot successfully with openpyxl, but there are others libraries out there that can be as good or even better.

    Just an example for huge data (uses generators instead of loading everything into memory):

    from openpyxl import load_workbook
    wb = load_workbook(filename='large_file.xlsx', use_iterators=True)
    ws = wb.get_sheet_by_name(name='big_data') # ws is now an IterableWorksheet
    
    for row in ws.iter_rows(): # it brings a new method: iter_rows()
         for cell in row:
             print cell.internal_value
    

    Equivalent methods are available to write into cells. You can even format them, although it is not (or used to be) very complete.

    EDIT

    Example on how to write a big amount of info to a xlsx file:

    from openpyxl import Workbook
    from openpyxl.cell import get_column_letter
    
    wb = Workbook()
    dest_filename = r'empty_book.xlsx'
    ws = wb.active
    ws.title = "range names"
    
    for col_idx in xrange(1, 40):
        col = get_column_letter(col_idx)
        for row in xrange(1, 600):
            ws.cell('%s%s'%(col, row)).value = '%s%s' % (col, row)
    
    ws = wb.create_sheet()
    ws.title = 'Pi'
    ws['F5'] = 3.14
    wb.save(filename=dest_filename)
    
    0 讨论(0)
  • 2021-01-01 01:19

    I know the post is a bit old.
    However, being stuck with win32com for some other reason (macro executions) and needing a similar solution to your 3rd sheet way (complete array at once), I tried the code of your initial post and found a small mistake that prevented it to work.
    So to answer your original question "What am I doing wrong?":
    You forgot to re-initialize the row variable to 1 before using it in the '#no loop, write array to range' part.
    Here is the win32com part of your comparison post updated then:

    print "Writing using win32com.client"
    start = time.time()
    row = 1
    sheet.Range(sheet.Cells(row,1), sheet.Cells(row+len(data_array)-1, len(data_array[0]))).Value = data_array
    print "Processing time: " + str(time.time() - start) + " seconds."
    
    book.SaveAs(Filename="C:\Temp\Temp.xls", FileFormat=56)
    print "Completed: " + str(time.time() - start) + " seconds."
    

    Then, timing is no so bad:
    Writing using win32com.client
    Processing time: 0.322000026703 seconds.
    Completed: 1.73699998856 seconds.
    Pretty fast then. Maybe your comparison can be updated since the PyExcelerate numbers are not so different anymore (and my computer is slower).

    0 讨论(0)
  • 2021-01-01 01:25

    I have been doing some research on this. And have been coming to some interesting conclusions.

    There are indeed multiple solutions around to write data to excel from python. I ended up focussing on three modules.

    win32com.client

    Works slow. However the document can be opened in excel. So the end result is available for the user in excel to start working on. Not great for large amounts of data.

    Writing 10,000 rows with 10 columns takes on my computer (core i5) app. 70 seconds.

    import win32com.client
    import time
    
    #create data array
    row = range(0,10)
    i = 0
    data_array = []
    while i < 10000:
        data_array.append(row)
        i += 1
    
    #write the array to an excel file
    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = True
    excel.ScreenUpdating = False
    book = excel.Workbooks.Add()
    sheet = book.Worksheets(1)
    
    #single loop, writing a row to a range
    print "Writing using win32com.client"
    start = time.time()
    row = 0
    for line in data_array:
        row += 1
        sheet.Range(sheet.Cells(row,1), sheet.Cells(row, len(line))).Value = line
    print "Processing time: " + str(time.time() - start) + " seconds."
    
    print "Completed: " + str(time.time() - start) + " seconds."
    excel.ScreenUpdating = True
    sheet = None
    book = None
    excel.Quit()
    excel = None
    

    openpyxl

    Is a bit faster, but still not great. This module is slow on transfering the data to the excel memory object, but save it really fast. It creates the 10,000 lines with 10 columns in 22.3 seconds, and saves the file in another 0.5 seconds. When I test this with 100,000 rows and 10 columns. The data is created in 228.3 seconds, and saving the file is done in another 2.9 seconds. Fairly slow, but the file saving is fast. Hence openpyxl is probably good for making changes (formatting) to existing data, although I have not testing this yet. Another advantage is that coding with openpyxl is easier then using win32com.client.

    import openpyxl
    import sys
    import time
    
    #create data array
    row = range(0,10)
    i = 0
    data_array = []
    while i < 10000:
        data_array.append(row)
        i += 1
    
    #create an excel workbook and sheet object
    book = openpyxl.Workbook(optimized_write = True)
    
    #single loop, writing rows
    start = time.time()
    print "Writing with single loop using openpyxl"
    sheet = book.create_sheet()
    for line in data_array:
        sheet.append(line)
    print "Processing time: " + str(time.time() - start) + " seconds."
    
    #close and save the file.
    book.save('openpyxl.xlsx')
    print "Total time: " + str(time.time() - start) + " seconds."
    

    I am facing another issue with openpyxl. In my real tool, openpyxl is having problems with saving big amounts of data (> 10,000 rows). This I have not figured out yet, and maybe I won't bother to look further in to it.

    PyExcelerate

    This one is first of all fast. It creates the 10,000 rows and 10 columns in just 0.17 seconds. Saving the file however takes 2.2 seconds. By far the fastest option of the three. When I try to save 100,000 rows and 10 columns with this module, the excel data is created in just 1.8 seconds. However saving the file now takes 21.7 seconds. Hence, this module is really fast, but there is a penalty in writing the file. Overall still the fastest of them all. Another advantage of PyExcelerate is that coding is very easy, easier again as openpyxl.

    import pyexcelerate
    import sys
    import time
    
    #create data array
    row = range(0,10)
    i = 0
    data_array = []
    while i < 10000:
        data_array.append(row)
        i += 1
    
    print sys.version
    
    #create an excel workbook and sheet object
    book = pyexcelerate.Workbook()
    
    #single loop, writing rows
    start = time.time()
    print "Writing with single loop using PyExcelerate"
    book.new_sheet("data", data = data_array)
    
    print "Processing time: " + str(time.time() - start) + " seconds."
    
    #close and save the file.
    book.save('pyexcelerate.xlsx')
    print "Total time: " + str(time.time() - start) + " seconds."
    

    So my conclusion is that PyExcelerate is by far the fastest. The advantage of win32com.client is that the created excel file can be openend in excel to make the created data available for the user to start working with it. Openpyxl could be interesting to make style changes after creating it. However, I have not tested this yet. Hence, it could be beneficial to combine win32com.client, openpyxl and PyExcelerate in one application.

    0 讨论(0)
提交回复
热议问题