Sorting with xlwings (pywin32)

Submitted by Deadly on 2019-12-24 06:32:24

Question


I need to use Python to sort an Excel spreadsheet by a given column. For testing, I'm using this data (in a file named xlwings sorting.xlsx):

Numbers Letters Letters_2
7   A   L
6   B   K
5   C   M
4   D   J
3   E   N
2   F   I
1   G   H

Which should be sorted into this:

Numbers Letters Letters_2
1   G   H
2   F   I
3   E   N
4   D   J
5   C   M
6   B   K
7   A   L

One would think this to be a trivial task, but neither the xlwings docs nor the pywin32 docs seem to say anything about column sorting (if they do, it's buried so deep that two days of reading haven't uncovered it).

The closest thing I could find anywhere online was this question, which has no answer and just redirects to a GitHub bug thread that had no resolution.

Still, I have managed to cobble together the following code based on the questioner's:

import xlwings as xw
from xlwings.constants import SortOrder

bk = xw.Book(r'C:\Users\username\Documents\Test Files\xlwings sorting.xlsx')

sht = bk.sheets['Sheet1']

def xl_col_sort(sht,col_num):
    sht.range('a2').api.Sort(sht.range((2,col_num)).api,SortOrder.xlAscending)
    return

xl_col_sort(sht,1)

This runs, but I have no idea how the syntax is working. I can't even tell why the first range('a2') call is necessary, but it throws an exception if I try directly calling sht.api.Sort. I tried looking directly at the code with ipython's ?? feature but it just gives me <xlwings._xlwindows.COMRetryObjectWrapper object at 0x0000001375A459E8> with no docstring. I then tried to actually ctrl+F through the .py files for the Sort() function, but ran into a dead end in a huge list of COM wrappers and couldn't track down the actual module containing the function.

At any rate, even if I haven't a clue how, the test case works; so the next step is putting this function into a class that contains an excel workbook and sheet to use the function as a method. I rewrite the code both to be used as a method and to take strings instead of column numbers (new columns are added to the middle of the worksheet frequently, so the number would change often):

class Metrics:

    # self.sheet is a sheet object based on self.book opened with xlwings
    # a bunch of other methods and attributes

    def xl_col_sort(self,col):

        # +2 because excel starts at 1 (+1) and the dataframe self.df
        # uses a data column as the index (+1)
        col_num = np.where(self.df.columns == col)[0][0] + 2

        so = xw.constants.SortOrder

        self.sheet.range('a2').api.Sort(self.sheet.range((2,col_num)).api, so.xlAscending)
        return

I can't see that anything has functionally changed here. The call still receives the same arguments, even if they go through an additional step to be created. Yet attempting to run this produces a MemoryError:

In[1]:    metrics.xl_col_sort('Exp. Date')
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-3-f1de8b0e8e98> in <module>()
----> 1 metrics.xl_col_sort('Exp. Date')

C:\Users\username\Documents\Projects\PyBev\pyBev_0-3-1\pybev\metricsobj.py in xl_col_sort(self, col)
    146         so = xw.constants.SortOrder
    147 
--> 148         self.sheet.range('a2').api.Sort(self.sheet.range((2,col_num)).api, so.xlAscending)
    149         return
    150     # def monday_backup(self):
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\xlwings\main.py in range(self, cell1, cell2)
    818                 raise ValueError("Second range is not on this sheet")
    819             cell2 = cell2.impl
--> 820         return Range(impl=self.impl.range(cell1, cell2))
    821 
    822     @property
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\xlwings\_xlwindows.py in range(self, arg1, arg2)
    576             if 0 in arg1:
    577                 raise IndexError("Attempted to access 0-based Range. xlwings/Excel Ranges are 1-based.")
--> 578             xl1 = self.xl.Cells(arg1[0], arg1[1])
    579         elif isinstance(arg1, numbers.Number) and isinstance(arg2, numbers.Number):
    580             xl1 = self.xl.Cells(arg1, arg2)
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\xlwings\_xlwindows.py in __call__(self, *args, **kwargs)
    149         for i in range(N_COM_ATTEMPTS + 1):
    150             try:
--> 151                 v = self._inner(*args, **kwargs)
    152                 t = type(v)
    153                 if t is CDispatch:
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\win32com\client\dynamic.py in __call__(self, *args)
    190                 if invkind is not None:
    191                         allArgs = (dispid,LCID,invkind,1) + args
--> 192                         return self._get_good_object_(self._oleobj_.Invoke(*allArgs),self._olerepr_.defaultDispatchName,None)
    193                 raise TypeError("This dispatch object does not define a default method")
    194 
MemoryError: CreatingSafeArray

Does anyone know how the syntax of this thing works or why it's breaking when put inside the method?


Answer 1:


This turned out to be an incredibly subtle error, so I figured I'd post the answer in case someone ends up googling this in a year trying to do something similar.

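In short, the sheet.range() method only accepts coordinates that are built-in Python integers, and the expression:

col_num = np.where(self.df.columns == col)[0][0] + 2

produces a NumPy integer scalar (e.g. numpy.int64) rather than a plain int. pywin32 apparently can't pass that type to COM as an ordinary number, which would explain the "CreatingSafeArray" in the error message. Wrapping the expression in int() makes the call work. Why this produces a MemoryError instead of a TypeError is beyond me, but it's probably an oversight. The devs do seem to know about it, though.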
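One way to guard against this is to coerce the coordinate to a built-in int before it ever reaches sheet.range(). Below is a minimal sketch, not anything from xlwings itself: the helper names to_excel_index and header_to_col_num are made up for illustration. It relies only on the standard library's operator.index, which accepts integer-like objects (NumPy integer scalars included) and raises a TypeError for floats, so a bad value fails early with a readable error.

```python
import operator


def to_excel_index(value):
    """Return value as a built-in int usable as a 1-based Excel coordinate."""
    # operator.index() accepts anything defining __index__ (NumPy integer
    # scalars do) and raises TypeError for floats and other non-integers.
    idx = int(operator.index(value))
    if idx < 1:
        raise IndexError("Excel rows and columns are 1-based")
    return idx


def header_to_col_num(headers, name, offset=2):
    """Map a header name to a worksheet column number.

    offset=2 mirrors the question: +1 because Excel is 1-based and +1
    because the DataFrame index occupies the first worksheet column.
    """
    # list.index() returns a plain int and raises ValueError if name is absent.
    return to_excel_index(list(headers).index(name) + offset)
```

In the class from the question, something like col_num = header_to_col_num(self.df.columns, col) could stand in for the np.where line.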

Additionally, the syntax is not listed in the aforementioned docs because it isn't xlwings or pywin32 syntax at all: .api exposes the underlying Excel COM object, so Sort() is the VBA Range.Sort method, as found here. Sort() only exists on Range objects, hence the need for the first sht.range() call.

And finally, in case anyone wants a simplified function to encapsulate all this nonsense:

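import xlwings as xw


# file_path is the path to your workbook
bk = xw.Book(file_path)
sheet = bk.sheets['Sheet1']  # or whatever the sheet is named

def xl_col_sort(sheet, col_num):
    # Coerce to a built-in int; a NumPy integer triggers the MemoryError above
    col_num = int(col_num)
    # Key1 is a cell in the column to sort by; Order1=1 is xlAscending
    sheet.range((2, col_num)).api.Sort(Key1=sheet.range((2, col_num)).api, Order1=1)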
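For anyone who also needs descending order, a variant of the function above might look like the following. Treat it as a sketch under assumptions: sheet is expected to be an xlwings Sheet on Windows (so that .api is the COM Range), and the descending parameter is my own addition. The two constants are written out as literals rather than imported; 1 and 2 are the xlAscending and xlDescending values of Excel's XlSortOrder enumeration.

```python
import operator

# Values of Excel's XlSortOrder enumeration, written out as literals.
XL_ASCENDING = 1
XL_DESCENDING = 2


def xl_col_sort(sheet, col_num, descending=False):
    """Sort the data around row 2 of `sheet` by the 1-based column `col_num`."""
    # Coerce integer-like values (e.g. NumPy integers) to a built-in int.
    col_num = int(operator.index(col_num))
    # Sort() is only available on Range objects, so start from a cell in
    # the key column and use that same cell as the sort key.
    key_cell = sheet.range((2, col_num))
    order = XL_DESCENDING if descending else XL_ASCENDING
    key_cell.api.Sort(Key1=key_cell.api, Order1=order)
```

Calling xl_col_sort(sheet, 1) sorts ascending by the first column, and xl_col_sort(sheet, 1, descending=True) reverses the order.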


Source: https://stackoverflow.com/questions/45223182/sorting-with-xlwings-pywin32
