Question
I need to use Python to sort an Excel spreadsheet by a given column. For testing, I'm using this data (in a file named xlwings sorting.xlsx):
Numbers Letters Letters_2
7 A L
6 B K
5 C M
4 D J
3 E N
2 F I
1 G H
Which should be sorted into this:
Numbers Letters Letters_2
1 G H
2 F I
3 E N
4 D J
5 C M
6 B K
7 A L
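For reference, the intended result can be written in plain Python, independent of Excel. This is only a sketch of the expected behaviour: the rows list mirrors the test data above, and sort_rows is a hypothetical helper name, not part of xlwings or pywin32.

```python
# Test data from the spreadsheet; the first row is the header.
rows = [
    ["Numbers", "Letters", "Letters_2"],
    [7, "A", "L"],
    [6, "B", "K"],
    [5, "C", "M"],
    [4, "D", "J"],
    [3, "E", "N"],
    [2, "F", "I"],
    [1, "G", "H"],
]

def sort_rows(table, col_num):
    """Sort the data rows by a 1-based column number, keeping the header first."""
    header, body = table[0], table[1:]
    return [header] + sorted(body, key=lambda row: row[col_num - 1])

sorted_rows = sort_rows(rows, 1)
for row in sorted_rows:
    print(*row)
```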
One would think this to be a trivial task, but there seems to be nothing in the way of any documentation (if there is something, it's buried so deep that two days of reading hasn't uncovered it) in either the xlwings docs or the pywin32 ones regarding column sorting.
The closest thing I could find anywhere online was this question, which has no answer and just redirects to a GitHub bug thread that had no resolution.
Still, I have managed to cobble together the following code based on the questioner's:
import xlwings as xw
from xlwings.constants import SortOrder

bk = xw.Book(r'C:\Users\username\Documents\Test Files\xlwings sorting.xlsx')
sht = bk.sheets['Sheet1']

def xl_col_sort(sht,col_num):
    sht.range('a2').api.Sort(sht.range((2,col_num)).api,SortOrder.xlAscending)
    return

xl_col_sort(sht,1)
This runs, but I have no idea how the syntax is working. I can't even tell why the first range('a2') call is necessary, but it throws an exception if I try directly calling sht.api.Sort. I tried looking directly at the code with IPython's ?? feature, but it just gives me <xlwings._xlwindows.COMRetryObjectWrapper object at 0x0000001375A459E8> with no docstring. I then tried to actually Ctrl+F through the .py files for the Sort() function, but ran into a dead end in a huge list of COM wrappers and couldn't track down the actual module containing the function.
At any rate, even if I haven't a clue how, the test case works, so the next step is putting this function into a class that holds an Excel workbook and sheet, so that it can be used as a method. I rewrote the code both to work as a method and to take column names (strings) instead of column numbers (new columns are frequently added to the middle of the worksheet, so the number would change often):
class Metrics:
    # self.sheet is a sheet object based on self.book opened with xlwings
    # a bunch of other methods and attributes
    def xl_col_sort(self,col):
        # +2 because excel starts at 1 (+1) and the dataframe self.df
        # uses a data column as the index (+1)
        col_num = np.where(self.df.columns == col)[0][0] + 2
        so = xw.constants.SortOrder
        self.sheet.range('a2').api.Sort(self.sheet.range((2,col_num)).api, so.xlAscending)
        return
I can't see that anything has functionally changed here. It's still receiving the same arguments, even if they go through an additional step to be created. Yet attempting to run this produces a MemoryError:
In[1]: metrics.xl_col_sort('Exp. Date')
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-3-f1de8b0e8e98> in <module>()
----> 1 metrics.xl_col_sort('Exp. Date')
C:\Users\username\Documents\Projects\PyBev\pyBev_0-3-1\pybev\metricsobj.py in xl_col_sort(self, col)
146 so = xw.constants.SortOrder
147
--> 148 self.sheet.range('a2').api.Sort(self.sheet.range((2,col_num)).api, so.xlAscending)
149 return
150 # def monday_backup(self):
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\xlwings\main.py in range(self, cell1, cell2)
818 raise ValueError("Second range is not on this sheet")
819 cell2 = cell2.impl
--> 820 return Range(impl=self.impl.range(cell1, cell2))
821
822 @property
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\xlwings\_xlwindows.py in range(self, arg1, arg2)
576 if 0 in arg1:
577 raise IndexError("Attempted to access 0-based Range. xlwings/Excel Ranges are 1-based.")
--> 578 xl1 = self.xl.Cells(arg1[0], arg1[1])
579 elif isinstance(arg1, numbers.Number) and isinstance(arg2, numbers.Number):
580 xl1 = self.xl.Cells(arg1, arg2)
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\xlwings\_xlwindows.py in __call__(self, *args, **kwargs)
149 for i in range(N_COM_ATTEMPTS + 1):
150 try:
--> 151 v = self._inner(*args, **kwargs)
152 t = type(v)
153 if t is CDispatch:
C:\Users\username\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\win32com\client\dynamic.py in __call__(self, *args)
190 if invkind is not None:
191 allArgs = (dispid,LCID,invkind,1) + args
--> 192 return self._get_good_object_(self._oleobj_.Invoke(*allArgs),self._olerepr_.defaultDispatchName,None)
193 raise TypeError("This dispatch object does not define a default method")
194
MemoryError: CreatingSafeArray
Does anyone know how the syntax of this thing works or why it's breaking when put inside the method?
Answer 1:
This turned out to be an incredibly subtle error, so I figured I'd post the answer in case someone ends up googling this in a year trying to do something similar.
In short, the sheet.range() method only accepts coordinates that are built-in Python integers, and the expression:
col_num = np.where(self.df.columns == col)[0][0] + 2
produces a NumPy integer scalar (such as numpy.int64) rather than a plain int. Why this produces a MemoryError instead of a TypeError is beyond me, but it is probably an oversight. The devs do seem to know about it, though.
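Whatever type that expression ends up producing, converting the result to a built-in int before it reaches sheet.range() avoids the problem. The sketch below uses only the standard library. The names excel_index, column_number, and index_cols are hypothetical, chosen for illustration, and the lookup uses list.index in place of np.where.

```python
import operator

def excel_index(value):
    """Coerce an integer-like value to a built-in int suitable for sheet.range()."""
    # operator.index() returns a plain int for any object implementing __index__
    # (built-in ints, NumPy integer scalars) and raises TypeError for floats.
    return operator.index(value)

def column_number(columns, name, index_cols=1):
    """Return the 1-based Excel column number for a header name.

    columns    -- header labels of the data columns, in sheet order
    index_cols -- columns left of the data occupied by the index (assumed to be 1)
    """
    position = list(columns).index(name)  # 0-based; raises ValueError if missing
    return excel_index(position + 1 + index_cols)

col_num = column_number(["Item", "Exp. Date", "Qty"], "Exp. Date")
print(col_num, type(col_num))
```

Because floats are rejected with a TypeError, a bad coordinate fails early in Python instead of surfacing as an opaque error from the COM layer.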
Additionally, the syntax is not listed in the aforementioned docs because it is actually VBA code, as found here. The Sort() method only works on Range objects, hence the first sht.range() call requirement.
And finally, in case anyone wants a simplified function to encapsulate all this nonsense:
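import xlwings as xw

bk = xw.Book(file_path)  # file_path: path to the workbook
sheet = bk.sheets['Sheet1'] # or whatever the sheet is named

def xl_col_sort(sheet,col_num):
    # int() guards against NumPy integer types; Order1=1 is xlAscending
    col_num = int(col_num)
    sheet.range((2,col_num)).api.Sort(Key1=sheet.range((2,col_num)).api, Order1=1)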
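A slightly more general variant might expose the sort direction and header handling as parameters. This is a sketch under assumptions, not a tested recipe. The names xl_col_sort_ex, ascending, and has_header are hypothetical. The numeric values are the XlSortOrder and XlYesNoGuess constants from the Excel object model. Passing Header assumes Range.Sort accepts it as a named argument through COM in the same way as Key1 and Order1 above.

```python
# Values from Excel's XlSortOrder and XlYesNoGuess enumerations.
XL_ASCENDING, XL_DESCENDING = 1, 2
XL_YES, XL_NO = 1, 2

def xl_col_sort_ex(sheet, col_num, ascending=True, has_header=True):
    """Sort the region around row 2 of the given column.

    sheet is expected to be an xlwings Sheet. Nothing here imports xlwings,
    so any object with a compatible range() method works.
    """
    col_num = int(col_num)                    # COM needs a built-in int
    key_cell = sheet.range((2, col_num)).api  # underlying COM Range object
    key_cell.Sort(
        Key1=key_cell,
        Order1=XL_ASCENDING if ascending else XL_DESCENDING,
        Header=XL_YES if has_header else XL_NO,
    )
```

With the test workbook from the question, xl_col_sort_ex(sheet, 1) would correspond to the ascending sort on the Numbers column.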
Source: https://stackoverflow.com/questions/45223182/sorting-with-xlwings-pywin32