Fastest way to sort a large number of arrays in Python

前端未结


I am trying to sort a large number of arrays in Python. I need to sort more than 11 million arrays at once.

Also, it would be nice if I could directly get

3 Answers

南方客

2021-01-21 14:58

Your input and output are a bit confusing. Please provide some sample data.

In the meantime, look into http://pandas.pydata.org/pandas-docs/stable/api.html#reshaping-sorting-transposing. Pandas sorting is heavily optimized. Focus on the Series sort, since each column of a DataFrame is more accurately represented as a Series.

情歌与酒

2021-01-21 14:59

The reason Python is so much slower than R here is that Python variables carry no declared type (e.g. int, string, float), so part of each comparison that decides which value is larger is spent determining the types of the values.

You can't fix this problem using Python alone, but you can add type definitions using Cython (ctypes and psyco can also serve the same purpose, but I prefer Cython). A simple example of how this works is at http://docs.cython.org/src/quickstart/cythonize.html

Cython translates your Python file to C, which is then compiled into an extension module that can be imported instead of the .py file to reduce the runtime. The possible ways to compile with Cython are shown at http://docs.cython.org/src/reference/compilation.html

囚心锁ツ

2021-01-21 15:09
For cases like this, where you only need partially sorted indices, there is NumPy's np.argpartition.

The expensive part of your code is the np.argsort call in w[np.argsort(z)[::-1]][:7], which is essentially w[idx], where idx = np.argsort(z)[::-1][:7].

So idx can be computed with np.argpartition instead:
```
# Indices of the 7 largest values of z, largest first
idx = np.argpartition(-z, np.arange(7))[:7]
```
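The -z is needed because np.argpartition works in ascending order by default, so negating the elements reverses the order and puts the largest values first. Passing np.arange(7) as the kth argument, rather than a single integer, puts each of the first 7 positions in its final sorted place, so the 7 indices come out fully ordered.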
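To make the role of the kth argument concrete, here is a minimal sketch on a small made-up array (the values are illustrative only and are not taken from the question). It compares a scalar kth with a range of kth values:

```python
import numpy as np

# Hypothetical sample data, chosen without ties so the result is unambiguous.
z = np.array([3, 9, 1, 7, 5])

# Scalar kth: the two largest values end up in the first two slots,
# but their relative order is not guaranteed.
unordered = np.argpartition(-z, 2)[:2]

# Range of kth values: slots 0 and 1 are each placed in final sorted
# position, so the first two indices come out largest-first.
ordered = np.argpartition(-z, np.arange(2))[:2]

print(sorted(unordered.tolist()))  # [1, 3]
print(ordered.tolist())            # [1, 3]
```

Note that negating the array assumes a signed integer or floating-point dtype; with unsigned integers the negation would wrap around.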

Thus, the proposed change to the original code would be:
```
func = w[np.argpartition(-z,np.arange(7))[:7]]
```
Runtime test:
```
In [162]: z = np.random.randint(0,10000000,(1100000)) # Random int array

In [163]: idx1 = np.argsort(z)[::-1][:7]
     ...: idx2 = np.argpartition(-z,np.arange(7))[:7]
     ...: 

In [164]: np.allclose(idx1,idx2) # Verify results
Out[164]: True

In [165]: %timeit np.argsort(z)[::-1][:7]
1 loops, best of 3: 264 ms per loop

In [166]: %timeit np.argpartition(-z,np.arange(7))[:7]
10 loops, best of 3: 36.5 ms per loop
```
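Since the question involves millions of arrays, the same idea might be applied to all of them in a single call, assuming the arrays have equal length and can be stacked as rows of a 2-D array. That layout is an assumption, as the question does not show its data. The helper name top_k_indices and the random z and w below are hypothetical stand-ins:

```python
import numpy as np

def top_k_indices(z, k):
    """Return, for each row of z, the indices of its k largest values, largest first."""
    # A range of kth values puts the first k slots of every row in sorted order.
    # Negation assumes a signed integer or float dtype.
    return np.argpartition(-z, np.arange(k), axis=1)[:, :k]

rng = np.random.default_rng(0)
z = rng.random((1000, 50))  # hypothetical: 1000 arrays of length 50
w = rng.random((1000, 50))  # hypothetical values to pick from, same shape as z

idx = top_k_indices(z, 7)
# Row-wise equivalent of w[idx] for the 1-D case.
top_w = np.take_along_axis(w, idx, axis=1)
print(top_w.shape)  # (1000, 7)
```

Working row-wise on one stacked array avoids a Python-level loop over the individual arrays, which is usually where most of the time goes when the arrays are small.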