Why is processing a sorted array not faster than an unsorted array in Python?

In the post "Why is processing a sorted array faster than random array?", it says that branch prediction is the reason for the performance boost with sorted arrays.

But I just

5 answers

傲寒

2021-02-13 10:36
Two reasons:
- Your array size is much too small to show the effect.
- Python has more overhead than C so the effect will be less noticeable overall.
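
A minimal sketch for trying this yourself with the standard timeit module. The function name sum_big, the 200,000-element size and the repeat count are arbitrary choices for illustration, not taken from the question. Timings depend on the machine and interpreter version, so no particular result is implied.

```python
import random
import timeit

def sum_big(data):
    # The branch from the original question: only add values >= 128.
    total = 0
    for c in data:
        if c >= 128:
            total += c
    return total

random.seed(0)
# Values 0..255, like std::rand() % 256 in the C++ original.
data = [random.randint(0, 255) for _ in range(200_000)]
sorted_data = sorted(data)  # keep the returned list; sorted() does not sort in place

t_unsorted = timeit.timeit(lambda: sum_big(data), number=5)
t_sorted = timeit.timeit(lambda: sum_big(sorted_data), number=5)
print(f"unsorted: {t_unsorted:.3f}s  sorted: {t_sorted:.3f}s")
```

Both calls must produce the same sum, so any timing difference comes only from the order of the elements.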
广开言路

2021-02-13 10:40

The reason why the performance improves drastically when the data are sorted is that the branch prediction penalty is removed, as explained beautifully in Mysticial's answer.

死守一世寂寞

2021-02-13 10:42

sorted() returns a new sorted list rather than sorting in place. If its return value isn't used, you're actually measuring the same unsorted list twice.
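
The difference between the built-in sorted() and the list.sort() method is easy to check; data here is just a small illustrative list:

```python
data = [3, 1, 2]

sorted(data)                # builds and returns a new list; the result is discarded here
print(data)                 # → [3, 1, 2]  (unchanged)

sorted_data = sorted(data)  # keep the returned list
print(sorted_data)          # → [1, 2, 3]

result = data.sort()        # list.sort() sorts in place...
print(result, data)         # → None [1, 2, 3]  (...and returns None)
```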

暖寄归人

2021-02-13 10:44

I may be wrong, but I see a fundamental difference between the linked question and your example: Python interprets bytecode, C++ compiles to native code.

In the C++ code, that if translates directly to a cmp/jl sequence, which the CPU branch predictor can treat as a single "prediction spot" specific to that loop.

In Python, that comparison is actually several function calls, so there's (1) more overhead and (2), I suppose, the code that performs the comparison is a function inside the interpreter that is used for every other integer comparison as well. It's therefore a "prediction spot" that isn't specific to the current block, which makes it much harder for the branch predictor to guess correctly.
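
To see that a single if c >= 128 becomes several interpreter steps rather than one cmp/jl pair, the standard dis module can disassemble a small function. The helper check is hypothetical and only mirrors the benchmark's inner test; exact opcode names differ between Python versions.

```python
import dis

def check(c):
    # Same test as the benchmark's inner loop body.
    if c >= 128:
        return c
    return 0

# Prints the bytecode CPython generates for check(): loads, a COMPARE_OP,
# and a conditional jump, each dispatched by the interpreter loop.
dis.dis(check)

opnames = [ins.opname for ins in dis.get_instructions(check)]
print("COMPARE_OP" in opnames)  # → True
```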

Edit: also, as outlined in this paper, there are way more indirect branches inside an interpreter, so such an optimization in your Python code would probably be buried anyway by the branch mispredictions in the interpreter itself.
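
To make the "shared prediction spot" idea concrete, here is a deliberately tiny, hypothetical stack-machine interpreter. It is not how CPython is actually implemented; it only shows the shape of the argument: every instruction goes through the one HANDLERS[op] dispatch site, and every >= comparison, whichever user loop it comes from, runs the same op_ge function.

```python
# Hypothetical toy stack machine, for illustration only.
def op_push(stack, arg):
    stack.append(arg)

def op_ge(stack, arg):
    # Shared by every ">=" in every program this interpreter runs.
    right = stack.pop()
    left = stack.pop()
    stack.append(left >= right)

def op_add(stack, arg):
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)

HANDLERS = {"PUSH": op_push, "GE": op_ge, "ADD": op_add}

def run(program):
    stack = []
    for op, arg in program:
        HANDLERS[op](stack, arg)  # one indirect dispatch site for all opcodes
    return stack

def sum_big(data):
    total = 0
    for c in data:
        # "c >= 128" is compiled to three instructions for the toy machine.
        (is_big,) = run([("PUSH", c), ("PUSH", 128), ("GE", None)])
        if is_big:
            total = run([("PUSH", total), ("PUSH", c), ("ADD", None)])[0]
    return total

print(sum_big([5, 200, 130, 127]))  # → 330
```

Whatever the data looks like, the branch outcomes of the user's program are mixed into the same few handler and dispatch branches, which is the effect the answer describes.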


时光说笑

2021-02-13 10:49

I ported the original code to Python and ran it with PyPy. I can confirm that sorted arrays are processed faster than unsorted arrays, and that the branchless method also eliminates the branch, with a running time similar to the sorted array. I believe this is because PyPy is a JIT compiler: the hot loop is compiled to machine code, so branch prediction comes into play just as in the C++ version.

[edit]

Here's the code I used:

```python
import random
import time

def runme(data):
    total = 0
    start = time.perf_counter()

    for _ in range(100000):
        for c in data:
            if c >= 128:
                total += c

    end = time.perf_counter()
    print(end - start)
    print(total)

def runme_branchless(data):
    total = 0
    start = time.perf_counter()

    for _ in range(100000):
        for c in data:
            # t is -1 (all bits set) when c < 128, otherwise 0
            t = (c - 128) >> 31
            # ~t & c keeps c only when c >= 128, without an if
            total += ~t & c

    end = time.perf_counter()
    print(end - start)
    print(total)

# 32768 random values in 0..255 (randint is inclusive at both ends),
# matching std::rand() % 256 in the C++ original
data = [random.randint(0, 255) for _ in range(32768)]

sorted_data = sorted(data)
runme(sorted_data)
runme(data)
runme_branchless(sorted_data)
runme_branchless(data)
```
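
The branchless trick ported from C relies on an arithmetic right shift producing -1 for negative values. Python ints are arbitrary precision and >> on a negative int rounds toward negative infinity, so (c - 128) >> 31 is still -1 or 0 here. A quick exhaustive check over every value the benchmark can produce (branchless_keep and branchy_keep are hypothetical helper names) confirms the two forms agree; it says nothing about speed.

```python
def branchless_keep(c):
    # Assumes 0 <= c < 2**31 + 128, so (c - 128) >> 31 is either 0 or -1.
    t = (c - 128) >> 31  # -1 if c < 128, 0 otherwise
    return ~t & c        # 0 if c < 128, c otherwise

def branchy_keep(c):
    return c if c >= 128 else 0

# Compare both versions on every value in 0..255.
assert all(branchless_keep(c) == branchy_keep(c) for c in range(256))
print(branchless_keep(127), branchless_keep(128), branchless_keep(255))  # → 0 128 255
```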
